Step-by-Step MarkLogic Processor Guide

Initial Setup for Guide

See the Getting Started page for instructions on setting your environment up for this tutorial.

Review MarkLogic Database State

In order to view the status of the MarkLogic database, we’ll be using the MarkLogic QConsole. You can read more about the tool in the Query Console User Guide.

Go to the MarkLogic QConsole at http://localhost:8000/qconsole and run the following script.

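'use strict';
// Collection used for the IoT documents throughout this guide.
let collection = 'iot-data';
let collectionQuery = cts.collectionQuery(collection);
// Return the estimated document count plus up to three sample documents.
[
  `${collection} count ${cts.estimate(collectionQuery)}`,
  'Sample of data',
  fn.subsequence(cts.search(collectionQuery), 1, 3)
];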

You should see output similar to the following.

["iot-data count 0", "Sample of data", null]
      

IoT Example Dataset

For this step-by-step guide we’ll be using compressed aggregate JSON data. The data is a mock-up of IoT power data from different devices. The data can be downloaded here.

Go to the NiFi interface at http://localhost:8080/nifi.

Setup MarkLogicDatabaseClientService

In the NiFi UI, click the cog in the lower left of the screen to configure the NiFi flow. In the new modal, select the CONTROLLER SERVICES tab. Click the + button on the right, filter for the type DefaultMarkLogicDatabaseClientService, and double-click the Controller Service Type in the table results.

Next, click the cog next to our new Controller Service and go to the PROPERTIES tab. Make sure the MarkLogic credentials are filled in and the other properties are set to their expected values.

Click the APPLY button and then select the lightning bolt next to the Controller Service to enable it.

Figure: MarkLogic Controller Service Setup

MarkLogic Put Processor

First, we’ll step through using the PutMarkLogic processor. The detailed steps follow. If you’d like to skip the detailed setup, you can import the NiFi template and fill in the following key information:

  • Folder location of IOT-Data.json.zip in the ListFile configuration
  • MarkLogic credentials for the DatabaseClient Service associated with the PutMarkLogic Processor

Add ListFile Processor

Drag the Processor icon next to the NiFi logo onto the template grid. Filter for the ListFile Processor and click the ADD button.

With the ListFile Processor on your grid, right-click the processor and select Configure from the menu. On the PROPERTIES tab, set the Input Directory property to the directory where IOT-Data.json.zip lives. If other files are in the same directory, you’ll also want to set the File Filter to the filename IOT-Data.json.zip to ensure additional files aren’t processed.
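
To illustrate why an exact File Filter keeps other files out: ListFile treats File Filter as a regular expression matched against file names. The sketch below is plain JavaScript, not NiFi itself, and only approximates that matching with an anchored pattern. The helper escapeRegex and the sample file names are hypothetical.

```javascript
// Escape regex metacharacters so the dots in the file name match literally.
function escapeRegex(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical directory listing; only the archive should be picked up.
const fileNames = ['IOT-Data.json.zip', 'IOT-Data-json.zip', 'notes.txt'];

// Anchored pattern, roughly mimicking a whole-name match.
const fileFilter = new RegExp(`^${escapeRegex('IOT-Data.json.zip')}$`);
const matched = fileNames.filter((name) => fileFilter.test(name));

console.log(fileFilter.source);
console.log(matched);
```

Entering the plain filename as the filter also works, since an unescaped dot matches any character; escaping the dots only makes the match stricter.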

Add FetchFile Processor

Add the FetchFile processor to the grid and go to the processor’s configuration screen. On the SETTINGS tab, select the check boxes to automatically terminate the failure, not.found and permission.denied relationships. Apply those changes.

Click and hold the ListFile processor and drag an arrow to the FetchFile processor. A modal will appear with details of the new relationship you are creating. Click the ADD button.

Add UnpackContent Processor

Add the UnpackContent Processor to the grid. Configure the processor so that the failure and original relationships are automatically terminated and set the Packaging Format property to zip.

Add a success relationship from the FetchFile processor to the UnpackContent processor.

Add SplitText Processor

Add the SplitText Processor to the grid. Configure the processor so that the failure and original relationships are automatically terminated and set the Line Split Count property to 1.

Add a success relationship from the UnpackContent processor to the SplitText processor.
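
As a rough picture of what a Line Split Count of 1 does, the sketch below splits text into one chunk per line in plain JavaScript. It assumes the unpacked file holds one JSON object per line, which the guide implies but does not state. The sample records and field names are hypothetical, and skipping blank lines is a simplification of this sketch rather than a description of SplitText.

```javascript
// Hypothetical sample standing in for the unpacked file contents.
const unpacked = [
  '{"device":"Refrigerator","watts":120}',
  '{"device":"Heater","watts":1500}',
  '',
].join('\n');

// Emit one chunk per lineSplitCount lines, skipping blank lines.
function splitLines(text, lineSplitCount = 1) {
  const lines = text.split(/\r?\n/).filter((line) => line.length > 0);
  const chunks = [];
  for (let i = 0; i < lines.length; i += lineSplitCount) {
    chunks.push(lines.slice(i, i + lineSplitCount).join('\n'));
  }
  return chunks;
}

// With a count of 1, each chunk is a single JSON document.
const documents = splitLines(unpacked).map((chunk) => JSON.parse(chunk));
console.log(documents.length);
```

Each chunk corresponds to one FlowFile, which is why every line ends up as its own document in MarkLogic.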

Add PutMarkLogic Processor

Add the PutMarkLogic Processor to the grid. Configure the processor so that the failure and success relationships are automatically terminated.

Set the DatabaseClient Service to the Controller Service we created previously in Setup MarkLogicDatabaseClientService.

Set Collections to iot-data, URI Prefix to /, and URI Suffix to .json.
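
To make the effect of URI Prefix and URI Suffix concrete, here is a minimal sketch of how a document URI could be assembled from them. It assumes the middle part of the URI comes from an identifying FlowFile attribute such as a UUID. This guide does not say which attribute is used, so the identifier below is a hypothetical placeholder, as is the helper buildUri.

```javascript
// Concatenate prefix, identifier, and suffix into a document URI.
function buildUri(prefix, id, suffix) {
  return `${prefix}${id}${suffix}`;
}

// Hypothetical identifier standing in for a FlowFile attribute value.
const flowFileId = '0b7c1d2e-1111-2222-3333-444455556666';

const uri = buildUri('/', flowFileId, '.json');
console.log(uri); // /0b7c1d2e-1111-2222-3333-444455556666.json
```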

Add a splits relationship from the SplitText processor to the PutMarkLogic processor.
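
The connections built so far can be summarized as a small data structure. This is only an illustrative sketch in plain JavaScript, not a NiFi API. The success relationship for ListFile is assumed, since the step above does not name it, and splits is the name NiFi's SplitText uses for its per-line output.

```javascript
// Connections from the ingest flow: [source, relationship, destination].
const connections = [
  ['ListFile', 'success', 'FetchFile'],
  ['FetchFile', 'success', 'UnpackContent'],
  ['UnpackContent', 'success', 'SplitText'],
  ['SplitText', 'splits', 'PutMarkLogic'],
];

// Follow the connections from a starting processor to the end of the chain.
function tracePath(start, edges) {
  const path = [start];
  let current = start;
  let next;
  while ((next = edges.find(([source]) => source === current))) {
    current = next[2];
    path.push(current);
  }
  return path;
}

console.log(tracePath('ListFile', connections).join(' -> '));
// ListFile -> FetchFile -> UnpackContent -> SplitText -> PutMarkLogic
```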

For more details on the available properties, see PutMarkLogic Processor.

Run Ingest

Hold the shift key and click and drag to select all the processors on the grid. In the lower left, select the Play button to start the ingest.

After some time to allow the data to be ingested, go back to QConsole and run the script in Review MarkLogic Database State to see the ingested documents.

MarkLogic Query Processor

This section covers the QueryMarkLogic processor. The detailed steps follow. If you’d like to skip the detailed setup, you can import the NiFi template and fill in the following key information:

  • Directory and Owner properties of the PutFile processor
  • MarkLogic credentials for the DatabaseClient Service associated with the QueryMarkLogic Processor

Add QueryMarkLogic Processor

Add the QueryMarkLogic Processor to the grid. Set the Query Type property to Collection Query, the Query property to iot-data, and the DatabaseClient Service to the Controller Service we created previously in Setup MarkLogicDatabaseClientService.

Different Query Approaches

There are other Query Type options that can be used. Below are examples of Query property values for each of those Query Type options. See Additional Query Templates for templates with the different approaches.

Combined Query (JSON)
{ "ctsquery": { "collectionQuery": { "uris": ["iot-data"] } } }
    
Combined Query (XML)
<cts:collection-query xmlns:cts="http://marklogic.com/cts">
  <cts:uri>iot-data</cts:uri>
</cts:collection-query>
  
String Query

Refrigerator

Structured Query (JSON)
{
  "query": {
    "queries": [{
      "collection-query": {
        "uri": ["iot-data"]
      }
    }]
  }
}
Structured Query (XML)
<query xmlns="http://marklogic.com/appservices/search">
  <collection-query>
    <uri>iot-data</uri>
  </collection-query>
</query>
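
If you need the Structured Query (JSON) value for other collections, it can be generated rather than typed by hand. The sketch below is plain JavaScript that reproduces the same shape as the example above. The helper name collectionStructuredQuery is hypothetical and not part of NiFi or MarkLogic.

```javascript
// Build a structured query matching documents in the given collections.
function collectionStructuredQuery(...collections) {
  return {
    query: {
      queries: [{ 'collection-query': { uri: collections } }],
    },
  };
}

// Serialize the query so it can be pasted into the Query property.
const queryProperty = JSON.stringify(collectionStructuredQuery('iot-data'), null, 2);
console.log(queryProperty);
```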

For more details on the available properties, see QueryMarkLogic Processor.

Add UpdateAttribute Processor

Add the UpdateAttribute Processor to the grid. This will let us place the documents as direct children of the directory we are outputting to. Configure the processor by adding a filename property with the value ${filename:substringAfterLast('/')}. The property value uses the NiFi Expression Language to trim the filename attribute of the FlowFile down to just the local name.
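
For readers unfamiliar with substringAfterLast, the following plain JavaScript sketch mirrors its behavior: it keeps everything after the last occurrence of the delimiter, and returns the value unchanged if the delimiter is absent. The sample filename is hypothetical, and it assumes the filename attribute holds a document URI as produced by the ingest flow.

```javascript
// Rough JavaScript equivalent of the Expression Language function substringAfterLast.
function substringAfterLast(subject, delimiter) {
  const index = subject.lastIndexOf(delimiter);
  // When the delimiter is absent, return the subject unchanged.
  return index === -1 ? subject : subject.slice(index + delimiter.length);
}

// Hypothetical filename attribute value holding a document URI.
const filename = '/example-id.json';
console.log(substringAfterLast(filename, '/')); // example-id.json
```

Without this step, the leading slash and any directory segments in the URI would be treated as part of the output path.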

Add a success relationship from the QueryMarkLogic processor to the UpdateAttribute processor.

Add PutFile Processor

Add the PutFile Processor to the grid. Configure the processor so that the failure and success relationships are automatically terminated.

Set Directory to the desired output directory and, if needed, Owner to the user that should own the exported files.

Add a success relationship from the UpdateAttribute processor to the PutFile processor.

Run Export

Hold the shift key and click and drag to select all the processors on the grid. In the lower left, select the Play button to start the export.

After some time to allow the data to be exported, go to the output directory to see the exported documents.

Additional Query Templates