
Run Data Hub 4 Input Flow

This template demonstrates how to ingest a document and transform it with a Data Hub Framework input flow.

The example DHF tutorial can be found at Data Hub Framework Tutorial 4.x. This template follows the Order entity example from the DHF 4.x tutorial.

You can download the NiFi template here.

DHF version: 4.3.1

NAR version: MarkLogic NiFi 1.8.0.1

Apache NiFi version: 1.8.0

Input Data

The input data is a CSV file. The MLCP command for the input flow tells us how to configure the PutMarkLogic processor.

MLCP import command

These are the parameters that map to properties in PutMarkLogic:

-output_collections "Order,LoadOrders,input"
-transform_module "/data-hub/4/transforms/mlcp-flow-transform.sjs"
-transform_param "entity-name=Order,flow-name=Load%20Orders"

Because the transform module is SJS, the Server Transform property value is ml:sjsInputFlow. If it were XQuery, we would use ml:inputFlow.
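
As a rough illustration (Python, not part of the template), the rule amounts to choosing the transform name from the module's file extension. This sketch assumes SJS modules end in .sjs and XQuery modules in .xqy; the helper name server_transform_for is hypothetical.

```python
from pathlib import PurePosixPath


# Hypothetical helper: pick the PutMarkLogic "Server Transform" value from the
# MLCP -transform_module path. Assumes .sjs = JavaScript, .xqy = XQuery.
def server_transform_for(module_path: str) -> str:
    suffix = PurePosixPath(module_path).suffix.lower()
    if suffix == ".sjs":
        return "ml:sjsInputFlow"  # SJS input flow transform
    if suffix == ".xqy":
        return "ml:inputFlow"  # XQuery input flow transform
    raise ValueError(f"unrecognized transform module type: {module_path}")


print(server_transform_for("/data-hub/4/transforms/mlcp-flow-transform.sjs"))  # ml:sjsInputFlow
```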

The transform parameters are supplied as separate custom properties whose names are prefixed with trans: (see PutMarkLogic below).
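
A minimal Python sketch of that mapping, assuming the -transform_param value is a comma-separated list of name=value pairs with no commas inside the values. Note that MLCP receives flow-name URL-encoded (Load%20Orders), while the NiFi property holds the decoded text (Load Orders). The helper name is hypothetical.

```python
from urllib.parse import unquote


# Hypothetical helper: turn an MLCP -transform_param string into PutMarkLogic
# custom properties named "trans:<parameter>". Assumes values contain no commas.
def transform_params_to_properties(param: str) -> dict[str, str]:
    properties = {}
    for pair in param.split(","):
        name, _, value = pair.partition("=")
        # MLCP passes the value URL-encoded; the NiFi property holds plain text.
        properties[f"trans:{name.strip()}"] = unquote(value)
    return properties


print(transform_params_to_properties("entity-name=Order,flow-name=Load%20Orders"))
# {'trans:entity-name': 'Order', 'trans:flow-name': 'Load Orders'}
```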

Processors

GetFile

Reads files from a watched directory.

Properties

Input Directory
/path/to/data-hub/input/orders
Keep Source File
true

Scheduling

Run Schedule
10000 days
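
With Keep Source File set to true, GetFile reads the CSV but leaves it in the input directory; the very long run schedule presumably makes the processor fire once when started instead of re-reading the kept file on every tick. The Python sketch below is only a rough analogue of that read-without-removing behavior: it looks at the top level of the directory only, skips dotfiles in the spirit of GetFile's default of ignoring hidden files, and uses a hypothetical sample file.

```python
import tempfile
from pathlib import Path


# Rough analogue of GetFile with "Keep Source File" = true: read the files in
# the input directory but leave the originals where they are.
# Simplification: top level only, no subdirectory recursion.
def read_input_files(input_dir: str) -> dict[str, bytes]:
    contents = {}
    for path in sorted(Path(input_dir).iterdir()):
        # Skip dotfiles, similar to GetFile ignoring hidden files by default.
        if path.is_file() and not path.name.startswith("."):
            contents[path.name] = path.read_bytes()  # no delete or move
    return contents


with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "orders.csv"  # hypothetical sample file
    source.write_bytes(b"id,total\n1,9.99\n")
    files = read_input_files(tmp)
    source_still_there = source.exists()

print(sorted(files), source_still_there)  # ['orders.csv'] True
```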

InferAvroSchema

Properties

Schema Output Destination
flowfile-attribute
Input Content Type
csv
Get CSV Header Definition From Data
true
Avro Record Name
MyCSV

Settings

Automatically Terminate Relationships
failure, original, unsupported content
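
To show what these settings produce, here is a simplified Python sketch: the field names come from the CSV header row and the record is named MyCSV. Unlike the real processor, which infers field types and sanitizes names, this sketch types every field as a string. The sample columns are made up.

```python
import csv
import io
import json


# Simplified analogue of InferAvroSchema with "Get CSV Header Definition From
# Data" = true. Assumption: every field is typed "string" (no type inference).
def infer_record_schema(csv_text: str, record_name: str = "MyCSV") -> str:
    header = next(csv.reader(io.StringIO(csv_text)))
    schema = {
        "type": "record",
        "name": record_name,
        "fields": [{"name": column, "type": "string"} for column in header],
    }
    # With "Schema Output Destination" = flowfile-attribute, NiFi keeps this
    # text in the inferred.avro.schema attribute rather than the content.
    return json.dumps(schema)


schema_text = infer_record_schema("id,sku,total\n10248,A-1,9.99\n")
print(schema_text)
```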

ConvertCSVToAvro

Properties

Record Schema
${inferred.avro.schema}

Settings

Automatically Terminate Relationships
failure, incompatible
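
The Record Schema expression pulls the schema text that InferAvroSchema stored in the flowfile attribute. The Python sketch below approximates the conversion by turning each CSV row into a record keyed by the schema's field names. It assumes the first line is a header and drops it, and it discards rows whose field count does not match; the real processor would route such rows to incompatible, which this template auto-terminates.

```python
import csv
import io
import json


# Simplified analogue of ConvertCSVToAvro: build one record per CSV data row,
# using the field names from the schema (the ${inferred.avro.schema} text).
def csv_to_records(csv_text: str, record_schema: str) -> list[dict]:
    field_names = [field["name"] for field in json.loads(record_schema)["fields"]]
    reader = csv.reader(io.StringIO(csv_text))
    next(reader, None)  # assumption: first line is the header; drop it
    records = []
    for row in reader:
        if len(row) != len(field_names):
            # Would go to "incompatible" (auto-terminated here); just drop it.
            continue
        records.append(dict(zip(field_names, row)))
    return records


schema = json.dumps({
    "type": "record",
    "name": "MyCSV",
    "fields": [{"name": "id", "type": "string"}, {"name": "total", "type": "string"}],
})
records = csv_to_records("id,total\n1,9.99\n2,5.00\nbroken-row\n", schema)
print(records)  # [{'id': '1', 'total': '9.99'}, {'id': '2', 'total': '5.00'}]
```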

SplitAvro

Properties

(all default)

Settings

Automatically Terminate Relationships
failure, original
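
With default properties, SplitAvro splits by record with an output size of 1, so each CSV row ends up in its own flowfile on the split relationship, while the original batch is auto-terminated. A rough Python analogue of that grouping (the function name is hypothetical):

```python
from typing import Iterator


# Rough analogue of SplitAvro: yield consecutive groups of at most
# output_size records (the processor's default output size is 1).
def split_records(records: list[dict], output_size: int = 1) -> Iterator[list[dict]]:
    for start in range(0, len(records), output_size):
        yield records[start:start + output_size]


batches = list(split_records([{"id": "1"}, {"id": "2"}, {"id": "3"}]))
print(len(batches))  # 3
```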

ConvertAvroToJSON

Properties

(all default)

Settings

Automatically Terminate Relationships
failure
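
Because each incoming flowfile holds a single record, the default settings (array container, Wrap Single Record off, as we understand the processor) should yield one bare JSON object per flowfile rather than a one-element array, so each document written by PutMarkLogic holds one CSV row. A rough Python analogue, with a hypothetical function name:

```python
import json


# Rough analogue of ConvertAvroToJSON with an array container: a single record
# is emitted as a bare object unless wrap_single_record is True.
def records_to_json(records: list[dict], wrap_single_record: bool = False) -> str:
    if len(records) == 1 and not wrap_single_record:
        return json.dumps(records[0])  # bare object for a single record
    return json.dumps(records)  # otherwise a JSON array of records


print(records_to_json([{"id": "1", "total": "9.99"}]))  # {"id": "1", "total": "9.99"}
```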

PutMarkLogic

Properties

DatabaseClient Service
(MLStagingService, a DatabaseClient service pointing to the data-hub-STAGING database and its HTTP port)
Collections
Order,LoadOrders,input
Server transform
ml:sjsInputFlow
URI attribute name
uuid
trans:entity-name (custom property)
Order
trans:flow-name (custom property)
Load Orders
