Run Data Hub-4 Input Flow
This template demonstrates how to ingest a document and transform it with a Data Hub Framework input flow.
The example DHF tutorial is available at: Data Hub Framework Tutorial 4.X. This template follows the Order entity example from the DHF 4.X tutorial.
You can download the NiFi template here.
DHF version : 4.3.1
NAR version : MarkLogic NiFi 1.8.0.1
Apache NiFi version : 1.8.0
Input Data
The input data is a CSV file. Looking at the MLCP command for the input flow, we can derive important details for the PutMarkLogic processor.
These are the parameters that map to properties in PutMarkLogic:
-output_collections "Order,LoadOrders,input"
-transform_module "/data-hub/4/transforms/mlcp-flow-transform.sjs"
-transform_param "entity-name=Order,flow-name=Load%20Orders"
Because the transform module is SJS, the Server Transform property value is ml:sjsInputFlow. If it were XQuery, we would use ml:inputFlow.
The transform parameters are given as separate custom properties prefixed with trans: (see below under PutMarkLogic).
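The mapping described above can be sketched in a few lines of Python. This is only an illustration of how the three MLCP options translate into PutMarkLogic property values; the helper name `mlcp_to_putmarklogic` and the dictionary layout are assumptions made for this example and are not part of NiFi, MLCP, or DHF.

```python
from urllib.parse import unquote


def mlcp_to_putmarklogic(options):
    """Map a dict of MLCP options to PutMarkLogic property names and values.

    Hypothetical helper for illustration; not part of any MarkLogic tool.
    """
    props = {"Collections": options["-output_collections"]}

    # An SJS transform module maps to ml:sjsInputFlow; anything else is
    # treated as XQuery here and maps to ml:inputFlow.
    module = options["-transform_module"]
    props["Server Transform"] = (
        "ml:sjsInputFlow" if module.endswith(".sjs") else "ml:inputFlow"
    )

    # Each key=value pair in -transform_param becomes its own custom
    # property prefixed with "trans:". URL-encoded values are decoded,
    # so Load%20Orders becomes "Load Orders".
    for pair in options["-transform_param"].split(","):
        name, _, value = pair.partition("=")
        props["trans:" + name] = unquote(value)
    return props


mlcp_options = {
    "-output_collections": "Order,LoadOrders,input",
    "-transform_module": "/data-hub/4/transforms/mlcp-flow-transform.sjs",
    "-transform_param": "entity-name=Order,flow-name=Load%20Orders",
}
properties = mlcp_to_putmarklogic(mlcp_options)
```

The resulting dictionary holds the same values that are entered by hand in the PutMarkLogic processor in this template.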
Processors
GetFile
Reads files from a watched directory
Properties
- Input Directory
  - /path/to//data-hub/input/orders
- Keep Source File
  - true
Scheduling
- Run Schedule
  - 10000 days
InferAvroSchema
Properties
- Schema Output Destination
  - flowfile-attribute
- Input Content Type
  - csv
- Get CSV Header Definition From Data
  - true
- Avro Record Name
  - MyCSV
Settings
- Automatically Terminate Relationships
  - failure, original, unsupported content
ConvertCSVToAvro
Properties
- Record Schema
  - ${inferred.avro.schema}
Settings
- Automatically Terminate Relationships
  - failure, incompatible
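To make the hand-off between InferAvroSchema and ConvertCSVToAvro more concrete, here is a rough Python sketch of the kind of Avro record schema that could end up in the inferred.avro.schema attribute. It is a simplification under stated assumptions: every column is typed as a plain string, whereas the real processor inspects the data and may infer other types. The sample CSV columns and the function name `sketch_avro_schema` are hypothetical.

```python
import csv
import io
import json


def sketch_avro_schema(csv_text, record_name="MyCSV"):
    """Build a simplified Avro record schema from a CSV header row.

    Assumption: all fields are typed as "string"; real inference may differ.
    """
    header = next(csv.reader(io.StringIO(csv_text)))
    return {
        "type": "record",
        "name": record_name,
        "fields": [{"name": column, "type": "string"} for column in header],
    }


# Hypothetical sample data; the tutorial's actual order columns may differ.
sample_csv = "id,customer,price\n1,Alice,9.99\n"
schema = sketch_avro_schema(sample_csv)

# Serialized form, comparable to what a flowfile attribute would carry.
schema_json = json.dumps(schema)
```

Because the schema travels as a flowfile attribute, ConvertCSVToAvro can read it through the ${inferred.avro.schema} expression without any external schema registry.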
SplitAvro
Properties
(all default)
Settings
- Automatically Terminate Relationships
  - failure, original
ConvertAvroToJson
Properties
(all default)
Settings
- Automatically Terminate Relationships
  - failure
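Taken together, the ConvertCSVToAvro, SplitAvro, and ConvertAvroToJson processors turn one CSV file into one JSON document per row. The Python sketch below imitates that net effect with the standard library only. It is not how NiFi implements the conversion: it skips the Avro step and keeps every value as a string. The sample columns are hypothetical.

```python
import csv
import io
import json


def csv_to_json_documents(csv_text):
    """Return one JSON string per CSV data row, keyed by the header names."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # Each row corresponds to one flowfile after SplitAvro and
    # ConvertAvroToJson in the NiFi flow.
    return [json.dumps(row) for row in reader]


# Hypothetical sample data; the tutorial's actual order columns may differ.
sample_csv = "id,customer,price\n1,Alice,9.99\n2,Bob,4.50\n"
documents = csv_to_json_documents(sample_csv)
```

Each element of `documents` stands for the content of one flowfile that would reach PutMarkLogic.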
PutMarkLogic
Properties
- DatabaseClient Service
  - (MLStagingService, pointing to data-hub-STAGING database and the corresponding HTTP port)
- Collections
  - Order,LoadOrders,input
- Server transform
  - ml:sjsInputFlow
- URI attribute name
  - uuid
- trans:entity-name (custom property)
  - Order
- trans:flow-name (custom property)
  - Load Orders
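One way to picture what these properties mean is to build the roughly equivalent MarkLogic REST API request for a single document. PutMarkLogic does not necessarily issue a request like this itself; the sketch only shows where the collections, the transform name, and the trans: parameters would appear on a /v1/documents URL. The host, port, and UUID value are assumptions chosen for illustration, and `staging_write_url` is a hypothetical helper.

```python
from urllib.parse import quote, urlencode


def staging_write_url(host, port, uri, collections, transform, transform_params):
    """Build a /v1/documents URL carrying the same settings as PutMarkLogic.

    Hypothetical helper for illustration; it only builds a string and
    sends nothing over the network.
    """
    query = [("uri", uri)]
    query += [("collection", name) for name in collections]
    query.append(("transform", transform))
    # Transform parameters are passed as trans:{name}={value}.
    query += [("trans:" + name, value) for name, value in transform_params.items()]
    # Keep "/" and ":" readable; spaces are encoded as %20.
    encoded = urlencode(query, safe="/:", quote_via=quote)
    return "http://{}:{}/v1/documents?{}".format(host, port, encoded)


url = staging_write_url(
    host="localhost",  # assumption
    port=8010,         # assumption: the staging app server port
    uri="3f2a9c1e-0000-0000-0000-000000000000",  # stands in for the uuid attribute
    collections=["Order", "LoadOrders", "input"],
    transform="ml:sjsInputFlow",
    transform_params={"entity-name": "Order", "flow-name": "Load Orders"},
)
```

The value of the flowfile's uuid attribute becomes the document URI, which is why URI attribute name is set to uuid.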