4-1 Read MarkLogic XML, Write to CSV
This example uses the MarkLogic QueryBatchProcessor to read XML documents from a MarkLogic database, then writes selected element values to a CSV file. With 8 processors and 4 controller services, the flow is more complicated than a task this simple should require.
More about the "record" concept in NiFi: blog post, slide deck.
Note: QueryBatchProcessor has been superseded by QueryMarkLogic. At the time of writing, QueryMarkLogic does not support the URI pattern property shown below.
Controller Services:
- DefaultDatabaseClientService
  - Used by the MarkLogic QueryBatchProcessor. Configuration covered elsewhere.
- AvroSchemaRegistry
  - Properties
    - Geoname: (paste the contents of the Avro schema file linked above)
- JsonPathReader
  - Properties
    - Schema Registry: AvroSchemaRegistry
    - Schema Name: Geoname
- CSVRecordSetWriter
  - Properties
    - Schema Write Strategy: Do Not Write Schema
    - CSV Format: Microsoft Excel
Processors:
- QueryBatchProcessor – queries MarkLogic
  - Scheduling
    - Run Schedule: 1 day
  - Properties
    - DatabaseClient Service: (your database client service)
    - URI pattern: /geonames/*
- EvaluateXPath – store values from the XML in FlowFile attributes
  - Properties
    - Destination: flowfile-attribute
    - Id: string(/*[local-name()='geoname']/*[local-name()='id'])
    - Name: string(/*[local-name()='geoname']/*[local-name()='names']/*[local-name()='name' and @tag='main'])
    - Position: string(/*[local-name()='geoname']/*[local-name()='Point']/*[local-name()='pos'])
    - CountryCode: string(/*[local-name()='geoname']/*[local-name()='country-code'])
  - Settings
    - Check "failure" and "unmatched" under "Automatically Terminate Relationships".
- UpdateAttribute – to split the Position value into Latitude and Longitude
  - Properties
    - Latitude: ${ Position:substringBefore(' ') }
    - Longitude: ${ Position:substringAfter(' ') }
- AttributesToJSON – write the FlowFile attributes, including the values extracted from XML, to JSON
  - Properties
    - Destination: flowfile-content
- ConvertRecord
  - Properties
    - Record Reader: JsonPathReader
    - Record Writer: CSVRecordSetWriter
- MergeContent
  - Properties
    - (all defaults)
- UpdateAttribute – to set the filename
  - Properties
    - filename: ${filename}.csv
- PutFile
  - Properties
    - Directory: c:\some\directory
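The EvaluateXPath expressions use `local-name()` so they match elements whatever namespace the geoname documents use. To see what they extract, here is a rough Python emulation using only the standard library. ElementTree's own path syntax has no `local-name()` or `string()`, so the sketch walks the tree by hand. The sample document, its namespace URIs (including the `gml` prefix), and its values are hypothetical; real geoname documents may differ.

```python
import xml.etree.ElementTree as ET

def local_name(tag):
    # '{namespace-uri}name' -> 'name', like XPath local-name()
    return tag.rsplit("}", 1)[-1]

def xpath_string(root, steps):
    """Rough equivalent of string(/*[local-name()=s0]/*[local-name()=s1]/...).

    Each step is a local name, or a (local name, {attribute: value}) tuple
    for a predicate such as [@tag='main']. Returns '' for an empty node-set,
    as XPath string() does.
    """
    def matches(elem, step):
        name, attrs = step if isinstance(step, tuple) else (step, {})
        return local_name(elem.tag) == name and all(
            elem.get(key) == value for key, value in attrs.items()
        )

    nodes = [root] if matches(root, steps[0]) else []
    for step in steps[1:]:
        # Children are visited in document order, so nodes stays in document order
        nodes = [child for node in nodes for child in node if matches(child, step)]
    # string() uses the first node in document order; itertext() joins its text
    return "".join(nodes[0].itertext()) if nodes else ""

# Hypothetical sample document (namespace URIs and values are illustrative only)
SAMPLE = """<geoname xmlns="http://example.com/geonames" xmlns:gml="http://www.opengis.net/gml">
  <id>12345</id>
  <names><name tag="alt">Spfld</name><name tag="main">Springfield</name></names>
  <gml:Point><gml:pos>39.8 -89.6</gml:pos></gml:Point>
  <country-code>US</country-code>
</geoname>"""

# The four EvaluateXPath properties, expressed as element steps
PATHS = {
    "Id": ["geoname", "id"],
    "Name": ["geoname", "names", ("name", {"tag": "main"})],
    "Position": ["geoname", "Point", "pos"],
    "CountryCode": ["geoname", "country-code"],
}

root = ET.fromstring(SAMPLE)
attributes = {key: xpath_string(root, steps) for key, steps in PATHS.items()}
print(attributes)
```

A path that matches nothing yields an empty string rather than an error, which in the flow still leaves the attribute set (to an empty value) on the FlowFile.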
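The first UpdateAttribute step assumes Position holds "latitude longitude" separated by a single space, latitude first. A minimal Python mirror of the two Expression Language functions, assuming (as I read the NiFi Expression Language guide) that both return the whole subject when the separator is absent; the Position value is hypothetical:

```python
def substring_before(subject: str, separator: str) -> str:
    # Text before the first occurrence of separator; whole subject if absent
    index = subject.find(separator)
    return subject if index == -1 else subject[:index]

def substring_after(subject: str, separator: str) -> str:
    # Text after the first occurrence of separator; whole subject if absent
    index = subject.find(separator)
    return subject if index == -1 else subject[index + len(separator):]

position = "39.8 -89.6"  # hypothetical Position attribute value
latitude = substring_before(position, " ")
longitude = substring_after(position, " ")
print(latitude, longitude)  # 39.8 -89.6
```

Only the first space is used, so a value with a doubled space would leave a leading space on Longitude; chaining `:trim()` in the expression would guard against that.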
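With Destination set to flowfile-content, AttributesToJSON replaces the FlowFile content with one flat JSON object mapping attribute names to string values. As far as I know it also includes NiFi core attributes such as filename by default, so it is the record schema downstream, not this step, that decides which fields reach the CSV. The attribute values below are hypothetical:

```python
import json

# Hypothetical attributes on one FlowFile after EvaluateXPath and UpdateAttribute
attributes = {
    "filename": "example.xml",  # hypothetical core attribute value
    "Id": "12345",
    "Name": "Springfield",
    "Position": "39.8 -89.6",
    "CountryCode": "US",
    "Latitude": "39.8",
    "Longitude": "-89.6",
}

# A flat object with every value a string: the shape ConvertRecord will read
content = json.dumps(attributes)
print(content)
```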
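ConvertRecord reads the AttributesToJSON output with JsonPathReader and writes it with CSVRecordSetWriter. JsonPathReader expects one user-defined property per field, mapping a field name to a JSONPath expression; those properties are not listed above, so the field list and the `$.<field>` paths in this sketch are assumptions, as is the field order (in NiFi it comes from the Geoname schema). Python's `excel` csv dialect stands in for the Microsoft Excel format: comma-delimited, double quotes only where needed, CRLF line endings.

```python
import csv
import io
import json

# Hypothetical field order; in the real flow it comes from the Geoname Avro schema
FIELDS = ["Id", "Name", "Latitude", "Longitude", "CountryCode"]
# Hypothetical JsonPathReader properties: field name -> JSONPath ($.<field>)
JSON_PATHS = {field: f"$.{field}" for field in FIELDS}

def read_record(json_text):
    """Evaluate each simple top-level '$.key' path; missing keys become None."""
    obj = json.loads(json_text)
    return {field: obj.get(path[2:]) for field, path in JSON_PATHS.items()}

def write_csv(records, include_header=True):
    """Write records with Excel-style CSV rules (comma, quote when needed, CRLF)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, dialect="excel")
    if include_header:
        writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()

# Hypothetical AttributesToJSON output for one FlowFile
content = json.dumps({"Id": "12345", "Name": "Springfield, IL", "Position": "39.8 -89.6",
                      "Latitude": "39.8", "Longitude": "-89.6", "CountryCode": "US"})
print(repr(write_csv([read_record(content)])))
```

In this sketch, attributes the schema does not name (Position, filename) are dropped, and fields missing from the JSON come out as empty cells.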
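MergeContent with its defaults bins FlowFiles and joins their content by binary concatenation. One thing worth checking: if CSVRecordSetWriter's Include Header Line property is enabled (I believe it is by default), every single-record FlowFile carries its own header, and concatenation repeats that header once per record. The sketch shows the effect with plain strings; the rows are hypothetical. Disabling the header or merging with NiFi's MergeRecord processor are possible workarounds, though neither is part of the flow above.

```python
HEADER = "Id,Name,Latitude,Longitude,CountryCode\r\n"

# Hypothetical ConvertRecord output for three FlowFiles, each with its own header
flowfile_contents = [
    HEADER + '12345,"Springfield, IL",39.8,-89.6,US\r\n',
    HEADER + "23456,Example B,40.0,-88.0,US\r\n",
    HEADER + "34567,Example C,41.0,-87.0,US\r\n",
]

def concatenate(contents):
    # Binary concatenation with no delimiter configured: contents joined back to back
    return "".join(contents)

def concatenate_single_header(contents, header):
    # What a header-once merge would look like: keep the first header, strip the rest
    return header + "".join(c[len(header):] if c.startswith(header) else c for c in contents)

merged = concatenate(flowfile_contents)
print(merged.count(HEADER))  # 3
```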
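The last two steps append `.csv` to the merged FlowFile's filename and write its content into the configured directory. A rough Python equivalent, using a temporary directory in place of c:\some\directory; refusing to overwrite an existing file mirrors PutFile's Conflict Resolution Strategy, which I believe defaults to fail. The filename and content are hypothetical.

```python
import tempfile
from pathlib import Path

def put_file(directory: Path, filename: str, content: str) -> Path:
    # UpdateAttribute: filename = ${filename}.csv
    target = directory / f"{filename}.csv"
    # Conflict Resolution Strategy "fail": never overwrite an existing file
    if target.exists():
        raise FileExistsError(target)
    # newline="" keeps the CRLF line endings already present in the CSV text
    with open(target, "w", newline="", encoding="utf-8") as handle:
        handle.write(content)
    return target

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical filename and content for the merged FlowFile
    written = put_file(Path(tmp), "merged", "Id,Name\r\n1,Example\r\n")
    print(written.name)  # merged.csv
```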