
4-1 Read MarkLogic XML, Write to CSV

This example uses the MarkLogic QueryBatchProcessor processor to read XML from a MarkLogic database, then writes selected element values to CSV. The flow needs 8 processors and 4 controller services, which is more than such a simple task ought to require.

More about the "record" concept in NiFi: blog post, slide deck.

Note: QueryBatchProcessor has been superseded by QueryMarkLogic. Presently, QueryMarkLogic does not support the URI pattern property shown below.

Download Template

geoname-2980800.xml (Example Geoname XML data file)

geoname-avro-schema.json

Controller Services:
  • DefaultDatabaseClientService
    • Used by the MarkLogic QueryBatchProcessor. Configuration covered elsewhere.
  • AvroSchemaRegistry
    • Properties
      • Geoname: (paste the contents of the Avro schema file linked above)
  • JsonPathReader
    • Properties
      • Schema Registry: AvroSchemaRegistry
      • Schema Name: Geoname
  • CSVRecordSetWriter
    • Properties
      • Schema Write Strategy: Do Not Write Schema
      • CSV Format: Microsoft Excel
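
The role of the schema can be sketched outside NiFi. The following Python is only an illustration: the field names are assumptions taken from the attribute names set later in the flow, not the contents of geoname-avro-schema.json, and build_schema and project are hypothetical helpers. It builds a minimal Avro record schema and shows, roughly, how a schema of that shape narrows a JSON object that holds every FlowFile attribute down to the columns wanted in the CSV.

```python
import json

# Assumed field names, taken from the attributes set later in the flow.
# They are NOT copied from geoname-avro-schema.json.
ASSUMED_FIELDS = ["Id", "Name", "Latitude", "Longitude", "CountryCode"]


def build_schema(record_name, field_names):
    """Build a minimal Avro record schema in which every field is a string."""
    return {
        "type": "record",
        "name": record_name,
        "fields": [{"name": name, "type": "string"} for name in field_names],
    }


def project(record, schema):
    """Keep only the keys named by the schema, in schema order."""
    return {f["name"]: record.get(f["name"]) for f in schema["fields"]}


schema = build_schema("Geoname", ASSUMED_FIELDS)
schema_json = json.dumps(schema, indent=2)

if __name__ == "__main__":
    # Text of this shape is what would be pasted into the Geoname property.
    print(schema_json)
```

Attributes that the schema does not name, such as filename or the intermediate Position value, simply drop out of the projected record.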
Processors:
  • QueryBatchProcessor – queries MarkLogic
    • Scheduling
      • Run Schedule: 1 day
    • Properties
      • DatabaseClient Service: (your database client service)
      • URI pattern: /geonames/*
  • EvaluateXPath – stores values from the XML in FlowFile attributes
    • Properties
      • Destination: flowfile-attribute
      • Id: string(/*[local-name()='geoname']/*[local-name()='id'])
      • Name: string(/*[local-name()='geoname']/*[local-name()='names']/*[local-name()='name' and @tag='main'])
      • Position: string(/*[local-name()='geoname']/*[local-name()='Point']/*[local-name()='pos'])
      • CountryCode: string(/*[local-name()='geoname']/*[local-name()='country-code'])
    • Settings
      • Check "failure" and "unmatched" under "Automatically Terminate Relationships".
  • UpdateAttribute – to split the Position value into Latitude and Longitude
    • Properties
      • Latitude: ${ Position:substringBefore(' ') }
      • Longitude: ${ Position:substringAfter(' ') }
  • AttributesToJSON – writes the FlowFile attributes, including the values extracted from XML, to JSON
    • Properties
      • Destination: flowfile-content
  • ConvertRecord
    • Properties
      • Record Reader: JsonPathReader
      • Record Writer: CSVRecordSetWriter
  • MergeContent
    • Properties
      • (all defaults)
  • UpdateAttribute – to set the filename
    • Properties
      • filename: ${filename}.csv
  • PutFile
    • Properties
      • Directory: c:\some\directory
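
For comparison, the extraction logic of the flow above can be sketched in a few lines of standalone Python. This is not a replacement for the NiFi flow, only a way to see what the XPath expressions and the Position split do. The sample document is hypothetical (the root namespace URI, the name, and the coordinates are invented), the column names are assumed from the attribute names, and the header row is an assumption about the writer's output. Because ElementTree has no local-name() function, the sketch strips the namespace prefix from each tag instead.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical sample document: the root namespace URI, the names, and the
# coordinates are invented for illustration.
SAMPLE_XML = """\
<geoname xmlns="http://example.com/geonames" xmlns:gml="http://www.opengis.net/gml">
  <id>2980800</id>
  <names>
    <name tag="alt">Exampleville</name>
    <name tag="main">Example Town</name>
  </names>
  <gml:Point><gml:pos>48.5 7.25</gml:pos></gml:Point>
  <country-code>FR</country-code>
</geoname>"""

# Assumed column names, matching the attributes set in the flow.
COLUMNS = ["Id", "Name", "Latitude", "Longitude", "CountryCode"]


def local_name(element):
    """Return the tag without ElementTree's '{namespace}' prefix."""
    return element.tag.rsplit("}", 1)[-1]


def find_text(root, path, **attrs):
    """Follow direct children by local name; attrs filter the last step.

    Returns '' when nothing matches, as XPath string() does for an
    empty node set.
    """
    node = root
    for depth, name in enumerate(path):
        wanted = attrs if depth == len(path) - 1 else {}
        node = next(
            (child for child in node
             if local_name(child) == name
             and all(child.get(k) == v for k, v in wanted.items())),
            None,
        )
        if node is None:
            return ""
    return node.text or ""


def geoname_to_row(xml_text):
    """Extract the same values as the EvaluateXPath and UpdateAttribute steps."""
    root = ET.fromstring(xml_text)
    if local_name(root) != "geoname":
        raise ValueError("root element is not 'geoname'")
    position = find_text(root, ["Point", "pos"])
    # Split at the first space, like substringBefore(' ') / substringAfter(' ')
    # (behaviour differs from NiFi when there is no space at all).
    latitude, _, longitude = position.partition(" ")
    return {
        "Id": find_text(root, ["id"]),
        "Name": find_text(root, ["names", "name"], tag="main"),
        "Latitude": latitude,
        "Longitude": longitude,
        "CountryCode": find_text(root, ["country-code"]),
    }


def rows_to_csv(rows):
    """Write the rows in Excel-style CSV with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, dialect="excel")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    print(rows_to_csv([geoname_to_row(SAMPLE_XML)]), end="")
```

Passing several rows to rows_to_csv plays the part of MergeContent, which combines the per-document CSV fragments into one file before PutFile writes it out.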