1-7 Split XML Files Into Multiple Documents

This example introduces the SplitXml processor, which splits an aggregate XML file into multiple documents. It uses the same input data and URI structure as the corresponding use case in the MLCP Guide. SplitXml takes a different approach to splitting XML than mlcp: instead of specifying the namespace and local name of the element that will become the root of each split document, you specify only the numeric depth to split on. A depth of 1 splits on the children of the root, a depth of 2 splits two levels down, and so on.
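
To make the Split Depth setting concrete, here is a minimal Python sketch of depth-based splitting. It only illustrates the idea and is not how SplitXml is implemented. The function name `split_at_depth` and the sample `<people>` document are assumptions made up for this illustration.

```python
import xml.etree.ElementTree as ET


def split_at_depth(xml_text, depth=1):
    """Return the serialized elements found `depth` levels below the root.

    depth=1 yields the children of the root, depth=2 the grandchildren,
    mirroring the meaning of Split Depth described above.
    """
    level = [ET.fromstring(xml_text)]
    for _ in range(depth):
        # Descend one level: gather every child of every element at this level.
        level = [child for elem in level for child in elem]
    return [ET.tostring(elem, encoding="unicode") for elem in level]


# Hypothetical aggregate document, used only for illustration.
SAMPLE = (
    "<people>"
    "<person><first>Ann</first><last>Lee</last></person>"
    "<person><first>Bob</first><last>Ray</last></person>"
    "</people>"
)

for doc in split_at_depth(SAMPLE, depth=1):
    print(doc)
```

With this sample, a depth of 1 produces one document per `<person>` element, while a depth of 2 would produce one document per `<first>` or `<last>` element.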

See the earlier EvaluateXPath example for notes on namespace support and the workaround.

  • Download Template
  • Processors:
    • GetFile – reads files from a watched directory
      • Properties
        • Input Directory: /some/path
    • SplitXml – splits XML into multiple files at the specified numeric depth
      • Properties
        • Split Depth: 1 (default)
    • EvaluateXPath – stores values from the XML in FlowFile attributes
      • Properties
        • Destination: flowfile-attribute
        • last.name: string(//*[local-name()='last']) (custom property)
      • Settings
        • Check "failure" and "unmatched" under "Automatically Terminate Relationships".
    • PutMarkLogic
      • Properties
        • DatabaseClient Service: (your MarkLogic DatabaseClient Service)
        • URI Attribute Name: last.name
        • URI Prefix: /people/
        • URI Suffix: .xml
      • Settings
        • Automatically Terminate Relationships: failure, success
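
The way the last.name attribute and the URI settings fit together can be sketched in Python. This is a rough illustration under stated assumptions, not the processors' actual code: it assumes the document URI is formed as URI Prefix + attribute value + URI Suffix, and the helper names and the `http://example.org/people` namespace are hypothetical.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the '{namespace}' part that ElementTree puts in front of a tag."""
    return tag.rsplit("}", 1)[-1]


def first_text_by_local_name(xml_text, name):
    """Rough equivalent of string(//*[local-name()='NAME']).

    Returns the text content of the first element, in document order, whose
    local name matches, or an empty string when nothing matches.
    """
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        if local_name(elem.tag) == name:
            return "".join(elem.itertext())
    return ""


def build_uri(value, prefix="/people/", suffix=".xml"):
    """Assumed URI composition: prefix + attribute value + suffix."""
    return f"{prefix}{value}{suffix}"


# Hypothetical split document with a namespace, for illustration only.
DOC = (
    '<p:person xmlns:p="http://example.org/people">'
    "<p:first>Ann</p:first><p:last>Lee</p:last>"
    "</p:person>"
)

last_name = first_text_by_local_name(DOC, "last")
print(build_uri(last_name))
```

Matching on the local name alone is what lets the expression ignore whichever namespace the split documents carry.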