2-2 Load PDF as Binary and Extracted Metadata as JSON
This example uses the ExtractMediaMetadata processor to extract the properties of a PDF file, and the AttributesToJSON processor to convert the FlowFile attributes, including the extracted PDF properties, to a JSON document. It is also the first example to connect the same relationship more than once: the "success" relationship of GetFile feeds two sub-flows, one that loads the PDF as a binary and one that loads its extracted metadata as JSON.
- Download Template
- Processors:
- GetFile – reads files from a watched directory
- Properties
- Input Directory: /some/path
- UpdateAttribute (after GetFile)
- Properties
- marklogic.uri: /pdfs/${filename} (custom property)
- ExtractMediaMetadata
- Properties
- (all defaults)
- AttributesToJSON
- Properties
- Destination: flowfile-content
- UpdateAttribute (after AttributesToJSON)
- Properties
- marklogic.uri: /pdfs/${filename}.json (custom property)
- PutMarkLogic
- Properties
- DatabaseClient Service: MarkLogicClientService – Localhost / Documents
- URI attribute name: marklogic.uri
- Settings
- Automatically Terminate Relationships: FAILURE and SUCCESS
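To make the two sub-flows concrete, the following Python sketch mimics what the flow computes for one input file. It is not NiFi code and does not talk to MarkLogic. The function names are hypothetical, and the sample attributes (in particular the `pdf:`-style keys) are placeholders, since the actual keys depend on what ExtractMediaMetadata finds in a given PDF.

```python
import json


def binary_uri(filename: str) -> str:
    """Stand-in for the first UpdateAttribute: marklogic.uri = /pdfs/${filename}."""
    return f"/pdfs/{filename}"


def metadata_uri(filename: str) -> str:
    """Stand-in for the second UpdateAttribute: marklogic.uri = /pdfs/${filename}.json."""
    return f"/pdfs/{filename}.json"


def attributes_to_json(attributes: dict) -> str:
    """Rough stand-in for AttributesToJSON with Destination = flowfile-content:
    the FlowFile content is replaced by a JSON object built from the attributes."""
    return json.dumps(attributes, sort_keys=True)


# Hypothetical FlowFile attributes after GetFile and ExtractMediaMetadata.
# The metadata keys below are illustrative assumptions, not guaranteed names.
attributes = {
    "filename": "report.pdf",
    "pdf:title": "Quarterly Report",
    "pdf:pages": "12",
}

# Sub-flow 1: the original PDF bytes are written under the binary URI.
print(binary_uri(attributes["filename"]))

# Sub-flow 2: the attributes become a JSON document under the .json URI.
print(metadata_uri(attributes["filename"]))
print(attributes_to_json(attributes))
```

Because the two documents differ only by the `.json` suffix, the metadata document for any loaded PDF can be found by appending `.json` to the PDF's URI.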