MarkLogic Community Recipes
Data Hub-4 Input Flow
Use a MarkLogic Data Hub Input Flow to transform documents while loading.
Run Data Hub-4 Harmonize Flow
Call the DHF Harmonization flow using the EvaluateCollector processor.
Run Data Hub-4 Flows (Input and Harmonize)
Orchestrate the DHF Input and Harmonize flows in a single NiFi template.
Read Files from Directory, Write to MarkLogic
This example watches a directory for files, imports them into MarkLogic, then deletes them. The MarkLogic URI is /files/ followed by the filename.
Extract values from JSON data
This example introduces the EvaluateJsonPath processor and demonstrates how to extract an ID value from JSON data to use in constructing the URI.
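Outside NiFi, the same idea (pull an ID out of the JSON, then build the document URI from it) can be sketched in a few lines of Python. This is only an illustration of the logic, not the recipe's actual configuration: the field name `id`, the `/records/` prefix, and the `build_uri` helper are all hypothetical.

```python
import json


def build_uri(raw_json: str, id_field: str = "id", prefix: str = "/records/") -> str:
    """Build a document URI from an ID found at the top level of a JSON document.

    The field name and URI prefix are illustrative assumptions, not values
    taken from the recipe.
    """
    doc = json.loads(raw_json)
    if id_field not in doc:
        # A flow would normally send such content down an "unmatched" or
        # "failure" path; here we simply raise.
        raise KeyError(f"missing ID field: {id_field}")
    return f"{prefix}{doc[id_field]}.json"


uri = build_uri('{"id": 42, "name": "example"}')
```

In the NiFi flow, the extracted value would live in a FlowFile attribute rather than a local variable.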
Extract values from XML data
This example introduces the EvaluateXPath processor to extract an ID value from XML data to use in constructing the URI. The XPath value is stored in a FlowFile attribute, which is used later in InvokeHTTP to construct the document URI.
Convert Data To UTF-8
This example demonstrates the ConvertCharacterSet processor and shows how to convert data from another character set to UTF-8.
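Conceptually, a character set conversion decodes the bytes using the source character set and re-encodes them as UTF-8. The following Python sketch shows that round trip; the `to_utf8` helper and the ISO-8859-1 input are assumptions chosen for illustration.

```python
def to_utf8(data: bytes, source_charset: str) -> bytes:
    """Decode bytes from the given source character set and re-encode as UTF-8."""
    return data.decode(source_charset).encode("utf-8")


# "café" in ISO-8859-1 stores "é" as one byte (0xE9);
# in UTF-8 it becomes two bytes (0xC3 0xA9).
latin1_bytes = "café".encode("iso-8859-1")
utf8_bytes = to_utf8(latin1_bytes, "iso-8859-1")
```

The source character set must be known in advance; decoding with the wrong one yields garbled text rather than an error in many cases.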
Handling Multiple Types of Content
This example demonstrates how to handle multiple content types.
Ingest Line-Delimited JSON
This example demonstrates the SplitText processor and shows how to ingest line-delimited JSON. Like the previous JSON example, we will construct the URI from an ID property in the JSON.
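As a rough sketch of what splitting line-delimited JSON involves, the Python below treats each non-empty line as one JSON document and derives a URI from its ID. The `id` field name, the `/records/` prefix, and the `split_line_delimited_json` helper are hypothetical and not taken from the recipe.

```python
import json
from typing import Dict, Iterator, Tuple


def split_line_delimited_json(
    text: str, id_field: str = "id", prefix: str = "/records/"
) -> Iterator[Tuple[str, Dict]]:
    """Yield (uri, document) pairs, one per non-empty line of input."""
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        doc = json.loads(line)
        yield f"{prefix}{doc[id_field]}.json", doc


sample = '{"id": 1, "name": "a"}\n\n{"id": 2, "name": "b"}\n'
pairs = list(split_line_delimited_json(sample))
```

Each yielded pair corresponds to one FlowFile after the split: the document is the content and the URI is built from an attribute.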
Split XML Files Into Multiple Documents
This example introduces the SplitXml processor to split an aggregate XML file into multiple documents.
Loading Documents From Compressed Files
This example demonstrates the UnpackContent processor and shows how to load content from one or more compressed files.
Generate Documents from CSV Files
This example introduces the EvaluateXPath processor to extract an ID value from XML data to use in constructing the URI.
Load PDF as Binary and Extracted Metadata as JSON
This example shows how to use the ExtractMediaMetadata processor to extract the properties from a PDF file and AttributesToJSON to convert the FlowFile attributes.
Call a Web Service
This example introduces the GenerateFlowFile processor and demonstrates how to consume JSON data from a paged web service.
Augment XML content with data from a Web Service
This example introduces XHTML fragment ingestion.
Modify NiFi Attributes with Custom Scripting
This example introduces the ExecuteScript processor and demonstrates how to add an attribute with a Groovy script.
Get Files by FTP
This example uses the GetFTP processor to get a single file from an anonymous FTP server.
Extract Text from PDFs and Office Documents
This example uses the ExtractTextProcessor, which is not included with NiFi but was developed by Hortonworks.
Get Data from a Relational Database
This example demonstrates ingesting data from a relational database.
Create View, use GenerateTableFetch
Executes any query against a database. Does not support paging. Returns the entire result set as a single Avro result that needs to be split.
Count Rows, Construct Paged SQL SELECTs
Designed for paging. Executes a SELECT COUNT(*), then generates SQL queries to page over the rows of a table in chunks, but does not execute them.
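The paging approach described above (count the rows, then generate one SELECT per chunk without executing it) can be sketched as follows. The `paged_selects` helper, the table and column names, and the `LIMIT ... OFFSET` syntax are assumptions for illustration; the exact paging syntax depends on the database.

```python
from typing import List


def paged_selects(table: str, row_count: int, page_size: int, order_by: str) -> List[str]:
    """Generate (but do not execute) one SELECT statement per page of rows.

    row_count is assumed to come from a prior SELECT COUNT(*) on the table.
    A stable ORDER BY is needed so that pages do not overlap or skip rows.
    """
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    return [
        f"SELECT * FROM {table} ORDER BY {order_by} LIMIT {page_size} OFFSET {offset}"
        for offset in range(0, row_count, page_size)
    ]


queries = paged_selects("orders", row_count=25, page_size=10, order_by="id")
```

With 25 rows and a page size of 10, this produces three statements, the last of which returns only the remaining five rows when executed.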
Use ExecuteSQLToColumnMaps
Polls a custom query for additional rows by storing the last value seen in an increasing column and querying for rows beyond it.
Use ExecuteSQLToColumnMaps
This example explores a MarkLogic Community alternative to the built-in SQL processors.
Load From an MLCP Archive
This example explores loading content and metadata from an MLCP archive.
Transform JSON
This example demonstrates how to transform JSON with the built-in JoltTransformJSON processor.
Load Data from SharePoint
This example demonstrates how to load data from a SharePoint server.
Invoke HTML Tidy on HTML Content
This example demonstrates how to use HTML Tidy to generate XHTML from HTML.
Export MarkLogic Database Content to the File System
This example uses the QueryMarkLogic processor to query a MarkLogic database, then writes the documents to the file system with the PutFile processor.
Read MarkLogic XML, Write to CSV
This example uses the MarkLogic QueryBatchProcessor processor to read XML from a MarkLogic database, then writes certain element values to CSV.
Error Handling in NiFi Flows
Many of the example flows presented in this cookbook have auto-terminated relationships that represent error conditions, such as "failure", "unmatched", etc. Here we demonstrate a few patterns for handling those errors.
Error Handling in PutMarkLogic
Here we discuss error handling in the PutMarkLogic processor.