The MarkLogic Data Hub Framework is free and open source under the Apache 2 License and is supported by the community of developers who build and contribute to it. Please note that the Data Hub Framework is not a supported MarkLogic product.
First things first: load all of your data into MarkLogic, every last bit. On ingest, data is stored in a staging area. During the ingest phase you can also enrich your data with extra metadata such as provenance: where the data came from and when it was ingested. Data can be loaded via:
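To illustrate recording provenance at ingest time, here is a minimal sketch that wraps a raw record in an envelope whose headers record where the record came from and when it was ingested. The `envelope`/`headers`/`instance` layout, the header names `source` and `ingestedAt`, and the function `wrap_with_provenance` are illustrative assumptions, not a Data Hub API.

```python
import json
from datetime import datetime, timezone


def wrap_with_provenance(raw, source, ingested_at=None):
    """Wrap a raw record in an envelope whose headers carry provenance.

    Hypothetical layout: the envelope shape and header names are
    assumptions for illustration, not part of any Data Hub API.
    """
    if ingested_at is None:
        ingested_at = datetime.now(timezone.utc)
    return {
        "envelope": {
            "headers": {
                "source": source,                       # where the data came from
                "ingestedAt": ingested_at.isoformat(),  # when it was ingested
            },
            "instance": raw,  # the original record, kept as-is
        }
    }


doc = wrap_with_provenance(
    {"id": 42, "name": "Acme"},
    source="crm-export.csv",  # hypothetical source file name
    ingested_at=datetime(2017, 1, 5, 9, 30, tzinfo=timezone.utc),
)
print(json.dumps(doc, indent=2))
```

Keeping the original record untouched under `instance` means the staged copy still reflects exactly what arrived, while the headers answer the provenance questions.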
Now that the data is loaded into the staging area, you will want to harmonize it. This can be as simple as keeping the data as-is or as involved as you need it to be. Common actions performed during the harmonize step include:
- Standardize dates and other fields
- Enrich data with additional information
- Extract important data into indexes for faster searching
- Leverage semantic triples to enrich your data
- Denormalize multiple data sources into a single document
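Two of the harmonize actions listed above, standardizing dates and denormalizing multiple sources into one document, can be sketched as follows. The input date formats, the field names (`customerId`, `date`, `lines`), and the helper names `standardize_date` and `harmonize_order` are hypothetical; a real harmonize step would use your own data model.

```python
from datetime import datetime

# Hypothetical date formats found in the staged source data.
DATE_FORMATS = ("%m/%d/%Y", "%Y-%m-%d", "%d %b %Y")


def standardize_date(value):
    """Return an ISO 8601 date string, trying each known input format."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value, fmt).date().isoformat()
        except ValueError:
            continue  # not this format; try the next one
    raise ValueError(f"unrecognized date: {value!r}")


def harmonize_order(order, customers_by_id):
    """Build one denormalized document from an order and its customer."""
    customer = customers_by_id[order["customerId"]]
    return {
        "orderId": order["id"],
        "orderDate": standardize_date(order["date"]),
        # Embed a copy of the customer so readers need no join at query time.
        "customer": {"id": customer["id"], "name": customer["name"]},
        "lineCount": len(order["lines"]),
    }


customers = {7: {"id": 7, "name": "Acme"}}
order = {"id": 1001, "customerId": 7, "date": "01/05/2017", "lines": [{"sku": "A"}, {"sku": "B"}]}
print(harmonize_order(order, customers))
```

Embedding the customer trades some duplication for simpler, faster reads, which is the usual motivation for denormalizing in a document store.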
Storing your data in the Data Hub is great, but you also need to access it. Your data is made available to downstream consumers through REST endpoints over HTTP.
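As a rough example of downstream access, the sketch below reads one document through MarkLogic's REST API `/v1/documents` endpoint using HTTP digest authentication. The host, port 8011, document URI, and credentials are placeholder assumptions; use whatever REST app server your hub actually exposes.

```python
import json
import urllib.request
from urllib.parse import urlencode


def document_url(host, port, uri):
    """URL for reading a single document via the REST API's /v1/documents."""
    return f"http://{host}:{port}/v1/documents?{urlencode({'uri': uri})}"


def fetch_document(host, port, uri, user, password):
    """GET a JSON document, authenticating with HTTP digest auth."""
    url = document_url(host, port, uri)
    passwords = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    passwords.add_password(None, f"http://{host}:{port}/", user, password)
    opener = urllib.request.build_opener(urllib.request.HTTPDigestAuthHandler(passwords))
    with opener.open(url) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Placeholder values: port 8011 and the document URI are assumptions.
print(document_url("localhost", 8011, "/customers/42.json"))
# fetch_document("localhost", 8011, "/customers/42.json", "admin", "secret")
```

The network call is left commented out so the sketch runs without a live server; only the URL construction executes.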