Grove + MarkLogic Data Hub
We often want to run Grove as a UI alongside a MarkLogic Data Hub. This guide explains how to do so and describes two approaches to choose from.
This guide has been vetted on both Data Hub 5.x and the Data Hub Framework 4.3.1.
Migrating Grove-related artifacts and code to the Data Hub
To run a Grove UI project alongside a MarkLogic Data Hub, we recommend migrating Grove-related artifacts and code from the Grove code base into your Data Hub project. This approach was tested against both the Data Hub and the Data Hub Service.
Note that this implies some coordination between the Data Hub project and Grove even after migration of Grove’s core capabilities. If possible, it is easiest to manage both projects within the same code repository. The Grove files should all be within a dedicated parent directory. (For example, grove/ui and grove/middle-tier. Note that there is nothing magic about the grove directory name, so call it whatever you want.)
You can accomplish this recommended approach by moving a subset of files from inside the marklogic directory of your Grove project into your Data Hub project. Grove projects generated using the grove-cli’s grove new command ship with this marklogic directory, which contains configuration files for a standalone MarkLogic database, optimized for Grove’s sample data set. Once the relevant files are migrated, the marklogic directory can be deleted entirely. Additionally, an environment variable will need to be updated to reflect the data-hub-FINAL HTTP app server as configured by the Data Hub project.
The advantages of this approach include:
- Combining the Grove-related artifacts with the Data Hub simplifies development and deployments. The Grove code is logically separated inside its own directory under data-hub/src/main/ui-modules for clarity and distinction from Data Hub specific modules.
- If code is maintained together in the same repository, consistent versioning can be ensured.
- The Grove UI becomes a downstream consumer of the Data Hub.
There are some disadvantages to this approach:
- UI-specific code will reside in close proximity to backend-specific configuration, which some development teams may not prefer.
- Any CRUD operations performed by the Grove project will write to the Data Hub’s FINAL database.
- The security approach for Data Hub differs from that of Grove. It is best to drop Grove’s security approach and follow MarkLogic’s guidance for the Data Hub.
How to implement the recommended approach, step-by-step:
Prerequisite: MarkLogic Data Hub has been deployed to your MarkLogic host.
[Optional] If you previously installed Grove to its default database via the mlDeploy command within the marklogic directory, remove the Grove-specific configs before proceeding:
> cd grove/marklogic
> ./gradlew mlUndeploy -Pconfirm=true
Steps to move Grove modules to the Data Hub
- Determine the MarkLogic host and port the data-hub-FINAL app server is running on.
- At the command line, from within the top level of your Grove project (the directory with the middle-tier and ui directories inside it), run grove config and enter the host and port of the data-hub-FINAL app server. This will update environment variables for your project.
- Create the ui-modules directory inside the Data Hub project: data-hub/src/main/ui-modules.
- Update the Data Hub’s gradle.properties to register this new location for MarkLogic module deployments and UI-related data. Add the following lines to gradle.properties:
  mlModulePaths=src/main/ml-modules,src/main/ui-modules
  mlDataPaths=src/main/ml-data,src/main/ui-data
- If you are using a version of the Data Hub older than 5.0.1, update the Data Hub’s build.gradle to use a newer ml-gradle, version 3.14.0 or higher. 4.0.4 is used in this example as it was the most recent at the time of writing. Add the dependencies block to the buildscript block, around line 5:
  dependencies {
    classpath "gradle.plugin.com.marklogic:ml-gradle:4.0.4"
  }
- Copy the Grove modules to the Data Hub project: copy grove/marklogic/src/main/ui-modules to data-hub/src/main/ui-modules.
- Copy the Grove default user profile and a dictionary document to the Data Hub project: copy the files in grove/marklogic/src/main/ui-data to data-hub/src/main/ui-data.
- Edit the query options used by your Grove middle-tier’s search route (by default, these query options are called all and are found in the data-hub/src/main/ui-modules/options/all.xml file) to remove the following blocks:
  <constraint name="eyecolor"> ... </constraint>
  <constraint name="docFormat"> ... </constraint>
  <constraint name="gender"> ... </constraint>
- Make any other necessary changes to the query options file. For example, you may need to update the <additional-query> specifying that only docs in the data collection are returned. There may be Data Hub-specific collections that are a natural fit for limiting the search results of your Grove application. For example:
  <cts:collection-query>
    <cts:uri>Entity1</cts:uri>
    <cts:uri>Entity2</cts:uri>
    .....
  </cts:collection-query>
- Run ./gradlew mlLoadModules from inside the data-hub directory. This command will deploy the new modules and supporting documents.
- Delete the contents of grove/marklogic.
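The gradle.properties change in the steps above can also be scripted. The following is a minimal Python sketch, assuming a simple key=value properties file (it ignores the less common "key: value" and line-continuation forms). The helper names set_property and register_ui_paths are hypothetical and are not part of Grove, the Data Hub, or ml-gradle; the property values are the ones given in the steps above.

```python
def set_property(text, key, value):
    """Return properties text with key set to value (replaced or appended)."""
    out, found = [], False
    for line in text.splitlines():
        stripped = line.strip()
        # Skip comments; match only on the exact key before the first "=".
        is_comment = stripped.startswith("#")
        if not is_comment and stripped.split("=", 1)[0].strip() == key:
            out.append(f"{key}={value}")
            found = True
        else:
            out.append(line)
    if not found:
        out.append(f"{key}={value}")
    return "\n".join(out) + "\n"


def register_ui_paths(text):
    """Apply the two settings described in this guide."""
    text = set_property(
        text, "mlModulePaths", "src/main/ml-modules,src/main/ui-modules"
    )
    return set_property(text, "mlDataPaths", "src/main/ml-data,src/main/ui-data")
```

To use it, read data-hub/gradle.properties into a string, pass it to register_ui_paths, and write the result back. Replacing an existing key rather than appending a second one avoids ending up with duplicate property definitions.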
Alternative Approach to Run Grove with the Data Hub
In order to run a Grove UI project alongside the MarkLogic Data Hub project, we alternatively suggest creating a Grove project-specific app-server, which has its own modules database but points at the content database with the data you wish to visualize: most often the Data Hub project’s FINAL database. The Data Hub project will be responsible for managing the content database (including setting up indexes - needed for facets - and security permissions), as well as the content database’s related triggers and schemas databases, while the Grove project will manage the Grove project’s specific app-server and modules database. Security will be a shared responsibility, where the Data Hub has to give low-level access to data, and Grove can arrange higher-level access.
Note that this implies some coordination between the Data Hub project and Grove. If possible, it is easiest to manage both projects within the same code repository. The Grove files should all be within a dedicated parent directory. (For example, grove/ui, grove/middle-tier, and grove/marklogic. Note that there is nothing magic about the grove directory name, so call it whatever you want.)
If you need more independence than that provided by this approach, you may consider replicating the Data Hub data to a Grove-specific content database - but that is currently beyond the scope of this guide.
You can accomplish this approach by making some changes inside the marklogic directory of your Grove project. Grove projects generated using the grove-cli’s grove new command ship with this marklogic directory, which contains configuration files for a standalone MarkLogic database, optimized for Grove’s sample data set.
The advantages to this approach include:
- It does not pollute the Data Hub project’s modules database.
- It creates a line between configuration to support the Data Hub project and configuration to support the Grove UI project (with some exceptions, described in the next section on downsides). For example, the Grove project can set up its own users, roles, and security permissions (though those roles need to be consistent with doc permissions set in Data Hub).
There are some downsides to this approach:
- You will have two different ml-gradle installations, one for the Grove project and one for the Data Hub project, which can be confusing and time-consuming, because you have to run gradle tasks in two places, for example when bootstrapping the project. You could mitigate this by creating scripts that automatically run gradle scripts in both places.
- You will have to add some configuration to the Data Hub project in order to support the Grove UI. For example, the Data Hub project may need to add new indexes in order to support facets for the Grove UI. Content permissions will need to correspond to Grove users and roles. And if triggers and schemas are desired for the Grove project, those will have to be set up in the Data Hub ml-gradle configuration. (Note that this kind of demand will come from any downstream system connecting to a Data Hub content database.)
- Any CRUD operations performed by the Grove project will write to the Data Hub project’s content database.
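The first downside above suggests scripting the Gradle runs in both places. One possible shape for such a script is sketched below in Python. It assumes both projects live in one repository under data-hub and grove/marklogic (the directory names used in this guide), that each has a Gradle wrapper, and that mlDeploy is the task you want in both; adjust these to your layout. The function names are hypothetical. It should only be run with dry_run=False once the Grove configuration has been adjusted as this guide describes.

```python
import subprocess
from pathlib import Path


def deploy_commands(repo_root):
    """List (working directory, command) pairs, Data Hub first."""
    root = Path(repo_root)
    return [
        # Assumption: the Data Hub project sits in <repo>/data-hub.
        (root / "data-hub", ["./gradlew", "mlDeploy"]),
        # Assumption: the Grove MarkLogic config sits in <repo>/grove/marklogic.
        (root / "grove" / "marklogic", ["./gradlew", "mlDeploy"]),
    ]


def deploy_all(repo_root, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cwd, argv in deploy_commands(repo_root):
        print(f"[{cwd}] {' '.join(argv)}")
        if not dry_run:
            # check=True stops at the first failing Gradle run.
            subprocess.run(argv, cwd=cwd, check=True)
```

Running the Data Hub deployment first means the content database already exists when the Grove app server that points at it is deployed.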
How to implement the alternative approach, step-by-step:
- Run grove config at the top level of your Grove project to ensure that, for example, an available port is specified.
- Delete content-database.json, schemas-database.json, and triggers-database.json from marklogic/src/main/ml-config/databases/.
- Edit marklogic/src/main/ml-config/servers/app-server and marklogic/src/main/ml-config/rest-api.json to point to the correct content-database name.
- Edit the search options used by your Grove middle-tier’s search route (by default, these search options are called all and are found in the marklogic/src/main/ml-modules/options/all.xml file) to remove the following blocks:
  <constraint name="eyecolor"> ... </constraint>
  <constraint name="docFormat"> ... </constraint>
  <constraint name="gender"> ... </constraint>
- Make any other necessary changes to the search options file. For example, you may need to remove the <additional-query> specifying that only docs in the data collection are returned.
- Run ./gradlew mlDeploy from inside the marklogic directory. This command will deploy the new configuration, but it will not change the Data Hub project’s content database, because you have removed all related configuration files.
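Removing the sample constraints from the search options file can be done by hand or scripted. The following is a minimal sketch using Python's standard library, assuming the options file uses the standard MarkLogic search namespace (http://marklogic.com/appservices/search). The function name remove_constraints is hypothetical. Note that ElementTree discards XML comments when parsing, so hand-editing may be preferable if the file contains comments worth keeping.

```python
import xml.etree.ElementTree as ET

SEARCH_NS = "http://marklogic.com/appservices/search"
CTS_NS = "http://marklogic.com/cts"

# Keep the original prefixes when serializing instead of ns0, ns1, ...
ET.register_namespace("", SEARCH_NS)
ET.register_namespace("cts", CTS_NS)


def remove_constraints(options_xml, names):
    """Return the options XML without <constraint> elements named in names."""
    root = ET.fromstring(options_xml)
    constraint_tag = f"{{{SEARCH_NS}}}constraint"
    # Snapshot the tree first so removals do not disturb the traversal.
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag == constraint_tag and child.get("name") in names:
                parent.remove(child)
    return ET.tostring(root, encoding="unicode")
```

For the constraints listed in the steps above, the call would be remove_constraints(text, {"eyecolor", "docFormat", "gender"}), where text is the content of the all.xml options file. All other constraints and options are left in place.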