
3-2 Load Data from SharePoint

Authentication is the most challenging part of NiFi / SharePoint integration. As of version 1.6.0, NiFi's InvokeHTTP processor does not support NTLM authentication (see Jira issue). Azure-hosted SharePoint and the Office 365 version of SharePoint may require a combination of OAuth and two-way SSL.

Both cases are beyond the scope of this recipe. The flows in the templates below were tested with an on-premises installation of SharePoint Server 2016 with basic authentication enabled:

Download Template
Download Template (with extra LogAttribute processors)

In the second template, most of the flow's error conditions are routed to the LogAttribute processors. For debugging, keep the LogAttribute processors in the Stopped state so that error flowfiles stay in their incoming queues. On error, use List Queue on any non-empty queue to view the error flowfiles.

Processors:
  • GenerateFlowFile – starts the flow. Stop and Start the processor to retry.
    • Settings
      • Automatically Terminate Relationships: failure
    • Scheduling
      • Run Schedule: 1000 days (prevents infinite looping)
  • UpdateAttribute
    • Properties
      • sharepoint.url: http://yourdomainname
  • InvokeHTTP ("Get SharePoint ContextInfo Digest")
    • Properties
      • HTTP Method: POST
      • Remote URL: ${sharepoint.url}/_api/contextinfo
      • Basic Authentication Username: (sharepoint user)
      • Basic Authentication Password: (sharepoint password)
      • Attributes to Send: Accept
      • Accept: application/json; odata=verbose (custom property)
  • EvaluateJsonPath ("Extract Digest")
    • Properties
      • Destination: flowfile-attribute
      • sharepoint.digest: $.d.GetContextWebInformation.FormDigestValue (custom property)
  • InvokeHTTP ("Get Document Libraries")
    • Properties
      • HTTP Method: GET
      • Remote URL: ${sharepoint.url}/_api/lists?$select=Title,ServerRelativeUrl&$filter=BaseTemplate eq 101 and hidden eq false&$expand=RootFolder
      • Basic Authentication Username: (sharepoint user)
      • Basic Authentication Password: (sharepoint password)
      • Attributes to Send: Accept,X-RequestDigest
      • Accept: application/json; odata=verbose (custom property)
      • X-RequestDigest: ${sharepoint.digest}
  • SplitJson
    • Properties
      • JsonPath Expression: $.d.results
  • EvaluateJsonPath ("Extract Library ServerRelativeUrl")
    • Properties
      • Destination: flowfile-attribute
      • server.relative.url: $.RootFolder.ServerRelativeUrl (custom property)
  • InvokeHTTP ("Get Doc Library File List")
    • Properties
      • HTTP Method: GET
      • Remote URL: ${sharepoint.url}/_api/web/GetFolderByServerRelativeUrl('${server.relative.url}')/Files
      • Basic Authentication Username: (sharepoint user)
      • Basic Authentication Password: (sharepoint password)
      • Attributes to Send: Accept,X-RequestDigest
      • Accept: application/json; odata=verbose (custom property)
      • X-RequestDigest: ${sharepoint.digest}
  • SplitJson
    • Properties
      • JsonPath Expression: $.d.results
  • EvaluateJsonPath ("Extract FileServerRelativeUrl")
    • Properties
      • Destination: flowfile-attribute
      • server.relative.url: $.ServerRelativeUrl (custom property)
  • UpdateAttribute ("Set MarkLogic URI (Metadata)")
    • Properties
      • marklogic.uri: ${server.relative.url:replace(' ', '%20'):urlEncode()}.metadata.json
  • InvokeHTTP ("Get Document")
    • Properties
      • HTTP Method: GET
      • Remote URL: ${sharepoint.url}${server.relative.url}
      • Basic Authentication Username: (sharepoint user)
      • Basic Authentication Password: (sharepoint password)
      • Attributes to Send: Accept,X-RequestDigest
      • Accept: application/json; odata=verbose (custom property)
      • X-RequestDigest: ${sharepoint.digest}
  • UpdateAttribute ("Set MarkLogic URI (Document)")
    • Properties
      • marklogic.uri: ${server.relative.url:replace(' ', '%20'):urlEncode()}
  • InvokeHTTP ("PUT to /v1/documents")
    • Properties
      • HTTP Method: PUT
      • Remote URL: http://localhost:8000/LATEST/documents?uri=${marklogic.uri}
      • Basic Authentication Username: youruser
      • Basic Authentication Password: yourpassword

The REST API calls that extract documents and metadata from document libraries are as follows.

Get the form digest:

POST

http://hostname/_api/contextinfo

Headers

Accept: application/json; odata=verbose

Returns:


{
  "d": {
    "GetContextWebInformation": {
      "__metadata": { "type": "SP.ContextWebInformation" },
      "FormDigestTimeoutSeconds": 1800,
      "FormDigestValue": "0x2C161C94B635D0D7221D917DA77F2B942F33BAFE3261D839CBDED272B5CA39A56B14FFDEBEABE2B25C1974EDEA15B24E9FEBE881C423BE99CF774B611BE7F0F1,30 May 2018 22:46:04 -0000",
      "LibraryVersion": "16.0.4690.1000",
      "SiteFullUrl": "http://win-ds66hph86cm",
      "SupportedSchemaVersions": {
        "__metadata": { "type": "Collection(Edm.String)" },
        "results": [ "14.0.0.0", "15.0.0.0" ]
      },
      "WebFullUrl": "http://win-ds66hph86cm"
    }
  }
}

Extract the value of FormDigestValue to send in the HTTP header X-RequestDigest on each subsequent REST call.
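The extraction mirrors the JSON path $.d.GetContextWebInformation.FormDigestValue used in the flow. A minimal Python sketch, assuming the response body is available as a string; the function name extract_digest is hypothetical and the sample digest is a shortened placeholder, not a real value:

```python
import json


def extract_digest(body):
    """Return (FormDigestValue, FormDigestTimeoutSeconds) from a contextinfo response."""
    info = json.loads(body)["d"]["GetContextWebInformation"]
    return info["FormDigestValue"], info["FormDigestTimeoutSeconds"]


# Shortened placeholder response; a real digest is much longer.
sample = (
    '{"d": {"GetContextWebInformation": {'
    '"FormDigestTimeoutSeconds": 1800, '
    '"FormDigestValue": "0xABC,30 May 2018 22:46:04 -0000"}}}'
)
digest, timeout_seconds = extract_digest(sample)
```

Returning the timeout alongside the digest is a design choice of this sketch: the response reports FormDigestTimeoutSeconds, so a long-running client may want to request a fresh digest once that interval has passed.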

Get document libraries:

GET

http://hostname/_api/lists?$select=Title,ServerRelativeUrl&$filter=BaseTemplate eq 101 and hidden eq false&$expand=RootFolder

Headers

Accept: application/json; odata=verbose

X-RequestDigest: (value of FormDigestValue from above)
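The query string above contains spaces inside the $filter expression, which many HTTP clients will not encode on their own. The sketch below is one way to build the same URL in Python with the spaces percent-encoded; the function name libraries_url is hypothetical, and the choice of safe characters is an assumption made so that the OData operators stay readable.

```python
from urllib.parse import quote


def libraries_url(sharepoint_url):
    """Build the document-library listing URL, percent-encoding only the spaces."""
    query = (
        "$select=Title,ServerRelativeUrl"
        "&$filter=BaseTemplate eq 101 and hidden eq false"
        "&$expand=RootFolder"
    )
    # Keep $, =, & and , literal so the OData query options stay intact.
    return f"{sharepoint_url}/_api/lists?{quote(query, safe='$=&,')}"


url = libraries_url("http://yourdomainname")
```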

 

Split the JSON on the d.results array, then extract RootFolder.ServerRelativeUrl from each document library object. For each ServerRelativeUrl, get the document library file list:

 

GET

http://hostname/_api/web/GetFolderByServerRelativeUrl('${server.relative.url}')/Files

Headers

Accept: application/json; odata=verbose

X-RequestDigest: (value of FormDigestValue from above)
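The split-and-extract step that feeds this call can be sketched in Python as follows. This is an illustration under assumptions, not the flow itself: the function name folder_files_urls and the sample response are hypothetical, and doubling single quotes inside the path follows the usual OData string-literal convention rather than anything stated in this recipe.

```python
import json
from urllib.parse import quote


def folder_files_urls(sharepoint_url, libraries_body):
    """Turn a document-library listing into one file-list URL per library."""
    urls = []
    for library in json.loads(libraries_body)["d"]["results"]:
        path = library["RootFolder"]["ServerRelativeUrl"]
        # Assumption: a single quote inside an OData string literal is doubled.
        literal = path.replace("'", "''")
        urls.append(
            f"{sharepoint_url}/_api/web/"
            f"GetFolderByServerRelativeUrl('{quote(literal)}')/Files"
        )
    return urls


# Hypothetical, heavily trimmed response with a single document library.
sample = '{"d": {"results": [{"Title": "Documents", "RootFolder": {"ServerRelativeUrl": "/Shared Documents"}}]}}'
urls = folder_files_urls("http://yourdomainname", sample)
```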

 

Split the JSON on the d.results array, then extract the ServerRelativeUrl from each file object. Store the file object (the file's metadata) in MarkLogic with a URI of ${server.relative.url}.metadata.json. Then, for each ServerRelativeUrl, get the file from SharePoint and store it in MarkLogic with the ServerRelativeUrl as the URI.
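In the flow, both MarkLogic URIs come from the expression ${server.relative.url:replace(' ', '%20'):urlEncode()}. The Python sketch below approximates that expression, assuming urlEncode() behaves like form-style URL encoding (quote_plus is a close but not exact match for a few punctuation characters). The function name marklogic_uris and the sample path are hypothetical.

```python
from urllib.parse import quote_plus, unquote_plus


def marklogic_uris(server_relative_url):
    """Return (document_uri, metadata_uri) as query-parameter values."""
    # Step 1: replace(' ', '%20'); step 2: URL-encode the result.
    encoded = quote_plus(server_relative_url.replace(" ", "%20"))
    return encoded, encoded + ".metadata.json"


doc_uri, meta_uri = marklogic_uris("/Shared Documents/report.docx")
# doc_uri  == "%2FShared%2520Documents%2Freport.docx"
# meta_uri == "%2FShared%2520Documents%2Freport.docx.metadata.json"

# After one round of decoding on the receiving side, the URI keeps its %20.
stored_uri = unquote_plus(doc_uri)
```

Encoding twice is deliberate in this reading of the flow: once the query parameter has been decoded, the resulting document URI contains %20 rather than a raw space.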

 

To get each document:

GET

http://hostname${server.relative.url}
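The last two hops of the flow, fetching the file and writing it to MarkLogic, come down to two URLs. A minimal sketch, assuming the MarkLogic endpoint from the processor list (http://localhost:8000/LATEST/documents) and a URI value that is already URL-encoded. The helper names are hypothetical, and percent-encoding the download path is an addition of this sketch; the flow itself concatenates the raw values.

```python
from urllib.parse import quote


def document_url(sharepoint_url, server_relative_url):
    """URL for downloading one file; the path is percent-encoded ('/' is kept)."""
    return sharepoint_url + quote(server_relative_url)


def marklogic_put_url(marklogic_uri, base="http://localhost:8000"):
    """URL for the PUT request; marklogic_uri must already be URL-encoded."""
    return f"{base}/LATEST/documents?uri={marklogic_uri}"


# Hypothetical host and file path, for illustration only.
download = document_url("http://yourdomainname", "/Shared Documents/report.docx")
upload = marklogic_put_url("%2FShared%2520Documents%2Freport.docx")
```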