Navigation

2-7 Get Data from a Relational Database

The NiFi support for relational databases has some issues with flexibility and scaleability. There is no single processor that can execute an arbitrary SQL SELECT statement and page through the results efficiently.

There are the available processors and their issues:

  • ExecuteSQL Executes any query against a database. Does not support paging. Gets the entire resultset as a single Avro result that needs to be split.
  • GenerateTableFetch Designed for paging. Executes a SELECT COUNT(*), then generates SQL queries to page over the rows of a table in chunks, but does not execute them. Limitations: 1) only supports one table, 2) does not always get the paging syntax right for every database.
  • QueryDatabaseTable As of NiFi 1.7.0, supports arbitrary queries through the "Custom Query" property. Polls for additional rows by storing and querying with an increasing column. (timestamp, incrementing ID, etc.).
  • ExecuteSQLToColumnMaps
  • MarkLogic Community alternative to the built-in SQL processors

All of the examples below were tested with MySQL 8 with the sample employee database located here: https://github.com/datacharmer/test_db There are about 300,000 employees in the database, so I will assume that that is too big for ExecuteSQL without paging. This is the SQL query from which my documents will be generated:

SELECT employees.emp_no, employees.birth_date, employees.first_name, employees.last_name, employees.gender, employees.hire_date, employees.dept_emp.dept_no, employees.departments.dept_name, employees.salaries.salary, employees.titles.title FROM employees.employees, employees.dept_emp, employees.departments, employees.salaries, employees.titles WHERE employees.employees.emp_no = employees.dept_emp.emp_no AND employees.dept_emp.to_date > CURDATE() AND employees.dept_emp.dept_no = employees.departments.dept_no AND employees.employees.emp_no = employees.salaries.emp_no AND employees.salaries.to_date > CURDATE() AND employees.employees.emp_no = employees.titles.emp_no AND employees.titles.to_date > CURDATE()

Subpages: