You use the Apache NiFi user interface to configure a dataflow that uses the VMware Greenplum Connector for Apache NiFi PutGreenplumRecord
processor to load record-oriented data from any source into Greenplum Database.
The PutGreenplumRecord processor accepts record-based FlowFiles and sends the data to the Greenplum Streaming Server, which writes it to Greenplum Database. When you configure the processor, you must identify the type and instance of the RecordReader that corresponds to the format of the data contained in incoming FlowFiles, the Greenplum connection specifics, and the Greenplum schema and table.
The default load mode for the Connector is to insert data into Greenplum. You can configure the processor to merge or update data instead, and the configuration properties for field-to-column translation and mapping allow you to further specify these operations.
You configure the PutGreenplumRecord
processor via the Configure Processor dialog. This dialog includes SETTINGS, SCHEDULING, PROPERTIES, and COMMENTS tabs.
The SETTINGS tab specifies FlowFile routing and timeouts for the processor. You can also use this tab to change the name of the processor and activate/deactivate the processor.
The Settings Tab documentation in the Apache NiFi User Guide describes the configuration options on this tab.
The SCHEDULING tab specifies the scheduling strategy, run schedule, and concurrency options for the processor.
When you set Concurrent Tasks to a value greater than one, the processor runs with the specified number of threads. The single PutGreenplumRecord processor instance then processes multiple FlowFiles concurrently, each managed by its own session.
The Scheduling Tab documentation in the Apache NiFi User Guide describes the configuration properties on this tab.
The PROPERTIES tab of the Configure Processor dialog identifies the PutGreenplumRecord
processor configuration properties.
The Connector uses default values for many of the PutGreenplumRecord properties. You are required to set the Record Reader, Greenplum Adapter, and Table Name property values.
The PutGreenplumRecord
processor configuration properties are listed and further described in the table and topics below:
Property Name | Description | Default Value |
---|---|---|
Record Reader | The controller service that deserializes the input FlowFile. Required. | |
Greenplum Adapter | The controller service that identifies and manages the Greenplum Database and Greenplum Streaming Server connection parameters. Required. | |
Schema Name | The name of the Greenplum Database schema in which the target table resides. Required. | public |
Table Name | The name of the target Greenplum table in which to load the data. Required. | |
Operation Type | The type of load operation: INSERT, UPDATE, or MERGE. Required. | INSERT |
Match Columns | The Greenplum table columns to match with the FlowFile record data. Required for the UPDATE and MERGE operation types. | |
Translate Field Names | Boolean value that specifies whether the Connector translates input FlowFile field names to Greenplum table column names. When true, the Connector uses case-insensitive matching and ignores underscores. When false, the Connector does not translate, and field and column names must match exactly. | true |
Unmatched Field Behavior | Specifies the Connector’s behavior when an incoming FlowFile record has a field that does not map to a column in the Greenplum table. | Ignore Unmatched Fields |
Unmatched Column Behavior | Specifies the Connector’s behavior when an incoming FlowFile record does not have a field mapping for every one of the Greenplum table columns. | Fail on Unmatched Columns |
Rollback On Failure | Boolean value that specifies whether or not the Connector should roll back when it encounters an error processing a FlowFile. | false |
Maximum Record Batch Size | Specifies the maximum number of records in each batch of data that the Connector will write to Greenplum. The Connector stores the batch in memory until it reaches this size. | 0 (write all records in a single transaction) |
The Connector supports inserting, merging, and updating records from a FlowFile into a Greenplum Database table. You use the Operation Type
property to specify the load mode:
Mode | Description |
---|---|
INSERT | Insert records as new rows into the Greenplum table (the default mode). |
MERGE | Use Match Columns to match records to existing table rows, and update these rows with the data from the records. A record with no matching database row is inserted into the Greenplum table as a new row. |
UPDATE | Use Match Columns to match records to existing table rows, and update these Greenplum table rows with data from the records. |
Use operation.type Attribute | Obtain the load mode from an operation.type attribute in the FlowFile. |
When Operation Type is UPDATE or MERGE, you must set Match Columns to a comma-separated list of one or more column names that uniquely identify a row in the Greenplum table. The Connector ignores the Match Columns property when the Operation Type is INSERT.
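As an illustration only, the three load modes can be modeled against an in-memory list of rows. This is a minimal sketch of the semantics described above, not the Connector's implementation: the function `apply_records` and its parameters are hypothetical names, and skipping a record that has no matching row under UPDATE is an assumption of the sketch.

```python
def apply_records(table, records, operation, match_columns=()):
    """Apply records to an in-memory 'table' (a list of dict rows).

    Hypothetical model of the INSERT, UPDATE, and MERGE load modes.
    """
    if operation == "INSERT":
        # Every record becomes a new row; match columns are ignored.
        table.extend(dict(record) for record in records)
        return table
    if operation not in ("UPDATE", "MERGE"):
        raise ValueError(f"unknown operation: {operation}")
    if not match_columns:
        raise ValueError("Match Columns is required for UPDATE and MERGE")

    for record in records:
        key = tuple(record[c] for c in match_columns)
        matched = [
            row for row in table
            if tuple(row[c] for c in match_columns) == key
        ]
        if matched:
            # Both UPDATE and MERGE overwrite the matching rows.
            for row in matched:
                row.update(record)
        elif operation == "MERGE":
            # Only MERGE inserts a record that has no matching row.
            table.append(dict(record))
        # Assumption: under UPDATE, an unmatched record is skipped.
    return table
```

With a table that holds one row whose `id` is 1, an UPDATE on `id` with records for ids 1 and 2 changes only the existing row, while a MERGE with the same input also adds a row for id 2.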
The Connector exposes properties that allow you to choose how you want the Connector to map FlowFile record fields to Greenplum Database table columns.
The Translate Field Names property is a boolean value that specifies whether the Connector translates field names in the FlowFile record into column names in the Greenplum table. The default value is true; the processor uses case-insensitive matching and ignores underscores when it translates field names into column names. When the value is false, the FlowFile field names must match the Greenplum table column names exactly, or the column value will not be updated.
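The translation rule can be sketched as follows. This is a hedged illustration of case-insensitive matching that ignores underscores, not the Connector's code; `normalize` and `map_fields_to_columns` are hypothetical helper names.

```python
def normalize(name):
    """Lowercase a name and drop underscores so that
    'Customer_ID' and 'customerid' compare as equal."""
    return name.replace("_", "").lower()


def map_fields_to_columns(field_names, column_names, translate=True):
    """Return a {field name: column name} mapping for the fields that match.

    translate=True models case-insensitive matching that ignores
    underscores; translate=False models exact matching.
    """
    if translate:
        lookup = {normalize(column): column for column in column_names}
        return {
            field: lookup[normalize(field)]
            for field in field_names
            if normalize(field) in lookup
        }
    columns = set(column_names)
    return {field: field for field in field_names if field in columns}
```

For example, with translation enabled the field `CustomerId` maps to the column `customer_id`; with translation disabled the same field maps to nothing, because the names are not identical.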
When an incoming FlowFile record has a field that does not map to any of the columns in the Greenplum table, set the Unmatched Field Behavior property to specify how the Connector should handle the situation:

- Ignore Unmatched Fields - (the default) The Connector ignores any field in the FlowFile record that cannot be mapped to a column in the Greenplum table.
- Fail on Unmatched Fields - The Connector routes the FlowFile to the failure relationship when the record has any field that cannot be mapped to a column in the table.

If an incoming FlowFile record does not have a field mapping for every one of the columns in the Greenplum table, set the Unmatched Column Behavior property to specify how the Connector should handle the situation:

- Ignore Unmatched Columns - The Connector assumes that a column in the table that does not have a matching field in the record is not required.
- Warn on Unmatched Columns - The Connector assumes that a column in the table that does not have a matching field in the record is not required, and the Connector logs a warning.
- Fail on Unmatched Columns - (the default) The flow fails when a column exists in the table and there is no matching field in the record. The Connector also logs an error.

The Connector distinguishes between the transient and the non-recoverable errors that it encounters. A transient error is one where the operation may succeed on a later retry, such as a failed connection attempt to Greenplum Database. A non-recoverable error, such as a FlowFile that contains bad input data, continues to fail when retried.
The Connector applies success or failure at the FlowFile level. That is, the Connector considers a write operation successful only if all records in a single FlowFile are written to the Greenplum Database table with no errors. If a single record in the FlowFile fails to write (for example, because the data is malformed), none of the records in the FlowFile are written to Greenplum, and the Connector considers the operation failed.
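The all-or-nothing behavior can be pictured with a small model in which records are staged first and committed only if every record passes. This is a sketch under stated assumptions: `write_flowfile`, the `validate` callback, and the use of `ValueError` to signal a bad record are hypothetical and do not reflect the Connector's internals.

```python
def write_flowfile(records, validate, table):
    """Write all records of one FlowFile to 'table', or none of them.

    'validate' is a hypothetical callback that raises ValueError
    when a record is malformed.
    """
    staged = []
    try:
        for record in records:
            validate(record)
            staged.append(dict(record))
    except ValueError:
        # One bad record fails the whole FlowFile; nothing is written.
        return False
    # Every record passed, so commit them together.
    table.extend(staged)
    return True
```

In this model, a FlowFile with two good records and one malformed record leaves the table unchanged and returns `False`.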
Rollback On Failure is a boolean property that specifies whether or not the Connector rolls back the NiFi session when it encounters a failure processing a FlowFile.
The default Rollback On Failure setting is false. When the Connector encounters an error while processing a FlowFile, it routes the FlowFile to the failure or retry relationship based on the error type, and the processor continues with the next FlowFile.
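The routing decision with Rollback On Failure set to false can be sketched as below. The exception classes, the function names, and the `success` label are hypothetical; the sketch also assumes that transient errors go to retry and non-recoverable errors go to failure, which is the natural reading of the text but is not spelled out there.

```python
class TransientError(Exception):
    """Hypothetical: an error that may clear on retry, such as a failed connection."""


class NonRecoverableError(Exception):
    """Hypothetical: an error that keeps failing, such as bad input data."""


def route_on_error(error):
    """Pick a relationship name for a failed FlowFile (assumed mapping)."""
    return "retry" if isinstance(error, TransientError) else "failure"


def process_all(flowfiles, write):
    """Process each FlowFile in turn; an error on one FlowFile
    does not stop processing of the next."""
    routed = []
    for flowfile in flowfiles:
        try:
            write(flowfile)
            routed.append((flowfile, "success"))
        except Exception as error:
            routed.append((flowfile, route_on_error(error)))
    return routed
```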
When Rollback On Failure is true, the Connector:
The rolled back FlowFile may be processed repeatedly by the Connector until it is processed successfully or removed by other means.
Be sure to set an adequate SETTINGS Yield Duration
for the processor to avoid retrying too frequently.
For each FlowFile it receives, the Connector:
The maximum number of records in a write call that the Connector makes to the Greenplum Streaming Server is determined by the Maximum Record Batch Size
that you specify for the processor.
The default value is zero (0), which means that there is no limit on the batch size: the Connector accumulates all FlowFile content in memory before it writes to Greenplum in a single transaction.
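The effect of the batch size setting can be illustrated with a small generator. This is a sketch of the batching idea only; `batches` and `max_batch_size` are hypothetical names and the code does not call the Greenplum Streaming Server.

```python
from itertools import islice


def batches(records, max_batch_size=0):
    """Yield lists of at most max_batch_size records.

    A max_batch_size of 0 models the default: all records are
    collected in memory and yielded as a single batch.
    """
    if max_batch_size < 0:
        raise ValueError("max_batch_size must be zero or greater")
    iterator = iter(records)
    if max_batch_size == 0:
        batch = list(iterator)
        if batch:
            yield batch
        return
    while True:
        batch = list(islice(iterator, max_batch_size))
        if not batch:
            return
        yield batch
```

With five records and a batch size of 2, this model produces three batches of sizes 2, 2, and 1. A smaller batch size bounds how many records are held in memory at a time, at the cost of more write calls.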