This topic presents best practices to follow when you use the Greenplum Streaming Server Kafka Integration.
gpkafka supports two mechanisms to control how and when it commits data to Greenplum Database: a time period or a number of rows. You specify one or both of
MINIMAL_INTERVAL and MAX_ROW in the Kafka load configuration file.
For best results, try various settings of MINIMAL_INTERVAL to determine which value works best in your environment.
When message flow is heavy, GPSS may receive and buffer many messages during the MINIMAL_INTERVAL time period. In this situation, also providing a MAX_ROW setting can help limit memory usage.
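As a rough sketch, the two commit settings discussed in this topic might appear together in a load configuration file as shown below. This assumes a file layout in which a COMMIT block sits under the KAFKA block, and it assumes that MINIMAL_INTERVAL is expressed in milliseconds. The values are hypothetical starting points for experimentation, not recommendations, and the other blocks that a complete file requires are omitted.

```yaml
# Hypothetical excerpt of a gpkafka load configuration file.
# Only the commit settings are shown; input and output blocks are omitted.
KAFKA:
   COMMIT:
      # Number of rows that triggers a commit (example value only).
      MAX_ROW: 1000
      # Time period between commits, assumed to be in milliseconds (example value only).
      MINIMAL_INTERVAL: 5000
```

With both settings present, a heavy message flow is bounded by the row count as well as by the time period. Verify the exact property placement and units against the configuration file reference for your GPSS version.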