This document contains pertinent release information about the VMware Tanzu Greenplum Streaming Server version 1.x releases. The Greenplum Streaming Server (GPSS) is included in certain Tanzu Greenplum 5.x and 6.x distributions. GPSS for Red Hat/CentOS 6 and 7, Red Hat 8, Photon 3, and Ubuntu 18.04 is also updated and distributed independently of Greenplum Database. You may need to download and install the GPSS distribution from VMware Tanzu Network to obtain the most recent version of this component.

Supported Platforms

Tanzu Greenplum Streaming Server 1.x is compatible with these Tanzu Greenplum versions:

  • Tanzu Greenplum 5.17.0 and later
  • Tanzu Greenplum 6.0.0 and later

Release 1.8

Release 1.8.0

Release Date: September 9, 2022

Greenplum Streaming Server 1.8.0 adds new features, includes changes, and resolves issues.

Note: You are required to perform upgrade actions for this release. Review Upgrading the Streaming Server to plan your upgrade to GPSS 1.8.0.

New and Changed Features

Greenplum Streaming Server 1.8.0 includes these new and changed features:

GPSS Configuration

The gpss.json server configuration file now includes a Gpfdist:Certificate:DBClientShared property. Use this boolean property to instruct GPSS to reuse the Gpfdist SSL certificate for the control channel (client) connection to Greenplum Database. Configuring SSL for the Control Channel provides the relevant configuration information.
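The colon-separated name Gpfdist:Certificate:DBClientShared describes a path through nested JSON objects in gpss.json. The following Python sketch builds and prints only that path. It assumes nothing beyond the nesting implied by the property name, and it omits every other gpss.json property that a working configuration requires.

```python
import json

def gpss_config_fragment(db_client_shared: bool) -> dict:
    """Build only the nested keys named by Gpfdist:Certificate:DBClientShared.

    A real gpss.json also needs the other Gpfdist and Certificate settings,
    which are deliberately omitted from this hypothetical fragment.
    """
    return {"Gpfdist": {"Certificate": {"DBClientShared": db_client_shared}}}

print(json.dumps(gpss_config_fragment(True), indent=4))
```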

General
  • When ReuseTables is set to false, GPSS now creates each job's external table using the job name rather than a hash. This makes it easier to track external tables per job. About External Table Naming and Lifecycle describes how GPSS names external tables, and also provides information about their lifecycle.
  • GPSS introduces new scheduling options that allow you to configure automatic stop and restart conditions for jobs. You specify the RUNNING_DURATION, AUTO_STOP_RESTART_INTERVAL, MAX_RESTART_TIMES, and QUIT_AT_EOF_AFTER (version 2) or running_duration, auto_stop_restart_interval, max_restart_times, and quit_at_eof_after (version 3 (Beta)) options in the SCHEDULE/schedule block of the load configuration file.
  • GPSS enhances the delimited data format to support setting quote and escape characters and an end-of-line prefix string when you use the format to load data into Greenplum Database.
Kafka Data Source
  • GPSS records in the progress log file the number of rows that it processes. When loading jsonl, delimited, or csv format data, where a single Kafka message can include multiple rows, the total_rows_read field now identifies the number of Kafka messages read, and the new total_rows field identifies the total number of rows inserted and rejected.
  • The Kafka data source exposes a new metadata field named timestamp. This int64-type field identifies the time that a message was written to the Kafka log.
  • When SAVE_FAILING_BATCH is true, GPSS records the time that a record was inserted into the backup table. The name of the new column is gpss_save_timestamp. Refer to Redirecting Data to a Backup Table when GPSS Encounters Expression Evaluation Errors for a discussion of the backup table schema.
  • When RECOVER_FAILING_BATCH (Beta) is true, GPSS reports more information about the result of the operation, including the batch size and number of records recovered.
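Because one jsonl Kafka message can carry several rows, a message count and a row count diverge. The following Python sketch is an illustrative analogue of the two progress log fields, under the interpretation that total_rows_read counts messages and total_rows counts rows. The function name is hypothetical, and it counts every non-blank line as a row regardless of whether that row would later be inserted or rejected.

```python
from __future__ import annotations

def count_progress(message_values: list[bytes]) -> tuple[int, int]:
    """Return (messages_read, rows_read) for a batch of jsonl Kafka message values."""
    messages_read = 0
    rows_read = 0
    for value in message_values:
        messages_read += 1
        # each non-blank line is one record, whether it later loads or is rejected
        rows_read += sum(1 for line in value.decode("utf-8").splitlines() if line.strip())
    return messages_read, rows_read

batch = [b'{"id": 1}\n{"id": 2}\n', b'{"id": 3}']
print(count_progress(batch))  # (2, 3)
```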
File Data Source
  • The file data source now supports the delimited data format.
  • The file data source can now load the stdout of a command into a Greenplum Database table. You specify the command details in the new EXEC (version 2) or exec (version 3 (Beta)) block in the load configuration file.
  • GPSS now supports initiating a dry run of a file job.
New RabbitMQ Data Source (Beta)

GPSS introduces Beta support for loading from a RabbitMQ data source. You can load messages from a RabbitMQ queue or stream into Greenplum Database. Refer to Loading from RabbitMQ into Greenplum (Beta) for more information about using this new Beta feature, and rabbitmq-v3.yaml (Beta) and rabbitmq-v2.yaml (Beta) for more information about the supported load configuration file properties.

Resolved Issues

Greenplum Streaming Server 1.8.0 resolves these issues:

32278, 32180
Resolves an issue where a Greenplum Database cluster using pgbouncer to manage connections did not receive a client SSL certificate as expected. GPSS now exposes a DBClientShared GPSS server configuration property that you can use to instruct GPSS to present the Gpfdist certificate as the client SSL cert to Greenplum Database.
32096, 31802
Resolves an issue where GPSS was unable to automatically stop a job based on its run time. GPSS now exposes new job scheduling properties for this purpose.
32044
Resolves an issue where the recovery of a failed batch (Beta) could not be adequately monitored. GPSS now records the time that a record is inserted into the backup table in a new column named gpss_save_timestamp. GPSS also reports more information during bad batch recovery operations.
32144
Resolves an issue where external tables used by GPSS were difficult to locate. Now, when ReuseTables is false, GPSS names the external table using the job name instead of a hash of configuration properties.
182386619
GPSS incorrectly fell back all Kafka partitions (to the earliest or latest offset), even those without offset gaps. This issue is resolved; GPSS now falls back only those partitions that have experienced an offset gap and writes this information to the GPSS log.
N/A
Resolves an issue where GPSS did not reset MAX_RETRIES after a job was successfully submitted and running.
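The per-partition fallback described in issue 182386619 can be pictured with the following Python sketch. It assumes that an offset gap means the committed offset lies outside the range of offsets still available in the partition, and it uses hypothetical function and parameter names. It is not GPSS's implementation; the print call merely stands in for the GPSS log message.

```python
from __future__ import annotations

def resume_offsets(committed: dict[int, int], earliest: dict[int, int],
                   latest: dict[int, int], fallback: str) -> dict[int, int]:
    """Choose a resume offset per partition, falling back only where there is a gap."""
    if fallback not in ("earliest", "latest"):
        raise ValueError("fallback must be 'earliest' or 'latest'")
    resume = {}
    for partition, offset in committed.items():
        if earliest[partition] <= offset <= latest[partition]:
            resume[partition] = offset  # offset still available: no gap, no fallback
        else:
            target = earliest[partition] if fallback == "earliest" else latest[partition]
            # stands in for the message GPSS writes to its log
            print(f"partition {partition}: offset {offset} unavailable, using {target}")
            resume[partition] = target
    return resume

print(resume_offsets({0: 120, 1: 5}, {0: 100, 1: 50}, {0: 200, 1: 300}, "earliest"))
# {0: 120, 1: 50}
```

Partition 0 keeps its committed position; only partition 1, whose offset has aged out, moves.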

Release 1.7

Release 1.7.2

Release Date: April 21, 2022

Greenplum Streaming Server 1.7.2 includes changes and resolves issues.

Note: You are required to perform upgrade actions for this release. Review Upgrading the Streaming Server to plan your upgrade to GPSS 1.7.2.

Changed Features

Greenplum Streaming Server 1.7.2 includes these changes:

  • GPSS adds support for specifying backslash escape sequences when you set the following CSV options: delimiter, quote, and escape. GPSS supports the standard backslash escape sequences for backspace, form feed, newline, carriage return, and tab, as well as escape sequences that you specify in hexadecimal format (prefaced with \x). Refer to Backslash Escape Sequences in the PostgreSQL documentation for more information.
  • To resolve issue 32168, GPSS version 1.7.2 introduces support for loading files or messages that contain one JSON record per line into Greenplum Database. To use this new feature, you must specify FORMAT: jsonl in version 2 format load configuration files, or specify json format with is_jsonl: true in version 3 (Beta) format load configuration files.
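The following Python sketch shows how the backslash escape sequences listed above expand when they appear in a CSV delimiter, quote, or escape value. It follows the PostgreSQL rules for the sequences GPSS names (\b, \f, \n, \r, \t, and \xh or \xhh), and, as in PostgreSQL, treats any other escaped character literally. The function name is hypothetical and this is not the GPSS parser.

```python
_SIMPLE_ESCAPES = {"b": "\b", "f": "\f", "n": "\n", "r": "\r", "t": "\t"}
_HEX_DIGITS = "0123456789abcdefABCDEF"

def decode_csv_option(value: str) -> str:
    """Expand backslash escape sequences in a CSV option value (sketch)."""
    out = []
    i = 0
    while i < len(value):
        ch = value[i]
        if ch != "\\" or i + 1 == len(value):
            out.append(ch)  # ordinary character (or a trailing lone backslash)
            i += 1
            continue
        nxt = value[i + 1]
        if nxt in _SIMPLE_ESCAPES:
            out.append(_SIMPLE_ESCAPES[nxt])
            i += 2
        elif nxt == "x" and i + 2 < len(value) and value[i + 2] in _HEX_DIGITS:
            # \xh or \xhh: one or two hexadecimal digits
            j = i + 3
            if j < len(value) and value[j] in _HEX_DIGITS:
                j += 1
            out.append(chr(int(value[i + 2:j], 16)))
            i = j
        else:
            out.append(nxt)  # any other escaped character is taken literally
            i += 2
    return "".join(out)

print(repr(decode_csv_option(r"\t")))    # '\t'
print(repr(decode_csv_option(r"\x7c")))  # '|'
```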

Resolved Issues

Greenplum Streaming Server 1.7.2 resolves these issues:

32168
Resolves an issue where GPSS did not support loading multi-line JSON files into Greenplum Database. GPSS 1.7.2 introduces support for loading JSON message or file data that contains a single JSON record per line.
N/A
Resolves an issue where GPSS did not support escape sequences that were specified in the CSV delimiter, quote, and escape options. GPSS now supports standard and hexadecimal-format backslash escape sequences.
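In contrast to the json format, where each file or message holds one JSON document, jsonl data holds one JSON record per line. The following Python sketch of reading such data is illustrative only. It skips blank lines, which is an assumption rather than documented GPSS behavior, and reports the line number of a malformed record.

```python
import json

def parse_jsonl(text: str) -> list:
    """Parse data that holds one JSON record per line (JSON Lines)."""
    records = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # this sketch tolerates blank lines
        try:
            records.append(json.loads(line))
        except json.JSONDecodeError as err:
            raise ValueError(f"line {line_no}: {err.msg}") from err
    return records

data = '{"id": 1, "name": "a"}\n{"id": 2, "name": "b"}\n'
print(parse_jsonl(data))  # [{'id': 1, 'name': 'a'}, {'id': 2, 'name': 'b'}]
```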

Release 1.7.1

Release Date: March 31, 2022

Greenplum Streaming Server 1.7.1 resolves issues.

Note: You are required to perform upgrade actions for this release. Review Upgrading the Streaming Server to plan your upgrade to GPSS 1.7.1.

Resolved Issues

Greenplum Streaming Server 1.7.1 resolves these issues:

32105
Resolves an issue where GPSS incorrectly added an offset based on the Greenplum Database local time zone to timestamp (without timezone) types that it loaded into a Greenplum Database table.
181293923
In some cases, GPSS returned the error pq: missing data for column *name* when loading a file containing CSV-format data. This issue is resolved; GPSS no longer automatically adds a newline when one already exists at the end of the file.
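One plausible reading of issue 181293923 is that an unconditional extra newline produced an empty final line, which a CSV reader sees as a row with missing columns. The corrected behavior amounts to appending a newline only when the data lacks one, as in this hypothetical Python helper:

```python
def ensure_trailing_newline(data: bytes) -> bytes:
    """Append a newline only when non-empty data does not already end with one."""
    if data and not data.endswith(b"\n"):
        return data + b"\n"
    return data  # already terminated (or empty): adding more would create a blank row

print(ensure_trailing_newline(b"1,a\n2,b"))  # b'1,a\n2,b\n'
```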

Release 1.7.0

Release Date: March 18, 2022

Greenplum Streaming Server 1.7.0 adds new features, includes changes, and resolves issues.

Note: You are required to perform upgrade actions for this release. Review Upgrading the Streaming Server to plan your upgrade to GPSS 1.7.0.

New and Changed Features

Greenplum Streaming Server 1.7.0 includes these new and changed features:

OS and Platforms
  • GPSS introduces support for Red Hat Enterprise Linux 8 and Photon 3 for Greenplum Database 6, and now provides download packages for these operating system versions on VMware Tanzu Network.
  • GPSS updates the version of Go that it uses to build the CLI tools to version 1.17.6 to mitigate CVE-2021-44716.
GPSS Configuration

GPSS introduces a default timeout of 10 seconds for a gpss service instance to connect to Greenplum Database and a related environment variable named GPDB_CONNECT_TIMEOUT. You can set this environment variable to change the amount of time that GPSS waits to establish a connection to Greenplum Database as described in Running the Greenplum Streaming Server.

Authentication
  • After it encounters an SSL connection failure on the control channel, GPSS will attempt to initiate a non-SSL connection on the channel.
  • The gpss.json server configuration file now includes an Authentication property block. Use the configuration properties in this block to specify a user name and password for client authentication to the GPSS server. Refer to Configuring the Streaming Server for Client-to-Server Authentication for additional information about this new feature.
  • GPSS adds the -U/--username and -P/--password options to the gpsscli subcommands to specify the user name and password for client authentication to the GPSS server.
Kafka Data Source
  • GPSS now saves the topic:partition:offset for each badly-formatted Kafka message written to the error log; you can view this information when you run the SELECT * FROM gp_read_error_log('<exttbl>') command.
  • GPSS adds the --skip-explain flag to the gpsscli start subcommand to skip the explain SQL check step of its internal processing.
  • GPSS now supports loading from a single Kafka topic into multiple Greenplum Database tables. Provide an OUTPUTS:TABLE (version 2) or targets:gpdb:tables:table (version 3 (Beta)) block for each table, and specify the properties that identify the data targeted to each.
  • GPSS introduces a new datatype named gp_json (Beta) to the dataflow extension. For additional information about using the gp_json data type, refer to About the JSON Format and Column Type documentation.
File and Kafka Data Sources
  • GPSS adds support for new CSV options for file and Kafka jobs. You can now specify the delimiter, quote, escape, and null string values in the load configuration file. You can identify a list of columns whose values GPSS forces to be not null. You can also specify GPSS's behaviour when it encounters missing trailing fields in a row of data. New version 2 property names include DELIMITER, QUOTE, NULL_STRING, ESCAPE, FORCE_NOT_NULL, and FILL_MISSING_FIELDS. New version 3 property names include delimiter, quote, null_string, escape, force_not_null, and fill_missing_fields.
  • GPSS exposes new PREPARE_SQL and TEARDOWN_SQL (version 2) and prepare_statement and teardown_statement (version 3) load configuration file properties for Kafka and file data sources. You can use these properties to specify user-defined functions or SQL commands for GPSS to run before executing a job, at job completion, or both.
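The following Python sketch illustrates roughly what the delimiter, quote, null string, force-not-null, and fill-missing-fields options control. It uses simplified COPY-like rules with hypothetical parameter names. It does not model the escape option, and, because Python's csv module does not report whether a field was quoted, it cannot distinguish a quoted empty string from an unquoted one as Greenplum CSV parsing does.

```python
import csv
import io

def parse_rows(text, columns, delimiter=",", quote='"', null_string="",
               force_not_null=(), fill_missing_fields=False):
    """Parse CSV text into one dict per row under simplified COPY-like rules."""
    rows = []
    for fields in csv.reader(io.StringIO(text), delimiter=delimiter, quotechar=quote):
        if len(fields) > len(columns):
            raise ValueError("extra data after last expected column")
        if len(fields) < len(columns):
            if not fill_missing_fields:
                raise ValueError(f"missing data for column {columns[len(fields)]}")
            # pad the missing trailing fields with NULL (None)
            fields = fields + [None] * (len(columns) - len(fields))
        row = {}
        for name, value in zip(columns, fields):
            # the null string becomes NULL unless the column is in force_not_null
            if value == null_string and name not in force_not_null:
                value = None
            row[name] = value
        rows.append(row)
    return rows

print(parse_rows("1,,x\n2\n", ["id", "a", "b"], fill_missing_fields=True))
# [{'id': '1', 'a': None, 'b': 'x'}, {'id': '2', 'a': None, 'b': None}]
```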
Version 3 (Beta) Configuration

GPSS 1.7.0 adds, changes, and relocates property keywords in the version 3 (Beta) configuration file format. Refer to the gpsscli-v3.yaml (Beta), gpkafka-v3.yaml (Beta), and filesource-v3.yaml (Beta) reference pages for the new keywords and locations.

New S3 Data Source (Beta)

GPSS 1.7.0 introduces Beta support for a new data source, S3. This data source does not read directly from S3, but rather uses the Greenplum Database s3 protocol and external tables to read from S3 and write to Greenplum in parallel. Refer to Loading from S3 into Greenplum (Beta) for more information about using this new feature, and s3source-v3.yaml (Beta) for the supported load configuration file properties.

New Commands and Options
  • GPSS adds the new gpsscli dryrun subcommand. When you invoke this command, GPSS performs a trial run of a Kafka or S3 job without actually writing to Greenplum Database. You can use the command to help diagnose load job errors as described in Diagnosing an Error with a Trial Load.
  • GPSS adds the -f/--force flag to the gpsscli remove subcommand to forcibly stop and remove one or more GPSS jobs.
Other Changes
  • GPSS adds new Submitted and Success statuses for batch (file, s3) jobs. GPSS 1.7.0 also changes the Stopped status to signify that a job was stopped by the user. Refer to the gpsscli status reference page for a description of GPSS job statuses.
  • GPSS 1.7.0 removes the Streaming Job API (Beta) documentation.

Resolved Issues

Greenplum Streaming Server 1.7.0 resolves these issues:

CVE-2021-44716
Updates the version of Go that GPSS uses to 1.17.6.
N/A
You can now specify an Avro schema file path for both the key and the value when you load Kafka data into Greenplum Database.
N/A
Resolves an issue where GPSS erroneously inserted a \n after parsing 76 characters of Avro data when the load configuration file specified bytes_to_base64: true.
32022
Resolves an issue where GPSS provided no way to execute SQL commands before it initiates a job or after a job completes. GPSS now exposes new properties for this purpose in version 2 and version 3 (Beta) load configuration files (PREPARE_SQL/TEARDOWN_SQL and prepare_statement/teardown_statement).
31886
Resolves an issue where GPSS returned an authentication error when SSL was disabled for the user (i.e. there was a hostnossl connection type entry configured for the user in the pg_hba.conf file). GPSS now attempts to initiate a non-SSL connection when it encounters an SSL connection failure on the control channel.

Release 1.6

Release 1.6.0

Release Date: May 28, 2021

Greenplum Streaming Server 1.6.0 adds new features, includes changes, and resolves issues.

Note: You are required to perform upgrade actions for this release. Review Upgrading the Streaming Server to plan your upgrade to GPSS 1.6.0.

New and Changed Features

Greenplum Streaming Server 1.6.0 includes these new and changed features:

  • GPSS adds the -c | --config flag/option to the gpss command to specify the JSON-formatted configuration file.
  • The gpsscli --version command now displays the version of the GPSS server in addition to displaying that of the client.
  • The gpss.json server configuration file now includes a KeepAlive property block. Use the configuration properties in this block to specify timeout options for the gRPC connection between the GPSS client and the GPSS server.
  • GPSS changes the format of front-end logs (messages written by commands to stdout) from CSV format to a more human-readable format. Related, GPSS adds a --csv-log option to the commands to write the front-end logs in CSV format. GPSS also adds a --color option to commands to enable the use of color in message display.
  • GPSS exposes a new load configuration property for Kafka data sources named IDLE_DURATION (version 2 configuration) and idle_duration_ms (version 3 configuration). Use this property to specify that GPSS use lazy load mode, waiting until data arrives before locking the target Greenplum Database table.
  • GPSS exposes a new load configuration property for Kafka data sources named SCHEMA_PATH_ON_GPDB (version 2 configuration) and schema_path_on_gpdb (version 3 configuration). Use this property to specify the path to the Avro .avsc file that contains the schema of the Kafka key or value data (but not both). This file must reside in the same location on all Greenplum Database segment hosts.
  • GPSS exposes a new load configuration property for Kafka data sources named FALLBACK_OFFSET (version 2 configuration) and fallback_offset (version 3 configuration). Use this property to specify that GPSS automatically handle Kafka message offset mismatches, and how.
  • GPSS exposes new load configuration properties for Kafka data sources to support access to an SSL-secured schema registry. Refer to Accessing an SSL-Secured Schema Registry for more information.
  • GPSS now supports acting as a high-level Kafka consumer when the Kafka client properties include a group.id setting.
  • GPSS exposes a new load configuration property for Kafka data sources named CONSISTENCY (version 2 configuration) and consistency (version 3 configuration). Use this property to specify how GPSS manages Kafka message offsets when it acts as a high-level consumer. Refer to Understanding Kafka Message Offset Management for more information.
  • GPSS 1.6.0 provides additional documentation about developing and using custom formatters with GPSS.

Beta Features

Greenplum Streaming Server 1.6.0 includes these new Beta features:

  • GPSS exposes a new load configuration property for Kafka data sources named RECOVER_FAILING_BATCH (version 2 configuration) and recover_failing_batch (version 3 configuration). Use this property in conjunction with SAVE_FAILING_BATCH to instruct GPSS to automatically reload the good data in the batch, and retain only the error data in the backup table.

    Note: Enabling this feature may have severe performance implications when any data in the Kafka topic generates an expression error.

    Note: This feature requires that GPSS has the Greenplum Database privileges to create a function.

  • GPSS adds a new extension named dataflow. This extension includes a new data type, gp_jsonb (available for Greenplum Database version 6.x only), and a new formatter, text_in. You must CREATE EXTENSION dataflow; in each database in which you choose to use these types and formatters. For additional information about the gp_jsonb data type, see About the JSON Format and Column Type.
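The RECOVER_FAILING_BATCH behavior (reload the good data, keep only the error data in the backup table) can be pictured with the following Python sketch. The insert callable and the ExpressionError class are hypothetical stand-ins, and GPSS's actual recovery strategy is not described in these notes. Retrying row by row, as this sketch does, suggests why such recovery can be expensive when errors occur.

```python
from __future__ import annotations
from typing import Callable

class ExpressionError(Exception):
    """Hypothetical stand-in for an expression evaluation error on one row."""

def recover_failing_batch(batch: list, insert_row: Callable[[object], None],
                          backup: list) -> tuple[int, int]:
    """Reload good rows from a failed batch; keep only failing rows in backup.

    Returns (rows_recovered, rows_left_in_backup).
    """
    recovered = 0
    for row in batch:
        try:
            insert_row(row)  # hypothetical per-row insert into the target table
            recovered += 1
        except ExpressionError:
            backup.append(row)  # only the error data stays in the backup table
    return recovered, len(batch) - recovered
```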

Resolved Issues

Greenplum Streaming Server 1.6.0 resolves these issues:

31458
Resolves an issue where job progress information was available only via stdout. GPSS now supports consumer groups, which save message offsets to the Kafka topic.
31396
Resolves an issue where the GPSS Ubuntu download package was missing certain dependent libraries. These libraries are now marked as required.
31359
Resolves an issue where GPSS could not restart a job that had been stopped for a long period of time. GPSS now supports a FALLBACK_OFFSET load configuration property that instructs GPSS whether, and how, to automatically handle offset mismatches.
31315
Resolves an issue where GPSS was unable to load data from Kafka when TLS-secured communication was required between the Kafka broker and the schema registry. GPSS now supports load configuration properties to specify the certificates and keys required for this communication.
31278
Resolves an issue where GPSS was unable to load Avro data when the schema was not embedded in the .avro file. GPSS now supports the SCHEMA_PATH_ON_GPDB load configuration property to specify the .avsc schema file.
31277
Resolves a request for a job timeout by supporting a new IDLE_DURATION load configuration property.
30723, 30711
Resolves an issue where GPSS failed to load JSON-format data that included \u0000 by creating a new Greenplum Database data type named gp_jsonb (Beta).

Release 1.5

Release 1.5.3

Release Date: April 15, 2021

Greenplum Streaming Server 1.5.3 resolves an issue.

Note: You are required to perform upgrade actions for this release. Review Upgrading the Streaming Server to plan your upgrade to GPSS 1.5.3.

Resolved Issues

Greenplum Streaming Server 1.5.3 resolves this issue:

31357
Resolves an issue where GPSS did not correctly handle CUSTOM_OPTION properties specified in a load configuration file. GPSS now supports using the NAME and PARAMSTR properties to specify a custom formatter user-defined function.

Release 1.5.2

Release Date: March 5, 2021

Greenplum Streaming Server 1.5.2 resolves several issues.

Note: You are required to perform upgrade actions for this release. Review Upgrading the Streaming Server to plan your upgrade to GPSS 1.5.2.

Changed Features

Greenplum Streaming Server 1.5.2 includes this change:

  • GPSS omits the end time in its output error hints. Resolved issue 31287 provides more information.

Resolved Issues

Greenplum Streaming Server 1.5.2 resolves these issues:

N/A
Resolves an issue where GPSS logged the message execInsert and err: nil because it did not check for an error before logging.
31287
Resolves an issue where GPSS did not always display the correct end time in the output error hint by removing the end time condition.
177153850
Resolves an issue where a GPSS query returned a syntax error from Greenplum Database because MATCH_COLUMNS was empty. GPSS now requires and checks that this field includes at least one column when you submit a load job that specifies UPDATE or MERGE mode.
177133400
Resolves an issue where GPSS stopped a Kafka job unexpectedly and did not return an error when it encountered a batch that contained only a control message.
177077055
Resolves an issue where the --all option was incorrectly displayed in the help output of the gpsscli load command.
177077007
GPSS consumed a large amount of memory caching Kafka messages when it ran many concurrent jobs that read from multiple partitions. This issue is resolved; GPSS now specifies a less aggressive default value for the librdkafka queued.max.messages.kbytes property when the user does not explicitly configure it.
177014072
Resolves an issue where GPSS incorrectly returned the error gpkafka load show job progress fail, err: job progress is nil when it failed to start a Kafka job. GPSS now returns the more meaningful error gpkafka load start job failed in this situation.
176842005
Resolves an issue where GPSS submitted a job with the wrong name when a gpsscli load *.yaml command operated on more than one load job.
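The submit-time check described in issue 177153850 amounts to a simple validation rule. The following Python sketch assumes that the version 2 output settings have already been parsed into a dict with MODE and MATCH_COLUMNS keys, and treats a missing MODE as INSERT; the function name is hypothetical.

```python
def validate_match_columns(output: dict) -> None:
    """Reject UPDATE or MERGE jobs whose MATCH_COLUMNS list is missing or empty."""
    mode = str(output.get("MODE", "INSERT")).upper()
    if mode in ("UPDATE", "MERGE") and not output.get("MATCH_COLUMNS"):
        raise ValueError(f"MATCH_COLUMNS must list at least one column in {mode} mode")

validate_match_columns({"MODE": "MERGE", "MATCH_COLUMNS": ["id"]})  # passes
```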

Release 1.5.1

Release Date: February 5, 2021

Greenplum Streaming Server 1.5.1 includes changes and resolves issues.

Note: You are required to perform upgrade actions for this release. Review Upgrading the Streaming Server to plan your upgrade to GPSS 1.5.1.

Changed Features

Greenplum Streaming Server 1.5.1 includes these changes:

  • Version 1.5.1 is the first standalone GPSS release that includes a .deb installation package for Ubuntu 18.04 LTS systems.
  • The gpsscli subcommands now consistently return zero (0) on success and non-zero when GPSS encounters an error.
  • GPSS improves the error message that it returns when it encounters a mismatched extension or formatter version.
  • GPSS bundles a patched version of the libserdes library to fix an issue that can arise when the SCHEMA_REGISTRY_ADDRS property value includes a trailing slash. See resolved issue 31137.
  • GPSS now registers the gp_read_persistent_error_log() function when you register the GPSS extension in a database. Resolved issue 31201 provides more information.
  • The progress log file name format has changed; the new format retains the complete job name rather than truncating it to 8 characters.

Resolved Issues

Greenplum Streaming Server 1.5.1 resolves these issues:

31201
Resolves an issue where GPSS returned a permission denied for language c error when it attempted, at runtime, to register an internal function as the Greenplum Database user that started GPSS, and this user did not have the privileges required to create such functions. GPSS now registers this internal function when you create the GPSS extension in a database.
31137
Due to a bug in the dependent library libserdes, GPSS did not correctly handle a trailing slash when specified in the first address in a list of SCHEMA_REGISTRY_ADDRS. This issue is resolved; GPSS 1.5.1 bundles a patched version of the libserdes library that can handle such addresses.
176136800
Resolves an issue where GPSS returned an error when it parsed the SAVE_FAILING_BATCH property in a (deprecated) version 1 load configuration file, a format that does not support this property. GPSS now displays a warning message when it encounters a property that is not supported in a version 1 configuration file.
176068963
GPSS reported an offset gap when it read Kafka messages using the read_committed isolation level, the job was restarted, and the topic retention period had expired. This issue is resolved; GPSS now records control message offsets.
175867685
Resolves an issue where the -i | --edit-in-place option was displayed in the help output of subcommands that did not support the option. GPSS now correctly displays the option only for the gpsscli convert command.
175867670
Resolves an issue where the gpsscli subcommands did not return consistent values. gpsscli now returns zero (0) on success and non-zero on failure.
n/a
Resolves an issue where GPSS did not correctly validate a filesource.yaml load configuration file before submitting the job.

Release 1.5.0

Release Date: December 2, 2020

Greenplum Streaming Server 1.5.0 adds new features, includes changes, and resolves issues.

Note: You are required to perform upgrade actions for this release. Review Upgrading the Streaming Server to plan your upgrade to GPSS 1.5.0.

New and Changed Features

Greenplum Streaming Server 1.5.0 includes these new and changed features:

  • The load configuration file ERROR_LIMIT property, previously mandatory, is now optional. The default value for the property is zero (0); GPSS disables error logging and stops a load operation upon encountering the first error.
  • GPSS includes out-of-the-box Prometheus integration, enabling you to use the tool to monitor your gpss server instances. Refer to Monitoring GPSS Service Instances for more information on enabling and using this integration.
  • New configuration properties in the gpss.json server configuration file include:
    • The DebugPort configuration property. You can use this property to identify the port number on which GPSS starts a debug server for the gpss server instance. Refer to Pulling Information from the Debug Server for more information.
    • The MinTLSVersion configuration property. You use this property to specify the minimum TLS version that GPSS requests on encrypted connections.
    • The Logging configuration property block. You can use these configuration properties to set the front-end and back-end logging levels for GPSS commands. See About GPSS Logging.
    • The JobStore configuration property block. Use the configuration property in this block to specify a local directory in which GPSS maintains job status information. This allows a GPSS server instance to (re)start any in-progress jobs when the instance first starts up. See About GPSS Job Management.
    • The Monitor configuration property block. You use this property to enable GPSS Prometheus integration.
  • GPSS no longer generates and assigns a unique identifier as the job name when you invoke the gpsscli submit or gpsscli load commands without specifying the --name option. GPSS now assigns the base name of the load configuration file as the default job name.
  • GPSS exposes a new load configuration property for Kafka data sources named PARTITIONS. Use this property to specify the partition numbers from which you want GPSS to load Kafka messages from the topic. (This property is not supported for the Kafka version 1 configuration file format.)
  • GPSS supports specifying template parameters for load configuration file properties. When you specify the {{template_var}} value syntax in the file, GPSS substitutes template_var with a value that you specify via the -p | --property template_var=value option when you submit or load the job.
  • GPSS supports SSL encryption on the control channel between GPSS and the Greenplum Database master, and ships with an updated pq library to support this feature. See Configuring SSL for the Control Channel for configuration information.
  • The gpsscli start, stop, and remove subcommands now support a --all flag. When you specify this flag, GPSS: starts all submitted jobs, stops all running jobs, or removes all stopped jobs.
  • The gpsscli submit and gpsscli load commands can now operate on one or more YAML load configuration files.
  • GPSS exposes the new SAVE_FAILING_BATCH load configuration property. When you set this property to true, GPSS also writes the data that it loads to a backup table. When GPSS encounters expression evaluation errors, this backup table aids in the recovery of the load operation. See Redirecting Data to a Backup Table when GPSS Encounters Expression Evaluation Errors for additional information. (This property is not supported for the Kafka version 1 configuration file format.)
  • GPSS 1.5.0 introduces a new Beta feature, the version 3 load configuration file format. This format introduces a new YAML organization and keywords, and more closely aligns with the GPSS gRPC Streaming Job API. Refer to gpsscli-v3.yaml (Beta) for the version 3 syntax.
  • GPSS 1.5.0 supports the persistent error log feature of Greenplum Database when you are running against Greenplum version 5.26+ or 6.6+. For more details about the persistent error log, refer to the CREATE EXTERNAL TABLE SQL reference page in the Greenplum Database documentation.
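The template parameter substitution described above can be sketched in Python as follows. The helper names and the variable-name pattern are assumptions, as is raising an error for a template variable that has no supplied value; GPSS's own parsing rules may differ.

```python
import re

# Hypothetical pattern: a name made of letters, digits, and underscores inside {{ }}.
_TEMPLATE = re.compile(r"\{\{([A-Za-z_][A-Za-z0-9_]*)\}\}")

def parse_property_options(options):
    """Turn 'name=value' strings, as given to -p | --property, into a dict."""
    values = {}
    for opt in options:
        name, sep, value = opt.partition("=")
        if not sep or not name:
            raise ValueError(f"expected name=value, got {opt!r}")
        values[name] = value
    return values

def render_template(text, values):
    """Substitute each {{name}} with values[name]; unknown names raise KeyError."""
    def substitute(match):
        name = match.group(1)
        if name not in values:
            raise KeyError(f"no value supplied for template variable {name!r}")
        return values[name]
    return _TEMPLATE.sub(substitute, text)

print(render_template("TOPIC: {{topic}}", parse_property_options(["topic=orders"])))
# TOPIC: orders
```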

Resolved Issues

Greenplum Streaming Server 1.5.0 resolves these issues:

30332
In some cases, when GPSS reused external tables for jobs, it did not update the external table that it uses internally for load operations when the target Greenplum table definition was modified. This issue is resolved.
171299427
Resolves an issue where GPSS was unable to cancel a batch write operation when it encountered an error, and left a lingering session.

Release 1.4

Release 1.4.3

Release Date: December 17, 2021

Greenplum Streaming Server 1.4.3 resolves an issue and includes related changes.

Note: You may be required to perform upgrade actions for this release. Review Upgrading the Streaming Server to plan your upgrade to GPSS 1.4.3.

Changes

Greenplum Streaming Server 1.4.3 includes this change:

  • After it encounters an SSL connection failure on the control channel, GPSS will attempt to initiate a non-SSL connection on the channel.

Resolved Issues

Greenplum Streaming Server 1.4.3 resolves this issue:

31886
Resolves an issue where, after upgrade to version 1.4.2, GPSS returned an authentication error when SSL was disabled for the user (i.e. there was a hostnossl connection type entry configured for the user in the pg_hba.conf file). GPSS now attempts to initiate a non-SSL connection when it encounters an SSL connection failure on the control channel.

Release 1.4.2

Release Date: November 2, 2020

Greenplum Streaming Server 1.4.2 resolves issues and includes related changes.

Note: You may be required to perform upgrade actions for this release. Review Upgrading the Streaming Server to plan your upgrade to GPSS 1.4.2.

Changes

Greenplum Streaming Server 1.4.2 includes these changes:

  • GPSS now specifies the SSL prefer mode on the control channel to the Greenplum Database master host. GPSS previously explicitly disabled SSL on the channel.

Resolved Issues

Greenplum Streaming Server 1.4.2 resolves these issues:

n/a
Resolves an issue where GPSS recorded an incorrect count in the progress log file when the messages it received included offset gaps, such as with transaction control messages.
30776, 174685715
Resolves an issue where gpsscli stop stopped responding (hung).
174685711
Resolves an issue where GPSS failed to load a large (>2GB) file. GPSS now transfers a file in multiple, smaller chunks when loading to Greenplum.
174984151
GPSS sent an HTTP request to the Avro schema registry service on every segment on every commit; in some cases, this created and destroyed a large number of TCP connections in the process. GPSS resolves this issue by reading the schema a single time per session (as long as the schema remains unchanged).
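Reading the schema once per session amounts to caching it. The following Python sketch assumes that schemas are identified by an integer ID and that the fetch callable stands in for one HTTP request to the schema registry. Both the class name and the keying scheme are illustrative assumptions, not GPSS internals.

```python
from __future__ import annotations
from typing import Callable

class SchemaCache:
    """Fetch each Avro schema once and reuse it while its ID stays the same."""

    def __init__(self, fetch: Callable[[int], str]):
        self._fetch = fetch  # stands in for an HTTP request to the schema registry
        self._schemas: dict[int, str] = {}

    def get(self, schema_id: int) -> str:
        if schema_id not in self._schemas:
            self._schemas[schema_id] = self._fetch(schema_id)  # one request per new ID
        return self._schemas[schema_id]
```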

Release 1.4.1

Release Date: August 7, 2020

Greenplum Streaming Server 1.4.1 resolves issues and includes related changes.

Note: You may be required to perform upgrade actions for this release. Review Upgrading the Streaming Server to plan your upgrade to GPSS 1.4.1.

Changes

Greenplum Streaming Server 1.4.1 includes these changes:

  • GPSS bundles a patched version of the librdkafka library to fix an issue that can arise when the Kafka topic that GPSS loads includes messages with discontinuous offsets. See resolved issues 30797 and 30776.
  • GPSS now always tracks Kafka job progress in a separate, CSV-format log file. See resolved issue 173603095 and Checking the Progress of a Load Operation.
  • GPSS 1.4.1 changes the format and content of the server and client log file messages. The old log file format was delimited text, which could not be parsed when a message contained a newline. The log files are now CSV-format and include a header row. See resolved issue 173603029 and Examining GPSS Log Files.
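CSV fixes the parsing problem because a CSV writer quotes any field that contains a newline, which lets a CSV-aware reader reassemble the record. The Python sketch below demonstrates this with the standard csv module. The column names and message text are hypothetical and do not reproduce the actual GPSS log header.

```python
import csv
import io

# Hypothetical header and record; the real GPSS log columns are not reproduced here.
rows = [
    ["timestamp", "level", "message"],
    ["2020-08-07 10:00:00", "ERROR", "load failed:\nline 2 of the error"],
]

buf = io.StringIO(newline="")
csv.writer(buf).writerows(rows)  # the field with the embedded newline is quoted
text = buf.getvalue()

# A naive line-based reader splits the multi-line message across two lines...
naive_lines = text.splitlines()

# ...while a CSV reader honors the quoting and recovers the original records.
parsed = list(csv.reader(io.StringIO(text, newline="")))
```

Here, naive_lines has three entries for what are really two records, and parsed matches rows exactly.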

Resolved Issues

Greenplum Streaming Server 1.4.1 resolves these issues:

n/a
When the schema registry service was down, GPSS appeared to hang during a Kafka load operation because it tried to access the registry multiple times for each Kafka message. This issue is resolved; GPSS now reports an error and stops retrying immediately when it detects that the schema registry is down.
30797, 30776
Due to a bug in the dependent library librdkafka, a load job from Kafka would hang when there were aborted Kafka transactions in the topic, or when the messages were deleted before GPSS was able to consume them. This issue is resolved. GPSS 1.4.1 bundles a patched version of the librdkafka library and can now handle message offsets that are not continuous.
30760
Certain merge/update operations failed with the error Cannot parallelize an UPDATE statement that updates the distribution columns because GPSS versions 1.3.5 through 1.4.0 used the Greenplum Postgres Planner by default, which does not support updating columns that are specified as the distribution key. GPSS 1.4.1 resolves this issue by not explicitly specifying a query planner/optimizer, but rather using the default that is configured in the Greenplum cluster.
173653147
In some cases, gpsscli stop would hang when you invoked it to stop a Kafka load job that GPSS had previously retried. This issue is resolved.
173637940
The GPSS utilities distributed in the Greenplum Database 6.8.x and 6.9.0 Client and Loader Tools packages were missing the dependent library libserdes.so. This issue is resolved; the packages now include this library.
173637900
The GPSS 1.4.1 Batch Data gRPC API fixes a parallel loading regression that manifested itself when the gpss.json server configuration file included the (default) ReuseTables: true property setting.
173603095
Because GPSS tracked job progress only during gpsscli progress command execution, the progress information for jobs for which you did not run the command was lost. This issue is resolved. GPSS now always tracks job progress in a separate, CSV-format log file (with header row) named progress_*jobname*_*jobid*_*date*.log.
173603029
GPSS log file messages with embedded newlines could not be parsed. This issue is resolved; GPSS changes the client and server log file format to CSV (with header row).

Release 1.4.0

Release Date: June 26, 2020

Greenplum Streaming Server 1.4.0 adds new features, includes changes, and resolves issues.

Note: You may be required to perform upgrade actions for this release. Review Upgrading the Streaming Server to plan your upgrade to GPSS 1.4.0.

New and Changed Features

Greenplum Streaming Server 1.4.0 includes these new and changed features:

  • GPSS supports loading from a file data source. You can now load data in Avro, binary, CSV, and JSON files into Greenplum Database. See Loading File Data into Greenplum for more information.
  • GPSS defines a new META load configuration property block that exposes data source metadata in a single JSON-format column. You can load this column into the target table, or use its properties in update or merge criteria for a load operation. The available META properties are data-source specific:
    • The Kafka data source exposes the following META properties: topic (text), partition (int), and offset (bigint).
    • The file data source exposes a single META property named filename (text).
  • GPSS supports Avro data containing binary fields.
  • GPSS implements a faster update in merge mode for large datasets when the load configuration specifies no UPDATE_COLUMNS. In this scenario, GPSS updates all MAPPING columns in each row.
  • You can use GPSS to load data into a Greenplum Database cluster that utilizes the PgBouncer connection pooler.
  • The CentOS 7.x GPSS packages for Greenplum 6 support Oracle Enterprise Linux 7.
  • GPSS uses a single thread and socket per partition by sharing a Kafka consumer between workers.
  • GPSS bundles librdkafka version 1.4.2. This version supports the isolation.level property, which controls how GPSS reads Kafka messages that were written transactionally.
  • GPSS 1.4 introduces the new Streaming Job API (Beta), a gRPC API that allows you to manage and submit streaming jobs to the server.
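The following Python sketch makes the shape of the Kafka META column concrete by parsing one META value. It assumes that the column holds a flat JSON object keyed by the Kafka property names listed above (topic, partition, and offset). KafkaMeta and parse_kafka_meta are hypothetical helpers. Inside Greenplum Database, you would more likely reference these properties with JSON operators in the load configuration or in SQL.

```python
import json
from typing import NamedTuple


class KafkaMeta(NamedTuple):
    topic: str      # META type: text
    partition: int  # META type: int
    offset: int     # META type: bigint


def parse_kafka_meta(meta_json: str) -> KafkaMeta:
    """Parse a META value, assuming a flat JSON object keyed by property name."""
    meta = json.loads(meta_json)
    return KafkaMeta(
        topic=str(meta["topic"]),
        partition=int(meta["partition"]),
        offset=int(meta["offset"]),
    )
```

For a hypothetical value such as {"topic": "customer_expenses", "partition": 0, "offset": 42}, the helper returns a KafkaMeta whose offset field is 42.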

Resolved Issues

Greenplum Streaming Server 1.4.0 resolves these issues:

172142789
The GPSS Batch Data gRPC API fixes inaccurate TransferStats success and error counts for data load operations initiated in update mode.

Deprecated Features

Deprecated features may be removed in a future minor release of the Greenplum Streaming Server. GPSS 1.4.x deprecates:

  • The gpkafka Version 1 configuration file format (deprecated since 1.4.0).
  • The gpkafka.yaml (versions 1 and 2) POLL block, including the POLL:BATCHSIZE and POLL:TIMEOUT properties (deprecated since 1.3.5).

Removed Features

GPSS 1.4.x removes these previously deprecated features:

  • The gpsscli history and gpkafka history commands (deprecated in 1.3.5).

Release 1.3

Release 1.3.1

Release Date: December 19, 2019

Greenplum Streaming Server version 1.3.1 is the first standalone release of GPSS. GPSS 1.3.1 is also included in the Greenplum Database version 5.24 and 6.2 distributions.

Greenplum Streaming Server 1.3.1 is a maintenance release that resolves several issues.

Resolved Issues

Greenplum Streaming Server 1.3.1 resolves these issues:

169806983
In some cases, reading from Kafka using the default MINIMAL_INTERVAL (0 seconds) caused GPSS to consume a large amount of CPU resources, even when no new messages existed in the Kafka topic. This issue is resolved.
169807372, 169831558
GPSS 1.3.0 did not recognize internal history tables that were created with GPSS 1.2.6 and earlier. In some cases, this caused GPSS to load duplicate messages into Greenplum Database. This issue is resolved.

Release 1.3.0

Release Date: November 1, 2019

Greenplum Streaming Server version 1.3.0 is included in the Greenplum Database version 5.23 and 6.1 distributions.

Greenplum Streaming Server 1.3.0 is a minor release that includes new and changed features and resolves several issues.

New and Changed Features

Greenplum Streaming Server 1.3.0 includes these new and changed features:

  • GPSS now supports log rotation, utilizing a mechanism that you can easily integrate with the Linux logrotate system. See Managing GPSS Log Files for more information.
  • GPSS has added the new INPUT:FILTER load configuration property. This property enables you to specify a filter that GPSS applies to Kafka input data before loading it into Greenplum Database.
  • GPSS displays job progress by partition when you provide the --partition flag to the gpsscli progress command.
  • GPSS enables you to load Kafka data that was emitted since a specific timestamp into Greenplum Database. To use this feature, you provide the --force-reset-timestamp flag when you run gpsscli load, gpsscli start, or gpkafka load.
  • GPSS now supports update and merge operations on data stored in a Greenplum Database table. The load configuration file accepts MODE, MATCH_COLUMNS, UPDATE_COLUMNS, and UPDATE_CONDITION property values to direct these operations. Example: Merging Data from Kafka into Greenplum Using the Streaming Server provides an example merge scenario.
  • GPSS supports Kerberos authentication to both Kafka and Greenplum Database.
  • GPSS supports SSL encryption between GPSS and Kafka.
  • GPSS supports SSL encryption on the data channel between GPSS and Greenplum Database.

Resolved Issues

Greenplum Streaming Server 1.3.0 is a minor release that resolves these issues:

168130147
In some situations, specifying the --force-reset-earliest flag when loading data failed to read from the correct offset. This problem has been fixed. (Using the --force-reset-*xxx* flags outside of an offset mismatch scenario is discouraged.)
167997441
GPSS did not save error data to the external table error log when it encountered an incorrectly-formatted JSON or Avro message. This issue has been fixed; invoking gp_read_error_log() on the external table now displays the offending data.
164823612
GPSS incorrectly treated Kafka jobs that specified the same Kafka topic, Greenplum output schema name, and output table name, but different database names, as the same job. This issue has been resolved. GPSS now includes the Greenplum database name when constructing a job definition.

Beta Features

Greenplum Streaming Server 1.x includes these Beta features:

  • GPSS adds support for a RabbitMQ data source (introduced in 1.8.0).

  • GPSS adds support for an s3 data source (introduced in 1.7.0).

  • GPSS adds a new datatype named gp_json to the dataflow extension (introduced in 1.7.0).

  • GPSS exposes a new load configuration property for Kafka data sources named RECOVER_FAILING_BATCH (version 2 configuration) and recover_failing_batch (version 3 configuration). Use this property in conjunction with SAVE_FAILING_BATCH to instruct GPSS to automatically reload the good data in the batch, and retain only the error data in the backup table.

    Note: Enabling this feature may have severe performance implications when any data in the Kafka topic generates an expression error.

    Note: This feature requires that GPSS has the Greenplum Database privileges to create a function.

    (Introduced in 1.6.0.)

  • GPSS adds a new extension named dataflow. This extension includes a new data type, gp_jsonb (available for Greenplum Database version 6.x only), and a new formatter, text_in. (Introduced in 1.6.0.)

  • GPSS specifies a new version 3 load configuration file format. This format introduces a new YAML organization and keywords. (Introduced in 1.5.0.)

Deprecated Features

Deprecated features may be removed in a future release of the Greenplum Streaming Server. GPSS 1.x deprecates:

  • Specifying the gpss.json configuration file to the gpss command as a standalone argument (deprecated since 1.6.0). Use the -c | --config option when you specify the file.
  • The gpkafka Version 1 configuration file format (deprecated since 1.4.0).
  • The gpkafka.yaml (versions 1 and 2) POLL block, including the POLL:BATCHSIZE and POLL:TIMEOUT properties (deprecated since 1.3.5).

Known Issues and Limitations

Greenplum Streaming Server 1.x has these known issues:

31998
In some cases, an EXPLAIN INSERT command internally launched by GPSS on a Kafka job may take a long time to complete. You can work around this issue by specifying the --skip-explain flag to the gpsscli start command when you start the job.
N/A
The SAVE_FAILING_BATCH and PARTITIONS configuration properties are not supported when you use the version 1 configuration file format to load data.
N/A
The Greenplum Streaming Server may consume a very large amount of system memory when you use it to load a very large file (hundreds of GB), in some cases causing the Linux kernel to kill the GPSS server process. Do not use GPSS to load very large files; use gpfdist instead.
30503
Due to limitations in the Greenplum Database external table framework, GPSS cannot log a data type conversion error that it encounters while evaluating a mapping expression. For example, if you use the expression EXPRESSION: (jdata->>'id')::int in your load configuration file, and the content of jdata->>'id' is a string that includes non-integer characters, the evaluation fails and GPSS terminates the load job. GPSS cannot log and propagate the error back to the user via gp_read_error_log().

Workarounds for Kafka:

  • Set the SAVE_FAILING_BATCH load configuration property to true, and then manually load any data batch that included expression errors.
  • Skip the bad Kafka message by specifying a --force-reset-*xxx* flag on the job start or load command.
  • Correct the message and publish it to another Kafka topic before loading it into Greenplum Database.