This document contains release information for VMware Tanzu Greenplum Streaming Server version 1.7. The Greenplum Streaming Server (GPSS) is included in certain Tanzu Greenplum 5.x and 6.x distributions. GPSS for Red Hat/CentOS 6 and 7, Red Hat 8, Photon 3, and Ubuntu 18.04 is also updated and distributed independently of Greenplum Database. To obtain the most recent version of this component, you may need to download and install the GPSS distribution from VMware Tanzu Network.

Supported Platforms

Tanzu Greenplum Streaming Server 1.7.x is compatible with these Tanzu Greenplum versions:

  • Tanzu Greenplum 5.17.0 and later
  • Tanzu Greenplum 6.0.0 and later

Release 1.7.2

Release Date: April 21, 2022

Greenplum Streaming Server 1.7.2 includes changes and resolves issues.

Note: You are required to perform upgrade actions for this release. Review Upgrading the Streaming Server to plan your upgrade to GPSS 1.7.2.

Changed Features

Greenplum Streaming Server 1.7.2 includes these changes:

  • GPSS adds support for specifying backslash escape sequences when you set the following CSV options: delimiter, quote, and escape. GPSS supports the standard backslash escape sequences for backspace, form feed, newline, carriage return, and tab, as well as escape sequences that you specify in hexadecimal format (prefaced with \x). Refer to Backslash Escape Sequences in the PostgreSQL documentation for more information.
  • To resolve issue 32168, GPSS version 1.7.2 introduces support for loading files or messages that contain one JSON record per line into Greenplum Database. To use this new feature, you must specify FORMAT: jsonl in version 2 format load configuration files, or specify json format with is_jsonl: true in version 3 (Beta) format load configuration files.
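
As an illustration of the one-JSON-record-per-line layout that the jsonl format expects, the following Python sketch parses such input outside of GPSS. It is not GPSS code; the function name parse_jsonl and the sample records are hypothetical.

```python
import json


def parse_jsonl(text):
    """Parse text that holds one JSON record per line, skipping blank lines."""
    records = []
    # Split on "\n" only, so that other Unicode line separators that may
    # legally appear inside a JSON string do not break a record apart.
    for line_number, line in enumerate(text.split("\n"), start=1):
        if not line.strip():
            continue
        try:
            records.append(json.loads(line))
        except json.JSONDecodeError as exc:
            raise ValueError(f"line {line_number}: {exc.msg}") from exc
    return records


# Hypothetical sample: two records, each on its own line.
sample = '{"id": 1, "name": "a"}\n{"id": 2, "name": "b"}\n'
records = parse_jsonl(sample)
print(len(records))
```

A record that spans several lines fails to parse in this sketch, which is the practical difference between jsonl input and a single pretty-printed JSON document.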

Resolved Issues

Greenplum Streaming Server 1.7.2 resolves these issues:

32168
Resolves an issue where GPSS did not support loading multi-line JSON files into Greenplum Database. GPSS 1.7.2 introduces support for loading JSON message or file data that contains a single JSON record per line.
N/A
Resolves an issue where GPSS did not support escape sequences that were specified in the CSV delimiter, quote, and escape options. GPSS now supports standard and hexadecimal-format backslash escape sequences.
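
To show how a backslash escape sequence in a delimiter, quote, or escape option maps to a single character, here is a minimal Python sketch. It covers only the sequences named above (backspace, form feed, newline, carriage return, tab, and \x followed by one or two hexadecimal digits) and leaves anything else unchanged. The function name decode_escape is hypothetical, and the sketch does not claim to reproduce how GPSS itself parses these options.

```python
import re

# The five standard sequences named in the release note.
_SIMPLE = {"b": "\b", "f": "\f", "n": "\n", "r": "\r", "t": "\t"}

# Either \x plus one or two hex digits, or one of the simple sequences.
_PATTERN = re.compile(r"\\(x[0-9A-Fa-f]{1,2}|[bfnrt])")


def decode_escape(option_value):
    """Replace supported backslash escape sequences with the character they name."""

    def replace(match):
        body = match.group(1)
        if body[0] == "x":
            return chr(int(body[1:], 16))
        return _SIMPLE[body]

    return _PATTERN.sub(replace, option_value)


print(repr(decode_escape(r"\t")))    # the tab character
print(repr(decode_escape(r"\x1f")))  # the ASCII unit separator
```

Hexadecimal sequences are convenient for non-printing delimiters such as the ASCII unit separator (\x1f), which are awkward to type literally in a YAML file.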

Release 1.7.1

Release Date: March 31, 2022

Greenplum Streaming Server 1.7.1 resolves issues.

Note: You are required to perform upgrade actions for this release. Review Upgrading the Streaming Server to plan your upgrade to GPSS 1.7.1.

Resolved Issues

Greenplum Streaming Server 1.7.1 resolves these issues:

32105
Resolves an issue where GPSS incorrectly added an offset based on the Greenplum Database local time zone to timestamp (without time zone) types that it loaded into a Greenplum Database table.
181293923
In some cases, GPSS returned the error pq: missing data for column *name* when loading a file containing CSV-format data. This issue is resolved; GPSS no longer automatically adds a newline when one already exists at the end of the file.
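
The newline behavior described for issue 181293923 can be pictured with the short Python sketch below: a newline is appended only when the data does not already end with one. This is an illustration of the stated behavior, not the GPSS implementation, and ensure_trailing_newline is a hypothetical name.

```python
def ensure_trailing_newline(data: bytes) -> bytes:
    """Return data ending in exactly the newline it needs, never an extra one."""
    if not data or data.endswith(b"\n"):
        return data
    return data + b"\n"


print(ensure_trailing_newline(b"1,alice\n2,bob"))
print(ensure_trailing_newline(b"1,alice\n2,bob\n"))
```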

Release 1.7.0

Release Date: March 18, 2022

Greenplum Streaming Server 1.7.0 adds new features, includes changes, and resolves issues.

Note: You are required to perform upgrade actions for this release. Review Upgrading the Streaming Server to plan your upgrade to GPSS 1.7.0.

New and Changed Features

Greenplum Streaming Server 1.7.0 includes these new and changed features:

OS and Platforms

  • GPSS introduces support for Red Hat Enterprise Linux 8 and Photon 3 for Greenplum Database 6, and now provides download packages for these operating system versions on VMware Tanzu Network.
  • GPSS updates the version of Go that it uses to build the CLI tools to version 1.17.6 to mitigate CVE-2021-44716.

GPSS Configuration

GPSS introduces a default timeout of 10 seconds for a gpss service instance to connect to Greenplum Database and a related environment variable named GPDB_CONNECT_TIMEOUT. You can set this environment variable to change the amount of time that GPSS waits to establish a connection to Greenplum Database as described in Running the Greenplum Streaming Server.
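
One way to set the variable for a single server launch is sketched below in Python. The sketch only prepares the command and environment; the launch itself is left commented out. It assumes that the value is a whole number of seconds and that the server is started with the --config option and a file named gpss.json. Both are assumptions for illustration, so confirm the accepted value format in Running the Greenplum Streaming Server.

```python
import os


def gpss_launch_spec(timeout_seconds, config_path="gpss.json", base_env=None):
    """Build the command and environment for starting gpss with a custom timeout."""
    env = dict(os.environ if base_env is None else base_env)
    # Assumption: the value is interpreted as a whole number of seconds.
    env["GPDB_CONNECT_TIMEOUT"] = str(int(timeout_seconds))
    command = ["gpss", "--config", config_path]
    return command, env


command, env = gpss_launch_spec(30)
print(command, env["GPDB_CONNECT_TIMEOUT"])

# On a host where gpss is installed, the server could then be started with:
# import subprocess
# subprocess.run(command, env=env, check=True)
```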

Authentication

  • After it encounters an SSL connection failure on the control channel, GPSS will attempt to initiate a non-SSL connection on the channel.
  • The gpss.json server configuration file now includes an Authentication property block. Use the configuration properties in this block to specify a user name and password for client authentication to the GPSS server. Refer to Configuring the Streaming Server for Client-to-Server Authentication for additional information about this new feature.
  • GPSS adds the -U/--username and -P/--password options to the gpsscli subcommands to specify the user name and password for client authentication to the GPSS server.

Kafka Data Source

  • GPSS now saves the topic:partition:offset for each badly-formatted Kafka message written to the error log; you can view this information when you run the SELECT * FROM gp_read_error_log('<exttbl>') command.
  • GPSS adds the --skip-explain flag to the gpsscli start subcommand to skip the explain SQL check step of its internal processing.
  • GPSS now supports loading from a single Kafka topic into multiple Greenplum Database tables. Provide an OUTPUTS:TABLE (version 2) or targets:gpdb:tables:table (version 3 (Beta)) block for each table, and specify the properties that identify the data targeted to each.
  • GPSS introduces a new data type named gp_json (Beta) to the dataflow extension. For additional information about using the gp_json data type, refer to the About the JSON Format and Column Type documentation.

File and Kafka Data Sources

  • GPSS adds support for new CSV options for file and Kafka jobs. You can now specify the delimiter, quote, escape, and null string values in the load configuration file. You can identify a list of columns whose values GPSS forces to be not null. You can also specify how GPSS behaves when it encounters missing trailing fields in a row of data. New version 2 property names include DELIMITER, QUOTE, NULL_STRING, ESCAPE, FORCE_NOT_NULL, and FILL_MISSING_FIELDS. New version 3 property names include delimiter, quote, null_string, escape, force_not_null, and fill_missing_fields.
  • GPSS exposes new PREPARE_SQL and TEARDOWN_SQL (version 2) and prepare_statement and teardown_statement (version 3) load configuration file properties for Kafka and file data sources. You can use the properties to specify user-defined functions or SQL commands for GPSS to run before executing a job, at job completion, or both.
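
The effect of the CSV options listed above can be approximated with Python's standard csv module. The sketch below borrows the version 3 property names as parameter names, but it is an approximation under stated assumptions and not the GPSS parser: parse_rows is a hypothetical helper, and unlike a database CSV parser it cannot distinguish a quoted empty string from an unquoted null string.

```python
import csv
import io


def parse_rows(text, columns, delimiter=",", quote='"', escape=None,
               null_string="", force_not_null=(), fill_missing_fields=False):
    """Parse CSV text into dicts, approximating the listed CSV options."""
    reader = csv.reader(io.StringIO(text, newline=""), delimiter=delimiter,
                        quotechar=quote, escapechar=escape)
    rows = []
    for fields in reader:
        if len(fields) < len(columns):
            if not fill_missing_fields:
                # Mimics the kind of error reported for a short row.
                raise ValueError(
                    f"missing data for column {columns[len(fields)]}")
            # Pad missing trailing fields with None (standing in for NULL).
            fields = fields + [None] * (len(columns) - len(fields))
        row = {}
        for name, value in zip(columns, fields):
            # A value equal to the null string becomes None unless the
            # column is listed in force_not_null.
            if value == null_string and name not in force_not_null:
                value = None
            row[name] = value
        rows.append(row)
    return rows


# Hypothetical data: the second row is missing its trailing "note" field.
rows = parse_rows("1|alice|\n2|bob\n", ["id", "name", "note"],
                  delimiter="|", fill_missing_fields=True)
print(rows)
```

With fill_missing_fields left at False, the second row in this example raises an error instead of being padded, which mirrors the choice that the FILL_MISSING_FIELDS and fill_missing_fields properties expose.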

Version 3 (Beta) Configuration

GPSS 1.7.0 adds, changes, and relocates property keywords in the version 3 (Beta) configuration file format. Refer to the gpsscli-v3.yaml (Beta), gpkafka-v3.yaml (Beta), and filesource-v3.yaml (Beta) reference pages for the new keywords and locations.

New S3 Data Source (Beta)

GPSS 1.7.0 introduces Beta support for a new data source, S3. This data source does not read directly from S3, but rather uses the Greenplum Database s3 protocol and external tables to read from S3 and write to Greenplum in parallel. Refer to Loading from S3 into Greenplum (Beta) for more information about using this new feature, and s3source-v3.yaml (Beta) for the supported load configuration file properties.

New Commands and Options

  • GPSS adds the new gpsscli dryrun subcommand. When you invoke this command, GPSS performs a trial run of a Kafka or S3 job without actually writing to Greenplum Database. You can use the command to help diagnose load job errors as described in Diagnosing an Error with a Trial Load.
  • GPSS adds the -f/--force flag to the gpsscli remove subcommand to forcibly stop and remove one or more GPSS jobs.

Other Changes

  • GPSS adds new Submitted and Success statuses for batch (file, s3) jobs. GPSS 1.7.0 also changes the Stopped status to signify that a job was stopped by the user. Refer to the gpsscli status reference page for a description of GPSS job statuses.
  • GPSS 1.7.0 removes the Streaming Job API (Beta) documentation.

Resolved Issues

Greenplum Streaming Server 1.7.0 resolves these issues:

CVE-2021-44716
Updates the version of Go that GPSS uses to build the CLI tools to 1.17.6.
N/A
You can now specify an Avro schema file path for both the key and the value when you load Kafka data into Greenplum Database.
N/A
Resolves an issue where GPSS erroneously inserted a \n after parsing 76 characters of Avro data when the load configuration file specified bytes_to_base64: true.
32022
Resolves an issue where GPSS provided no way to run SQL commands before it initiated a job or after a job completed. GPSS now exposes the PREPARE_SQL and TEARDOWN_SQL (version 2) and prepare_statement and teardown_statement (version 3 (Beta)) load configuration file properties for this purpose.
31886
Resolves an issue where GPSS returned an authentication error when SSL was disabled for the user (i.e. there was a hostnossl connection type entry configured for the user in the pg_hba.conf file). GPSS now attempts to initiate a non-SSL connection when it encounters an SSL connection failure on the control channel.
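
The 76-character boundary in the bytes_to_base64 issue above matches the line length that MIME-style base64 encoders use when they wrap output. The release note does not state the cause of the GPSS defect, so the Python sketch below only demonstrates the general difference between wrapped and unwrapped base64 output; the payload is arbitrary sample data.

```python
import base64

# Arbitrary sample payload, long enough to need more than one 76-character line.
payload = bytes(range(256))

# MIME-style encoding: a newline follows every 76 output characters.
wrapped = base64.encodebytes(payload)

# Plain encoding: one continuous string with no line breaks.
unwrapped = base64.b64encode(payload)

print(wrapped.index(b"\n"))   # 76
print(b"\n" in unwrapped)     # False
```

Removing the newlines from the wrapped form yields the unwrapped form, so either can be decoded to the same bytes; the line breaks matter only when downstream parsing treats a newline as a row terminator.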

Beta Features

Greenplum Streaming Server 1.x includes these Beta features:

  • GPSS adds support for an s3 data source (introduced in 1.7.0).

  • GPSS adds a new data type named gp_json to the dataflow extension (introduced in 1.7.0).

  • GPSS exposes a new load configuration property for Kafka data sources named RECOVER_FAILING_BATCH (version 2 configuration) and recover_failing_batch (version 3 configuration). Use this property in conjunction with SAVE_FAILING_BATCH to instruct GPSS to automatically reload the good data in the batch, and retain only the error data in the backup table.

    Note: Enabling this feature may have severe performance implications when any data in the Kafka topic generates an expression error.

    Note: This feature requires that GPSS has the Greenplum Database privileges to create a function.

    (Introduced in 1.6.0.)

  • GPSS adds a new extension named dataflow. This extension includes a new data type, gp_jsonb (available for Greenplum Database version 6.x only), and a new formatter, text_in. (Introduced in 1.6.0.)

  • GPSS specifies a new version 3 load configuration file format. This format introduces a new YAML organization and keywords. (Introduced in 1.5.0.)

Deprecated Features

Deprecated features may be removed in a future release of the Greenplum Streaming Server. GPSS 1.x deprecates:

  • Specifying the gpss.json configuration file as a standalone argument to the gpss command (deprecated since 1.6.0). Use the -c | --config option when you specify the file.
  • The gpkafka Version 1 configuration file format (deprecated since 1.4.0).
  • The gpkafka.yaml (versions 1 and 2) POLL block, including the POLL:BATCHSIZE and POLL:TIMEOUT properties (deprecated since 1.3.5).

Known Issues and Limitations

Greenplum Streaming Server 1.7.x has these known issues:

31998
In some cases, an EXPLAIN INSERT command internally launched by GPSS on a Kafka job may take a long time to complete. You can work around this issue by specifying the --skip-explain flag to the gpsscli start command when you start the job.
N/A
The SAVE_FAILING_BATCH and PARTITIONS configuration properties are not supported when you use the version 1 configuration file format to load data.
N/A
The Greenplum Streaming Server may consume a very large amount of system memory when you use it to load a huge (hundreds of GBs) file, in some cases causing the Linux kernel to kill the GPSS server process. Do not use GPSS to load very large files; instead, use gpfdist.
30503
Due to limitations in the Greenplum Database external table framework, GPSS cannot log a data type conversion error that it encounters while evaluating a mapping expression. For example, if you use the expression EXPRESSION: (jdata->>'id')::int in your load configuration file, and the content of jdata->>'id' is a string that includes non-integer characters, the evaluation fails and GPSS terminates the load job. GPSS cannot log and propagate the error back to the user via gp_read_error_log().

Workarounds for Kafka:

  • Set the SAVE_FAILING_BATCH load configuration property to true, and then manually load any data batch that included expression errors.
  • Skip the bad Kafka message by specifying a --force-reset-*xxx* flag on the job start or load command.
  • Correct the message and publish it to another Kafka topic before loading it into Greenplum Database.
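
For the third workaround, messages can be screened before they are republished. The Python sketch below separates messages whose id value would plausibly survive a cast such as (jdata->>'id')::int from those that would not. It only approximates the database cast rules (optional sign, ASCII digits, 32-bit range, with a missing or null value treated as NULL). The names casts_to_int and partition_messages are hypothetical, and reading from or publishing to Kafka is left out.

```python
import json
import re

# Optional surrounding whitespace, optional sign, ASCII digits only.
_INT_TEXT = re.compile(r"\s*[+-]?[0-9]+\s*\Z")
INT4_MIN, INT4_MAX = -2**31, 2**31 - 1


def casts_to_int(value):
    """Approximate whether a JSON value, taken as text, casts to a 32-bit int."""
    if value is None:
        return True  # a missing or null value becomes NULL, which casts cleanly
    if isinstance(value, bool):
        return False
    if isinstance(value, int):
        text = str(value)
    elif isinstance(value, str):
        text = value
    else:
        return False  # floats, lists, and objects are treated as not castable
    if not _INT_TEXT.match(text):
        return False
    return INT4_MIN <= int(text) <= INT4_MAX


def partition_messages(messages, key="id"):
    """Split raw JSON messages into (loadable, rejected) lists."""
    loadable, rejected = [], []
    for raw in messages:
        try:
            record = json.loads(raw)
            ok = isinstance(record, dict) and casts_to_int(record.get(key))
        except json.JSONDecodeError:
            ok = False
        (loadable if ok else rejected).append(raw)
    return loadable, rejected


msgs = ['{"id": "42"}', '{"id": "4x2"}', '{"id": 7}', 'not json']
loadable, rejected = partition_messages(msgs)
print(len(loadable), len(rejected))
```

The rejected list holds the messages to correct by hand; the loadable list can be published to the topic that the GPSS job reads.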