This document contains release information for the VMware Greenplum Streaming Server version 1.x releases. The Greenplum Streaming Server (GPSS) is included in certain VMware Greenplum 5.x, 6.x, and 7.x distributions. GPSS is also updated and distributed independently of VMware Greenplum. You may need to download and install the GPSS distribution from the Broadcom Support Portal to obtain the most recent version of this component.
VMware Greenplum Streaming Server 1.x is compatible with these Operating System and VMware Greenplum versions:
GPSS Version | OS Version | VMware Greenplum Version |
---|---|---|
all | RHEL 6.x, RHEL 7.x, CentOS 6.x, CentOS 7.x | 5.17.0+, 6.x |
1.6.0+ | Ubuntu 18.04 LTS | 6.x |
1.7.0+ | OEL 7.x, Photon 3, RHEL 8.x | 6.x |
1.10.3+ | RHEL 8.7+, Rocky Linux 8.7+, OEL 8.7+ using Red Hat Compatible Kernel (RHCK) | 7.x |
1.10.4+ | RHEL 9, Rocky Linux 9, OEL 9 using Red Hat Compatible Kernel (RHCK) | 6.x, 7.x |
Note: VMware Greenplum Streaming Server version 1.10.x is the last version that supports VMware Greenplum 5.x.
Release Date: November 1, 2023
Greenplum Streaming Server 1.10.4 includes changes and resolves issues.
Note: You are required to perform upgrade actions for this release. Review Upgrading the Streaming Server to plan your upgrade to GPSS 1.10.4.
Greenplum Streaming Server 1.10.4 includes these changes:
Greenplum Streaming Server 1.10.4 resolves this issue:
`retry job <jobname> is disabled, stop schedule` and exiting a job when a primary Greenplum Database segment went down. GPSS now explicitly retains the retry configuration for manually stopped jobs, enabling it to better tolerate a segment failure and mirror switchover.
Release Date: September 21, 2023
Greenplum Streaming Server 1.10.3 includes changes and resolves issues.
Note: You are required to perform upgrade actions for this release. Review Upgrading the Streaming Server to plan your upgrade to GPSS 1.10.3.
Greenplum Streaming Server 1.10.3 includes these changes:
`start job`. The job status log message is prefaced with `job finished`.
Greenplum Streaming Server 1.10.3 resolves these issues:
`Resource temporarily unavailable` error due to a resource leak that occurred when it repeatedly retried a Kafka job that consumed illegal JSON. GPSS now ensures that it releases all connections to Kafka when it detects an offset gap.
Release Date: July 27, 2023
Greenplum Streaming Server 1.10.2 resolves an issue.
Note: You are required to perform upgrade actions for this release. Review Upgrading the Streaming Server to plan your upgrade to GPSS 1.10.2.
Greenplum Streaming Server 1.10.2 resolves this issue:
`value out of range` error when the object identifier of the target Greenplum Database table was larger than 2^32. GPSS now checks for the existence of the target table rather than attempting to access the table's object identifier.
Release Date: June 9, 2023
Greenplum Streaming Server 1.10.1 includes changes and resolves issues.
Note: You are required to perform upgrade actions for this release. Review Upgrading the Streaming Server to plan your upgrade to GPSS 1.10.1.
Greenplum Streaming Server 1.10.1 includes these changes:
`gpss-<jobname>_<timestamp>.log` to `gpss_<jobname>_<timestamp>.log`.
`SERVER` and `server` (version 3 (Beta)) load configuration file properties.
Greenplum Streaming Server 1.10.1 resolves these issues:
Release Date: May 15, 2023
Greenplum Streaming Server 1.10.0 adds new features and includes changes.
Note: This version of the VMware Greenplum Streaming Server documentation replaces the term master with the term coordinator.
Note: You are required to perform upgrade actions for this release. Review Upgrading the Streaming Server to plan your upgrade to GPSS 1.10.0.
Greenplum Streaming Server 1.10.0 includes these new and changed features:
GPSS updates the `go` library dependency to version 1.19.1.
GPSS introduces support for TLS encryption to RabbitMQ. Refer to Configuring gpss for TLS-Encrypted Communications with RabbitMQ for more information.
GPSS v1.10.0 includes these logging-related changes and new features:
`Logging:Rotate` property in the gpss.json server configuration file. See Configuring Automatic Server Log File Rotation for more information about this new feature.
`Logging:SplitByJob` property in the gpss.json server configuration file. Refer to Configuring Per-Run Server Log Files for more information.
The GPSS gRPC Batch Data API exposes a new `ConnectionRequest` message field named `SessionTimeout` that allows the developer to specify the maximum amount of idle time before GPSS releases a connection to Greenplum Database. If you choose to make use of this feature in your GPSS client application, upgrade actions are required as described in Upgrading the Streaming Server.
Release Date: March 10, 2023
Greenplum Streaming Server 1.9.0 adds new features, includes changes, and resolves issues.
Note: You are required to perform upgrade actions for this release. Review Upgrading the Streaming Server to plan your upgrade to GPSS 1.9.0.
Greenplum Streaming Server 1.9.0 includes these new and changed features:
`TEARDOWN_SQL` (version 2) or `teardown_statement` (version 3 (Beta)) on both job success and failure. The functions/commands were previously invoked only when the job was successful.
`--property <template_var>=<value>` option. This allows you to use property template variables in the load configuration file that you provide to the command.
`--name <jobname>` option to the `gpsscli dryrun` command to name the dry run job.
`ENCODING` (version 2) / `encoding` (version 3 (Beta)) property to the load configuration file that allows you to specify the character set encoding for source data that is of the `csv`, `custom`, `delimited`, or `json` formats.
`FILTER` (version 2) / `filter` (version 3 (Beta)) property to the load configuration file that allows you to specify an output filter for a job. An output filter may be useful when you want to write different data to multiple Greenplum Database output tables.
`ALERT` (version 2) / `alert` (version 3 (Beta)) property block to the load configuration file that allows you to register for a job stopped notification, specifying a command that GPSS will run when a job is stopped.
`TRANSFORMER` (version 2) / `transformer` (version 3 (Beta)) property block to the load configuration file that allows you to specify input and/or output transform functions for the data. An input transformer is a `go` plugin; an output transformer is a user-defined SQL function (UDF). GPSS supports specifying transforms only when loading from Kafka or RabbitMQ data sources.
`strong` consistency for streams. Refer to Understanding RabbitMQ Message Offset Management for more information about how GPSS manages RabbitMQ offsets and message consistency.
Greenplum Streaming Server 1.9.0 resolves these issues:
`SELECT VERSION()` queries consumed connection resources.
Release Date: December 21, 2022
Greenplum Streaming Server 1.8.1 includes changes and resolves issues.
Note: You are required to perform upgrade actions for this release. Review Upgrading the Streaming Server to plan your upgrade to GPSS 1.8.1.
Greenplum Streaming Server 1.8.1 includes these changes:
`S3` load job with the `s3ext` prefix. The prefix was previously `S3ext`.
Greenplum Streaming Server 1.8.1 resolves these issues:
`pq: password authentication failed for user` when a load job specified no password because it did not clear the configuration of the previous job.
Release Date: September 9, 2022
Greenplum Streaming Server 1.8.0 adds new features, includes changes, and resolves issues.
Note: You are required to perform upgrade actions for this release. Review Upgrading the Streaming Server to plan your upgrade to GPSS 1.8.0.
Greenplum Streaming Server 1.8.0 includes these new and changed features:
The `gpss.json` server configuration file now includes a `Gpfdist:Certificate:DBClientShared` property. Use this boolean property to instruct GPSS to reuse the Gpfdist SSL certificate for the control channel (client) connection to Greenplum Database. Configuring SSL for the Control Channel provides the relevant configuration information.
`ReuseTables` is set to `false`, GPSS now creates each job's external table using the job name rather than a hash. This enables you to more easily track external tables per-job. About External Table Naming and Lifecycle describes how GPSS names external tables, and also provides information about their lifecycle.
`RUNNING_DURATION`, `AUTO_STOP_RESTART_INTERVAL`, `MAX_RESTART_TIMES`, and `QUIT_AT_EOF_AFTER` (version 2) or `running_duration`, `auto_stop_restart_interval`, `max_restart_times`, and `quit_at_eof_after` (version 3 (Beta)) options in the `SCHEDULE/schedule` block of the load configuration file.
`delimited` data format to support setting quote and escape characters and an end-of-line prefix string when you use the format to load data into Greenplum Database.
`window` property to `task`.
`jsonl`, `delimited`, and `csv` format data where a Kafka message can include multiple rows, the `total_rows_read` identifies the Kafka message and the new `total_rows` field identifies the total number of rows inserted and rejected.
`timestamp`. This `int64`-type field identifies the time that a message was written to the Kafka log.
`SAVE_FAILING_BATCH` is `true`, GPSS records the time that a record was inserted into the backup table. The name of the new column is `gpss_save_timestamp`. Refer to Redirecting Data to a Backup Table when GPSS Encounters Expression Evaluation Errors for a discussion of the backup table schema.
`RECOVER_FAILING_BATCH` (Beta) is `true`, GPSS reports more information about the result of the operation, including the batch size and number of records recovered.
`delimited` data format.
`stdout` of a command into a Greenplum Database table. You specify command specifics via the new `EXEC` (version 2) or `exec` (version 3 (Beta)) block in the load configuration file.
GPSS introduces Beta support for loading from a RabbitMQ data source. You can load messages from a RabbitMQ queue or stream into Greenplum Database. Refer to Loading from RabbitMQ into Greenplum (Beta) for more information about using this new Beta feature, and rabbitmq-v3.yaml (Beta) and rabbitmq-v2.yaml (Beta) for more information about the supported load configuration file properties.
Greenplum Streaming Server 1.8.0 resolves these issues:
`pgbouncer` to manage connections did not receive a client SSL certificate as expected. GPSS now exposes a `DBClientShared` GPSS server configuration property that you can use to instruct GPSS to present the Gpfdist certificate as the client SSL cert to Greenplum Database.
`gpss_save_timestamp`. GPSS also reports more information during bad batch recovery operations.
`ReuseTables` is `false`, GPSS names the external table using the job name instead of a hash of configuration properties.
`MAX_RETRIES` after a job was successfully submitted and running.
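The release notes name server configuration properties with a colon-separated path, such as `Gpfdist:Certificate:DBClientShared`. The sketch below shows one way to read that notation, assuming each path segment corresponds to a nested JSON object in `gpss.json`; that mapping is an assumption, and `set_by_path` is a hypothetical helper, not part of GPSS. Confirm the exact layout against the `gpss.json` reference page before editing a real configuration file.

```python
import json


def set_by_path(config: dict, path: str, value) -> dict:
    """Set a value in nested dicts using a colon-separated property path."""
    keys = path.split(":")
    node = config
    for key in keys[:-1]:
        # Create intermediate objects on demand.
        node = node.setdefault(key, {})
    node[keys[-1]] = value
    return config


# Hypothetical starting point: an otherwise empty server configuration.
server_config = {}
set_by_path(server_config, "Gpfdist:Certificate:DBClientShared", True)
print(json.dumps(server_config, indent=4))
```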
Release Date: April 21, 2022
Greenplum Streaming Server 1.7.2 includes changes and resolves issues.
Note: You are required to perform upgrade actions for this release. Review Upgrading the Streaming Server to plan your upgrade to GPSS 1.7.2.
Greenplum Streaming Server 1.7.2 includes these changes:
`\x`). Refer to Backslash Escape Sequences in the PostgreSQL documentation for more information.
32168, GPSS version 1.7.2 introduces support for loading files or messages that contain one JSON record per line into Greenplum Database. To use this new feature, you must specify `FORMAT: jsonl` in version 2 format load configuration files, or specify `json` format with `is_jsonl: true` in version 3 (Beta) format load configuration files.
Greenplum Streaming Server 1.7.2 resolves these issues:
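As a rough illustration of the "one JSON record per line" layout that the `jsonl` format refers to, the following sketch parses such text in plain Python. It does not reproduce how GPSS reads the data; the function name and the sample records are invented for the example.

```python
import json


def parse_jsonl(text: str) -> list:
    """Parse text that holds one JSON record per line, skipping blank lines."""
    records = []
    # Split on "\n" only, so separators that may legally appear inside JSON
    # strings (for example U+2028) do not break a record apart.
    for line_number, line in enumerate(text.split("\n"), start=1):
        if not line.strip():
            continue
        try:
            records.append(json.loads(line))
        except json.JSONDecodeError as err:
            raise ValueError(f"line {line_number} is not valid JSON: {err}") from err
    return records


sample = '{"id": 1, "item": "pen"}\n{"id": 2, "item": "ink"}\n'
records = parse_jsonl(sample)
print(len(records), records[0]["item"])
```

Running a check like this over a sample of the source data before submitting a job can surface malformed lines early.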
Release Date: March 31, 2022
Greenplum Streaming Server 1.7.1 resolves issues.
Note: You are required to perform upgrade actions for this release. Review Upgrading the Streaming Server to plan your upgrade to GPSS 1.7.1.
Greenplum Streaming Server 1.7.1 resolves these issues:
`timestamp` (without timezone) types that it loaded into a Greenplum Database table.
pq: missing data for column *name* when loading a file containing CSV-format data. This issue is resolved; GPSS no longer automatically adds a newline when one already exists at the end of the file.
Release Date: March 18, 2022
Greenplum Streaming Server 1.7.0 adds new features, includes changes, and resolves issues.
Note: You are required to perform upgrade actions for this release. Review Upgrading the Streaming Server to plan your upgrade to GPSS 1.7.0.
Greenplum Streaming Server 1.7.0 includes these new and changed features:
`go` that it uses to build the CLI tools to version 1.17.6 to mitigate CVE-2021-44716.
GPSS introduces a default timeout of 10 seconds for a `gpss` service instance to connect to Greenplum Database and a related environment variable named `GPDB_CONNECT_TIMEOUT`. You can set this environment variable to change the amount of time that GPSS waits to establish a connection to Greenplum Database as described in Running the Greenplum Streaming Server.
`gpss.json` server configuration file now includes an `Authentication` property block. Use the configuration properties in this block to specify a user name and password for client authentication to the GPSS server. Refer to Configuring the Streaming Server for Client-to-Server Authentication for additional information about this new feature.
`-U/--username` and `-P/--password` options to the gpsscli subcommands to specify the user name and password for client authentication to the GPSS server.
`topic:partition:offset` for each badly-formatted Kafka message written to the error log; you can view this information when you run the `SELECT * FROM gp_read_error_log('<exttbl>')` command.
`--skip-explain` flag to the gpsscli start subcommand to skip the explain SQL check step of its internal processing.
`OUTPUTS:TABLE` (version 2) or `targets:gpdb:tables:table` (version 3 (Beta)) block for each table, and specify the properties that identify the data targeted to each.
`gp_json` (Beta) to the `dataflow` extension. For additional information about using the `gp_json` data type, refer to the About the JSON Format and Column Type documentation.
`DELIMITER`, `QUOTE`, `NULL_STRING`, `ESCAPE`, `FORCE_NOT_NULL`, and `FILL_MISSING_FIELDS`. New version 3 property names include `delimiter`, `quote`, `null_string`, `escape`, `force_not_null`, and `fill_missing_fields`.
`PREPARE_SQL` and `TEARDOWN_SQL` (version 2) and `prepare_statement` and `teardown_statement` (version 3) load configuration file properties for Kafka and file data sources. You can use the properties to specify user-defined functions or SQL commands for GPSS to run before executing a job, and/or at job completion.
GPSS 1.7.0 adds, changes, and relocates property keywords in the version 3 (Beta) configuration file format. Refer to the gpsscli-v3.yaml (Beta), gpkafka-v3.yaml (Beta), and filesource-v3.yaml (Beta) reference pages for the new keywords and locations.
GPSS 1.7.0 introduces Beta support for a new data source, S3. This data source does not read directly from S3, but rather uses the Greenplum Database s3 protocol and external tables to read from S3 and write to Greenplum in parallel. Refer to Loading from S3 into Greenplum (Beta) for more information about using this new feature, and s3source-v3.yaml (Beta) for the supported load configuration file properties.
`-f/--force` flag to the gpsscli remove subcommand to forcibly stop and remove one or more GPSS jobs.
Greenplum Streaming Server 1.7.0 resolves these issues:
`go` library to version 1.17.6.
`\n` after parsing 76 characters of Avro data when the load configuration file specified `bytes_to_base64: true`.
`PREPARE_SQL/TEARDOWN_SQL` and `prepare_statement/teardown_statement`).
`hostnossl` connection type entry configured for the user in the `pg_hba.conf` file). GPSS now attempts to initiate a non-SSL connection when it encounters an SSL connection failure on the control channel.
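The 1.7.0 changes above mention that GPSS records a `topic:partition:offset` identifier for each badly-formatted Kafka message written to the error log. If you post-process that identifier, a small parser along the following lines may help. This is a sketch only: it assumes the identifier arrives as a single colon-separated string, and `KafkaPosition`, `parse_position`, and the sample value are names invented for the example.

```python
from typing import NamedTuple


class KafkaPosition(NamedTuple):
    topic: str
    partition: int
    offset: int


def parse_position(value: str) -> KafkaPosition:
    """Split a 'topic:partition:offset' string into typed fields."""
    # Split from the right so only the last two colons act as separators.
    topic, partition, offset = value.rsplit(":", 2)
    return KafkaPosition(topic, int(partition), int(offset))


print(parse_position("orders:3:1042"))
```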
Release Date: May 28, 2021
Greenplum Streaming Server 1.6.0 adds new features, includes changes, and resolves issues.
Note: You are required to perform upgrade actions for this release. Review Upgrading the Streaming Server to plan your upgrade to GPSS 1.6.0.
Greenplum Streaming Server 1.6.0 includes these new and changed features:
`-c | --config` flag/option to the `gpss` command to specify the JSON-formatted configuration file.
`gpsscli --version` command now displays the version of the GPSS server in addition to displaying that of the client.
`gpss.json` server configuration file now includes a `KeepAlive` property block. Use the configuration properties in this block to specify timeout options for the gRPC connection between the GPSS client and the GPSS server.
`stdout`) from CSV format to a more human-readable format. Related, GPSS adds a `--csv-log` option to the commands to write the front-end logs in CSV format. GPSS also adds a `--color` option to commands to enable the use of color in message display.
`IDLE_DURATION` (version 2 configuration) and `idle_duration_ms` (version 3 configuration). Use this property to specify that GPSS use lazy load mode, waiting until data arrives before locking the target Greenplum Database table.
`SCHEMA_PATH_ON_GPDB` (version 2 configuration) and `schema_path_on_gpdb` (version 3 configuration). Use this property to specify the path to the Avro `.avsc` file that contains the schema of the Kafka key or value data (but not both). This file must reside in the same location on all Greenplum Database segment hosts.
`FALLBACK_OFFSET` (version 2 configuration) and `fallback_offset` (version 3 configuration). Use this property to specify that GPSS automatically handle Kafka message offset mismatches, and how.
`group.id` setting.
`CONSISTENCY` (version 2 configuration) and `consistency` (version 3 configuration). Use this property to specify how GPSS manages Kafka message offsets when it acts as a high-level consumer. Refer to Understanding Kafka Message Offset Management for more information.
Greenplum Streaming Server 1.6.0 includes these new Beta features:
GPSS exposes a new load configuration property for Kafka data sources named `RECOVER_FAILING_BATCH` (version 2 configuration) and `recover_failing_batch` (version 3 configuration). Use this property in conjunction with `SAVE_FAILING_BATCH` to instruct GPSS to automatically reload the good data in the batch, and retain only the error data in the backup table.
Note: Enabling this feature may have severe performance implications when any data in the Kafka topic generates an expression error.
Note: This feature requires that GPSS has the Greenplum Database privileges to create a function.
GPSS adds a new extension named `dataflow`. This extension includes a new data type, `gp_jsonb` (available for Greenplum Database version 6.x only), and a new formatter, `text_in`. You must `CREATE EXTENSION dataflow;` in each database in which you choose to use these types and formatters. For additional information about the `gp_jsonb` data type, see About the JSON Format and Column Type.
Greenplum Streaming Server 1.6.0 resolves these issues:
`stdout`. GPSS now supports consumer groups, which saves message offsets to the Kafka topic.
`FALLBACK_OFFSET` load configuration property that instructs GPSS to automatically handle offset mismatches, and how to handle them.
`.avro` file. GPSS now supports the `SCHEMA_PATH_ON_GPDB` load configuration property to specify the `.avsc` schema file.
`IDLE_DURATION` load configuration property.
`\u0000` by creating a new Greenplum Database data type named `gp_jsonb` (Beta).
Release Date: April 15, 2021
Greenplum Streaming Server 1.5.3 resolves an issue.
Note: You are required to perform upgrade actions for this release. Review Upgrading the Streaming Server to plan your upgrade to GPSS 1.5.3.
Greenplum Streaming Server 1.5.3 resolves this issue:
`CUSTOM_OPTION` properties specified in a load configuration file. GPSS now supports using the `NAME` and `PARAMSTR` properties to specify a custom formatter user-defined function.
Release Date: March 5, 2021
Greenplum Streaming Server 1.5.2 resolves several issues.
Note: You are required to perform upgrade actions for this release. Review Upgrading the Streaming Server to plan your upgrade to GPSS 1.5.2.
Greenplum Streaming Server 1.5.2 includes this change:
Greenplum Streaming Server 1.5.2 resolves these issues:
`execInsert and err: nil` because it did not check for an error before logging.
`MATCH_COLUMNS` was empty. GPSS now requires and checks that this field includes at least one column when you submit a load job that specifies `UPDATE` or `MERGE` mode.
`--all` option was incorrectly displayed in the help output of the `gpsscli load` command.
`librdkafka` `queued.max.messages.kbytes` property when the user does not explicitly configure it.
`gpkafka load show job progress fail, err: job progress is nil` when it failed to start a Kafka job. GPSS now returns the more meaningful error `gpkafka load start job failed` in this situation.
`gpsscli load *.yaml` command operated on more than one load job.
Release Date: February 5, 2021
Greenplum Streaming Server 1.5.1 includes changes and resolves issues.
Note: You are required to perform upgrade actions for this release. Review Upgrading the Streaming Server to plan your upgrade to GPSS 1.5.1.
Greenplum Streaming Server 1.5.1 includes these changes:
`.deb` installation package for Ubuntu 18.04 LTS systems.
`gpsscli` subcommands now consistently return zero (`0`) on success and non-zero when GPSS encounters an error.
`libserdes` library to fix an issue that can arise when the `SCHEMA_REGISTRY_ADDRS` property value includes a trailing slash. See resolved issue 31137.
`gp_read_persistent_error_log()` function when you register the GPSS extension in a database. Resolved issue 31201 provides more information.
Greenplum Streaming Server 1.5.1 resolves these issues:
`permission denied for language c` error when it attempted, at runtime, to register an internal function as the Greenplum Database user that started GPSS, and this user did not have the privileges required to create such functions. GPSS now registers this internal function when you create the GPSS extension in a database.
`libserdes`, GPSS did not correctly handle a trailing slash when specified in the first address in a list of `SCHEMA_REGISTRY_ADDRS`. This issue is resolved; GPSS 1.5.1 bundles a patched version of the `libserdes` library that can handle such addresses.
`SAVE_FAILING_BATCH` property and value in a (deprecated) version 1 load configuration file, when version 1 of the file does not support this property. GPSS now displays a warning message when it encounters a property that is not supported in a version 1 configuration file.
`read_committed` isolation level, the job was restarted, and the topic retention period had expired. This issue is resolved; GPSS now records control message offsets.
`-i | --edit-in-place` option was displayed in the help output of subcommands that did not support the option. GPSS now correctly displays the option only for the `gpsscli convert` command.
`gpsscli` subcommands did not return consistent values. `gpsscli` now returns zero (`0`) on success and non-zero on failure.
`filesource.yaml` load configuration file before submitting the job.
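Because `gpsscli` subcommands return zero on success and non-zero on failure as of 1.5.1, a wrapper script can branch on the exit status alone. The sketch below shows that pattern with Python's `subprocess` module. The `run_cli` helper is hypothetical, and the two commands are stand-ins that simply exit with a chosen status, so the example runs without GPSS installed; in practice `argv` would be whichever `gpsscli` invocation you use.

```python
import subprocess
import sys


def run_cli(argv: list) -> bool:
    """Run a command and report success based on a zero exit status."""
    completed = subprocess.run(argv, capture_output=True, text=True)
    if completed.returncode != 0:
        print(f"command failed with exit status {completed.returncode}",
              file=sys.stderr)
        return False
    return True


# Stand-in commands (assumption: they mimic a succeeding and a failing CLI call).
ok = run_cli([sys.executable, "-c", "raise SystemExit(0)"])
failed = run_cli([sys.executable, "-c", "raise SystemExit(3)"])
```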
Release Date: December 2, 2020
Greenplum Streaming Server 1.5.0 adds new features, includes changes, and resolves issues.
Note: You are required to perform upgrade actions for this release. Review Upgrading the Streaming Server to plan your upgrade to GPSS 1.5.0.
Greenplum Streaming Server 1.5.0 includes these new and changed features:
`ERROR_LIMIT` property, previously mandatory, is now optional. The default value for the property is zero (`0`); GPSS deactivates error logging and stops a load operation upon encountering the first error.
`gpss` server instances. Refer to Monitoring GPSS Service Instances for more information on enabling and using this integration.
`gpss.json` server configuration file include:
`DebugPort` configuration property. You can use this property to identify the port number on which GPSS starts a debug server for the `gpss` server instance. Refer to Pulling Information from the Debug Server for more information.
`MinTLSVersion` configuration property. You use this property to specify the minimum TLS version that GPSS requests on encrypted connections.
`Logging` configuration property block. You can use these configuration properties to set the front-end and back-end logging levels for GPSS commands. See About GPSS Logging.
`JobStore` configuration property block. Use the configuration property in this block to specify a local directory in which GPSS maintains job status information. This allows a GPSS server instance to (re)start any in-progress jobs when the instance first starts up. See About GPSS Job Management.
`Monitor` configuration property block. You use this property to enable GPSS Prometheus integration.
`gpsscli submit` or `gpsscli load` commands without specifying the `--name` option. GPSS now assigns the base name of the load configuration file as the default job name.
`PARTITIONS`. Use this property to specify the partition numbers from which you want GPSS to load Kafka messages from the topic. (This property is not supported for the Kafka version 1 configuration file format.)
`{{template_var}}` value syntax in the file, GPSS substitutes `template_var` with a `value` that you specify via the `-p | --property template_var=value` option when you submit or load the job.
`pq` library to support this feature. See Configuring SSL for the Control Channel for configuration information.
`gpsscli` `start`, `stop`, and `remove` subcommands now support a `--all` flag. When you specify this flag, GPSS starts all submitted jobs, stops all running jobs, or removes all stopped jobs, respectively.
`gpsscli submit` and `gpsscli load` commands can now operate on one or more YAML load configuration files.
`SAVE_FAILING_BATCH` load configuration property. When you set this property to `true`, GPSS also writes loading data to a backup table. When GPSS encounters expression evaluation errors, this backup table aids in the recovery of the load operation. See Redirecting Data to a Backup Table when GPSS Encounters Expression Evaluation Errors for additional information. (This property is not supported for the Kafka version 1 configuration file format.)
Greenplum Streaming Server 1.5.0 resolves these issues:
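The property template feature introduced in 1.5.0 replaces `{{template_var}}` placeholders in a load configuration file with values passed as `template_var=value` pairs. The following sketch imitates that substitution in plain Python to make the behavior concrete. It is not GPSS's implementation: the helper names, the two-line configuration fragment, and the choice to raise an error for a missing value are all assumptions made for the example.

```python
import re


def parse_properties(pairs: list) -> dict:
    """Turn ['name=value', ...] items into a dict, splitting on the first '='."""
    props = {}
    for pair in pairs:
        name, sep, value = pair.partition("=")
        if not sep or not name:
            raise ValueError(f"expected template_var=value, got {pair!r}")
        props[name] = value
    return props


def render(template: str, props: dict) -> str:
    """Replace each {{name}} placeholder with its value from props."""
    def lookup(match):
        name = match.group(1)
        if name not in props:
            raise KeyError(f"no value supplied for template variable {name!r}")
        return props[name]
    return re.sub(r"\{\{(\w+)\}\}", lookup, template)


# Hypothetical configuration fragment, used only to exercise the substitution.
config_text = "TOPIC: {{topic_name}}\nTABLE: {{table_name}}\n"
props = parse_properties(["topic_name=orders", "table_name=orders_raw"])
print(render(config_text, props))
```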
Release Date: December 17, 2021
Greenplum Streaming Server 1.4.3 resolves an issue and includes related changes.
Note: You may be required to perform upgrade actions for this release. Review Upgrading the Streaming Server to plan your upgrade to GPSS 1.4.3.
Greenplum Streaming Server 1.4.3 includes this change:
Greenplum Streaming Server 1.4.3 resolves this issue:
`hostnossl` connection type entry configured for the user in the `pg_hba.conf` file). GPSS now attempts to initiate a non-SSL connection when it encounters an SSL connection failure on the control channel.
Release Date: November 2, 2020
Greenplum Streaming Server 1.4.2 resolves issues and includes related changes.
Note: You may be required to perform upgrade actions for this release. Review Upgrading the Streaming Server to plan your upgrade to GPSS 1.4.2.
Greenplum Streaming Server 1.4.2 includes these changes:
`prefer` mode on the control channel to the Greenplum Database coordinator host. GPSS previously explicitly deactivated SSL on the channel.
Greenplum Streaming Server 1.4.2 resolves these issues:
`gpsscli stop` would not respond (hang).
Release Date: August 7, 2020
Greenplum Streaming Server 1.4.1 resolves issues and includes related changes.
Note: You may be required to perform upgrade actions for this release. Review Upgrading the Streaming Server to plan your upgrade to GPSS 1.4.1.
Greenplum Streaming Server 1.4.1 includes these changes:
`librdkafka` library to fix an issue that can arise when the Kafka topic that GPSS loads includes messages with discontinuous offsets. See resolved issues 30797 and 30776.
Greenplum Streaming Server 1.4.1 resolves these issues:
`librdkafka`, a load job from Kafka would hang when there were aborted Kafka transactions in the topic, or when the messages were deleted before GPSS was able to consume them. This issue is resolved. GPSS 1.4.1 bundles a patched version of the `librdkafka` library and can now handle message offsets that are not continuous.
`Cannot parallelize an UPDATE statement that updates the distribution columns` because GPSS versions 1.3.5 through 1.4.0 used the Greenplum Postgres Planner by default, which does not support updating columns that are specified as the distribution key. GPSS 1.4.1 resolves this issue by not explicitly specifying a query planner/optimizer, but rather using the default that is configured in the Greenplum cluster.
`gpsscli stop` would hang when you invoked it to stop a Kafka load job that GPSS had previously retried. This issue is resolved.
`libserdes.so`. This issue is resolved; the package now includes this library.
`gpss.json` server configuration file included the (default) `ReuseTables: true` property setting.
`gpsscli progress` command execution, the progress information for jobs for which you did not run the command was lost. This issue is resolved. GPSS now always tracks job progress in a separate, CSV-format log file (with header row) named progress_*jobname*_*jobid*_*date*.log.
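Since the progress log described above is a CSV file with a header row, it can be read with any CSV reader that treats the first line as column names. The sketch below uses Python's `csv.DictReader`. The column names and values in the sample are invented for illustration, because the actual columns are not listed in these notes; only the "CSV with header row" property comes from the text.

```python
import csv
import io


def read_progress_log(stream) -> list:
    """Read a CSV log with a header row into a list of dicts keyed by column."""
    return list(csv.DictReader(stream))


# Hypothetical column names and values, used only to exercise the reader.
sample = io.StringIO(
    "timestamp,rows_read\n"
    "2020-08-07T10:00:00,100\n"
    "2020-08-07T10:00:05,250\n"
)
rows = read_progress_log(sample)
print(rows[-1]["rows_read"])
```

To read a real log, pass an open file handle instead of the in-memory sample, for example `open(path, newline="")`.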
Release Date: June 26, 2020
Greenplum Streaming Server 1.4.0 adds new features, includes changes, and resolves issues.
Note: You may be required to perform upgrade actions for this release. Review Upgrading the Streaming Server to plan your upgrade to GPSS 1.4.0.
Greenplum Streaming Server 1.4.0 includes these new and changed features:
`META` load configuration property block. You can load the properties in this single JSON-format column into the target table, or use the properties in update or merge criteria for a load operation. The available `META` properties are data-source specific:
`META` properties: `topic` (`text`), `partition` (`int`), and `offset` (`bigint`).
`META` property named `filename` (`text`).
`UPDATE_COLUMNS`. In this scenario, GPSS updates all `MAPPING` columns in each row.
`librdkafka` version 1.4.2. This version provides support for controlling how GPSS reads Kafka messages written transactionally via the `isolation.level` property.
Greenplum Streaming Server 1.4.0 resolves these issues:
`TransferStats` success and error counts for data load operations initiated in update mode.
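The `META` properties introduced in 1.4.0 arrive as a single JSON-format column whose keys carry the listed types: `topic` (`text`), `partition` (`int`), and `offset` (`bigint`). As a rough picture of what such a value might look like, the sketch below decodes a hand-written JSON object in Python. The sample values are invented and the exact serialization GPSS uses is not shown in these notes, so treat the shape as an assumption.

```python
import json

# Hypothetical metadata for one Kafka message; the keys follow the META
# properties listed above, the values are invented.
meta_json = '{"topic": "orders", "partition": 2, "offset": 9000000000}'

meta = json.loads(meta_json)
topic = str(meta["topic"])          # text
partition = int(meta["partition"])  # int
offset = int(meta["offset"])        # bigint: may exceed the 32-bit range

print(topic, partition, offset)
```

The offset in the sample is deliberately larger than a 32-bit integer can hold, which is why the corresponding Greenplum type is `bigint`.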
Deprecated features may be removed in a future minor release of the Greenplum Streaming Server. GPSS 1.4.x deprecates:
`gpkafka` Version 1 configuration file format (deprecated since 1.4.0).
`gpkafka.yaml` (versions 1 and 2) `POLL` block, including the `POLL:BATCHSIZE` and `POLL:TIMEOUT` properties (deprecated since 1.3.5).
Deprecated features may be removed in a future minor release of the Greenplum Streaming Server. GPSS 1.4.x removes:
`gpsscli history` and `gpkafka history` commands (deprecated in 1.3.5).
Release Date: December 19, 2019
Greenplum Streaming Server version 1.3.1 is the first standalone release of GPSS. GPSS 1.3.1 is also included in the Greenplum Database version 5.24 and 6.2 distributions.
Greenplum Streaming Server 1.3.1 is a maintenance release that resolves several issues.
Greenplum Streaming Server 1.3.1 resolves these issues:
`MINIMAL_INTERVAL` (0 seconds) caused GPSS to consume a large amount of CPU resources, even when no new messages existed in the Kafka topic. This issue is resolved.
Release Date: November 1, 2019
Greenplum Streaming Server version 1.3.0 is included in the Greenplum Database version 5.23 and 6.1 distributions.
Greenplum Streaming Server 1.3.0 is a minor release that includes new and changed features and resolves several issues.
Greenplum Streaming Server 1.3.0 includes these new and changed features:
`logrotate` system. See Managing GPSS Log Files for more information.
`INPUT:FILTER` load configuration property. This property enables you to specify a filter that GPSS applies to Kafka input data before loading it into Greenplum Database.
`--partition` flag to the `gpsscli progress` command.
`--force-reset-timestamp` flag when you run `gpsscli load`, `gpsscli start`, or `gpkafka load`.
`MODE`, `MATCH_COLUMNS`, `UPDATE_COLUMNS`, and `UPDATE_CONDITION` property values to direct these operations. Example: Merging Data from Kafka into Greenplum Using the Streaming Server provides an example merge scenario.
Greenplum Streaming Server 1.3.0 is a minor release that resolves these issues:
`--force-reset-earliest` flag when loading data failed to read from the correct offset. This problem has been fixed. (Using the --force-reset-*xxx* flags outside of an offset mismatch scenario is discouraged.)
`gp_read_error_log()` on the external table now displays the offending data.
Greenplum Streaming Server 1.x includes these Beta features:
GPSS adds support for a RabbitMQ data source (introduced in 1.8.0, promoted to supported in 1.9.0).
GPSS adds support for an `s3` data source (introduced in 1.7.0).
GPSS adds a new datatype named `gp_json` to the `dataflow` extension (introduced in 1.7.0).
GPSS exposes a new load configuration property for Kafka data sources named `RECOVER_FAILING_BATCH` (version 2 configuration) and `recover_failing_batch` (version 3 configuration). Use this property in conjunction with `SAVE_FAILING_BATCH` to instruct GPSS to automatically reload the good data in the batch, and retain only the error data in the backup table.
Note: Enabling this feature may have severe performance implications when any data in the Kafka topic generates an expression error.
Note: This feature requires that GPSS has the Greenplum Database privileges to create a function.
(Introduced in 1.6.0.)
GPSS adds a new extension named `dataflow`. This extension includes a new data type, `gp_jsonb` (available for Greenplum Database version 6.x only), and a new formatter, `text_in`. (Introduced in 1.6.0.)
GPSS specifies a new version 3 load configuration file format. This format introduces a new YAML organization and keywords. (Introduced in 1.5.0.)
Deprecated features may be removed in a future release of the Greenplum Streaming Server. GPSS 1.x deprecates:
`gpss.json` configuration file to the `gpss` command standalone (deprecated since 1.6.0). Use the `-c | --config` option when you specify the file.
`gpkafka` Version 1 configuration file format (deprecated since 1.4.0).
`gpkafka.yaml` (versions 1 and 2) `POLL` block, including the `POLL:BATCHSIZE` and `POLL:TIMEOUT` properties (deprecated since 1.3.5).
Greenplum Streaming Server 1.x has these known issues:
`EXPLAIN INSERT` command internally launched by GPSS on a Kafka job may take a long time to complete. You can work around this issue by specifying the `--skip-explain` flag to the `gpsscli start` command when you start the job.
`SAVE_FAILING_BATCH` and `PARTITIONS` configuration properties are not supported when you use the version 1 configuration file format to load data.
`gpfdist`.
Due to limitations in the Greenplum Database external table framework, GPSS cannot log a data type conversion error that it encounters while evaluating a mapping expression. For example, if you use the expression `EXPRESSION: (jdata->>'id')::int` in your load configuration file, and the content of `jdata->>'id'` is a string that includes non-integer characters, the evaluation fails and GPSS terminates the load job. GPSS cannot log and propagate the error back to the user via `gp_read_error_log()`.
Workarounds for Kafka:
`SAVE_FAILING_BATCH` load configuration property to `true`, and then manually load any data batch that included expression errors.
--force-reset-*xxx* flag on the job start or load command.
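To get a feel for which values would trip an expression such as `(jdata->>'id')::int`, you could screen sample records before loading them. The sketch below is one such check in Python. It is not a GPSS feature and only approximates the database's integer cast: it assumes a 4-byte integer target, string-valued `id` fields, and plain ASCII digits, and the helper names are invented for the example.

```python
import re

INT32_MIN, INT32_MAX = -2**31, 2**31 - 1
_INTEGER = re.compile(r"[+-]?[0-9]+")


def looks_like_int(text) -> bool:
    """Approximate check that a string would survive a cast to a 4-byte integer."""
    if text is None:
        return True  # a missing value becomes NULL rather than a cast error
    stripped = text.strip()
    if not _INTEGER.fullmatch(stripped):
        return False
    return INT32_MIN <= int(stripped) <= INT32_MAX


def split_records(records: list) -> tuple:
    """Separate records whose 'id' looks castable from those that do not."""
    good = [r for r in records if looks_like_int(r.get("id"))]
    bad = [r for r in records if not looks_like_int(r.get("id"))]
    return good, bad


good, bad = split_records([{"id": "17"}, {"id": "17a"}, {"id": "99999999999"}])
print(len(good), len(bad))
```

Here `"17a"` fails because it contains a non-digit character and `"99999999999"` fails because it exceeds the 4-byte integer range.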