The Tanzu Greenplum streaming server (GPSS) is an ETL (extract, transform, load) tool. An instance of the GPSS server ingests streaming data from one or more clients, using Tanzu Greenplum readable external tables to transform and insert the data into a target Greenplum table. The data source and the format of the data are specific to the client. You may also unload data from Tanzu Greenplum to a file using writable external tables.
The Tanzu Greenplum streaming server includes the gpss command-line utility. When you run gpss, you start an instance of GPSS; this instance waits indefinitely for client data.
The Tanzu Greenplum streaming server also includes the gpsscli command-line utility, a client tool for submitting data load jobs to a GPSS instance and managing those jobs.
Note: The Tanzu Greenplum streaming server gpsscli client utility currently supports Kafka, file, RabbitMQ, and S3 (Beta) data sources, and supports files as a target for unloading data.
The Tanzu Greenplum streaming server is a gRPC server. The GPSS gRPC service definition includes the operations and messages necessary to connect to Tanzu Greenplum and examine Greenplum metadata. The service definition also includes the operations and messages necessary to write data from a client into a Tanzu Greenplum table. For more information about gRPC, refer to the gRPC documentation.
The gpsscli utility is a Tanzu Greenplum streaming server gRPC client, as is the Tanzu Greenplum Connector for Apache NiFi. You can develop your own GPSS gRPC client using the GPSS Batch Data API.
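The following minimal Python sketch shows the client side of that exchange, and it runs without a GPSS server. It replaces the generated gRPC stubs with a hypothetical in-memory `FakeGpss` class. Its method names loosely follow the operations the Batch Data API is understood to expose (connect, list tables, open, write, close, disconnect). However, the signatures, the one-dict-per-row format, and the returned count are illustrative assumptions, not the actual API. The host, port, user, and database values are placeholders.

```python
import uuid
from dataclasses import dataclass, field

# Hypothetical in-memory stand-in for a GPSS instance, for illustration only.
# Method names loosely mirror Batch Data API operations; these are NOT the
# generated gRPC stubs, and the signatures are assumptions.
@dataclass
class FakeGpss:
    catalog: dict  # schema -> table -> list of column names
    tables: dict = field(default_factory=dict)    # (schema, table) -> loaded rows
    sessions: dict = field(default_factory=dict)  # session id -> session state

    def _state(self, session_id):
        if session_id not in self.sessions:
            raise KeyError(f"unknown session {session_id}")
        return self.sessions[session_id]

    def connect(self, host, port, user, database):
        # A real client would pass Greenplum connection details here;
        # this stand-in ignores them and just issues a session id.
        session_id = uuid.uuid4().hex
        self.sessions[session_id] = {"target": None, "buffer": []}
        return session_id

    def list_table(self, session_id, schema):
        # Metadata lookup: which tables exist in the given schema.
        self._state(session_id)
        return sorted(self.catalog.get(schema, {}))

    def open(self, session_id, schema, table):
        state = self._state(session_id)
        if state["target"] is not None:
            raise RuntimeError("a table is already open in this session")
        state["target"], state["buffer"] = (schema, table), []

    def write(self, session_id, rows):
        # Buffer rows after checking that each row matches the table's columns.
        state = self._state(session_id)
        if state["target"] is None:
            raise RuntimeError("call open() before write()")
        schema, table = state["target"]
        columns = set(self.catalog[schema][table])
        for row in rows:
            if set(row) != columns:
                raise ValueError(f"row {row!r} does not match columns {sorted(columns)}")
        state["buffer"].extend(rows)

    def close(self, session_id):
        # Move the buffered rows into the target table and report a count.
        state = self._state(session_id)
        if state["target"] is None:
            raise RuntimeError("no table is open")
        self.tables.setdefault(state["target"], []).extend(state["buffer"])
        written = len(state["buffer"])
        state["target"], state["buffer"] = None, []
        return {"rows_written": written}

    def disconnect(self, session_id):
        self.sessions.pop(session_id, None)


def load_rows(server, rows, schema, table):
    """Client-side sequence: connect, check metadata, open, write, close, disconnect."""
    # Placeholder connection values (hypothetical).
    session_id = server.connect("gpmaster.example.com", 5432, "gpadmin", "testdb")
    try:
        if table not in server.list_table(session_id, schema):
            raise LookupError(f"{schema}.{table} does not exist")
        server.open(session_id, schema, table)
        server.write(session_id, rows)
        return server.close(session_id)
    finally:
        # Release the session even if a step above fails.
        server.disconnect(session_id)


server = FakeGpss(catalog={"public": {"events": ["id", "payload"]}})
stats = load_rows(server, [{"id": 1, "payload": "a"}, {"id": 2, "payload": "b"}],
                  schema="public", table="events")
print(stats)
```

A real client would instead generate stubs from the GPSS service definition and call them over a gRPC channel. The sketch preserves two ideas. First, rows reach the target table only when the client closes the write. Second, the session is released even if a write fails.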
A typical sequence of events for performing an ETL task using the Tanzu Greenplum streaming server follows:
gpfdist protocol to store data in external tables that it creates or reuses.