The Greenplum Streaming Server (GPSS) is an ETL (extract, transform, load) tool. An instance of the GPSS server ingests streaming data from one or more clients, using Greenplum Database readable external tables to transform and insert the data into a target Greenplum table. The data source and the format of the data are specific to the client.
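The extract, transform, and load flow described above can be modeled in a few lines. The sketch below is a toy, in-memory illustration and not GPSS code. The class `ToyIngestServer` and its `transform` callable are hypothetical. It assumes every client sends comma-separated text lines, whereas in GPSS the data format is specific to each client. Each client's data is staged, standing in for a readable external table, then transformed and appended to a single target table, in the spirit of `INSERT INTO target SELECT ... FROM <external table>`.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

Row = Tuple[object, ...]


@dataclass
class ToyIngestServer:
    """Toy model of the ETL flow: stage each client's data, transform it, insert it into one target."""
    transform: Callable[[Row], Row]  # plays the role of the SELECT list in INSERT ... SELECT
    target: List[Row] = field(default_factory=list)
    loaded_by_client: Counter = field(default_factory=Counter)

    def ingest(self, client_id: str, raw_lines: List[str]) -> int:
        # Extract: this toy parser assumes every client sends comma-separated text lines.
        staged: List[Row] = [tuple(line.split(",")) for line in raw_lines]
        # Transform + load: analogous to INSERT INTO target SELECT ... FROM <external table>.
        rows = [self.transform(row) for row in staged]
        self.target.extend(rows)
        self.loaded_by_client[client_id] += len(rows)
        return len(rows)


server = ToyIngestServer(transform=lambda r: (r[0].strip().upper(), int(r[1])))
server.ingest("client-a", ["x, 1", "y,2"])
server.ingest("client-b", ["z,3"])
print(server.target)  # [('X', 1), ('Y', 2), ('Z', 3)]
```

Two clients feed the same target table here, mirroring a single GPSS instance that serves multiple clients.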
The Greenplum Streaming Server includes the gpss command-line utility. When you run gpss, you start an instance of GPSS; this instance waits indefinitely for client data.
The Greenplum Streaming Server also includes the gpsscli command-line utility, a client tool for submitting data load jobs to a GPSS instance and managing those jobs.
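Submitting load jobs to a GPSS instance and managing them amounts to driving a small job lifecycle. The sketch below models that lifecycle in memory. `ToyJobRegistry`, its method names, the state names, and the file name `orders_load.yaml` are illustrative assumptions, not gpsscli subcommands or GPSS output. The rule that a running job must be stopped before removal is also an assumption of this sketch.

```python
from enum import Enum
from typing import Dict, List, Tuple


class JobState(Enum):
    SUBMITTED = "submitted"
    RUNNING = "running"
    STOPPED = "stopped"


class ToyJobRegistry:
    """Hypothetical model of the job table a GPSS instance keeps for submitted load jobs."""

    def __init__(self) -> None:
        self._jobs: Dict[str, Tuple[str, JobState]] = {}  # job name -> (load config, state)

    def submit(self, name: str, load_config: str) -> None:
        if name in self._jobs:
            raise ValueError(f"job {name!r} already exists")
        self._jobs[name] = (load_config, JobState.SUBMITTED)

    def start(self, name: str) -> None:
        config, state = self._get(name)
        if state is JobState.RUNNING:
            raise RuntimeError(f"job {name!r} is already running")
        self._jobs[name] = (config, JobState.RUNNING)

    def stop(self, name: str) -> None:
        config, state = self._get(name)
        if state is not JobState.RUNNING:
            raise RuntimeError(f"job {name!r} is not running")
        self._jobs[name] = (config, JobState.STOPPED)

    def remove(self, name: str) -> None:
        # Assumption for this sketch: a running job must be stopped before it can be removed.
        _, state = self._get(name)
        if state is JobState.RUNNING:
            raise RuntimeError(f"stop job {name!r} before removing it")
        del self._jobs[name]

    def list_jobs(self) -> List[Tuple[str, str]]:
        return sorted((name, state.value) for name, (_, state) in self._jobs.items())

    def _get(self, name: str) -> Tuple[str, JobState]:
        if name not in self._jobs:
            raise KeyError(f"no such job: {name!r}")
        return self._jobs[name]


registry = ToyJobRegistry()
registry.submit("orders", "orders_load.yaml")
registry.start("orders")
print(registry.list_jobs())  # [('orders', 'running')]
registry.stop("orders")
registry.remove("orders")
```

Because the job registry is separate from the client, any client that can reach the instance can inspect or stop a job it did not submit. Check the gpsscli reference for the real commands and their semantics.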
The Greenplum Streaming Server gpsscli client utility currently supports Kafka, file, RabbitMQ, and S3 (Beta) data sources.
The Greenplum Streaming Server is a gRPC server. The GPSS gRPC service definition includes the operations and messages necessary to connect to Greenplum Database and examine Greenplum metadata. The service definition also includes the operations and messages necessary to write data from a client into a Greenplum Database table. For more information about gRPC, refer to the gRPC documentation.
The gpsscli utility is a Greenplum Streaming Server gRPC client, as are the VMware Greenplum Connector for Informatica and the VMware Greenplum Connector for Apache NiFi. You can develop your own GPSS gRPC client using the GPSS Batch Data API.
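A custom client typically follows a session lifecycle: connect, examine table metadata, write rows, then finish the write and disconnect. The sketch below illustrates that ordering against an in-memory fake. `FakeStreamingService`, `load_rows`, all method names, and the connection values are hypothetical stand-ins, not the RPCs or messages in the GPSS service definition, so consult that definition for the real names. The sketch assumes that written rows reach the target only when the batch is closed. Whether that matches GPSS commit behavior is unverified here.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Sequence, Tuple

Row = Tuple[object, ...]


@dataclass
class FakeStreamingService:
    """In-memory stand-in for a GPSS-style service. Method names are hypothetical."""
    tables: Dict[str, List[str]]  # table name -> column names (the "metadata")
    data: Dict[str, List[Row]] = field(default_factory=dict)
    _open: Dict[str, Optional[str]] = field(default_factory=dict, init=False)
    _pending: Dict[str, List[Row]] = field(default_factory=dict, init=False)

    def connect(self, host: str, port: int, database: str) -> str:
        # A real server would connect to Greenplum here; the fake just issues a session id.
        session = uuid.uuid4().hex
        self._open[session] = None
        return session

    def describe_table(self, session: str, table: str) -> List[str]:
        self._check(session)
        return list(self.tables[table])

    def open(self, session: str, table: str) -> None:
        self._check(session)
        if self._open[session] is not None:
            raise RuntimeError("a table is already open in this session")
        self._open[session] = table
        self._pending[session] = []

    def write(self, session: str, rows: Sequence[Row]) -> None:
        self._current_table(session)
        self._pending[session].extend(tuple(r) for r in rows)

    def close(self, session: str) -> int:
        # Assumption: buffered rows reach the target table only when the batch is closed.
        table = self._current_table(session)
        batch = self._pending.pop(session)
        self.data.setdefault(table, []).extend(batch)
        self._open[session] = None
        return len(batch)

    def disconnect(self, session: str) -> None:
        self._check(session)
        del self._open[session]
        self._pending.pop(session, None)  # discard any batch that was never closed

    def _check(self, session: str) -> None:
        if session not in self._open:
            raise KeyError("unknown or disconnected session")

    def _current_table(self, session: str) -> str:
        self._check(session)
        table = self._open[session]
        if table is None:
            raise RuntimeError("no table is open; call open() first")
        return table


def load_rows(service: FakeStreamingService, table: str, rows: List[Row], batch_size: int = 2) -> int:
    """Client side: connect, check metadata, stream rows in batches, close, disconnect."""
    session = service.connect("localhost", 5000, "testdb")  # placeholder connection values
    try:
        columns = service.describe_table(session, table)
        if any(len(row) != len(columns) for row in rows):
            raise ValueError(f"every row must have {len(columns)} values")
        service.open(session, table)
        for start in range(0, len(rows), batch_size):
            service.write(session, rows[start:start + batch_size])
        return service.close(session)
    finally:
        service.disconnect(session)
```

The `try`/`finally` in `load_rows` ensures the session is released even when validation or a write fails, so a failed load leaves no half-open session behind.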
A typical sequence of events for performing an ETL task using the Greenplum Streaming Server follows:
GPSS uses the gpfdist protocol to store data in external tables that it creates or reuses.