The Tanzu Greenplum streaming server (GPSS) is an ETL (extract, transform, load) tool. A GPSS instance ingests streaming data from one or more clients, using Tanzu Greenplum readable external tables to transform the data and insert it into a target Greenplum table. The data source and the format of the data are specific to the client. You may also unload data from Tanzu Greenplum to a file using writable external tables.

The Tanzu Greenplum streaming server includes the gpss command-line utility. When you run gpss, you start an instance of GPSS; this instance waits indefinitely for client data.

The Tanzu Greenplum streaming server also includes the gpsscli command-line utility, a client tool for submitting data load jobs to a GPSS instance and managing those jobs.

Note

The Tanzu Greenplum streaming server gpsscli client utility currently supports Kafka, file, RabbitMQ, and S3 (Beta) data sources, and file data as a target for unloading data.

Architecture

The Tanzu Greenplum streaming server is a gRPC server. The GPSS gRPC service definition includes the operations and messages necessary to connect to Tanzu Greenplum and examine Greenplum metadata. The service definition also includes the operations and messages necessary to write data from a client into a Tanzu Greenplum table. For more information about gRPC, refer to the gRPC documentation.
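The three groups of operations described above (connecting, examining metadata, and writing data) can be pictured with a small in-memory sketch. This is not the GPSS service definition: the class names `FakeGreenplum` and `FakeStreamingService`, the method names, and the integer session id are all hypothetical, chosen only to illustrate the shape of such a service.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class FakeGreenplum:
    """Hypothetical in-memory stand-in for a database."""
    columns: dict[str, list[str]]                      # table name -> column names
    rows: dict[str, list[tuple]] = field(default_factory=dict)


class FakeStreamingService:
    """Hypothetical stand-in for the operation groups; not the real GPSS API."""

    def __init__(self, db: FakeGreenplum) -> None:
        self._db = db
        self._sessions: set[int] = set()

    def connect(self) -> int:
        """Connection operation: open a session and return its id."""
        session = len(self._sessions) + 1
        self._sessions.add(session)
        return session

    def describe_table(self, session: int, table: str) -> list[str]:
        """Metadata operation: return the column names of a table."""
        self._check(session)
        return list(self._db.columns[table])

    def write(self, session: int, table: str, records: list[tuple]) -> int:
        """Data operation: append records to a table and return the count."""
        self._check(session)
        width = len(self._db.columns[table])
        # Validate every record first so a bad record rejects the whole batch.
        for record in records:
            if len(record) != width:
                raise ValueError(f"expected {width} values, got {len(record)}")
        self._db.rows.setdefault(table, []).extend(records)
        return len(records)

    def _check(self, session: int) -> None:
        if session not in self._sessions:
            raise PermissionError("unknown session")
```

A client of this toy service would call `connect()` once, use `describe_table()` to learn the target table's columns, and then call `write()` with batches of records shaped to match.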

The gpsscli utility is a Tanzu Greenplum streaming server gRPC client, as is the Tanzu Greenplum Connector for Apache NiFi. You can develop your own GPSS gRPC client using the GPSS Batch Data API.

Greenplum Streaming Server Architecture

A typical sequence of events for performing an ETL task using the Tanzu Greenplum streaming server follows:

  1. A user initiates one or more ETL load jobs via a client application.
  2. The client application uses the gRPC protocol to submit the data load job(s) to a running GPSS service instance and to start them.
  3. The GPSS service instance submits each load request transaction to the Tanzu Greenplum cluster coordinator instance. GPSS uses the gpfdist protocol to store data in external tables that it creates or reuses.
  4. The GPSS service instance writes the data delivered from the client directly into the segments of the Tanzu Greenplum cluster.
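The sequence above can be sketched as a toy model in which a job must be submitted, then started, before any data reaches the segments. Everything here is an assumption made for illustration: `ToyService`, `JobState`, the two-segment cluster, and the round-robin placement are hypothetical and do not reflect how GPSS or Tanzu Greenplum is implemented. In particular, a real Greenplum table places rows according to its distribution policy, not round-robin.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum


class JobState(Enum):
    SUBMITTED = "submitted"
    RUNNING = "running"


@dataclass
class ToyService:
    """Hypothetical model of a service instance and a two-segment cluster."""
    jobs: dict[str, JobState] = field(default_factory=dict)
    segments: list[list[str]] = field(default_factory=lambda: [[], []])

    def submit(self, name: str) -> None:
        """Step 2 (first half): register the job with the service."""
        if name in self.jobs:
            raise ValueError(f"job {name!r} already submitted")
        self.jobs[name] = JobState.SUBMITTED

    def start(self, name: str) -> None:
        """Step 2 (second half): start a previously submitted job."""
        if self.jobs.get(name) is not JobState.SUBMITTED:
            raise RuntimeError(f"job {name!r} is not in the submitted state")
        self.jobs[name] = JobState.RUNNING

    def deliver(self, name: str, records: list[str]) -> None:
        """Step 4: write client data directly into the segments."""
        if self.jobs.get(name) is not JobState.RUNNING:
            raise RuntimeError(f"job {name!r} is not running")
        # Round-robin placement is for illustration only.
        for i, record in enumerate(records):
            self.segments[i % len(self.segments)].append(record)


svc = ToyService()
svc.submit("job1")
svc.start("job1")
svc.deliver("job1", ["a", "b", "c"])
print(svc.segments)
```

The explicit state check in `deliver` mirrors the ordering in the list: data for a job flows only after the job has been both submitted and started.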