VMware Greenplum is a massively parallel processing database server designed to manage large-scale analytic data warehouses and business intelligence workloads. Informatica PowerCenter is a high-speed platform for integrating enterprise data.
The VMware Greenplum Connector for Informatica provides high-speed data transfer from Informatica PowerCenter to a VMware Greenplum cluster to support batch and continuous (streaming) ETL.
The Connector architecture consists of two components: the Connector itself, which runs on an Informatica PowerCenter node and on Informatica client machines, and the Greenplum Streaming Server (GPSS) service, which runs in the Greenplum Database cluster. The GPSS service can run anywhere in the Greenplum Database cluster and interacts with the Greenplum Database master and segment hosts as necessary to transfer data from Informatica.
Figure: Greenplum Connector for Informatica Architecture
A typical sequence of events for performing an ETL task using the Connector involves:
1. An Informatica user accesses the PowerCenter server with client tools and initiates one or more ETL load requests with the Connector.
2. The Connector uses the gRPC protocol to transmit the load requests to the GPSS service running in the Greenplum Database cluster.
3. The GPSS service submits each load request transaction to the Greenplum Database cluster master instance, and creates the external tables needed to store data. Each load request can configure session properties to customize the services that GPSS provides in the Greenplum Database cluster.
4. The GPSS service transfers the requested data from the PowerCenter node into segments of the Greenplum Database cluster.
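The sequence above can be pictured with a small toy model. The sketch below is illustrative only: it does not use the Connector, GPSS, or gRPC APIs, and every name in it (LoadRequest, ToyConnector, ToyGpss, ToyCluster, the ext_ table prefix) is hypothetical. The round-robin row placement is also an assumption made for simplicity and is not a description of how Greenplum distributes data across segments.

```python
from dataclasses import dataclass, field


@dataclass
class LoadRequest:
    """Hypothetical stand-in for one ETL load request."""
    table: str
    rows: list
    session_properties: dict = field(default_factory=dict)


class ToyCluster:
    """Hypothetical stand-in for a cluster: master metadata plus N segments."""

    def __init__(self, num_segments):
        self.external_tables = set()                  # tracked by the "master"
        self.segments = [[] for _ in range(num_segments)]


class ToyGpss:
    """Hypothetical stand-in for the GPSS service running in the cluster."""

    def __init__(self, cluster):
        self.cluster = cluster

    def submit(self, request):
        # Submit the request to the master and create the external table.
        # The "ext_" naming is invented for this sketch.
        self.cluster.external_tables.add(f"ext_{request.table}")
        # Move the rows into segments (round-robin is an assumption here).
        for i, row in enumerate(request.rows):
            target = i % len(self.cluster.segments)
            self.cluster.segments[target].append(row)
        return len(request.rows)


class ToyConnector:
    """Hypothetical stand-in for the Connector on the PowerCenter node."""

    def __init__(self, gpss):
        self.gpss = gpss

    def load(self, request):
        # In the real system this hop is a gRPC call; here it is a method call.
        return self.gpss.submit(request)


cluster = ToyCluster(num_segments=2)
connector = ToyConnector(ToyGpss(cluster))
loaded = connector.load(LoadRequest("sales", [(1, "a"), (2, "b"), (3, "c")]))
print(loaded, [len(segment) for segment in cluster.segments])
```

The point of the model is the division of responsibility: the Connector only forwards requests, while the GPSS side handles both the interaction with the master and the data movement into segments.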