When multiple related data sets exist in external systems, it is often more efficient to join them remotely and return only the results than to incur the time and storage costs of a full data load. The VMware Tanzu Greenplum platform extension framework, a Tanzu Greenplum Database extension, provides this capability through parallel, high-throughput data access and federated query processing.
With the Tanzu Greenplum platform extension framework, you can use Tanzu Greenplum Database and SQL to query these heterogeneous data sources:
Hadoop, Hive, and HBase
Azure Blob Storage and Azure Data Lake
AWS S3
MinIO
Google Cloud Storage
SQL databases including Apache Ignite, Hive, MySQL, Oracle, Microsoft SQL Server, DB2, and PostgreSQL (via JDBC)
Network file systems
And these data formats:
Avro, AvroSequenceFile
JSON
ORC
Parquet
RCFile
SequenceFile
Text (plain, delimited, embedded line feeds)
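As a rough illustration of querying one of these sources with SQL, the following sketch defines an external table over a comma-delimited text file in S3 and joins it with a local table. It assumes that the platform extension framework (PXF) is installed and running. The bucket path, the server configuration name `s3srvcfg`, the table names, and the column definitions are all hypothetical placeholders, not values from this documentation.

```sql
-- Register the PXF extension (run once per database; assumes PXF is installed).
CREATE EXTENSION pxf;

-- Hypothetical external table over a comma-delimited text file in S3.
-- The bucket path and the server name "s3srvcfg" are placeholders.
CREATE EXTERNAL TABLE ext_orders (
    customer_id  int,
    order_month  text,
    total_sales  float8
)
LOCATION ('pxf://example-bucket/orders/orders.csv?PROFILE=s3:text&SERVER=s3srvcfg')
FORMAT 'TEXT' (delimiter=E',');

-- Join the external data with a hypothetical local table named "customers"
-- and return only the aggregated result.
SELECT c.name, sum(o.total_sales) AS total_sales
FROM customers c
JOIN ext_orders o ON o.customer_id = c.id
GROUP BY c.name;
```

The external data is read when the query runs, so no separate load step is needed. The appropriate `PROFILE` value depends on the data source and data format being accessed.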