When multiple related data sets exist in external systems, it is often more efficient to join them remotely and return only the results than to incur the time and storage costs of an expensive full data load operation. The VMware Greenplum Platform Extension Framework, a Greenplum Database extension for parallel, high-throughput data access and federated query processing, provides this capability.

With the VMware Greenplum Platform Extension Framework, you can use Greenplum Database and SQL to query these heterogeneous data sources:

  • Hadoop, Hive, and HBase

  • Azure Blob Storage and Azure Data Lake

  • AWS S3

  • MinIO

  • Google Cloud Storage

  • SQL databases including Apache Ignite, Hive, MySQL, Oracle, Microsoft SQL Server, DB2, and PostgreSQL (via JDBC)

  • Network file systems

And these data formats:

  • Avro, AvroSequenceFile

  • JSON

  • ORC

  • Parquet

  • RCFile

  • SequenceFile

  • Text (plain, delimited, embedded line feeds)
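
As a rough illustration of querying one of these sources with SQL, the sketch below defines an external table over delimited text in S3 and joins it with a local table. It assumes the extension is installed and that a server configuration has already been set up. The table names, column names, bucket path, and the server name s3srvcfg are hypothetical placeholders, not values from this documentation.

```sql
-- Register the extension in the current database (once per database).
CREATE EXTENSION IF NOT EXISTS pxf;

-- Hypothetical external table over comma-delimited text files in S3.
-- "example-bucket/orders/" and "s3srvcfg" are assumed names.
CREATE EXTERNAL TABLE ext_orders_s3 (
    order_id    int,
    customer_id int,
    amount      numeric
)
LOCATION ('pxf://example-bucket/orders/?PROFILE=s3:text&SERVER=s3srvcfg')
FORMAT 'TEXT' (DELIMITER ',');

-- Join the external data with a hypothetical local table "customers"
-- and return only the aggregated result, without loading the S3 data.
SELECT c.customer_name, sum(o.amount) AS total_amount
FROM customers c
JOIN ext_orders_s3 o ON o.customer_id = c.customer_id
GROUP BY c.customer_name;
```

The PROFILE option selects the data source and format, so pointing the same pattern at another source or format from the lists above would mainly mean changing the profile and the path.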