PXF supports column projection, and it is always enabled. With column projection, only the columns required by a SELECT
query on an external table are returned from the external data source. This process can improve query performance, and can also reduce the amount of data that is transferred to Greenplum Database.
Note: Some external data sources do not support column projection. If a query accesses a data source that does not support column projection, the query is instead run without it, and the data is filtered after it is transferred to Greenplum Database.
Column projection is automatically enabled for the pxf
external table protocol. PXF accesses external data sources using different connectors, and column projection support is also determined by the specific connector implementation. The following PXF connector and profile combinations support column projection on read operations:
Data Source | Connector | Profile(s) |
---|---|---|
External SQL database | JDBC Connector | jdbc |
Hive | Hive Connector | hive (accessing tables stored as Text, Parquet, RCFile, and ORC), hive:rc, hive:orc |
Hadoop | HDFS Connector | hdfs:orc, hdfs:parquet |
Network File System | File Connector | file:orc, file:parquet |
Amazon S3 | S3-Compatible Object Store Connectors | s3:orc, s3:parquet |
Amazon S3 using S3 Select | S3-Compatible Object Store Connectors | s3:parquet, s3:text |
Google Cloud Storage | GCS Object Store Connector | gs:orc, gs:parquet |
Azure Blob Storage | Azure Object Store Connector | wasbs:orc, wasbs:parquet |
Azure Data Lake | Azure Object Store Connector | adl:orc, adl:parquet |
Note: PXF may deactivate column projection in cases where it cannot successfully serialize a query filter; for example, when the WHERE
clause resolves to a boolean
type.
To summarize, all of the following criteria must be met for column projection to occur: