PXF supports filter pushdown. When filter pushdown is activated, the constraints from the `WHERE` clause of a `SELECT` query can be extracted and passed to the external data source for filtering. This process can improve query performance, and can also reduce the amount of data that is transferred to Greenplum Database.

You activate or deactivate filter pushdown for all external table protocols, including `pxf`, by setting the `gp_external_enable_filter_pushdown` server configuration parameter. The default value of this configuration parameter is `on`; set it to `off` to deactivate filter pushdown. For example:
```sql
SHOW gp_external_enable_filter_pushdown;
SET gp_external_enable_filter_pushdown TO 'on';
```
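As a sketch, assuming a pre-existing PXF external table named `pxf_sales` (a hypothetical name), a query like the following can have its constraint forwarded to the external data source when pushdown is enabled:

```sql
-- Ensure filter pushdown is enabled for this session.
SET gp_external_enable_filter_pushdown TO 'on';

-- The WHERE constraint below is a pushdown candidate: PXF can pass
-- "amount > 100" to the external data source instead of filtering
-- the rows after they are transferred to Greenplum Database.
SELECT id, amount
FROM pxf_sales          -- hypothetical pxf external table
WHERE amount > 100;
```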
Note: Some external data sources do not support filter pushdown, and filter pushdown may not be supported for certain data types or operators. If a query accesses a data source that does not support filter pushdown for the query constraints, the query runs without filter pushdown; the data is filtered after it is transferred to Greenplum Database.
PXF filter pushdown can be used with these data types (connector- and profile-specific):

- `INT2`, `INT4`, `INT8`
- `CHAR`, `TEXT`, `VARCHAR`
- `FLOAT`
- `NUMERIC` (not available with the `hive` profile when accessing `STORED AS Parquet`)
- `BOOL`
- `DATE`, `TIMESTAMP` (available only with the JDBC connector, the S3 connector when using S3 Select, the `hive:rc` and `hive:orc` profiles, and the `hive` profile when accessing `STORED AS RCFile` or `STORED AS ORC`)

PXF accesses data sources using profiles exposed by different connectors, and filter pushdown support is determined by the specific connector implementation. The following PXF profiles support some aspects of filter pushdown as well as different arithmetic and logical operations:
| Profile | `<`, `>`, `<=`, `>=`, `=`, `<>` | `LIKE` | `IS [NOT] NULL` | `IN` | `AND` | `OR` | `NOT` |
|---|---|---|---|---|---|---|---|
| `jdbc` | Y | Y<sup>4</sup> | Y | N | Y | Y | Y |
| `*:parquet` | Y<sup>1</sup> | N | Y<sup>1</sup> | Y<sup>1</sup> | Y<sup>1</sup> | Y<sup>1</sup> | Y<sup>1</sup> |
| `*:orc` (all except `hive:orc`) | Y<sup>1,3</sup> | N | Y<sup>1,3</sup> | Y<sup>1,3</sup> | Y<sup>1,3</sup> | Y<sup>1,3</sup> | Y<sup>1,3</sup> |
| `s3:parquet` and `s3:text` with S3 Select | Y | N | Y | Y | Y | Y | Y |
| `hbase` | Y | N | Y | N | Y | Y | N |
| `hive:text` | Y<sup>2</sup> | N | N | N | Y<sup>2</sup> | Y<sup>2</sup> | N |
| `hive:rc`, `hive` (accessing `STORED AS RCFile`) | Y<sup>2</sup> | N | Y | Y | Y, Y<sup>2</sup> | Y, Y<sup>2</sup> | Y |
| `hive:orc`, `hive` (accessing `STORED AS ORC`) | Y, Y<sup>2</sup> | N | Y | Y | Y, Y<sup>2</sup> | Y, Y<sup>2</sup> | Y |
| `hive` (accessing `STORED AS Parquet`) | Y, Y<sup>2</sup> | N | N | Y | Y, Y<sup>2</sup> | Y, Y<sup>2</sup> | Y |
| `hive:orc` and `VECTORIZE=true` | Y<sup>2</sup> | N | N | N | Y<sup>2</sup> | Y<sup>2</sup> | N |
<sup>1</sup> PXF applies the predicate, rather than the remote system, reducing CPU usage and the memory footprint.
<sup>2</sup> PXF supports partition pruning based on partition keys.
<sup>3</sup> PXF filtering is based on file-level, stripe-level, and row-level ORC statistics.
<sup>4</sup> The PXF `jdbc` profile supports the `LIKE` operator only for `TEXT` fields.
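To illustrate footnote 4, a `LIKE` predicate on a text column of a JDBC-backed external table is eligible for pushdown, while the same predicate on a non-text column is not; the table and column names below are hypothetical:

```sql
-- Hypothetical external table created with the jdbc profile,
-- where "name" is a TEXT column.
-- The LIKE constraint on a TEXT field can be pushed down to the
-- remote database; on other field types it is evaluated in
-- Greenplum Database after the data is transferred.
SELECT name, qty
FROM pxf_jdbc_inventory
WHERE name LIKE 'wid%';
```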
PXF does not support filter pushdown for any profile not mentioned in the table above, including: `*:avro`, `*:AvroSequenceFile`, `*:SequenceFile`, `*:json`, `*:text`, `*:csv`, `*:fixedwidth`, and `*:text:multi`.
To summarize, all of the following criteria must be met for filter pushdown to occur:

- You enable filter pushdown by setting the `gp_external_enable_filter_pushdown` server configuration parameter to `'on'`.
- The external table protocol supports filter pushdown; the `pxf` protocol does.
- For queries on external tables that you create with the `pxf` protocol, the underlying PXF connector must also support filter pushdown. For example, the PXF Hive, HBase, and JDBC connectors support pushdown, as do the PXF connectors that support reading ORC and Parquet data.
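Putting the criteria together, a minimal end-to-end sketch might look like the following; the Hive database name, table name, and column definitions are hypothetical placeholders for your own PXF configuration:

```sql
-- 1. Enable filter pushdown (on by default).
SET gp_external_enable_filter_pushdown TO 'on';

-- 2. Create an external table with the pxf protocol, using a
--    connector/profile that supports pushdown (hive:orc here).
CREATE EXTERNAL TABLE sales_orc (id int, region text, amount numeric)
LOCATION ('pxf://default.sales?PROFILE=hive:orc')
FORMAT 'CUSTOM' (FORMATTER='pxfwritable_import');

-- 3. Query with constraints that PXF can forward; per the table
--    above, hive:orc supports pushdown of comparison, IN, and
--    AND/OR/NOT operations.
SELECT region, sum(amount)
FROM sales_orc
WHERE region IN ('east', 'west') AND amount > 0
GROUP BY region;
```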
Refer to Hive Partition Pruning for more information about Hive support for this feature.