The VMware Tanzu Greenplum Platform Extension Framework for Red Hat Enterprise Linux, CentOS, and Oracle Enterprise Linux is updated and distributed independently of Greenplum Database starting with version 5.13.0. Version 5.16.0 is the first independent release that includes an Ubuntu distribution. Version 6.3.0 is the first independent release that includes a Red Hat Enterprise Linux 8.x distribution.
You must download and install the PXF package to obtain the most recent version of this component.
The independent PXF 6.x distribution is compatible with these operating system platform versions and Greenplum Database versions:
| OS Version | Greenplum Version |
|---|---|
| RHEL 7.x, CentOS 7.x | 5.21.2+, 6.x |
| OEL 7.x, Ubuntu 18.04 LTS | 6.x |
PXF is compatible with these Java and Hadoop component versions:
| PXF Version | Java Versions | Hadoop Versions | Hive Server Versions | HBase Server Version |
|---|---|---|---|---|
| 6.4.x, 6.3.x, 6.2.x, 6.1.0, 6.0.x | 8, 11 | 2.x, 3.1+ | 1.x, 2.x, 3.1+ | 1.3.2 |
| 5.16.x, 5.15.x, 5.14, 5.13 | 8, 11 | 2.x, 3.1+ | 1.x, 2.x, 3.1+ | 1.3.2 |
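The two compatibility tables above can be captured as lookup data for a pre-installation check. The following is a minimal sketch, not part of PXF: the dictionary layout and the helper names `greenplum_versions_for` and `java_supported` are assumptions made for illustration, and the values are transcribed from the tables only.

```python
# Compatibility data transcribed from the tables above.
# The structure and helper names are illustrative, not a PXF API.
PXF6_OS_TO_GREENPLUM = {
    "RHEL 7.x": ["5.21.2+", "6.x"],
    "CentOS 7.x": ["5.21.2+", "6.x"],
    "OEL 7.x": ["6.x"],
    "Ubuntu 18.04 LTS": ["6.x"],
}

# Java major versions listed for both the 5.13+ and 6.x PXF release lines.
SUPPORTED_JAVA = {8, 11}


def greenplum_versions_for(os_name):
    """Return the Greenplum versions listed for an OS, or [] if it is not listed."""
    return PXF6_OS_TO_GREENPLUM.get(os_name, [])


def java_supported(major_version):
    """Return True if the Java major version appears in the table."""
    return major_version in SUPPORTED_JAVA
```

A platform that is absent from the table returns an empty list, which a caller can treat as "not listed" rather than as a statement about actual compatibility.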
Release Date: September 20, 2022
PXF 6.4.2 includes these changes:
PXF 6.4.2 resolves these issues:
|32439||Resolves an issue where PXF returned the error
|32353||Resolves an issue where PXF returned incomplete data when it read a JSON file containing multi-line records, and the external table definition specified both a
Release Date: August 19, 2022
PXF 6.4.0 includes these changes:
pxf.orc.write.timezone.utc to govern how PXF writes ORC timestamp values to the external data store. By default, PXF writes timestamp values using the UTC time zone.
PreparedStatement when reading with the JDBC Connector, using the
aws-java-sdk-s3 dependency to version 1.12.261 to resolve CVE-2022-31159.
snappy-java dependency to version 220.127.116.11.
postgresql dependency to version 42.4.1 to resolve CVE-2022-31197.
Release Date: July 21, 2022
PXF 6.3.2 includes these changes:
PXF 6.3.2 resolves these issues:
|CVE‑2022‑22965||Updates Spring to version 2.5.12. (Resolved by PR-789.)|
|CVE‑2021‑37404||Updates Hadoop to version 2.10.2. (Resolved by PR-819.)|
Release Date: April 27, 2022
PXF 6.3.1 resolves these issues:
|32177||Resolves an issue where PXF returned a
|32149||Resolves an issue where the PXF post-installation script failed when the PXF
Release Date: March 18, 2022
PXF 6.3.0 includes these new and changed features:
postgresql JDBC JAR file to mitigate CVE-2022-21724.
uuid Avro logical types.
gpupgrade to upgrade from Greenplum 5 to Greenplum 6. The PXF package now includes two new scripts,
pxf-post-gpupgrade, that you use during this upgrade process.
pxf.service.kerberos.constrained-delegation to enable this feature.
PXF 6.3.0 resolves these issues:
|31992||Resolves an issue where PXF returned duplicate rows when the
|31112||Resolves an issue where PXF required that its service principal be configured as a Hadoop proxy user to access a Kerberos-secured Hadoop cluster. (Resolved by PR-707.)|
|N/A||Resolves an issue where PXF did not close Hive Metastore connections in a timely manner, which eventually resulted in the exhaustion of the Metastore connection pool. (Resolved by PR-756.)|
Release Date: February 1, 2022
PXF 6.2.3 includes these changes:
log4j2 library to mitigate CVE-2021-44832.
go that it uses to build the pxf CLI tool to version 1.17.6 to mitigate CVE-2021-44716.
stdout/stderr and ignored to the file
PXF 6.2.3 resolves these issues:
|CVE‑2021‑44832||Updates the bundled
Release Date: December 22, 2021
PXF 6.2.2 includes these changes:
log4j2 library to mitigate CVE-2021-45105.
PXF 6.2.2 resolves these issues:
|CVE‑2021‑45105||Updates the bundled
|31927||Resolves an issue where the PXF C extension reported a
Release Date: December 17, 2021
PXF 6.2.1 includes these changes:
log4j2 library to mitigate CVE-2021-44228 and CVE-2021-45046.
UnsupportedOperationException when it accesses a Hive transactional table.
SKIP_HEADER_COUNT option for external tables that specified a
jdbc.statement.fetchSize default value of
Integer.MIN_VALUE). This setting enables the MySQL JDBC driver to stream the results from a MySQL server, lessening the memory requirements when reading large data sets.
hive.metastore.failure.retries property setting to identify the maximum number of times to retry a failed connection to the Hive MetaStore. The default value is one retry. Addressing Hive MetaStore Connection Errors describes when and how to configure this property.
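The retry behavior described for the Hive MetaStore connection follows a common bounded-retry pattern. The sketch below is a generic illustration of that pattern only; it is not PXF's implementation, and the function name `connect_with_retries`, its parameters, and the use of `ConnectionError` are assumptions. The default of one retry mirrors the default value stated above.

```python
import time


def connect_with_retries(connect, max_retries=1, delay_seconds=0.0):
    """Call connect(); after a failure, retry up to max_retries more times.

    `connect` is any zero-argument callable that raises ConnectionError on
    failure (a hypothetical stand-in for a MetaStore client connection).
    """
    retries_used = 0
    while True:
        try:
            return connect()
        except ConnectionError:
            if retries_used >= max_retries:
                # Retry budget exhausted: surface the last failure to the caller.
                raise
            retries_used += 1
            time.sleep(delay_seconds)
```

With `max_retries=1`, a connection that fails once and then succeeds is made twice in total; a connection that keeps failing raises after the second attempt.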
PXF 6.2.1 resolves these issues:
|CVE‑2021‑45046||Updates the bundled
|CVE‑2021‑44228||Updates the bundled
|31955||Resolves an issue where PXF failed to access a Hive table due to a MetaStore connection issue. PXF now includes retry logic for the MetaStore connection based on the
|31948||Resolves an issue where PXF ran out of memory when it read a large data set from a MySQL database. PXF now uses a
|31906||Resolves an issue where PXF returned 0 rows when a query was performed on a Hive transactional table instead of reporting that transactional tables are unsupported. PXF now more clearly identifies the problem by returning an
|31791||Resolves an issue where PXF ignored the
Release Date: September 13, 2021
PXF 6.2.0 includes these new and changed features:
TEXT). Refer to Working with JSON Data for additional information.
PXF improves its message logging by:
pxf.sasl.connection.retries, to specify the maximum number of times that it retries a SASL connection request to an external data source after a refused connection returns a GSS initiate failed error.
pxf.fragmenter-cache.expiration, to specify the amount of time after which an entry expires and is removed from the fragment cache.
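The expiration setting for the fragment cache corresponds to a time-to-live policy: an entry is served until its age reaches the configured limit, then it is dropped. The following is a minimal sketch of that policy under stated assumptions. The class `ExpiringCache` and its methods are hypothetical and do not reflect how PXF implements its cache; the injectable `clock` exists only so that the behavior can be checked without waiting.

```python
import time


class ExpiringCache:
    """A tiny time-to-live cache; entries expire after expiration_seconds."""

    def __init__(self, expiration_seconds, clock=time.monotonic):
        self._ttl = expiration_seconds
        self._clock = clock
        self._entries = {}  # key -> (value, time stored)

    def put(self, key, value):
        self._entries[key] = (value, self._clock())

    def get(self, key):
        """Return the cached value, or None if it is missing or has expired."""
        item = self._entries.get(key)
        if item is None:
            return None
        value, stored_at = item
        if self._clock() - stored_at >= self._ttl:
            # The entry has outlived its expiration time: remove it.
            del self._entries[key]
            return None
        return value
```

A shorter expiration bounds how long stale metadata can be served, at the cost of more frequent re-computation.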
PXF 6.2.0 resolves these issues:
|N/A||Resolves an issue when using the
|31675||Resolves a fragment cache issue that appeared when an external table was re-created within the same transaction in a stored procedure, and the new external table referenced a different
|31657||Queries on an external table intermittently failed in some Kerberos-secured environments because the Hadoop NameNode erroneously detected a replay attack during Kerberos authentication. This issue is resolved by PR-688.|
|31571||PXF did not support ORC lists. PXF 6.2.0 includes support for reading lists of certain ORC scalar types into a Greenplum Database array of native type. (Resolved by PR-675.)|
|31326||PXF did not support reading a JSON array into a Greenplum Database array-type column. PXF 6.2.0 includes support for reading a JSON array into a text array (
|683||Resolves an issue where PXF incorrectly cast an
Release Date: June 24, 2021
PXF 6.1.0 includes these new and changed features:
text. The data returned by PXF is a valid JSON string that you can manipulate with the existing Greenplum Database JSON functions and operators.
pxf.connection.upload-timeout, and is located in the pxf-application.properties file.
pxf.connection.timeout configuration property to set the connection timeout only for read operations. If you previously set this property to specify the write timeout, you should now use
gp-common-go-libs supporting library along with its dependencies.
PXF 6.1.0 resolves these issues:
|31389||Resolves an issue where certain
|31317||PXF did not support writing Avro arrays. PXF 6.1.0 includes native support for reading and writing Avro arrays. (Resolved by PR-636.)|
Release Date: May 11, 2021
PXF 6.0.1 resolves these issues:
|N/A||Resolves an issue where PXF returned wrong results for batches of ORC data that were shorter than the default batch size. (Resolved by PR-630.)|
|N/A||Resolves an issue where PXF threw a
|178013439||Resolves an issue where using the profile
|31409||Resolves an issue where PXF intermittently failed with the error
Release Date: March 29, 2021
PXF 6.0.0 includes these new and changed features:
Architecture and Bundled Libraries
PXF 6.0.0 is built on the Spring Boot framework:
postgresql-42.2.14.jar PostgreSQL driver JAR file.
Files, Configuration, and Commands
$PXF_BASE environment variable to identify its runtime configuration directory; it no longer uses $PXF_CONF for this purpose.
PXF_BASE=$PXF_HOME. See About the PXF Installation and Configuration Directories for the new installation file layout.
$PXF_BASE runtime configuration directory to a different directory after you install PXF by running the new pxf [cluster] prepare command as described in Relocating $PXF_BASE.
$PXF_HOME/templates; they were previously located in the
pxf [cluster] register command now copies only the PXF pxf.control extension file to the Greenplum Database installation. Run this command after your first installation of PXF, and/or after you upgrade your Greenplum Database installation.
pxf [cluster] init is now equivalent to pxf [cluster] register, and pxf [cluster] reset is a no-op.
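The relationship between $PXF_BASE and $PXF_HOME noted in this list can be sketched as a small lookup for tooling that needs to find the runtime configuration directory. This is an illustration only, assuming that an unset PXF_BASE falls back to PXF_HOME as the list indicates; the helper name `resolve_pxf_base` and the example paths are hypothetical.

```python
import os


def resolve_pxf_base(env=None):
    """Return the PXF runtime configuration directory for an environment.

    Prefers PXF_BASE; falls back to PXF_HOME when PXF_BASE is unset or empty.
    Returns None if neither variable is present.
    """
    env = os.environ if env is None else env
    return env.get("PXF_BASE") or env.get("PXF_HOME")


# Hypothetical paths, used only to show the two cases.
default_layout = resolve_pxf_base({"PXF_HOME": "/usr/local/pxf-gp6"})
relocated = resolve_pxf_base(
    {"PXF_HOME": "/usr/local/pxf-gp6", "PXF_BASE": "/data/pxf-base"}
)
```

Passing a dictionary instead of reading `os.environ` directly keeps the helper easy to exercise in scripts and tests.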
PXF 6 includes new and changed configuration; see About the PXF Configuration Files for more information:
pxf-log4j2.xml, and is in
PXF 6 adds a new configuration file for the PXF server application, pxf-application.properties; this file includes:
pxf.log.level property to set the PXF logging level.
Configuration properties moved from the PXF 5 pxf-env.sh file and renamed:
|pxf-env.sh Property Name||pxf-application.properties Property Name|
PXF 6 adds new configuration environment variables to pxf-env.sh to simplify the registration of external library dependencies:
| New Property Name | Description |
|---|---|
| PXF_LOADER_PATH | Additional directories and JARs for PXF to class-load. |
| LD_LIBRARY_PATH | Additional directories and native libraries for PXF to load. |
See Registering PXF Library Dependencies for more information.
PXF_FRAGMENTER_CACHE configuration property; fragment metadata caching is no longer configurable and is now always enabled.
PXF 6 introduces new profile names and deprecates some older profile names. The old profile names still work, but switching to the new profile names is strongly recommended:
|New Profile Name||Old/Deprecated Profile Name|
1 To use the HiveVectorizedORC profile in PXF 6, specify the hive:orc profile name with the new VECTORIZE=true custom option.
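The footnote above amounts to a change in the query portion of an external table's LOCATION URI. The sketch below assembles such a URI as a string, assuming the usual `pxf://<path>?PROFILE=<name>&<option>=<value>` form. The helper `pxf_location` is hypothetical, as is the Hive table name `default.sales_orc`.

```python
def pxf_location(path, profile, **options):
    """Build a pxf:// LOCATION string from a path, a profile, and custom options."""
    params = [("PROFILE", profile)] + list(options.items())
    query = "&".join(f"{name}={value}" for name, value in params)
    return f"pxf://{path}?{query}"


# The hive:orc profile with vectorized reads enabled via the custom option.
location = pxf_location("default.sales_orc", "hive:orc", VECTORIZE="true")
# location == "pxf://default.sales_orc?PROFILE=hive:orc&VECTORIZE=true"
```

The resulting string is what would appear inside the LOCATION clause of a CREATE EXTERNAL TABLE statement.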
CSV profile. See the Hadoop Text and Object Store Text documentation for usage information.
IN operator when you specify one of the *:parquet profiles to read a Parquet file.
*:SequenceFile profile that includes a
psql client, in some cases including a HINT that provides possible error resolution actions.
PXF version 6.0.0 removes:
THREAD-SAFE external table custom option (deprecated since 5.10.0).
PXF_KEYTAB configuration properties in pxf-env.sh (deprecated since 5.10.0).
jdbc.user.impersonation configuration property in jdbc-site.xml (deprecated since 5.10.0).
SequenceWritable (deprecated since 5.0.1).
PXF 6.0.0 resolves these issues:
|30987||Resolves an issue where PXF returned an
Deprecated features may be removed in a future major release of PXF. PXF version 6.x deprecates:
PXF_FRAGMENTER_CACHE configuration property (deprecated since PXF version 6.0.0).
pxf [cluster] init commands (deprecated since PXF version 6.0.0).
pxf [cluster] reset commands (deprecated since PXF version 6.0.0).
HiveVectorizedORC (deprecated since PXF version 6.0.0). Refer to Connectors, Data Formats, and Profiles in the PXF Hadoop documentation for the new profile names.
HBase profile name (now hbase) (deprecated since PXF version 6.0.0).
Jdbc profile name (now jdbc) (deprecated since PXF version 6.0.0).
COMPRESSION_CODEC using the Java class name; use the codec short name instead.
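When auditing existing external table definitions for deprecated profile names, a small mapping can help. The sketch below covers only the renames that these notes state explicitly (HBase, Jdbc, and HiveVectorizedORC); the full list is in the PXF Hadoop documentation referenced above. The names `DEPRECATED_PROFILES` and `modernize_profile` are assumptions made for illustration.

```python
# Deprecated profile name -> (replacement profile, extra custom options).
# Only the renames stated in these release notes are included.
DEPRECATED_PROFILES = {
    "HBase": ("hbase", {}),
    "Jdbc": ("jdbc", {}),
    "HiveVectorizedORC": ("hive:orc", {"VECTORIZE": "true"}),
}


def modernize_profile(name):
    """Return (profile, extra_options) for a profile name.

    Names that are not in the mapping are returned unchanged with no options.
    """
    profile, options = DEPRECATED_PROFILES.get(name, (name, {}))
    # Copy the options so that callers cannot mutate the shared mapping.
    return profile, dict(options)
```

For example, `modernize_profile("HiveVectorizedORC")` yields the `hive:orc` profile together with the `VECTORIZE=true` option that preserves the vectorized behavior.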
PXF 6.x has these known issues and limitations:
|178013439||(Resolved in 6.0.1) Using the deprecated
Workaround: Use the
|31409||(Resolved in 6.0.1) PXF can intermittently fail with the following error when it accesses Hive tables
Workaround: Use vectorized query execution by adding the
|168957894||The PXF Hive Connector does not support using the
Workaround: Use the PXF JDBC Connector to access Hive 3 managed tables.