The VMware Greenplum Platform Extension Framework for Red Hat Enterprise Linux, CentOS, and Oracle Enterprise Linux is updated and distributed independently of Greenplum Database starting with version 5.13.0. Version 5.16.0 is the first independent release that includes an Ubuntu distribution. Version 6.3.0 is the first independent release that includes a Red Hat Enterprise Linux 8.x distribution.
You must download and install the PXF package to obtain the most recent version of this component.
The independent PXF 6.x distribution is compatible with these operating system platform versions and Greenplum Database versions:
OS Version | Greenplum Version |
---|---|
RHEL 7.x, CentOS 7.x | 5.21.2+, 6.x |
OEL 7.x, Ubuntu 18.04 LTS | 6.x |
RHEL 8.x | 6.20+ |
Additionally, PXF 6.5.1+ is compatible with Greenplum 7 Beta 2+ on RHEL 8.x.
PXF is compatible with these Java and Hadoop component versions:
PXF Version | Java Versions | Hadoop Versions | Hive Server Versions | HBase Server Version |
---|---|---|---|---|
6.6.0, 6.5.x, 6.4.x, 6.3.x, 6.2.x, 6.1.0, 6.0.x | 8, 11 | 2.x, 3.1+ | 1.x, 2.x, 3.1+ | 1.3.2 |
5.16.x, 5.15.x, 5.14, 5.13 | 8, 11 | 2.x, 3.1+ | 1.x, 2.x, 3.1+ | 1.3.2 |
Release Date: April 10, 2023
PXF 6.6.0 includes these new features and changes:
- Introduces the `*:fixedwidth` profiles to support fixed-width text data. Refer to Reading and Writing Fixed-Width Text Data for more information and for examples.
- PXF v6.6.0 introduces these changes and features related to precision overflow detection and action when writing to Parquet files:
  - PXF no longer writes a `NULL` value when numeric data overflows the maximum supported precision. See Resolved Issues.
  - Introduces the `pxf.parquet.write.decimal.overflow` property in the `pxf-site.xml` server configuration file to govern PXF's action when numeric data that it writes to a Parquet file exceeds the maximum supported precision of 38 and overflows. During upgrade, be sure to perform step 7 if you want to change the default value (`round`) of this property for an existing PXF server configuration.
- Deprecates the `DATA-SCHEMA` external table option (used with `SequenceFile` profiles) and replaces it with the option named `DATA_SCHEMA`.
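The overflow property above follows the standard Hadoop-style XML property layout of `pxf-site.xml`; a minimal sketch that sets the default `round` action explicitly:

```xml
<property>
    <name>pxf.parquet.write.decimal.overflow</name>
    <value>round</value>
</property>
```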
PXF 6.6.0 resolves these issues:
Issue # | Summary |
---|---|
32723 | Partially resolves an issue where PXF wrote a NULL value to a Parquet file when numeric data had a precision greater than 38. (Resolved by PR-940.) |
32715 | Partially resolves an issue where PXF returned an ArrayIndexOutOfBoundsException when it wrote a numeric value with precision greater than 38 to a Parquet file. (Resolved by PR-940.) |
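The renamed `DATA_SCHEMA` option appears in the `LOCATION` clause of an external table; a hypothetical sketch for a writable table that uses a `SequenceFile` profile (the path and the custom Writable class name are invented):

```sql
-- hypothetical: DATA_SCHEMA names the Java Writable class that defines the data schema
CREATE WRITABLE EXTERNAL TABLE seq_out (id INT, name TEXT)
  LOCATION ('pxf://data/seq?PROFILE=hdfs:SequenceFile&DATA_SCHEMA=com.example.CustomWritable')
FORMAT 'CUSTOM' (FORMATTER='pxfwritable_export');
```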
Release Date: March 20, 2023
PXF 6.5.1 includes these changes:
- Updates the `gp-common-go-libs` supporting library along with its dependencies to resolve several CVEs (see Resolved Issues).

PXF 6.5.1 resolves these issues:
Issue # | Summary |
---|---|
CVE‑2022‑41723 | Updates golang.org/x/net . |
CVE‑2021‑43565 | Updates golang.org/x/crypto/ssh . |
CVE‑2022‑27664 | Updates golang.org/x/net/http2 . |
CVE‑2022‑27191 | Updates golang.org/x/crypto/ssh . |
CVE‑2022‑32149 | Updates golang.org/x/text/language . |
CVE‑2022‑30632 | Updates golang.org/x/net/http/httpguts . |
CVE‑2020‑29652 | Updates golang.org/x/crypto/ssh . |
CVE‑2021‑33194 | Updates golang.org/x/net/html . |
CVE‑2021‑38561 | Updates golang.org/x/text/language . |
CVE‑2022‑29526 | Updates golang.org/x/sys/unix . |
Release Date: December 22, 2022
PXF 6.5.0 includes these new features and changes:
- Introduces a new `CREATE EXTERNAL TABLE` option for the `*:json` profiles named `SPLIT_BY_FILE` that you can use to specify how PXF splits the data it reads. The default value is `false`; PXF creates multiple splits for each file, and processes the splits in parallel. When set to `true`, PXF creates and processes a single split per file.
- Adds support for reading and writing Parquet `LIST` types. Refer to the PXF Parquet Data Type Mapping documentation for more information about the data types supported and the data type mappings.

PXF 6.5.0 resolves these issues:
Issue # | Summary |
---|---|
CVE‑2022‑41946 | Updates the postgresql JDBC JAR file to version 42.4.3. |
32387 | PXF did not support reading or writing Parquet LIST types. PXF 6.5.0 includes native support for reading and writing LISTS of certain Parquet types. (Resolved by PR-885 and PR-876.) |
32353 | When PXF read a JSON file containing multi-line records and the external table definition specified both a *:json profile and an IDENTIFIER , PXF could both return wrong results when the data included special characters, and return duplicate rows when the data was compressed with a splittable codec. These issues are resolved. (Resolved by PR-879.) |
N/A | Resolves an issue where, in certain error conditions, PXF failed to close a connection to an external data source. PXF now closes the connection. (Resolved by PR-897.) |
N/A | Resolves an out-of-buffer data access issue by adding additional buffer boundary checks to the PXF extension to guard against invalid reads. (Resolved by PR-885.) |
N/A | Resolves an issue where PXF may have returned incomplete or incorrect results when it did not project a boolean column that was included in a WHERE clause but was not also present in the SELECT list. (Resolved by PR-875.) |
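The `SPLIT_BY_FILE` option described above is passed as a custom option in the external table's `LOCATION` URI; a hypothetical sketch (the table name and data path are invented):

```sql
-- hypothetical: one split, processed as a unit, per JSON file
CREATE EXTERNAL TABLE events_json (record TEXT)
  LOCATION ('pxf://data/events?PROFILE=hdfs:json&SPLIT_BY_FILE=true')
FORMAT 'CUSTOM' (FORMATTER='pxfwritable_import');
```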
Release Date: September 20, 2022
PXF 6.4.2 includes these changes:
- Includes a fix for issue `32353`.

PXF 6.4.2 resolves these issues:
Issue # | Summary |
---|---|
32439 | Resolves an issue where PXF returned the error expected column <N> to have length <M>, actual length is 0 when it read an ORC or Parquet data file that contained a string that included ASCII NULL-bytes by removing a string length check. (Resolved by PR-870.) |
32353 | Resolves an issue where PXF returned incomplete data when it read a JSON file containing multi-line records, and the external table definition specified both a *:json profile and an IDENTIFIER . (Resolved by PR-858.) |
Release Date: August 19, 2022
PXF 6.4.0 includes these changes:
- Introduces a new configuration property, `pxf.orc.write.timezone.utc`, to govern how PXF writes ORC timestamp values to the external data store. By default, PXF writes timestamp values using the UTC time zone.
- Adds support for using a `PreparedStatement` when reading with the JDBC Connector, activated via the `jdbc.read.prepared-statement` property in `jdbc-site.xml`.
- Updates the `aws-java-sdk-s3` dependency to version 1.12.261 to resolve CVE-2022-31159.
- Updates the `snappy-java` dependency to version 1.1.8.4.
- Updates the `postgresql` dependency to version 42.4.1 to resolve CVE-2022-31197.

Release Date: July 21, 2022
PXF 6.3.2 includes these changes:
- Includes a change related to `UnsupportedOperationException`s.

PXF 6.3.2 resolves these issues:
Issue # | Summary |
---|---|
CVE‑2022‑22965 | Updates Spring to version 2.5.12. (Resolved by PR-789.) |
CVE‑2021‑37404 | Updates Hadoop to version 2.10.2. (Resolved by PR-819.) |
32264 | Resolves an ORC split generation failed error encountered when the HiveORC profile was used to read an ORC file by updating the bundled ORC library to version 1.6.13 to pull in the fix for ORC-1065. (Resolved by PR-815.) |
Release Date: April 27, 2022
PXF 6.3.1 resolves these issues:
Issue # | Summary |
---|---|
32177 | Resolves an issue where PXF returned a NullPointerException while reading from a Hive table when the hive:orc profile and the VECTORIZE=true option were specified, and some of the table data contained repeating values. (Resolved by PR-794.) |
32149 | Resolves an issue where the PXF post-installation script failed when the PXF rpm was installed with the --prefix option (install to a custom location). (Resolved by PR-788.) |
Release Date: March 18, 2022
PXF 6.3.0 includes these new and changed features:
- Updates the `postgresql` JDBC JAR file to mitigate CVE-2022-21724.
- Adds support for the `date`, `decimal`, `local-timestamp-millis`, `local-timestamp-micros`, `time-millis`, `time-micros`, `timestamp-millis`, `timestamp-micros`, and `uuid` Avro logical types.
- Supports `gpupgrade` to upgrade from Greenplum 5 to Greenplum 6. The PXF package now includes two new scripts, `pxf-pre-gpupgrade` and `pxf-post-gpupgrade`, that you use during this upgrade process.
- Introduces support for Kerberos constrained delegation; set the new `pxf.service.kerberos.constrained-delegation` property to activate this feature.

PXF 6.3.0 resolves these issues:
Issue # | Summary |
---|---|
CVE‑2022‑21724 | Updates the postgresql JDBC JAR file to version 42.3.3. (Resolved by PR-760.) |
31992 | Resolves an issue where PXF returned duplicate rows when the hdfs:json profile was used to read a JSON file with multi-line records and the data contained multi-byte characters. (Resolved by PR-738.) |
31112 | Resolves an issue where PXF required that its service principal be configured as a Hadoop proxy user to access a Kerberos-secured Hadoop cluster. (Resolved by PR-707.) |
N/A | Resolves an issue where PXF did not close Hive Metastore connections in a timely manner, which eventually resulted in the exhaustion of the Metastore connection pool. (Resolved by PR-756.) |
Release Date: February 1, 2022
PXF 6.2.3 includes these changes:
- Updates the bundled `log4j2` library to mitigate CVE-2021-44832.
- Updates the version of `go` that it uses to build the `pxf` CLI tool to version 1.17.6 to mitigate CVE-2021-44716.
- PXF now writes messages that it previously wrote to `stdout/stderr` and ignored to the file `$PXF_LOG_DIR/pxf_app.out`.

PXF 6.2.3 resolves these issues:
Issue # | Summary |
---|---|
CVE‑2021‑44832 | Updates the bundled log4j2 library to version 2.17.1. (Resolved by PR-735.) |
CVE‑2021‑44716 | Updates the go library to version 1.17.6. (Resolved by PR-740.) |
Release Date: December 22, 2021
PXF 6.2.2 includes these changes:
- Updates the bundled `log4j2` library to mitigate CVE-2021-45105.

PXF 6.2.2 resolves these issues:
Issue # | Summary |
---|---|
CVE‑2021‑45105 | Updates the bundled log4j2 library to version 2.17.0. (Resolved by PR-733.) |
31927 | Resolves an issue where the PXF C extension reported a partial file transfer error when a data-less response that the PXF server sent to Greenplum Database failed to include a zero-length chunk. PXF 6.2.2 downgrades the bundled version of Spring Boot to 2.4.3, which does not exhibit the error behavior. (Resolved by PR-732.) |
Release Date: December 17, 2021
PXF 6.2.1 includes these changes:
- Updates the bundled `log4j2` library to mitigate CVE-2021-44228 and CVE-2021-45046.
- PXF now returns an `UnsupportedOperationException` when it accesses a Hive transactional table.
- PXF now supports the `SKIP_HEADER_COUNT` option for external tables that specify a `*:text:multi` profile.
- When it accesses MySQL, PXF now uses a `jdbc.statement.fetchSize` default value of `-2147483648` (`Integer.MIN_VALUE`). This setting enables the MySQL JDBC driver to stream the results from a MySQL server, lessening the memory requirements when reading large data sets.
- PXF now uses the `hive-site.xml` `hive.metastore.failure.retries` property setting to identify the maximum number of times to retry a failed connection to the Hive MetaStore. The default value is one retry. Addressing Hive MetaStore Connection Errors describes when and how to configure this property.

PXF 6.2.1 resolves these issues:
Issue # | Summary |
---|---|
CVE‑2021‑45046 | Updates the bundled log4j2 library to version 2.16.0. (Resolved by PR-727.) |
CVE‑2021‑44228 | Updates the bundled log4j2 library to version 2.15.0. (Resolved by PR-723.) |
31955 | Resolves an issue where PXF failed to access a Hive table due to a MetaStore connection issue. PXF now includes retry logic for the MetaStore connection based on the hive.metastore.failure.retries property setting in the hive-site.xml file. (Resolved by PR‑726.) |
31948 | Resolves an issue where PXF ran out of memory when it read a large data set from a MySQL database. PXF now uses a jdbc.statement.fetchSize default value of -2147483648 (Integer.MIN_VALUE ) when it accesses MySQL, which streams the results from a MySQL server to PXF. (Resolved by PR‑721.) |
31906 | Resolves an issue where PXF returned 0 rows when a query was performed on a Hive transactional table instead of reporting that transactional tables are unsupported. PXF now more clearly identifies the problem by returning an UnsupportedOperationException and the error: PXF does not support Hive transactional tables . (Resolved by PR-719.) |
31791 | Resolves an issue where PXF ignored the SKIP_HEADER_COUNT custom option when it read from an external data source via an external table that specified a *:text:multi profile. PXF now recognizes and implements this option for *:text:multi profiles. (Resolved by PR-710.) |
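The fetch-size default described above can also be set explicitly per JDBC server in `jdbc-site.xml`; a minimal sketch using the `Integer.MIN_VALUE` setting named in the text:

```xml
<property>
    <name>jdbc.statement.fetchSize</name>
    <value>-2147483648</value>
</property>
```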
Release Date: September 13, 2021
PXF 6.2.0 includes these new and changed features:
- PXF now supports reading a JSON array into a Greenplum Database text array (`TEXT[]`). Refer to Working with JSON Data for additional information.
- PXF improves its message logging.
- Introduces a new configuration property, `pxf.sasl.connection.retries`, to specify the maximum number of times that PXF retries a SASL connection request to an external data source after a refused connection returns a `GSS initiate failed` error.
- Introduces a new configuration property, `pxf.fragmenter-cache.expiration`, to specify the amount of time after which an entry expires and is removed from the fragment cache.

PXF 6.2.0 resolves these issues:
Issue # | Summary |
---|---|
N/A | Resolves an issue where, when the jdbc profile was used to write data to a Hive table, the Hive JDBC driver always returned 0 for an update and PXF returned an error even when the INSERT ran correctly. (Resolved by PR-662.) |
31675 | Resolves a fragment cache issue that appeared when an external table was re-created within the same transaction in a stored procedure, and the new external table referenced a different LOCATION . (Resolved by PR-691.) |
31657 | Queries on an external table intermittently failed in some Kerberos-secured environments because the Hadoop NameNode erroneously detected a replay attack during Kerberos authentication. This issue is resolved by PR-688. |
31571 | PXF did not support ORC lists. PXF 6.2.0 includes support for reading lists of certain ORC scalar types into a Greenplum Database array of native type. (Resolved by PR-675.) |
31326 | PXF did not support reading a JSON array into a Greenplum Database array-type column. PXF 6.2.0 includes support for reading a JSON array into a text array (TEXT[] ). (Resolved by PR-646.) |
683 | Resolves an issue where PXF incorrectly cast an enum value from the external data source to a string . (Resolved by PR-696.) |
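The `TEXT[]` mapping from issue 31326 above is declared in the external table's column list; a hypothetical sketch (the file path and column names are invented):

```sql
-- hypothetical: a JSON array field read into a Greenplum text array column
CREATE EXTERNAL TABLE json_tags (id INT, tags TEXT[])
  LOCATION ('pxf://data/tags.json?PROFILE=hdfs:json')
FORMAT 'CUSTOM' (FORMATTER='pxfwritable_import');
```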
Release Date: June 24, 2021
PXF 6.1.0 includes these new and changed features:
- PXF now supports returning complex data as `text`. The data returned by PXF is a valid JSON string that you can manipulate with the existing Greenplum Database JSON functions and operators.
- Introduces a new configuration property to set the timeout for write operations; the property is named `pxf.connection.upload-timeout`, and is located in the `pxf-application.properties` file.
- PXF now uses the `pxf.connection.timeout` configuration property to set the connection timeout only for read operations. If you previously set this property to specify the write timeout, you should now use `pxf.connection.upload-timeout` instead.
- Updates the `gp-common-go-libs` supporting library along with its dependencies.

PXF 6.1.0 resolves these issues:
Issue # | Summary |
---|---|
31389 | Resolves an issue where certain pxf cluster commands returned the error connect: no such file or directory when the current working directory contained a directory with the same name as the hostname. This issue was resolved by upgrading a dependent library. (Resolved by PR-633.) |
31317 | PXF did not support writing Avro arrays. PXF 6.1.0 includes native support for reading and writing Avro arrays. (Resolved by PR-636.) |
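Both timeout properties described above live in `pxf-application.properties`; a sketch of setting them explicitly (the `5m` values are illustrative, not documented defaults):

```properties
# pxf-application.properties -- values shown are illustrative
pxf.connection.timeout=5m
pxf.connection.upload-timeout=5m
```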
Release Date: May 11, 2021
PXF 6.0.1 resolves these issues:
Issue # | Summary |
---|---|
N/A | Resolves an issue where PXF returned wrong results for batches of ORC data that were shorter than the default batch size. (Resolved by PR-630.) |
N/A | Resolves an issue where PXF threw a NullPointerException when it encountered a repeating ORC column value of type string . (Resolved by PR-627.) |
178013439 | Resolves an issue where using the profile HiveVectorizedORC did not result in vectorized execution. (Resolved by PR-624.) |
31409 | Resolves an issue where PXF intermittently failed with the error ERROR: PXF server error(500) : Failed to initialize HiveResolver when it accessed Hive tables STORED AS ORC . (Resolved by PR-626.) |
Release Date: March 29, 2021
PXF 6.0.0 includes these new and changed features:
Architecture and Bundled Libraries
PXF 6.0.0 is built on the Spring Boot framework.

- PXF now bundles the `postgresql-42.2.14.jar` PostgreSQL driver JAR file.

Files, Configuration, and Commands
- PXF now uses the `$PXF_BASE` environment variable to identify its runtime configuration directory; it no longer uses `$PXF_CONF` for this purpose.
- By default, PXF installs to `$PXF_HOME`, and `PXF_BASE=$PXF_HOME`. See About the PXF Installation and Configuration Directories for the new installation file layout.
- You can relocate the `$PXF_BASE` runtime configuration directory to a different directory after you install PXF by running the new `pxf [cluster] prepare` command as described in Relocating $PXF_BASE.
- PXF server configuration templates are now located in `$PXF_HOME/templates`; they were previously located in the `$PXF_CONF/templates` directory.
- The `pxf [cluster] register` command now copies only the PXF `pxf.control` extension file to the Greenplum Database installation. Run this command after your first installation of PXF, and/or after you upgrade your Greenplum Database installation.
- PXF deprecates the `init` and `reset` commands. `pxf [cluster] init` is now equivalent to `pxf [cluster] register`, and `pxf [cluster] reset` is a no-op.

PXF 6 includes new and changed configuration; see About the PXF Configuration Files for more information:
- The PXF logging configuration file is now named `pxf-log4j2.xml`, and is in `xml` format.
- PXF 6 adds a new configuration file for the PXF server application, `pxf-application.properties`; this file includes:
  - A new `pxf.log.level` property to set the PXF logging level.
  - Configuration properties moved from the PXF 5 `pxf-env.sh` file and renamed:
pxf-env.sh Property Name | pxf-application.properties Property Name |
---|---|
PXF_MAX_THREADS | pxf.max.threads |
PXF 6 adds new configuration environment variables to pxf-env.sh
to simplify the registration of external library dependencies:
New Property Name | Description |
---|---|
PXF_LOADER_PATH | Additional directories and JARs for PXF to class-load. |
LD_LIBRARY_PATH | Additional directories and native libraries for PXF to load. |
See Registering PXF Library Dependencies for more information.
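Both variables are plain environment exports; a minimal sketch of what entries in `pxf-env.sh` might look like (the directory paths are hypothetical examples):

```shell
# Excerpt for $PXF_BASE/conf/pxf-env.sh; the paths below are hypothetical examples
export PXF_LOADER_PATH=/opt/custom/jars
export LD_LIBRARY_PATH=/opt/custom/libs:${LD_LIBRARY_PATH:-}
```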
- PXF 6 deprecates the `PXF_FRAGMENTER_CACHE` configuration property; fragment metadata caching is no longer configurable and is now always activated.

Profiles
PXF 6 introduces new profile names and deprecates some older profile names. The old profile names still work, but you should switch to the new profile names:
New Profile Name | Old/Deprecated Profile Name |
---|---|
hive | Hive |
hive:rc | HiveRC |
hive:orc | HiveORC |
hive:orc | HiveVectorizedORC1 |
hive:text | HiveText |
jdbc | Jdbc |
hbase | HBase |
1 To use the `HiveVectorizedORC` profile in PXF 6, specify the `hive:orc` profile name with the new `VECTORIZE=true` custom option.
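As a concrete illustration of the footnote above, a hypothetical external table that keeps the old `HiveVectorizedORC` behavior under the new naming (the database, table, and column names are invented):

```sql
-- hypothetical Hive table reference; names are invented
CREATE EXTERNAL TABLE sales_orc (id INT, amount NUMERIC)
  LOCATION ('pxf://default.sales?PROFILE=hive:orc&VECTORIZE=true')
FORMAT 'CUSTOM' (FORMATTER='pxfwritable_import');
```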
- PXF 6 introduces a new `CSV` profile. See the Hadoop Text and Object Store Text documentation for usage information.
- Adds support for `VARCHAR` data types.
- PXF now supports predicate pushdown of the `IN` operator when you specify one of the `*:parquet` profiles to read a Parquet file.
- Includes changes for writes that specify a `*:text`, `*:csv`, or `*:SequenceFile` profile that includes a `COMPRESSION_CODEC`.

Monitoring
Logging
- PXF now returns error messages to the `psql` client, in some cases including a `HINT` that provides possible error resolution actions.
- PXF now writes out-of-memory messages to `$PXF_LOGDIR/pxf-oom.log` rather than to `catalina.out`.

PXF version 6.0.0 removes:
- The `THREAD-SAFE` external table custom option (deprecated since 5.10.0).
- The `PXF_USER_IMPERSONATION`, `PXF_PRINCIPAL`, and `PXF_KEYTAB` configuration properties in `pxf-env.sh` (deprecated since 5.10.0).
- The `jdbc.user.impersonation` configuration property in `jdbc-site.xml` (deprecated since 5.10.0).
- The profile names `HdfsTextSimple`, `HdfsTextMulti`, `Avro`, `Json`, `Parquet`, and `SequenceWritable` (deprecated since 5.0.1).

PXF 6.0.0 resolves these issues:
Issue # | Summary |
---|---|
30987 | Resolves an issue where PXF returned an out of memory error while running a query on a Hive table backed by a large number of files when it could not enlarge a string buffer during the fragmentation process. PXF 6.0.0 moves fragment distribution logic and fragment allocation to the PXF Service running on each segment host. |
Deprecated features may be removed in a future major release of PXF. PXF version 6.x deprecates:
- The `DATA-SCHEMA` external table option (deprecated since PXF version 6.6.0).
- The `PXF_FRAGMENTER_CACHE` configuration property (deprecated since PXF version 6.0.0).
- The `pxf [cluster] init` commands (deprecated since PXF version 6.0.0).
- The `pxf [cluster] reset` commands (deprecated since PXF version 6.0.0).
- The profile names `Hive`, `HiveText`, `HiveRC`, `HiveORC`, and `HiveVectorizedORC` (deprecated since PXF version 6.0.0). Refer to Connectors, Data Formats, and Profiles in the PXF Hadoop documentation for the new profile names.
- The `HBase` profile name (now `hbase`) (deprecated since PXF version 6.0.0).
- The `Jdbc` profile name (now `jdbc`) (deprecated since PXF version 6.0.0).
- Specifying the `COMPRESSION_CODEC` using the Java class name; use the codec short name instead.

PXF 6.x has these known issues and limitations:
Issue # | Description |
---|---|
178013439 | (Resolved in 6.0.1) Using the deprecated HiveVectorizedORC profile does not result in vectorized execution. Workaround: Use the hive:orc profile with the option VECTORIZE=true . |
31409 | (Resolved in 6.0.1) PXF can intermittently fail with the error ERROR: PXF server error(500) : Failed to initialize HiveResolver when it accesses Hive tables STORED AS ORC . Workaround: Use vectorized query execution by adding the VECTORIZE=true custom option to the LOCATION URL. (Note that PXF does not support predicate pushdown, complex types, and the timestamp data type with ORC vectorized execution.) |
168957894 | The PXF Hive Connector does not support using the hive[:*] profiles to access Hive 3 managed (CRUD and insert-only transactional, and temporary) tables. Workaround: Use the PXF JDBC Connector to access Hive 3 managed tables. |