This topic provides an overview of PXF server configuration. To configure a server, refer to the topic specific to the connector that you want to configure.
You read from or write data to an external data store via a PXF connector. To access an external data store, you must provide the server location. You may also be required to provide client access credentials and other external data store-specific properties. PXF simplifies configuring access to external data stores by:
A PXF Server definition is simply a named configuration that provides access to a specific external data store. A PXF server name is the name of a directory residing in $PXF_BASE/servers/
. The information that you provide in a server configuration is connector-specific. For example, a PXF JDBC Connector server definition may include settings for the JDBC driver class name, URL, username, and password. You can also configure connection-specific and session-specific properties in a JDBC server definition.
PXF provides a server template file for each connector; this template identifies the typical set of properties that you must configure to use the connector.
You will configure a server definition for each external data store that Greenplum Database users need to access. For example, if you require access to two Hadoop clusters, you will create a PXF Hadoop server configuration for each cluster. If you require access to an Oracle and a MySQL database, you will create one or more PXF JDBC server configurations for each database.
A server configuration may include default settings for user access credentials and other properties for the external data store. You can allow Greenplum Database users to access the external data store using the default settings, or you can configure access and other properties on a per-user basis. This allows you to configure different Greenplum Database users with different external data store access credentials in a single PXF server definition.
The configuration information for a PXF server resides in one or more <connector>-site.xml
files in $PXF_BASE/servers/<server_name>/
.
PXF provides a template configuration file for each connector. These server template configuration files are located in the <PXF_INSTALL_DIR>/templates/
directory after you install PXF:
gpadmin@gpmaster$ ls <PXF_INSTALL_DIR>/templates
adl-site.xml hbase-site.xml jdbc-site.xml pxf-site.xml yarn-site.xml
core-site.xml hdfs-site.xml mapred-site.xml s3-site.xml
gs-site.xml hive-site.xml minio-site.xml wasbs-site.xml
For example, the contents of the s3-site.xml
template file follow:
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
<property>
<name>fs.s3a.access.key</name>
<value>YOUR_AWS_ACCESS_KEY_ID</value>
</property>
<property>
<name>fs.s3a.secret.key</name>
<value>YOUR_AWS_SECRET_ACCESS_KEY</value>
</property>
<property>
<name>fs.s3a.fast.upload</name>
<value>true</value>
</property>
</configuration>
Note: The template files for the Hadoop connectors are not intended to be modified and used for configuration, as they only provide an example of the information needed. Instead of modifying the Hadoop templates, you will copy several Hadoop *-site.xml
files from the Hadoop cluster to your PXF Hadoop server configuration.
PXF defines a special server named default
. The PXF installation creates a $PXF_BASE/servers/default/
directory. This directory, initially empty, identifies the default PXF server configuration. You can configure and assign the default PXF server to any external data source. For example, you can assign the PXF default server to a Hadoop cluster, or to a MySQL database that your users frequently access.
PXF automatically uses the default
server configuration if you omit the SERVER=<server_name>
setting in the CREATE EXTERNAL TABLE
command LOCATION
clause.
When you configure a PXF connector to an external data store, you add a named PXF server configuration for the connector. Among the tasks that you perform, you may:
default
PXF server, or choose a new name for the server configuration.$PXF_BASE/servers/<server_name>
.Note: You must re-sync the PXF configuration to the Greenplum Database cluster after you add or update PXF server configuration.
After you configure a PXF server, you publish the server name to Greenplum Database users who need access to the data store. A user need only provide the server name when they create an external table that accesses the external data store. PXF obtains the external data source location and access credentials from server and user configuration files residing in the server configuration directory identified by the server name.
To configure a PXF server, refer to the connector configuration topic:
PXF includes a template file named pxf-site.xml
for PXF-specific configuration parameters. You can use the pxf-site.xml
template file to configure:
You configure properties in the pxf-site.xml
file for a PXF server when one or more of the following conditions hold:
pxf-site.xml
includes the following properties:
Property | Description | Default Value |
---|---|---|
pxf.service.kerberos.principal | The Kerberos principal name. | gpadmin/[email protected] |
pxf.service.kerberos.keytab | The file system path to the Kerberos keytab file. | $PXF_BASE/keytabs/pxf.service.keytab |
pxf.service.kerberos.constrained-delegation | Activates/deactivates Kerberos constrained delegation. Note: This property is applicable only to Hadoop PXF server configurations, it is not applicable to JDBC PXF servers. | false |
pxf.service.user.impersonation | Activates/deactivates user impersonation when connecting to the remote system. | If the pxf.service.user.impersonation property is missing from pxf-site.xml , the default is true (activated) for PXF Hadoop servers and false (deactivated) for JDBC servers. |
pxf.service.user.name | The login user for the remote system. | This property is commented out by default. When the property is unset, the default value is the operating system user that starts the pxf process, typically gpadmin . When the property is set, the default value depends on the user impersonation setting and, if you are accessing Hadoop, whether or not you are accessing a Kerberos-secured cluster; see the Use Cases and Configuration Scenarios section in the Configuring the Hadoop User, User Impersonation, and Proxying topic. |
pxf.fs.basePath | Identifies the base path or share point on the remote file system. This property is applicable when the server configuration is used with a profile that accesses a file. | None; this property is commented out by default. |
pxf.ppd.hive1 | Specifies whether or not predicate pushdown is enabled for queries on external tables that specify the hive , hive:rc , or hive:orc profiles. |
True; predicate pushdown is enabled. |
pxf.sasl.connection.retries | Specifies the maximum number of times that PXF retries a SASL connection request after a refused connection returns a GSS initiate failed error. |
5 |
pxf.parquet.write.decimal.overflow | Specifies how PXF handles numeric data that exceeds the maximum precision of 38 and overflows when writing to a Parquet file. Valid values are: round, error, or ignore | round |
1 Should you need to, you can override this setting on a per-table basis by specifying the &PPD=<boolean>
option in the LOCATION
clause when you create the external table.
Refer to Configuring PXF Hadoop Connectors and Configuring the JDBC Connector for information about relevant pxf-site.xml
property settings for Hadoop and JDBC server configurations, respectively. See Configuring a PXF Network File System Server for information about relevant pxf-site.xml
property settings when you configure a PXF server to access a network file system.
You can use the pxf.fs.basePath
property to restrict a user’s access to files in a specific remote directory. When set, this property applies to any profile that accesses a file, including *:text
, *:parquet
, *:json
, etc.
When you configure the pxf.fs.basePath
property for a server, PXF considers the file path specified in the CREATE EXTERNAL TABLE
LOCATION
clause to be relative to this base path setting, and constructs the remote path accordingly.
pxf.fs.basePath
when you configure a PXF server for access to a network file system with a
file:*
profile. This property is optional for a PXF server that accesses a file in Hadoop or in an object store.
You can configure access to an external data store on a per-server, per-Greenplum-user basis.
You configure external data store user access credentials and properties for a specific Greenplum Database user by providing a <greenplum_user_name>-user.xml
user configuration file in the PXF server configuration directory, $PXF_BASE/servers/<server_name>/
. For example, you specify the properties for the Greenplum Database user named bill
in the file $PXF_BASE/servers/<server_name>/bill-user.xml
. You can configure zero, one, or more users in a PXF server configuration.
The properties that you specify in a user configuration file are connector-specific. You can specify any configuration property supported by the PXF connector server in a <greenplum_user_name>-user.xml
configuration file.
For example, suppose you have configured access to a PostgreSQL database in the PXF JDBC server configuration named pgsrv1
. To allow the Greenplum Database user named bill
to access this database as the PostgreSQL user named pguser1
, password changeme
, you create the user configuration file $PXF_BASE/servers/pgsrv1/bill-user.xml
with the following properties:
<configuration>
<property>
<name>jdbc.user</name>
<value>pguser1</value>
</property>
<property>
<name>jdbc.password</name>
<value>changeme</value>
</property>
</configuration>
If you want to configure a specific search path and a larger read fetch size for bill
, you would also add the following properties to the bill-user.xml
user configuration file:
<property>
<name>jdbc.session.property.search_path</name>
<value>bill_schema</value>
</property>
<property>
<name>jdbc.statement.fetchSize</name>
<value>2000</value>
</property>
For each PXF user that you want to configure, you will:
Create/edit the file $PXF_BASE/servers/<server_name>/<greenplum_user_name>-user.xml
, and add the outer configuration block:
<configuration>
</configuration>
<greenplum_user_name>-user.xml
file.A PXF server configuration may include default settings for user access credentials and other properties for accessing an external data store. Some PXF connectors, such as the S3 and JDBC connectors, allow you to directly specify certain server properties via custom options in the CREATE EXTERNAL TABLE
command LOCATION
clause. A <greenplum_user_name>-user.xml
file specifies property settings for an external data store that are specific to a Greenplum Database user.
For a given Greenplum Database user, PXF uses the following precedence rules (highest to lowest) to obtain configuration property settings for the user:
<server_name>/<greenplum_user_name>-user.xml
overrides any setting of the property elsewhere.CREATE EXTERNAL TABLE
command LOCATION
clause overrides any setting of the property in a PXF server configuration.<server_name>
PXF server definition identify the default property values.These precedence rules allow you create a single external table that can be accessed by multiple Greenplum Database users, each with their own unique external data store user credentials.
To access an external data store, the Greenplum Database user specifies the server name in the CREATE EXTERNAL TABLE
command LOCATION
clause SERVER=<server_name>
option. The <server_name>
that the user provides identifies the server configuration directory from which PXF obtains the configuration and credentials to access the external data store.
For example, the following command accesses an S3 object store using the server configuration defined in the $PXF_BASE/servers/s3srvcfg/s3-site.xml
file:
CREATE EXTERNAL TABLE pxf_ext_tbl(name text, orders int) LOCATION ('pxf://BUCKET/dir/file.txt?PROFILE=s3:text&SERVER=s3srvcfg') FORMAT 'TEXT' (delimiter=E',');
PXF automatically uses the default
server configuration when no SERVER=<server_name>
setting is provided.
For example, if the default
server configuration identifies a Hadoop cluster, the following example command references the HDFS file located at /path/to/file.txt
:
CREATE EXTERNAL TABLE pxf_ext_hdfs(location text, miles int) LOCATION ('pxf://path/to/file.txt?PROFILE=hdfs:text') FORMAT 'TEXT' (delimiter=E',');
<server_name>
user. If no user-specific credentials are configured for
<server_name>
, the Greenplum user accesses the external data store with the default credentials configured for
<server_name>
.