You can use PXF to access S3-compatible object stores. This topic describes how to configure the PXF connectors to these external data sources.
If you do not plan to use these PXF object store connectors, then you do not need to perform this procedure.
To access data in an object store, you must provide a server location and client credentials. When you configure a PXF object store connector, you add at least one named PXF server configuration for the connector as described in Configuring PXF Servers.
PXF provides a configuration file template for most object store connectors. These template files are located in the <PXF_INSTALL_DIR>/templates/ directory.
The template configuration file for MinIO is <PXF_INSTALL_DIR>/templates/minio-site.xml. When you configure a MinIO server, you must provide the following server configuration properties and replace the template values with your credentials:
Property | Description | Value |
---|---|---|
fs.s3a.endpoint | The MinIO S3 endpoint to which to connect. | Your endpoint. |
fs.s3a.access.key | The MinIO account access key. | Your MinIO user name. |
fs.s3a.secret.key | The MinIO secret key associated with the access key. | Your MinIO password. |
fs.s3a.fast.upload | Property that governs fast upload; the default value is false. | Set to true to enable fast upload. |
fs.s3a.path.style.access | Property that governs file specification via paths; the default value is false. | Set to true to enable path style access. |
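As an illustration of the properties in the table above, a filled-in minio-site.xml might look like the following sketch. The endpoint URL and the credential values are hypothetical placeholders, not values from the template; replace them with the settings for your own MinIO deployment.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
    <!-- Hypothetical endpoint: replace with your MinIO host and port -->
    <property>
        <name>fs.s3a.endpoint</name>
        <value>http://minio.example.com:9000</value>
    </property>
    <!-- Hypothetical credentials: your MinIO user name and password -->
    <property>
        <name>fs.s3a.access.key</name>
        <value>minio_user_name</value>
    </property>
    <property>
        <name>fs.s3a.secret.key</name>
        <value>minio_password</value>
    </property>
    <!-- Both properties default to false; set to true to enable -->
    <property>
        <name>fs.s3a.fast.upload</name>
        <value>true</value>
    </property>
    <property>
        <name>fs.s3a.path.style.access</name>
        <value>true</value>
    </property>
</configuration>
```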
The template configuration file for S3 is <PXF_INSTALL_DIR>/templates/s3-site.xml. When you configure an S3 server, you must provide the following server configuration properties and replace the template values with your credentials:
Property | Description | Value |
---|---|---|
fs.s3a.access.key | The AWS account access key ID. | Your access key. |
fs.s3a.secret.key | The secret key associated with the AWS access key ID. | Your secret key. |
If required, fine-tune PXF S3 connectivity by specifying properties identified in the S3A section of the Hadoop-AWS module documentation in your s3-site.xml server configuration file.
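For instance, a tuning fragment added inside the <configuration> element of s3-site.xml could resemble the sketch below. The two property names come from the Hadoop-AWS S3A documentation, but the values shown are illustrative assumptions only; check the names, defaults, and suitable values against the documentation for your Hadoop version before using them.

```xml
<!-- Illustrative values only; not recommendations -->
<property>
    <!-- Maximum number of simultaneous connections to S3 -->
    <name>fs.s3a.connection.maximum</name>
    <value>50</value>
</property>
<property>
    <!-- Number of times to retry a failed request -->
    <name>fs.s3a.attempts.maximum</name>
    <value>10</value>
</property>
```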
You can override the credentials for an S3 server configuration by directly specifying the S3 access ID and secret key via custom options in the CREATE EXTERNAL TABLE command LOCATION clause. Refer to Overriding the S3 Server Configuration with DDL for additional information.
PXF supports Amazon Web Services (AWS) S3 Server-Side Encryption (SSE) for S3 files that you access with readable and writable Greenplum Database external tables that specify the pxf protocol and an s3:* profile. AWS S3 server-side encryption protects your data at rest; it encrypts your object data as it is written to disk, and transparently decrypts the data for you when you access it.
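PXF supports the following AWS SSE encryption key management schemes: SSE with S3-managed keys (SSE-S3), SSE with Key Management Service managed keys (SSE-KMS), and SSE with customer-provided keys (SSE-C).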
Your S3 access key and secret key govern your access to all S3 bucket objects, whether the data is encrypted or not.
S3 transparently decrypts data during a read operation of an encrypted file that you access via a readable external table that is created by specifying the pxf protocol and an s3:* profile. No additional configuration is required.
To encrypt data that you write to S3 via this type of external table, you have two options: configure default server-side encryption on the S3 bucket itself, or configure SSE encryption options in your PXF S3 server s3-site.xml configuration file.
You can create S3 bucket policies that identify the objects that you want to encrypt, the encryption key management scheme, and the write actions permitted on those objects. Refer to Protecting Data Using Server-Side Encryption in the AWS S3 documentation for more information about the SSE encryption key management schemes. How Do I Enable Default Encryption for an S3 Bucket? describes how to set default encryption bucket policies.
You must include certain properties in s3-site.xml to configure server-side encryption in a PXF S3 server configuration. The properties and values that you add to the file depend on the SSE encryption key management scheme.
SSE-S3
To enable SSE-S3 on any file that you write to any S3 bucket, set the following encryption algorithm property and value in the s3-site.xml file:
<property>
    <name>fs.s3a.server-side-encryption-algorithm</name>
    <value>AES256</value>
</property>
To enable SSE-S3 for a specific S3 bucket, use the property name variant that includes the bucket name. For example:
<property>
    <name>fs.s3a.bucket.YOUR_BUCKET1_NAME.server-side-encryption-algorithm</name>
    <value>AES256</value>
</property>
Replace YOUR_BUCKET1_NAME with the name of the S3 bucket.
SSE-KMS
To enable SSE-KMS on any file that you write to any S3 bucket, set both the encryption algorithm and encryption key ID. To set these properties in the s3-site.xml file:
<property>
    <name>fs.s3a.server-side-encryption-algorithm</name>
    <value>SSE-KMS</value>
</property>
<property>
    <name>fs.s3a.server-side-encryption.key</name>
    <value>YOUR_AWS_SSE_KMS_KEY_ARN</value>
</property>
Replace YOUR_AWS_SSE_KMS_KEY_ARN with your key resource name. If you do not specify an encryption key, the default key defined in Amazon KMS is used. Example KMS key: arn:aws:kms:us-west-2:123456789012:key/1a23b456-7890-12cc-d345-6ef7890g12f3.
Note: Be sure to create the bucket and the key in the same AWS Region.
To enable SSE-KMS for a specific S3 bucket, use property name variants that include the bucket name. For example:
<property>
    <name>fs.s3a.bucket.YOUR_BUCKET2_NAME.server-side-encryption-algorithm</name>
    <value>SSE-KMS</value>
</property>
<property>
    <name>fs.s3a.bucket.YOUR_BUCKET2_NAME.server-side-encryption.key</name>
    <value>YOUR_AWS_SSE_KMS_KEY_ARN</value>
</property>
Replace YOUR_BUCKET2_NAME with the name of the S3 bucket.
SSE-C
To enable SSE-C on any file that you write to any S3 bucket, set both the encryption algorithm and the encryption key (base-64 encoded). All clients must share the same key.
To set these properties in the s3-site.xml file:
<property>
    <name>fs.s3a.server-side-encryption-algorithm</name>
    <value>SSE-C</value>
</property>
<property>
    <name>fs.s3a.server-side-encryption.key</name>
    <value>YOUR_BASE64-ENCODED_ENCRYPTION_KEY</value>
</property>
To enable SSE-C for a specific S3 bucket, use the property name variants that include the bucket name as described in the SSE-KMS example.
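Applying the bucket-name pattern from the SSE-KMS example to SSE-C, a per-bucket configuration might look like the following sketch. YOUR_BUCKET3_NAME is a placeholder introduced here for illustration; replace it with the name of the S3 bucket.

```xml
<!-- Sketch: SSE-C scoped to one bucket; YOUR_BUCKET3_NAME is a placeholder -->
<property>
    <name>fs.s3a.bucket.YOUR_BUCKET3_NAME.server-side-encryption-algorithm</name>
    <value>SSE-C</value>
</property>
<property>
    <name>fs.s3a.bucket.YOUR_BUCKET3_NAME.server-side-encryption.key</name>
    <value>YOUR_BASE64-ENCODED_ENCRYPTION_KEY</value>
</property>
```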
In this procedure, you name and add a PXF server configuration in the $PXF_BASE/servers directory on the Greenplum Database master host for the S3 Cloud Storage connector. You then use the pxf cluster sync command to sync the server configuration(s) to the Greenplum Database cluster.
Log in to your Greenplum Database master host:
$ ssh gpadmin@<gpmaster>
Choose a name for the server. You will provide the name to end users who need to reference files in the object store.
Create the $PXF_BASE/servers/<server_name> directory. For example, use the following command to create a server configuration for an S3 server named s3srvcfg:
gpadmin@gpmaster$ mkdir $PXF_BASE/servers/s3srvcfg
Copy the PXF template file for S3 to the server configuration directory. For example:
gpadmin@gpmaster$ cp <PXF_INSTALL_DIR>/templates/s3-site.xml $PXF_BASE/servers/s3srvcfg/
Open the template server configuration file in the editor of your choice, and provide appropriate property values for your environment. For example:
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
    <property>
        <name>fs.s3a.access.key</name>
        <value>access_key_for_user1</value>
    </property>
    <property>
        <name>fs.s3a.secret.key</name>
        <value>secret_key_for_user1</value>
    </property>
    <property>
        <name>fs.s3a.fast.upload</name>
        <value>true</value>
    </property>
</configuration>
Save your changes and exit the editor.
Use the pxf cluster sync command to copy the new server configuration to the Greenplum Database cluster:
gpadmin@gpmaster$ pxf cluster sync
There is no template server configuration file for Dell ECS. You can use the MinIO server configuration template, <PXF_INSTALL_DIR>/templates/minio-site.xml.
When you configure a Dell ECS server, you must provide the following server configuration properties and replace the template values with your credentials:
Property | Description | Value |
---|---|---|
fs.s3a.endpoint | The Dell ECS S3 endpoint to which to connect. | Your ECS endpoint. |
fs.s3a.access.key | The Dell ECS account access key. | Your ECS user name. |
fs.s3a.secret.key | The Dell ECS secret key associated with the access key. | Your ECS secret key. |
fs.s3a.fast.upload | Property that governs fast upload; the default value is false. | Set to true to enable fast upload. |
fs.s3a.path.style.access | Property that governs file specification via paths; the default value is false. | Set to true to enable path style access. |
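A Dell ECS server configuration based on the MinIO template might look like the following sketch. The endpoint URL and credential values are hypothetical placeholders; substitute the endpoint, user name, and secret key for your own ECS deployment.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
    <!-- Hypothetical endpoint: replace with your ECS S3 endpoint -->
    <property>
        <name>fs.s3a.endpoint</name>
        <value>https://ecs.example.com</value>
    </property>
    <!-- Hypothetical credentials: your ECS user name and secret key -->
    <property>
        <name>fs.s3a.access.key</name>
        <value>ecs_user_name</value>
    </property>
    <property>
        <name>fs.s3a.secret.key</name>
        <value>ecs_secret_key</value>
    </property>
    <!-- Both properties default to false; set to true to enable -->
    <property>
        <name>fs.s3a.fast.upload</name>
        <value>true</value>
    </property>
    <property>
        <name>fs.s3a.path.style.access</name>
        <value>true</value>
    </property>
</configuration>
```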