PXF is installed with connectors to Azure Blob Storage, Azure Data Lake, Google Cloud Storage, AWS, MinIO, and Dell ECS S3-compatible object stores.
Before working with object store data using PXF, ensure that:
The PXF object store connectors provide built-in profiles to support the following data formats:
The PXF connectors to Azure expose the following profiles to read, and in many cases write, these supported data formats:
Data Format | Azure Blob Storage | Azure Data Lake | Supported Operations |
---|---|---|---|
delimited single line plain text | wasbs:text | adl:text | Read, Write |
delimited single line comma-separated values of plain text | wasbs:csv | adl:csv | Read, Write |
delimited text with quoted linefeeds | wasbs:text:multi | adl:text:multi | Read |
Avro | wasbs:avro | adl:avro | Read, Write |
JSON | wasbs:json | adl:json | Read |
ORC | wasbs:orc | adl:orc | Read, Write |
Parquet | wasbs:parquet | adl:parquet | Read, Write |
AvroSequenceFile | wasbs:AvroSequenceFile | adl:AvroSequenceFile | Read, Write |
SequenceFile | wasbs:SequenceFile | adl:SequenceFile | Read, Write |
Similarly, the PXF connectors to Google Cloud Storage, and S3-compatible object stores expose these profiles:
Data Format | Google Cloud Storage | AWS S3, MinIO, or Dell ECS | Supported Operations |
---|---|---|---|
delimited single line plain text | gs:text | s3:text | Read, Write |
delimited single line comma-separated values of plain text | gs:csv | s3:csv | Read, Write |
delimited text with quoted linefeeds | gs:text:multi | s3:text:multi | Read |
Avro | gs:avro | s3:avro | Read, Write |
JSON | gs:json | s3:json | Read |
ORC | gs:orc | s3:orc | Read, Write |
Parquet | gs:parquet | s3:parquet | Read, Write |
AvroSequenceFile | gs:AvroSequenceFile | s3:AvroSequenceFile | Read, Write |
SequenceFile | gs:SequenceFile | s3:SequenceFile | Read, Write |
You provide the profile name when you specify the pxf
protocol on a CREATE EXTERNAL TABLE
command to create a Greenplum Database external table that references a file or directory in the specific object store.
SERVER
in the
LOCATION
URI.
The following command creates an external table that references a text file on S3. It specifies the profile named s3:text
and the server configuration named s3srvcfg
:
CREATE EXTERNAL TABLE pxf_s3_text(location text, month text, num_orders int, total_sales float8) LOCATION ('pxf://S3_BUCKET/pxf_examples/pxf_s3_simple.txt?PROFILE=s3:text&SERVER=s3srvcfg') FORMAT 'TEXT' (delimiter=E',');
The following command creates an external table that references a text file on Azure Blob Storage. It specifies the profile named wasbs:text
and the server configuration named wasbssrvcfg
. You would provide the Azure Blob Storage container identifier and your Azure Blob Storage account name.
CREATE EXTERNAL TABLE pxf_wasbs_text(location text, month text, num_orders int, total_sales float8) LOCATION ('pxf://AZURE_CONTAINER@YOUR_AZURE_BLOB_STORAGE_ACCOUNT_NAME.blob.core.windows.net/path/to/blob/file?PROFILE=wasbs:text&SERVER=wasbssrvcfg') FORMAT 'TEXT';
The following command creates an external table that references a text file on Azure Data Lake. It specifies the profile named adl:text
and the server configuration named adlsrvcfg
. You would provide your Azure Data Lake account name.
CREATE EXTERNAL TABLE pxf_adl_text(location text, month text, num_orders int, total_sales float8) LOCATION ('pxf://YOUR_ADL_ACCOUNT_NAME.azuredatalakestore.net/path/to/file?PROFILE=adl:text&SERVER=adlsrvcfg') FORMAT 'TEXT';
The following command creates an external table that references a JSON file on Google Cloud Storage. It specifies the profile named gs:json
and the server configuration named gcssrvcfg
:
CREATE EXTERNAL TABLE pxf_gsc_json(location text, month text, num_orders int, total_sales float8) LOCATION ('pxf://dir/subdir/file.json?PROFILE=gs:json&SERVER=gcssrvcfg') FORMAT 'CUSTOM' (FORMATTER='pxfwritable_import');