PXF is installed with connectors to Azure Blob Storage, Azure Data Lake, Google Cloud Storage, AWS, MinIO, and Dell ECS S3-compatible object stores.

Prerequisites

Before working with object store data using PXF, ensure that:

Connectors, Data Formats, and Profiles

The PXF object store connectors provide built-in profiles to support the following data formats:

  • Text
  • CSV
  • Avro
  • JSON
  • ORC
  • Parquet
  • AvroSequenceFile
  • SequenceFile

The PXF connectors to Azure expose the following profiles to read, and in many cases write, these supported data formats:

Data Format Azure Blob Storage Azure Data Lake Supported Operations
delimited single line plain text wasbs:text adl:text Read, Write
delimited single line comma-separated values of plain text wasbs:csv adl:csv Read, Write
multi-byte or multi-character delimited single line csv wasbs:csv adl:csv Read
delimited text with quoted linefeeds wasbs:text:multi adl:text:multi Read
fixed width single line text wasbs:fixedwidth adl:fixedwidth Read, Write
Avro wasbs:avro adl:avro Read, Write
JSON wasbs:json adl:json Read, Write
ORC wasbs:orc adl:orc Read, Write
Parquet wasbs:parquet adl:parquet Read, Write
AvroSequenceFile wasbs:AvroSequenceFile adl:AvroSequenceFile Read, Write
SequenceFile wasbs:SequenceFile adl:SequenceFile Read, Write

Similarly, the PXF connectors to Google Cloud Storage, and S3-compatible object stores expose these profiles:

Data Format Google Cloud Storage AWS S3, MinIO, or Dell ECS Supported Operations
delimited single line plain text gs:text s3:text Read, Write
delimited single line comma-separated values of plain text gs:csv s3:csv Read, Write
multi-byte or multi-character delimited single line comma-separated values csv gs:csv s3:csv Read
delimited text with quoted linefeeds gs:text:multi s3:text:multi Read
fixed width single line text gs:fixedwidth s3:fixedwidth Read, Write
Avro gs:avro s3:avro Read, Write
JSON gs:json s3:json Read
ORC gs:orc s3:orc Read, Write
Parquet gs:parquet s3:parquet Read, Write
AvroSequenceFile gs:AvroSequenceFile s3:AvroSequenceFile Read, Write
SequenceFile gs:SequenceFile s3:SequenceFile Read, Write

You provide the profile name when you specify the pxf protocol on a CREATE EXTERNAL TABLE command to create a Greenplum Database external table that references a file or directory in the specific object store.

Sample CREATE EXTERNAL TABLE Commands

Note: When you create an external table that references a file or directory in an object store, you must specify a SERVER in the LOCATION URI.

The following command creates an external table that references a text file on S3. It specifies the profile named s3:text and the server configuration named s3srvcfg:

CREATE EXTERNAL TABLE pxf_s3_text(location text, month text, num_orders int, total_sales float8)
  LOCATION ('pxf://S3_BUCKET/pxf_examples/pxf_s3_simple.txt?PROFILE=s3:text&SERVER=s3srvcfg')
FORMAT 'TEXT' (delimiter=E',');

The following command creates an external table that references a text file on Azure Blob Storage. It specifies the profile named wasbs:text and the server configuration named wasbssrvcfg. You would provide the Azure Blob Storage container identifier and your Azure Blob Storage account name.

CREATE EXTERNAL TABLE pxf_wasbs_text(location text, month text, num_orders int, total_sales float8)
  LOCATION ('pxf://AZURE_CONTAINER@YOUR_AZURE_BLOB_STORAGE_ACCOUNT_NAME.blob.core.windows.net/path/to/blob/file?PROFILE=wasbs:text&SERVER=wasbssrvcfg')
FORMAT 'TEXT';

The following command creates an external table that references a text file on Azure Data Lake. It specifies the profile named adl:text and the server configuration named adlsrvcfg. You would provide your Azure Data Lake account name.

CREATE EXTERNAL TABLE pxf_adl_text(location text, month text, num_orders int, total_sales float8)
  LOCATION ('pxf://YOUR_ADL_ACCOUNT_NAME.azuredatalakestore.net/path/to/file?PROFILE=adl:text&SERVER=adlsrvcfg')
FORMAT 'TEXT';

The following command creates an external table that references a JSON file on Google Cloud Storage. It specifies the profile named gs:json and the server configuration named gcssrvcfg:

CREATE EXTERNAL TABLE pxf_gsc_json(location text, month text, num_orders int, total_sales float8)
  LOCATION ('pxf://dir/subdir/file.json?PROFILE=gs:json&SERVER=gcssrvcfg')
FORMAT 'CUSTOM' (FORMATTER='pxfwritable_import');
check-circle-line exclamation-circle-line close-line
Scroll to top icon