External tables enable you to access external data as if it were a regular database table. They are often used to move data into and out of a Greenplum database, and can utilize Greenplum parallelism by using the resources of all Greenplum Database segments to load or unload data.
When you create an external table definition, you specify the structure and format of the data, the data access protocol, the location of the external data source, and other protocol-specific or format-specific options.
ImportantGreenplum 7 internally converts external tables to foreign tables, and internally operates on and represents the table using the foreign table data structures and catalog. Refer to Understanding the External Table to Foreign Table Mapping for detailed information about this conversion, and its runtime implications.
Parent topic: Working with External Data
An external table protocol identifies the data access method for an external data source. Greenplum Database supports the following built-in (automatically-enabled) and opt-in (you enable) external table protocols:
Protocol Name | Description | Type |
---|---|---|
file:// |
Use to access external data files on segment hosts that the Greenplum Database superuser (gpadmin ) can access. Refer to file:// Protocol. |
Built-in |
gpfdist:// |
Use to serve external data files to all Greenplum Database segments in parallel. See gpfdist:// Protocol. | Built-in |
gpfdists:// |
The secure version of gpfdist . See gpfdists:// Protocol. |
Built-in |
pxf:// |
Use to access external object store systems (Azure, Google Cloud Storage, Minio, S3-compatible), Hadoop systems (HDFS, Hive, HBase), network file systems, and SQL databases with the Greenplum Platform Extension Framework (PXF). See pxf:// Protocol. | Opt-in |
s3:// |
Use to access files in an Amazon S3 bucket. See s3:// Protocol. | Opt-in |
A third party can also create a custom protocol that connects Greenplum Database to new external data sources. Refer to Creating a Custom Protocol for more information.
The opt-in/custom and built-in protocols differ in these ways:
pg_extprotocol
catalog table to specify the handler functions that implement the protocol. The built-in protocols are not represented in this table.GRANT [SELECT | INSERT | ALL] ON PROTOCOL <name>
to grant users permissions on opt-in and custom protocols. To allow (or deny) users access to the built-in protocols, you use the CREATE ROLE
or ALTER ROLE
commands to add the CREATEEXTTABLE
(or NOCREATEEXTTABLE
) attribute to each user's role.Greenplum Database provides both readable (CREATE EXTERNAL TABLE) and writable (CREATE WRITABLE EXTERNAL TABLE
) external tables.
Readable external tables are typically used for data loading, and allow only SELECT operations. They support:
WHERE
clause, it may be passed to the external data source. Refer to the gp_external_enable_filter_pushdown server configuration parameter discussion for more information. Note that this feature is currently supported only with the pxf
protocol (see pxf:// Protocol).Writable external tables are typically used for data unloading, and allow only INSERT operations. They support:
External tables are typically file-based or web-based:
http://
protocol or by running OS commands or scripts. External web tables are not rescannable, the data can change while the query runs. Refer to Creating and Using External Web Tables for more information.The pxf://
protocol can also access other types of external data, such as SQL databases.
When you create an external table definition, you specify the structure and format of the data, the access protocol, the location of the external data source, and other protocol-specific or format-specific options.
Examples for Creating External Tables provides examples for different data types and different built-in protocols.
ImportantAfter you create an external table, you must operate on the table using
ALTER EXTERNAL TABLE
andDROP EXTERNAL TABLE
commands. VMware does not recommend mixing and matching external table and foreign table syntaxes for table maintenance operations.
Greenplum Database backup and restore operations back up and restore only external and external web table definitions, not the data source data.
By default, if external table data contains an error, the command fails and no data loads into the target database table. See Handling Errors in External Table Data for more information.