Greenplum Database leverages the parallel architecture of the Hadoop Distributed File System (HDFS) to read and write data files efficiently using the gphdfs protocol.
Note: The gphdfs external table protocol is deprecated and will be removed in the next major release of Greenplum Database. Consider using the Greenplum Platform Extension Framework (PXF) pxf external table protocol to access data stored in a Hadoop file system.
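As a rough illustration of what the gphdfs protocol looks like in practice, the following sketch defines one readable and one writable external table over HDFS. The table names, column list, host name (hdfshost-1), port (8081), and file paths are all hypothetical placeholders, not values taken from this topic; substitute the ones for your own Hadoop cluster. It also assumes the protocol has already been installed and the required privileges granted.

```sql
-- Readable external table: Greenplum segments read the HDFS file in parallel.
-- Host, port, path, and columns are hypothetical placeholders.
CREATE EXTERNAL TABLE ext_expenses (
    name     text,
    date     date,
    amount   float4,
    category text
)
LOCATION ('gphdfs://hdfshost-1:8081/data/expenses.txt')
FORMAT 'TEXT' (DELIMITER ',');

-- Writable external table: rows inserted here are written out to HDFS.
-- The target directory is likewise a hypothetical placeholder.
CREATE WRITABLE EXTERNAL TABLE ext_expenses_out (LIKE ext_expenses)
LOCATION ('gphdfs://hdfshost-1:8081/data/expenses_out')
FORMAT 'TEXT' (DELIMITER ',')
DISTRIBUTED RANDOMLY;
```

Querying ext_expenses with SELECT reads from HDFS, and an INSERT into ext_expenses_out writes to it. The topics listed below cover the supported formats and options in detail.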
There are three steps to using the gphdfs protocol with HDFS:
For information about using Greenplum Database external tables with Amazon EMR when Greenplum Database is installed on Amazon Web Services (AWS), also see Using Amazon EMR with Greenplum Database installed on AWS (Deprecated).
Specify gphdfs Protocol in an External Table Definition (Deprecated)
HDFS Readable and Writable External Table Examples (Deprecated)
Reading and Writing Custom-Formatted HDFS Data with gphdfs (Deprecated)
Using Amazon EMR with Greenplum Database installed on AWS (Deprecated)