Serves data files to or writes data files out from Greenplum Database segments.
**gpfdist.exe** [**-d** directory] [**-p** http\_port] [**-l** log\_file] [**-t** timeout]
[**-S**] [**-w** time] [**-v** | **-V**] [**-s**] [**-m** max\_length] [**--ssl** certificate\_path]
**gpfdist.exe -?** | **--help**
**gpfdist.exe --version**
gpfdist.exe is Greenplum's parallel file distribution program. It is used by readable external tables and gpload.py to serve external table files to all Greenplum Database segments in parallel. It is used by writable external tables to accept output streams from Greenplum Database segments in parallel and write them out to a file.
For an external table to use gpfdist.exe, the LOCATION clause of the external table definition must specify the external table data using the gpfdist:// protocol (see the Greenplum Database command CREATE EXTERNAL TABLE).
Note: If the --ssl option is specified to enable SSL security, create the external table with the gpfdists:// protocol.
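As a rough illustration of how the LOCATION clause relates to the two protocols, the following Python sketch assembles a CREATE EXTERNAL TABLE statement. The host name etlhost-1, the table and column names, the pipe delimiter, and the helper functions are all hypothetical; only the gpfdist:// and gpfdists:// schemes come from the text above.

```python
def location_url(host, port, path, ssl=False):
    """Return a LOCATION URL; use gpfdists:// when gpfdist.exe runs with --ssl."""
    scheme = "gpfdists" if ssl else "gpfdist"
    return f"{scheme}://{host}:{port}/{path.lstrip('/')}"


def external_table_ddl(table, columns, url):
    """Build a readable external table definition (hypothetical names and format)."""
    return (
        f"CREATE EXTERNAL TABLE {table} ({columns})\n"
        f"LOCATION ('{url}')\n"
        "FORMAT 'TEXT' (DELIMITER '|');"
    )


# Hypothetical ETL host serving *.txt files on port 8081 without SSL.
url = location_url("etlhost-1", 8081, "*.txt")
print(external_table_ddl("ext_expenses", "name text, amount float4", url))
```

Passing ssl=True to location_url switches the scheme to gpfdists://, which matches a gpfdist.exe instance started with --ssl.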
The benefit of using gpfdist.exe is that all segments read from or write to external tables in parallel, which improves load and unload performance and simplifies administration of external tables.
For readable external tables, gpfdist.exe parses and serves data files evenly to all the segment instances in the Greenplum Database system when users SELECT from the external table. For writable external tables, gpfdist.exe accepts parallel output streams from the segments when users INSERT into the external table, and writes to an output file.
For readable external tables, if load files are compressed using gzip or bzip2 (have a .gz or .bz2 file extension), gpfdist.exe uncompresses the files automatically before loading, provided that gunzip or bunzip2 is in your path.
Note: Currently, readable external tables do not support compression on Windows platforms, and writable external tables do not support compression on any platform.
Most likely, you will want to run gpfdist.exe on your ETL machines rather than the hosts where Greenplum Database is installed. To install gpfdist.exe on another host, copy the utility over to that host and add gpfdist.exe to your PATH.
Note: When using IPv6, always enclose the numeric IP address in brackets.
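The bracket rule can be applied mechanically. The sketch below uses Python's standard ipaddress module to decide whether a host string is a numeric IPv6 address; the function name format_host and the sample addresses are illustrative assumptions, not part of the utility.

```python
import ipaddress


def format_host(host):
    """Wrap a numeric IPv6 address in brackets; leave host names and IPv4 as-is."""
    try:
        address = ipaddress.ip_address(host)
    except ValueError:
        return host  # not a bare numeric address, for example a host name
    return f"[{host}]" if address.version == 6 else host


# Documentation-range sample addresses and a hypothetical host name.
print(format_host("2001:db8::10"))
print(format_host("192.0.2.10"))
print(format_host("etlhost-1"))
```

The bracketed form can then be placed in front of the port, in the usual URL style of [address]:port.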
You can also run gpfdist.exe as a Windows Service. See Running gpfdist as a Windows Service for more details.
**-d** directory
The directory from which gpfdist.exe will serve files for readable external tables or create output files for writable external tables. If not specified, defaults to the current directory.
**-p** http\_port
The HTTP port on which gpfdist.exe will serve files. Defaults to 8080.
**-t** timeout
Sets the time allowed for Greenplum Database to establish a connection to a gpfdist.exe process. Default is 5 seconds. Allowed values are 2 to 7200 seconds (2 hours). May need to be increased on systems with a lot of network traffic.
**-m** max\_length
Sets the maximum allowed data row length in bytes. Default is 32768. Use this option when user data includes very wide rows (or when a line too long error message occurs). Do not use it otherwise, as it increases resource allocation. Valid range is 32K to 256MB. (The upper limit is 1MB on Windows systems.)
Note: Memory issues might occur if you specify a large maximum row length and run a large number of concurrent gpfdist connections. For example, setting this value to the Windows maximum of 1MB with 96 concurrent gpfdist processes requires approximately 97MB of memory ((96 + 1) x 1MB).
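The arithmetic in this note follows a (connections + 1) x max_length pattern. The sketch below applies that formula as a rule of thumb; the function name is hypothetical, and treating 1MB as 1024 x 1024 bytes is an assumption.

```python
MIB = 1024 * 1024  # assumption: 1MB is taken to mean 1024 * 1024 bytes


def estimated_buffer_bytes(max_length, connections):
    """Apply the (connections + 1) x max_length rule of thumb from the note."""
    return (connections + 1) * max_length


# 96 concurrent processes with -m set to the 1MB Windows upper limit.
needed = estimated_buffer_bytes(1 * MIB, 96)
print(f"{needed / MIB:.0f} MB")
```

Running the same function with the default row length of 32768 bytes gives a sense of how much a larger -m value raises the estimate.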
**-s**
Enables simplified logging. When this option is specified, only messages with WARN level and higher are written to the gpfdist log file. INFO level messages are not written to the log file. If this option is not specified, all gpfdist messages are written to the log file.
**-S** (use O_SYNC)
Opens the file for synchronous I/O with the O_SYNC flag. Any writes to the resulting file descriptor block gpfdist.exe until the data is physically written to the underlying hardware.
**-w** time
Sets the number of seconds that Greenplum Database delays before closing a target file such as a named pipe. The default value is 0, no delay. The maximum value is 7200 seconds (2 hours).
**--ssl** certificate\_path
Adds SSL encryption to data transferred with gpfdist.exe. After executing gpfdist.exe with the --ssl certificate\_path option, the only way to load data from this file server is with the gpfdists:// protocol. For information on the gpfdists:// protocol, see "Loading and Unloading Data" in the Greenplum Database Administrator Guide.
The location specified in certificate_path must contain the following files:
server.crt
server.key
root.crt
The root directory (/) cannot be specified as certificate_path.
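Before starting gpfdist.exe with --ssl, it can help to confirm that the certificate directory meets the requirements listed above. The following sketch is one possible pre-flight check; the function name missing_ssl_files is hypothetical, and treating the drive root as the root directory on Windows is an assumption.

```python
import tempfile
from pathlib import Path

REQUIRED_FILES = ("server.crt", "server.key", "root.crt")


def missing_ssl_files(certificate_path):
    """Return the required files that are absent from certificate_path."""
    resolved = Path(certificate_path).resolve()
    # Assumption: a path equal to its own anchor ("/" or "C:\\") counts as root.
    if resolved == Path(resolved.anchor):
        raise ValueError("the root directory cannot be used as certificate_path")
    return [name for name in REQUIRED_FILES if not (resolved / name).is_file()]


# Demonstration against a throwaway directory that holds only server.crt.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "server.crt").touch()
    print(missing_ssl_files(tmp))
```

An empty list means all three files are present. The check only looks at file names and does not validate the certificates themselves.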
Greenplum Loaders allow gpfdist.exe to run as a Windows Service.
Follow the instructions below to download, register, and activate gpfdist.exe as a service:
Update your Greenplum Loader package to the latest version. This package is available from the Broadcom Support Portal under the specific Greenplum release.
Note: For more information about download prerequisites, troubleshooting, and instructions, see Download Broadcom products and software.
Register gpfdist as a Windows service:
Open a Windows command window.
Run the following command:
sc create gpfdist binpath= "path_to_gpfdist.exe -p 8081 -d External\load\files\path -l Log\file\path"
You can create multiple instances of gpfdist by running the same command again, with a unique name and port number for each instance:
sc create gpfdistN binpath= "path_to_gpfdist.exe -p 8082 -d External\load\files\path -l Log\file\path"
Activate the gpfdist service:
Open the Windows Control Panel and select Administrative Tools > Services.
Highlight the gpfdist service in the list of services, then right-click it.
Select Properties from the right-click menu. The Service Properties window opens.
Note that you can also stop this service from the Service Properties window.
Optional: Change the Startup Type to Automatic so that the service starts after a system restart, then under Service status, click Start.
Click OK.
Repeat the above steps for each instance of gpfdist that you created.
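When several instances are needed, the registration commands differ only in service name and port, so they can be generated rather than typed. The sketch below only formats the sc create command lines shown in the registration step and does not run them. The installation path, data directory, log directory, and the gpfdist2-style naming are hypothetical, and paths that contain spaces would need extra quoting that this sketch does not handle.

```python
def sc_create_command(name, gpfdist_path, port, data_dir, log_file):
    """Format an sc create command line like the ones in the registration step."""
    binpath = f"{gpfdist_path} -p {port} -d {data_dir} -l {log_file}"
    return f'sc create {name} binpath= "{binpath}"'


def service_commands(count, first_port, gpfdist_path, data_dir, log_dir):
    """Return one command per instance, each with a unique name and port."""
    commands = []
    for i in range(count):
        name = "gpfdist" if i == 0 else f"gpfdist{i + 1}"
        log_file = f"{log_dir}\\{name}.log"
        commands.append(
            sc_create_command(name, gpfdist_path, first_port + i, data_dir, log_file)
        )
    return commands


# Hypothetical paths; substitute the real locations on the ETL host.
for command in service_commands(
    2, 8081, r"C:\gpload\bin\gpfdist.exe", r"C:\load_files", r"C:\gpfdist_logs"
):
    print(command)
```

Each printed line can be pasted into a Windows command window. Activating the resulting services still follows the steps above.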
If the gpfdist utility hangs with no read or write activity occurring, you can generate a core dump the next time a hang occurs to help debug the issue. Set the environment variable GPFDIST_WATCHDOG_TIMER to the number of seconds of no activity to wait before gpfdist is forced to exit. When the environment variable is set and gpfdist hangs, the utility aborts after the specified number of seconds, creates a core dump, and sends abort information to the log file.
This example sets the environment variable on a Windows system so that gpfdist exits after 300 seconds (5 minutes) of no activity.
SET GPFDIST_WATCHDOG_TIMER=300
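If gpfdist is launched from a script rather than an interactive command window, the same variable can be set for the child process only. The following sketch builds such an environment in Python; the helper name watchdog_env is hypothetical, and the sketch does not start gpfdist itself.

```python
import os


def watchdog_env(seconds, base_env=None):
    """Return a copy of the environment with GPFDIST_WATCHDOG_TIMER set."""
    env = dict(os.environ if base_env is None else base_env)
    env["GPFDIST_WATCHDOG_TIMER"] = str(int(seconds))
    return env


# Same 300-second (5-minute) timeout as the SET command above.
env = watchdog_env(300)
print(env["GPFDIST_WATCHDOG_TIMER"])
```

The returned dictionary can be passed as the env argument of subprocess.Popen, which leaves the environment of the calling process unchanged.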
To serve files from a specified directory using port 8081 (and start gpfdist.exe in the background):
gpfdist.exe -d /var/load_files -p 8081 &
To start gpfdist.exe in the background and redirect output and errors to a log file:
gpfdist.exe -d /var/load_files -p 8081 -l /home/gpadmin/log &
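As an alternative to a trailing &, a script can start gpfdist.exe without waiting for it to exit. The sketch below builds the same argument lists as the two examples above and launches the process with Python's subprocess.Popen only if the executable is found on the PATH. The helper names are hypothetical, and the directory and log paths are copied from the examples.

```python
import shutil
import subprocess


def gpfdist_args(directory, port, log_file=None, executable="gpfdist.exe"):
    """Build the argument list for the -d, -p, and optional -l options."""
    args = [executable, "-d", directory, "-p", str(port)]
    if log_file is not None:
        args += ["-l", log_file]
    return args


def start_in_background(args):
    """Start the process without waiting for it; returns the Popen handle."""
    return subprocess.Popen(args)


if __name__ == "__main__":
    args = gpfdist_args("/var/load_files", 8081, "/home/gpadmin/log")
    if shutil.which(args[0]):
        process = start_in_background(args)
        print("started gpfdist.exe with PID", process.pid)
    else:
        print("gpfdist.exe not found on PATH; would run:", " ".join(args))
```

Keeping the Popen handle lets the script stop the file server later with process.terminate() once the load has finished.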
gpload.py, CREATE EXTERNAL TABLE in the Greenplum Database Reference Guide