You use PXF to access data stored on external systems. Depending upon the external data store, this access may require that you install and/or configure additional components or services for the external data store.
PXF depends on JAR files and other configuration information provided by these additional components. In most cases, PXF manages internal JAR dependencies as necessary based on the connectors that you use.
Should you need to register a JAR or native library dependency with PXF, copy the library to a location known to PXF or inform PXF of a custom location, and then synchronize and restart PXF.
PXF loads JAR dependencies from the following directories, in this order:
The directories that you specify in the PXF_LOADER_PATH environment variable in the $PXF_BASE/conf/pxf-env.sh configuration file. The pxf-env.sh file includes this commented-out block:
# Additional locations to be class-loaded by PXF
# export PXF_LOADER_PATH=
You would uncomment the PXF_LOADER_PATH setting and specify one or more colon-separated directory names.
The default PXF JAR directory, $PXF_BASE/lib.
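As a sketch of what an uncommented setting might look like, the following lists two directories separated by a colon. Both directory names are hypothetical placeholders, not locations that PXF provides.

```shell
# Additional locations to be class-loaded by PXF
# NOTE: both directories below are hypothetical examples; substitute your own.
export PXF_LOADER_PATH=/usr/local/pxf-jars:/opt/vendor/jdbc-drivers
```

Because PXF checks these directories before $PXF_BASE/lib, a JAR placed in one of them would be found ahead of a same-named JAR in the default directory.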
To add a JAR dependency for PXF, for example a MySQL driver JAR file, log in to the Greenplum Database master host, copy the JAR file to the PXF user configuration runtime library directory ($PXF_BASE/lib), synchronize the PXF configuration to the Greenplum Database cluster, and then restart PXF on each host. For example:
$ ssh gpadmin@<gpmaster>
gpadmin@gpmaster$ cp new_dependent_jar.jar $PXF_BASE/lib/
gpadmin@gpmaster$ pxf cluster sync
gpadmin@gpmaster$ pxf cluster restart
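Before copying, it can help to confirm that the file is a readable JAR, so that a mistyped path does not go unnoticed until after the restart. The following is a minimal sketch only; stage_jar is a hypothetical helper and is not part of PXF.

```shell
# Hypothetical helper (not part of PXF): validate a JAR file, then copy it
# into the given library directory.
stage_jar() {
    local jar="$1" lib_dir="$2"
    # Reject anything that is not named *.jar
    case "$jar" in
        *.jar) ;;
        *) echo "not a .jar file: $jar" >&2; return 1 ;;
    esac
    # Reject missing or unreadable files
    if [ ! -r "$jar" ]; then
        echo "cannot read: $jar" >&2
        return 1
    fi
    cp "$jar" "$lib_dir/"
}
```

On the master host, you might run stage_jar new_dependent_jar.jar "$PXF_BASE/lib" and, if it succeeds, continue with pxf cluster sync and pxf cluster restart as shown above.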
Alternatively, you can identify the file system location of the JAR in the PXF_LOADER_PATH environment variable in pxf-env.sh. If you choose this registration option, you must ensure that you copy the JAR file to the same location on the Greenplum Database standby master host and segment hosts before you synchronize and restart PXF.
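With this option, the copy to the other hosts is a manual step. The sketch below prints, without running them, the scp commands that would place the JAR in the same directory on each host, so that you can review them first. The function name, the host names (smdw, sdw1, sdw2), and the /usr/local/pxf-jars directory are all hypothetical, and the sketch assumes the directory already exists on every host.

```shell
# Hypothetical helper: print (do not run) one scp command per host.
# Arguments: <jar file> <destination directory> <host>...
print_jar_copy_commands() {
    local jar="$1" dest_dir="$2"
    shift 2
    local host
    for host in "$@"; do
        # scp <file> <user>@<host>:<dir>/ copies the file into <dir> on <host>
        echo "scp ${jar} gpadmin@${host}:${dest_dir}/"
    done
}

# Example: a standby master and two segment hosts (placeholder host names)
print_jar_copy_commands new_dependent_jar.jar /usr/local/pxf-jars smdw sdw1 sdw2
```

Once the JAR is in place on every host and PXF_LOADER_PATH names the directory, synchronize and restart PXF as in the previous example.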
PXF loads native libraries from the following directories, in this order:
The directories that you specify in the LD_LIBRARY_PATH environment variable in the $PXF_BASE/conf/pxf-env.sh configuration file. The pxf-env.sh file includes this commented-out block:
# Additional native libraries to be loaded by PXF
# export LD_LIBRARY_PATH=
You would uncomment the LD_LIBRARY_PATH setting and specify one or more colon-separated directory names.
The default PXF native library directory, $PXF_BASE/lib/native.
The Hadoop native library directory, /usr/lib/hadoop/lib/native.
As such, you have three file location options when you register a native library with PXF:
Copy the library to $PXF_BASE/lib/native on only the Greenplum Database master host. When you next synchronize PXF, PXF copies the native library to all hosts in the Greenplum cluster.
Copy the library to /usr/lib/hadoop/lib/native on the Greenplum master host, standby master host, and each segment host.
Copy the library to a custom location on the Greenplum master host, standby master host, and each segment host, and add that location to the pxf-env.sh LD_LIBRARY_PATH environment variable.
To register a native library, copy the native library file to one of the following:
The $PXF_BASE/lib/native directory on the Greenplum Database master host. (You may need to create this directory.)
The /usr/lib/hadoop/lib/native directory on all Greenplum Database hosts.
A custom location of your choice on all Greenplum Database hosts.
If you copied the native library to a custom location:
Open the $PXF_BASE/conf/pxf-env.sh file in the editor of your choice, and uncomment the LD_LIBRARY_PATH setting:
# Additional native libraries to be loaded by PXF
export LD_LIBRARY_PATH=
Specify the custom location in the LD_LIBRARY_PATH environment variable. For example, if you copied a library named dependent_native_lib.so to /usr/local/lib on all Greenplum hosts, you would set LD_LIBRARY_PATH as follows:
export LD_LIBRARY_PATH=/usr/local/lib
Save the file and exit the editor.
Synchronize the PXF configuration from the Greenplum Database master host to the standby master host and segment hosts:
gpadmin@gpmaster$ pxf cluster sync
If you copied the native library to the $PXF_BASE/lib/native directory, this command copies the library to the same location on the Greenplum Database standby master host and segment hosts.
If you updated the pxf-env.sh LD_LIBRARY_PATH environment variable, this command copies the configuration change to the Greenplum Database standby master host and segment hosts.
Restart PXF on all Greenplum hosts:
gpadmin@gpmaster$ pxf cluster restart
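After the restart, it can be useful to check which directory would supply a given library under the search order described earlier. The following is a minimal sketch: find_native_lib is a hypothetical helper, not a PXF command. It only walks the three documented locations in order and does not query PXF itself, and it assumes that LD_LIBRARY_PATH and PXF_BASE in the current shell hold the same values that PXF uses.

```shell
# Hypothetical helper (not part of PXF): print the first directory, in the
# documented search order, that contains the named native library.
find_native_lib() {
    local lib="$1" dir old_ifs="$IFS"
    # Split LD_LIBRARY_PATH on ':' so each directory is checked in turn,
    # followed by the two default directories.
    IFS=':'
    for dir in ${LD_LIBRARY_PATH:-} "${PXF_BASE:-}/lib/native" /usr/lib/hadoop/lib/native; do
        # Skip empty entries such as those produced by "a::b"
        [ -n "$dir" ] || continue
        if [ -f "${dir}/${lib}" ]; then
            IFS="$old_ifs"
            echo "$dir"
            return 0
        fi
    done
    IFS="$old_ifs"
    return 1
}
```

For example, running find_native_lib dependent_native_lib.so on a host prints the directory that holds the library, or returns a non-zero status if none of the three locations contains it.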