You can use the PXF JDBC Connector to retrieve data from Hive. You can also use a JDBC named query to submit a custom SQL query to Hive and retrieve the results using the JDBC Connector.
This topic describes how to configure the PXF JDBC Connector to access Hive. When you configure Hive access with JDBC, you must take into account the Hive user impersonation setting, as well as whether or not the Hadoop cluster is secured with Kerberos.
If you do not plan to use the PXF JDBC Connector to access Hive, then you do not need to perform this procedure.
The PXF JDBC Connector is installed with the JAR files required to access Hive via JDBC, hive-jdbc-<version>.jar
and hive-service-<version>.jar
, and automatically registers these JARs.
When you configure a PXF JDBC server for Hive access, you must specify the JDBC driver class name, database URL, and client credentials just as you would when configuring a client connection to an SQL database.
To access Hive via JDBC, you must specify the following properties and values in the jdbc-site.xml
server configuration file:
Property | Value |
---|---|
jdbc.driver | org.apache.hive.jdbc.HiveDriver |
jdbc.url | jdbc:hive2://<hiveserver2_host>:<hiveserver2_port>/<database> |
The value of the HiveServer2 authentication (hive.server2.authentication
) and impersonation (hive.server2.enable.doAs
) properties, and whether or not the Hive service is utilizing Kerberos authentication, will inform the setting of other JDBC server configuration properties. These properties are defined in the hive-site.xml
configuration file in the Hadoop cluster. You will need to obtain the values of these properties.
The following table enumerates the Hive2 authentication and impersonation combinations supported by the PXF JDBC Connector. It identifies the possible Hive user identities and the JDBC server configuration required for each.
Table heading key:
hive.server2.authentication
Settinghive.server2.enable.doAs
Settingauthentication | enable.doAs | User Identity | Configuration Required |
---|---|---|---|
NOSASL |
n/a | No authentication | Must set jdbc.connection.property.auth = noSasl . |
NONE , or not specified |
TRUE |
User name that you provide | Set jdbc.user . |
NONE , or not specified |
TRUE |
Greenplum user name | Set pxf.service.user.impersonation to true in jdbc-site.xml . |
NONE , or not specified |
FALSE |
Name of the user who started Hive, typically hive |
None |
KERBEROS |
TRUE |
Identity provided in the PXF Kerberos principal, typically gpadmin |
Must set hadoop.security.authentication to kerberos in jdbc-site.xml . |
KERBEROS |
TRUE |
User name that you provide | Set hive.server2.proxy.user in jdbc.url and set hadoop.security.authentication to kerberos in jdbc-site.xml . |
KERBEROS |
TRUE |
Greenplum user name | Set pxf.service.user.impersonation to true and hadoop.security.authentication to kerberos in jdbc-site.xml . |
KERBEROS |
FALSE |
Identity provided in the jdbc.url principal parameter, typically hive |
Must set hadoop.security.authentication to kerberos in jdbc-site.xml . |
Note: There are additional configuration steps required when Hive utilizes Kerberos authentication.
Perform the following procedure to configure a PXF JDBC server for Hive:
Log in to your Greenplum Database coordinator host:
$ ssh gpadmin@<coordinator>
Choose a name for the JDBC server.
Create the $PXF_BASE/servers/<server_name>
directory. For example, use the following command to create a JDBC server configuration named hivejdbc1
:
gpadmin@coordinator$ mkdir $PXF_BASE/servers/hivejdbc1
Navigate to the server configuration directory. For example:
gpadmin@coordinator$ cd $PXF_BASE/servers/hivejdbc1
Copy the PXF JDBC server template file to the server configuration directory. For example:
gpadmin@coordinator$ cp <PXF_INSTALL_DIR>/templates/jdbc-site.xml .
When you access Hive secured with Kerberos, you also need to specify configuration properties in the pxf-site.xml
file. If this file does not yet exist in your server configuration, copy the pxf-site.xml
template file to the server config directory. For example:
gpadmin@coordinator$ cp <PXF_INSTALL_DIR>/templates/pxf-site.xml .
Open the jdbc-site.xml
file in the editor of your choice and set the jdbc.driver
and jdbc.url
properties. Be sure to specify your Hive host, port, and database name:
<property>
<name>jdbc.driver</name>
<value>org.apache.hive.jdbc.HiveDriver</value>
</property>
<property>
<name>jdbc.url</name>
<value>jdbc:hive2://<hiveserver2_host>:<hiveserver2_port>/<database></value>
</property>
Obtain the hive-site.xml
file from your Hadoop cluster and examine the file.
If the hive.server2.authentication
property in hive-site.xml
is set to NOSASL
, HiveServer2 performs no authentication. Add the following connection-level property to jdbc-site.xml
:
<property>
<name>jdbc.connection.property.auth</name>
<value>noSasl</value>
</property>
Alternatively, you may choose to add ;auth=noSasl
to the jdbc.url
.
If the hive.server2.authentication
property in hive-site.xml
is set to NONE
, or the property is not specified, you must set the jdbc.user
property. The value to which you set the jdbc.user
property is dependent upon the hive.server2.enable.doAs
impersonation setting in hive-site.xml
:
If hive.server2.enable.doAs
is set to TRUE
(the default), Hive runs Hadoop operations on behalf of the user connecting to Hive. Choose/perform one of the following options:
Set jdbc.user
to specify the user that has read permission on all Hive data accessed by Greenplum Database. For example, to connect to Hive and run all requests as user gpadmin
:
<property>
<name>jdbc.user</name>
<value>gpadmin</value>
</property>
Or, turn on JDBC server-level user impersonation so that PXF automatically uses the Greenplum Database user name to connect to Hive; uncomment the pxf.service.user.impersonation
property in jdbc-site.xml
and set the value to `true:
<property>
<name>pxf.service.user.impersonation</name>
<value>true</value>
</property>
If you enable JDBC impersonation in this manner, you must not specify a jdbc.user
nor include the setting in the jdbc.url
.
If required, create a PXF user configuration file as described in Configuring a PXF User to manage the password setting.
hive.server2.enable.doAs
is set to FALSE
, Hive runs Hadoop operations as the user who started the HiveServer2 process, usually the user hive
. PXF ignores the jdbc.user
setting in this circumstance.If the hive.server2.authentication
property in hive-site.xml
is set to KERBEROS
:
pxf-site.xml
properties as described in the procedure.pxf.service.user.impersonation
property in the pxf-site.xml
file. If you require user impersonation, you will uncomment and set the property in an upcoming step.Uncomment the hadoop.security.authentication
setting in $PXF_BASE/servers/<name>/jdbc-site.xml
:
<property>
<name>hadoop.security.authentication</name>
<value>kerberos</value>
</property>
Add the saslQop
property to jdbc.url
, and set it to match the hive.server2.thrift.sasl.qop
property setting in hive-site.xml
. For example, if the hive-site.xml
file includes the following property setting:
<property>
<name>hive.server2.thrift.sasl.qop</name>
<value>auth-conf</value>
</property>
You would add ;saslQop=auth-conf
to the jdbc.url
.
Add the HiverServer2 principal
name to the jdbc.url
. For example:
jdbc:hive2://hs2server:10000/default;principal=hive/hs2server@REALM;saslQop=auth-conf
If hive.server2.enable.doAs
is set to TRUE
(the default), Hive runs Hadoop operations on behalf of the user connecting to Hive. Choose/perform one of the following options:
Do not specify any additional properties. In this case, PXF initiates all Hadoop access with the identity provided in the PXF Kerberos principal (usually gpadmin
).
Or, set the hive.server2.proxy.user
property in the jdbc.url
to specify the user that has read permission on all Hive data. For example, to connect to Hive and run all requests as the user named integration
use the following jdbc.url
:
jdbc:hive2://hs2server:10000/default;principal=hive/hs2server@REALM;saslQop=auth-conf;hive.server2.proxy.user=integration
Or, enable PXF JDBC impersonation in the pxf-site.xml
file so that PXF automatically uses the Greenplum Database user name to connect to Hive. Add or uncomment the pxf.service.user.impersonation
property and set the value to true
. For example:
<property>
<name>pxf.service.user.impersonation</name>
<value>true</value>
</property>
If you enable JDBC impersonation, you must not explicitly specify a hive.server2.proxy.user
in the jdbc.url
.
hive.server2.enable.doAs
is set to FALSE
, Hive runs Hadoop operations with the identity provided by the PXF Kerberos principal (usually gpadmin
).Save your changes and exit the editor.
Use the pxf cluster sync
command to copy the new server configuration to the Greenplum Database cluster:
gpadmin@coordinator$ pxf cluster sync