This section describes Hadoop-related problems and their potential solutions.
You may experience Hadoop access errors with Tanzu Greenplum Text if any DataNodes in the Hadoop cluster reside in a multi-homed network. Tanzu Greenplum Text uses an external IP address to access the HDFS NameNode, and encounters an error when the NameNode provides an internal IP address for a DataNode. In this situation, you must configure Tanzu Greenplum Text to perform its own DNS resolution of DataNode hostnames.
Perform the following procedure to explicitly configure DNS resolution of DataNode hostnames:
Locate a local copy of the Hadoop authentication configuration directory that you previously uploaded to ZooKeeper. For example, if the directory is located at /home/gpadmin/auths/hdfs_conf:
$ cd /home/gpadmin/auths/hdfs_conf
$ ls
core-site.xml hdfs-site.xml user.txt
Open hdfs-site.xml in the editor of your choice. For example:
$ vi hdfs-site.xml
Add the following property block to the file, and then save the file and exit:
<property>
<name>dfs.client.use.datanode.hostname</name>
<value>true</value>
</property>
This property allows Tanzu Greenplum Text hosts to perform their own DNS resolution of HDFS DataNode hostnames.
Re-upload the modified configuration to ZooKeeper. For example, if the hdfs_conf directory includes the authentication configuration files for a Hadoop cluster defined with the <config_name> hdfs_bill_auth:
$ cd ..
$ gptext-external upload -t hdfs -c hdfs_bill_auth -p hdfs_conf
Determine the hostname-to-IP address mapping for all DataNodes, and add the associated entries to the /etc/hosts file on all Tanzu Greenplum Text client hosts.
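The two changes made by the procedure above (the hdfs-site.xml property and the /etc/hosts entries) can be sanity-checked with a short script. The following is a minimal Python sketch, not part of Tanzu Greenplum Text: the function names are hypothetical, and the DataNode hostnames in the usage note are placeholders that you would replace with your own.

```python
import xml.etree.ElementTree as ET
from typing import Iterable, List

PROPERTY_NAME = "dfs.client.use.datanode.hostname"


def datanode_hostname_enabled(hdfs_site_xml: str) -> bool:
    """Return True if the property is present and set to true in the XML text."""
    root = ET.fromstring(hdfs_site_xml)
    for prop in root.iter("property"):
        if prop.findtext("name", "").strip() == PROPERTY_NAME:
            return prop.findtext("value", "").strip().lower() == "true"
    return False


def missing_host_entries(hosts_text: str, hostnames: Iterable[str]) -> List[str]:
    """Return the hostnames that have no entry in the given hosts-file text."""
    known = set()
    for line in hosts_text.splitlines():
        # Drop trailing comments, then split "IP name [alias ...]".
        fields = line.split("#", 1)[0].split()
        known.update(fields[1:])  # the first field is the IP address
    return [name for name in hostnames if name not in known]
```

To use it, pass in the contents of the local hdfs-site.xml and of /etc/hosts on each client host, for example `missing_host_entries(open("/etc/hosts").read(), ["dn1.example.com"])`, where `dn1.example.com` stands in for one of your DataNode hostnames. An empty list means every name has an entry.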
The following problems are specific to Hadoop clusters secured with Kerberos.
A login attempt to a Hadoop cluster secured with Kerberos will fail if clock skew between Tanzu Greenplum Text client hosts and the Kerberos KDC host is too great. In this situation, you may see the following error in the Solr log:
java.io.IOException caused by a KrbException noting "Clock skew too great"
To resolve this situation, ensure that the clocks on the Kerberos KDC host and Tanzu Greenplum Text client hosts are synchronized.
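As an illustration of what "too great" means in practice, the sketch below compares a client timestamp with a KDC timestamp against a tolerance. It assumes a tolerance of 300 seconds, which is the usual Kerberos default; your KDC may be configured differently. The function name is hypothetical, and how you obtain the KDC host's current time is left to you.

```python
from datetime import datetime, timedelta, timezone

# Assumption: 300 seconds is the common default Kerberos tolerance
# (the clockskew setting); check your own KDC configuration.
DEFAULT_TOLERANCE = timedelta(seconds=300)


def skew_too_great(client_time: datetime,
                   kdc_time: datetime,
                   tolerance: timedelta = DEFAULT_TOLERANCE) -> bool:
    """Return True if the two clocks differ by more than the tolerance."""
    return abs(client_time - kdc_time) > tolerance


# Example with made-up timestamps: the client is six minutes behind the KDC.
kdc_now = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
client_now = kdc_now - timedelta(minutes=6)
print(skew_too_great(client_now, kdc_now))  # prints True
```

If the check reports True for any client host, synchronizing that host with the same time source as the KDC (for example, through NTP) is the usual remedy.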
A login attempt to a Hadoop cluster secured with Kerberos may fail with timeout errors when the kdc and admin_server settings in the krb5.conf file are specified with a hostname, and the Tanzu Greenplum Text client hosts cannot resolve the hostname. In this situation, you may see one of the following errors in the Solr log:
org.apache.solr.common.SolrException: Failed to login HDFS message caused by a java.io.IOException specifying javax.security.auth.login.LoginException: Receive timed out
java.nio.channels.UnresolvedAddressException with SocketIOWithTimeout referenced in the stack trace
In this situation, you may choose either of the following:
Update the Kerberos krb5.conf file to specify the kdc and admin_server settings using IP addresses.
Or
Update all Tanzu Greenplum Text hosts to perform their own DNS resolution of the Kerberos KDC server.
If you choose to update the krb5.conf file:
Locate a local copy of the Hadoop Kerberos authentication configuration directory that you previously uploaded to ZooKeeper. For example, if the directory is located at /home/gpadmin/auths/hdfs_kerb_conf:
$ cd /home/gpadmin/auths/hdfs_kerb_conf
$ ls
core-site.xml hdfs-site.xml keytab krb5.conf user.txt
Open krb5.conf in the editor of your choice. For example:
$ vi krb5.conf
In the KERBEROS block, replace the kdc and admin_server hostnames with their equivalent IP addresses, and then save the file and exit. For example:
[realms]
KERBEROS = {
kdc = <kdc_ipaddress>
admin_server = <admin_server_ipaddress>
}
Re-upload the modified configuration to ZooKeeper. For example, if the directory named hdfs_kerb_conf includes the authentication configuration files for a Hadoop cluster defined with the <config_name> hdfs_kerb_auth:
$ cd ..
$ gptext-external upload -t hdfs -c hdfs_kerb_auth -p hdfs_kerb_conf
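Before re-uploading, it can help to confirm that no kdc or admin_server setting in the edited krb5.conf still uses a hostname. The following Python sketch is one way to do that; the function name is hypothetical, and the parsing is deliberately simple (it handles an optional `:port` suffix on IPv4 values but not bracketed IPv6 addresses with a port).

```python
import ipaddress
import re
from typing import List, Tuple

# Matches lines such as "kdc = 10.0.0.10" or "admin_server = host:749".
SETTING = re.compile(r"^\s*(kdc|admin_server)\s*=\s*(\S+)", re.MULTILINE)


def non_ip_kdc_settings(krb5_conf_text: str) -> List[Tuple[str, str]]:
    """Return (setting, value) pairs whose value is not an IP address."""
    offenders = []
    for setting, value in SETTING.findall(krb5_conf_text):
        # Strip an optional ":port" suffix when there is exactly one colon.
        host = value.rsplit(":", 1)[0] if value.count(":") == 1 else value
        try:
            ipaddress.ip_address(host)
        except ValueError:
            offenders.append((setting, value))
    return offenders
```

Calling `non_ip_kdc_settings(open("krb5.conf").read())` from the configuration directory returns an empty list when every kdc and admin_server value is an IP address. Any pairs it does return are the settings that still need to be replaced.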
Alternatively, if you choose to configure the Tanzu Greenplum Text hosts to perform their own DNS resolution of the Kerberos KDC server, add an entry for the KDC hostname-to-IP address mapping to the /etc/hosts file on all Tanzu Greenplum Text client hosts.
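To decide which of the Kerberos remedies in this section applies, the Solr log can be scanned for the messages quoted above. The sketch below is a minimal, hypothetical helper: it matches only the substrings that appear in this section, makes no assumption about the rest of the log format, and uses hint text that merely paraphrases the remedies described here.

```python
from typing import List

# Substrings quoted in this section, mapped to the remedy it describes.
SIGNATURES = {
    "Clock skew too great":
        "Synchronize the clocks on the KDC host and the client hosts.",
    "Receive timed out":
        "Use IP addresses in krb5.conf or add the KDC to /etc/hosts.",
    "UnresolvedAddressException":
        "Use IP addresses in krb5.conf or add the KDC to /etc/hosts.",
}


def diagnose(log_text: str) -> List[str]:
    """Return one hint per known error signature found in the log text."""
    return [hint for marker, hint in SIGNATURES.items() if marker in log_text]
```

For example, `diagnose(open(path_to_solr_log).read())` returns a list of hints, where `path_to_solr_log` is a placeholder for the location of the Solr log on your host. An empty list means none of the signatures from this section were found.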