This section describes Hadoop-related problems that you may encounter with VMware Greenplum Text, and potential solutions to these problems.

DataNode Access Errors

You may experience Hadoop access errors with VMware Greenplum Text if any DataNodes in the Hadoop cluster reside in a multi-homed network. VMware Greenplum Text accesses the HDFS NameNode using an external IP address, and it encounters an error when the NameNode provides an internal IP address for a DataNode. In this situation, you must configure VMware Greenplum Text to perform its own DNS resolution of DataNode hostnames.

Perform the following procedure to explicitly configure DNS resolution of DataNode hostnames:

  1. Locate a local copy of the Hadoop authentication configuration directory that you previously uploaded to ZooKeeper. For example, if the directory is located at /home/gpadmin/auths/hdfs_conf:

    $ cd /home/gpadmin/auths/hdfs_conf
    $ ls
    core-site.xml  hdfs-site.xml  user.txt
    
  2. Open hdfs-site.xml in the editor of your choice. For example:

    $ vi hdfs-site.xml
    
  3. Add the following property block inside the <configuration> element of the file, and then save the file and exit:

    <property>
        <name>dfs.client.use.datanode.hostname</name>
        <value>true</value>
    </property>
    

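    This property directs the HDFS client to connect to DataNodes by hostname instead of by the IP address that the NameNode reports, so that VMware Greenplum Text hosts perform their own DNS resolution of DataNode hostnames.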

  4. Re-upload the modified configuration to ZooKeeper. For example, if the hdfs_conf directory includes the authentication configuration files for a Hadoop cluster whose <config_name> is hdfs_bill_auth:

    $ cd ..
    $ gptext-external upload -t hdfs -c hdfs_bill_auth -p hdfs_conf
    
  5. Determine the hostname-to-IP address mapping for all DataNodes, and add the corresponding entries to the /etc/hosts file on all VMware Greenplum Text client hosts.
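If you prefer to script steps 2 and 3 instead of editing hdfs-site.xml by hand, the following Python sketch shows one way to do it with the standard library's xml.etree.ElementTree module. It is an illustrative helper, not part of VMware Greenplum Text, and the function name enable_datanode_hostname is hypothetical. It sets dfs.client.use.datanode.hostname to true. If a property block for that name already exists, it updates that block; otherwise it appends a new one inside the <configuration> element.

```python
import sys
import xml.etree.ElementTree as ET

# Property that tells the HDFS client to connect to DataNodes by hostname.
PROPERTY_NAME = "dfs.client.use.datanode.hostname"


def enable_datanode_hostname(xml_text):
    """Return hdfs-site.xml content with dfs.client.use.datanode.hostname=true.

    Hypothetical helper: updates the property if it already exists, otherwise
    appends a new <property> block inside the root <configuration> element.
    """
    root = ET.fromstring(xml_text)
    if root.tag != "configuration":
        raise ValueError("expected a <configuration> root element")

    for prop in root.findall("property"):
        if (prop.findtext("name") or "").strip() == PROPERTY_NAME:
            value = prop.find("value")
            if value is None:
                value = ET.SubElement(prop, "value")
            value.text = "true"
            break
    else:  # no existing block for this property: add one
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = PROPERTY_NAME
        ET.SubElement(prop, "value").text = "true"

    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__" and len(sys.argv) > 1:
    # Print the updated configuration; redirect it to a new file and review it.
    with open(sys.argv[1], encoding="utf-8") as f:
        print(enable_datanode_hostname(f.read()))
```

For example, you might redirect the script's output to a new file, review it, and then replace hdfs-site.xml before you re-upload the directory in step 4. ElementTree discards XML comments, processing instructions, and the XML declaration when it rewrites a document, so keep a backup of the original file.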

The following problems are specific to Hadoop clusters secured with Kerberos.

Clock Skew

A login attempt to a Hadoop cluster secured with Kerberos fails if the clock skew between the VMware Greenplum Text client hosts and the Kerberos KDC host exceeds the maximum that Kerberos allows. In this situation, you may see the following error in the Solr log:

java.io.IOException caused by a KrbException noting "Clock skew too great"

To resolve this situation, ensure that the clocks on the Kerberos KDC host and all VMware Greenplum Text client hosts are synchronized, for example by configuring all of these hosts to use the same NTP server.
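The "Clock skew too great" error occurs when the difference between a host's clock and the KDC's clock exceeds the Kerberos maximum clock skew. In MIT Kerberos and in the JDK's Kerberos implementation, that limit is typically 300 seconds by default, and the clockskew setting in the [libdefaults] section of krb5.conf can override it. The following Python sketch is a hypothetical helper, not a VMware Greenplum Text tool. Given UNIX timestamps collected at about the same moment (for example, by running date +%s on each host), it flags the hosts whose clocks are too far from the KDC's clock.

```python
# Assumption: 300 seconds is the usual default maximum clock skew; a
# "clockskew" entry in the [libdefaults] section of krb5.conf may change it.
DEFAULT_MAX_SKEW_SECONDS = 300


def hosts_with_excess_skew(kdc_time, host_times, max_skew=DEFAULT_MAX_SKEW_SECONDS):
    """Return the sorted names of hosts whose clock differs from the KDC's by more than max_skew.

    kdc_time and each value in host_times are UNIX timestamps in seconds,
    sampled at roughly the same moment on the KDC and on each client host.
    """
    return sorted(
        host
        for host, ts in host_times.items()
        if abs(ts - kdc_time) > max_skew  # ahead of or behind the KDC
    )
```

For instance, with a KDC timestamp of 1700000000, a host that reports 1700000400 is 400 seconds ahead and is flagged, while a host that reports 1700000010 is not. The host names you pass in are your own. Sampling with date +%s is only approximate, so treat results near the limit with caution.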

Timeout Errors

A login attempt to a Hadoop cluster secured with Kerberos may fail with timeout errors when the kdc and admin_server settings in the krb5.conf file specify hostnames that the VMware Greenplum Text client hosts cannot resolve. In this situation, you may see one of the following errors in the Solr log:

  • org.apache.solr.common.SolrException with a "Failed to login HDFS" message, caused by a java.io.IOException specifying javax.security.auth.login.LoginException: Receive timed out
  • java.nio.channels.UnresolvedAddressException with SocketIOWithTimeout referenced in the stack trace

To resolve the problem, choose one of the following options:

  • Update the Kerberos krb5.conf file to specify the kdc and admin_server settings using IP addresses.

    Or

  • Update all VMware Greenplum Text hosts to perform their own DNS resolution of the Kerberos KDC server.

If you choose to update the krb5.conf file:

  1. Locate a local copy of the Hadoop Kerberos authentication configuration directory that you previously uploaded to ZooKeeper. For example, if the directory is located at /home/gpadmin/auths/hdfs_kerb_conf:

    $ cd /home/gpadmin/auths/hdfs_kerb_conf
    $ ls
    core-site.xml  hdfs-site.xml  keytab  krb5.conf  user.txt
    
  2. Open krb5.conf in the editor of your choice. For example:

    $ vi krb5.conf
    
  3. In the [realms] section, replace the hostnames in the kdc and admin_server settings of your realm's block (KERBEROS in this example) with their equivalent IP addresses, and then save the file and exit. For example:

    [realms]
    KERBEROS = {
       kdc = <kdc_ipaddress>
       admin_server = <admin_server_ipaddress>
    }
    
  4. Re-upload the modified configuration to ZooKeeper. For example, if the directory named hdfs_kerb_conf includes the authentication configuration files for a Hadoop cluster defined with the <config_name> hdfs_kerb_auth:

    $ cd ..
    $ gptext-external upload -t hdfs -c hdfs_kerb_auth -p hdfs_kerb_conf
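If you already know the IP addresses, you can script step 3. The following Python sketch replaces the hostname in each kdc and admin_server setting with the IP address from a mapping that you supply. It preserves any :port suffix and leaves unmapped hosts unchanged. The function name and the sample mapping are hypothetical, and the line-based matching assumes the simple layout shown in step 3. Because the client hosts cannot resolve these names, build the mapping on a host that can, or from your network records.

```python
import re

# Matches a kdc/admin_server line, capturing the "key = " prefix, the host,
# and an optional ":port" suffix (assumption: one setting per line).
_LINE_RE = re.compile(
    r"^(?P<prefix>[ \t]*(?:kdc|admin_server)[ \t]*=[ \t]*)"
    r"(?P<host>[^\s:]+)(?P<port>:\d+)?[ \t]*$",
    re.MULTILINE,
)


def replace_kdc_hostnames(krb5_conf_text, host_to_ip):
    """Replace kdc/admin_server hostnames with IP addresses from host_to_ip.

    Hosts that are not in the mapping are left unchanged; a :port suffix is kept.
    """
    def _sub(match):
        host = match.group("host")
        ip = host_to_ip.get(host, host)
        return f"{match.group('prefix')}{ip}{match.group('port') or ''}"

    return _LINE_RE.sub(_sub, krb5_conf_text)
```

For example, with the mapping {"kdc.example.com": "192.0.2.10"}, the line kdc = kdc.example.com:88 becomes kdc = 192.0.2.10:88. Review the output before you save it and re-upload the directory as described in step 4.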
    

Alternatively, if you choose to configure the VMware Greenplum Text hosts to perform their own DNS resolution of the Kerberos KDC server, add an entry for the KDC hostname-to-IP address mapping (and for the admin_server host, if it differs) to the /etc/hosts file on all VMware Greenplum Text client hosts.
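Whether you are adding KDC entries here or DataNode entries as described in step 5 of the DataNode procedure, it can help to check which mappings are already present before you edit /etc/hosts. The following Python sketch is a hypothetical helper. It parses existing /etc/hosts content and returns two things: the lines to append for hostnames that are not yet listed, and the hostnames that are already mapped to a different address so that you can review them by hand. It does not modify /etc/hosts itself, which still requires root privileges on every VMware Greenplum Text client host.

```python
def plan_hosts_entries(hosts_text, host_to_ip):
    """Compare desired hostname-to-IP mappings with existing /etc/hosts content.

    Returns (to_add, conflicts): lines to append for hostnames not yet listed,
    and hostnames already mapped to a different address.
    """
    existing = {}  # hostname or alias -> set of IP addresses already listed
    for line in hosts_text.splitlines():
        fields = line.split("#", 1)[0].split()  # drop comments, split on whitespace
        if len(fields) >= 2:
            for name in fields[1:]:
                existing.setdefault(name, set()).add(fields[0])

    to_add, conflicts = [], []
    for host, ip in sorted(host_to_ip.items()):
        if host not in existing:
            to_add.append(f"{ip}\t{host}")
        elif ip not in existing[host]:
            conflicts.append(host)
    return to_add, conflicts
```

For example, suppose the existing content maps 192.0.2.10 to kdc.example.com and you pass the mapping {"kdc.example.com": "192.0.2.10", "dn1.example.com": "192.0.2.21"}. The function then returns a single line to add, mapping 192.0.2.21 to dn1.example.com, and no conflicts. The 192.0.2.x addresses and host names are documentation placeholders.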
