Troubleshooting and Debugging

This topic provides guidance on how to troubleshoot and debug issues you may encounter while using gpupgrade. To address an issue, you need to know which logs exist and where they are located, understand the format of a gpupgrade log message, and follow basic steps to identify what is happening in your environment.

The sections below cover the most common errors in each gpupgrade command phase.

Log Locations

The following table illustrates the different types of logs and their locations. Collect the logs from the specified hosts when troubleshooting an error.

Log Type	Log Location	Log Host(s)
gpupgrade_config	$HOME/gpupgrade/gpupgrade_config	master
gpupgrade	$HOME/gpAdminLogs/gpupgrade. This directory is archived and renamed after the upgrade finalizes or is reverted, for example: $HOME/gpAdminLogs/gpupgrade-V1uYwZeLL4-20220630T145533. Data migration scripts and script logs are in $HOME/gpupgrade.	all hosts
pg_upgrade	$HOME/gpAdminLogs/gpupgrade/pg_upgrade	all hosts
Greenplum utility logs	$HOME/gpAdminLogs	all hosts
Source cluster pg_logs	$MASTER_DATA_DIRECTORY/pg_log and $SEGMENT_DATA_DIRECTORY/pg_log	master, failed segments, and one successful segment
Target cluster pg_logs	$(gpupgrade config show --target-datadir)/pg_log. The format of this directory name is "upgradeID.contentID".	master, failed segments, and one successful segment

You may use GPMT (Greenplum Magic Tool) version 1.4 or higher to collect the gpupgrade-related logs by running:

gpmt gp_log_collector -with-gpupgrade -c <failedSegment>,<succeededSegment>
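
If GPMT is not available on a host, the home-directory locations from the table can be bundled by hand. The following is a minimal sketch, not part of gpupgrade: the function name collect_gpupgrade_logs is hypothetical, and it only archives $HOME/gpAdminLogs and $HOME/gpupgrade. The pg_log directories under the data directories are not included and would need to be collected separately.

```shell
# Hypothetical helper: bundle the gpupgrade-related directories under $HOME
# into one compressed archive.
# Usage: collect_gpupgrade_logs /absolute/path/to/archive.tar.gz
collect_gpupgrade_logs() {
    archive="$1"
    set --    # reuse the positional parameters as the list of paths to archive
    for dir in gpAdminLogs gpupgrade; do
        # Only include directories that actually exist on this host.
        if [ -d "$HOME/$dir" ]; then
            set -- "$@" "$dir"
        fi
    done
    if [ "$#" -eq 0 ]; then
        echo "no gpupgrade log directories found under $HOME" >&2
        return 1
    fi
    # -C keeps the paths inside the archive relative to $HOME.
    tar -czf "$archive" -C "$HOME" "$@"
}
```

Run the function on each host listed in the table, using a distinct archive name per host so the files can be told apart later.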

Understanding the Format of gpupgrade Errors

Consider the following example:

Error: rpc error: code = Unknown desc = substep "SAVING_SOURCE_CLUSTER_CONFIG": retrieve source configuration: querying gp_segment_configuration: ERROR: Unsupported startup parameter: search_path (SQLSTATE 08P01)

The following table summarizes the meaning of each element of this sample error message:

Error Message Element	Meaning
`Error: rpc error: code = Unknown`	This element is inherent to `gpupgrade`’s underlying protocol and is an implementation detail.
`desc = substep "SAVING_SOURCE_CLUSTER_CONFIG"`	Indicates which substep `gpupgrade` failed on, in this case the `SAVING_SOURCE_CLUSTER_CONFIG` substep.
`retrieve source configuration: querying gp_segment_configuration`	A series of prefixes providing additional context, from less specific to more specific.
`ERROR: Unsupported startup parameter: search_path (SQLSTATE 08P01)`	The actual error; in this example, there was an unsupported parameter `search_path` when querying the database.
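
When scanning many log lines, the substep and the underlying error can be pulled out of a message with sed. This is a minimal sketch with hypothetical function names. The second function assumes the underlying error begins with an upper-case "ERROR:" marker, as in the sample above; errors from other utilities may not follow that pattern.

```shell
# Print the substep named in a gpupgrade error message (prints nothing if absent).
gpupgrade_error_substep() {
    printf '%s\n' "$1" | sed -n 's/.*substep "\([^"]*\)".*/\1/p'
}

# Print the underlying error. Assumption: it starts with an upper-case "ERROR:"
# marker, as in the sample message; other messages print nothing.
gpupgrade_error_cause() {
    printf '%s\n' "$1" | sed -n 's/.*\(ERROR: .*\)$/\1/p'
}
```

Applied to the sample message, the first function prints the substep name and the second prints the text from "ERROR:" to the end of the line.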

General Troubleshooting Steps

When troubleshooting any gpupgrade error, follow the steps below to identify where the error is coming from and what logs you need to collect.

Step 1: Identify High Level Failure
- Which step failed? initialize, execute, finalize, or revert?
- Which specific substep failed? See Understanding the Format of gpupgrade Errors to identify the substep that generated the error.
- What is the upgrade mode? Copy or Link?
Step 2: Identify Failing Host
- Which specific host failed?
- Identify the failing host based on the error message and gpupgrade architecture: did the error originate from the hub process (master) or the agent (segment)?
Step 3: Identify Failing Utility
- Did gpupgrade itself fail, or an underlying utility that gpupgrade calls, such as pg_upgrade, pg_dump, gpinitsystem, gpstart, gpstop, gpaddmirrors?
Step 4: Identify Specific Failure
- Based on the context in which the error occurred, what is the specific error?

Connecting to the Target Cluster

While the upgrade process is ongoing, two Greenplum Database versions are installed on the same hosts. When you need to work on the target cluster, follow these steps to avoid mixing environment variables between source and target systems.

Open a new terminal window and run the following commands:

source /usr/local/greenplum-db-<target-version>/greenplum_path.sh
export MASTER_DATA_DIRECTORY=$(gpupgrade config show --target-datadir)
export PGPORT=$(gpupgrade config show --target-port)

MASTER_DATA_DIRECTORY and PGPORT now point to the target cluster.

Troubleshooting the Initialize Phase

gpstart Failures

During gpupgrade initialize, gpinitsystem can fail with the following errors when calling gpstart:

[CRITICAL]:-gpstart failed. (Reason='') exiting...

stderr='Error: unable to import module: /usr/local/greenplum-db-6.20.3/lib/libpq.so.5: symbol gss_acquire_cred_from, version gssapi_krb5_2_MIT not defined in file libgssapi_krb5.so.2 with link time reference

Other similar errors may include:

/usr/local/greenplum-db-6.19.3/bin/postgres: /usr/local/greenplum-db-5.29.1/lib/libxml2.so.2: no version information available (required by /usr/local/greenplum-db-6.19.3/bin/postgres)

This occurs when the source and target Greenplum environments are mixed, causing utilities to fail. To resolve this, perform the following steps:

- On all segments, remove from .bashrc or .bash_profile files any lines that source greenplum_path.sh or set Greenplum variables.
- Start a new shell and ensure that PATH, LD_LIBRARY_PATH, PYTHONHOME, and PYTHONPATH are clear of any Greenplum values.
- Connect to a segment host and ensure the above values are clear of any Greenplum values.
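
The check on these four variables can be scripted. The sketch below uses a hypothetical function name and assumes that any Greenplum value contains the string "greenplum", as in the /usr/local/greenplum-db-<version> paths shown in the errors above. An installation under a differently named path would need a different pattern.

```shell
# Hypothetical helper: report every variable that still carries a Greenplum value.
# Returns non-zero if at least one of them does.
check_greenplum_env() {
    rc=0
    for name in PATH LD_LIBRARY_PATH PYTHONHOME PYTHONPATH; do
        # Read the variable whose name is stored in $name (empty if unset).
        eval "value=\${$name:-}"
        case "$value" in
            *greenplum*)    # assumption: Greenplum paths contain "greenplum"
                echo "$name still contains a Greenplum value: $value"
                rc=1
                ;;
        esac
    done
    return "$rc"
}
```

Call check_greenplum_env in a freshly started shell on the master host and again after connecting to a segment host. No output and a zero exit status mean the four variables are clear.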

Host Resolution Problems

On a single-node system, particularly in a cloud environment, a grpcDialer failed error may indicate that the local hostname is not resolvable. Verify that each host is resolvable by issuing the following command:

ping -q -c 1 -t 1 `hostname`

RPC Connection Errors

There may be a connection issue between gpupgrade’s various processes if you receive errors such as the following:

Error: rpc error: code = Unavailable desc = transport is closing
Error: connecting to hub on port 7527: context deadline exceeded

gpupgrade runs CLI, hub, and agent processes. For a variety of reasons, the underlying connections between them can break, resulting in the above errors. Try stopping these processes with gpupgrade kill-services, and restarting with gpupgrade restart-services.
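
The stop-then-restart sequence can be wrapped in a small function. This is a sketch rather than an official procedure: the function name and the GPUPGRADE_CMD override are hypothetical (the override only makes a dry run possible), and continuing after a failed kill-services is a design choice of this example.

```shell
# Hypothetical helper: stop the gpupgrade hub and agents, then start them again.
# GPUPGRADE_CMD is an illustrative override, not a gpupgrade setting.
restart_gpupgrade_services() {
    gpupgrade_cmd="${GPUPGRADE_CMD:-gpupgrade}"
    # Stop any running hub and agent processes; keep going if this step fails.
    "$gpupgrade_cmd" kill-services || echo "kill-services reported an error; continuing" >&2
    # Start the hub and agent processes again.
    "$gpupgrade_cmd" restart-services
}
```

Setting GPUPGRADE_CMD=echo before calling the function prints the two subcommands instead of running them, which is a quick way to review the sequence first.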

Troubleshooting the Execute Phase

Failed to Connect to the Upgrade Hub

If you encounter the error Failed to connect to the upgrade hub:

- You must run gpupgrade initialize before you can run gpupgrade execute. If you already ran gpupgrade initialize, try running gpupgrade restart-services to restart the hub and agent processes.
- Verify that gpupgrade is installed in the same path on all hosts in the cluster.

Failed to Start an Agent

If you encounter the error Failed to start an agent on a segment host, verify that:

- The segment hosts are up and the gpadmin user can log in with ssh.
- gpupgrade is installed in the same location on all hosts in the Greenplum Database cluster.
- No other applications are using the agent port (by default 6416) on any host in the cluster.

Hub Process is Down

If the gpupgrade_hub process fails to start or crashes:

- Check the $HOME/gpAdminLogs/gpupgrade/hub.log file for messages that identify the problem causing the failure.
- Check that no other applications are using the hub port (by default 7527) on the master host.
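
One way to check whether a port is already taken is to inspect the listening TCP sockets reported by ss -ltn. The helper below is a sketch with a hypothetical name. It reads that output on standard input and assumes the usual Linux column layout, in which the fourth column is "Local Address:Port". The same check applies to the agent port (by default 6416) on the segment hosts.

```shell
# Hypothetical helper: read `ss -ltn` output on stdin and succeed if some socket
# is listening on the given port.
# Usage: ss -ltn | port_is_listening 7527 && echo "port 7527 is in use"
port_is_listening() {
    awk -v port="$1" '
        NR > 1 {                          # skip the header line
            n = split($4, parts, ":")     # column 4 is "Local Address:Port"
            if (parts[n] == port) found = 1
        }
        END { exit !found }               # exit 0 only when a match was seen
    '
}
```

The helper only reports that the port is taken. If the listener is a leftover gpupgrade process, stopping it with gpupgrade kill-services frees the port.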

Target Cluster Fails to Start

If the target cluster is not starting, check the gpinitsystem and gpstart log files in the $HOME/gpAdminLogs directory.

pg_upgrade Verbose Logging

To enable verbose logging for pg_upgrade, re-run gpupgrade execute in verbose mode: gpupgrade execute --pg-upgrade-verbose --verbose.

Troubleshooting the Finalize Phase

Host Key Verification Problem

In link mode, when upgrading the mirror segments, the following error may occur:

Error: Finalize: rpc error: code = Unknown desc = substep "UPGRADE_MIRRORS": 4 errors occurred:
        * rpc error: code = Unknown desc = 2 errors occurred:
        * Host key verification failed.
rsync: connection unexpectedly closed (0 bytes received so far) [sender]
rsync error: unexplained error (code 255) at io.c(226) [sender=3.1.2]

        * Host key verification failed.
rsync: connection unexpectedly closed (0 bytes received so far) [sender]
rsync error: unexplained error (code 255) at io.c(226) [sender=3.1.2]

To resolve this, run the following commands in a separate shell:

source /usr/local/greenplum-db-<target-version>/greenplum_path.sh
gpssh-exkeys -f all_hosts
gpstart -a -d $(gpupgrade config show --target-datadir)

Stopping Detached gpupgrade Processes

If the gpupgrade finalize command does not finish successfully, it is possible that the hub and agent processes are still running. Shut them down using the following command:

gpupgrade kill-services

If that command fails, you may stop the processes manually on every host with the pkill gpupgrade command.