This topic provides guidance on how to troubleshoot and debug issues you may encounter while using gpupgrade. To address an issue, you must know the different logs and where they are located, understand the format of a gpupgrade log message, and follow basic steps to define and understand what is happening in your environment.
See below for the most common errors encountered during the different gpupgrade command phases.
The following table illustrates the different types of logs and their locations. Collect the logs from the specified hosts when troubleshooting an error.
Log Type | Log Location | Log Host(s) |
---|---|---|
gpupgrade_config | $HOME/gpupgrade/gpupgrade_config | master |
gpupgrade | $HOME/gpAdminLogs/gpupgrade (archived and renamed after the upgrade finalizes or is reverted, for example $HOME/gpAdminLogs/gpupgrade-V1uYwZeLL4-20220630T145533); $HOME/gpupgrade for data migration scripts and script logs | all hosts |
pg_upgrade | $HOME/gpAdminLogs/gpupgrade/pg_upgrade | all hosts |
Greenplum utility logs | $HOME/gpAdminLogs | all hosts |
Source cluster pg_logs | $MASTER_DATA_DIRECTORY/pg_log, $SEGMENT_DATA_DIRECTORY/pg_log | master, failed segments, and one successful segment |
Target cluster pg_logs | $(gpupgrade config show --target-datadir)/pg_log (the format of this directory name is "upgradeID.contentID") | master, failed segments, and one successful segment |
You can use GPMT (Greenplum Magic Tool) version 1.4 or higher to collect the relevant gpupgrade logs by running:
gpmt gp_log_collector -with-gpupgrade -c <failedSegment>,<succeededSegment>
Consider the following example of a gpupgrade error message:
Error: rpc error: code = Unknown desc = substep "SAVING_SOURCE_CLUSTER_CONFIG": retrieve source configuration: querying gp_segment_configuration: ERROR: Unsupported startup parameter: search_path (SQLSTATE 08P01)
The following table summarizes the meaning of each element of this sample error message:
Error Message Element | Meaning |
---|---|
Error: rpc error: code = Unknown | This element is inherent to gpupgrade's underlying protocol and is an implementation detail. |
desc = substep "SAVING_SOURCE_CLUSTER_CONFIG" | Indicates which substep gpupgrade failed on; in this case, the SAVING_SOURCE_CLUSTER_CONFIG substep. |
retrieve source configuration: querying gp_segment_configuration | A series of prefixes providing additional context, from less specific to more specific. |
ERROR: Unsupported startup parameter: search_path (SQLSTATE 08P01) | The actual error; in this example, the unsupported parameter search_path was passed when querying the database. |
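Because the substep name and any SQLSTATE code sit at predictable places in the message, they can be extracted mechanically, for example when scanning many log lines. The following is a minimal POSIX shell sketch that assumes the message layout shown in the example above; gpupgrade_substep and gpupgrade_sqlstate are hypothetical helper names, not part of gpupgrade.

```shell
#!/bin/sh
# Hypothetical helpers (not part of gpupgrade) that pick apart an error
# message laid out like the example above.

# Print the substep name from a line containing: substep "NAME"
gpupgrade_substep() {
  printf '%s\n' "$1" | sed -n 's/.*substep "\([A-Z_]*\)".*/\1/p'
}

# Print the five-character SQLSTATE code, if the innermost error came from the database.
gpupgrade_sqlstate() {
  printf '%s\n' "$1" | sed -n 's/.*(SQLSTATE \([0-9A-Z]\{5\}\)).*/\1/p'
}

msg='Error: rpc error: code = Unknown desc = substep "SAVING_SOURCE_CLUSTER_CONFIG": retrieve source configuration: querying gp_segment_configuration: ERROR: Unsupported startup parameter: search_path (SQLSTATE 08P01)'
gpupgrade_substep "$msg"   # prints SAVING_SOURCE_CLUSTER_CONFIG
gpupgrade_sqlstate "$msg"  # prints 08P01
```

The SQLSTATE helper prints nothing when the innermost error did not come from the database, which is itself a hint that the failure lies in gpupgrade or one of the utilities it calls.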
When troubleshooting any gpupgrade error, follow the steps below to identify where the error is coming from and which logs you need to collect.
Step 1: Identify High Level Failure
Step 2: Identify Failing Host
Consider the gpupgrade architecture: did the error originate from the hub process (master) or an agent process (segment)?
Step 3: Identify Failing Utility
Did gpupgrade itself fail, or an underlying utility that gpupgrade calls, such as pg_upgrade, pg_dump, gpinitsystem, gpstart, gpstop, or gpaddmirrors?
Step 4: Identify Specific Failure
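Once the failing substep is known, one way to narrow down the failing host (Step 2) is a recursive search of the gpupgrade log directory listed in the log table. A minimal sketch; find_logs_mentioning is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: list the files under a log directory that mention a
# given string, such as the substep name taken from the error message.
# $1: directory to search, $2: fixed string to look for
find_logs_mentioning() {
  grep -rlF -- "$2" "$1" 2>/dev/null
}

# Example: which gpupgrade logs on this host mention the failing substep?
find_logs_mentioning "$HOME/gpAdminLogs/gpupgrade" SAVING_SOURCE_CLUSTER_CONFIG
```

To search every host at once, the same grep can be run through gpssh, for example gpssh -f all_hosts 'grep -rl SAVING_SOURCE_CLUSTER_CONFIG ~/gpAdminLogs/gpupgrade', where all_hosts is an assumed host file listing the cluster hosts. gpssh prefixes each output line with the host name, so the output shows directly which hosts logged the substep.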
While the upgrade process is ongoing, two Greenplum Database versions are installed on the same hosts. When you need to work on the target cluster, follow these steps to avoid mixing environment variables between source and target systems.
Open a new terminal window and run the following commands:
source /usr/local/greenplum-db-<target-version>/greenplum_path.sh
export MASTER_DATA_DIRECTORY=$(gpupgrade config show --target-datadir)
export PGPORT=$(gpupgrade config show --target-port)
MASTER_DATA_DIRECTORY and PGPORT now point to the target cluster.
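Before running anything against the target cluster, it can help to confirm that the new shell really points at the target installation. The sketch below assumes the target is installed under /usr/local/greenplum-db-<target-version> as in the commands above, and relies on greenplum_path.sh exporting GPHOME; verify_target_env is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: confirm that the current shell points at the target cluster.
# Assumes the target is installed under /usr/local/greenplum-db-<version>, as in
# the commands above, and that greenplum_path.sh has exported GPHOME.
# $1: target Greenplum version, for example 6.20.3
verify_target_env() {
  expected_gphome="/usr/local/greenplum-db-$1"
  rc=0
  if [ "${GPHOME:-}" != "$expected_gphome" ]; then
    echo "GPHOME is '${GPHOME:-}', expected '$expected_gphome'" >&2
    rc=1
  fi
  if [ -z "${MASTER_DATA_DIRECTORY:-}" ] || [ -z "${PGPORT:-}" ]; then
    echo "MASTER_DATA_DIRECTORY and PGPORT must both be set" >&2
    rc=1
  fi
  return "$rc"
}

# Example, once the target cluster is running:
# verify_target_env <target-version> && psql -d postgres -c 'SELECT version();'
```

If GPHOME points at a symlink or a non-default prefix, adjust expected_gphome accordingly; the comparison is a plain string match.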
During gpupgrade initialize, gpinitsystem can fail with the following errors when calling gpstart:
[CRITICAL]:-gpstart failed. (Reason='') exiting...
stderr='Error: unable to import module: /usr/local/greenplum-db-6.20.3/lib/libpq.so.5: symbol gss_acquire_cred_from, version gssapi_krb5_2_MIT not defined in file libgssapi_krb5.so.2 with link time reference
Other similar errors may include:
/usr/local/greenplum-db-6.19.3/bin/postgres: /usr/local/greenplum-db-5.29.1/lib/libxml2.so.2: no version information available (required by /usr/local/greenplum-db-6.19.3/bin/postgres)
This occurs when the source and target Greenplum environments are mixed, causing utilities to fail. To resolve this, perform the following steps:
On all segment hosts, remove any lines from the .bashrc or .bash_profile files that source greenplum_path.sh or set Greenplum variables.
Start a new shell and ensure that PATH, LD_LIBRARY_PATH, PYTHONHOME, and PYTHONPATH are clear of any Greenplum values.
Connect to a segment host and ensure that the same variables are clear of any Greenplum values there as well.
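The variable check in the last two steps can be scripted. A minimal POSIX shell sketch, assuming Greenplum installation paths contain the string greenplum (true for the /usr/local/greenplum-db-<version> paths shown above); check_clean_env is a hypothetical name. Run it in a fresh shell on the master and on each segment host:

```shell
#!/bin/sh
# Hypothetical helper: report any of the variables named in the steps above
# that still contain a Greenplum installation path.
# Assumes Greenplum paths contain the substring "greenplum".
check_clean_env() {
  status=0
  for var in PATH LD_LIBRARY_PATH PYTHONHOME PYTHONPATH; do
    # Indirect lookup of the variable named by $var (POSIX sh has no ${!var}).
    eval "value=\${$var:-}"
    case "$value" in
      *greenplum*)
        echo "$var still references Greenplum: $value"
        status=1
        ;;
    esac
  done
  return "$status"
}

check_clean_env && echo "environment is clean"
```

To locate the lines that need removing, grep -n greenplum ~/.bashrc ~/.bash_profile lists the candidate lines with their line numbers.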
When running on a single-node system, particularly in a cloud environment, a grpcDialer failed error can mean that your local hostname is not resolvable. Verify that each host is resolvable by running the following command:
ping -q -c 1 -t 1 `hostname`
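ping can also fail for reasons unrelated to name resolution, for example when a cloud firewall blocks ICMP. As an alternative, getent hosts queries the system's name-service configuration (such as /etc/hosts and DNS) without sending any packets to the host. A minimal sketch, assuming a Linux host where getent is available; host_resolves is a hypothetical helper:

```shell
#!/bin/sh
# Hypothetical helper: succeed if the given name resolves through the system's
# name-service configuration (for example /etc/hosts or DNS).
host_resolves() {
  getent hosts "$1" >/dev/null
}

if host_resolves "$(hostname)"; then
  echo "$(hostname) resolves"
else
  echo "$(hostname) does not resolve" >&2
fi
```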
There may be a connection issue between gpupgrade's various processes if you receive errors such as the following:
Error: rpc error: code = Unavailable desc = transport is closing
Error: connecting to hub on port 7527: context deadline exceeded
gpupgrade runs CLI, hub, and agent processes. For a variety of reasons, the underlying connections between them can break, resulting in the above errors. Try stopping these processes with gpupgrade kill-services and restarting them with gpupgrade restart-services.
If you encounter the error Failed to connect to the upgrade hub:
You must run gpupgrade initialize before you can run gpupgrade execute. If you already ran gpupgrade initialize, try running gpupgrade restart-services to restart the hub and agent processes.
Verify that gpupgrade is installed in the same path on all hosts in the cluster.
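One way to compare install locations is to run command -v gpupgrade on every host through gpssh, which prefixes each output line with [hostname], and check that only one distinct path comes back. A sketch; same_install_path is a hypothetical helper and all_hosts is an assumed host file listing every host in the cluster:

```shell
#!/bin/sh
# Hypothetical helper: read gpssh-style output ("[host] /path/to/gpupgrade")
# on stdin and succeed only if every host reported the same path.
same_install_path() {
  # Strip the "[host]" prefix and trailing whitespace (including any \r),
  # then count the distinct paths that remain.
  count=$(sed 's/^\[[^]]*\] *//; s/[[:space:]]*$//' | sort -u | wc -l | tr -d ' ')
  [ "$count" -eq 1 ]
}

# Example: "|| echo MISSING" makes hosts without gpupgrade report a distinct value.
# gpssh -f all_hosts 'command -v gpupgrade || echo MISSING' | same_install_path && echo "same path on all hosts"
```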
If you encounter the error Failed to start an agent on a segment host, make sure that:
The segment hosts are up and the gpadmin user can log in with ssh.
gpupgrade is installed in the same location on all hosts in the Greenplum Database cluster.
No other applications are using the agent port (by default 6416) on any host in the cluster.
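To see whether something is already listening on the agent port, the output of ss -ltn (listening TCP sockets, numeric ports) can be checked on each host. A sketch, assuming a Linux host with the iproute2 ss utility, whose fourth column is the local address:port; port_in_use is a hypothetical helper. Run it before gpupgrade starts its agents or after gpupgrade kill-services, otherwise the match may simply be gpupgrade's own agent:

```shell
#!/bin/sh
# Hypothetical helper: succeed if `ss -ltn` output (read on stdin) shows a
# listening socket on the given TCP port.
# Assumes the iproute2 column layout, where column 4 is "local-address:port".
# $1: port number, for example 6416 (agent) or 7527 (hub)
port_in_use() {
  awk -v port="$1" 'NR > 1 && $4 ~ (":" port "$") { found = 1 } END { exit !found }'
}

if ss -ltn | port_in_use 6416; then
  echo "port 6416 is already in use on $(hostname)"
fi
```

Anchoring the match at the end of the column keeps a port such as 64160 from being reported as 6416.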
If the gpupgrade_hub process fails to start or crashes:
Check the $HOME/gpAdminLogs/gpupgrade/hub.log file for messages that identify the cause of the failure.
Check that no other applications are using the hub port (by default 7527) on the master host.
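To pull the most recent problem lines out of hub.log, a case-insensitive grep is usually enough. A minimal sketch; the function name recent_hub_errors and the keyword list (error, panic, fatal) are assumptions, not a documented log format:

```shell
#!/bin/sh
# Hypothetical helper: show the last 20 lines of hub.log that look like problems,
# with their line numbers. The keyword list is an assumption.
# $1: path to hub.log (defaults to the location given above)
recent_hub_errors() {
  log=${1:-$HOME/gpAdminLogs/gpupgrade/hub.log}
  grep -n -i -E 'error|panic|fatal' "$log" | tail -n 20
}

recent_hub_errors
```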
If the target cluster is not starting, check the gpinitsystem and gpstart log files in the $HOME/gpAdminLogs directory.
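Greenplum utilities typically write one log file per day named after the utility, for example gpstart_20220630.log, so the newest file for a utility is usually the one to read. A sketch under that naming assumption; latest_utility_log and log_problems are hypothetical helpers, and the [CRITICAL]/[ERROR] tags match the message format shown in the gpstart error above:

```shell
#!/bin/sh
# Hypothetical helpers for reading Greenplum utility logs.
# Assumes log files are named <utility>_<date>.log, e.g. gpstart_20220630.log.

# Print the most recently modified log file for a utility.
# $1: log directory, $2: utility name such as gpinitsystem or gpstart
latest_utility_log() {
  ls -t "$1"/"$2"_*.log 2>/dev/null | head -n 1
}

# Print the [CRITICAL] and [ERROR] lines of a log file, with line numbers.
log_problems() {
  grep -n -E '\[(CRITICAL|ERROR)\]' "$1"
}

for util in gpinitsystem gpstart; do
  log=$(latest_utility_log "$HOME/gpAdminLogs" "$util")
  [ -n "$log" ] && log_problems "$log"
done
```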
To enable verbose logging for pg_upgrade, rerun gpupgrade execute in verbose mode:
gpupgrade execute --pg-upgrade-verbose --verbose
In link mode, the following error may occur when upgrading the mirror segments:
Error: Finalize: rpc error: code = Unknown desc = substep "UPGRADE_MIRRORS": 4 errors occurred:
* rpc error: code = Unknown desc = 2 errors occurred:
* Host key verification failed.
rsync: connection unexpectedly closed (0 bytes received so far) [sender]
rsync error: unexplained error (code 255) at io.c(226) [sender=3.1.2]
* Host key verification failed.
rsync: connection unexpectedly closed (0 bytes received so far) [sender]
rsync error: unexplained error (code 255) at io.c(226) [sender=3.1.2]
To resolve this, run the following commands in a separate shell, where all_hosts is a file listing every host in the cluster:
source /usr/local/greenplum-db-<target-version>/greenplum_path.sh
gpssh-exkeys -f all_hosts
gpstart -a -d $(gpupgrade config show --target-datadir)
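Host key verification failed means ssh could not confirm a host's identity without prompting. One way to confirm the fix, sketched below under the assumption that all_hosts lists one host name per line, is to attempt a non-interactive ssh (BatchMode=yes, which fails instead of prompting) to every host. check_batch_ssh and the SSH override variable are hypothetical; since the rsync in the error runs between hosts, the check is only conclusive when run from each host in turn:

```shell
#!/bin/sh
# Hypothetical helper: try a non-interactive ssh to every host in a host file.
# BatchMode=yes makes ssh fail instead of prompting, so an unknown host key
# shows up here as a failure rather than as a prompt.
# $1: host file with one host name per line (for example all_hosts)
# SSH: command to use, defaults to ssh (overridable for testing)
check_batch_ssh() {
  ssh_cmd=${SSH:-ssh}
  failed=0
  while IFS= read -r host; do
    [ -z "$host" ] && continue
    # -n keeps ssh from consuming the host list on stdin.
    if ! "$ssh_cmd" -n -o BatchMode=yes "$host" true 2>/dev/null; then
      echo "cannot ssh to $host without a prompt"
      failed=1
    fi
  done < "$1"
  return "$failed"
}

# Example: check_batch_ssh all_hosts && echo "ssh to all hosts works non-interactively"
```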
If the gpupgrade finalize command does not finish successfully, the hub and agent processes might still be running. Shut them down with the following command:
gpupgrade kill-services
If that command fails, you can stop the processes manually on all hosts with the pkill gpupgrade command.
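The two-step shutdown above can be wrapped so that the pkill fallback runs on every host through gpssh only when kill-services fails. A sketch; stop_gpupgrade_processes is a hypothetical helper and all_hosts is an assumed host file listing every host in the cluster:

```shell
#!/bin/sh
# Hypothetical helper: stop the gpupgrade hub and agents, falling back to
# pkill on every host if kill-services fails.
# $1: host file listing every host in the cluster (assumed name: all_hosts)
stop_gpupgrade_processes() {
  hostfile=$1
  if gpupgrade kill-services; then
    return 0
  fi
  echo "gpupgrade kill-services failed; running pkill gpupgrade on all hosts" >&2
  gpssh -f "$hostfile" 'pkill gpupgrade'
}

# Example:
# stop_gpupgrade_processes all_hosts
```

pkill exits with a non-zero status on hosts where no gpupgrade process was running, which is harmless here.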