During this phase you run the gpupgrade initialize command. This phase prepares the source cluster for the upgrade and initializes the target cluster. Before proceeding, ensure you have reviewed and completed the pre-upgrade phase tasks.
IMPORTANT: Start the initialize phase during a scheduled downtime. Plan and notify all appropriate groups and users that the Greenplum Database cluster will be off-line for an extended period.
The following table summarizes the cluster state before and after gpupgrade initialize:
| | Before Initialize (Source) | Before Initialize (Target) | After Initialize (Source) | After Initialize (Target) |
|---|---|---|---|---|
| Master | UP | Non Existent | UP | Initialized but DOWN |
| Standby | UP | Non Existent | UP | Non Existent |
| Primaries | UP | Non Existent | UP | Initialized but DOWN |
| Mirrors | UP | Non Existent | UP | Non Existent |
The gpupgrade initialize command performs the following steps:

- Starts the gpupgrade hub process on the master host.
- Starts the gpupgrade agents on the master and segment hosts, one agent process on each host.
- Runs pg_upgrade --check to check for known migration issues between the source and target Greenplum Database versions.
IMPORTANT: The minimum supported Greenplum Database 5.x version is 5.29.6. Upgrade the source cluster to 5.29.6 or the latest 5.29.x version. For more details on the upgrade process, see Upgrading to Greenplum Database 5.29.6 in the Greenplum documentation.
Ensure that the source Greenplum cluster is in a healthy state, with the standby host and mirrors in their preferred roles. If they are not, gpupgrade initialize will fail during the consistency checks. For further details, see gpstate. Verify the cluster state:
gpstate -e
For incremental recovery, run:
gprecoverseg -a
For full recovery, run:
gprecoverseg -F
To rebalance segments to their preferred roles, run:
gprecoverseg -r
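Put together, a minimal pre-initialize health check might look like the following sketch; it assumes the source cluster's greenplum_path.sh has already been sourced in the current shell:
# Report segments that are down or not in their preferred roles
gpstate -e
# Report the standby master status
gpstate -f
# If segments are down, recover them and then rebalance to preferred roles
gprecoverseg -a
gprecoverseg -r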
Check for sufficient disk space on the master and on all hosts. gpupgrade initialize will check for 60% available space on each host in copy mode, or 20% in link mode.
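For a quick manual look at free space before running the check, you can run df across the cluster with gpssh; in this sketch, hostfile_all and /data are placeholders for your own host file and data filesystem:
# Rough manual check of free space on every host (placeholders: hostfile_all, /data)
gpssh -f hostfile_all 'df -h /data'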
Because gpupgrade performs major-version in-place upgrades, it is easy to mix source and target environments, causing Greenplum utilities to fail. To prevent such failures, perform the following steps:

- On all segment hosts, remove from .bashrc or .bash_profile any lines that source greenplum_path.sh or set Greenplum variables.
- Start a new shell and ensure that PATH, LD_LIBRARY_PATH, PYTHONHOME, and PYTHONPATH are clear of any Greenplum values.
- ssh to a segment host and verify that the same variables are clear of any Greenplum values, as shown in the example below.
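One way to confirm the environment is clean, assuming your Greenplum installations live under paths containing the string "greenplum" (for example /usr/local/greenplum-db-*), is to run the following checks; each command should print nothing:
# Look for Greenplum entries in the path-style variables
echo "$PATH:$LD_LIBRARY_PATH:$PYTHONHOME:$PYTHONPATH" | tr ':' '\n' | grep -i greenplum
# Look for other Greenplum-related variables such as GPHOME or MASTER_DATA_DIRECTORY
env | grep -i -e greenplum -e gphome -e master_data_directory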
During the initialization phase, run the pre-initialize migration scripts, edit the gpupgrade configuration file, and then use that file as a parameter when you run gpupgrade initialize.
In the upgrade downtime window, run the gpupgrade-migration-sql-executor.bash pre-initialize script. For details on the scripts, see About the Migration Scripts and Executing the SQL Migration Scripts.
The gpupgrade initialize command requires a configuration file as input. Review the example gpupgrade_config file in the directory where you extracted the downloaded gpupgrade utility.
Copy the example file to the $HOME/gpupgrade/ location and edit it according to your environment:
cp /usr/local/bin/greenplum/gpupgrade/gpupgrade_config $HOME/gpupgrade/
NOTE: The source_master_port, source_gphome, and target_gphome parameters are blank and must be set to your environment's values. If you are upgrading with extensions whose installation location is outside of $target_gphome, you must set the dynamic_library_path parameter; also ensure that the latest supported version of each extension has been installed on the source cluster.
The remaining parameters are commented out and have default values. Change these values as necessary for your upgrade scenario. See the gpupgrade_config file reference page for further details.
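As an illustration only, the required settings in a copied gpupgrade_config might end up looking something like the fragment below; the paths, port, and exact syntax here are placeholders, so follow the comments in the shipped example file and the gpupgrade_config reference rather than this sketch:
# Hypothetical values -- replace with your environment's installation paths and master port
source_gphome = /usr/local/greenplum-db-<source>
target_gphome = /usr/local/greenplum-db-<target>
source_master_port = 5432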
WARNING: If using link mode and the source Greenplum cluster does not have a standby host and mirrors, gpupgrade generates a warning:
The source cluster does not have standby and/or mirror segments.
After “gpupgrade execute” has been run, there will be no way to
return the cluster to its original state using “gpupgrade revert”.
For source clusters with preinstalled extensions, run the command as described below, but expect the first run to fail with an error. For details, see Running Initialize with Extensions.
To run initialize, use a command like:
gpupgrade initialize --file | -f PATH/TO/gpupgrade_config [--verbose | -v] [--automatic | -a]
Where:

- --file | -f specifies the configuration file location
- --verbose | -v enables verbose output
- --automatic | -a suppresses the summary and confirmation dialog

For example:
gpupgrade initialize --file $HOME/gpupgrade/gpupgrade_config --verbose
The utility displays a summary message and waits for user confirmation before proceeding:
You are about to initialize a major-version upgrade of Greenplum.
This should be done only during a downtime window.
...
Before proceeding, ensure the following have occurred:
- Take a backup of the source Greenplum cluster
- [Generate] and [execute] the data migration "start" scripts
- Run gpcheckcat to ensure the source catalog has no inconsistencies
- Run gpstate -e to ensure the source cluster's segments are up and in preferred roles
To skip this summary, use the --automatic | -a flag.
Continue with gpupgrade initialize? Yy|Nn:
The utility proceeds through various background steps, and displays its progress on the screen:
Initialize in progress.
Starting gpupgrade hub process... [IN PROGRESS]
Saving source cluster configuration... [COMPLETE]
Starting gpupgrade agent processes... [COMPLETE]
Checking environment... [COMPLETE]
Checking disk space... [COMPLETE]
Generating target cluster configuration... [COMPLETE]
Creating target cluster... [COMPLETE]
Stopping target cluster... [COMPLETE]
Backing up target master... [COMPLETE]
Running pg_upgrade checks... [COMPLETE]
Initialize completed successfully.
NEXT ACTIONS
------------
To proceed with the upgrade, run "gpupgrade execute" followed by "gpupgrade finalize".
To return the cluster to its original state, run "gpupgrade revert".
The status of each step can be COMPLETE, FAILED, SKIPPED, or IN PROGRESS. SKIPPED indicates that the command has been run before and the step has already been executed.
These steps are further described below:

- Starting gpupgrade hub process: Starts the gpupgrade hub process on the master node.
- Saving source cluster configuration: Creates gpupgrade state files to hold the source configuration.
- Checking disk space: The default requirement is 60% free disk space on each host; if link mode is specified, the requirement is 20%. You can change the requirement with the disk_free_ratio parameter. To skip this check, set disk_free_ratio: 0.0 in the configuration file.
- Generating target cluster configuration: Populates gpupgrade state files with the target cluster details.
- Creating target cluster: Initializes the target cluster's postgres instances. See Creating Target Cluster Directories for a description of the target cluster data directories.
- Backing up target master: Backs up the target master data directory so that the cluster can be reverted after gpupgrade execute if any issues occur.
- Running pg_upgrade checks: Runs a thorough list of Greenplum Database checks, see Initialize Phase pg_upgrade Checks.

To resolve any [FAILED] steps, review the screen error comments and recommendations, the server log files in the $HOME/gpAdminLogs directory (including the gpupgrade initialize log file in the gpAdminLogs/gpupgrade/ directory), and discuss with the VMware Greenplum team that is supporting you during the upgrade.
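For example, to inspect the initialize log from the master host:
# List the most recent gpupgrade logs, then open the initialize log
ls -lt $HOME/gpAdminLogs/gpupgrade/
less $HOME/gpAdminLogs/gpupgrade/initialize.log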
If the source cluster contains extensions, the first time you run the initialize command it will fail because the extensions are not yet installed in the target cluster. The generated error message is similar to:
Running pg_upgrade checks... [FAILED]
Error: initialize create cluster: InitializeCreateCluster: rpc error: code = Unknown desc = substep "CHECK_UPGRADE": 4 errors occurred:
* check master: Checking for presence of required libraries fatal
Your installation references loadable libraries that are missing from the
new installation. You can add these libraries to the new installation,
or remove the functions using them from the old installation. A list of
problem libraries is in the file:
/home/gpadmin/.gpupgrade/pg_upgrade/seg-1/loadable_libraries.txt
Resolve the error by performing these steps:
To avoid mixing the source and target Greenplum environment variables, open a new terminal and start the target cluster:
source /usr/local/greenplum-db-<target>/greenplum_path.sh
export MASTER_DATA_DIRECTORY=$(gpupgrade config show --target-datadir)
export PGPORT=$(gpupgrade config show --target-port)
gpstart -a
Install the same version of the extension that is on the source cluster in the target cluster. See each extension’s documentation for installation specifics.
For GPText customers, copy the following GPText files from the source cluster $MASTER_DATA_DIRECTORY to the target cluster $MASTER_DATA_DIRECTORY:
cp $MASTER_DATA_DIRECTORY/{gptext.conf,gptxtenvs.conf,zoo_cluster.conf} /home/gpadmin/.gpupgrade/master.bak/
Note: Do NOT alter any of the files in the .gpupgrade directory.
For PostGIS customers, drop the following views, which contain deprecated name datatypes:
DROP VIEW geography_columns;
DROP VIEW raster_columns;
DROP VIEW raster_overviews;
Make a note to re-create these views after the upgrade is complete, following the PostGIS post-upgrade steps.
Stop the target cluster by issuing this command: gpstop -a
Re-run the initialize command.
When the gpupgrade initialize command creates the target Greenplum cluster, it creates data directories for the target master segment instance and primary segment instances on the master and segment hosts, alongside the source cluster data directories. This applies to both copy and link mode.
The target cluster data directory names have this format:
<segment-prefix>.<hash-code>.<content-id>
Where:

- <segment-prefix> is the segment prefix string specified when the source Greenplum Database system was initialized. This is typically gpseg.
- <hash-code> is a 10-character string generated by gpupgrade. The hash code is the same for all segment data directories belonging to the new target Greenplum cluster. In addition to distinguishing target directories from the source data directories, the unique hash code tags all data directories belonging to the current gpupgrade instance.
- <content-id> is the database content id for the segment. The master segment instance content id is always -1. The primary segment content ids are numbered consecutively from 0 to the number of primary segments.

For example, if the $MASTER_DATA_DIRECTORY environment variable value is /data/master/gpseg-1/, the data directory for the target master is /data/master/gpseg.AAAAAAAAAA.-1, where AAAAAAAAAA is the hash code gpupgrade generated for this target cluster. Primary segment data directories for the target cluster are located on the same host and at the same path as their source cluster counterparts. If the first primary segment for the source cluster is on host sdw1 in the directory /data/primary/gpseg0, the target cluster segment directory is on the same host at /data/primary/gpseg.AAAAAAAAAA.0.
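Using the example paths above, you could list the new target directories and confirm the shared hash code; hostfile_all is a placeholder for your own host file:
# Target master data directory (the generated hash code replaces the glob)
ls -d /data/master/gpseg.*.-1
# Target primary data directories on the segment hosts
gpssh -f hostfile_all 'ls -d /data/primary/gpseg.*.[0-9]*'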
When the gpupgrade finalize command has completed, source cluster data directory names are renamed as:
<segment-prefix>.<hash-code>.<content-id>.old
and the target cluster data directory names are renamed to the original source directory names:
<segment-prefix><content-id>
This section explains the format of gpupgrade error messages.
Consider the following example:
Error: rpc error: code = Unknown desc = substep "SAVING_SOURCE_CLUSTER_CONFIG": retrieve source configuration: querying gp_segment_configuration: ERROR: Unsupported startup parameter: search_path (SQLSTATE 08P01)
The following table summarizes the meaning of each element of this sample error message:
| Error Message Element | Meaning |
|---|---|
| Error: rpc error: code = Unknown | This element is inherent to gpupgrade's underlying protocol and is an implementation detail. |
| desc = substep "SAVING_SOURCE_CLUSTER_CONFIG" | Indicates which substep gpupgrade failed on, in this case the "SAVING_SOURCE_CLUSTER_CONFIG" substep. |
| retrieve source configuration: querying gp_segment_configuration | A series of prefixes providing additional context, from less specific to more specific. |
| ERROR: Unsupported startup parameter: search_path (SQLSTATE 08P01) | The actual error; in this example, there was an unsupported parameter "search_path" when querying the database. |
If you cannot create the gpAdminLogs/gpupgrade/initialize.log file, verify that you are logged in as gpadmin and that all files in the gpAdminLogs directory are owned by gpadmin and are writable by gpadmin.
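A quick way to verify, and if necessary repair, the ownership is shown below; run the chown as a user with sufficient privileges, and note that the gpadmin group name is an assumption:
# Confirm ownership and permissions of the log directories
ls -ld $HOME/gpAdminLogs $HOME/gpAdminLogs/gpupgrade
# If any files belong to another user, reset ownership (may require sudo)
sudo chown -R gpadmin:gpadmin /home/gpadmin/gpAdminLogs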
During initialize, gpinitsystem can fail when calling gpstart with errors such as the following:
[CRITICAL]:-gpstart failed. (Reason='') exiting...
stderr='Error: unable to import module: /usr/local/greenplum-db-6.20.3/lib/libpq.so.5: symbol gss_acquire_cred_from, version gssapi_krb5_2_MIT not defined in file libgssapi_krb5.so.2 with link time reference
Other similar errors may include:
/usr/local/greenplum-db-6.19.3/bin/postgres: /usr/local/greenplum-db-5.29.1/lib/libxml2.so.2: no version information available (required by /usr/local/greenplum-db-6.19.3/bin/postgres)
This occurs when the source and target Greenplum environments are mixed, causing utilities to fail. To resolve this, perform the following steps:
- On all segment hosts, remove from .bashrc or .bash_profile any lines that source greenplum_path.sh or set Greenplum variables.
- Start a new shell and ensure that PATH, LD_LIBRARY_PATH, PYTHONHOME, and PYTHONPATH are clear of any Greenplum values.
- ssh to a segment host and verify that the same variables are clear of any Greenplum values.
If your Greenplum 5.x cluster has installed extensions, such as Greenplum Streaming Server, PL/Container, or PL/Java, the gpupgrade initialize checks will fail until you reinstall the missing extensions on the target Greenplum Database. The error message looks similar to the following:
Running pg_upgrade checks... [FAILED]
Error: initialize create cluster: InitializeCreateCluster: rpc error: code = Unknown desc = substep "CHECK_UPGRADE": 4 errors occurred:
* check master: Checking for presence of required libraries
fatal
Your installation references loadable libraries that are missing from the
new installation. You can add these libraries to the new installation,
or remove the functions using them from the old installation. A list of
problem libraries is in the file:
/home/gpadmin/.gpupgrade/pg_upgrade/seg-1/loadable_libraries.txt
When running on a single-node system, particularly in a cloud environment, if you encounter a grpcDialer failed: error, it is possible that your local hostname is not resolvable. Verify that each host is resolvable by issuing the following command:
$ ping -q -c 1 -t 1 `hostname`
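If the ping fails, you can check how the local hostname resolves; an empty result from getent usually means the name needs an /etc/hosts entry or a DNS fix:
# Show the local hostname and how it resolves
hostname
getent hosts "$(hostname)"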
There may be a connection issue between gpupgrade's various processes if you receive "transport is closing" or "context deadline exceeded" errors such as the following:
Error: rpc error: code = Unavailable desc = transport is closing
Error: connecting to hub on port 7527: context deadline exceeded
gpupgrade runs CLI, hub, and agent processes. For a variety of reasons, the underlying connections between them can break, resulting in the above errors. Try stopping these processes with gpupgrade kill-services, then restarting them with gpupgrade restart-services.
Continue with the gpupgrade Execute Phase or gpupgrade revert.