During this phase, you run the gpupgrade initialize command, which prepares the source cluster for the upgrade and initializes the target cluster. Before proceeding, ensure that you have reviewed and completed the pre-upgrade phase tasks.

IMPORTANT: Start the initialize phase during a scheduled downtime window. Plan ahead and notify all appropriate groups and users that the Greenplum Database cluster will be offline for an extended period.

The following table summarizes the cluster state before and after gpupgrade initialize:

            Before Initialize               After Initialize
            Source    Target                Source    Target
Master      UP        Non Existent          UP        Initialized but DOWN
Standby     UP        Non Existent          UP        Non Existent
Primaries   UP        Non Existent          UP        Initialized but DOWN
Mirrors     UP        Non Existent          UP        Non Existent

Initialize Workflow Summary

The gpupgrade initialize command performs the following steps:

  1. Starts the gpupgrade hub process on the master host.
  2. Saves the source cluster configuration.
  3. Starts the gpupgrade agents on the master and segment hosts, one agent process on each host.
  4. Checks the environment.
  5. Checks the disk space availability.
  6. Generates the target cluster configuration.
  7. Initializes the target cluster.
  8. Sets the dynamic library path on the target cluster.
  9. Shuts down the target cluster.
  10. Runs pg_upgrade --check to check for known migration issues between the source and target Greenplum Database versions.

Preparing to Initialize the Upgrade

IMPORTANT: The minimum supported Greenplum Database 5.x version is 5.29.6. Upgrade the source cluster to 5.29.6 or the latest 5.29.x version. For more details on the upgrade process, see Upgrading to Greenplum Database 5.29.6 in the Greenplum documentation.

  1. Ensure that the source Greenplum cluster is in a healthy state, with the standby host and mirrors in their preferred roles. If they are not, gpupgrade initialize fails during the consistency checks. For further details, see gpstate. Verify the cluster state:

    gpstate -e
    

    For incremental recovery, run:

    gprecoverseg -a
    

    For full recovery:

    gprecoverseg -F
    

    To rebalance:

    gprecoverseg -r
    
  2. Check for sufficient disk space on the master host and on all segment hosts. gpupgrade initialize checks for at least 60% available space on each host in copy mode, or 20% in link mode; see the example after this list.

  3. Because gpupgrade performs in-place major version upgrades, it is easy to mix the source and target environments, which causes Greenplum utilities to fail. To prevent such failures, perform the following steps:

    • On all segments, remove from .bashrc or .bash_profile files any lines that source greenplum_path.sh or set Greenplum variables.

    • Start a new shell and ensure that PATH, LD_LIBRARY_PATH, PYTHONHOME, and PYTHONPATH are clear of any Greenplum values, as shown in the example after this list.

    • ssh to a segment host and verify that the same variables are clear of any Greenplum values there as well.
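
The following sketch, run in a fresh login shell on the master host, combines the disk space check from step 2 and the environment check from step 3. It is illustrative only: /data is a placeholder for your actual data filesystem mount points.

df -h /data
echo "$PATH:$LD_LIBRARY_PATH:$PYTHONHOME:$PYTHONPATH" | tr ':' '\n' | grep -i greenplum || echo "no Greenplum entries found"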

Running gpupgrade Initialize

During the initialization phase, run the pre-initialize migration scripts, edit the gpupgrade configuration file, and then use that file as a parameter when you run gpupgrade initialize.

Run the pre-initialize migration script

In the upgrade downtime window, run the gpupgrade-migration-sql-executor.bash pre-initialize script. For details on the scripts, see About the Migration Scripts and Executing the SQL Migration Scripts.

Edit the gpupgrade configuration file

The gpupgrade initialize command requires a configuration file as input. Review the example gpupgrade_config file in the directory where you extracted the downloaded gpupgrade utility.

Copy the example file to the $HOME/gpupgrade/ directory and edit it for your environment:

cp /usr/local/bin/greenplum/gpupgrade/gpupgrade_config $HOME/gpupgrade/

NOTE: The source_master_port, source_gphome, and target_gphome parameters are blank and must be set to your environment’s values. If you are upgrading with extensions whose installation location is outside of $target_gphome, you must set the dynamic_library_path parameter; also ensure that the latest supported version of each such extension is installed on the source cluster.

The remaining parameters are commented-out and have default values. Change these values as necessary for your upgrade scenario. See the gpupgrade_config file reference page for further details.
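
For illustration only, here is a minimal sketch of the three required settings, assuming a simple key = value layout; match the exact syntax of the example file you copied, and treat the port and installation paths below as placeholders for your environment. Uncomment and adjust optional parameters such as dynamic_library_path or disk_free_ratio in the same way if your scenario requires them.

source_master_port = 5432
source_gphome = /usr/local/greenplum-db-5.29.6
target_gphome = /usr/local/greenplum-db-6.20.3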

WARNING: If you are using link mode and the source Greenplum cluster does not have a standby host and mirrors, gpupgrade generates a warning:
The source cluster does not have standby and/or mirror segments.
After “gpupgrade execute” has been run, there will be no way to
return the cluster to its original state using “gpupgrade revert”.

Run Initialize

For source clusters with preinstalled extensions, run the command as described below; the first run will report an error. For details, see Run Initialize with Extensions.

To run initialize, use a command of the form:

gpupgrade initialize --file | -f PATH/TO/gpupgrade_config [--verbose | -v] [--automatic | -a]

Where:

  • --file | -f specifies the configuration file location
  • --verbose | -v enables verbose output
  • --automatic | -a suppresses the summary and confirmation prompt

For example:

gpupgrade initialize --file $HOME/gpupgrade/gpupgrade_config --verbose

The utility displays a summary message and waits for user confirmation before proceeding:

You are about to initialize a major-version upgrade of Greenplum.
This should be done only during a downtime window.

...

Before proceeding, ensure the following have occurred:
 - Take a backup of the source Greenplum cluster
 - [Generate] and [execute] the data migration "start" scripts
 - Run gpcheckcat to ensure the source catalog has no inconsistencies
 - Run gpstate -e to ensure the source cluster's segments are up and in preferred roles

To skip this summary, use the --automatic | -a  flag.

Continue with gpupgrade initialize?  Yy|Nn:

The utility proceeds through various background steps, and displays its progress on the screen:

Initialize in progress.

Starting gpupgrade hub process...                                  [IN PROGRESS]
Saving source cluster configuration...                             [COMPLETE]   
Starting gpupgrade agent processes...                              [COMPLETE]   
Checking environment...                                            [COMPLETE]   
Checking disk space...                                             [COMPLETE]   
Generating target cluster configuration...                         [COMPLETE]   
Creating target cluster...                                         [COMPLETE]   
Stopping target cluster...                                         [COMPLETE]   
Backing up target master...                                        [COMPLETE]   
Running pg_upgrade checks...                                       [COMPLETE]   

Initialize completed successfully.

NEXT ACTIONS
------------
To proceed with the upgrade, run "gpupgrade execute" followed by "gpupgrade finalize".

To return the cluster to its original state, run "gpupgrade revert".

The status of each step can be COMPLETE, FAILED, SKIPPED, or IN PROGRESS. SKIPPED indicates that the command has been run before and the step has already been executed.

These steps are further described below:

  • Starting gpupgrade hub process: Starts up the gpupgrade hub process on the master node.
  • Saving source cluster configuration: Collects the source cluster configuration details and generates gpupgrade state files to hold the source configuration.
  • Starting gpupgrade agent processes: Starts up agents on the standby master and segment hosts.
  • Checking environment: Checks the source and target environment paths to avoid mixing the two.
  • Checking disk space: Checks for available disk space.
    The default requirement is 60% free disk space; in link mode, the requirement is 20%.
    You can change the requirement by setting a different ratio with the disk_free_ratio parameter.
    To skip this check entirely, set disk_free_ratio to 0.0 in the configuration file.
  • Generating target cluster configuration: Populates the gpupgrade state files with the target cluster details.
  • Creating target cluster: Initializes the target master and segment hosts in order to run pg_upgrade on the postgres instances. See About Target Cluster Directories for a description of the target cluster data directories.
  • Stopping target cluster: Shuts down the target cluster.
  • Backing up target master: Creates a backup copy of target master, to be used during execute if any issues occur.
  • Running pg_upgrade checks: Runs a thorough set of Greenplum Database checks; see Initialize Phase pg_upgrade Checks.

To resolve any [FAILED] steps, review the on-screen error messages and recommendations, and the server log files in the $HOME/gpAdminLogs directory, including the gpupgrade initialize log file in the gpAdminLogs/gpupgrade/ directory, and discuss the findings with the VMware Greenplum team that is supporting you during the upgrade.
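
For example, a quick way to pull the most recent error lines out of the initialize log (the file name is described under Problems Creating Log Files below; adjust the path if your log location differs):

grep -iE 'error|fatal' $HOME/gpAdminLogs/gpupgrade/initialize.log | tail -n 20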

Run Initialize with Extensions

If the source cluster contains extensions, the first time you run the initialize command it fails because the extensions are not yet installed in the target cluster. The generated error message is similar to:

Running pg_upgrade checks...                                       [FAILED]     

Error: initialize create cluster: InitializeCreateCluster: rpc error: code = Unknown desc = substep "CHECK_UPGRADE": 4 errors occurred:
    * check master: Checking for presence of required libraries                 fatal

Your installation references loadable libraries that are missing from the
new installation.  You can add these libraries to the new installation,
or remove the functions using them from the old installation.  A list of
problem libraries is in the file:
    /home/gpadmin/.gpupgrade/pg_upgrade/seg-1/loadable_libraries.txt
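
To see exactly which libraries pg_upgrade flagged, inspect the file named in the error, for example:

cat /home/gpadmin/.gpupgrade/pg_upgrade/seg-1/loadable_libraries.txt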

Resolve the error by performing these steps:

  1. To avoid mixing the source and target Greenplum environment variables, open a new terminal and start the target cluster:

    source /usr/local/greenplum-db-<target>/greenplum_path.sh
    
    export MASTER_DATA_DIRECTORY=$(gpupgrade config show --target-datadir)
    export PGPORT=$(gpupgrade config show --target-port)
    gpstart -a
    
  2. Install in the target cluster the same version of each extension that is installed on the source cluster. See each extension’s documentation for installation specifics.

    • For GPText customers, copy the following GPText configuration files from the source cluster $MASTER_DATA_DIRECTORY to the backed-up target cluster master directory:

      cp $MASTER_DATA_DIRECTORY/{gptext.conf,gptxtenvs.conf,zoo_cluster.conf} /home/gpadmin/.gpupgrade/master.bak/
      

      Note: Do NOT alter any of the files in the .gpupgrade directory.

    • For PostGIS customers, drop the following views, which contain deprecated name datatypes:

      DROP VIEW geography_columns;
      DROP VIEW raster_columns;
      DROP VIEW raster_overviews;
      

      Make a note to re-create these views after the upgrade is complete, following the PostGIS post-upgrade steps.

  3. Stop the target cluster by issuing this command: gpstop -a

  4. Re-run the initialize command.

About Target Cluster Directories

When the gpupgrade initialize command creates the target Greenplum cluster, it creates data directories for the target master segment instance and the primary segment instances on the master and segment hosts, alongside the source cluster data directories. This applies in both copy and link modes.

The target cluster data directory names have this format:

<segment-prefix>.<hash-code>.<content-id>

Where:

  • <segment-prefix> is the segment prefix string specified when the source Greenplum Database system was initialized. This is typically gpseg.
  • <hash-code> is a 10-character string generated by gpupgrade. The hash code is the same for all segment data directories belonging to the new target Greenplum cluster. In addition to distinguishing target directories from the source data directories, the unique hash code tags all data directories belonging to the current gpupgrade instance.
  • <content-id> is the database content id for the segment. The master segment instance content id is always -1. The primary segment content ids are numbered consecutively starting from 0.

For example, if the $MASTER_DATA_DIRECTORY environment variable value is /data/master/gpseg-1/, the data directory for the target master is /data/master/gpseg.AAAAAAAAAA.-1, where AAAAAAAAAA is the hash code gpupgrade generated for this target cluster. Primary segment data directories for the target cluster are located on the same host and at the same path as their source cluster counterparts. If the first primary segment for the source cluster is on host sdw1 in the directory /data/primary/gpseg0, the target cluster segment directory is on the same host at /data/primary/gpseg.AAAAAAAAAA.0.

When the gpupgrade finalize command has completed, the source cluster data directories are renamed as:

<segment-prefix>.<hash-code>.<content-id>.old

and the target cluster data directory names are renamed to the original source directory names:

<segment-prefix><content-id>
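
For example, using the hypothetical hash code AAAAAAAAAA from the earlier example, the first primary segment's data directories move through the following names:

/data/primary/gpseg0                     source primary, before finalize
/data/primary/gpseg.AAAAAAAAAA.0         target primary, before finalize
/data/primary/gpseg.AAAAAAAAAA.0.old     source primary, after finalize
/data/primary/gpseg0                     target primary, after finalize (renamed to the original source name)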

Troubleshooting the Initialize Phase

Understanding the Format of gpupgrade Errors

This section explains the format of gpupgrade error messages.

Consider the following example:

Error: rpc error: code = Unknown desc = substep "SAVING_SOURCE_CLUSTER_CONFIG": retrieve source configuration: querying gp_segment_configuration: ERROR: Unsupported startup parameter: search_path (SQLSTATE 08P01)

The following list explains each element of this sample error message:

  • Error: rpc error: code = Unknown
    This element is inherent to gpupgrade’s underlying RPC protocol and is an implementation detail.

  • desc = substep "SAVING_SOURCE_CLUSTER_CONFIG"
    Indicates which substep gpupgrade failed on, in this case the "SAVING_SOURCE_CLUSTER_CONFIG" substep.

  • retrieve source configuration: querying gp_segment_configuration
    A series of prefixes providing additional context, from less specific to more specific.

  • ERROR: Unsupported startup parameter: search_path (SQLSTATE 08P01)
    The actual error; in this example, an unsupported parameter "search_path" was supplied when querying the database.

Problems Creating Log Files

If you cannot create the gpAdminLogs/gpupgrade/initialize.log file, verify that you are logged in as gpadmin and that all files in the gpAdminLogs directory are owned by gpadmin and are writable by gpadmin.
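
For example, the following commands (illustrative; run them as the user performing the upgrade) show the current ownership, and the commented chown shows one way to correct it if required:

whoami
ls -ld $HOME/gpAdminLogs $HOME/gpAdminLogs/gpupgrade
# if the files are owned by another user, fix ownership as a privileged user:
# chown -R gpadmin:gpadmin $HOME/gpAdminLogs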

gpstart Failures

During initialize, gpinitsystem can fail when calling gpstart, with errors such as the following:

[CRITICAL]:-gpstart failed. (Reason='') exiting...
stderr='Error: unable to import module: /usr/local/greenplum-db-6.20.3/lib/libpq.so.5: symbol gss_acquire_cred_from, version gssapi_krb5_2_MIT not defined in file libgssapi_krb5.so.2 with link time reference

Other similar errors may include:

/usr/local/greenplum-db-6.19.3/bin/postgres: /usr/local/greenplum-db-5.29.1/lib/libxml2.so.2: no version information available (required by /usr/local/greenplum-db-6.19.3/bin/postgres)

This occurs when the source and target Greenplum environments are mixed, causing utilities to fail. To resolve this, perform the following steps:

  1. On all segments, remove from .bashrc or .bash_profile files any lines that source greenplum_path.sh or set Greenplum variables.

  2. Start a new shell and ensure that PATH, LD_LIBRARY_PATH, PYTHONHOME, and PYTHONPATH are clear of any Greenplum values.

  3. ssh to a segment host and verify that the same variables are clear of any Greenplum values there as well.

Missing Extensions

If your Greenplum 5.x cluster has extensions installed, such as Greenplum Streaming Server, PL/Container, or PL/Java, the gpupgrade initialize checks fail until you reinstall the missing extensions on the target Greenplum Database. The error message looks similar to:

Running pg_upgrade checks...                                       [FAILED]     

Error: initialize create cluster: InitializeCreateCluster: rpc error: code = Unknown desc = substep "CHECK_UPGRADE": 4 errors occurred:
   * check master: Checking for presence of required libraries                 fatal

Your installation references loadable libraries that are missing from the
new installation.  You can add these libraries to the new installation,
or remove the functions using them from the old installation.  A list of
problem libraries is in the file:
   /home/gpadmin/.gpupgrade/pg_upgrade/seg-1/loadable_libraries.txt

Host Resolution Problems

When running on a single-node system, particularly in a cloud environment, if you encounter a grpcDialer failed: error, it is possible that your local hostname is not resolvable. Verify that each host is resolvable by issuing the following command:

$ ping -q -c 1 -t 1 `hostname`
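
If the ping fails, one way to see how the name resolves (assuming a Linux host where getent is available) is:

$ hostname
$ getent hosts `hostname`
$ grep `hostname` /etc/hosts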

RPC Connection Errors

There may be a connection issue between gpupgrade’s various processes if you receive "transport is closing" or "context deadline exceeded" errors such as the following:

  • Error: rpc error: code = Unavailable desc = transport is closing

  • Error: connecting to hub on port 7527: context deadline exceeded

gpupgrade runs CLI, hub, and agent processes. For a variety of reasons, the underlying connections between them can break, resulting in the errors above. Try stopping these processes with gpupgrade kill-services and restarting them with gpupgrade restart-services.
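
For example, run the two commands back to back as the user performing the upgrade (typically gpadmin) on the master host:

gpupgrade kill-services
gpupgrade restart-services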

Next Steps

Continue with the gpupgrade Execute Phase or gpupgrade revert.
