This topic covers the preparatory steps required for running the gpupgrade utility commands. Review these steps early in the process to help you understand the time needed to prepare the source cluster for a successful upgrade.

IMPORTANT: The minimum supported Greenplum 5 version is 5.29.7. Upgrade the source cluster to 5.29.7 or the latest Greenplum 5 release.

Pre-upgrade Checklist

Review the following pre-upgrade checklist, preferably a few weeks before the upgrade project.

Install the gpupgrade Utility

You may install the gpupgrade utility to the default location or to a location you specify.

  1. Download the gpupgrade file to a location of your choice.

  2. Install the gpupgrade utility to the default or a user-defined location.

    • To install to the default /usr/local/bin/ location on all hosts, use the yum command with sudo (or as root):

      sudo yum install gpupgrade-<version>.el7.x86_64.rpm
      
    • Alternatively, install the gpupgrade utility to a user-specified location:

      sudo rpm --prefix=<USER_DIRECTORY> -ivh gpupgrade-<version>.el7.x86_64.rpm
      

      where --prefix=<USER_DIRECTORY> specifies the directory in which to install gpupgrade.

      Ensure the gpupgrade binaries are available in the user’s executable path:

      export PATH=<USER_DIRECTORY>:$PATH
      

      Change the owner and group of the installed files to gpadmin:

      sudo chown gpadmin:gpadmin <USER_DIRECTORY>/gpupgrade*
      sudo chown -R gpadmin:gpadmin <USER_DIRECTORY>/greenplum/gpupgrade/*
      
  3. If desired, install bash completion for ease of use with the following command: sudo yum install bash-completion.
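As a quick sanity check after installation, something like the following could confirm that the binary resolves on the executable path. This is a minimal sketch: the helper name locate_tool is hypothetical and is not part of gpupgrade.

```shell
# Report where a command resolves on PATH, or fail if it is not found.
# locate_tool is a hypothetical helper name used only for this sketch.
locate_tool() {
  if tool_path=$(command -v "$1"); then
    echo "$1 found at $tool_path"
  else
    echo "$1 not found on PATH" >&2
    return 1
  fi
}

# Example (run as gpadmin on each host after installing the RPM):
#   locate_tool gpupgrade
```

If the command is not found after installing to a user-specified location, the export PATH step above has probably not been applied in the current shell.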

Install the Greenplum Database Target Version

Install the target Greenplum Database 6X package on each Greenplum system host, using the system’s package manager software. For example, for RHEL/CentOS systems, execute the yum command with sudo (or as root):

sudo yum install ./greenplum-db-<version>-<platform>.rpm

Change the owner and group of the installed files to gpadmin:

sudo chown -R gpadmin:gpadmin /usr/local/greenplum*

Prepare the Source Cluster

Certain components of Greenplum 5 cannot be automatically upgraded by the gpupgrade utility. There are also certain configuration items that are not supported from 5 to 6. Follow the recommendations below to prepare the source Greenplum cluster.

Upgrade the source 5X Greenplum Cluster

Upgrade the source Greenplum cluster from your current 5 version to the latest version you downloaded as part of the Review Pre-upgrade Checklist. For upgrade details see Upgrading to Greenplum Database 5.29.x.

Review pg_upgrade consistency checks

Review the pg_upgrade Consistency Checks that are run during gpupgrade initialize and gpupgrade execute, and check the source cluster against each one. If any of the scenarios apply, perform the resolution before continuing with the upgrade process. To validate the source cluster against some of these checks, see gpupgrade Migration Scripts.

Perform catalog health check

Run gpcheckcat to ensure that the source catalog is in a healthy state. See the gpcheckcat reference page for further details.

Check for CAST in pg_catalog

The upgrade process will fail if the source cluster contains any cast whose backing function is defined in the pg_catalog schema. Use these steps to check your source cluster and prepare for the upgrade:

  1. While logged into the master node, find all casts with functions defined in pg_catalog:

    select c.oid as castoid, c.castsource::regtype, c.casttarget::regtype, c.castfunc::regprocedure
    from pg_cast c join pg_proc p on c.castfunc = p.oid
    where p.pronamespace = 11 and c.oid >= 16384;
    

    For example, suppose the original function and cast were created with statements similar to:

    CREATE FUNCTION pg_catalog.text(date)
    RETURNS text STRICT IMMUTABLE LANGUAGE SQL
    AS 'SELECT textin(date_out($1));';
    
    CREATE CAST (date AS text) WITH FUNCTION pg_catalog.text(date) AS IMPLICIT;
    
  2. Drop the function with CASCADE. For example:

    DROP FUNCTION pg_catalog.text(date) CASCADE;
    
  3. Recreate the function in a different schema. For example, if the new schema is public:

    CREATE FUNCTION public.text(date)
    RETURNS text STRICT IMMUTABLE LANGUAGE SQL
    AS 'SELECT textin(date_out($1));';
    
  4. Recreate the cast with the new function:

    CREATE CAST (date AS text) WITH FUNCTION public.text(date) AS IMPLICIT;
    

Update .bashrc or .bash_profile

Because gpupgrade performs in-place major-version upgrades, it is easy to mix the source and target environments, which causes Greenplum utilities to fail. To prevent such failures, perform the following steps:

  • On all segments, remove from .bashrc or .bash_profile files any lines that source greenplum_path.sh or set Greenplum variables.

  • Start a new shell and ensure PATH, LD_LIBRARY_PATH, PYTHONHOME, and PYTHONPATH are clear of any Greenplum values.

  • Use ssh to connect to a segment host, and verify that the same variables are clear of any Greenplum values.
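The checks above can be sketched as two small shell functions. This is an illustration under stated assumptions: the function names are hypothetical, matching on the substring "greenplum" assumes the installation paths contain that word, and GPHOME and MASTER_DATA_DIRECTORY are used only as examples of Greenplum variables to look for.

```shell
# Print every Greenplum-looking value found in the four variables named above.
# Assumption: Greenplum install paths contain the substring "greenplum".
find_greenplum_env() {
  status=0
  for name in PATH LD_LIBRARY_PATH PYTHONHOME PYTHONPATH; do
    eval "value=\${$name:-}"
    case "$value" in
      *[Gg]reenplum*)
        echo "$name still contains a Greenplum value: $value"
        status=1
        ;;
    esac
  done
  return $status
}

# List lines in a startup file that source greenplum_path.sh or mention
# common Greenplum variables (the variable list is an assumption).
find_greenplum_sourcing() {
  grep -nE 'greenplum_path\.sh|GPHOME|MASTER_DATA_DIRECTORY' "$1"
}

# Example (run in a new shell on each host):
#   find_greenplum_env && echo "environment is clean"
#   find_greenplum_sourcing ~/.bashrc
#   find_greenplum_sourcing ~/.bash_profile
```

find_greenplum_env returns a non-zero status when any of the variables still holds a Greenplum value, so it can also be run through ssh against each segment host.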

Prepare test queries

Prepare test queries you can use after gpupgrade execute and during the post-upgrade phase, to test and verify that the new installation runs as expected. Your test queries should not create new tables or data.
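One way to use such queries, offered here as an assumption rather than a documented gpupgrade workflow, is to capture their output before the upgrade and compare it with the output afterward. The function names and file arguments below are hypothetical; the psql options used (-X, -A, -t, -d, -f, -o) are standard psql options.

```shell
# Run every query in a file against a database and save the output.
# -X skips psqlrc, -A disables alignment, -t prints tuples only.
capture_results() {  # usage: capture_results <database> <queries.sql> <output-file>
  psql -X -A -t -d "$1" -f "$2" -o "$3"
}

# Compare a pre-upgrade capture with a post-upgrade capture.
compare_results() {  # usage: compare_results <before-file> <after-file>
  if diff -u "$1" "$2"; then
    echo "results match"
  else
    echo "results differ" >&2
    return 1
  fi
}

# Example:
#   capture_results mydb checks.sql before.out   # before the upgrade
#   capture_results mydb checks.sql after.out    # after gpupgrade execute
#   compare_results before.out after.out
```

For the comparison to be meaningful, each query should be read-only and should return rows in a deterministic order, for example through an explicit ORDER BY.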

Review Link vs Copy Mode

gpupgrade supports two upgrade modes, link and copy, with copy being the default. The upgrade data storage capacity requirement depends on the mode selection. Edit your selection in the gpupgrade configuration file, before running gpupgrade initialize.

IMPORTANT: Check for sufficient disk space on the master and on all hosts. copy mode requires 60% available space on each host, and link mode requires 20%.
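A disk space check along these lines could be run on each host. It is a sketch, not a gpupgrade feature: the function names are hypothetical, and it reads the 60% and 20% figures as the share of the filesystem that must be free, which is an interpretation of the text above.

```shell
# Required free-space percentage per mode, taken from the text above.
required_free_pct() {
  case "$1" in
    copy) echo 60 ;;
    link) echo 20 ;;
    *) echo "unknown mode: $1" >&2; return 1 ;;
  esac
}

# Free-space percentage of the filesystem that holds a directory.
# df -P prints POSIX-format output; column 5 is the used percentage, e.g. "37%".
free_pct() {
  used=$(df -P "$1" | awk 'NR==2 { gsub("%", "", $5); print $5 }')
  echo $((100 - used))
}

# Compare the free percentage with the requirement for the chosen mode.
check_space() {  # usage: check_space <copy|link> <data-directory>
  need=$(required_free_pct "$1") || return 1
  have=$(free_pct "$2")
  if [ "$have" -ge "$need" ]; then
    echo "OK: $2 has ${have}% free (need ${need}%)"
  else
    echo "INSUFFICIENT: $2 has ${have}% free (need ${need}%)"
    return 1
  fi
}

# Example: check_space copy <data-directory>
```

Run the check against every filesystem that holds a master or segment data directory, since those are the locations where the target cluster data is created.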

Copy

This is the default option and has the following characteristics:

  • The data files are copied from source to target cluster and then modified.

  • During an upgrade, the original primary and mirror files remain untouched, so a manual recovery to the source cluster is easier and faster. If the upgrade fails during gpupgrade execute, the source cluster can simply point back to the original primaries and mirrors and be brought back up.

  • It is slower since it copies the source data stores to the target cluster.

  • It requires more free disk space (60%) than link mode.

Link

You need to manually specify link mode in the gpupgrade configuration file.

It has the following characteristics:

  • It creates hard links from the target cluster data stores to the source cluster data stores. It then modifies the target data files in place.

  • It’s faster than copy mode, as it does not copy any data from source to target cluster.

  • It requires less free disk space (20%). This space is used to recreate the catalog, which is not hard-linked.
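The following sketch illustrates general hard-link behavior on a POSIX filesystem. It is not gpupgrade's implementation, and the file names are made up: it only shows why an in-place change made through one link is visible through the other, since both names refer to the same data.

```shell
# Demonstrate that two hard-linked names share the same underlying data.
demo_hard_link() {
  dir=$(mktemp -d)
  echo "source data" > "$dir/source_file"
  # Create a second name for the same file; no data is copied.
  ln "$dir/source_file" "$dir/target_file"
  # Overwrite the content through the second name (same inode, written in place).
  echo "modified by upgrade" > "$dir/target_file"
  # Reading through the first name shows the modified content.
  cat "$dir/source_file"
  rm -rf "$dir"
}

demo_hard_link   # prints "modified by upgrade"
```

This shared-data property is what makes link mode fast and space-efficient, and it is also why the source data files cannot be treated as an untouched copy once the target files have been modified.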

WARNING: In link mode, gpupgrade generates the following warning if the source Greenplum cluster lacks a standby host, mirrors, or both:

    The source cluster does not have standby and/or mirror segments.
    After “gpupgrade execute” has been run, there will be no way to
    return the cluster to its original state using “gpupgrade revert”.

Next Steps

Continue with the gpupgrade Initialize Phase.
