This topic covers the preparatory steps for upgrading your cluster with Greenplum Upgrade. Review these steps early in the process to help you understand the time needed to prepare the source cluster for a successful upgrade.
We strongly recommend that you run through the upgrade process in a test environment first. This will familiarize you with the workflow and enable you to fully test your data on the upgraded Greenplum version before proceeding with the production upgrade.
Review the following pre-upgrade checklist, preferably a few weeks before the upgrade project.
SELECT * FROM pg_extension;
from your Greenplum Database prompt. The upgrade window depends heavily on the number of extensions installed on the source cluster, and the steps required for each.You may install the gpupgrade
utility to the default location or to a location you specify.
gpupgrade
utility from Broadcom Support Portal to a location of your choice.NoteFor more information about download prerequisites, troubleshooting, and instructions, see Download Broadcom products and software.
Install the gpupgrade
utility to the default or a user-defined location.
To install to the default /usr/local/bin/
location on all hosts, use the yum
command with sudo (or as root):
sudo yum install gpugrade-<version>.el7.x86_64.rpm
Alternatively, install the gpupgrade
utility to a user-specified location:
sudo rpm --prefix=<USER_DIRECTORY> -ivh gpupgrade-<version>.el7.x86_64.rpm
Ensure the gpupgrade
binaries are available in the user’s executable path:
export PATH=<USER_DIRECTORY>:$PATH
Change the owner and group of the installed files to gpadmin
:
sudo chown gpadmin:gpadmin <USER_DIRECTORY>/gpupgrade*
sudo chown -R gpadmin:gpadmin <USER_DIRECTORY>/greenplum/gpupgrade/*
If desired, install bash completion for ease of use, with the following command:
yum install bash-completion
Install the target Greenplum Database 6.x package on each Greenplum system host, using the system’s package manager software. For example, for RHEL/CentOS systems, execute the yum
command with sudo
(or as root):
sudo yum install ./greenplum-db-<target-version>-<platform>.rpm
Change the owner and group of the installed files to gpadmin
:
sudo chown -R gpadmin:gpadmin /usr/local/greenplum*
Certain components of Greenplum 5 cannot be automatically upgraded by the gpupgrade
utility. There are also certain configuration items that are not supported when movin when moving from 5 to 6. Follow the recommendations below to prepare the source Greenplum cluster.
The minimum supported Greenplum 5 version is 5.29.10. Upgrade the source 5.x Greenplum cluster to the latest version you downloaded. For upgrade details see Upgrading to Greenplum Database 5.29.x.
Greenplum Database supports a variety of extensions. To determine what extensions are installed on your source cluster, query the pg_extension
catalog table:
SELECT * FROM pg_extension;
During the upgrade process, extensions are handled differently depending on the type of extension:
Greenplum Database ships with some PostgreSQL native extensions, such as amcheck
, dblink
, hstore
, fuzzystrmatch
, citext
, and pgcrypto
. gpupgrade
handles these PostgreSQL extensions automatically through the upgrade process. There is no action required in preparation for the upgrade regarding these extensions.
For other PostgreSQL native extensions that are not shipped with Greenplum, you must install them on the target cluster by placing the .so
library files inside $libdir
directory. Do not run CREATE EXTENSION <extension_name>
as gpupgrade
handles this part.
You must copy any custom user defined files such as dictionary
, synonym
, or thesaurus
to the target cluster.
Greenplum supported extensions can be upgraded by gpupgrade
as part of the upgrade process and are listed below. Upgrade the supported Greenplum extensions to their latest version on the source cluster before upgrading the Greenplum Database.
Extension | Action | Documentation |
---|---|---|
GPText | Upgrade to 3.9.0+ | Upgrading Greenplum Text |
MADlib | Upgrade to 1.20.0+ | Upgrading MADlib |
PostGIS | Upgrade to 2.1.5+pivotal.3 | Upgrading PostGIS |
PXF | Upgrade to 5.16.4+ or 6.3.0+ | Upgrading PXF |
Additionally, check if any of your extensions is installed outside of the default $GPHOME
directory. You will need this information when editing the gpupgrade_config
file when you Run gpupgrade initialize.
Greenplum unsupported extensions cannot be upgraded by gpupgrade
and are listed below. Uninstall them from the source cluster, you will re-install them once the cluster is upgraded.
Extension | Uninstalling |
---|---|
Oracle Compatibility Functions | Uninstalling Oracle Compatibility Functions |
Greenplum PL/Java Language Extension | Uninstalling PL/Java |
Python Data Science Module Package | Uninstalling the Python Data Science Module Package |
R Data Science Library Package | Uninstalling the R Data Science Library Package |
VMware Greenplum Command Center | See the GPCC Pre-Upgrade Steps below |
Keep track of the extensions you uninstall, and the related modifications you had to make to the source Greenplum Database system, so that you can re-apply them after you finalize the upgrade.
Drop or alter any tables, views, or user-defined functions (UDFs) you have created that reference UDFs or custom data types supplied by the unsupported extensions. Remove any files the extension creates in the Greenplum Database master data directory.
CautionDropping any extension that defines user defined types, aggregates, functions, operators, or views, will drop the data associated with those objects. For example, if an extension declares a user defined type, any tables and columns that use that type will be dropped when the extension is dropped. One specific example is the PostGIS extension that defines a geometry data type. A column that uses the geometry type will be dropped when PostGIS is dropped.
GPCC Pre-Upgrade Steps
Perform these steps on the source cluster to uninstall Greenplum Command Center:
gpcc stop
command.DROP EXTERNAL WEB TABLE gpmetrics.gpcc_size_ext_table FROM gpperfmon;
DROP VIEW gpmetrics.gp_data_dirs FROM gpperfmon;
DROP EXTENSION gp_wlm FROM postgres;
$MASTER_DATA_DIRECTORY/gpmetrics
directory to a safe location.During the upgrade process, gpupgrade
runs scripts that perform consistency checks. You may decide to review the scripts and checks beforehand and if applicable, manually address the issues before continuing with the upgrade process. See pg_upgrade
Checks for the list of checks.
During the upgrade process, the utility runs multiple queries against the catalog tables, which take longer to complete if the statistics are outdated. Run ANALYZE
against each database catalog table to significantly reduce the duration of gpupgrade initialize
and gpupgrade execute
commands.
analyzedb -a -s pg_catalog -d <database_name>
Run gpcheckcat
to ensure that the source catalog is in a healthy state. See the gpcheckcat reference page for further details.
The upgrade process fails if the source cluster contains any CAST
backing funtion defined in the pg_catalog
system catalog. Use these steps to check your source cluster and prepare for the upgrade:
While logged into the master node, find all casts with functions defined in pg_catalog
:
SELECT c.oid AS castoid, c.castsource::regtype, c.casttarget::regtype, c.castfunc::regprocedure \
FROM pg_cast c JOIN pg_proc p ON c.castfunc = p.oid \
WHERE p.pronamespace = 11 AND c.oid >= 16384;
For example, if the original cast and function were created similar to:
CREATE FUNCTION pg_catalog.text(date) \
RETURNS text STRICT IMMUTABLE LANGUAGE SQL \
AS 'SELECT textin(date_out($1));';
CREATE CAST (date AS text) WITH FUNCTION pg_catalog.text(date) AS IMPLICIT;
Drop the function with CASCADE
. For example:
DROP FUNCTION pg_catalog.text(date) CASCADE;
Recreate the function in a different schema. For example, if the new schema is public
:
CREATE FUNCTION public.text(date) \
RETURNS text STRICT IMMUTABLE LANGUAGE SQL \
AS 'SELECT textin(date_out($1));';
CREATE CAST (date AS text) WITH FUNCTION public.text(date) AS IMPLICIT;
Since gpupgrade
does major version in-place upgrades, it can be easy to mix source and target environments, causing Greenplum utilities to fail. To prevent such failures, perform the following steps:
On all segments, remove from .bashrc
or .bash_profile
files any lines that source greenplum_path.sh
or set Greenplum variables.
Start a new shell and ensure PATH
, LD_LIBRARY_PATH
, PYTHONHOME
, and PYTHONPATH
are clear of any Greenplum values.
Log in to a segment host, and verfiy the above values are clear of any Greenplum values.
Prepare test queries to use after gpupgrade execute
and during the post-upgrade phase, to test and verify that the new installation runs as expected. Your test queries should not create new tables or data.
gpupgrade
supports two upgrade modes, copy
and link
, with copy
being the default. Decide on which mode to use based on the following information:
Attribute | Copy Mode | Link Mode |
---|---|---|
Description | Copies source cluster’s files to the target cluster | Uses hard links to modify the source cluster’s data in place |
Upgrade Time | Slow, since it copies data before upgrading | Fast, since it modifies the data in place |
Disk Space Requirements |
~60% free space on each host | ~20% free space on each host |
Revert Speed (after gpupgrade execute ) |
Fast, since the source cluster is untouched | Slow, since it uses rsync to revert the source primary segments and mirrors |
Risk | No risk since the source cluster is untouched | The source cluster is modified |
NoteIf your Greenplum cluster has no standby or mirror hosts, you will not be able to revert the upgrade after
gpupgrade execute
. Consider usingcopy
mode, or adding standby and mirrors to your cluster.
You must ensure that enough free disk space is available for the upgrade process. Before initializing the upgrade, you configure the gpupgrade_config
file when you Prepare for gpupgrade initialize. The parameter disk_free_ratio
in this file sets the free space requirement for the upgrade, and it offers default values depending on the upgrade mode you choose. However, you may tune the parameter disk_free_ratio
based on your specific environment characteristics. Estimate how much free space you need using the guidelines below. Note that copy
mode requires additional space, since it copies the files to the target cluster.
On the Greenplum master host the available space should be at least the sum of:
MASTER_DATA_DIRECTORY
.On all segment hosts the available space should be at least the sum of:
MASTER_DATA_DIRECTORY
.copy
mode: the size of all primary data directories.During the upgrade process, the utility creates a backup copy of the master data directory and user defined tablespaces. You may optionally configure the parent_backup_dirs
parameter in the gpupgrade_config
file to set the internal location to store this backup. See gpupgrade Configuration File for more information.
Continue with the Initialize the Upgrade (gpupgrade initialize).