pg_upgrade Checks

The pg_upgrade checks verify if the source and target clusters are compatible. When you Initialize the Upgrade (gpupgrade initialize), the initialize migration scripts highlight and fix some of the incompatibilities that could cause gpupgrade initialize to fail. Additionally, the last substep of the gpupgrade initialize workflow, Running pg_upgrade checks, runs the pg_upgrade --check command which performs the following checks. Note that the Greenplum Checks run in parallel based on the value of pg_upgrade_jobs in the gpupgrade Configuration File.

Running pg_upgrade checks...                           [IN PROGRESS]
Performing Consistency Checks on Old Live Server
------------------------------------------------
Checking cluster versions                                   ok
Creating a dump of all tablespace metadata.                 ok
Checking database user is a superuser                       ok
Checking database connection settings                       ok
Checking for prepared transactions                          ok
Checking for reg* system OID user data types                ok
Checking for contrib/isn with bigint-passing mismatch       ok

Starting Parallel Greenplum Checks
==================================
Running: Checking for nons assigned the gphdfs role...
Running: Checking for gphdfs external tables...
Checking for users assigned the gphdfs role                 ok
Running: Checking for unique or primary key constraints...
Checking for heterogeneous partitioned tables               ok
Checking for gphdfs external tables                         ok
Running: Checking array types derived from partitions...
Running: Checking for large objects...
Checking for unique or primary key constraints              ok
Running: Checking for invalid indexes...
Checking for large objects                                  ok
Running: Checking for tables distributed on duplicated columns...
Checking for invalid indexes                                ok
Running: Checking for views with lead/lag functions using bigint...
Checking for indexes on partitioned tables                  ok
Checking for tables distributed on duplicated columns       ok
Running: Checking for non-dumpable views with anyarray casts...
Running: Checking for non-dumpable views with unknown casts...
Checking array types derived from partitions                ok
Running: Checking for views referencing deprecated tables...
Checking for non-dumpable views with anyarray casts         ok
Running: Checking for views referencing deprecated columns...
Checking for non-dumpable views with unknown casts          ok
Checking for views referencing deprecated tables            ok
Checking for views referencing deprecated columns           ok
Checking for views with lead/lag functions using bigint     ok
==================================

Checking for tsquery user columns                           ok
Checking for contrib/ltree                                  ok
Checking for hash partitioned tables                        ok
Checking for tsvector user columns                          ok
Checking for hash and GIN indexes                           ok
Checking for bpchar_pattern_ops indexes                     ok
Checking for presence of required libraries                 ok
Checking database user is a superuser                       ok
Checking for prepared transactions                          ok
-covering indexes on partitioned AO tables...
Running: Checking AO/CO parent partitions with pg_aoseg entries...
Running: Checking for heterogeneous partitioned tables...
Running: Checking for indexes on partitioned tables...
Checking AO/CO parent partitions with pg_aoseg entries      ok
Running: Checking for foreign key constraints on root partitions...
Checking for non-covering indexes on partitioned AO tables  ok
Running: Checking for orphaned TOAST relations...
Checking for orphaned TOAST relations                       ok
Checking for foreign key constraints on root partitions     ok
Running: Checking for users assigned the gphdfs role...
Running: Checking for gphdfs external tables...
Checking for users assigned the gphdfs role                 ok
Running: Checking for unique or primary key constraints...
Checking for heterogeneous partitioned tables               ok
Checking for gphdfs external tables                         ok
Running: Checking array types derived from partitions...
Running: Checking for large objects...
Checking for unique or primary key constraints              ok
Running: Checking for invalid indexes...
Checking for large objects                                  ok
Running: Checking for tables distributed on duplicated columns...
Checking for invalid indexes                                ok
Running: Checking for views with lead/lag functions using bigint...
Checking for indexes on partitioned tables                  ok
Checking for tables distributed on duplicated columns       ok
Running: Checking for non-dumpable views with anyarray casts...
Running: Checking for non-dumpable views with unknown casts...
Checking array types derived from partitions                ok
Running: Checking for views referencing deprecated tables...
Checking for non-dumpable views with anyarray casts         ok
Running: Checking for views referencing deprecated columns...
Checking for non-dumpable views with unknown casts          ok
Checking for views referencing deprecated tables            ok
Checking for views referencing deprecated columns           ok
Checking for views with lead/lag functions using bigint     ok
==================================

Checking for tsquery user columns                           ok
Checking for contrib/ltree                                  ok
Checking for hash partitioned tables                        ok
Checking for tsvector user columns                          ok
Checking for hash and GIN indexes                           ok
Checking for bpchar_pattern_ops indexes                     ok
Checking for presence of required libraries                 ok
Checking database user is a superuser                       ok
Checking for prepared transactions                          ok

Review the individual descriptions and suggested workarounds below.

Important
Some of the suggested workarounds require that you perform actions after the upgrade process has completed. Document the workarounds you apply so you are aware of the actions you need to perform once the upgrade is finished. If the workaround suggests to apply any of the initialize data migration scripts, the utility automatically prompts you to apply the corresponding finalize or revert scripts at the end of the upgrade process.

Cluster versions

The source cluster is not on the latest version of Greenplum 5 or the target cluster is not the latest version of Greenplum 6.

Resolution: Upgrade the source or target Greenplum database.

Database user is a superuser

The database user is not a superuser, as required for the upgrade. The test runs twice, initially for the source cluster and later on for the target cluster.

Resolution: Run gpupgrade as the gpadmin user or another user with superuser privileges.

Database connection settings

The template0 default setting has been altered. template0 must not allow connections. All other databases must allow access to gpadmin.

Resolution: Run the following transaction in the postgres database to disallow connections to template0:

BEGIN;
SET allow_system_table_modes=true;
UPDATE pg_database SET datallowconn='f' WHERE datname='template0';
COMMIT;

Prepared transactions

This check is not applicable as Greenplum Database does not support prepared transactions.

Reg* system OID user data types

The source installation contains one of the reg* data types in user tables.

These data types reference system Object Identifiers (OIDs) that are not preserved by pg_upgrade, so the cluster cannot be upgraded in its current state. pg_upgrade does not support database upgrades containing: regproc, regprocedure, regoper, regoperator.

Resolution: Remove columns with these data types or convert them to a different data type. To change the data type on the source database, from reg* to text:

ALTER TABLE <table name> ALTER COLUMN <column name> TYPE TEXT;

See /home/gpadmin/gpAdminLogs/gpupgrade/pg_upgrade/<seg#>/tables_using.txt file for a list of problem columns. After the upgrade, change the data type from text to the original type, for example:

ALTER TABLE <table name> ALTER COLUMN <column name> TYPE regproc USING <column name>::regproc;

Contrib/isn with bigint-passing mismatch

The source installation contains isn module which relies on the bigint data type. The source and target clusters pass bigint values differently, so the source cluster cannot be upgraded.

Resolution: Manually remove the module from the databases that use ISN module data types with DROP EXTENSION isn;. Continue with the upgrade and once it completes recreate the extension CREATE EXTENSION isn;. See the /home/gpadmin/gpAdminLogs/gpupgrade/pg_upgrade/<seg#>/contrib_isn_and_int8_pass_by_value.txt file for a list of problem functions. Alternatively, you may drop all objects referencing the source ISN library. Recreate the objects in the target cluster, referencing the ISN library in Greenplum 6. If the source cluster has tables with data dependant on the source ISN library, take a backup and restore in the target cluster.

Non-covering indexes on partitioned AO tables

The source installation contains partitioned append-only (AO) tables with an index defined on the partition parent that is not present on all partition members.

Resolution: Drop the indexes before the upgrade. See the /home/gpadmin/gpAdminLogs/gpupgrade/pg_upgrade/<seg#>/mismatched_aopartition_indexes.txt file for a list of the partitions in question. Recreate the indexes after the upgrade is complete.

AO/CO parent partitions with pg_aoseg entries

The source Greenplum Database contains append-only (AO) or column-oriented (AO/CO) parent partitions with pg_aoseg or pg_aocsseg entries. These tables should not contain any entries as parent partitions do not carry any data, and may cause unexpected failures during pg_upgrade, and possibly during post-upgrade, for example when running VACUUM FREEZE.

Resolution: Run gpupgrade initialize and apply the initialize data migration SQL script parent_partitions_with_seg_entries when prompted.

Heterogeneous partitioned tables

The source Greenplum Database contains heterogeneous partitioned tables, which can occur when one or more child partitions either have dropped column references that differ between the sibling tables, or the child partitions are misaligned as compared to the root partition. The upgrade cannot output the partition table DDL to preserve the dropped columns for the detected child partitions, as ALTER statements can only be applied from the root partition, which then cascades down the partition hierarchy. Preservation of these columns is necessary for on-disk compatibility of the child partitions.

Resolution: Run gpupgrade initialize and apply the initialize data migration SQL script heterogeneous_partitioned_tables when prompted.

Note that the script does not support the case where the child partitions are misaligned from the root in their name, type, or length. To correct the child partitions in this scenario:

Create a new staging table with the same schema as the child partition.
Insert the old data into the staging table.
Exchange the child partition with the staging table.
Drop the staging table.

Alternatively, recreate the entire partition table.

Indexes on partitioned tables

The source installation contains partitioned tables with indexes defined on them.

Resolution: Run gpupgrade initialize and apply the initialize data migration SQL script partitioned_tables_indexes when prompted. See /home/gpadmin/gpAdminLogs/gpupgrade/pg_upgrade/<seg#>/partitioned_tables_indexes.txt for a list of the problematic tables.

Once the upgrade completes, the finalize data migration SQL scripts recreate the indexes. In scenarios where the 5.x source has child indexes created off the root index that were manually dropped, the index recreation will include those dropped child indexes, as it is a new restriction in Greenplum Database 6.x. This limitation also applies when reverting the upgrade and the revert data migration SQL scripts.

Foreign key constraints on root partitions

The source cluster contains foreign key constraints on root partition tables.

Resolution: Run gpupgrade initialize and apply the initialize data migration SQL script unique_primary_foreign_key_constraint when prompted.

Orphaned TOAST relations

The source installation contains orphaned TOAST tables which must be dropped before upgrade.

Resolution: Drop the tables listed in the /home/gpadmin/gpAdminLogs/gpupgrade/pg_upgrade/<seg#>/orphaned_toast_tables.txt file.

Gphdfs external tables

The source installation contains gphdfs external tables. Gphdfs was deprecated in Greenplum 5 and removed in Greenplum 6 and the recommendation is to use PXF.

Resolution: Run gpupgrade initialize and apply the initialize data migration SQL script gphdfs_external_tables when prompted.

Users assigned the gphdfs role

The gphdfs user role is present in the cluster. Gphdfs was deprecated in Greenplum 5 and removed in Greenplum 6 and the recommendation is to use PXF.

Resolution: Run gpupgrade initialize and apply the initialize data migration SQL script gphdfs_user_roles when prompted.

Unique or primary key constraints

The source installation contains unique or primary key constraints on tables.

Resolution: Run gpupgrade initialize and apply the initialize data migration SQL script unique_primary_foreign_key_constraint when prompted.

Array types derived from partitions

The source installation contains array types derived from partitions of a partitioned table that must not have dependants. If a partition was exchanged, and an object is created using the type of the partition, then such objects cannot be upgraded. The following example demonstrates this:

CREATE TABLE table_part(a INT, b INT) PARTITION BY RANGE(b) (PARTITION part1 START(0) END(42));
CREATE TABLE replacement(LIKE table_part);
ALTER TABLE table_part EXCHANGE PARTITION part1 WITH TABLE replacement;
CREATE TABLE dependant(d table_part_1_prt_part1[]);

Resolution: Drop the dependant objects.

Large objects

This check is not applicable as Greenplum Database does not support large objects.

Invalid indexes

The source cluster contains invalid indexes. This can occur with the CREATE INDEX CONCURRENTLY statement, which is not supported in Greenplum.

Resolution: Reindex or drop the indexes specified in /home/gpadmin/gpAdminLogs/gpupgrade/pg_upgrade/<seg#>/invalid_indexes.txt.

Tables distributed on duplicated columns

The source cluster contains tables that are distributed on duplicated columns. For example, DISTRIBUTED BY (a, a, b), where a is the duplicated column. This is a known bug in Greenplum Database 5.x and it causes unexpected failures during the upgrade.

Resolution: Update the distribution policy of the tables so there are no duplicated columns.

ALTER TABLE <table_name> SET DISTRIBUTED BY (<column_name_set>);

Views with lead/lag functions using bigint

The source cluster contains one or more views that were created with lag() or lead() functions that specified a bigint-type offset argument. In Greenplum Database 6.x, the offset argument in these functions is changed to the integer data type.

Resolution: Drop the views listed in the file /home/gpadmin/gpAdminLogs/gpupgrade/pg_upgrade/<seg#>/view_lead_lag_functions.txt file.

Non-dumpable views with anyarray casts

The source cluster contains views with the ::anyarray explicit type cast. These views could have been created due to a server issue generating bad anyarray casts before Greenplum 5.28.6. This is an example view:

CREATE VIEW view_anyarray_cast AS SELECT '{1}'::int[] = '{2}'::int[];

The above view definition would erroneously become the one below and it would lead to a failure during schema restore:

SELECT '{1}'::anyarray = '{2}'::anyarray;

Resolution: Drop the views listed in the /home/gpadmin/gpAdminLogs/gpupgrade/pg_upgrade/<seg#>/view_anyarray_casts.txt file. Alternatively, recreate the view in the source cluster without the explicit anyarray type cast before running an upgrade.

Non-dumpable views with unknown casts

The source cluster contains views with the unknown::cstring explicit type cast. Such views cannot be upgraded. These views could have been created due to a server issue generating bad casts before Greenplum 5.28.6. This is an example view:

CREATE VIEW unknown_v1 AS SELECT '2020-12-13'::unknown AS field_unknown;
CREATE VIEW unknown_v2 AS SELECT field_unknown::date FROM unknown_v1;

The view above definition for would erroneously become:

SELECT unknown_v1.field_unknown::cstring::date AS field_unknown FROM unknown_v1;

Resolution: Drop the views listed in the /home/gpadmin/gpAdminLogs/gpupgrade/pg_upgrade/<seg#>/view_unknown_casts.txt file. Alternatively, recreate the view in the source cluster without the explicit type casts before running an upgrade.

Views referencing deprecated tables

The source cluster contains views that reference tables that have been removed in Greenplum 6.

Resolution: Drop the views listed in the /home/gpadmin/gpAdminLogs/gpupgrade/pg_upgrade/<seg#>/view_deprecated_tables.txt file.

Views referencing deprecated columns

The source cluster contains views referring to columns that have been removed in Greenplum 6.

Resolution: Drop the views listed in the /home/gpadmin/gpAdminLogs/gpupgrade/pg_upgrade/seg-1/view_deprecated_columns.txt file.

tsquery user columns

The source installation contains the tsquery data type. The internal representation of this data type changed between Greenplum Database 5.x and 6.x, so the source cluster cannot be upgraded.

Resolution: Run gpupgrade initialize and apply the initialize data migration SQL script tables_using_tsquery_type when prompted. Note that the scripts are not able to address the following two scenarios:

If there is an index defined on a column with tsquery data type. Drop the index before running the initialize data migration scripts, or manually alter the columns.
If the columns with the tsquery data type are inherited. Alter the column in the parent table, which will also alter the child tables. For example:
```
ALTER TABLE <tablename> ALTER COLUMN <col_with_tsquery_datatype> TYPE TEXT;
```

Contrib/ltree

The source installation contains the ltree data type. This data type changed its internal representation between Greenplum version 5.x and 6.x.

Resolution: Manually remove the module from the databases that use ltree module data types with DROP EXTENSION ltree;. Continue with the upgrade and once it completes recreate the extension CREATE EXTENSION ltree;. See the /home/gpadmin/gpAdminLogs/gpupgrade/pg_upgrade/<seg#>/contrib_ltree.txt file for a list of problem functions. Alternatively, you may drop all objects referencing the source ltree library. Recreate the objects in the target cluster, referencing the ltree library in Greenplum 6. If the source cluster has tables with data dependant on the source ltree library, take a backup and restore in the target cluster.

Hash partitioned tables

The source installation contains hash partitioned tables, which are not supported by pg_upgrade.

Resolution: Drop the problematic tables. See the /home/gpadmin/gpAdminLogs/gpupgrade/pg_upgrade/<seg#>/hash_partitioned_tables.txt file for a list of problem tables. You may also copy the data to a partitioned table based on list or range partitioning and drop the table with hash partition scheme.

Tsvector user columns

The source installation contains tsvector columns. The tsvector internal storage format has changed between Greenplum Database 5.x and Greenplum Database 6.x.

Resolution: After Finalize the upgrade (gpupgrade finalize), use psql to execute the file /home/gpadmin/gpAdminLogs/gpupgrade/pg_upgrade/<seg#>/rebuild_tsvector_tables.sql on the target cluster, which rebuilds all the tables containing tsvector columns.

Hash and GIN indexes

This check is not applicable as Greenplum Database 5.x does not support hash and GIN indexes.

Bpchar_pattern_ops indexes

The source installation contains indexes using bpchar_pattern_ops. These indexes have different internal formats between the source and target clusters.

Resolution: After Finalize the upgrade (gpupgrade finalize), execute the file /home/gpadmin/gpAdminLogs/gpupgrade/pg_upgrade/p-1/reindex_bpchar_ops.sql on the target cluster, which rebuilds all the indexes that use bpchar_pattern_ops.

Presence of required libraries

The source installation references loadable libraries that are missing from the target installation.

Resolution: Add these libraries to the target cluster, or remove the functions that use them from the source cluster. See the /home/gpadmin/gpAdminLogs/gpupgrade/pg_upgrade/<seg#>/loadable_libraries.txt file for a list of the libraries required for the target cluster.