The gpcopy utility copies objects from databases in a source Greenplum Database system to databases in a destination Greenplum Database system.

Synopsis

gpcopy
   { {-F | --full} |
     { {-d | --dbname} <database1>[,<database2> ... ] 
        {-D | --dest-dbname} <dest-db1>[,<dest-db2> ... ] ] } |
     {-t | --include-table} <db>.<schema>.<table>[,<db>.<schema1>.<table1> ... ] 
        [ --dest-table <db>.<schema>.<table>[,<db>.<schema1>.<table1> ... ] |
     {-T | --include-table-file} <table-file1>
        [{-T | --include-table-file} <table-file2>] ... ] | 
     --include-table-json <json-table-file1>
        [ --include-table-json <json-table-file2>] ... ] }
   [ {-m | --metadata-only} ]
   [  --no-ownership ]
   [ {-e | --exclude-table} <db>.<schema>.<table>[,<db>.<schema1>.<table1> ... ] ]
   [ {-E | --exclude-table-file} <table-file1> ]
      [ {-E | --exclude-table-file} <table-file2> ] ... ] ]

   { --dest-host <dest_host> [ --dest-port <dest_port> ]
      [ --dest-user <dest_user> ] [ --dest-mapping-file <host_ip_map_file> ]}
   [ --source-host <source_host> [ --source-port <source_port> ]
   [ --source-user <source_user> ] ]
   [ --enable-receive-daemon = {true | false }]
   [ --jobs <int> ]
   [ --snapshot <snapshot_id> ]
   [ {-o | --on-segment-threshold} <int> ]
   [ {-p | --parallelize-leaf-partitions} = { true | false} ] 
   [ --data-port-range <lower_port>-<upper_port> ]
   [ --ssl-cert <cert_file> --ssl-key <key_file>
       [ --ssl-ca <ca_file> ] [ --ssl-min-tls <min_version_str> ] ]

   { --skip-existing | --truncate | --drop | --append }
   [ {-a | --analyze} ]
   [ --no-compression ]
   [ --no-distribution-check ]
   [ --truncate-source-after [--yes ] ]
   [ {-v | --validate} <type> ]

   [ --dry-run ]
   [ --timeout <seconds> ]
   [ --dumper "<utility>" ]
   [ --quiet | --debug ]

gpcopy --version
gpcopy {-h | --help}

Description

The gpcopy utility copies database objects from a source Greenplum Database system to a destination system. You can perform one of the following types of copy operations:

  • Copy a Greenplum Database system with the --full option. This option copies all database objects including, tables, table data, indexes, views, users, roles, functions, and resource queues for all user-defined databases to a different destination system.
  • Copy a set of user-defined database tables to a destination system.
    • The --dbname option copies all user-defined tables, table data, and re-creates the table indexes from specified databases.
    • The --include-table --include-table-file, or --include-table-json option copies a specified set of user-defined tables, table data, and re-creates the table indexes.
    • The --exclude-table and --exclude-table-file options exclude a specified set of user-defined tables and table data to be copied.
  • Copy only the database schemas with the --metadata-only option.

When running gpcopy, you must specify the data to copy from the source database and how to manage data in the destination database.

  • You must use one and only one of these options to specify the data to be copied from the source database: --full, --dbname, --include-table, --include-table-file, or --include-table-json.

  • You must use one of these options to specify how to manage data in the destination database: --skip-existing, --truncate, --drop, or --append.

    If you specify both the --append and --validate options, validation of source table data fails if a destination table contains data.

If you specify the option --truncate-source-after, you must also specify the --validate option. When --truncate-source-after is specified, gpcopy truncates the source table after the table data is copied and destination table data has been validated.

Prerequisites

The user IDs connecting to the source and destination Greenplum Database systems, the --source-user and --dest-user, must have appropriate access to the systems.

If Kerberos authentication is enabled for Greenplum Database, gpcopy can authenticate with Kerberos. Run the kinit command to obtain a ticket-granting ticket from the KDC server before you run gpcopy. The PGKRBSRVNAME environment variable specifies the Kerberos service name for Greenplum Database. If your Greenplum Database service name is different than the default (postgres) set the PGKRBSRVNAME environment variable with the correct service name before you run gpcopy. Refer to the Greenplum Database Using Kerberos Authentication documentation for information about enabling Kerberos authentication with Greenplum Database.

If your source or destination Greenplum Database cluster is configured to use password authentication, refer to About Connecting Using Password Authentication for more information about configuring gpcopy for this authentication method.

If your source or destination Greenplum Database cluster is configured to use SSL/TLS encryption, refer to About Connecting Using SSL/TLS for more information about configuring gpcopy for this connection type.

If you want gpcopy to use SSL/TLS encryption during data transfer, you must direct gpcopy to use this encryption method. Refer to About SSL/TLS Encryption on the Data Channel for more information.

If you want gpcopy to perform the copy operation at a specific snapshot, you must start the transaction on the source Greenplum cluster before you invoke the command. Refer to About Specifying a Transaction Snapshot for more information.

The source and destination Greenplum Database segment hosts need to be able to communicate with each other. To ensure that the segment hosts can communicate, you can use a tool such as the Linux netperf utility.

When the --full option is specified, resource groups and table spaces are copied, however, the utility does not configure the destination system. For example, you must configure the system to use resource groups and create the host directories for the tablespaces.

Options for Choosing Data to Copy

gpcopy provides a range of options to define the scope of data that is copied. You can choose options to perform a full Greenplum system migration, copy specific databases or tables, or only portions of a table using an SQL query. Additional options enable you to exclude certain tables from being copied, or to change the destination database, schema, or table name into which a table's data is copied. The special --metadata-only instructs gpcopy to create the necessary schema for the selected source tables, but copy no table data.

You must use at least one of the options --full, --dbname, --include-table, --include-table-file, or --include-table-json. Use other options as needed to exclude data from the copy or to change the destination database, schema, or table for copied table data.

For information about how gpcopy handles dependent database objects when copying a specific database or a set of tables, see Notes.

-F | --full

This option performs a migration of a Greenplum Database source system to a destination system.

A migration copies all database objects including, tables, indexes, views, materialized view definitions, roles, functions, user-defined types (UDT), resource queues, and resource groups for all user-defined databases. The default databases, postgres, template0, and template1, are not copied.

This option cannot be specified with the --dbname, --include-table, --include-table-file, or --include-table-json options.
{-d | --dbname} <database>

A source database to copy. To copy multiple databases to the destination system, specify a comma-separated list of databases with no spaces between the names. All the user-defined tables and table data are copied to the destination system.

If the source database does not exist, gpcopy returns an error and quits. If a destination database does not exist a database is created.

Not valid with the --full, --include-table, --include-table-file, or --include-table-json options.

Alternatively, you can copy a set of tables with the --include-table, --include-table-file, or --include-table-json option.
{-D | --dest-dbname} <database>

To copy a database to a different destination database, specify the name of the destination database. For multiple databases, specify a comma-separated list of databases with no spaces between the names. The number of database names must match the number of names specified in the --dbname option. The utility copies the source databases to the destination databases in the listed order. In this example, db1 is copied to destdb1, db2 is copied to destdb2, and db3 is copied to db3.

gpcopy --dest-host mdw-2 --dbname=db1,db2,db3 --dest-dbname=destdb1,destdb2,db3 --drop

If the source database does not exist, gpcopy returns an error and quits. If a destination database does not exist a database is created.

Valid only with the --dbname option.
{-t | --include-table} <db>.<schema>.<table>

One or more tables from the source database system to copy. You must provide fully-qualified table names (<database>.<schema>.<table>). You cannot specify materialized views or system catalog tables. For tables that depend on other tables, you must also specify the dependent table.

To copy multiple tables, include a comma-separated list of table names, no spaces, or use regular expressions to describe a set of tables. You can optionally use --dest-table to change the database, schema, or table name into which table data from --include-table are copied.

You can use Go language regular expressions in the database, schema, and table portions of the fully-qualified table name to define a set of input tables. The regular expression pattern must be enclosed in slashes (/<RE_pattern>/). For example, --include-table mytest.public.demo/.*/ specifies all tables that begin with demo in the mytest database in the public schema.

The following two examples for the --include-table option are equivalent. They both specify a set of tables that begins with demo and ends with zero or more digits.

--include-table testdb.schema1.demo/[0-9]*/
--include-table testdb.schema1./demo[0-9]*/

Regular expression capture groups in the database portion of the fully-qualified name can be referenced in --dest-table to change the destination database, schema, or table name for table data.

Note: If you change the destination schema or table name, the destination table must exist and have the exact same table structure as the source table; gpcopy does not generate the DDL necessary to create the new destination schema or table when the destination table differs from the source table.

If the source table does not exist, gpcopy returns an error and quits.

If the destination table or database does not exist, it is created. Only the table and table data are copied, not dependent objects. Indexes are re-created only if the --drop option is specified. Dependent objects are not copied.

This option is not allowed with the options: --full, --dbname, --include-table-file, or --include-table-json.
--dest-table <db>.<schema>.<table>

(Optional.) Changes the database, schema, or table where data from tables defined with --include-table are copied.

Note: If you change the destination schema or table name, the destination table must exist and have the exact same table structure as the source table; gpcopy does not generate the DDL necessary to create the new destination schema or table when the destination table differs from the source table.

You must provide fully-qualified table names (<database>.<schema>.<table>). Specify multiple tables using either a comma-separated list, no spaces, or by referencing regular expression capture groups that were defined with --include-table. If you use a comma-separated list of tables with --include-table, use the same number position in the --dest-table list to change the destination database, schema, or table for the corresponding table data. For example, to move only the second table provided with an option like

--include-table mytest.public.table1,mytest.public.table2

use an option similar to:

--dest-table production.public.table1,production.public.table2

If you use Go language regular expressions capture groups in --include-table to define a set of tables, you can reference the capture groups in --dest-table to rename the destination database. Capture groups referenced in --include-table or --dest-table must use forward slash delimiters (/) to indicate regular expression processing. For example, to reference the capture group defined in the database string of

--include-table testdb/(\d+)/.myschema/(\d+)/.mytable/(\d+)/

use an option similar to:

--dest-table productiondb/$1/.myschema/$1/.mytable/$1/

Note that the capture group numbering starts at 1 and restarts for each component of the fully-qualified table name (database, schema, and table). Therefore, in the previous example each capture group is referenced with /$1/ in --dest-table. If you run the command in a command shell you must escape the $ with a \. These would be the options from the previous example when running the command in a command shell such as a bash shell.

--include-table testdb/(\d+)/.myschema/(\d+)/.mytable/(\d+)/ --dest-table productiondb/\$1/.myschema/\$1/.mytable/\$1/
If regular expression rules would cause more than one source table name to be remapped to the same destination table, then the --skip-existing, --truncate, --drop, or --append options determine how gpcopy handles subsequent copy requests to the existing table.
{-T | --include-table-file} <table-file>

The location and name of a text file that defines the tables and data to copy. To use multiple files, specify this option for each file.

--include-table-file <<path_to_file1>> --include-table-file <<path_to_file2>>

In the text file, specify a single fully qualified table per line (database.schema.table). You cannot specify views, materialized views, or system catalog tables. For tables that depend on other tables, you must also specify the dependent table.

You can use Go language regular expression syntax to select multiple tables. See the --dest-table option for information about using regular expressions to select tables.

This option cannot be specified with the --full, --dbname, or --include-table options.
--include-table-json <json-table-file>

The location and name of a JSON-format file that defines the tables and data to copy. In contrast to the text file used with --include-table-file, the JSON file can include a destination table name used to change the database, schema, or table into which the table data is copied.

Note: The --include-table-json option is only supported with gpcopy that ships with Greenplum Database 5.20 or later, or with any gpcopy version provided as a separate download from Broadcom Support Portal.

The JSON file that you provide must define one or more objects with key-value pairs to describe a source table, an optional query of the source table data to copy, and an optional destination table to indicate the database where the data is copied. Place the name-value pairs for multiple table objects in a JSON array. For example:

[
     {
         "source": "<database>.<schema>.<table>",
         "sql": "<query>",
         "dest": "<database>.<schema>.<table>"
     },
     {
         ...     }
]

You cannot specify views, materialized views, or system catalog tables as source tables. For tables that depend on other tables, you must also specify the dependent table. See Notes for information about specifying the sql: key with dependent tables.

Any query that you provide must reference a single source table and output to the same columns of the source table.

If the query includes an ORDER BY clause, then the target Greenplum system must be the same size (the same number of segments) as the source system.

Keep in mind that certain characters must be escaped in strings in order for them to be parsed in JSON, specifically:

Character Escape Sequence
Backspace \b
Form Feed \f
Newline \n
Tab \t
Double Quote \"
Backslash \\

If the file cannot be parsed as JSON, gpcopy exits with an error.

You can use Go language regular expression syntax to select multiple tables. Capture groups defined in the source: key can be referenced in the dest: key to change the destination database where the data is copied. For example:

[
     {
         "source": "testdb/(\\d+)/.myschema/(\\d+)/.mytable/(\\d+)/",
         "dest": "productiondb/$1/.myschema/$1/.mytable/$1/"
     },
]

Note: If you change the destination schema or table name, the destination table must exist and have the exact same table structure as the source table; gpcopy does not generate the DDL necessary to create the new destination schema or table when the destination table differs from the source table.

See the --dest-table option for more information about using regular expressions and capture groups.

Note: If you use regular expressions to copy multiple tables, you cannot include the sql: key.

This option cannot be specified with the --full, --dbname, --include-table options. You cannot use this option with --parallelize-leaf-partitions if the JSON file includes a sql: key that queries a partitioned table.
-m | --metadata-only

Create only the schemas specified by the command. Data is not transferred.

If specified with the --full option, gpcopy replicates the complete database schema, including all tables, indexes, materialized views, views, user-defined types (UDT), and user-defined functions (UDF) for the source databases. No data is transferred.

If you specify databases with the --dbname option or tables with the --include-table --include-table-file, or --include-table-json options, gpcopy creates only the tables and indexes. No data is transferred.

This option cannot be used with the --truncate option.
--no-ownership

By default, gpcopy copies owner and privilege information for tables from tables in the source database to tables in the destination database if the role exists in the destination database. If you specify --no-ownership, owner and privilege information is not copied to the destination table. The owner of the tables in the destination database is set to gpadmin.

This option cannot be used with the --full option.
{-e | --exclude-table} <db>.<schema>.<table>

A table from the source database system to exclude from transfer. The fully qualified table name must be specified (<database>.<schema>.<table>).

To exclude multiple tables, specify a comma-separated list of table names.

A set of tables can be specified using the Go language regular expression syntax. See the --include-table option for information about using regular expressions.

Only the specified tables are excluded, not dependent objects. You cannot specify views, materialized views, or system catalog tables.

This option must be specified with one of these options: --full, --dbname, --include-table, --include-table-file, or --include-table-json. If the option --exclude-table results in no table to copy, the database or schema is not created in the destination system.
{-E | --exclude-table-file} <table-file>

The location and name of file containing a list of fully qualified table names to exclude from copying to the destination system. In the text file, specify a single fully qualified table per line (<database>.<schema>.<table>). To specify multiple files, specify this option for each file.

--exclude-table-file <<path_to_file1>> --exclude-table-file <<path_to_file2>>

In the file, a set of tables can be specified using the Go language regular expression syntax. See the --include-table option for information about using regular expressions.

If a source table does not exist, gpcopy displays a warning.

Only the specified tables are excluded. You cannot specify views, materialized views, or system catalog tables.

This option must be specified with one of these options: --full, --dbname, --include-table, --include-table-file, or --include-table-json. If the option --exclude-table results in no tables to copy, the database or schema is not created in the destination system.

Connection Options

The following options specify connection information for the destination and source Greenplum systems. Only --dest-host is required. --jobs, --on-segment-threshold, and --parallelize-leaf-partitions affect the number of simultaneous connections used for data transfer. --data-port-range defines the ports used for data transfer to destination segments or the destination coordinator.

--dest-host <dest_host>
Required. The destination Greenplum Database coordinator segment hostname or IP address.
--dest-mapping-file <host_ip_map_file>

Optional. <host_ip_map_file> is the location and name of a text file that lists all the destination Greenplum Database host names and their corresponding IP addresses (coordinator, standby coordinator, primary and mirror hosts). Each line in the file lists a host and its corresponding IP address in this format:

host, IP

Use this option if for some reason gpcopy cannot resolve the destination host names to their corresponding IP addresses.

--dest-port <dest_port>
The destination Greenplum Database coordinator segment port number. If --dest-port is not specified, then the default is 5432.
--dest-user <dest_user>
The user ID that is used to connect to the destination Greenplum coordinator. If not specified, the default is gpadmin.
--source-host <source_host>
The source Greenplum Database coordinator segment host name or IP address. If not specified, the default host is the system running gpcopy (127.0.0.1).
--source-port <source_port>
The source Greenplum Database coordinator port number. If not specified, the default is 5432.
--source-user <source_user>
The user ID that is used to connect to the source Greenplum Database system. If not specified, the default is gpadmin.
--ssl-cert <cert_file>
(Required when the destination Greenplum Database cluster is configured to use SSL/TLS.) The file system location of the SSL/TLS certificate file. The file must reside in the same location on all hosts (coordinator, segment) in the destination Greenplum Database cluster. You must also provide the SSL/TLS private key ( --ssl-key) when you specify a certificate file.
--ssl-key <key_file>
(Required when the destination Greenplum Database cluster is configured to use SSL/TLS.) The file system location of the SSL/TLS private key file. The file must reside in the same location on all hosts (coordinator, segment) in the destination Greenplum Database cluster. You must also provide the SSL/TLS certificate ( --ssl-cert) when you specify a private key.
--ssl-ca <ca_file>
(Optional.) The file system location of the SSL/TLS certificate authority root cert file. When specified, the file must reside in the same location on all hosts (coordinator, segment) in the source Greenplum Database cluster. gpcopy validates the key and certificate with the root certificate authority.
--ssl-min-tls <min_version_str>
(Optional.) The minimum TLS version to request on the connection. Valid <min_version_str> values are '1.0', '1.1', '1.2', or '1.3'. The default minimum TLS version is 1.0.
--jobs <int>

The maximum number processes that gpcopy runs in parallel. The default is 4. The range is from 1 to 64512.

The option --jobs produces 2*<n>+1 database connections. The default value, 4, creates 9 connections.

By default, this option might affect the copying of partitioned tables. For information about how gpcopy handles partitioned tables, see the --parallelize-leaf-partitions option.

If you increase this option, ensure that Greenplum Database systems are configured with a sufficient maximum concurrent connection value to accommodate the gpcopy connections and other concurrent connections such as user connections. See the Greenplum Database server configuration parameter max_connections.

If --enable-receive-daemon = false and --data-port-range is specified, the number of ports specified by the --data-port-range port range must be greater than or equal to the number of parallel processes created with --jobs.
--snapshot <snapshot_id>
Specifies the identifier of the transaction snapshot in which gpcopy runs the copy operation.
Note

gpcopy does not support the --snapshot option with version 5.x source Greenplum clusters.

{-o | --on-segment-threshold} <int>

Identifies where (source and destination Greenplum Database coordinator or source and destination Greenplum Database segment instances) gpcopy performs the data transfer. Valid values are -2, -1, or a positive integer greater than 0.

The default value is -1, gpcopy copies all table data through the source and destination segment instances.

When you specify the value -2, gpcopy copies all table data through the source and destination Greenplum Database coordinators.

Alternatively, you can specify a positive integer value that identifies a row number threshold. If a table contains this number of rows or less, gpcopy copies the table data using the source and destination coordinators. If the number of rows in a table is more than the threshold, gpcopy copies the data using the source and destination segement instances.

Note: When you specify a positive integer value, gpcopy uses the source table statistics to determine the number of table rows. If a source table is not analyzed, gpcopy assumes that the table is a small table, ignores the threshold setting, and copies the table data using only the coordinators. If your database includes large tables without statistics, set this option to -1 to force gpcopy to copy table data using the Greenplum segment instances.

-p | --parallelize-leaf-partitions
Copy the leaf partition tables of a partitioned table in parallel or as a single table based on the root partition table. The default is true, copy the leaf partition tables in parallel. To copy a partitioned table as a single table, set this option to false.
Note
Note

If you are copying data from a VMware Greenplum system running on RHEL7 for loading to a system running on RHEL8, the default --parallelize-leaf-partitions setting may lead to data being copied to an incorrect partition in range-partitioned tables because of glibc changes between the operating system versions. In this scenario, always set this option to false to copy as a single table based on the root partition table. See Migrating VMware Greenplum from Enterprise Linux 7 to 8 in the VMware Greenplum documentation for more information.

: The --jobs option specifies the maximum number of processes that gpcopy runs in parallel and might affect the number of leaf partitions that are copied in parallel.

If the --validate option is also specified, the utility validates each leaf partition table during the copy process and then validates the entire partitioned table.

This option cannot be specified with --include-table-json if the JSON file includes a sql: key that queries a partitioned table.
--enable-receive-daemon = {true | false}

Controls whether to use a single port when destination hosts receive data from source hosts. gpcopy enables a daemon process on destination hosts to that uses a single port when receiving data from source hosts. The default is true, use a single port.

When a single port is used and --data-port-range is specified, gpcopy chooses a port in the range specified by --data-port-range. Otherwise, gpcopy uses an available port.
--data-port-range <lower_port>-<upper_port>

A range of port numbers to use on Greenplum Database destination hosts for data transfer. This applies to destination segment hosts or, if data is transferred using the coordinator segment, only the coordinator segment host. gpcopy uses the first available port specified in the range (inclusive). <lower_port> must be greater than or equal to 1024 (to avoid reserved system ports), and <upper_port> must be a greater value.

If --data-port-range is not specified, then gpcopy uses any available port.

If this option is specified and --enable-receive-daemon=false, the number of ports specified by the range must be greater than or equal to the number of parallel processes created with --jobs.
--timeout <seconds>
The maximum time in seconds to wait until both source and destination systems are ready for data transfer. The default is 30 seconds. A value of 0 deactivates the timeout.

Options for Configuring how Data is Copied

gpcopy provides additional options that affect the way data is copied between systems. You must use one of these options to specify how to manage data in the destination database: --skip-existing, --truncate, --drop, or --append. Other options can be included as necessary, for example, to perform additional data validation.

--skip-existing

Specify this option to skip copying a table from the source database if the table already exists in the destination database.

At most, only one of the options can be specified: --skip-existing, --truncate, --drop, or --append.
--truncate

Specify this option to truncate the table that is in the destination database if it already exists.

A gpcopy command that specifies the option --truncate does not truncate a table defined with a foreign key in the target database. gpcopy reports an error and continues the copy operation.

At most, only one of the options can be specified: --skip-existing, --truncate, --drop, or --append.
--drop

Specify this option to drop the table that is in the destination database if it already exists. Before copying table data, gpcopy drops the table and creates it again.

If a gpcopy command attempts to copy a view or a table with a dependency (such as a foreign key or table dependency) with this option, the utility reports an error and continues the copy operation.

At most, only one of the options can be specified: --skip-existing, --truncate, --drop, or --append.
--append

Append data to the table in the destination database if it already exists.

At most, only one of the options can be specified: --skip-existing, --truncate, --drop, or --append.
-a | --analyze
Run the ANALYZE command on non-system tables. The default is to not run the ANALYZE command. The operation is performed for each table after the table data is copied.
--no-compression

If specified, data is transferred without compression. By default, gpcopy compresses data during transfer from the source to the destination database when copying data to a different host.

The utility does not compress data when copying data to the same host.
--no-distribution-check

Specify this option to deactivates table data distribution checking. By default, gpcopy performs data distribution checking to ensure data is distributed to segment instances correctly. If distribution checking fails, the table copy fails.

The utility does not support table data distribution checking when copying a partitioned table that is defined with a leaf table that is an external table or if a leaf table is defined with a distribution policy that is different from the root partitioned table.

Caution

Before you perform a gpcopy operation with the --no-distribution-check option, ensure that you have a backup of the destination database and that the distribution policies of the tables that are being copied are the same in the source and destination database. Copying data into segment instances with incorrect data distribution can cause incorrect query results and can cause database corruption.

--truncate-source-after

Specify this option to truncate the table that is in the source database after gpcopy copies the table and validates the table data in the destination database.

If you specify this option, you must also specify the --validate option.

--yes
Optional. Automatic confirmation to truncate source table data after copying and validating table data. The prompt to truncate source tables does not appear. The default is to prompt to confirm truncating source table data after copying and validating the data.
{-v | --validate} <type>

Perform data validation on the table data in the destination database after the table data is copied. These are the supported type of validation.

  • count - compares row counts between source and destination table data.
  • md5xor - calculates the MD5 value of all rows, then performs an XOR over the MD5 values

If you specify the --append option, and the destination table contains data, validation fails for the table.

If validation for a table fails, gpcopy will abort the failed transaction and will continue to copy other tables.

Additional Options

--dry-run

When you specify this option, gpcopy generates a list of the migration operations that would have been performed with the specified options. The data is not migrated.

The information is displayed at the command line and written to the log file.
--quiet

If specified, suppress status messages at the command prompt. The messages are sent only to the log file. Higher level messages such as warning and error messages are still displayed.

This option cannot be specified with the --debug option.
--debug

If specified, debug messages are displayed at the command prompt.

This option cannot be specified with the --quiet option.
--dumper "<utility>" (Beta)

Specifies the utility that gpcopy uses to generate DDL for the copy operation. The default is value is pgdump. As an alternative, you can specify gpbackup. If you specify gpbackup, gpbackup 1.16.0 or later must be installed on your source system.

Caution

The --dumper option is a Beta feature and is not supported in a production environment.

--version
Displays the version of this utility.
-h | --help
Displays the online help.

Notes

If a gpcopy command specifies an invalid option, or specifies a source table or database that does not exist, the utility returns an error and quits. No data is copied.

The source table data can change while the data is being copied. A lock is not acquired on the source table when data is copied.

The utility cannot copy a row with a width greater than 1GB (a PostgreSQL limitation).

If you copy a set of database tables with the --dbname, --include-table, --include-table-file, or --include-table-json options, and the destination database does not exist, the utility creates the database before copying the tables. If the destination database exists, the utility creates the tables in the database if required.

If you use $ in a regular expression in a gpcopy command and run the command in a shell such as a bash shell, you must escape the $ with a \. See the --dest-table option for information about using regular expressions.

gpcopy includes a list of tables and views that were successfully copied to the destination system in the text file gpcopy_<date>_success.list in ~/gpAdminLogs directory on the coordinator host.

If gpcopy fails to copy tables, the utility creates a text file gpcopy_<date>_failure.list that lists the failed tables in the ~/gpAdminLogs directory on the coordinator host. After resolving issues that caused the failures, you can run gpcopy with the --include-table-file option to copy the tables that were not copied.

Copying Dependent Database Objects

The gpcopy utility does not automatically copy dependent database objects unless you specify the --full option.

If you use the --include-table-json option with a table, tbl_a that inherits table, tbl_b, you must specify both tables, tbl_a and tbl_b.

[
     {
         "source": "test.public.tbl_a",
         "dest": "test.public.tbl_a"
     },
     {
         "source": "test.public.tbl_b",
         "dest": "test.public.tbl_b"
     }
]

If you do not use the sql: tag, each table's data is copied from the source tables to the destination tables. This is how gpcopy handles the the sql: tag with dependent tables.

  • If you specify a query in the sql: tag only for tbl_a, gpcopy inserts only the data specified by the query into the data into tbl_a and tbl_b. For tbl_b, gpcopy also copies data that are only in tbl_b (data in tbl_b that is not referenced in tbl_a). Using the SELECT keyword ONLY, you can list the data in tbl_b that is not referenced in tbl_a with this command.

    SELECT * FROM ONLY tbl_b
    
  • If you specify a query in the sql: tag for tbl_b, gpcopy inserts the query data from tbl_b into the destination table, and might also insert data that is referenced in the tbl_a. This might cause issues with duplicated data in the destination table. You can use the ONLY keyword in the query to limit the data inserted into the destination table to only the data not referenced in tbl_a.

A view that is defined to use an external table cannot be copied.

When copying tables, sequences and indexes defined for tables are considered table data and are copied. A sequence is copied if a table is created with a SERIAL column or if a sequence is specified as a default value. A sequence is reset when both --full and --metadata-only are specified.

If a table has a default value on a column that is a user-defined function, that function must exist in the destination system database when using the --dbname, --include-table, --include-table-file, or --include-table-json options.

The utility re-creates table indexes only with --full or --drop options.

Limitations for Materialized Views and Table Dependencies

Materialized view definitions are copied only if the gpcopy option --full is specified. Otherwise, materialized views are ignored. gpcopy does not copy materialized view data, only the materialized view definition. To populate the materialized view with data after it is copied, use REFRESH MATERIALIZED VIEW.

These are gpcopy limitations when copying a view or a table that inherits another table or is defined with a foreign key.

  • gpcopy does not automatically copy the dependent table. The dependent table must be included as part of the gpcopy operation.

    If a dependent table is missing, gpcopy reports an error that lists the object and the missing dependent table and continues the copy operation.

    For example, you use gpcopy to copy table tbl1 that inherits table tbl2, you must include both tables in the copy operation. If you only include tbl1, gpcopy returns an error listing both tbl1 and the missing table tbl2.

    Note: gpcopy checks for tables referenced by foreign keys as a dependency. Foreign keys are not supported by Greenplum Database, however the syntax is allowed when creating a table.

  • If a table inherits from multiple tables, performing a gpcopy operation to a Greenplum Database 5.x database fails with the message pq: tuple concurrently updated. The gpcopy operation is supported only when copying to a Greenplum Database 6.x database.

  • A gpcopy command that specifies the option --drop does not copy a view or a table defined with a foreign key or an inherited table. gpcopy reports an error and continues the copy operation. The --append option is supported.

  • A gpcopy command that specifies the option --truncate does not truncate a table in the target database if the table is defined with a foreign key. gpcopy reports an error and continues the copy operation. The --append option is supported.

Handling External Files and Objects

The gpcopy utility does not copy configuration files such as postgresql.conf and pg_hba.conf. You must set up the destination system configuration separately.

The gpcopy utility does not copy external objects such as Greenplum Database extensions, third party jar files, and shared object files. You must install the external objects separately.

Specifying Table Names with Special Characters

When you list a table with the option --include-table or --exclude-table, and the table name or schema name contains single quote ('), double quote (") , or backslash (\), you must escape the character with a backslash (\). These are the escaped characters: \', \", \\. If the table or schema name contains a period (.), comma (,), or a space character, you must enclose the entire name in single quotes (') and enclose the table and schema in double quotes (") at the command prompt shell. In this example, the fully qualified name of the table is testdb.schema'1.table"test.

--include-table 'testdb."schema\'1"."table\"test"'

For fully qualified table names listed in a file that is used with the options --include-table-file, --include-table-json, or --exclude-table-file, if the table name or schema name contains a period (.), the name must be enclosed in double quotes ("). In this example, the table table.test is in the testdb, and belongs to the schema schema.1.

testdb."schema.1"."table.test"

Copying Partitioned Tables

Before performing a gpcopy operation on a partitioned table that is defined with a foreign key, you must create the partitioned table in the target database.

When copying data for a partitioned table, if a leaf partition has been exchanged with an external table, that leaf partition is created, but data is not copied.

If you specify copying only leaf partitions of a partitioned table with an option such as --include-table, --include-table-file, or --include-table-json, the partitioned table must exist in the destination system. If it does not exist, gpcopy returns an error. You can create the table before you copy the leaf partition data, or you can copy the entire partitioned table.

gpcopy does not support table data distribution checking when copying a partitioned table that is defined with a leaf table that is an external table or if a leaf table is defined with a distribution policy that is different from the root partitioned table. You can copy those tables in a gpcopy operation and specify the option --no-distribution-check to deactivates checking of data distribution.

Caution

Before you perform a gpcopy operation with the --no-distribution-check option, ensure that you have a backup of the destination database and that the distribution policies of the tables that are being copied are the same in the source and destination database. Copying data into segment instances with incorrect data distribution can cause incorrect query results and can cause database corruption.

Handling gpcopy Errors

When gpcopy encounters errors and quits or is cancelled by the user, current copy operations on tables in the destination database are rolled back. Copy operations that have completed are not rolled back.

If an error occurs during the process of copying a table, or table validation fails, gpcopy continues copying the other specified tables. After gpcopy finishes, it displays a list of tables where errors occurred or validation failed and displays a gpcopy command. You can use the provided command to retry copying the failed tables.

The gpcopy utility logs messages in log file gpcopy_<date>.log in the ~/gpAdminLogs directory on the coordinator host. If you run multiple gpcopy commands on the same day, the utility appends messages to that day's log file.

After gpcopy completes, it displays a summary of the operations performed. If the utility fails to copy tables, they are highlighted in summary, and there is a gpcopy command provided in summary for user to just copy the failed tables. The information is displayed at the command prompt and in the gpcopy log file.

After resolving the issues that caused the copy operations to fail, you can run the provided command to copy the tables that failed in the previous gpcopy command.

Database Connections Created by gpcopy

The option --jobs produces 2*<n>+1 database connections. The default value 4, creates 9 connections.

If you increase this option, ensure that both source and destination Greenplum Database is configured with a sufficient maximum concurrent connection value to accommodate the gpcopy connections and other concurrent connections such as user connections. Refer to the Greenplum Database max_connections server configuration parameter documentation for more information.

Examples

This command copies all user created databases in a source system to a destination system with the --full option. And drops the table and creates it again if it already exists in the destination.

gpcopy --source-host mytest --source-port 1234 --source-user gpuser \
    --dest-host demohost --dest-port 1234 --dest-user gpuser \
    --full --drop

This command copies the specified databases in a source system to a destination system with the --dbname option. The --truncate option truncates the table data before copying table data from the source table.

gpcopy --source-host mytest --source-port 1234 --source-user gpuser \
   --dest-host demohost --dest-port 1234 --dest-user gpuser \
   --dbname database1,database2 --truncate

This command copies the specified tables in a source system to a destination system with the --include-table option. The --skip-existing option skips the table if it already exists in the destination database.

gpcopy --source-host mytest --source-port 1234 --source-user gpuser \
   --dest-host demohost --dest-port 1234 --dest-user gpuser \
   --include-table database.schema.table1,database.schema.table2 --skip-existing

This command copies the tables from the source database to the destination system, excluding the tables specified in the specified tables in /home/gpuser/mytables with --exclude-table-file option. The --truncate option truncates tables that already exist in the destination system. With the options --analyze and --validate count, the utility performs an ANALYZE operation on the copied tables, and validates the copied table data by comparing row counts between source and destination tables.

gpcopy --source-host mytest --source-port 1234 --source-user gpuser \
   --dest-host demohost --dest-port 1234 --dest-user gpuser \
   --dbname database1 --exclude-table-file /home/gpuser/mytables \
   --truncate --analyze --validate count

This command specifies the --full and --metadata-only options to copy the complete database schema, including all tables, indexes, views, user-defined types (UDT), and user-defined functions (UDF) from all the source databases. No data is copied, The --drop option specifies that the table is dropped in the destination database before it is created again if the table exists in both the source and destination database.

gpcopy --source-host mytest --source-port 1234 --source-user gpuser \
   --dest-host demohost --dest-port 1234 --dest-user gpuser \
   --full --metadata-only --drop

This command copies the specified databases in a source system to a destination system with the --dbname option and specifies 8 parallel processes with the --jobs option. The command specifies the --truncate option to truncate the table and create it again if it already exists in the destination database, and uses uses ports in the range 2000-2010 for the parallel process connections.

gpcopy --source-host mytest --source-port 1234 --source-user gpuser \
   --dest-host demohost --dest-port 1234 --dest-user gpuser \
   --dbname database1,database2 --truncate --jobs 8 --data-port-range 2000-2010

This command copies the specified database in a source system to a destination system with the --dbname option and specifies 16 parallel processes with the --jobs option. The --truncate option truncates the table and creates it again if it already exists in the destination database. The --truncate-source-after option truncates the tables in source database after that table data has been validated in the destination database.

gpcopy --source-host mytest --source-port 1234 --source-user gpuser \
   --dest-host demohost --dest-port 1234 --dest-user gpuser \
   --dbname database1 --truncate --jobs 16 --truncate-source-after --validate count

In the previous example, if --truncate was not specified and the destination table contained data, validation would fail.

This is an example table file that uses regular expressions.

"test1.arc/.*/./.*/"
"test1.c/(..)/y./.*/"

In the first line, the regular expressions for the schemas, arc/.*/, and for the tables, /.*/, limit the transfer to all tables with the schema names that start with arc.

In the second line, the regular expressions for the schemas, c/(..)/y, and for the tables, /.*/, limit the transfer to all tables with the schema names that are four characters long and that start with c and end with y, for example, crty.

When the command is run, tables in the database test1 that satisfy either condition are copied to the destination database.

See Also

For information about migrating data with gpcopy, see Migrating Data with gpcopy.

check-circle-line exclamation-circle-line close-line
Scroll to top icon