The gpcopy
utility copies objects from databases in a source Greenplum Database system to databases in a destination Greenplum Database system.
gpcopy
{ {-F | --full} |
{ {-d | --dbname} <database1>[,<database2> ... ]
{-D | --dest-dbname} <dest-db1>[,<dest-db2> ... ] ] } |
{-t | --include-table} <db>.<schema>.<table>[,<db>.<schema1>.<table1> ... ]
[ --dest-table <db>.<schema>.<table>[,<db>.<schema1>.<table1> ... ] |
{-T | --include-table-file} <table-file1>
[{-T | --include-table-file} <table-file2>] ... ] |
--include-table-json <json-table-file1>
[ --include-table-json <json-table-file2>] ... ] }
[ {-m | --metadata-only} ]
[ --no-ownership ]
[ {-e | --exclude-table} <db>.<schema>.<table>[,<db>.<schema1>.<table1> ... ] ]
[ {-E | --exclude-table-file} <table-file1> ]
[ {-E | --exclude-table-file} <table-file2> ] ... ] ]
{ --dest-host <dest_host> [ --dest-port <dest_port> ]
[ --dest-user <dest_user> ] [ --dest-mapping-file <host_ip_map_file> ]}
[ --source-host <source_host> [ --source-port <source_port> ]
[ --source-user <source_user> ] ]
[ --enable-receive-daemon = {true | false }]
[ --jobs <int> ]
[ --snapshot <snapshot_id> ]
[ {-o | --on-segment-threshold} <int> ]
[ {-p | --parallelize-leaf-partitions} = { true | false} ]
[ --data-port-range <lower_port>-<upper_port> ]
[ --ssl-cert <cert_file> --ssl-key <key_file>
[ --ssl-ca <ca_file> ] [ --ssl-min-tls <min_version_str> ] ]
{ --skip-existing | --truncate | --drop | --append }
[ {-a | --analyze} ]
[ --no-compression ]
[ --no-distribution-check ]
[ --truncate-source-after [--yes ] ]
[ {-v | --validate} <type> ]
[ --dry-run ]
[ --timeout <seconds> ]
[ --dumper "<utility>" ]
[ --quiet | --debug ]
gpcopy --version
gpcopy {-h | --help}
The gpcopy
utility copies database objects from a source Greenplum Database system to a destination system. You can perform one of the following types of copy operations:
--full
option. This option copies all database objects including, tables, table data, indexes, views, users, roles, functions, and resource queues for all user-defined databases to a different destination system.--dbname
option copies all user-defined tables, table data, and re-creates the table indexes from specified databases.--include-table
--include-table-file
, or --include-table-json
option copies a specified set of user-defined tables, table data, and re-creates the table indexes.--exclude-table
and --exclude-table-file
options exclude a specified set of user-defined tables and table data to be copied.--metadata-only
option.When running gpcopy
, you must specify the data to copy from the source database and how to manage data in the destination database.
You must use one and only one of these options to specify the data to be copied from the source database: --full
, --dbname
, --include-table
, --include-table-file
, or --include-table-json
.
You must use one of these options to specify how to manage data in the destination database: --skip-existing
, --truncate
, --drop
, or --append
.
If you specify both the --append
and --validate
options, validation of source table data fails if a destination table contains data.
If you specify the option --truncate-source-after
, you must also specify the --validate
option. When --truncate-source-after
is specified, gpcopy
truncates the source table after the table data is copied and destination table data has been validated.
The user IDs connecting to the source and destination Greenplum Database systems, the --source-user
and --dest-user
, must have appropriate access to the systems.
If Kerberos authentication is enabled for Greenplum Database, gpcopy
can authenticate with Kerberos. Run the kinit
command to obtain a ticket-granting ticket from the KDC server before you run gpcopy
. The PGKRBSRVNAME
environment variable specifies the Kerberos service name for Greenplum Database. If your Greenplum Database service name is different than the default (postgres
) set the PGKRBSRVNAME
environment variable with the correct service name before you run gpcopy
. Refer to the Greenplum Database Using Kerberos Authentication documentation for information about enabling Kerberos authentication with Greenplum Database.
If your source or destination Greenplum Database cluster is configured to use password authentication, refer to About Connecting Using Password Authentication for more information about configuring gpcopy
for this authentication method.
If your source or destination Greenplum Database cluster is configured to use SSL/TLS encryption, refer to About Connecting Using SSL/TLS for more information about configuring gpcopy
for this connection type.
If you want gpcopy
to use SSL/TLS encryption during data transfer, you must direct gpcopy
to use this encryption method. Refer to About SSL/TLS Encryption on the Data Channel for more information.
If you want gpcopy
to perform the copy operation at a specific snapshot, you must start the transaction on the source Greenplum cluster before you invoke the command. Refer to About Specifying a Transaction Snapshot for more information.
The source and destination Greenplum Database segment hosts need to be able to communicate with each other. To ensure that the segment hosts can communicate, you can use a tool such as the Linux netperf
utility.
When the --full
option is specified, resource groups and table spaces are copied, however, the utility does not configure the destination system. For example, you must configure the system to use resource groups and create the host directories for the tablespaces.
gpcopy
provides a range of options to define the scope of data that is copied. You can choose options to perform a full Greenplum system migration, copy specific databases or tables, or only portions of a table using an SQL query. Additional options enable you to exclude certain tables from being copied, or to change the destination database, schema, or table name into which a table's data is copied. The special --metadata-only
instructs gpcopy
to create the necessary schema for the selected source tables, but copy no table data.
You must use at least one of the options --full
, --dbname
, --include-table
, --include-table-file
, or --include-table-json
. Use other options as needed to exclude data from the copy or to change the destination database, schema, or table for copied table data.
For information about how gpcopy
handles dependent database objects when copying a specific database or a set of tables, see Notes.
This option performs a migration of a Greenplum Database source system to a destination system.
A migration copies all database objects including, tables, indexes, views, materialized view definitions, roles, functions, user-defined types (UDT), resource queues, and resource groups for all user-defined databases. The default databases, postgres
, template0
, and template1
, are not copied.
--dbname
,
--include-table
,
--include-table-file
, or
--include-table-json
options.
A source database to copy. To copy multiple databases to the destination system, specify a comma-separated list of databases with no spaces between the names. All the user-defined tables and table data are copied to the destination system.
If the source database does not exist, gpcopy
returns an error and quits. If a destination database does not exist a database is created.
Not valid with the --full
, --include-table
, --include-table-file
, or --include-table-json
options.
--include-table
,
--include-table-file
, or
--include-table-json
option.
To copy a database to a different destination database, specify the name of the destination database. For multiple databases, specify a comma-separated list of databases with no spaces between the names. The number of database names must match the number of names specified in the --dbname
option. The utility copies the source databases to the destination databases in the listed order. In this example, db1
is copied to destdb1
, db2
is copied to destdb2
, and db3
is copied to db3
.
gpcopy --dest-host mdw-2 --dbname=db1,db2,db3 --dest-dbname=destdb1,destdb2,db3 --drop
If the source database does not exist, gpcopy
returns an error and quits. If a destination database does not exist a database is created.
--dbname
option.
One or more tables from the source database system to copy. You must provide fully-qualified table names (<database>.<schema>.<table>). You cannot specify materialized views or system catalog tables. For tables that depend on other tables, you must also specify the dependent table.
To copy multiple tables, include a comma-separated list of table names, no spaces, or use regular expressions to describe a set of tables. You can optionally use --dest-table
to change the database, schema, or table name into which table data from --include-table
are copied.
You can use Go language regular expressions in the database, schema, and table portions of the fully-qualified table name to define a set of input tables. The regular expression pattern must be enclosed in slashes (/<RE_pattern>/
). For example, --include-table mytest.public.demo/.*/
specifies all tables that begin with demo
in the mytest
database in the public
schema.
The following two examples for the --include-table
option are equivalent. They both specify a set of tables that begins with demo and ends with zero or more digits.
--include-table testdb.schema1.demo/[0-9]*/
--include-table testdb.schema1./demo[0-9]*/
Regular expression capture groups in the database portion of the fully-qualified name can be referenced in --dest-table
to change the destination database, schema, or table name for table data.
Note: If you change the destination schema or table name, the destination table must exist and have the exact same table structure as the source table; gpcopy
does not generate the DDL necessary to create the new destination schema or table when the destination table differs from the source table.
If the source table does not exist, gpcopy
returns an error and quits.
If the destination table or database does not exist, it is created. Only the table and table data are copied, not dependent objects. Indexes are re-created only if the --drop
option is specified. Dependent objects are not copied.
--full
,
--dbname
,
--include-table-file
, or
--include-table-json
.
(Optional.) Changes the database, schema, or table where data from tables defined with --include-table
are copied.
Note: If you change the destination schema or table name, the destination table must exist and have the exact same table structure as the source table; gpcopy
does not generate the DDL necessary to create the new destination schema or table when the destination table differs from the source table.
You must provide fully-qualified table names (<database>.<schema>.<table>). Specify multiple tables using either a comma-separated list, no spaces, or by referencing regular expression capture groups that were defined with --include-table
. If you use a comma-separated list of tables with --include-table
, use the same number position in the --dest-table
list to change the destination database, schema, or table for the corresponding table data. For example, to move only the second table provided with an option like
--include-table mytest.public.table1,mytest.public.table2
use an option similar to:
--dest-table production.public.table1,production.public.table2
If you use Go language regular expressions capture groups in --include-table
to define a set of tables, you can reference the capture groups in --dest-table
to rename the destination database. Capture groups referenced in --include-table
or --dest-table
must use forward slash delimiters (/) to indicate regular expression processing. For example, to reference the capture group defined in the database string of
--include-table testdb/(\d+)/.myschema/(\d+)/.mytable/(\d+)/
use an option similar to:
--dest-table productiondb/$1/.myschema/$1/.mytable/$1/
Note that the capture group numbering starts at 1 and restarts for each component of the fully-qualified table name (database, schema, and table). Therefore, in the previous example each capture group is referenced with /$1/
in --dest-table
. If you run the command in a command shell you must escape the $
with a \
. These would be the options from the previous example when running the command in a command shell such as a bash shell.
--include-table testdb/(\d+)/.myschema/(\d+)/.mytable/(\d+)/ --dest-table productiondb/\$1/.myschema/\$1/.mytable/\$1/
--skip-existing
,
--truncate
,
--drop
, or
--append
options determine how
gpcopy
handles subsequent copy requests to the existing table.
The location and name of a text file that defines the tables and data to copy. To use multiple files, specify this option for each file.
--include-table-file <<path_to_file1>> --include-table-file <<path_to_file2>>
In the text file, specify a single fully qualified table per line (database.schema.table). You cannot specify views, materialized views, or system catalog tables. For tables that depend on other tables, you must also specify the dependent table.
You can use Go language regular expression syntax to select multiple tables. See the --dest-table option for information about using regular expressions to select tables.
--full
,
--dbname
, or
--include-table
options.
The location and name of a JSON-format file that defines the tables and data to copy. In contrast to the text file used with --include-table-file
, the JSON file can include a destination table name used to change the database, schema, or table into which the table data is copied.
Note: The --include-table-json
option is only supported with gpcopy
that ships with Greenplum Database 5.20 or later, or with any gpcopy
version provided as a separate download from Broadcom Support Portal.
The JSON file that you provide must define one or more objects with key-value pairs to describe a source table, an optional query of the source table data to copy, and an optional destination table to indicate the database where the data is copied. Place the name-value pairs for multiple table objects in a JSON array. For example:
[
{
"source": "<database>.<schema>.<table>",
"sql": "<query>",
"dest": "<database>.<schema>.<table>"
},
{
... }
]
You cannot specify views, materialized views, or system catalog tables as source tables. For tables that depend on other tables, you must also specify the dependent table. See Notes for information about specifying the sql:
key with dependent tables.
Any query that you provide must reference a single source table and output to the same columns of the source table.
If the query includes an ORDER BY
clause, then the target Greenplum system must be the same size (the same number of segments) as the source system.
Keep in mind that certain characters must be escaped in strings in order for them to be parsed in JSON, specifically:
Character | Escape Sequence |
---|---|
Backspace | \b |
Form Feed | \f |
Newline | \n |
Tab | \t |
Double Quote | \" |
Backslash | \\ |
If the file cannot be parsed as JSON, gpcopy
exits with an error.
You can use Go language regular expression syntax to select multiple tables. Capture groups defined in the source:
key can be referenced in the dest:
key to change the destination database where the data is copied. For example:
[
{
"source": "testdb/(\\d+)/.myschema/(\\d+)/.mytable/(\\d+)/",
"dest": "productiondb/$1/.myschema/$1/.mytable/$1/"
},
]
Note: If you change the destination schema or table name, the destination table must exist and have the exact same table structure as the source table; gpcopy
does not generate the DDL necessary to create the new destination schema or table when the destination table differs from the source table.
See the --dest-table option for more information about using regular expressions and capture groups.
Note: If you use regular expressions to copy multiple tables, you cannot include the sql:
key.
--full
,
--dbname
,
--include-table
options. You cannot use this option with
--parallelize-leaf-partitions
if the JSON file includes a
sql:
key that queries a partitioned table.
Create only the schemas specified by the command. Data is not transferred.
If specified with the --full
option, gpcopy
replicates the complete database schema, including all tables, indexes, materialized views, views, user-defined types (UDT), and user-defined functions (UDF) for the source databases. No data is transferred.
If you specify databases with the --dbname
option or tables with the --include-table
--include-table-file
, or --include-table-json
options, gpcopy
creates only the tables and indexes. No data is transferred.
--truncate
option.
By default, gpcopy
copies owner and privilege information for tables from tables in the source database to tables in the destination database if the role exists in the destination database. If you specify --no-ownership
, owner and privilege information is not copied to the destination table. The owner of the tables in the destination database is set to gpadmin
.
--full
option.
A table from the source database system to exclude from transfer. The fully qualified table name must be specified (<database>.<schema>.<table>).
To exclude multiple tables, specify a comma-separated list of table names.
A set of tables can be specified using the Go language regular expression syntax. See the --include-table option for information about using regular expressions.
Only the specified tables are excluded, not dependent objects. You cannot specify views, materialized views, or system catalog tables.
--full
,
--dbname
,
--include-table
,
--include-table-file
, or
--include-table-json
. If the option
--exclude-table
results in no table to copy, the database or schema is not created in the destination system.
The location and name of file containing a list of fully qualified table names to exclude from copying to the destination system. In the text file, specify a single fully qualified table per line (<database>.<schema>.<table>). To specify multiple files, specify this option for each file.
--exclude-table-file <<path_to_file1>> --exclude-table-file <<path_to_file2>>
In the file, a set of tables can be specified using the Go language regular expression syntax. See the --include-table option for information about using regular expressions.
If a source table does not exist, gpcopy
displays a warning.
Only the specified tables are excluded. You cannot specify views, materialized views, or system catalog tables.
--full
,
--dbname
,
--include-table
,
--include-table-file
, or
--include-table-json
. If the option
--exclude-table
results in no tables to copy, the database or schema is not created in the destination system.
The following options specify connection information for the destination and source Greenplum systems. Only --dest-host
is required. --jobs
, --on-segment-threshold
, and --parallelize-leaf-partitions
affect the number of simultaneous connections used for data transfer. --data-port-range
defines the ports used for data transfer to destination segments or the destination coordinator.
Optional. <host_ip_map_file> is the location and name of a text file that lists all the destination Greenplum Database host names and their corresponding IP addresses (coordinator, standby coordinator, primary and mirror hosts). Each line in the file lists a host and its corresponding IP address in this format:
host, IP
Use this option if for some reason gpcopy
cannot resolve the destination host names to their corresponding IP addresses.
--dest-port
is not specified, then the default is 5432.
gpcopy
(127.0.0.1).
--ssl-key
) when you specify a certificate file.
--ssl-cert
) when you specify a private key.
gpcopy
validates the key and certificate with the root certificate authority.
<min_version_str>
values are
'1.0'
,
'1.1'
,
'1.2'
, or
'1.3'
. The default minimum TLS version is
1.0
.
The maximum number processes that gpcopy
runs in parallel. The default is 4. The range is from 1 to 64512.
The option --jobs
produces 2*<n>+1
database connections. The default value, 4, creates 9 connections.
By default, this option might affect the copying of partitioned tables. For information about how gpcopy
handles partitioned tables, see the --parallelize-leaf-partitions
option.
If you increase this option, ensure that Greenplum Database systems are configured with a sufficient maximum concurrent connection value to accommodate the gpcopy
connections and other concurrent connections such as user connections. See the Greenplum Database server configuration parameter max_connections
.
--enable-receive-daemon = false
and
--data-port-range
is specified, the number of ports specified by the
--data-port-range
port range must be greater than or equal to the number of parallel processes created with
--jobs
.
gpcopy
runs the copy operation.
Note
gpcopy
does not support the--snapshot
option with version 5.x source Greenplum clusters.
Identifies where (source and destination Greenplum Database coordinator or source and destination Greenplum Database segment instances) gpcopy
performs the data transfer. Valid values are -2
, -1
, or a positive integer greater than 0.
The default value is -1
, gpcopy
copies all table data through the source and destination segment instances.
When you specify the value -2
, gpcopy
copies all table data through the source and destination Greenplum Database coordinators.
Alternatively, you can specify a positive integer value that identifies a row number threshold. If a table contains this number of rows or less, gpcopy
copies the table data using the source and destination coordinators. If the number of rows in a table is more than the threshold, gpcopy
copies the data using the source and destination segement instances.
Note: When you specify a positive integer value,
gpcopy
uses the source table statistics to determine the number of table rows. If a source table is not analyzed,gpcopy
assumes that the table is a small table, ignores the threshold setting, and copies the table data using only the coordinators. If your database includes large tables without statistics, set this option to-1
to forcegpcopy
to copy table data using the Greenplum segment instances.
true
, copy the leaf partition tables in parallel. To copy a partitioned table as a single table, set this option to
false
.
NoteNoteIf you are copying data from a VMware Greenplum system running on RHEL7 for loading to a system running on RHEL8, the default
--parallelize-leaf-partitions
setting may lead to data being copied to an incorrect partition in range-partitioned tables because ofglibc
changes between the operating system versions. In this scenario, always set this option tofalse
to copy as a single table based on the root partition table. See Migrating VMware Greenplum from Enterprise Linux 7 to 8 in the VMware Greenplum documentation for more information.
--jobs
option specifies the maximum number of processes that
gpcopy
runs in parallel and might affect the number of leaf partitions that are copied in parallel.
If the --validate
option is also specified, the utility validates each leaf partition table during the copy process and then validates the entire partitioned table.
--include-table-json
if the JSON file includes a
sql:
key that queries a partitioned table.
Controls whether to use a single port when destination hosts receive data from source hosts. gpcopy
enables a daemon process on destination hosts to that uses a single port when receiving data from source hosts. The default is true
, use a single port.
--data-port-range
is specified,
gpcopy
chooses a port in the range specified by
--data-port-range
. Otherwise,
gpcopy
uses an available port.
A range of port numbers to use on Greenplum Database destination hosts for data transfer. This applies to destination segment hosts or, if data is transferred using the coordinator segment, only the coordinator segment host. gpcopy
uses the first available port specified in the range (inclusive). <lower_port> must be greater than or equal to 1024 (to avoid reserved system ports), and <upper_port> must be a greater value.
If --data-port-range
is not specified, then gpcopy
uses any available port.
--enable-receive-daemon=false
, the number of ports specified by the range must be greater than or equal to the number of parallel processes created with
--jobs
.
gpcopy
provides additional options that affect the way data is copied between systems. You must use one of these options to specify how to manage data in the destination database: --skip-existing
, --truncate
, --drop
, or --append
. Other options can be included as necessary, for example, to perform additional data validation.
Specify this option to skip copying a table from the source database if the table already exists in the destination database.
--skip-existing
,
--truncate
,
--drop
, or
--append
.
Specify this option to truncate the table that is in the destination database if it already exists.
A gpcopy
command that specifies the option --truncate
does not truncate a table defined with a foreign key in the target database. gpcopy
reports an error and continues the copy operation.
--skip-existing
,
--truncate
,
--drop
, or
--append
.
Specify this option to drop the table that is in the destination database if it already exists. Before copying table data, gpcopy
drops the table and creates it again.
If a gpcopy
command attempts to copy a view or a table with a dependency (such as a foreign key or table dependency) with this option, the utility reports an error and continues the copy operation.
--skip-existing
,
--truncate
,
--drop
, or
--append
.
Append data to the table in the destination database if it already exists.
--skip-existing
,
--truncate
,
--drop
, or
--append
.
ANALYZE
command on non-system tables. The default is to not run the
ANALYZE
command. The operation is performed for each table after the table data is copied.
If specified, data is transferred without compression. By default, gpcopy
compresses data during transfer from the source to the destination database when copying data to a different host.
Specify this option to deactivates table data distribution checking. By default, gpcopy
performs data distribution checking to ensure data is distributed to segment instances correctly. If distribution checking fails, the table copy fails.
The utility does not support table data distribution checking when copying a partitioned table that is defined with a leaf table that is an external table or if a leaf table is defined with a distribution policy that is different from the root partitioned table.
CautionBefore you perform a
gpcopy
operation with the--no-distribution-check
option, ensure that you have a backup of the destination database and that the distribution policies of the tables that are being copied are the same in the source and destination database. Copying data into segment instances with incorrect data distribution can cause incorrect query results and can cause database corruption.
Specify this option to truncate the table that is in the source database after gpcopy
copies the table and validates the table data in the destination database.
If you specify this option, you must also specify the --validate
option.
Perform data validation on the table data in the destination database after the table data is copied. These are the supported type of validation.
count
- compares row counts between source and destination table data.md5xor
- calculates the MD5 value of all rows, then performs an XOR over the MD5 valuesIf you specify the --append
option, and the destination table contains data, validation fails for the table.
gpcopy
will abort the failed transaction and will continue to copy other tables.
When you specify this option, gpcopy
generates a list of the migration operations that would have been performed with the specified options. The data is not migrated.
If specified, suppress status messages at the command prompt. The messages are sent only to the log file. Higher level messages such as warning and error messages are still displayed.
--debug
option.
If specified, debug messages are displayed at the command prompt.
--quiet
option.
Specifies the utility that gpcopy
uses to generate DDL for the copy operation. The default is value is pgdump
. As an alternative, you can specify gpbackup
. If you specify gpbackup
, gpbackup
1.16.0 or later must be installed on your source system.
CautionThe
--dumper
option is a Beta feature and is not supported in a production environment.
If a gpcopy
command specifies an invalid option, or specifies a source table or database that does not exist, the utility returns an error and quits. No data is copied.
The source table data can change while the data is being copied. A lock is not acquired on the source table when data is copied.
The utility cannot copy a row with a width greater than 1GB (a PostgreSQL limitation).
If you copy a set of database tables with the --dbname
, --include-table
, --include-table-file
, or --include-table-json
options, and the destination database does not exist, the utility creates the database before copying the tables. If the destination database exists, the utility creates the tables in the database if required.
If you use $
in a regular expression in a gpcopy
command and run the command in a shell such as a bash shell, you must escape the $
with a \
. See the --dest-table option for information about using regular expressions.
gpcopy
includes a list of tables and views that were successfully copied to the destination system in the text file gpcopy_<date>_success.list
in ~/gpAdminLogs
directory on the coordinator host.
If gpcopy
fails to copy tables, the utility creates a text file gpcopy_<date>_failure.list
that lists the failed tables in the ~/gpAdminLogs
directory on the coordinator host. After resolving issues that caused the failures, you can run gpcopy
with the --include-table-file
option to copy the tables that were not copied.
Copying Dependent Database Objects
The gpcopy
utility does not automatically copy dependent database objects unless you specify the --full
option.
If you use the --include-table-json
option with a table, tbl_a
that inherits table, tbl_b
, you must specify both tables, tbl_a
and tbl_b
.
[
{
"source": "test.public.tbl_a",
"dest": "test.public.tbl_a"
},
{
"source": "test.public.tbl_b",
"dest": "test.public.tbl_b"
}
]
If you do not use the sql:
tag, each table's data is copied from the source tables to the destination tables. This is how gpcopy
handles the the sql:
tag with dependent tables.
If you specify a query in the sql:
tag only for tbl_a
, gpcopy
inserts only the data specified by the query into the data into tbl_a
and tbl_b
. For tbl_b
, gpcopy
also copies data that are only in tbl_b
(data in tbl_b
that is not referenced in tbl_a
). Using the SELECT
keyword ONLY
, you can list the data in tbl_b
that is not referenced in tbl_a
with this command.
SELECT * FROM ONLY tbl_b
If you specify a query in the sql:
tag for tbl_b
, gpcopy
inserts the query data from tbl_b
into the destination table, and might also insert data that is referenced in the tbl_a
. This might cause issues with duplicated data in the destination table. You can use the ONLY
keyword in the query to limit the data inserted into the destination table to only the data not referenced in tbl_a
.
A view that is defined to use an external table cannot be copied.
When copying tables, sequences and indexes defined for tables are considered table data and are copied. A sequence is copied if a table is created with a SERIAL
column or if a sequence is specified as a default value. A sequence is reset when both --full
and --metadata-only
are specified.
If a table has a default value on a column that is a user-defined function, that function must exist in the destination system database when using the --dbname
, --include-table
, --include-table-file
, or --include-table-json
options.
The utility re-creates table indexes only with --full
or --drop
options.
Limitations for Materialized Views and Table Dependencies
Materialized view definitions are copied only if the gpcopy
option --full
is specified. Otherwise, materialized views are ignored. gpcopy
does not copy materialized view data, only the materialized view definition. To populate the materialized view with data after it is copied, use REFRESH MATERIALIZED VIEW.
These are gpcopy
limitations when copying a view or a table that inherits another table or is defined with a foreign key.
gpcopy
does not automatically copy the dependent table. The dependent table must be included as part of the gpcopy
operation.
If a dependent table is missing, gpcopy
reports an error that lists the object and the missing dependent table and continues the copy operation.
For example, you use gpcopy
to copy table tbl1
that inherits table tbl2
, you must include both tables in the copy operation. If you only include tbl1
, gpcopy
returns an error listing both tbl1
and the missing table tbl2
.
Note: gpcopy
checks for tables referenced by foreign keys as a dependency. Foreign keys are not supported by Greenplum Database, however the syntax is allowed when creating a table.
If a table inherits from multiple tables, performing a gpcopy
operation to a Greenplum Database 5.x database fails with the message pq: tuple concurrently updated
. The gpcopy
operation is supported only when copying to a Greenplum Database 6.x database.
A gpcopy
command that specifies the option --drop
does not copy a view or a table defined with a foreign key or an inherited table. gpcopy
reports an error and continues the copy operation. The --append
option is supported.
A gpcopy
command that specifies the option --truncate
does not truncate a table in the target database if the table is defined with a foreign key. gpcopy
reports an error and continues the copy operation. The --append
option is supported.
Handling External Files and Objects
The gpcopy
utility does not copy configuration files such as postgresql.conf
and pg_hba.conf
. You must set up the destination system configuration separately.
The gpcopy
utility does not copy external objects such as Greenplum Database extensions, third party jar files, and shared object files. You must install the external objects separately.
Specifying Table Names with Special Characters
When you list a table with the option --include-table
or --exclude-table
, and the table name or schema name contains single quote ('
), double quote ("
) , or backslash (\
), you must escape the character with a backslash (\). These are the escaped characters: \'
, \"
, \\
. If the table or schema name contains a period (.
), comma (,
), or a space character, you must enclose the entire name in single quotes ('
) and enclose the table and schema in double quotes ("
) at the command prompt shell. In this example, the fully qualified name of the table is testdb.schema'1.table"test
.
--include-table 'testdb."schema\'1"."table\"test"'
For fully qualified table names listed in a file that is used with the options --include-table-file
, --include-table-json
, or --exclude-table-file
, if the table name or schema name contains a period (.
), the name must be enclosed in double quotes ("). In this example, the table table.test
is in the testdb
, and belongs to the schema schema.1
.
testdb."schema.1"."table.test"
Copying Partitioned Tables
Before performing a gpcopy
operation on a partitioned table that is defined with a foreign key, you must create the partitioned table in the target database.
When copying data for a partitioned table, if a leaf partition has been exchanged with an external table, that leaf partition is created, but data is not copied.
If you specify copying only leaf partitions of a partitioned table with an option such as --include-table
, --include-table-file
, or --include-table-json
, the partitioned table must exist in the destination system. If it does not exist, gpcopy
returns an error. You can create the table before you copy the leaf partition data, or you can copy the entire partitioned table.
gpcopy
does not support table data distribution checking when copying a partitioned table that is defined with a leaf table that is an external table or if a leaf table is defined with a distribution policy that is different from the root partitioned table. You can copy those tables in a gpcopy
operation and specify the option --no-distribution-check
to deactivates checking of data distribution.
CautionBefore you perform a
gpcopy
operation with the--no-distribution-check
option, ensure that you have a backup of the destination database and that the distribution policies of the tables that are being copied are the same in the source and destination database. Copying data into segment instances with incorrect data distribution can cause incorrect query results and can cause database corruption.
Handling gpcopy Errors
When gpcopy
encounters errors and quits or is cancelled by the user, current copy operations on tables in the destination database are rolled back. Copy operations that have completed are not rolled back.
If an error occurs during the process of copying a table, or table validation fails, gpcopy
continues copying the other specified tables. After gpcopy
finishes, it displays a list of tables where errors occurred or validation failed and displays a gpcopy
command. You can use the provided command to retry copying the failed tables.
The gpcopy
utility logs messages in log file gpcopy_<date>.log
in the ~/gpAdminLogs
directory on the coordinator host. If you run multiple gpcopy
commands on the same day, the utility appends messages to that day's log file.
After gpcopy
completes, it displays a summary of the operations performed. If the utility fails to copy tables, they are highlighted in summary, and there is a gpcopy
command provided in summary for user to just copy the failed tables. The information is displayed at the command prompt and in the gpcopy
log file.
After resolving the issues that caused the copy operations to fail, you can run the provided command to copy the tables that failed in the previous gpcopy
command.
Database Connections Created by gpcopy
The option --jobs
produces 2*<n>+1
database connections. The default value 4, creates 9 connections.
If you increase this option, ensure that both source and destination Greenplum Database is configured with a sufficient maximum concurrent connection value to accommodate the gpcopy
connections and other concurrent connections such as user connections. Refer to the Greenplum Database max_connections server configuration parameter documentation for more information.
This command copies all user created databases in a source system to a destination system with the --full
option. And drops the table and creates it again if it already exists in the destination.
gpcopy --source-host mytest --source-port 1234 --source-user gpuser \
--dest-host demohost --dest-port 1234 --dest-user gpuser \
--full --drop
This command copies the specified databases in a source system to a destination system with the --dbname
option. The --truncate
option truncates the table data before copying table data from the source table.
gpcopy --source-host mytest --source-port 1234 --source-user gpuser \
--dest-host demohost --dest-port 1234 --dest-user gpuser \
--dbname database1,database2 --truncate
This command copies the specified tables in a source system to a destination system with the --include-table
option. The --skip-existing
option skips the table if it already exists in the destination database.
gpcopy --source-host mytest --source-port 1234 --source-user gpuser \
--dest-host demohost --dest-port 1234 --dest-user gpuser \
--include-table database.schema.table1,database.schema.table2 --skip-existing
This command copies the tables from the source database to the destination system, excluding the tables specified in the specified tables in /home/gpuser/mytables
with --exclude-table-file
option. The --truncate
option truncates tables that already exist in the destination system. With the options --analyze
and --validate count
, the utility performs an ANALYZE operation on the copied tables, and validates the copied table data by comparing row counts between source and destination tables.
gpcopy --source-host mytest --source-port 1234 --source-user gpuser \
--dest-host demohost --dest-port 1234 --dest-user gpuser \
--dbname database1 --exclude-table-file /home/gpuser/mytables \
--truncate --analyze --validate count
This command specifies the --full
and --metadata-only
options to copy the complete database schema, including all tables, indexes, views, user-defined types (UDT), and user-defined functions (UDF) from all the source databases. No data is copied, The --drop
option specifies that the table is dropped in the destination database before it is created again if the table exists in both the source and destination database.
gpcopy --source-host mytest --source-port 1234 --source-user gpuser \
--dest-host demohost --dest-port 1234 --dest-user gpuser \
--full --metadata-only --drop
This command copies the specified databases in a source system to a destination system with the --dbname
option and specifies 8 parallel processes with the --jobs
option. The command specifies the --truncate
option to truncate the table and create it again if it already exists in the destination database, and uses uses ports in the range 2000-2010 for the parallel process connections.
gpcopy --source-host mytest --source-port 1234 --source-user gpuser \
--dest-host demohost --dest-port 1234 --dest-user gpuser \
--dbname database1,database2 --truncate --jobs 8 --data-port-range 2000-2010
This command copies the specified database in a source system to a destination system with the --dbname
option and specifies 16 parallel processes with the --jobs
option. The --truncate
option truncates the table and creates it again if it already exists in the destination database. The --truncate-source-after
option truncates the tables in source database after that table data has been validated in the destination database.
gpcopy --source-host mytest --source-port 1234 --source-user gpuser \
--dest-host demohost --dest-port 1234 --dest-user gpuser \
--dbname database1 --truncate --jobs 16 --truncate-source-after --validate count
In the previous example, if --truncate
was not specified and the destination table contained data, validation would fail.
This is an example table file that uses regular expressions.
"test1.arc/.*/./.*/"
"test1.c/(..)/y./.*/"
In the first line, the regular expressions for the schemas, arc/.*/
, and for the tables, /.*/
, limit the transfer to all tables with the schema names that start with arc
.
In the second line, the regular expressions for the schemas, c/(..)/y
, and for the tables, /.*/
, limit the transfer to all tables with the schema names that are four characters long and that start with c
and end with y
, for example, crty
.
When the command is run, tables in the database test1
that satisfy either condition are copied to the destination database.
For information about migrating data with gpcopy
, see Migrating Data with gpcopy.