The S3 storage plugin application lets you use an Amazon Simple Storage Service (Amazon S3) location to store and retrieve backups when you run gpbackup and gprestore. Amazon S3 provides secure, durable, highly-scalable object storage. The S3 plugin streams the backup data from a named pipe (FIFO) directly to the S3 bucket without generating local disk I/O.
The S3 storage plugin can also connect to an Amazon S3 compatible service such as Dell EMC Elastic Cloud Storage, Minio, and Cloudian HyperStore.
Using Amazon S3 to back up and restore data requires an Amazon AWS account with permissions to read from and write to the Amazon S3 bucket.
For information about Amazon S3, see Amazon S3. For information about Amazon S3 regions and endpoints, see AWS service endpoints. For information about S3 buckets and folders, see the Amazon S3 documentation.
The S3 storage plugin is included with the VMware Greenplum Backup and Restore release. Use the latest S3 plugin release with the latest VMware Greenplum Backup and Restore to avoid incompatibilities.
Open Source Greenplum Backup and Restore customers may get the utility from gpbackup-s3-plugin. Build the utility following the steps in Building and Installing the S3 plugin.
The S3 storage plugin application must be in the same location on every Greenplum Database host, for example $GPHOME/bin/gpbackup_s3_plugin. The S3 storage plugin requires a configuration file, installed only on the coordinator host.
To use the S3 storage plugin application, specify the location of the plugin, the S3 login credentials, and the backup location in a configuration file. For information about the configuration file, see S3 Storage Plugin Configuration File Format.
When running gpbackup or gprestore, specify the configuration file with the --plugin-config option.
gpbackup --dbname <database-name> --plugin-config /<path-to-config-file>/<s3-config-file>.yaml
When you perform a backup operation using gpbackup with the --plugin-config option, you must also specify the --plugin-config option when restoring with gprestore.
gprestore --timestamp <YYYYMMDDHHMMSS> --plugin-config /<path-to-config-file>/<s3-config-file>.yaml
The S3 plugin stores the backup files in the S3 bucket, in a location similar to:
<folder>/backups/<datestamp>/<timestamp>
Where folder is the location you specified in the S3 configuration file, and datestamp and timestamp are the backup date and time stamps.
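As a concrete illustration, the backup prefix can be assembled from the configured folder and a backup timestamp; the values below are the ones used in the examples in this topic, and the datestamp is simply the first eight digits (YYYYMMDD) of the timestamp:

```shell
# Assemble the S3 object prefix <folder>/backups/<datestamp>/<timestamp>.
# folder comes from the plugin configuration file; the timestamp is the
# backup timestamp reported by gpbackup.
folder=test/backup3
timestamp=20201206233124
datestamp=$(printf '%s' "$timestamp" | cut -c1-8)   # YYYYMMDD portion
prefix="$folder/backups/$datestamp/$timestamp"
echo "$prefix"
```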
The S3 storage plugin logs are in <gpadmin_home>/gpAdminLogs/gpbackup_s3_plugin_<timestamp>.log on each Greenplum host system. The timestamp format is YYYYMMDDHHMMSS.
Example
This is an example S3 storage plugin configuration file, named s3-test-config.yaml, that is used in the next gpbackup example command.
executablepath: $GPHOME/bin/gpbackup_s3_plugin
options:
  region: us-west-2
  aws_access_key_id: test-s3-user
  aws_secret_access_key: asdf1234asdf
  bucket: gpdb-backup
  folder: test/backup3
This gpbackup example backs up the database demo using the S3 storage plugin with the configuration file in the absolute path /home/gpadmin/s3-test.
gpbackup --dbname demo --plugin-config /home/gpadmin/s3-test/s3-test-config.yaml
The S3 storage plugin writes the backup files to this S3 location in the AWS region us-west-2.
gpdb-backup/test/backup3/backups/<YYYYMMDD>/<YYYYMMDDHHMMSS>/
This example restores a specific backup set defined by the 20201206233124 timestamp, using the S3 plugin configuration file.
gprestore --timestamp 20201206233124 --plugin-config /home/gpadmin/s3-test/s3-test-config.yaml
The S3 storage plugin configuration file uses the YAML 1.1 document format and implements its own schema for specifying the absolute path to the Greenplum Database S3 storage plugin executable, the connection credentials, and the S3 location.
The configuration file must be a valid YAML document. The gpbackup and gprestore utilities process the configuration file document in order and use indentation (spaces) to determine the document hierarchy and the relationships of the sections to one another. The use of white space is significant. Do not use white space simply for formatting purposes, and do not use tabs at all.
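For example, in a fragment such as the following, the entries under the options key are indented with spaces; replacing that leading whitespace with a tab would make the file invalid (the values shown are taken from the example earlier in this topic):

```yaml
options:
  region: us-west-2      # indented with two spaces, never a tab
  bucket: gpdb-backup
```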
This is the structure of an S3 storage plugin configuration file.
executablepath: <absolute-path-to-gpbackup_s3_plugin>
options:
  region: <aws-region>
  endpoint: <S3-endpoint>
  aws_access_key_id: <aws-user-id>
  aws_secret_access_key: <aws-user-id-key>
  bucket: <s3-bucket>
  folder: <s3-location>
  encryption: [on|off]
  backup_max_concurrent_requests: [int] # default value is 6
  backup_multipart_chunksize: [string] # default value is 500MB
  restore_max_concurrent_requests: [int] # default value is 6
  restore_multipart_chunksize: [string] # default value is 500MB
  http_proxy: <http://<your_username>:<your_secure_password>@proxy.example.com:proxy_port>
Note: The S3 storage plugin does not support filtered restore operations and the associated restore_subset plugin configuration property.
executablepath
Required. The absolute path to the plugin executable, for example $GPHOME/bin/gpbackup_s3_plugin. The plugin must be in the same location on every Greenplum Database host.
region
Required for AWS S3. If you set the Region setting on the Minio server side, you must set this region option to the same value.
endpoint
Required for an S3 compatible service. Specify this option to connect to an S3 compatible service such as ECS. The plugin connects to the specified S3 endpoint (hostname or IP address) to access the S3 compatible data store. When this option is specified, the plugin ignores the region option and does not use AWS to resolve the endpoint. When this option is not specified, the plugin uses the region to determine the AWS S3 endpoint.
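As an illustrative sketch, a configuration for an S3 compatible service might set endpoint rather than region; the hostname, port, and credentials below are hypothetical placeholders, not values from the documentation:

```yaml
executablepath: $GPHOME/bin/gpbackup_s3_plugin
options:
  endpoint: https://objectstore.example.com:9000   # hypothetical S3 compatible endpoint
  aws_access_key_id: example-user                  # placeholder credential
  aws_secret_access_key: example-secret            # placeholder credential
  bucket: gpdb-backup
  folder: test/backup3
```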
aws_access_key_id
Optional. The S3 ID to access the S3 bucket location that stores backup files.
aws_secret_access_key
Required only if you specify aws_access_key_id. The S3 passcode for the S3 ID to access the S3 bucket location.
If aws_access_key_id and aws_secret_access_key are not specified in the configuration file, the S3 plugin uses S3 authentication information from the system environment of the session running the backup operation. The S3 plugin searches for the information in these sources, using the first available source:
The environment variables AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY.
The authentication information set with the AWS CLI command aws configure.
encryption
Optional. Activate or deactivate use of Secure Sockets Layer (SSL) when connecting to an S3 location. The default value is on; use connections that are secured with SSL. Set this option to off to connect to an S3 compatible service that is not configured to use SSL. Any value other than on or off is not accepted.
backup_max_concurrent_requests
Optional. The segment concurrency level for a file artifact within a single backup/upload request. The default value is 6. Use this parameter in conjunction with the gpbackup --jobs flag to increase your overall backup concurrency.
Example: In a 4 node cluster with 12 segments (3 per node), if the --jobs flag is set to 10, there could be 120 concurrent backup requests. With the backup_max_concurrent_requests parameter set to 6, the total S3 concurrent upload threads during a single backup session would reach 720 (120 x 6).
Note: If a backup file is smaller than the multipart chunk size, the backup_max_concurrent_requests parameter would not take effect since the file is smaller than the chunk size.
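The arithmetic in the example above can be sketched as a quick shell calculation; the cluster sizes are the ones from the example, not requirements:

```shell
# Concurrency arithmetic from the example above.
segments=12              # 4 nodes x 3 segments per node
jobs=10                  # gpbackup --jobs value
max_concurrent=6         # backup_max_concurrent_requests
requests=$((segments * jobs))            # concurrent backup requests
threads=$((requests * max_concurrent))   # total S3 upload threads
echo "$requests requests, $threads threads"
```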
backup_multipart_chunksize
Optional. The maximum size of a file chunk in an S3 multipart upload request. The default value is 500MB. Use this parameter along with the --jobs flag and the backup_max_concurrent_requests parameter to fine tune your backups. Set the chunksize based on your individual segment file size. S3 supports up to 10,000 max total partitions for a single file upload.
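Because S3 caps a multipart upload at 10,000 parts, the chunk size bounds the largest file the plugin can upload. A rough back-of-the-envelope check, assuming the 500MB default:

```shell
max_parts=10000      # S3 multipart upload part limit
chunk_mb=500         # backup_multipart_chunksize default
max_file_mb=$((max_parts * chunk_mb))   # largest uploadable file in MB
echo "$max_file_mb MB"                  # roughly 4.8 TB at the default chunk size
```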
restore_max_concurrent_requests
Optional. The segment concurrency level for a file within a single restore/download request. The default value is 6.
restore_multipart_chunksize
Optional. The maximum size of a file chunk in an S3 multipart download request. The default value is 500MB. Use this parameter along with the restore_max_concurrent_requests parameter to fine tune your restores.
http_proxy
Optional. The URL of a proxy that the plugin uses to access the S3 location, in the format http://username:[email protected]:proxy_port or http://proxy.example.com:proxy_port.