GPDR Workflows

This topic describes the two types of Greenplum Disaster Recovery workflows -- incremental and continuous -- and helps you decide which is best for you. Each workflow type has its advantages and disadvantages, particularly with respect to storage capacity and backup/restore performance requirements. You should decide early which workflow best meets your needs, and plan accordingly. Once you have chosen one, it is possible, if necessary, to switch to the other later on. See the section Workflow Tradeoffs, below, for help deciding.

As discussed in the Overview topic, point-in-time recovery (PITR) lets you protect your data in case of disaster through both physical data backups and WAL archive files associated with restore points. In broad terms, you take backups, and VMware Greenplum (GPDB) then replays WAL archive files over the backup data to restore your cluster to the state it was in before the disaster occurred.

Both incremental recovery and continuous recovery workflows use a combination of physical data backups and restore points (and their associated WAL archive files) to ensure data safety in case of a disaster or planned downtime. As discussed in subsequent sections, the incremental workflow emphasizes physical backups -- full and incremental -- while the continuous workflow emphasizes taking restore points at frequent intervals to increase the probability of getting an exact picture of how your cluster looked before it went down.

Incremental Recovery

In an incremental recovery workflow, you take a full backup and subsequent incremental backups on the primary cluster at a desired frequency. You then restore a particular incremental backup on the recovery cluster simply by performing an incremental restore that passes in the restore point associated with that backup.

Continuous Recovery

In a continuous recovery workflow, you take a full physical backup on the primary cluster and then capture subsequent snapshots of the cluster state by creating restore points on the primary cluster. Each time you create a restore point, GPDR records a snapshot of the cluster state. You create restore points at your desired frequency, which can be as often as every 5 to 10 minutes.

The more restore points you create, the more possible points in time are available to restore from and therefore the higher the probability you will have a restore point that occurs close in time to a disaster. So you can optimize for data safety by creating more restore points.
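The effect of restore point frequency on potential data loss can be illustrated with a small calculation. The following is a minimal sketch, not part of GPDR: the function names and the timestamps are hypothetical, and it assumes that recovery uses the most recent restore point taken at or before the disaster.

```python
from datetime import datetime, timedelta
from typing import Optional, Sequence


def latest_restore_point_before(
    restore_points: Sequence[datetime], disaster: datetime
) -> Optional[datetime]:
    """Return the most recent restore point taken at or before the disaster."""
    candidates = [rp for rp in restore_points if rp <= disaster]
    return max(candidates) if candidates else None


def data_loss_window(
    restore_points: Sequence[datetime], disaster: datetime
) -> Optional[timedelta]:
    """Time between the usable restore point and the disaster.

    Changes made inside this window cannot be recovered from that restore point.
    """
    rp = latest_restore_point_before(restore_points, disaster)
    return None if rp is None else disaster - rp


# Hypothetical timestamps: restore points every 10 minutes versus every hour.
start = datetime(2024, 1, 1, 0, 0)
every_10_min = [start + timedelta(minutes=10 * i) for i in range(13)]
hourly = [start + timedelta(hours=i) for i in range(3)]
disaster = datetime(2024, 1, 1, 1, 57)

print(data_loss_window(every_10_min, disaster))  # 0:07:00
print(data_loss_window(hourly, disaster))        # 0:57:00
```

With the same disaster time, the 10-minute schedule leaves a 7-minute gap while the hourly schedule leaves a 57-minute gap.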

As part of this workflow, you should frequently run gpdr restore -t continuous --restore-point latest. This command fetches the latest restore point and restores its associated WAL archive files to the recovery cluster. This ensures that the recovery cluster almost constantly keeps up with the primary cluster.
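One way to run the continuous restore on a schedule is a small wrapper script on the recovery cluster. The following is a sketch under stated assumptions: the only gpdr invocation it uses is the command quoted above, while the function names, the retry-free loop, and the 10-minute interval in the usage note are hypothetical choices, not GPDR recommendations.

```python
import subprocess
import time

# The only gpdr invocation used here is the one quoted in the text above.
RESTORE_CMD = ["gpdr", "restore", "-t", "continuous", "--restore-point", "latest"]


def run_continuous_restore(runner=subprocess.run) -> bool:
    """Run one continuous restore; return True if the command exited with status 0."""
    try:
        result = runner(RESTORE_CMD, check=False)
    except OSError:
        # For example, gpdr is not installed or not on PATH.
        return False
    return result.returncode == 0


def restore_loop(interval_seconds, iterations, runner=subprocess.run, sleep=time.sleep) -> int:
    """Run the restore `iterations` times, pausing between runs.

    Returns the number of runs that failed.
    """
    failures = 0
    for i in range(iterations):
        if not run_continuous_restore(runner):
            failures += 1
        if i < iterations - 1:
            sleep(interval_seconds)
    return failures
```

For example, restore_loop(interval_seconds=600, iterations=6) would attempt a restore every 10 minutes for an hour. In practice a scheduler such as cron may be a better fit than a long-running loop.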

Note
Once you have promoted the recovery cluster (to make it available for querying or to prepare for it to fail back to being the primary cluster), you cannot resume the continuous restore workflow directly. You must first perform an incremental restore to the required restore point; you can then resume the continuous restore workflow.

Workflow Tradeoffs

VMware generally recommends the continuous workflow over the incremental workflow because it is more lightweight in all respects: taking incremental backups takes up both more storage space and more time than creating restore points.

Creating a restore point on the primary cluster is fast and extremely lightweight compared to taking a full or incremental backup. As a result, the continuous workflow is generally recommended for disaster recovery use cases that focus on achieving the best RTO and/or RPO.

Backups take more storage space and time, so you must frequently expire older backups that are no longer required. If a large number of WAL archive files have accumulated since the last backup, restoring to the most recent restore point takes a substantial amount of time.

The most compelling reason to choose incremental over continuous is if you want your recovery cluster to be available for querying when it is not in active recovery mode. Making the recovery cluster queryable requires promoting the cluster and then switching it back to recovery mode.

Switching a promoted cluster back to recovery mode is more seamless in an incremental workflow than in a continuous workflow.

Workflow Scenarios

This section takes you through three workflow scenarios:

An incremental workflow recovery where only a full backup is taken
An incremental workflow recovery where a full backup and multiple incremental backups are taken
A continuous workflow recovery

Scenario 1: Incremental Recovery with Full Backup Only

The following diagram illustrates the timeline of an incremental workflow in which the user takes a single full backup and no incremental backups.

[Diagram: Incremental workflow timeline, full backup only]

In this scenario, the user initiates a full backup at t1; GPDR starts copying data files into the repository. At t2, when the backup completes, GPDR creates an implicit restore point, writing a restore point record in the latest WAL archive file.

The physical data files GPDR copies into the repository will represent the database snapshot at t1. All the events that happened between t1 and t2 will be captured in WAL archive files. A WAL archive file consists of sequential WAL records that capture database events in chronological order. WAL archive files have a fixed size. GPDB creates as many of these files as needed to capture changes made to the database.

To restore the backup, GPDR needs both the data files and the WAL archive files. GPDR first restores the backup's data files on the recovery cluster and then VMware Greenplum (GPDB) replays the WAL files over the backup; this means that all the database changes that occurred between t1 and t2 will be sequentially applied, restoring the database to its state at t2. The replay will stop when GPDB reaches the restore point record.
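The replay step can be pictured with a toy model. The following sketch is illustrative only: it represents the database as a dictionary and WAL records as simple objects, which is far simpler than the real WAL format, and the names WalRecord, replay, and rp_t2 are hypothetical. It shows changes being applied in order over the t1 snapshot until the restore point record is reached.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class WalRecord:
    """A toy WAL record: either a key/value change or a restore point marker."""
    key: Optional[str] = None
    value: Optional[int] = None
    restore_point: Optional[str] = None  # set only on restore point records


def replay(snapshot: Dict[str, int], wal: List[WalRecord], target: str) -> Dict[str, int]:
    """Apply WAL records in order over a copy of the snapshot.

    Replay stops when the target restore point record is reached.
    """
    state = dict(snapshot)
    for record in wal:
        if record.restore_point is not None:
            if record.restore_point == target:
                return state
            continue  # some other restore point; keep replaying
        state[record.key] = record.value
    raise ValueError(f"restore point {target!r} not found in WAL")


# Snapshot of the data files at t1, followed by changes logged from t1 onward.
snapshot_t1 = {"a": 1, "b": 2}
wal = [
    WalRecord(key="a", value=10),
    WalRecord(key="c", value=3),
    WalRecord(restore_point="rp_t2"),  # implicit restore point written when the backup completes
    WalRecord(key="b", value=99),      # logged after t2, so it is not replayed
]

print(replay(snapshot_t1, wal, "rp_t2"))  # {'a': 10, 'b': 2, 'c': 3}
```

The change to b is logged after the restore point record, so the restored state reflects the database at t2 and not later.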

Scenario 2: Incremental Recovery with Full and Incremental Backups

The following diagram illustrates the timeline of an incremental workflow in which the user takes a single full backup followed by three incremental backups.

[Diagram: Incremental workflow timeline, full and incremental backups]

In this scenario, the user initiates a full backup at t1; GPDR starts copying data files into the repository. When the backup completes, GPDR creates an implicit restore point, writing a restore point record in the latest WAL archive file.

The user then takes three incremental backups, at t2, t3, and t4. At the end of each one, GPDR creates an implicit restore point associated with that backup.

During a restore:

GPDR restores physical data from all of the backups.
GPDB then replays the portion of the WAL archive corresponding to incremental backup 3's implicit restore point.
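The set of backups involved in such a restore can be sketched as a chain. The following is a simplified model, not GPDR's implementation: the Backup type, the restore_chain function, and the restore point names are hypothetical. It also assumes that restoring to an earlier incremental backup needs only the full backup and the incrementals up to that point, which the scenario above does not state explicitly.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Backup:
    kind: str           # "full" or "incr"
    restore_point: str  # implicit restore point created when the backup completed


def restore_chain(backups: List[Backup], target_restore_point: str) -> List[Backup]:
    """Return the backups to restore, oldest first, ending at the target's backup.

    `backups` must be listed in the order in which they were taken.
    """
    chain: List[Backup] = []
    for backup in backups:
        if backup.kind == "full":
            chain = [backup]  # a full backup starts a new chain
        elif not chain:
            raise ValueError("incremental backup found before any full backup")
        else:
            chain.append(backup)
        if backup.restore_point == target_restore_point:
            return chain
    raise ValueError(f"no backup has restore point {target_restore_point!r}")


# Scenario 2: one full backup followed by three incremental backups.
backups = [
    Backup("full", "rp_full"),
    Backup("incr", "rp_incr1"),
    Backup("incr", "rp_incr2"),
    Backup("incr", "rp_incr3"),
]

chain = restore_chain(backups, "rp_incr3")
print([b.restore_point for b in chain])
# ['rp_full', 'rp_incr1', 'rp_incr2', 'rp_incr3']
```

After the physical data from the chain is restored, WAL replay proceeds up to the target restore point record.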

Scenario 3: Continuous Recovery

The following diagram illustrates the timeline of a continuous recovery workflow, in which the user takes a single full backup followed by several explicit restore points.

[Diagram: Continuous workflow timeline]

In this scenario, the user takes a full backup at t1; GPDR creates an implicit restore point, RP1, once the backup completes. Over the course of this timeline, the user continues making changes to the database and creates three explicit restore points: RP2, RP3, and RP4.

During a restore, the WAL archive files corresponding to all four of the restore points depicted here will be replayed over the backup that ended at the beginning of the timeline.

Note
You cannot create explicit restore points before the full backup completes, because the physical data backed up must be available in the repository before you create explicit restore points.