This example shows how a test failover works.
In this example, the failover operation uses a recovery plan that:
- Orchestrates the test failover to a recovery SDDC, for a protection group called 'Users' that regularly replicates snapshots to the cloud file system.
- Returns the VMs to the initial power state after failover. When the VMs recover, they are powered on or off based on their power state when the snapshot was taken. For example, if the VM was powered off when the selected snapshot was taken, then the VM will be powered off after the test failover is complete.
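The power-state rule above can be sketched in a few lines of Python. This is only an illustrative model, not product code: the `VmSnapshot` record, the `PowerState` labels, and the VM names are hypothetical and are not part of any VMware API.

```python
from dataclasses import dataclass
from enum import Enum


class PowerState(Enum):
    # Illustrative labels only (assumption, not product values).
    ON = "powered on"
    OFF = "powered off"


@dataclass(frozen=True)
class VmSnapshot:
    """Hypothetical record of one VM as captured in a protection group snapshot."""
    vm_name: str
    power_state_at_snapshot: PowerState


def power_state_after_test_failover(snapshot: VmSnapshot) -> PowerState:
    # The recovered VM inherits whatever state was captured in the snapshot.
    return snapshot.power_state_at_snapshot


# Hypothetical VMs in the 'Users' protection group.
snapshots = [
    VmSnapshot("user-desktop-01", PowerState.ON),
    VmSnapshot("user-desktop-02", PowerState.OFF),
]
recovered = {s.vm_name: power_state_after_test_failover(s) for s in snapshots}
for name, state in recovered.items():
    print(f"{name}: {state.value}")
```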
This example illustrates the main steps in running a recovery plan as a test failover:
- Select snapshots
- Define runtime settings
- Preview plan steps and confirm test operation
- Clean up test plan
- Acknowledge test plan cleanup
Select Snapshots
- From the left navigation, select Recovery plans.
- In the list of recovery plans, click the recovery plan you want to test.
In this example, we are selecting a recovery plan called User DR, which contains VMs for end-user computer systems. The Status shows Ready, which means the plan can be run as a failover or as a test failover, or can be deactivated.
- On the Plan details page, click the DR Failover Test button to test the plan.
- The Snapshot page of the Test plan wizard opens.
This page allows you to select a snapshot to use for the test failover. By default, the Test plan wizard selects the most recent snapshot taken of the protection group VMs. If you want to select an older snapshot, click the Use different snapshot button.
- In the Select protection group snapshot dialog box, you can select older snapshots, depending on the disaster recovery scenario you want to test.
- Click OK to confirm your selection, and then click Next to continue.
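The snapshot choice made in this part of the wizard amounts to "use the most recent snapshot unless an older one is picked explicitly". A minimal sketch of that rule follows. The `Snapshot` type, the `choose_snapshot` function, and the snapshot names are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional, Sequence


@dataclass(frozen=True)
class Snapshot:
    """Hypothetical protection group snapshot."""
    name: str
    taken_at: datetime


def choose_snapshot(snapshots: Sequence[Snapshot],
                    override: Optional[str] = None) -> Snapshot:
    """Return the newest snapshot, or the named one if an override is given."""
    if not snapshots:
        raise ValueError("the protection group has no snapshots")
    if override is not None:
        for snap in snapshots:
            if snap.name == override:
                return snap
        raise KeyError(f"no snapshot named {override!r}")
    # Default behaviour: the most recent snapshot wins.
    return max(snapshots, key=lambda snap: snap.taken_at)


history = [
    Snapshot("users-mon", datetime(2024, 1, 1, 2, 0)),
    Snapshot("users-tue", datetime(2024, 1, 2, 2, 0)),
    Snapshot("users-wed", datetime(2024, 1, 3, 2, 0)),
]
default_choice = choose_snapshot(history)
older_choice = choose_snapshot(history, override="users-mon")
print(default_choice.name, older_choice.name)
```

Selecting an older snapshot is useful when the scenario under test assumes that the latest copies are unusable, for example after data corruption.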
Define Runtime Settings
- In the Runtime settings page of the wizard, you set two test failover parameters: error handling and test storage migration.
- Under Error handling, select one of the following options:
- Ignore all errors. To test a failover plan quickly and see the results, you can choose to ignore all errors and run the plan unattended.
- Stop on every error. This option stops the test execution on the first failure, and then requires user intervention to resolve each error before continuing. This option is useful if your plan definition is complex and you want to troubleshoot errors in failover as they occur during plan operation.
- Under the Test storage migration section, choose one of the following options:
- Run VMs live on the cloud file system. After failover, VMs run live directly on the cloud file system, which offers a faster failover time for better RTO. Another benefit of running VMs on the cloud file system is that subsequent failback operations are also faster, resulting in less downtime. Some VMs recovered on the cloud file system might require performance that is better suited to vSAN. After a recovery plan operation completes, you can selectively Storage vMotion workloads to the vSAN datastore to improve performance. If you Storage vMotion the VMs manually, it can cause a longer failback process for those VMs.
Another benefit of using the cloud file system for disaster recovery operations is that you will likely require fewer, and potentially less expensive, host types to operate during disaster recovery. You only have to size and scale your SDDC for CPU and memory, and you avoid adding hosts just to meet vSAN capacity requirements, which are often the limiting factor when sizing an SDDC.
- Full storage migration to recovery SDDC. Performs a full Storage vMotion migration from the staging datastore to the SDDC vSAN datastore as the final step of running a plan.
Using this option increases RTO, because the plan cannot be committed or finished until all Storage vMotion operations are complete. At scale, this can take hours or days. Even when all VMs are up and running, you cannot run a failback operation until the initial Storage vMotion is complete and the failover plan is committed. Failback also involves a longer outage for workloads that have been migrated to vSAN. Fully migrated VMs provide higher IOPS, which suits VMs that require higher performance, such as database VMs. This option might require more hosts on the cluster, depending on the size of the VMs.
Note: Test failover plans cannot be failed back.
- Click Next.
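The difference between the two error handling options can be illustrated with a small plan runner. This is a sketch under stated assumptions, not how the product is implemented: the step names, the `run_plan` function, and the `resolve` callback (which stands in for an operator fixing the problem and choosing to continue) are all hypothetical.

```python
from enum import Enum
from typing import Callable, List, Optional, Sequence, Tuple


class ErrorHandling(Enum):
    IGNORE_ALL = "Ignore all errors"
    STOP_ON_EVERY = "Stop on every error"


Step = Tuple[str, Callable[[], None]]
Resolver = Callable[[str, Exception], bool]


def run_plan(steps: Sequence[Step], mode: ErrorHandling,
             resolve: Optional[Resolver] = None) -> List[str]:
    """Run each step in order and return a log of what happened."""
    log: List[str] = []
    for name, action in steps:
        try:
            action()
            log.append(f"{name}: ok")
        except Exception as exc:
            if mode is ErrorHandling.IGNORE_ALL:
                # Unattended run: record the error and keep going.
                log.append(f"{name}: error ignored ({exc})")
                continue
            # Attended run: pause here until the operator intervenes.
            log.append(f"{name}: stopped on error ({exc})")
            if resolve is None or not resolve(name, exc):
                log.append("plan halted")
                break
            log.append(f"{name}: resolved by operator, continuing")
    return log


def ok() -> None:
    pass


def fail() -> None:
    raise RuntimeError("network mapping missing")


# Hypothetical plan steps, for illustration only.
steps = [("recover VMs", ok), ("map networks", fail), ("power on VMs", ok)]

unattended = run_plan(steps, ErrorHandling.IGNORE_ALL)
halted = run_plan(steps, ErrorHandling.STOP_ON_EVERY)
attended = run_plan(steps, ErrorHandling.STOP_ON_EVERY,
                    resolve=lambda name, exc: True)
for line in attended:
    print(line)
```

With Ignore all errors, the run reaches the last step regardless of failures. With Stop on every error, nothing after the failing step runs until someone resolves it.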
Preview Steps and Run the Test Plan
- In the Preview page of the Test plan wizard, review the steps that the plan will take when it is run, and then click Next.
- In the Confirmation page, enter the words TEST PLAN and then click the Run test button.
When the plan starts, an email alert is sent to users configured for notification in the recovery plan. You can watch the progress of the test failover from the Tasks list on the right side of the VMware Live Cyber Recovery UI. You can also observe the progress in vCenter on the recovery SDDC.
Clean Up and Acknowledge Test Plan
If you are satisfied with the results of the test recovery plan operation, you can clean up the plan and acknowledge the results.
Cleaning up a test plan reverses the plan's instructions and undoes all of the failover operations and results by unregistering and deleting VMs on the test recovery site.
In this example, the test recovery plan finished with no errors. To view more information about what happened during the failover, you can expand each step of the plan operation.
- After you review the test plan failover, click the Clean up button.
- In the Clean up dialog box, review the details, and then enter CLEAN UP TEST. Click the Clean up button to initiate the test plan cleanup.
- After cleanup, you need to explicitly acknowledge that the test failover ran successfully and that you want to tear it down. Click the Acknowledge button.
- In the Acknowledge dialog box, review the clean up operation details, enter an optional note, and then click the Acknowledge button.
After you acknowledge a test failover, VMware sends an email to users configured for notification in the recovery plan, along with a PDF report describing the test failover, including a summary of the plan, the plan configuration, and logs for runtime, failover, and any errors.
You now have the option to run the plan again as a test failover, or as a regular failover operation.
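The whole test cycle (run, clean up, acknowledge, ready again) behaves like a small state machine. The sketch below models that order of operations and the two typed confirmations. The state names and the `PlanLifecycle` class are hypothetical, the run itself is treated as instantaneous, and the exact-match check on the confirmation text is an assumption rather than documented behavior.

```python
from enum import Enum, auto


class PlanState(Enum):
    READY = auto()          # plan can run as a test or as a regular failover
    TEST_FINISHED = auto()  # test ran; recovered VMs still exist on the test site
    AWAITING_ACK = auto()   # cleanup done; results not yet acknowledged


class PlanLifecycle:
    """Hypothetical model of the test failover cycle described in this example."""

    RUN_PHRASE = "TEST PLAN"
    CLEANUP_PHRASE = "CLEAN UP TEST"

    def __init__(self) -> None:
        self.state = PlanState.READY
        self.ack_note = ""

    def _require(self, expected: PlanState) -> None:
        if self.state is not expected:
            raise RuntimeError(
                f"expected {expected.name}, but plan is {self.state.name}")

    @staticmethod
    def _confirm(typed: str, phrase: str) -> None:
        # Assumption: the typed text must match the phrase exactly.
        if typed != phrase:
            raise ValueError(f"type {phrase!r} to confirm")

    def run_test(self, typed: str) -> None:
        self._require(PlanState.READY)
        self._confirm(typed, self.RUN_PHRASE)
        self.state = PlanState.TEST_FINISHED

    def clean_up(self, typed: str) -> None:
        self._require(PlanState.TEST_FINISHED)
        self._confirm(typed, self.CLEANUP_PHRASE)
        self.state = PlanState.AWAITING_ACK

    def acknowledge(self, note: str = "") -> None:
        # The note is optional, as in the Acknowledge dialog box.
        self._require(PlanState.AWAITING_ACK)
        self.ack_note = note
        self.state = PlanState.READY


plan = PlanLifecycle()
plan.run_test("TEST PLAN")
plan.clean_up("CLEAN UP TEST")
plan.acknowledge("Quarterly DR drill")
print(plan.state.name)
```

Modeling the cycle this way makes the ordering explicit: cleanup is only possible after a test run, and the plan only returns to a runnable state after the cleanup is acknowledged.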