Orchestrate complex failover or migration to and from paired cloud and on-premises sites by using recovery plans. These plans attach existing replications to ordered steps with optional delay or prompt attributes. Prioritize which workloads fail over or migrate and power on first, followed by workloads that wait for specific conditions before recovering or migrating and powering on.
- Recovery Plans
- Each recovery plan consists of sequential actions, called steps. The plans can contain an unlimited number of ordered steps.
- Recovery plans contain steps that perform only test failover or failover of the protected workloads.
- Migration plans allow scheduling of the synchronization and contain steps that perform only migrations.
- Steps
- Each step in the recovery plan can perform multiple existing replication tasks such as a test failover, a failover, or a migration of the workload with optional attributes after the step completes, like a delay or a prompt.
- Delay
- This step attribute allows configuring a waiting time before executing the next step. The delay applies after completing all replication tasks in the current step.
- Prompt
- This step attribute allows configuring a user prompt message, suspending the current step execution before the next step occurs, until approval of the prompt in the current step.
Scheduling Migrations Synchronization
- Scheduling the initial synchronization of a migration
- You can schedule the initial synchronization time when creating any migration.
- Scheduling the migrations auto synchronization of a plan
- You can schedule the migrations auto synchronization when creating or editing a migration plan.
- Delayed synchronization
- At the migration plan scheduled time, if a migration started paused, meaning that the virtual machine is not running or that the initial synchronization of the migration is scheduled later than the plan scheduled time, the migration performs its initial synchronization.
- Synchronize before migrate
- At the migration plan scheduled time, if a migration is already synchronized, meaning that the virtual machine is running and either no initial synchronization time is scheduled for the migration or the scheduled initial synchronization has already passed, the migration performs a subsequent synchronization to reduce the Recovery Time Objective (RTO) close to the actual migration.
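The two cases above can be sketched as a small decision function. This is only an illustration of the rule as described here, not the product's implementation: the function name `sync_at_plan_time` and its parameters are hypothetical.

```python
from datetime import datetime
from typing import Optional


def sync_at_plan_time(vm_running: bool,
                      initial_sync_time: Optional[datetime],
                      plan_time: datetime) -> str:
    """Return which synchronization a migration performs at the plan scheduled time.

    Hypothetical helper: names and return values are illustrative only.
    """
    # "Started paused": the VM is not running, or the initial synchronization
    # is scheduled later than the plan scheduled time.
    started_paused = (not vm_running) or (
        initial_sync_time is not None and initial_sync_time > plan_time
    )
    # Delayed synchronization -> initial; synchronize before migrate -> subsequent.
    return "initial" if started_paused else "subsequent"
```

For example, a running virtual machine with no scheduled initial synchronization falls into the synchronize before migrate case and returns `"subsequent"`.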
Step and Recovery Plan Execution
The execution of a recovery plan repeats for each step the following fixed execution sequence, according to the configured attributes.
1. Execute and complete the step of the plan. In parallel, for each workload in the step:
   - First, perform the replication task, such as test failover or failover, by using the latest available instance for the replication. Migrate tasks perform at least one synchronization before failing over.
   - After the replication task completes, power on the workload.
2. Skip, unless a delay is configured.
   - Else, the step waits for the configured seconds or minutes.
   - After the delay, the plan resumes executing #3.
3. Skip, unless a prompt is configured.
   - Else, suspend the plan after completing the current step, until approving the prompt.
   - Prompt the user. Approving the prompt resumes executing #4.
4. Repeat this sequence with the next step in the plan, if any more, executing from #1.
5. After the last step, the recovery plan completes with a Completed Failover or a Completed Migrate state, regardless of whether certain replication operations completed with a warning.
Alternatively, the recovery plan suspends with a Suspended... state on a prompt, or when clicking Suspend, or at any step where the replication operation fails with an error message.
For example, any recovery plan suspends at a migration step that requires authentication with the remote site.
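The execution sequence can be sketched as a simple loop. This is a minimal model under stated assumptions, not the product's code: `Step`, `run_plan`, and the callbacks `run_task`, `wait`, and `approve` are hypothetical names, the workloads of a step run sequentially here although the product runs them in parallel, and the returned state labels are simplified.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class Step:
    """Hypothetical model of a plan step with its optional attributes."""
    name: str
    workloads: List[str] = field(default_factory=list)
    delay_seconds: int = 0          # optional delay after the step completes
    prompt: Optional[str] = None    # optional prompt after the step completes


def run_plan(steps: List[Step],
             run_task: Callable[[str], bool],
             wait: Callable[[int], None],
             approve: Callable[[str], bool]) -> Tuple[str, Optional[str]]:
    """Return ("completed", None) or ("suspended", name of the step)."""
    for step in steps:
        # Run the replication task for every workload in the step.
        # Sequential here for simplicity; the product runs them in parallel.
        for workload in step.workloads:
            if not run_task(workload):
                return ("suspended", step.name)  # a task failed with an error
        # Wait only when a delay is configured.
        if step.delay_seconds:
            wait(step.delay_seconds)
        # Suspend until the prompt is approved, when one is configured.
        if step.prompt is not None and not approve(step.prompt):
            return ("suspended", step.name)
    # Empty plans and empty steps fall through and complete.
    return ("completed", None)
```

Passing `time.sleep` as `wait` and a function that asks the operator as `approve` would turn the sketch into a small interactive simulation.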
Recovery Plan States
The allowed operations on a recovery plan depend on its current state and on the last operation in the plan.
- Not started recovery plans
- Not started state persists before executing any recovery plan operation, or after executing test cleanup operation. The recovery plans allow all operations, like test failover, failover or migrate, editing and modifying the steps, and attaching and detaching replications.
- Running recovery plans
- While running, recovery plans only allow clicking Suspend, suspending the plan after the current step executes. Running recovery plans do not allow any other replication operation, nor modifying the steps, nor their order, nor attaching and detaching replications.
- Suspended recovery plans
- Suspended on prompt recovery plans resume after clicking Approve Prompt. Alternatively, they resume by using failover or migrate.
- Suspended recovery plans after test or cleanup step allow resuming by using test failover or test cleanup, failover or migrate.
- Suspended recovery plans after a failover or a migrate step, allow resuming by using failover or migrate.
- All suspended recovery plans allow editing and modifying the steps and attaching and detaching replications. For example, detaching replications that suspend the recovery plan, allows resuming the plan execution.
- Modifying the steps order then resuming uses the previous step order before the modification. New steps execute according to their order, for example, adding a step and moving it before the currently suspended step resumes execution with the new step first.
- Completed recovery plans
- Completed failover and completed migrate plans only allow deleting or cloning in a new plan. Such plans do not allow editing nor modifying their steps, nor attaching and detaching replications.
- Completed test failover plans allow test cleanup, failover, migrate, and editing, but do not allow attaching and detaching replications.
- Migration plans migrate their workloads and complete. Similarly, failover plans perform failover and complete.
- Empty steps execute and complete, performing no operations and continue with the next step.
- Empty recovery plans without steps or with empty steps execute without performing any tasks and have a Completed state.
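The state rules above can be summarized as a lookup table. The sketch below is a simplification based only on this section: the `PlanState` enum, the operation names, and `is_allowed` are hypothetical, and the suspended sub-states (on prompt, after a test step) are collapsed into a single state.

```python
from enum import Enum


class PlanState(Enum):
    """Hypothetical, simplified plan states."""
    NOT_STARTED = "Not started"
    RUNNING = "Running"
    SUSPENDED = "Suspended"
    COMPLETED_TEST = "Completed Test"
    COMPLETED = "Completed Failover or Completed Migrate"


# Operations each state allows, as described in this section.
ALLOWED = {
    PlanState.NOT_STARTED: {"test", "failover", "migrate", "edit",
                            "modify_steps", "attach_detach"},
    PlanState.RUNNING: {"suspend"},
    PlanState.SUSPENDED: {"failover", "migrate", "edit",
                          "modify_steps", "attach_detach"},
    PlanState.COMPLETED_TEST: {"test_cleanup", "failover", "migrate", "edit"},
    PlanState.COMPLETED: {"delete", "clone"},
}


def is_allowed(state: PlanState, operation: str) -> bool:
    """Check whether a plan in the given state allows the operation."""
    return operation in ALLOWED[state]
```

A table like this makes it easy to see, for instance, that a running plan accepts only `suspend`, while a completed test plan still accepts `failover` but not `attach_detach`.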
Replications Implications
- Steps can only use existing replications and do not create new replications.
- The recovery plan steps treat the replicated workload similarly, regardless of its type.
- One replication task can be part of multiple recovery plans but not of multiple steps in the same plan.
When the same replication task is used in more than one recovery plan and several of these plans start simultaneously, the plan that first starts the replication task completes its steps. The steps of the remaining recovery plans also complete, skipping the reused replication task as already performed, provided that the step in the first plan has completed. If that step is still in progress, the remaining recovery plans can fail.
For example, two running recovery plans contain steps with replication tasks for the same workload. The first plan executes a step performing a failover task, then the second plan executes a step performing a test failover task. As a result, the recovery plan executing the test failover task fails at the step containing the already failed over replication.
- Deleting a replication while it is used in a recovery plan detaches the replication from the step where it is attached, without causing the plan to fail.
- To change advanced replication settings, like network settings or disk settings, directly modify the replication settings. After the modifications, all plans using the modified replication execute by using the updated replication settings.
Recovery Plans Operations
To perform plans operations, log in to the cloud site, then in the left pane, under the Replications section click Recovery Plans.
- New recovery plan
- Allows entering a name and optional description then creates a blank recovery plan for adding steps that perform protections.
- New migration plan
- Allows entering a name, optional description, optional synchronization schedule of the migrations then creates a blank migration plan for adding steps that perform migrations. Scheduling the migration in the plan overrides the usual scheduled migration.
- New step
- Adds a step in the selected plan. For information about the actions of the steps, see the next section.
- Edit
- Editing allows modifying the selected plan name and description and for migration plans modifying the automatic synchronization schedule. Editing is available for plans in a Not started, or Completed Test, or Suspended... state.
- Delete
- Prompts a confirmation for removing the selected plan. Deleting is available for suspended, completed, and not started plans. Deleting is unavailable only for plans in a Running state.
Deleting a recovery plan also removes all of its reports.
- Suspend
- Suspending is available only for plans in a Running state. Suspending the selected plan requests pausing its execution after completing the currently running replication task in the current step. While suspended, the plan allows attaching and detaching replications, re-ordering the steps, and adding or removing steps. Modifying the steps or their order causes the plan execution to resume at the first step and skip completed steps, where an already approved prompt means a completed step. When a prompt suspends the step, reordering the steps and then approving the prompt resumes with the original next step, as before the reordering, and the plan completes.
- Test
- Performs a test failover task for all workloads in the selected plan. Testing is inactive after a test or after a failover or a migrate task completes.
- Test Cleanup
- Performs a cleanup of the test failover tasks for all workloads in the selected plan. Cleanup is inactive, until completing a test.
- Failover
- Performs a failover task for all workloads in the selected plan. Failover is inactive after a failover or a migrate task completes. Failover is available for plans in a Not started, or Completed Test, or Suspended state.
- Migrate
- Performs a migration task for all workloads in the selected plan. Migrate suspends unless authenticated with the remote site. Migrate is inactive after a failover or a migrate task completes. Similar to failover, migrate is available for plans in a Not started, or in a Completed Test, or in a Suspended... state.
- Monitor tasks
- Opens Replication Tasks, filtered to only display the tasks of the selected plan.
- Other actions
- For sites backed by VMware Cloud Director: Change owner - allows selecting a new owner organization for the selected plan. The ownership and the visibility of a plan belong to the user who initially created it. For example, plans created by the service provider are not visible to a tenant user until changing the owner. Change owner is inactive after failover or migrate complete. Change owner is not available for vSphere DR and migration.
- Clone - prompts for a name of the duplicate plan and copies the steps of the selected plan in the duplicate plan. Optionally, cloning allows detaching all replications from the steps of the duplicate plan, while preserving the steps. Cloning a recovery plan creates a recovery plan duplicate. Similarly, cloning a migration plan creates a migration plan duplicate. Both completed and suspended plans allow cloning. The cloned plan is in a Not started state with Not started steps, regardless of whether any steps completed in the source plan.
- Reports - shows the Recovery Plan Reports window for the selected recovery plan. This page contains entries for the performed operation of each completed plan execution, the start and end timestamps and the result of each execution. For example, the following recovery plan executed four times, with the latest performed operation on top:
Table 1. Recovery Plan Reports
Operation | Start Date | End Date | Result |
---|---|---|---|
Failover | d/m/yyyy, h:mm:ss | d/m/yyyy, h:mm:ss | Success |
Test | d/m/yyyy, h:mm:ss | d/m/yyyy, h:mm:ss | Error |
Cleanup | d/m/yyyy, h:mm:ss | d/m/yyyy, h:mm:ss | Success |
Test | d/m/yyyy, h:mm:ss | d/m/yyyy, h:mm:ss | Error |
Recovery Plan Execution Report
To see reports for each execution of a recovery plan, select the plan and open its reports. Each report contains:
- Plan: name.
- Type: RECOVERY or MIGRATION.
- Site: name.
- Owner, depending on the deployment type:
- Owner: org@site for sites backed by VMware Cloud Director.
- Owner: System for vSphere DR and migration.
- Steps: X executed of Y total.
- Duration: start date - end date.
- Operation: Test, or Cleanup, or Failover, or Migrate, with operation state Completed or Failed.
Operations that end with Operation suspended by user. show as Failed.
- Step information: Step name, Delay if exists, Duration, Outcome, Prompt if exists.
- Recovery information: Workload Name, Source site, State, Duration if executed, Outcome.
Steps Operations
- New step
- For recovery plans, completing the New Recovery Step wizard allows attaching multiple protections for recovery in the step and creates a recovery step.
- For migration plans, completing the New Migration Step wizard allows attaching multiple migrations for recovery in the step and creates a migration step.
- Edit
- Allows modifying the name, the optional delay, and the optional prompt of the selected step.
- Delete
- Prompts a confirmation for removing the selected step from the current plan. Deleting is available for completed steps but not while a step is running.
- Attach
- The Attach replications window allows selecting replications for attaching in the selected step.
Note: For sites backed by VMware Cloud Director:
- When attaching a vApp replication, changing the number of replicated virtual machines in that vApp replication affects the recovery plan. Adding virtual machine replications for that vApp attaches the new virtual machine replications to the step with the attached vApp. Similarly, removing virtual machine replications from the vApp detaches them from the step.
- Alternatively, attaching all the virtual machine replications of a vApp replication in the step permanently fixes those virtual machine replications as part of the step. Adding or removing virtual machine replications to the same vApp replication does not affect the step or the recovery plan.
Health Validation
Since VMware Cloud Director Availability 4.7, each user can validate that the environment is actually ready to recover the replicated workloads in a recovery plan by performing a check against a predefined set of rules. These validations are available only for the replications that the user has permissions to see.
- Click on an existing recovery plan name to expand its details, then click the Health tab to see the following three sections: Failover, Migrate, and Test Failover with the applicable validations for each.
- Validations results:
- No issues found - shows when the plan contains all necessary settings to recover the workloads in the destination site.
- Warning - shows a yellow triangle and identifies each issue that does not prevent the recovery process but can cause the recovery to differ from what is expected, due to either the replication state or the infrastructure in the destination site. For example, not all virtual machines within a vApp are included in the recovery plan, or the configured network in the recovery settings for the replication is no longer available.
- Error - shows a red triangle and identifies each issue that prevents the successful recovery of the replicated workload. For example, a replication has no instances yet or the storage configuration includes a missing or disabled storage policy.
- N/A - shows with no steps defined or no replications assigned in any step.
- A health check validation starts in the following ways:
- Click the recovery plan name, then click the Health tab to see the current validation results.
- To perform a manual validation of the health of the plan at any time, click Refresh health.
The following system-defined rules are preset:
Validation | Description | Result and possible failure resolution |
---|---|---|
COMPUTER_NAME | Validates computer name is set up. (only applicable when Guest customization is activated.) | WARN Guest customization activated while missing the required 'computerName' property. With the Guest customization toggle active, when you skip entering a computer name, VMware Cloud Director automatically generates one, for example, 'vmname-001', which may disjoin virtual machines running a Windows guest OS from the domain. To enter a computer name, click . |
EMPTY_PLAN | Validates the plan has at least one step. | WARN Recovery plan must have at least one step. |
EMPTY_STEP | Validates each step contains at least one replication attached. | WARN Recovery step 'xyz' has no replications attached. |
INSTANCES | Validates the replication has at least one instance created. | FAIL Cannot create image. No available instances. For successful workload recovery, the replication requires at least one created instance. Ensure the source workload is powered-on and verify that the replication settings are correct. To trigger an instance creation, click . |
PLACEMENT | Validates the destination placement datastore, sizing policy, or storage profiles. | WARN Placement or sizing policy not found (applies to VDC VM policies in VMware Cloud Director). FAIL Invalid replication storage (the datastore/storage policy fails). The destination storage solution can no longer be used for this replication. To modify it, click Replication settings or navigate to the replications list and click . |
RECOVERY_SETTINGS | Validates recovery settings like networks, compute, and others. | FAIL Missing recovery settings for replication. The recovery settings are required for each replication for the system to know what infrastructure to use. To configure them, click Recovery settings or navigate to the replications list and click . |
SRC_SITE_UP | Validates the source site is reachable. (only applicable for migrations.) | FAIL Remote site 'abc' is not available. |
SRC_VM_TOOLS | Validates source VM has appropriate tools installed (only applicable when Guest customization is activated.) | WARN Missing VMware Tools, required for guest customization. Guest customization needs VMware Tools installed in the virtual machine. |
VAPP_ALL_VMS | Validates all virtual machines of every vApp are replicated. (only applicable for vApp replications.) | WARN Not all VMs are replicated in the vApp replication. |
After resolving any of the above failures, the health validation shows: The operation for resolving the issue has completed. You need to trigger manually the validation again in order to have the latest status available.
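The relation between the rule outcomes in the table (WARN, FAIL) and the four validation results can be sketched as follows. This is an assumption-laden illustration: the function `summarize_health`, its inputs, and the rule that the worst severity wins are not stated in this documentation and are hypothetical.

```python
from typing import List, Tuple


def summarize_health(findings: List[Tuple[str, str]],
                     has_replications: bool) -> str:
    """Collapse rule findings into one validation result.

    findings: (rule name, level) pairs, where level is "WARN" or "FAIL".
    has_replications: False when the plan has no steps or no replications
    assigned in any step.
    Assumption: the worst severity wins.
    """
    if not has_replications:
        return "N/A"
    levels = {level for _, level in findings}
    if "FAIL" in levels:
        return "Error"
    if "WARN" in levels:
        return "Warning"
    return "No issues found"
```

For example, a plan with findings `[("SRC_VM_TOOLS", "WARN"), ("INSTANCES", "FAIL")]` would summarize as `"Error"` under this assumption.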