The SD-WAN Orchestrator Disaster Recovery (DR) feature prevents the loss of stored data and resumes SD-WAN Orchestrator services in the event of system or network failure.

SD-WAN Orchestrator DR involves setting up an active/standby SD-WAN Orchestrator pair with data replication and a manually triggered failover mechanism.
  • The recovery time objective (RTO) therefore depends on explicit action by the operator to trigger promotion of the standby.
  • The recovery point objective (RPO), however, is essentially zero regardless of the recovery time, because all configuration is instantaneously replicated. Monitoring data that would have been collected during the outage is cached on the edges and gateways pending promotion of the standby.
Note: DR is mandatory. For licensing and pricing, contact the VMware sales team.

Active/Standby Pair

In an SD-WAN Orchestrator DR deployment, two identical SD-WAN Orchestrator systems are configured as an active/standby pair. The operator can view the state of DR readiness through the web UI on either server. Edges and gateways are aware of both SD-WAN Orchestrators. They receive configuration changes only from the active SD-WAN Orchestrator, but they periodically send DR heartbeats to both systems to report their view of both servers and to query the DR system status. When the operator triggers a failover, the edges and gateways are informed of the change in their next DR heartbeat.
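
The heartbeat behavior described above can be illustrated with a minimal Python sketch. It is not the product's implementation: the HeartbeatReply shape, the state strings, and the pick_config_source function are hypothetical names introduced only for illustration. The rule that a Standalone reply takes precedence over an Active one is also an assumption, based on the statement elsewhere in this section that a promoted standby becomes a Standalone server.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class HeartbeatReply:
    # Hypothetical reply shape; the real DR heartbeat format is not documented here.
    orchestrator: str  # address of the orchestrator that answered
    dr_state: str      # "STANDALONE", "ACTIVE", "STANDBY", or "ZOMBIE"


def pick_config_source(current: str, replies: list[HeartbeatReply]) -> str:
    """Return the orchestrator this edge should accept configuration from.

    Assumption: a Standalone reply (a promoted standby) wins over an Active
    one, and with no usable reply the edge keeps its current choice.
    """
    for wanted in ("STANDALONE", "ACTIVE"):
        for reply in replies:
            if reply.dr_state == wanted:
                return reply.orchestrator
    return current


# Before failover: the edge keeps following the active server.
before = [HeartbeatReply("vco-a", "ACTIVE"), HeartbeatReply("vco-b", "STANDBY")]
# After failover: the next heartbeat round reveals the promoted standby.
after = [HeartbeatReply("vco-a", "ZOMBIE"), HeartbeatReply("vco-b", "STANDALONE")]

print(pick_config_source("vco-a", before))
print(pick_config_source("vco-a", after))
```

The edge queries both servers on every round, so it needs no separate notification channel: the state change shows up in whichever heartbeat round follows the operator's failover.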

DR States

From the view of an operator, and of the edges and gateways, an SD-WAN Orchestrator has one of four DR states:

DR State     Description
Standalone   No DR configured.
Active       DR configured, acting as the primary SD-WAN Orchestrator server.
Standby      DR configured, acting as an inactive replica SD-WAN Orchestrator server.
Zombie       DR formerly configured and active, but no longer acting as the active or standby server.

Run-time Operation

When DR is configured, the standby server runs in a limited mode, blocking all API calls except those related to the DR status and the DR heartbeats. When the operator invokes a failover, the standby is promoted to become fully operational as a Standalone server. The server that was formerly active is automatically transitioned to a Zombie state if it is responsive and visible from the promoted standby. In the Zombie state, management configuration services are blocked, and any contact from edges and gateways that have not transitioned to the new active SD-WAN Orchestrator is redirected to the promoted server.
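
The run-time rules above can be summarized as a small state model. The Python sketch below is illustrative only and does not reflect the product's API: the DRState enum, the fail_over and api_allowed functions, and the request category strings are hypothetical. The source does not say what state an unreachable former active server is left in, so the sketch assumes it stays unchanged.

```python
from enum import Enum


class DRState(Enum):
    STANDALONE = "Standalone"
    ACTIVE = "Active"
    STANDBY = "Standby"
    ZOMBIE = "Zombie"


def fail_over(active, standby, old_active_reachable):
    """Return (former active, former standby) states after an operator-triggered failover."""
    if active is not DRState.ACTIVE or standby is not DRState.STANDBY:
        raise ValueError("failover requires an Active/Standby pair")
    # The standby is promoted to a fully operational Standalone server.
    promoted = DRState.STANDALONE
    # The former active becomes a Zombie only if the promoted standby can reach it.
    # Assumption: otherwise its state is left unchanged.
    former = DRState.ZOMBIE if old_active_reachable else active
    return former, promoted


def api_allowed(state, category):
    """Decide whether a request category is served in the given DR state.

    The category names are hypothetical labels, not real API names.
    """
    if state is DRState.STANDBY:
        # Limited mode: only DR status and DR heartbeat calls pass.
        return category in {"dr_status", "dr_heartbeat"}
    if state is DRState.ZOMBIE:
        # Management configuration services are blocked.
        return category != "management_config"
    return True


former, promoted = fail_over(DRState.ACTIVE, DRState.STANDBY, old_active_reachable=True)
print(former.value, promoted.value)
```

Modeling the reachability check as an explicit argument makes the one conditional step in the failover visible: promotion of the standby always happens, while the Zombie transition depends on the former active server being responsive.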