The SD-WAN Orchestrator Disaster Recovery (DR) feature prevents the loss of stored data and resumes SD-WAN Orchestrator services in the event of system or network failure.
- The recovery time objective (RTO) depends on explicit action by the operator, because the standby is promoted only when the operator triggers a failover.
- The recovery point objective (RPO) is essentially zero regardless of the recovery time, because all configuration is instantaneously replicated. Monitoring data that would have been collected during the outage is cached on the edges and gateways until the standby is promoted.
Active/Standby Pair
In an SD-WAN Orchestrator DR deployment, two identical SD-WAN Orchestrator systems are configured as an active/standby pair. The operator can view the state of DR readiness through the web UI on either server. Edges and gateways are aware of both SD-WAN Orchestrators. They receive configuration changes only from the active SD-WAN Orchestrator, but they periodically send DR heartbeats to both systems to report their view of both servers and to query the DR system status. When the operator triggers a failover, the edges and gateways learn of the change in their next DR heartbeat.
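The heartbeat behavior described above can be illustrated with a minimal Python sketch. It is not the product's implementation: the `HeartbeatReply` type, the `select_config_source` function, and the orchestrator names are hypothetical, and the sketch assumes that each heartbeat reply carries the DR state the orchestrator reports for itself. It does not cover conflict cases, such as two servers both claiming to be active.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class DRState(Enum):
    STANDALONE = "Standalone"
    ACTIVE = "Active"
    STANDBY = "Standby"
    ZOMBIE = "Zombie"


@dataclass
class HeartbeatReply:
    """Hypothetical reply an orchestrator returns to a DR heartbeat."""
    orchestrator: str  # name or address of the orchestrator that replied
    state: DRState     # DR state the orchestrator reports for itself


def select_config_source(replies: List[HeartbeatReply]) -> Optional[str]:
    """Pick the orchestrator an edge or gateway should take configuration from.

    Assumption: only a server reporting Active or Standalone qualifies.
    Returns None when no reply qualifies (for example, neither server answered).
    """
    for reply in replies:
        if reply.state in (DRState.ACTIVE, DRState.STANDALONE):
            return reply.orchestrator
    return None


# Before failover: the edge follows the active server.
before = [HeartbeatReply("orchestrator-a", DRState.ACTIVE),
          HeartbeatReply("orchestrator-b", DRState.STANDBY)]
# After failover: the promoted standby reports Standalone, the old active Zombie.
after = [HeartbeatReply("orchestrator-a", DRState.ZOMBIE),
         HeartbeatReply("orchestrator-b", DRState.STANDALONE)]

print(select_config_source(before))
print(select_config_source(after))
```

Because the edge queries both servers on every heartbeat, it needs no separate notification channel: the change of configuration source falls out of the next pair of replies.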
DR States
From the view of an operator, and of the edges and gateways, an SD-WAN Orchestrator has one of four DR states:
DR State | Description |
---|---|
Standalone | No DR configured. |
Active | DR configured, acting as the primary SD-WAN Orchestrator server. |
Standby | DR configured, acting as an inactive replica SD-WAN Orchestrator server. |
Zombie | DR formerly configured and active but no longer acting as the active or standby. |
Run-time Operation
When DR is configured, the standby server runs in a limited mode, blocking all API calls except those related to the DR status and the DR heartbeats. When the operator invokes a failover, the standby is promoted to become fully operational as a Standalone server. The server that was formerly active is automatically transitioned to the Zombie state if it is responsive and visible from the promoted standby. In the Zombie state, management configuration services are blocked, and any contact from edges and gateways that have not yet transitioned to the new active SD-WAN Orchestrator is redirected to the promoted server.
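The run-time rules above amount to a small state machine, sketched below in Python under stated assumptions. The API category names (`dr_status`, `dr_heartbeat`) and the functions `standby_allows` and `fail_over` are hypothetical and chosen for illustration only. The sketch also assumes that a former active server that cannot be reached simply keeps its last state, since the text describes only the reachable case.

```python
from enum import Enum
from typing import Tuple


class DRState(Enum):
    STANDALONE = "Standalone"
    ACTIVE = "Active"
    STANDBY = "Standby"
    ZOMBIE = "Zombie"


# Hypothetical API categories; the real API surface is not described here.
DR_API_CATEGORIES = {"dr_status", "dr_heartbeat"}


def standby_allows(api_category: str) -> bool:
    """A standby blocks every API call except DR status and DR heartbeats."""
    return api_category in DR_API_CATEGORIES


def fail_over(active: DRState, standby: DRState,
              active_reachable: bool) -> Tuple[DRState, DRState]:
    """Return (former_active_state, promoted_state) after an operator-triggered failover."""
    if active is not DRState.ACTIVE or standby is not DRState.STANDBY:
        raise ValueError("failover requires an Active/Standby pair")
    # The standby becomes fully operational as a Standalone server.
    promoted = DRState.STANDALONE
    # The former active becomes a Zombie only if the promoted server can reach it.
    # Assumption: otherwise its state is left unchanged.
    former = DRState.ZOMBIE if active_reachable else active
    return former, promoted


print(fail_over(DRState.ACTIVE, DRState.STANDBY, active_reachable=True))
print(standby_allows("dr_heartbeat"), standby_allows("configuration"))
```

Note that the promoted server ends up as Standalone rather than Active, which matches the state table: after a failover, DR is no longer configured on the surviving server.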