The VeloCloud Orchestrator (VCO) Disaster Recovery (DR) feature prevents the loss of stored data and resumes VCO services in the event of system or network failure.

VCO DR involves setting up an active/standby VCO pair with data replication and a manually triggered failover mechanism.
  • The recovery time objective (RTO) therefore depends on explicit action by the operator to trigger promotion of the standby.
  • The recovery point objective (RPO), however, is essentially zero, regardless of the recovery time, because all configuration is instantaneously replicated. Monitoring data that would have been collected during the outage is cached on the edges and gateways pending promotion of the standby.
Note: DR is mandatory. For licensing and pricing, contact the VeloCloud sales team.

Active/Standby Pair

In a VCO DR deployment, two identical VCO systems are configured as an active/standby pair. The operator can view the state of DR readiness through the web UI on either server. Edges and gateways are aware of both VCOs. They receive configuration changes only from the active VCO, but they periodically send DR heartbeats to both systems to report their view of both servers and to query the DR system status. When the operator triggers a failover, the edges and gateways are informed of the change in their next DR heartbeat.
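
The heartbeat behavior described above can be illustrated with a small model. This is a minimal sketch, not the VCO implementation: the `DRState`, `Orchestrator`, and `Edge` names, the `heartbeat_reply` and `send_heartbeats` methods, and the rule that an edge follows whichever server reports itself as Active or Standalone are all assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class DRState(Enum):
    STANDALONE = "Standalone"
    ACTIVE = "Active"
    STANDBY = "Standby"
    ZOMBIE = "Zombie"


@dataclass
class Orchestrator:
    """Hypothetical stand-in for a VCO; only its DR state is modeled."""
    name: str
    state: DRState

    def heartbeat_reply(self) -> DRState:
        # A DR heartbeat lets the caller query the DR status of this server.
        return self.state


class Edge:
    """Hypothetical edge (or gateway) that knows about both VCOs."""

    def __init__(self, vcos, config_source):
        self.vcos = list(vcos)
        self.config_source = config_source  # name of the VCO it takes config from
        self.views = {}                     # last DR state seen for each VCO

    def send_heartbeats(self):
        # Heartbeats go to both systems, not only to the active one.
        for vco in self.vcos:
            state = vco.heartbeat_reply()
            self.views[vco.name] = state
            # Assumption: the edge follows the server that reports itself as
            # fully operational. Conflicting claims are not modeled here.
            if state in (DRState.ACTIVE, DRState.STANDALONE):
                self.config_source = vco.name


vco_a = Orchestrator("vco-a", DRState.ACTIVE)
vco_b = Orchestrator("vco-b", DRState.STANDBY)
edge = Edge([vco_a, vco_b], config_source="vco-a")
edge.send_heartbeats()

# Operator triggers a failover; the edge is unaware until its next heartbeat.
vco_a.state, vco_b.state = DRState.ZOMBIE, DRState.STANDALONE
source_before_heartbeat = edge.config_source
edge.send_heartbeats()
source_after_heartbeat = edge.config_source
```

In this model the edge keeps pointing at the old active server until `send_heartbeats` runs again, which mirrors the statement that edges and gateways learn of a failover in their next DR heartbeat.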

DR States

From the view of an operator, and of the edges and gateways, a VCO has one of four DR states:

DR State      Description
Standalone    No DR configured.
Active        DR configured; acting as the primary VCO server.
Standby       DR configured; acting as an inactive replica VCO server.
Zombie        DR formerly configured and active, but no longer acting as either the active or the standby.

Run-time Operation

When DR is configured, the standby server runs in a limited mode, blocking all API calls except those related to the DR status and the DR heartbeats. When the operator invokes a failover, the standby is promoted to become fully operational as a Standalone server. The formerly active server is automatically transitioned to the Zombie state if it is responsive and visible from the promoted standby. In the Zombie state, management configuration services are blocked, and any contact from edges and gateways that have not yet transitioned to the new active VCO is redirected to the promoted server.
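
The run-time rules above can be summarized as a small state machine. The following is a sketch under stated assumptions rather than the actual VCO logic: the call names (`dr_status`, `dr_heartbeat`, `config_update`), the string results, the `failover` helper, and its `active_reachable` flag are hypothetical. How a Zombie answers a DR status query is not stated in the text, so the sketch assumes it is still allowed.

```python
from enum import Enum


class DRState(Enum):
    STANDALONE = "Standalone"
    ACTIVE = "Active"
    STANDBY = "Standby"
    ZOMBIE = "Zombie"


# Hypothetical call categories; real VCO API names are not given in the text.
DR_CALLS = {"dr_status", "dr_heartbeat"}


class Orchestrator:
    def __init__(self, name, state=DRState.STANDALONE):
        self.name = name
        self.state = state
        self.redirect_to = None  # set when this server becomes a Zombie

    def handle(self, call):
        if self.state is DRState.STANDBY:
            # Limited mode: only DR status and DR heartbeat calls are served.
            return "ok" if call in DR_CALLS else "blocked"
        if self.state is DRState.ZOMBIE:
            if call == "dr_heartbeat":
                # Edges and gateways that still contact the old server are
                # pointed at the promoted one.
                return f"redirect:{self.redirect_to}"
            if call == "dr_status":
                return "ok"  # assumption: status stays visible on a Zombie
            return "blocked"  # management configuration services are blocked
        return "ok"  # Active and Standalone servers are fully operational


def failover(active, standby, active_reachable):
    """Operator-invoked promotion of the standby (hypothetical helper)."""
    if standby.state is not DRState.STANDBY:
        raise ValueError("failover can only promote a standby server")
    standby.state = DRState.STANDALONE
    # The old active becomes a Zombie only if the promoted server can reach it.
    if active_reachable:
        active.state = DRState.ZOMBIE
        active.redirect_to = standby.name


vco_a = Orchestrator("vco-a", DRState.ACTIVE)
vco_b = Orchestrator("vco-b", DRState.STANDBY)
standby_config_before = vco_b.handle("config_update")
failover(vco_a, vco_b, active_reachable=True)
```

After the `failover` call, `vco_b` serves every call as a Standalone server, while `vco_a` blocks configuration calls and redirects heartbeats to `vco-b`. If `active_reachable` is false, the old server is left untouched, since the text only describes the Zombie transition for a server that is responsive and visible from the promoted standby.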

[Figure: disaster recovery replica and status]