Starting with VMware Cloud Director 10.1, you can configure VMware Cloud Director to fail over automatically to a new primary if the primary database service fails.

Automatic failover removes the need for an administrator to initiate the failover when the primary database service stops functioning for any reason. By default, the failover mode is manual. You can set the failover mode to automatic or manual by using the VMware Cloud Director appliance API. See the VMware Cloud Director Appliance API Schema Reference.

Note: If your cluster is configured for automatic failover, after you deploy one or more additional cells, you must use the Appliance API to reset the cluster failover mode to Automatic. The default failover mode for new cells is Manual. If the failover mode is inconsistent across the nodes of the cluster, the cluster failover mode is Indeterminate. In Indeterminate mode, the nodes can end up with inconsistent views of the cluster state, and some nodes might continue to follow an old primary cell. To view the cluster failover mode, see View the VMware Cloud Director Appliance Cluster Health and Failover Mode.
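The rule in the note above can be illustrated with a minimal sketch. It only models how a cluster-wide mode follows from the per-node modes and does not call the appliance API. The names FailoverMode and cluster_failover_mode are hypothetical and are not part of VMware Cloud Director.

```python
from enum import Enum


class FailoverMode(Enum):
    # Hypothetical labels that mirror the mode names used in this topic.
    MANUAL = "Manual"
    AUTOMATIC = "Automatic"
    INDETERMINATE = "Indeterminate"


def cluster_failover_mode(node_modes):
    """Derive the cluster failover mode from the mode of each node.

    If every node reports the same mode, that is the cluster mode.
    If the nodes disagree, the cluster mode is Indeterminate.
    """
    distinct = set(node_modes)
    if not distinct:
        raise ValueError("at least one node mode is required")
    if len(distinct) == 1:
        return distinct.pop()
    return FailoverMode.INDETERMINATE


# Three existing cells set to Automatic, plus one newly deployed cell
# that still has the default Manual mode.
existing = [FailoverMode.AUTOMATIC] * 3
with_new_cell = existing + [FailoverMode.MANUAL]

print(cluster_failover_mode(existing).value)
print(cluster_failover_mode(with_new_cell).value)
```

In this sketch, adding a cell that still has the default Manual mode changes the derived cluster mode from Automatic to Indeterminate, which is why the failover mode must be reset after new cells are deployed.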

When the failover mode is automatic and your environment has at least two active standby cells, a primary database failure automatically triggers a database failover. After the failover, there must be at least one active standby for the new primary database to be updatable. Under normal circumstances, your VMware Cloud Director appliance deployment must have at least two active standbys at all times. If only one active standby remains for a short period, for example, because the primary failed and one of the standbys was promoted, replace the failed primary with a new standby as soon as possible.

When there is an active primary and at least two active standby cells, the cluster is considered to be in a Healthy state. If there is an active primary and only one active standby, the cluster is in a Degraded state. If there is another database failure while the cluster is in a Degraded state, the primary is not updatable until another standby comes online. When the primary database is not updatable, VMware Cloud Director is not available, because the VMware Cloud Director cells cannot update the database until at least one active standby is receiving streaming replication from the primary database. The definitions of a Healthy and a Degraded cluster are the same in manual and automatic failover modes.
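The state rules above can be summarized in a short sketch. It is an illustration of the definitions in this topic, not the logic the appliance actually runs. The function names are hypothetical, "No_Active_Primary" is the state name used in Figure 1, and "Primary_Not_Updatable" is an assumed label for the case that this topic describes but does not name.

```python
def primary_is_updatable(primary_active: bool, active_standbys: int) -> bool:
    """The primary accepts updates only if it is active and at least
    one active standby is available for streaming replication."""
    return primary_active and active_standbys >= 1


def cluster_health(primary_active: bool, active_standbys: int) -> str:
    """Classify the cluster from the primary status and standby count."""
    if not primary_active:
        return "No_Active_Primary"
    if active_standbys >= 2:
        return "Healthy"
    if active_standbys == 1:
        return "Degraded"
    # Assumed label: the topic only says the primary is not updatable here.
    return "Primary_Not_Updatable"


# One primary with two standbys, then the same cluster after a failover
# has promoted one standby, then after a further standby failure.
for standbys in (2, 1, 0):
    state = cluster_health(True, standbys)
    updatable = primary_is_updatable(True, standbys)
    print(standbys, state, updatable)
```

The sketch takes no failover mode argument because the Healthy and Degraded definitions do not depend on whether failover is manual or automatic.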

Figure 1. Manual and Automatic VMware Cloud Director Appliance Failover
The operational state diagram for manual failover represents the "No_Active_Primary" state of a primary after a primary database failure, the required administrator input for a standby promotion, and the manual redeployment of the failed primary. Next to it, the operational state diagram for automatic failover displays the "No_Active_Primary" state of a primary after a primary database failure, the automatic promotion of a standby cell, and the manual redeployment of the failed primary.