You can create a VMware Cloud Director appliance deployment with a database HA cluster that provides failover capabilities to your VMware Cloud Director database.

The VMware Cloud Director appliance includes an embedded PostgreSQL database. The embedded PostgreSQL database includes the Replication Manager (repmgr) tool suite, which provides a high availability (HA) function to a cluster of PostgreSQL servers.

You can deploy the VMware Cloud Director appliance as a primary cell, standby cell, or VMware Cloud Director application cell. See Deploy Your VMware Cloud Director Appliance by Using the vSphere Client, Deploying Your VMware Cloud Director Appliance by Using VMware OVF Tool, or #GUID-D35B3629-FCA2-40A6-8009-1A6CF8120F30.

To provide HA for your VMware Cloud Director database, deploy one primary and two standby instances of the VMware Cloud Director appliance when you create your server group. Together, these three cells form a database HA cluster. You can scale your server group horizontally by also deploying application cells. See the VMware Cloud Director Appliance Database HA Cluster figure.

Figure 1. VMware Cloud Director Appliance Database HA Cluster: one primary and two standby cells

Creating a VMware Cloud Director Appliance Deployment with Database HA

To create a VMware Cloud Director server group with a database HA configuration, follow this workflow:
  1. Deploy the VMware Cloud Director appliance as a primary cell.

    The primary cell is the first member in the VMware Cloud Director server group. The embedded database is configured as the VMware Cloud Director database. The database name is vcloud, and the database user is vcloud.

  2. Verify that the primary cell is up and running.
    1. To verify the VMware Cloud Director service health, log in with the system administrator credentials to the VMware Cloud Director Service Provider Admin Portal at https://primary_eth0_ip_address/provider.
    2. To verify the PostgreSQL database health, log in as root to the appliance management user interface at https://primary_eth1_ip_address:5480.

      The primary node status must be running.

  3. Deploy two instances of the VMware Cloud Director appliance as standby cells.

    The embedded databases are configured in replication mode with the primary database.

    Note: After the initial standby appliance deployment, the replication manager begins synchronizing the standby database with the primary appliance database. During this time, the VMware Cloud Director database, and therefore the VMware Cloud Director UI, is unavailable.

  4. Verify that all cells in the HA cluster are running.

    See View Your VMware Cloud Director Appliance Cluster Health and Failover Mode.

  5. (Optional) Deploy one or more instances of the VMware Cloud Director appliance as VMware Cloud Director application cells.

    The embedded databases of the application cells are not used. The application cells connect to the primary database.

Figure: one primary cell, two standby cells, and N VMware Cloud Director application cells
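The two verification URLs in step 2 follow a fixed pattern, so a script can derive them from the appliance IP addresses. The following Python sketch is illustrative only: the function names are hypothetical, and an HTTP response shows only that an endpoint is reachable, not that the cell or database is running, so you still log in to both interfaces to confirm health. The option to skip TLS verification is an assumption for lab appliances that use self-signed certificates.

```python
import ssl
import urllib.error
import urllib.request


def verification_urls(eth0_ip: str, eth1_ip: str) -> dict[str, str]:
    """Build the two URLs used in step 2 to check a primary cell (hypothetical helper)."""
    return {
        # Service Provider Admin Portal, served on the eth0 interface.
        "provider_portal": f"https://{eth0_ip}/provider",
        # Appliance management UI (PostgreSQL health), served on eth1, port 5480.
        "appliance_ui": f"https://{eth1_ip}:5480",
    }


def is_reachable(url: str, timeout: float = 5.0, verify_tls: bool = True) -> bool:
    """Return True if the URL answers with any HTTP response (reachability only)."""
    context = ssl.create_default_context()
    if not verify_tls:
        # Assumption: lab appliances often present self-signed certificates.
        context.check_hostname = False
        context.verify_mode = ssl.CERT_NONE
    try:
        with urllib.request.urlopen(url, timeout=timeout, context=context):
            return True
    except urllib.error.HTTPError:
        # The server answered with an error status (for example 401), so it is reachable.
        return True
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout, or TLS failure.
        return False
```

For example, `verification_urls("192.0.2.10", "198.51.100.10")` returns the portal URL on the first address and the port 5480 management URL on the second, which you can then pass to `is_reachable`.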
Note: The default failover mode for new cells is Manual. If your cluster is configured for automatic failover, after you deploy one or more additional cells, you must use the Appliance API to reset the cluster failover mode to Automatic. See the VMware Cloud Director Appliance API. If the failover mode is inconsistent across the nodes of the cluster, the cluster failover mode is Indeterminate. The Indeterminate mode can lead to inconsistent cluster states between nodes and to nodes that follow an old primary cell. To view the cluster failover mode, see View Your VMware Cloud Director Appliance Cluster Health and Failover Mode.
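The rule in the note above can be modeled in a few lines: the cluster mode is the shared mode when all nodes agree and Indeterminate otherwise. This Python sketch is a hypothetical model of that rule, not the appliance's implementation; the function name and the use of plain strings for modes are assumptions.

```python
def cluster_failover_mode(node_modes: list[str]) -> str:
    """Derive the cluster failover mode from per-node modes (hypothetical model)."""
    if not node_modes:
        raise ValueError("the cluster has no nodes")
    distinct = set(node_modes)
    if len(distinct) == 1:
        # Every node agrees, so the cluster reports that shared mode.
        return distinct.pop()
    # Any disagreement between nodes makes the cluster mode Indeterminate.
    return "Indeterminate"
```

For example, a cluster whose existing nodes report Automatic becomes Indeterminate as soon as a newly deployed cell reports its default of Manual, and returns to Automatic only after the mode is reset on every node.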

Creating a VMware Cloud Director Appliance Deployment Without Database HA

Important: VMware does not provide support for VMware Cloud Director appliance deployments without database HA.
To create a VMware Cloud Director server group without a database HA configuration, follow this workflow:
  1. Deploy the VMware Cloud Director appliance as a primary cell.

    The primary cell is the first member in the VMware Cloud Director server group. The embedded database is configured as the VMware Cloud Director database. The database name is vcloud, and the database user is vcloud.

  2. Verify that the primary cell is up and running.
    1. To verify the VMware Cloud Director service health, log in with the system administrator credentials to the VMware Cloud Director Service Provider Admin Portal at https://primary_eth0_ip_address/provider.
    2. To verify the PostgreSQL database health, log in as root to the appliance management user interface at https://primary_eth1_ip_address:5480.

      The primary node status must be running.

  3. (Optional) Deploy one or more instances of the VMware Cloud Director appliance as VMware Cloud Director application cells.

    The embedded databases of the application cells are not used. The application cells connect to the primary database.

Figure: one primary cell and N VMware Cloud Director application cells
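The two workflows differ only in the number of standby cells, so a planned server group can be classified by counting cell roles. The following Python sketch is a hypothetical planning check under that assumption; the role names and result strings are illustrative labels, not values reported by VMware Cloud Director.

```python
from collections import Counter


def describe_topology(roles: list[str]) -> str:
    """Classify a planned server group by its cell roles (hypothetical planning check)."""
    counts = Counter(roles)
    unknown = set(counts) - {"primary", "standby", "application"}
    if unknown:
        raise ValueError(f"unknown roles: {sorted(unknown)}")
    if counts["primary"] != 1:
        # Both workflows start with exactly one primary cell.
        return "invalid: deploy exactly one primary cell"
    if counts["standby"] >= 2:
        # One primary and at least two standbys form the database HA cluster.
        return "database HA"
    if counts["standby"] == 0:
        return "no database HA (not supported by VMware)"
    # A single standby is fewer than the HA workflow deploys.
    return "incomplete database HA: deploy a second standby"
```

Application cells do not change the result because they do not use their embedded databases.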

Automatic Failover of Your VMware Cloud Director Appliance

If the primary database service fails, you can configure VMware Cloud Director to perform an automatic failover to a new primary.

Automatic failover removes the need for an administrator to initiate a failover when the primary database service stops performing its functions for any reason. By default, the failover mode is Manual. You can set the failover mode to Automatic or Manual by using the VMware Cloud Director appliance API. See the VMware Cloud Director Appliance API Schema Reference.


If automatic failover is configured and your environment has at least two active standby cells, a primary database failure automatically initiates a database failover. After the failover, at least one active standby must remain for the new primary database to be updatable. Under normal circumstances, your VMware Cloud Director appliance deployment must have at least two active standbys at all times. If only one active standby remains for a short period, for example, because the primary failed and one of the standbys was promoted, replace the old failed primary with a new standby as soon as possible.

When there is an active primary and at least two active standby cells, the cluster is in a Healthy state. When there is an active primary and only one active standby, the cluster is in a Degraded state. If another database failure occurs while the cluster is Degraded, the primary is not updatable until another standby comes online. While the primary database is not updatable, VMware Cloud Director is unavailable, because its cells cannot update the database until at least one active standby is available to receive streaming replication from the primary database. The Healthy and Degraded states are defined the same way for manual and automatic failover.
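The health rules above reduce to a count of active standbys. The following Python sketch is a hypothetical model of those rules, not the appliance's health check; it borrows No_Active_Primary from the failure case described next, and returns None for a primary with no active standbys because this section does not name that state.

```python
from typing import Optional, Tuple


def cluster_health(primary_active: bool, active_standbys: int) -> Tuple[Optional[str], bool]:
    """Return (health state, primary is updatable) per the rules above (hypothetical model)."""
    if active_standbys < 0:
        raise ValueError("active_standbys cannot be negative")
    if not primary_active:
        return "No_Active_Primary", False
    # The primary accepts updates only while at least one standby can
    # receive streaming replication from it.
    updatable = active_standbys >= 1
    if active_standbys >= 2:
        return "Healthy", updatable
    if active_standbys == 1:
        return "Degraded", updatable
    # This section does not name the state of a primary with no active standbys.
    return None, updatable
```

The model makes the key point explicit: losing one standby leaves the cluster Degraded but usable, while losing the last standby makes the primary, and therefore VMware Cloud Director, unavailable for updates.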

After a primary database failure, the state of the primary is No_Active_Primary. For a manual VMware Cloud Director appliance failover, the administrator must manually promote a standby to primary and redeploy the failed primary as a standby. For automatic appliance failover, VMware Cloud Director automatically promotes a standby to primary, and the administrator manually redeploys the failed primary as a standby.
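The difference between the two failover modes can be summarized as a decision about who promotes the standby. The following Python sketch is a hypothetical model of the behavior described above; treating Indeterminate mode like Manual is an assumption, and the step descriptions are illustrative strings, not messages that VMware Cloud Director produces.

```python
def failover_actions(mode: str, primary_active: bool, active_standbys: int) -> list[str]:
    """List the steps after a primary database failure (hypothetical model)."""
    if primary_active:
        # The primary is still running, so no failover is needed.
        return []
    if active_standbys == 0:
        raise ValueError("no active standby is available to promote")
    if mode == "Automatic" and active_standbys >= 2:
        # Automatic failover starts only with at least two active standbys.
        steps = ["VMware Cloud Director promotes a standby to primary"]
    else:
        # Manual mode, too few standbys, or (assumption) Indeterminate mode.
        steps = ["administrator promotes a standby to primary"]
    # In both modes, an administrator redeploys the failed primary as a standby.
    steps.append("administrator redeploys the failed primary as a standby")
    return steps
```

In both modes the last step is manual, which is why the cluster stays Degraded until the failed primary is redeployed as a standby.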

Figure 2. Manual and Automatic VMware Cloud Director Appliance Failover: if the primary database service fails, the promotion of a standby to primary can be manual or automatic.

Automatic Fencing of Your Failed VMware Cloud Director Primary Cell

If a new primary cell is promoted after a primary cell failure, VMware Cloud Director automatically fences out the old primary to prevent it from restarting.

If a failed primary database restarts after a new primary cell is promoted, VMware Cloud Director automatically fences out the old primary. This automation prevents a split-brain situation, in which two active databases diverge from each other. The fencing automation stops and deactivates the vpostgres service on the old primary node. You can then redeploy the failed primary as a standby cell to restore the cluster health to Healthy.
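The fencing rule can be expressed as a single condition: a node that still behaves as primary while the cluster has promoted a different node is fenced. The following Python sketch is a hypothetical model of that rule, not VMware Cloud Director code; the DatabaseNode class, its fields, and the node names are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class DatabaseNode:
    """Hypothetical model of one appliance's embedded database service."""
    name: str
    vpostgres_running: bool = True
    vpostgres_enabled: bool = True


def fence_if_stale_primary(node: DatabaseNode, node_claims_primary: bool, cluster_primary: str) -> bool:
    """Fence a node that restarted as primary after another node was promoted.

    Returns True when the node was fenced.
    """
    if node_claims_primary and node.name != cluster_primary:
        # Stop the service and deactivate it so that it does not start again.
        node.vpostgres_running = False
        node.vpostgres_enabled = False
        return True
    return False
```

For example, if a hypothetical node cell-a restarts and claims to be primary after cell-b was promoted, the model stops and deactivates vpostgres on cell-a and leaves cell-b untouched.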

For more information about viewing the cluster health status and failover mode, see View Your VMware Cloud Director Appliance Cluster Health and Failover Mode.