The vSphere HA configuration protects the management virtual machines that are critical to the operation of your VMware Cloud Foundation environment. When you design the configuration, consider the varying and sometimes significant CPU or memory reservations of the management virtual machines and the requirements of vSAN.

You configure several vSphere HA features to provide high availability for the management components of the SDDC.
Table 1. vSphere HA Features Configured for the SDDC

| vSphere HA Feature | Description |
|---|---|
| Host failure response | vSphere HA can respond to individual host failures by restarting virtual machines on other hosts within the cluster. |
| Response for host isolation | If a host becomes isolated, vSphere HA can detect the isolation and, depending on the configured response, shut down or power off the virtual machines and restart them on available hosts. |
| Admission control policy | Configure how the cluster determines available resources. In a smaller vSphere HA cluster, a larger proportion of the cluster resources is reserved to accommodate ESXi host failures according to the selected admission control policy. |
| VM and Application Monitoring | If a virtual machine failure occurs, the VM and Application Monitoring service restarts that virtual machine. The service uses VMware Tools to evaluate whether a virtual machine in the cluster is running. |

Table 2. Admission Control Policies in vSphere HA

| Policy Name | Description |
|---|---|
| Host failures the cluster tolerates | vSphere HA ensures that a specified number of ESXi hosts can fail and sufficient resources remain in the cluster to fail over all the virtual machines from those ESXi hosts. |
| Percentage of cluster resources reserved | vSphere HA reserves a specified percentage of aggregated CPU and memory resources for failover. |
| Specify Failover Hosts | If an ESXi host fails, vSphere HA attempts to restart its virtual machines on any of the specified failover ESXi hosts. If a restart is not possible, for example, the failover ESXi hosts have insufficient resources or have failed as well, then vSphere HA attempts to restart the virtual machines on other ESXi hosts in the cluster. |

Table 3. Design Decisions on vSphere Availability for the Default Management Cluster

| Decision ID | Design Decision | Design Justification | Design Implication |
|---|---|---|---|
| VCF-MGMT-VCS-CLS-005 | Use vSphere HA to protect all virtual machines against failures. | vSphere HA supports a robust level of protection for both ESXi host and virtual machine availability. | You must provide sufficient resources on the remaining hosts so that virtual machines can be restarted on those hosts in the event of a host outage. |
| VCF-MGMT-VCS-CLS-006 | Set the host isolation response to Power off and restart VMs in vSphere HA. | vSAN requires that the host isolation response be set to Power Off and to restart virtual machines on available ESXi hosts. | If a false positive event occurs, an ESXi host is incorrectly declared isolated and its virtual machines are powered off. |
| VCF-MGMT-VCS-CLS-007 | Set the advanced cluster setting das.usedefaultisolationaddress to false. | Ensures that vSphere HA uses the manual isolation addresses instead of the default management network gateway address. | If you deploy the management cluster in a single availability zone, you must configure this advanced setting manually. |
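The advanced settings involved in this decision can be sketched as a simple name-to-value mapping. The following Python is an illustration only: the function name `build_ha_advanced_options` is hypothetical, the address `192.0.2.1` is a documentation placeholder, and the `das.isolationaddressN` option names are an assumption about how manual isolation addresses are supplied that this section does not state. The sketch does not call any vSphere API.

```python
import ipaddress


def build_ha_advanced_options(isolation_addresses):
    """Return vSphere HA advanced options as a name -> value mapping.

    Hypothetical helper: it only assembles the settings, it does not
    apply them to a cluster.
    """
    if not isolation_addresses:
        raise ValueError("at least one manual isolation address is required")

    # Decision VCF-MGMT-VCS-CLS-007: do not use the default gateway address.
    options = {"das.usedefaultisolationaddress": "false"}

    for index, address in enumerate(isolation_addresses):
        # ip_address() raises ValueError for malformed input.
        validated = ipaddress.ip_address(address)
        # Assumed option naming: das.isolationaddress0, das.isolationaddress1, ...
        options[f"das.isolationaddress{index}"] = str(validated)

    return options


if __name__ == "__main__":
    for name, value in build_ha_advanced_options(["192.0.2.1"]).items():
        print(f"{name} = {value}")
```

Keeping the settings in one mapping makes it easy to review them before you enter them as advanced options for the cluster.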

Table 4. Design Decisions on the Admission Control Policy for the Default Cluster in a Management Domain with a Single Availability Zone

| Decision ID | Design Decision | Design Justification | Design Implication |
|---|---|---|---|
| VCF-MGMT-VCS-CLS-008 | Configure admission control for 1 ESXi host failure and percentage-based failover capacity. | Using the percentage-based reservation works well in situations where virtual machines have varying and sometimes significant CPU or memory reservations. vSphere automatically calculates the reserved percentage according to the number of ESXi host failures to tolerate and the number of ESXi hosts in the cluster. | In a cluster of 4 ESXi hosts, the resources of only 3 ESXi hosts are available for use. |
| VCF-MGMT-VCS-CLS-009 | Set the isolation address for the cluster to the gateway IP address for the vSAN network. | Allows vSphere HA to validate complete network isolation if a connection failure occurs on an ESXi host. | You must manually configure the isolation address. |
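The arithmetic behind the percentage-based policy can be sketched as follows. This is a simplified approximation that assumes all ESXi hosts in the cluster are identically sized; the function names are hypothetical, and the exact rounding that vSphere applies when it calculates the percentage automatically is not covered here.

```python
def reserved_failover_percentage(host_count, host_failures_to_tolerate=1):
    """Approximate the share of cluster resources reserved for failover.

    Assumes identically sized hosts, so each host contributes an equal
    fraction of the aggregated CPU and memory resources.
    """
    if host_count < 2:
        raise ValueError("a vSphere HA cluster needs at least 2 hosts")
    if not 0 < host_failures_to_tolerate < host_count:
        raise ValueError("failures to tolerate must be between 1 and host_count - 1")
    return 100.0 * host_failures_to_tolerate / host_count


def usable_host_equivalents(host_count, host_failures_to_tolerate=1):
    """Number of hosts' worth of resources left for running workloads."""
    # Reuse the validation in reserved_failover_percentage.
    reserved_failover_percentage(host_count, host_failures_to_tolerate)
    return host_count - host_failures_to_tolerate


if __name__ == "__main__":
    hosts = 4
    print(reserved_failover_percentage(hosts))  # 25.0
    print(usable_host_equivalents(hosts))       # 3
```

For the 4-host cluster in the table, tolerating 1 host failure reserves a quarter of the resources, which leaves the equivalent of 3 hosts for the management virtual machines.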

Table 5. Design Decisions on the Admission Control Policy for the Default Management Cluster for Multiple Availability Zones

| Decision ID | Design Decision | Design Justification | Design Implication |
|---|---|---|---|
| VCF-MGMT-VCS-CLS-010 | Increase the admission control percentage to reserve the resources of half of the ESXi hosts in the cluster. | Allocating only half of a stretched cluster ensures that all VMs have enough resources if an availability zone outage occurs. | In a cluster of 8 ESXi hosts, the resources of only 4 ESXi hosts are available for use. If you add more ESXi hosts to the default management cluster, add them in pairs, one per availability zone. |
| VCF-MGMT-VCS-CLS-011 | Set an additional isolation address to the vSAN network gateway in the second availability zone. | Allows vSphere HA to validate complete network isolation if a connection failure occurs on an ESXi host or between availability zones. | None. |
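The sizing rule for a stretched cluster can be checked with a short sketch. It assumes exactly two availability zones with identically sized hosts; the function name `stretched_cluster_plan` and the zone labels `az1` and `az2` are hypothetical.

```python
def stretched_cluster_plan(hosts_per_zone):
    """Return (reserved_percentage, usable_hosts) for a two-zone cluster.

    hosts_per_zone maps a zone label to its ESXi host count. The zones
    must be balanced so that either zone can absorb the other's workloads.
    """
    if len(hosts_per_zone) != 2:
        raise ValueError("expected exactly two availability zones")

    counts = list(hosts_per_zone.values())
    if counts[0] != counts[1] or counts[0] < 1:
        # Hosts are added in pairs, one per availability zone.
        raise ValueError("each availability zone must have the same number of hosts")

    total_hosts = sum(counts)
    reserved_percentage = 50  # half of the cluster is held back for failover
    usable_hosts = total_hosts // 2
    return reserved_percentage, usable_hosts


if __name__ == "__main__":
    print(stretched_cluster_plan({"az1": 4, "az2": 4}))  # (50, 4)
```

Rejecting unbalanced input reflects the guidance to add hosts in pairs: an uneven cluster could not guarantee that the smaller zone holds all virtual machines after a zone outage.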

Table 6. Design Decisions on the VM and Application Monitoring Service for the Management Domain

| Decision ID | Design Decision | Design Justification | Design Implication |
|---|---|---|---|
| VCF-MGMT-VCS-CLS-012 | Enable VM Monitoring for each cluster. | VM Monitoring provides in-guest protection for most VM workloads. The application or service running on the virtual machine must be capable of restarting successfully after a reboot; otherwise, restarting the virtual machine is not sufficient. | None. |
| VCF-MGMT-VCS-CLS-013 | Set the advanced cluster setting das.iostatsinterval to 0 to deactivate monitoring the storage and network I/O activities of the management appliances. | Triggers a restart of a management appliance as soon as an OS failure occurs and VMware Tools heartbeats are no longer received, without additionally waiting for the I/O check to complete. | If you want to enable I/O monitoring, configure the das.iostatsinterval advanced setting accordingly. |
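The effect of das.iostatsinterval on the restart decision can be illustrated with a small model. This is a simplified sketch of the behavior described above, not the actual vSphere HA implementation; the function name and parameters are hypothetical.

```python
def should_restart_vm(heartbeats_received, io_activity_observed, iostats_interval_seconds):
    """Model the VM Monitoring restart decision for one virtual machine.

    heartbeats_received: True if VMware Tools heartbeats arrived in time.
    io_activity_observed: True if storage or network I/O was seen during
        the I/O statistics interval.
    iostats_interval_seconds: value of das.iostatsinterval; 0 deactivates
        the I/O check.
    """
    if heartbeats_received:
        # The guest is reporting in, so no restart is needed.
        return False
    if iostats_interval_seconds == 0:
        # I/O monitoring is deactivated: missing heartbeats alone trigger a restart.
        return True
    # I/O monitoring is active: restart only if the VM also shows no I/O.
    return not io_activity_observed


if __name__ == "__main__":
    # With the interval set to 0, lingering I/O does not delay the restart.
    print(should_restart_vm(False, True, 0))    # True
    print(should_restart_vm(False, True, 120))  # False
```

The two calls show the trade-off in decision VCF-MGMT-VCS-CLS-013: with the I/O check deactivated, a management appliance with a failed OS is restarted even if some I/O activity is still visible.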