The vSphere HA configuration protects customer workloads in the VI workload domain. The configuration accounts for the varying and sometimes significant CPU or memory reservations of the customer workloads and for the requirements of vSAN.

You configure several vSphere HA features to provide high availability for the customer workloads in the domain.

Table 1. vSphere HA Features Configured for the SDDC

| vSphere HA Feature | Description |
|---|---|
| Host failure response | vSphere HA can respond to individual host failures by restarting virtual machines on other hosts within the cluster. |
| Response for host isolation | If a host becomes isolated, vSphere HA can detect the isolation, shut down or power off the virtual machines on that host, and restart them on available hosts. |
| Datastore with Permanent Device Loss (PDL) or All Paths Down (APD) | For virtual machines on non-vSAN datastores, vSphere HA can detect datastore outages and restart the affected virtual machines on hosts that still have access to the datastore. |
| Admission control policy | Determines how the cluster calculates the resources available for failover. In a smaller vSphere HA cluster, a larger proportion of the cluster resources is reserved to accommodate ESXi host failures, according to the selected admission control policy. |
| VM and Application Monitoring | If a virtual machine failure occurs, the VM and Application Monitoring service restarts that virtual machine. The service uses VMware Tools heartbeats to evaluate whether a virtual machine in the cluster is running. |
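The Python sketch below encodes the responses in Table 1 as a simple lookup, mainly to make one condition explicit: the PDL/APD datastore response applies only to virtual machines on non-vSAN datastores. It is a hypothetical, simplified model for illustration only. The `Event` names and response strings are assumptions, not a vSphere API, and the host isolation branch assumes the Power off and restart VMs response selected in this design.

```python
from enum import Enum


class Event(Enum):
    """Hypothetical failure events, one per row of Table 1."""
    HOST_FAILURE = "host failure"
    HOST_ISOLATION = "host isolation"
    DATASTORE_PDL_APD = "datastore PDL or APD"
    VM_HEARTBEAT_LOSS = "VM heartbeat loss"


def ha_response(event: Event, on_vsan_datastore: bool = True) -> str:
    """Return the simplified vSphere HA response for an event."""
    if event is Event.HOST_FAILURE:
        return "restart VMs on other hosts in the cluster"
    if event is Event.HOST_ISOLATION:
        # Assumes the isolation response is Power off and restart VMs.
        return "power off VMs on the isolated host and restart them on available hosts"
    if event is Event.DATASTORE_PDL_APD:
        # PDL/APD protection covers only VMs on non-vSAN datastores.
        if on_vsan_datastore:
            return "no PDL/APD response"
        return "restart VMs on hosts that still have datastore access"
    if event is Event.VM_HEARTBEAT_LOSS:
        # VM and Application Monitoring relies on VMware Tools heartbeats.
        return "restart the VM"
    raise ValueError(f"unsupported event: {event}")


for ev in Event:
    print(f"{ev.value}: {ha_response(ev, on_vsan_datastore=False)}")
```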

Table 2. Admission Control Policies in vSphere HA

| Policy Name | Description |
|---|---|
| Host failures the cluster tolerates | vSphere HA ensures that a specified number of ESXi hosts can fail while sufficient resources remain in the cluster to fail over all the virtual machines from those ESXi hosts. |
| Percentage of cluster resources reserved | vSphere HA reserves a specified percentage of aggregated CPU and memory resources for failover. |
| Specify failover hosts | If an ESXi host fails, vSphere HA attempts to restart its virtual machines on any of the specified failover ESXi hosts. If a restart is not possible, for example, because the failover ESXi hosts have insufficient resources or have also failed, vSphere HA attempts to restart the virtual machines on other ESXi hosts in the vSphere cluster. |
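To illustrate the fallback order of the Specify failover hosts policy, the following Python sketch picks a restart target for one virtual machine: the designated failover hosts first, then the remaining hosts in the cluster. It is a hypothetical first-fit model that only compares CPU and memory reservations with free capacity; the actual vSphere HA placement logic considers more factors, and all host and VM names are invented.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Host:
    name: str
    free_cpu_mhz: int
    free_mem_mb: int
    alive: bool = True


@dataclass
class VM:
    name: str
    cpu_reservation_mhz: int
    mem_reservation_mb: int


def pick_restart_host(vm: VM, failover_hosts: List[Host],
                      other_hosts: List[Host]) -> Optional[Host]:
    """Prefer designated failover hosts; fall back to the rest of the cluster."""
    def fits(host: Host) -> bool:
        # A host qualifies if it is running and can satisfy the VM reservations.
        return (host.alive
                and host.free_cpu_mhz >= vm.cpu_reservation_mhz
                and host.free_mem_mb >= vm.mem_reservation_mb)

    for group in (failover_hosts, other_hosts):
        for host in group:
            if fits(host):
                return host
    return None  # no host can restart the VM


failover = [Host("esx-04", free_cpu_mhz=500, free_mem_mb=1024)]
others = [Host("esx-02", 4000, 16384), Host("esx-03", 4000, 16384)]
vm = VM("app-01", cpu_reservation_mhz=1000, mem_reservation_mb=4096)
target = pick_restart_host(vm, failover, others)
print(target.name if target else "no host available")  # esx-02
```

Here the failover host esx-04 lacks free CPU for the reservation, so the sketch falls back to the first other host that fits.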

Table 3. Design Decisions on vSphere Availability for a VI Workload Domain Cluster

| Decision ID | Design Decision | Design Justification | Design Implication |
|---|---|---|---|
| VCF-WLD-VCS-CLS-005 | Use vSphere HA to protect all virtual machines against failures. | vSphere HA supports a robust level of protection for both ESXi host and virtual machine availability. | You must provide sufficient resources on the remaining hosts so that virtual machines can be restarted on those hosts in the event of a host outage. |
| VCF-WLD-VCS-CLS-006 | Set the host isolation response to Power off and restart VMs in vSphere HA. | vSAN requires that the host isolation response be set to Power off and restart VMs, so that virtual machines are restarted on available ESXi hosts. | If a false positive event occurs, an ESXi host is incorrectly declared isolated and its virtual machines are powered off. |
| VCF-WLD-VCS-CLS-007 | Set the advanced cluster setting das.usedefaultisolationaddress to false. | Ensures that vSphere HA uses the manually configured isolation addresses instead of the default management network gateway address. | You must configure this parameter manually. |
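The effect of das.usedefaultisolationaddress can be illustrated with a simplified Python model of isolation detection: a host that stops receiving vSphere HA heartbeats declares itself isolated only if it cannot reach any isolation address. The das.isolationaddress0 to das.isolationaddress9 option names are real vSphere HA advanced settings, but the detection logic here is simplified, and the IP addresses are hypothetical values from documentation ranges. Because vSphere HA heartbeats travel over the vSAN network when vSAN is enabled, keeping the management gateway as an isolation address can make a host that has lost only its vSAN network appear not isolated.

```python
from typing import Dict, List, Set


def isolation_addresses(advanced: Dict[str, str], mgmt_gateway: str) -> List[str]:
    """Collect the addresses a host pings when it stops receiving HA heartbeats."""
    addresses = [advanced[f"das.isolationaddress{i}"]
                 for i in range(10) if f"das.isolationaddress{i}" in advanced]
    # The management network gateway is used unless explicitly turned off.
    if advanced.get("das.usedefaultisolationaddress", "true").lower() != "false":
        addresses.insert(0, mgmt_gateway)
    return addresses


def is_isolated(receives_heartbeats: bool, reachable: Set[str],
                addresses: List[str]) -> bool:
    """Simplified rule: isolated only if no heartbeats and no address answers."""
    if receives_heartbeats:
        return False
    return not any(address in reachable for address in addresses)


MGMT_GW = "192.0.2.1"     # hypothetical management network gateway
VSAN_GW = "198.51.100.1"  # hypothetical vSAN network gateway

design = {"das.usedefaultisolationaddress": "false", "das.isolationaddress0": VSAN_GW}
default = {"das.isolationaddress0": VSAN_GW}

# The vSAN network fails, so heartbeats stop, but the management gateway still answers.
reachable = {MGMT_GW}
print(is_isolated(False, reachable, isolation_addresses(default, MGMT_GW)))  # False
print(is_isolated(False, reachable, isolation_addresses(design, MGMT_GW)))   # True
```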

Table 4. Design Decisions on the Admission Control Policy for a Cluster in a VI Workload Domain with a Single Availability Zone

| Decision ID | Design Decision | Design Justification | Design Implication |
|---|---|---|---|
| VCF-WLD-VCS-CLS-008 | Configure admission control for 1 ESXi host failure and percentage-based failover capacity. | Percentage-based reservation works well when virtual machines have varying and sometimes significant CPU or memory reservations. vSphere automatically calculates the reserved percentage from the number of ESXi host failures to tolerate and the number of ESXi hosts in the cluster. | In a cluster of 4 ESXi hosts, the resources of only 3 ESXi hosts are available for use. |
| VCF-WLD-VCS-CLS-009 | Set the isolation address for the cluster to the gateway IP address of the vSAN network. | Allows vSphere HA to validate complete network isolation if a connection failure occurs on an ESXi host. | You must configure the isolation address manually. |
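The arithmetic behind VCF-WLD-VCS-CLS-008 can be sketched in Python. Assuming identically sized ESXi hosts, tolerating F host failures in an N-host cluster reserves F/N of the aggregated CPU and memory, so tolerating 1 failure in 4 hosts reserves 25% and leaves the equivalent of 3 hosts. The power-on check is a simplified model of percentage-based admission control for CPU only (memory is checked the same way), and the capacity figures are hypothetical.

```python
def reserved_percentage(failures_to_tolerate: int, host_count: int) -> float:
    """Failover reservation for identically sized hosts: F / N of the cluster."""
    if not 0 < failures_to_tolerate < host_count:
        raise ValueError("failures_to_tolerate must be between 1 and host_count - 1")
    return 100 * failures_to_tolerate / host_count


def can_power_on(total_mhz: int, reserved_by_vms_mhz: int, new_vm_mhz: int,
                 failover_pct: float) -> bool:
    """Simplified admission check: the current failover capacity after the
    power-on must not drop below the configured percentage."""
    remaining = total_mhz - reserved_by_vms_mhz - new_vm_mhz
    return 100 * remaining / total_mhz >= failover_pct


# Smaller clusters reserve a larger share of their resources.
for hosts in (3, 4, 8):
    print(hosts, "hosts:", round(reserved_percentage(1, hosts), 1), "% reserved")

# Hypothetical cluster: 4 hosts with 20,000 MHz of CPU each.
pct = reserved_percentage(1, 4)                    # 25.0
print(can_power_on(80_000, 50_000, 10_000, pct))   # True: 25% remains
print(can_power_on(80_000, 50_000, 12_000, pct))   # False: only 22.5% remains
```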

Table 5. Design Decisions on the Admission Control Policy for a Cluster in a VI Workload Domain with Multiple Availability Zones

| Decision ID | Design Decision | Design Justification | Design Implication |
|---|---|---|---|
| VCF-WLD-VCS-CLS-010 | Increase the admission control percentage to reserve the resources of half of the ESXi hosts in the cluster. | Reserving half of the resources of a stretched cluster ensures that all virtual machines have enough resources if an availability zone outage occurs. | In a cluster of 8 ESXi hosts, the resources of only 4 ESXi hosts are available for use. If you add more ESXi hosts to the cluster, add them in pairs, one per availability zone. |
| VCF-WLD-VCS-CLS-011 | Set an additional isolation address for the cluster to the gateway IP address of the vSAN network in the second availability zone. | Allows vSphere HA to validate complete network isolation if a connection failure occurs on an ESXi host or between availability zones. | None. |
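For a stretched cluster, VCF-WLD-VCS-CLS-010 and VCF-WLD-VCS-CLS-011 can be expressed together in a short Python sketch: reserving the resources of one full availability zone yields 50% when both zones have the same number of hosts, and each zone contributes its vSAN gateway as an isolation address. The function name, the gateway addresses, and the host-count check are illustrative assumptions. The admission control percentage itself is set in the cluster admission control settings, not as an advanced option.

```python
from typing import Dict, Tuple


def stretched_cluster_ha(hosts_az1: int, hosts_az2: int, vsan_gw_az1: str,
                         vsan_gw_az2: str) -> Tuple[float, int, Dict[str, str]]:
    """Return (reserved %, usable host equivalents, HA advanced options)."""
    # Hosts are added in pairs, one per availability zone, so both zones match.
    if hosts_az1 != hosts_az2 or hosts_az1 < 1:
        raise ValueError("both availability zones need the same, nonzero number of hosts")
    total = hosts_az1 + hosts_az2
    reserved_pct = 100 * hosts_az2 / total  # survive the loss of one whole zone
    advanced = {
        "das.usedefaultisolationaddress": "false",
        "das.isolationaddress0": vsan_gw_az1,  # vSAN gateway, first availability zone
        "das.isolationaddress1": vsan_gw_az2,  # vSAN gateway, second availability zone
    }
    return reserved_pct, total - hosts_az2, advanced


# Hypothetical 8-host stretched cluster with documentation-range gateways.
pct, usable, options = stretched_cluster_ha(4, 4, "198.51.100.1", "203.0.113.1")
print(pct, usable)  # 50.0 4
print(options)
```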

Table 6. Design Decisions on the VM and Application Monitoring Service for a VI Workload Domain

| Decision ID | Design Decision | Design Justification | Design Implication |
|---|---|---|---|
| VCF-WLD-VCS-CLS-012 | Enable VM Monitoring for each cluster. | VM Monitoring provides in-guest protection for most VM workloads. The application or service running in the virtual machine must be able to start successfully after a reboot; otherwise, restarting the virtual machine is not sufficient. | None. |
| VCF-WLD-VCS-CLS-013 | Set the advanced cluster setting das.iostatsinterval to 0 to deactivate monitoring of the storage and network I/O activities of the management and workload appliances in the cluster. | When an OS failure occurs and no heartbeats are received from VMware Tools, the NSX Edge appliances in the cluster are restarted without additionally waiting for the I/O check to complete. I/O monitoring is also deactivated for the workload virtual machines. | To enable I/O monitoring, you must manually set the das.iostatsinterval advanced setting to a value greater than 0. |
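The effect of VCF-WLD-VCS-CLS-013 can be illustrated with a simplified Python model of the VM Monitoring decision. By default, when VMware Tools heartbeats stop, VM Monitoring also checks whether the virtual machine performed disk or network I/O within the das.iostatsinterval window (120 seconds by default) before resetting it; a value of 0 skips that check. The function name, failure interval, and timing values are hypothetical, and the real service has additional settings, such as the minimum uptime and the maximum number of per-VM resets.

```python
def should_reset_vm(secs_since_heartbeat: float, failure_interval: float,
                    secs_since_io: float, iostats_interval: float) -> bool:
    """Simplified VM Monitoring decision for one virtual machine.

    VMware Tools heartbeats must be missing for the failure interval.
    With das.iostatsinterval > 0, the VM must also show no disk or network
    I/O within that interval; with das.iostatsinterval = 0, the I/O check
    is skipped.
    """
    if secs_since_heartbeat < failure_interval:
        return False
    if iostats_interval == 0:
        return True
    return secs_since_io >= iostats_interval


# Guest OS stopped sending heartbeats 40 s ago, but I/O was seen 10 s ago.
print(should_reset_vm(40, 30, 10, 120))  # False: the 120 s I/O check still waits
print(should_reset_vm(40, 30, 10, 0))    # True: I/O check deactivated
```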