To ensure optimal vSphere HA cluster performance, you should follow certain best practices. This topic highlights some of the key best practices for a vSphere HA cluster. You can also refer to the vSphere High Availability Deployment Best Practices publication for further discussion.

Setting Alarms to Monitor Cluster Changes

When vSphere HA or Fault Tolerance take action to maintain availability, for example, a virtual machine failover, you can be notified about such changes. Configure alarms in vCenter Server to be triggered when these actions occur, and have alerts, such as emails, sent to a specified set of administrators.

Several default vSphere HA alarms are available.

  • Insufficient failover resources (a cluster alarm)
  • Cannot find primary (a cluster alarm)
  • Failover in progress (a cluster alarm)
  • Host HA status (a host alarm)
  • VM monitoring error (a virtual machine alarm)
  • VM monitoring action (a virtual machine alarm)
  • Failover failed (a virtual machine alarm)
Note: The default alarms include the feature name, vSphere HA.

Monitoring Cluster Validity

A valid cluster is one in which the admission control policy has not been violated.

A cluster enabled for vSphere HA becomes invalid when the number of virtual machines powered on exceeds the failover requirements, that is, the current failover capacity is smaller than configured failover capacity. If admission control is disabled, clusters do not become invalid.

In the vSphere Web Client, select vSphere HA from the cluster's Monitor tab and then select Configuration Issues. A list of current vSphere HA issues appears.

DRS behavior is not affected if a cluster is red because of a vSphere HA issue.

vSphere HA and Storage vMotion Interoperability in a Mixed Cluster

In clusters where ESXi 5.x hosts and ESX/ESXi 4.1 or prior hosts are present and where Storage vMotion is used extensively or Storage DRS is enabled, do not deploy vSphere HA. vSphere HA might respond to a host failure by restarting a virtual machine on a host with an ESXi version different from the one on which the virtual machine was running before the failure. A problem can occur if, at the time of failure, the virtual machine was involved in a Storage vMotion action on an ESXi 5.x host, and vSphere HA restarts the virtual machine on a host with a version prior to ESXi 5.0. While the virtual machine might power on, any subsequent attempts at snapshot operations could corrupt the vdisk state and leave the virtual machine unusable.

Admission Control Best Practices

The following recommendations are best practices for vSphere HA admission control.

  • Select the Percentage of Cluster Resources Reserved admission control policy. This policy offers the most flexibility in terms of host and virtual machine sizing. When configuring this policy, choose a percentage for CPU and memory that reflects the number of host failures you want to support. For example, if you want vSphere HA to set aside resources for two host failures and have ten hosts of equal capacity in the cluster, then specify 20% (2/10).

  • Ensure that you size all cluster hosts equally. For the Host Failures Cluster Tolerates policy, an unbalanced cluster results in excess capacity being reserved to handle failures because vSphere HA reserves capacity for the largest hosts. For the Percentage of Cluster Resources Policy, an unbalanced cluster requires that you specify larger percentages than would otherwise be necessary to reserve enough capacity for the anticipated number of host failures.

  • If you plan to use the Host Failures Cluster Tolerates policy, try to keep virtual machine sizing requirements similar across all configured virtual machines. This policy uses slot sizes to calculate the amount of capacity needed to reserve for each virtual machine. The slot size is based on the largest reserved memory and CPU needed for any virtual machine. When you mix virtual machines of different CPU and memory requirements, the slot size calculation defaults to the largest possible, which limits consolidation.

  • If you plan to use the Specify Failover Hosts policy, decide how many host failures to support and then specify this number of hosts as failover hosts. If the cluster is unbalanced, the designated failover hosts should be at least the same size as the non-failover hosts in your cluster. This ensures that there is adequate capacity in case of failure.

Using Auto Deploy with vSphere HA

You can use vSphere HA and Auto Deploy together to improve the availability of your virtual machines. Auto Deploy provisions hosts when they power up and you can also configure it to install the vSphere HA agent on such hosts during the boot process. See the Auto Deploy documentation included in vSphere Installation and Setup for details.

Upgrading Hosts in a Cluster Using Virtual SAN

If you are upgrading the ESXi hosts in your vSphere HA cluster to version 5.5 or higher, and you also plan to use Virtual SAN, follow this process.
  1. Upgrade all of the hosts.
  2. Disable vSphere HA.
  3. Enable Virtual SAN.
  4. Re-enable vSphere HA.