To ensure optimal Fault Tolerance results, you should follow certain best practices.
In addition to the following information, see the white paper VMware Fault Tolerance Recommendations and Considerations at http://www.vmware.com/resources/techresources/10040.
Consider the following best practices when configuring your hosts.
- Hosts running the Primary and Secondary VMs should operate at approximately the same processor frequencies, otherwise the Secondary VM might be restarted more frequently. Platform power management features that do not adjust based on workload (for example, power capping and enforced low frequency modes to save power) can cause processor frequencies to vary greatly. If Secondary VMs are being restarted on a regular basis, disable all power management modes on the hosts running fault tolerant virtual machines or ensure that all hosts are running in the same power management modes.
- Apply the same instruction set extension configuration (enabled or disabled) to all hosts. The process for enabling or disabling instruction sets varies among BIOSes. See the documentation for your hosts' BIOSes about how to configure instruction sets.
vSphere Fault Tolerance can function in clusters with nonuniform hosts, but it works best in clusters with compatible nodes. When constructing your cluster, all hosts should have the following configuration:
- Processors from the same compatible processor group.
- Common access to datastores used by the virtual machines.
- The same virtual machine network configuration.
- The same ESXi version.
- The same Fault Tolerance version number (or host build number for hosts prior to ESX/ESXi 4.1).
- The same BIOS settings (power management and hyperthreading) for all hosts.
Run Check Compliance to identify incompatibilities and to correct them.
To increase the bandwidth available for the logging traffic between Primary and Secondary VMs use a 10Gbit NIC, and enable the use of jumbo frames.
Store ISOs on Shared Storage for Continuous Access
Store ISOs that are accessed by virtual machines with Fault Tolerance enabled on shared storage that is accessible to both instances of the fault tolerant virtual machine. If you use this configuration, the CD-ROM in the virtual machine continues operating normally, even when a failover occurs.
For virtual machines with Fault Tolerance enabled, you might use ISO images that are accessible only to the Primary VM. In such a case, the Primary VM can access the ISO, but if a failover occurs, the CD-ROM reports errors as if there is no media. This situation might be acceptable if the CD-ROM is being used for a temporary, noncritical operation such as an installation.
Avoid Network Partitions
A network partition occurs when a vSphere HA cluster has a management network failure that isolates some of the hosts from vCenter Server and from one another. See Network Partitions. When a partition occurs, Fault Tolerance protection might be degraded.
In a partitioned vSphere HA cluster using Fault Tolerance, the Primary VM (or its Secondary VM) could end up in a partition managed by a primary host that is not responsible for the virtual machine. When a failover is needed, a Secondary VM is restarted only if the Primary VM was in a partition managed by the primary host responsible for it.
To ensure that your management network is less likely to have a failure that leads to a network partition, follow the recommendations in Best Practices for Networking.