This section provides information about high availability and fault tolerance.

High Availability (HA)

vSphere HA provides high availability for virtual machines by pooling the virtual machines and the hosts on which they reside into a cluster. Hosts in the cluster are monitored, and in the event of a failure, the virtual machines on a failed host are restarted on alternate hosts.

When a vSphere HA cluster is created, a single host is automatically elected as the primary host. The primary host communicates with vCenter Server and monitors the state of all protected virtual machines and of the secondary hosts. Different types of host failures are possible, and the primary host must detect each one and respond appropriately. In particular, the primary host must distinguish between a failed host and one that is in a network partition or that has become network isolated. The primary host uses network and datastore heartbeating to determine the type of failure.
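The heartbeat-based decision described above can be sketched as a small classification function. This is a simplified model for illustration only, not the vSphere HA agent's actual logic: the names HostState and classify_host are hypothetical, and the sketch does not attempt to tell a network partition apart from network isolation.

```python
from enum import Enum


class HostState(Enum):
    HEALTHY = "healthy"
    FAILED = "failed"
    ISOLATED_OR_PARTITIONED = "isolated_or_partitioned"


def classify_host(network_heartbeat: bool, datastore_heartbeat: bool) -> HostState:
    """Classify a secondary host from the two heartbeat signals (hypothetical helper)."""
    if network_heartbeat:
        # The host is still reachable over the management network.
        return HostState.HEALTHY
    if datastore_heartbeat:
        # The host is alive but unreachable over the network, so it is
        # partitioned or isolated rather than failed.
        return HostState.ISOLATED_OR_PARTITIONED
    # Neither heartbeat is present: treat the host as failed.
    return HostState.FAILED
```

The point of the second signal is that a missing network heartbeat alone does not mean the host is down, so its virtual machines are restarted elsewhere only in the last case.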

If a host fails and its virtual machines must be restarted, the order in which the VMs are restarted can be controlled. The response to a host losing management network connectivity with the other hosts can also be configured by using the host isolation response setting.
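As a rough illustration of controlling restart order, the sketch below sorts virtual machines by a restart priority label. The VM class, the restart_order function, and the exact set of labels are assumptions made for this example; they are not vSphere API objects.

```python
from dataclasses import dataclass
from typing import List

# Assumed priority labels for illustration, listed from first to last restarted.
PRIORITY_ORDER = ["highest", "high", "medium", "low", "lowest"]


@dataclass
class VM:
    name: str
    restart_priority: str


def restart_order(vms: List[VM]) -> List[VM]:
    """Return the VMs in the order they would be restarted (hypothetical helper)."""
    rank = {label: index for index, label in enumerate(PRIORITY_ORDER)}
    # sorted() is stable, so VMs with equal priority keep their input order.
    return sorted(vms, key=lambda vm: rank[vm.restart_priority])
```

For example, given a database VM marked "highest" and a test VM marked "low", restart_order places the database VM first regardless of the input order.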

VM and Application Monitoring:

VM Monitoring restarts individual virtual machines if their VMware Tools heartbeats are not received within a set period of time. Similarly, Application Monitoring can restart a virtual machine if the heartbeats for an application it is running are not received. Where the environment supports them, these features can be enabled, and the sensitivity with which vSphere HA monitors non-responsiveness can be configured.
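The heartbeat timeout check can be sketched as follows. The function name needs_restart and the interval values attached to each sensitivity level are illustrative assumptions, not values taken from this section; consult the cluster settings for the actual intervals.

```python
# Illustrative failure intervals in seconds per sensitivity level (assumed values).
FAILURE_INTERVAL_SECONDS = {"low": 120, "medium": 60, "high": 30}


def needs_restart(last_heartbeat: float, now: float, failure_interval: float) -> bool:
    """Return True if no heartbeat has arrived within the failure interval."""
    return (now - last_heartbeat) > failure_interval
```

A higher sensitivity corresponds to a shorter interval, so a non-responsive virtual machine is detected sooner, at the cost of a greater chance of restarting a machine that was only briefly stalled.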

VM Component Protection:

If VM Component Protection (VMCP) is activated, vSphere HA can detect datastore accessibility failures and provide automated recovery for affected virtual machines.

vSphere HA Security:

vSphere HA is enhanced by several security features. It uses TCP and UDP port 8182 for agent-to-agent communication. The firewall ports open and close automatically so that they are open only when needed. vSphere HA stores configuration information on local storage, or on a RAM disk if there is no local datastore. These files are protected using file system permissions and are accessible only to the root user. Hosts without local storage are supported only if they are managed by Auto Deploy.

vSphere HA logs on to the vSphere HA agents using a user account, vpxuser, created by vCenter Server. This is the same account that vCenter Server uses to manage the host. vCenter Server creates a random password for this account and changes the password periodically. The time period is set by the vCenter Server VirtualCenter.VimPasswordExpirationInDays setting. Only users with administrative privileges on the root folder of the host can log in to the agent. All communication between vCenter Server and the vSphere HA agent is done over SSL, and each host must have a verified SSL certificate.

Fault Tolerance (FT)

vSphere FT can be used for most mission-critical virtual machines. FT provides continuous availability for such a virtual machine by creating and maintaining another VM that is identical and continuously available to replace it in the event of a failover.

The protected virtual machine is called the primary VM. The duplicate virtual machine, the secondary VM, is created and runs on another host. The primary VM is continuously replicated to the secondary VM so that the secondary VM can take over at any point, thereby providing fault tolerant protection.

The primary and secondary VMs continuously monitor the status of one another to ensure that fault tolerance is maintained. A transparent failover occurs if the host running the primary VM fails or encounters an uncorrectable hardware error in the memory of the primary VM. In that case, the secondary VM is immediately activated to replace the primary VM, a new secondary VM is started, and fault tolerance redundancy is reestablished automatically. If the host running the secondary VM fails, the secondary VM is also immediately replaced. In either case, users experience no interruption in service and no loss of data.

A fault tolerant virtual machine and its secondary copy are not allowed to run on the same host. This restriction ensures that a host failure cannot result in the loss of both VMs.
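The failover behavior and the placement restriction described above can be modeled with a short sketch. It is a toy model under stated assumptions, not how vSphere FT is implemented: FTPair and handle_host_failure are hypothetical names, hosts are plain strings, and the replacement secondary is simply placed on the first eligible host.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FTPair:
    primary_host: str
    secondary_host: Optional[str]


def handle_host_failure(pair: FTPair, failed_host: str, hosts: List[str]) -> FTPair:
    """Return the new placement of an FT pair after one host fails (hypothetical helper)."""
    primary, secondary = pair.primary_host, pair.secondary_host
    if failed_host == primary:
        if secondary is None:
            raise RuntimeError("no secondary VM available to take over")
        # The secondary VM takes over as the new primary VM.
        primary, secondary = secondary, None
    elif failed_host == secondary:
        # Only the secondary VM was lost; the primary keeps running.
        secondary = None
    if secondary is None:
        # A new secondary must run on a surviving host other than the primary's.
        candidates = [h for h in hosts if h != failed_host and h != primary]
        secondary = candidates[0] if candidates else None
    return FTPair(primary, secondary)
```

If no other host is available, the sketch leaves the secondary unset rather than placing both VMs on the same host, which mirrors the restriction that a single host failure must not take out both copies.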