vSphere supports vMotion of a VM hosting a node of a WSFC.

Prerequisites for vMotion support:

  • vMotion is supported only for a cluster of virtual machines across physical hosts (CAB).
  • For VMs with cluster shared resources, do not migrate more than 8 WSFC virtual machines at the same time. Migrating more than 8 at once may cause cluster roles to fail over to other VMs.
  • The vMotion network must be a 10 Gbps Ethernet link. A 1 Gbps Ethernet link is not supported for vMotion of WSFC virtual machines.
  • vMotion is supported for Windows Server 2012 and later releases. Windows Server 2008 SP2 and earlier are not supported.
  • The WSFC cluster heartbeat time-out must be set to at least the values listed below:
    • (get-cluster -name <cluster-name>).SameSubnetThreshold = 10
    • (get-cluster -name <cluster-name>).CrossSubnetThreshold = 20
    • (get-cluster -name <cluster-name>).RouteHistoryLength = 40
  • The virtual hardware version for the WSFC virtual machine must be version 11 or later.

Modifying the WSFC heartbeat time-out:

WSFC nodes use the network to send heartbeat packets to the other nodes of the cluster. If a node does not receive a response from another node within a specified period of time, the cluster removes the unresponsive node from cluster membership. By default in Windows Server 2012 and 2012 R2, a guest cluster node is considered down if it does not respond within 5 seconds. Other nodes that are members of the cluster then take over any clustered roles that were running on the removed node.

A WSFC virtual machine can stall for a few seconds during vMotion. If the stall time exceeds the heartbeat time-out interval, the guest cluster considers the node down, which can lead to an unnecessary failover. To allow leeway and make the guest cluster more tolerant, the heartbeat time-out interval must be modified to allow at least 10 missed heartbeats. The property that controls the number of allowed missed heartbeats is SameSubnetThreshold. Change it from its default value to at least 10. From any one of the participating WSFC cluster nodes, run the following command:

(get-cluster -name <cluster-name>).SameSubnetThreshold = 10
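
As a minimal sketch of the same change, the commands below first display the current heartbeat settings and then raise each value only if it is below the minimum listed in the prerequisites. This assumes the FailoverClusters PowerShell module is available on the node and that <cluster-name> is replaced with the real cluster name. The "raise only if lower" checks are an illustrative choice, not a documented requirement.

```powershell
# Sketch only: run on one of the participating WSFC nodes.
Import-Module FailoverClusters

# <cluster-name> is a placeholder; substitute the actual cluster name.
$cluster = Get-Cluster -Name "<cluster-name>"

# Display the current heartbeat-related settings before changing anything.
$cluster | Format-List SameSubnetDelay, SameSubnetThreshold, CrossSubnetDelay, CrossSubnetThreshold, RouteHistoryLength

# Raise each value only when it is below the minimum from the prerequisites,
# so that larger values already configured by an administrator are preserved.
if ($cluster.SameSubnetThreshold -lt 10)  { $cluster.SameSubnetThreshold  = 10 }
if ($cluster.CrossSubnetThreshold -lt 20) { $cluster.CrossSubnetThreshold = 20 }
if ($cluster.RouteHistoryLength -lt 40)   { $cluster.RouteHistoryLength   = 40 }
```

Running the Format-List line again afterwards is a simple way to confirm that the new values are in effect.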

You can also adjust other properties to control how tolerant the workload is of missed heartbeats. The delay controls how often heartbeats are sent between the clustered nodes. The default delay is 1 second and the maximum is 2 seconds. Set the SameSubnetDelay value to 1 second (the cluster property itself is expressed in milliseconds, so 1 second corresponds to 1000). The threshold controls how many consecutive heartbeats can be missed before a node considers its partner unavailable and triggers the failover process. The default threshold is 5 heartbeats and the maximum is 120 heartbeats. The product of delay and threshold determines the total elapsed time during which clustered Windows nodes can lose communication before a failover is triggered. For example, a delay of 1 second and a threshold of 10 heartbeats allow about 10 seconds of lost communication. When the clustered nodes are in different subnets, the corresponding properties are CrossSubnetDelay and CrossSubnetThreshold. Set the CrossSubnetDelay value to 2 seconds and the CrossSubnetThreshold value to 20.
Note: The recommended values for the WSFC heartbeat settings are the defaults in Windows Server 2016 and later.
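
The relationship between delay and threshold described above can be illustrated with a short calculation. The following sketch is plain arithmetic and does not touch the cluster. Get-HeartbeatTolerance is a hypothetical helper written for this example, not a cmdlet from the FailoverClusters module, and it takes the delay in seconds as the text does.

```powershell
# Hypothetical helper for illustration only (not a FailoverClusters cmdlet).
# Returns the approximate time, in seconds, that nodes can lose
# communication before failover: delay between heartbeats x allowed misses.
function Get-HeartbeatTolerance {
    param(
        [Parameter(Mandatory)] [double] $DelaySeconds,
        [Parameter(Mandatory)] [int]    $Threshold
    )
    $DelaySeconds * $Threshold
}

# Default on Windows Server 2012 / 2012 R2: 1 s delay, 5 missed heartbeats.
Get-HeartbeatTolerance -DelaySeconds 1 -Threshold 5     # 5

# Same-subnet values recommended above: 1 s delay, 10 missed heartbeats.
Get-HeartbeatTolerance -DelaySeconds 1 -Threshold 10    # 10

# Cross-subnet values recommended above: 2 s delay, 20 missed heartbeats.
Get-HeartbeatTolerance -DelaySeconds 2 -Threshold 20    # 40
```

Under these settings a same-subnet node can be unresponsive for roughly 10 seconds, compared with 5 seconds at the defaults, which is the extra leeway intended to cover a stall during vMotion.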