DRS Awareness of vSAN Stretched Cluster

DRS Awareness of vSAN Stretched Cluster is available on stretched clusters with DRS enabled that run vSphere 7.0 U2. A vSAN stretched cluster has read locality, where a VM reads data from its local site. Fetching reads from a remote site can affect VM performance. In releases prior to vSphere 7.0 U2, DRS had no awareness of read locality for a vSAN stretched cluster and might inadvertently place a VM on a remote site with no read locality. With DRS Awareness of vSAN Stretched Cluster, DRS is aware of VM read locality and places the VM on a site that can fully satisfy the read locality. This behavior is automatic and has no configurable options. DRS Awareness of vSAN Stretched Cluster works with existing affinity rules. It is supported with vSphere 7.0 U2 and VMware Cloud on AWS.

vSAN Stretched Cluster with vSphere HA and vSphere DRS provides resiliency by keeping two copies of data spread across two fault domains, with a witness node in a third fault domain in case of failures. The two active fault domains replicate data so that both fault domains have a current copy of the data.

vSAN Stretched Cluster provides an automated method of moving workloads between the two fault domains. In case of full site failures, VMs are restarted on the secondary site by vSphere HA. This keeps downtime for critical production workloads to a minimum. Once the primary site is back online, DRS immediately rebalances the VMs back to the hosts on the primary site to which they have soft affinity. Because the VM data components are still rebuilding at that point, the VM continues to read from and write to the secondary site, which might reduce VM performance.

In releases prior to vSphere 7.0 U2, we recommend that you change DRS from fully automated to partially automated mode to prevent VMs from migrating to the primary site while resynchronization is in progress. Set DRS back to fully automated only after the resynchronization is complete.
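
The ordering of this workaround can be sketched in Python as follows. This is only an illustration of the sequence, not a VMware tool: the function name and both callables (set_drs_level, resync_in_progress) are hypothetical placeholders that you would back with your own automation. The strings "partiallyAutomated" and "fullyAutomated" match the DRS behavior values used by the vSphere API.

```python
import time


def hold_drs_until_resync_completes(set_drs_level, resync_in_progress,
                                    poll_seconds=60.0, sleep=time.sleep):
    """Keep DRS partially automated until vSAN resynchronization finishes.

    set_drs_level and resync_in_progress are hypothetical callables that
    the caller supplies; they stand in for real cluster automation.
    """
    # Step 1: stop DRS from migrating VMs automatically.
    set_drs_level("partiallyAutomated")

    # Step 2: wait while resynchronization is still in progress.
    while resync_in_progress():
        sleep(poll_seconds)

    # Step 3: restore fully automated mode only after resync is complete.
    set_drs_level("fullyAutomated")
```

If the resynchronization check raises an error, the sketch leaves DRS in partially automated mode, which is the safer state for this procedure.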

With vSphere 7.0 U2, DRS Awareness of vSAN Stretched Cluster introduces a fully automated read locality solution for recovering from failures on a vSAN stretched cluster. The read locality information indicates the hosts on which the VM has full access to its data, and DRS uses this information when placing a VM on a host in a vSAN stretched cluster. During the site recovery phase, DRS prevents VMs from failing back to the primary site while vSAN resynchronization is still in progress. DRS automatically migrates a VM back to the primary affined site when its data components have achieved full read locality. This allows you to operate DRS in fully automated mode in case of full site failures.
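
The failback rule described above can be modeled with a short Python sketch. It is a simplified illustration under stated assumptions, not the DRS implementation: the VmState type, its fields, and the should_fail_back function are hypothetical names, and read locality is reduced to a set of site names for brevity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VmState:
    """Hypothetical, simplified view of a VM in a stretched cluster."""
    name: str
    current_site: str
    affined_site: str
    # Sites on which the VM currently has full read locality (assumed model).
    full_read_locality_sites: frozenset


def should_fail_back(vm: VmState) -> bool:
    """Return True only when a move to the affined site keeps full read locality."""
    if vm.current_site == vm.affined_site:
        # Already on the affined site, nothing to fail back.
        return False
    # While resynchronization is in progress, the affined site is not yet
    # in the set, so the VM stays where it is.
    return vm.affined_site in vm.full_read_locality_sites
```

For example, a VM running on the secondary site with full_read_locality_sites equal to {"secondary"} stays in place, and becomes eligible to move once "primary" is added to that set after resynchronization.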

In case of partial site failures, if a VM loses read locality due to a loss of data components greater than or equal to its Failures to Tolerate, vSphere DRS identifies the VMs that consume a very high read bandwidth and tries to rebalance them to the secondary site. This ensures that the performance of VMs with read-heavy workloads does not decrease during partial site failures. Once the primary site is back online and the data components have completed resynchronization, the VM is moved back to the site it is affined to.
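
The selection step for partial site failures can be illustrated with the Python sketch below. It is a rough model, not the DRS algorithm: the VmReadStats type, the pick_rebalance_candidates function, and the 100 MB/s threshold are all assumptions made for the example, since the actual criteria that DRS uses for "very high read bandwidth" are not documented here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VmReadStats:
    """Hypothetical per-VM statistics used only for this illustration."""
    name: str
    has_read_locality: bool
    read_mbps: float


def pick_rebalance_candidates(vms, high_read_mbps=100.0):
    """Return VMs that lost read locality and read heavily, busiest first.

    The high_read_mbps threshold is an assumed value, not a DRS setting.
    """
    candidates = [
        vm for vm in vms
        if not vm.has_read_locality and vm.read_mbps >= high_read_mbps
    ]
    # Consider the most read-intensive VMs first.
    return sorted(candidates, key=lambda vm: vm.read_mbps, reverse=True)
```

VMs that still have read locality, or that read lightly, are left in place in this model. Only the read-heavy VMs without read locality are proposed for a move to the secondary site.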