When you upgrade the vSphere hosts in a DRS cluster using VMware Update Manager (VUM), VUM first checks with DRS to find out which hosts can be upgraded. DRS in turn runs its algorithm and recommends one or more hosts that can be put into maintenance mode.
Starting with vSphere 6.7, DRS uses the new initial placement algorithm to come up with the recommended list of hosts to be placed in maintenance mode. Further, when evacuating the hosts, DRS uses the new initial placement algorithm to find new destination hosts for outgoing VMs.
In vSphere 6.7, DRS has been enhanced to evacuate VMs more efficiently from a host that is being put into maintenance mode. In earlier versions, DRS issued vMotions for all the powered-on VMs on the host at once during host evacuation. DRS now evacuates (vMotions out) powered-on VMs in batches of 8 at a time, and each batch of vMotions is issued only after the previous batch completes. This new model also uses the new initial placement algorithm.
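The batching behavior described above can be illustrated with a minimal Python sketch. This is not DRS code: the function names and the `migrate_batch` callback are hypothetical, and the only detail taken from the text is the batch size of 8 and the rule that a batch starts only after the previous one finishes.

```python
from typing import Callable, Iterator, List, Sequence

BATCH_SIZE = 8  # batch size stated in the text for vSphere 6.7


def batches(vms: Sequence[str], size: int = BATCH_SIZE) -> Iterator[List[str]]:
    """Yield consecutive groups of at most `size` VMs."""
    for start in range(0, len(vms), size):
        yield list(vms[start:start + size])


def evacuate(vms: Sequence[str],
             migrate_batch: Callable[[List[str]], None]) -> List[List[str]]:
    """Migrate VMs batch by batch.

    `migrate_batch` is a hypothetical blocking call: it returns only when
    every vMotion in the batch has completed, so the next batch is never
    issued early.
    """
    completed = []
    for batch in batches(vms):
        migrate_batch(batch)
        completed.append(batch)
    return completed


if __name__ == "__main__":
    powered_on = [f"vm{i}" for i in range(20)]
    done = evacuate(powered_on, lambda batch: None)
    print([len(batch) for batch in done])
```

With 20 powered-on VMs, the sketch issues three batches of 8, 8, and 4 VMs.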
The batching of vMotions makes the entire workflow more controlled and predictable, which makes it easier to estimate the total time to complete the end-to-end workflow. Together with the use of the new initial placement algorithm, the new model results in a very similar distribution of VMs post-evacuation, compared to the old model.
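As a rough illustration of why batching makes the total time easier to estimate, the sketch below multiplies the number of batches by an assumed per-batch duration. The per-batch duration is an assumption supplied by the caller, not a figure from VMware, and real vMotion times vary with VM memory size and network bandwidth.

```python
import math

BATCH_SIZE = 8  # batch size stated in the text for vSphere 6.7


def estimate_evacuation_seconds(powered_on_vms: int,
                                seconds_per_batch: float,
                                batch_size: int = BATCH_SIZE) -> float:
    """Estimate evacuation time as (number of batches) x (time per batch).

    `seconds_per_batch` is an assumed average, for example taken from
    earlier evacuations in the same environment.
    """
    if powered_on_vms < 0:
        raise ValueError("powered_on_vms must be non-negative")
    batch_count = math.ceil(powered_on_vms / batch_size)
    return batch_count * seconds_per_batch


if __name__ == "__main__":
    # 20 VMs -> 3 batches; at an assumed 90 s per batch this gives 270 s.
    print(estimate_evacuation_seconds(20, 90.0))
```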
In vSphere 6.7, when there is a problem while putting a host into maintenance mode, an appropriate fault is generated, which gives detailed information about the problem. The following scenarios showcase the new error handling features.
We have a VM (VirtualMachine1) that is placed on host 10.156.227.182. The VM has a virtual machine–host affinity rule with the same host. When we try to put the host (10.156.227.182) into maintenance mode, the task halts (Figure 1) and a DRS fault is generated saying the host “Could not enter maintenance mode” (Figure 2).
Figure 1: Task - enter host into maintenance mode
Figure 2: DRS fault on host 10.156.227.182
To understand more about how DRS faults work, please refer to the VMware documentation about Faults [2].
When we look at the fault details, it is clear that placing the VM on a different host (than its current one) would violate a virtual machine–host affinity rule, as shown in Figure 3.
Figure 3: DRS fault describing the virtual machine–host rule violation
We can then cancel the maintenance mode task and fix the problems causing the DRS faults.
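The reason this evacuation fails can be sketched in a few lines of Python. This is a simplified model, not the DRS implementation: `VmHostRule`, `valid_destinations`, and the host names `host-b` and `host-c` are hypothetical, and the rule is assumed to be a mandatory rule that restricts the VM to the listed hosts.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Sequence


@dataclass(frozen=True)
class VmHostRule:
    """Hypothetical model of a mandatory VM-host affinity rule."""
    vm: str
    allowed_hosts: FrozenSet[str]


def valid_destinations(vm: str, source_host: str,
                       cluster_hosts: Sequence[str],
                       rules: Sequence[VmHostRule]) -> List[str]:
    """Return hosts the VM could move to without violating any rule."""
    candidates = [h for h in cluster_hosts if h != source_host]
    for rule in rules:
        if rule.vm == vm:
            candidates = [h for h in candidates if h in rule.allowed_hosts]
    return candidates


if __name__ == "__main__":
    hosts = ["10.156.227.182", "host-b", "host-c"]
    rules = [VmHostRule("VirtualMachine1", frozenset({"10.156.227.182"}))]
    # The only allowed host is the one being evacuated, so nothing is left.
    print(valid_destinations("VirtualMachine1", "10.156.227.182", hosts, rules))
```

An empty result corresponds to the situation in this scenario: every possible destination would violate the rule, so the host cannot finish entering maintenance mode until the rule is changed or removed.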
We have a VM (VirtualMachine1) that is placed on host w3-vcscale1-002. The VM has a virtual CD/DVD drive that is connected and has an ISO image mounted. The ISO image file resides on a local datastore on the host. When we try to put the host (w3-vcscale1-002) into maintenance mode, the task halts and a DRS fault is generated (Figure 4).
Figure 4: DRS fault on host w3-vcscale1-002
When we look at the fault details, it is clear that the local file backing of VirtualMachine1’s CD/DVD drive would not be accessible if it were evacuated to a different host (Figure 5 and Figure 6).
Figure 5: DRS fault details
Figure 6: DRS fault details showing the recommendation that was prevented
We can then cancel the maintenance mode task and fix the problems causing the DRS faults.
This scenario is very similar to the previous one, except that VirtualMachine1 now has a virtual hard disk on the local datastore of host w3-vcscale1-002. When we try to put the host (w3-vcscale1-002) into maintenance mode, the task halts and a DRS fault is generated (Figure 7).
Figure 7: DRS fault on host w3-vcscale1-002
When we look at the fault details, it is clear that the VM’s virtual disk would not be accessible if it were evacuated to a different host (Figure 8 and Figure 9).
Figure 8: DRS fault details
Figure 9: DRS fault details showing the recommendation that was prevented
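Both local-storage scenarios (the ISO-backed CD/DVD drive and the virtual disk) come down to the same check: can the destination host reach every datastore that backs the VM's devices? The following Python sketch models that check. It is an illustration only; the function name, the datastore names `local-ds` and `shared-ds`, and the host name `host-b` are hypothetical.

```python
from typing import Dict, Iterable, List, Set


def inaccessible_backings(vm_datastores: Iterable[str],
                          datastore_hosts: Dict[str, Set[str]],
                          destination: str) -> List[str]:
    """Return the VM's backing datastores that `destination` cannot reach.

    `datastore_hosts` maps each datastore to the set of hosts that mount it.
    A local datastore is mounted by exactly one host.
    """
    return sorted(
        ds for ds in vm_datastores
        if destination not in datastore_hosts.get(ds, set())
    )


if __name__ == "__main__":
    mounts = {
        "shared-ds": {"w3-vcscale1-002", "host-b"},
        "local-ds": {"w3-vcscale1-002"},  # local to the host being evacuated
    }
    # VirtualMachine1 has a device backed by a file on the local datastore.
    print(inaccessible_backings({"shared-ds", "local-ds"}, mounts, "host-b"))
```

A non-empty result means the VM cannot be moved to that destination as-is. In practice, the fix is to disconnect the ISO or move the file to storage that the other hosts can access.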
In summary, in vSphere 6.7 we have enhanced the host maintenance and VM evacuation workflows to be faster and more efficient. We have also improved the user experience for troubleshooting when we run into issues during these workflows.