Cluster Settings and Host Remediation

When you remediate ESXi hosts that are in a cluster, certain cluster settings might cause remediation failure. You must configure the cluster settings in such a way as to ensure successful remediation.

When you update the ESXi hosts in a cluster that has vSphere Distributed Resource Scheduler (DRS), vSphere High Availability (HA), and vSphere Fault Tolerance (FT) activated, you can temporarily deactivate vSphere Distributed Power Management (DPM), HA admission control, and FT for the entire cluster. When the update finishes, vSphere Lifecycle Manager restores these features.
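The deactivate-then-restore behavior described above can be pictured with a minimal Python sketch. It is an in-memory model only, not a vSphere API: the `ClusterSettings` and `Cluster` classes and the `features_deactivated_for_update` helper are hypothetical names introduced for illustration.

```python
from contextlib import contextmanager
from dataclasses import dataclass, replace
from typing import Iterator


@dataclass
class ClusterSettings:
    """Hypothetical snapshot of the cluster features discussed above."""
    dpm_enabled: bool
    ha_admission_control_enabled: bool
    ft_enabled: bool


class Cluster:
    """Hypothetical in-memory stand-in for a cluster (not a vSphere object)."""

    def __init__(self, settings: ClusterSettings) -> None:
        self.settings = settings


@contextmanager
def features_deactivated_for_update(cluster: Cluster) -> Iterator[Cluster]:
    """Deactivate DPM, HA admission control, and FT, then restore them."""
    saved = replace(cluster.settings)  # copy of the settings before the update
    cluster.settings = ClusterSettings(
        dpm_enabled=False,
        ha_admission_control_enabled=False,
        ft_enabled=False,
    )
    try:
        yield cluster
    finally:
        # Restore the saved settings even if the update raised an error.
        cluster.settings = saved
```

In this model, the update would run inside a `with features_deactivated_for_update(cluster):` block, and the original settings come back when the block exits.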

DRS

Updates might require a host to enter maintenance mode during remediation. Virtual machines cannot run on a host that is in maintenance mode. To ensure availability, you can activate DRS for the cluster and configure the cluster for vSphere vMotion. In this case, before the host is put in maintenance mode, vCenter Server migrates the virtual machines to another ESXi host within the cluster.

To help ensure vSphere vMotion compatibility between the hosts in the cluster, you can activate Enhanced vMotion Compatibility (EVC). EVC ensures that all hosts in the cluster present the same CPU feature set to virtual machines, even if the actual CPUs on the hosts differ. EVC prevents migration failures due to incompatible CPUs. You can use EVC only in a cluster where the host CPUs meet the compatibility requirements. For more information about EVC and the requirements that the hosts in an EVC cluster must meet, see the vCenter Server and Host Management documentation.

DPM

If a host has no running virtual machines, DPM might put the host in standby mode, which can interrupt a vSphere Lifecycle Manager operation. To make sure that all vSphere Lifecycle Manager operations finish successfully, configure vSphere Lifecycle Manager to deactivate DPM during these operations. Successful remediation requires that vSphere Lifecycle Manager deactivates DPM. After the remediation task finishes, vSphere Lifecycle Manager restores DPM.

If DPM has already put a host in standby mode, vSphere Lifecycle Manager powers on the host before compliance checks and remediation. For clusters that you manage with baselines, vSphere Lifecycle Manager also powers on the host before staging. After the respective task finishes, vSphere Lifecycle Manager reactivates DPM and HA admission control and lets DPM put the host back into standby mode, if needed. vSphere Lifecycle Manager does not remediate powered-off hosts.

If a host is in standby mode and DPM has been manually deactivated for any reason, vSphere Lifecycle Manager does not power on or remediate the host.
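The host-state rules in this section can be summarized as a small decision function. The following Python sketch is one reading of the text above, not vSphere Lifecycle Manager code. The `PowerState` and `Action` enums and the `plan_for_host` function are hypothetical names, and `dpm_active` is assumed to mean that DPM has not been manually deactivated on the cluster.

```python
from enum import Enum


class PowerState(Enum):
    POWERED_ON = "powered_on"
    STANDBY = "standby"
    POWERED_OFF = "powered_off"


class Action(Enum):
    REMEDIATE = "remediate"
    POWER_ON_THEN_REMEDIATE = "power_on_then_remediate"
    SKIP = "skip"


def plan_for_host(power_state: PowerState, dpm_active: bool) -> Action:
    """Return what happens to a host, following the rules described above."""
    if power_state is PowerState.POWERED_ON:
        return Action.REMEDIATE
    if power_state is PowerState.STANDBY and dpm_active:
        # DPM put the host in standby, so the host is powered on first.
        return Action.POWER_ON_THEN_REMEDIATE
    # Standby with DPM manually deactivated, or a powered-off host:
    # the host is neither powered on nor remediated.
    return Action.SKIP
```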

HA Admission Control

Within a cluster, you must deactivate HA admission control temporarily to let vSphere vMotion proceed. This action prevents downtime for the virtual machines on the hosts that you remediate. You can configure vSphere Lifecycle Manager to deactivate HA admission control during remediation. After the remediation of the entire cluster is complete, vSphere Lifecycle Manager restores the HA admission control settings. vSphere Lifecycle Manager deactivates HA admission control before remediation, but not before compliance checks. Additionally, for clusters that you manage with baselines, vSphere Lifecycle Manager deactivates HA admission control before staging.

Deactivating HA admission control before you remediate a two-node cluster that uses a single vSphere Lifecycle Manager image causes the cluster to lose practically all its high availability guarantees. The reason is that when one of the two hosts enters maintenance mode, vCenter Server cannot fail over virtual machines to that host, so HA failovers cannot succeed. For more information about HA admission control, see the vSphere Availability documentation.
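The two-node case comes down to simple arithmetic, which the following Python sketch illustrates. It is a deliberately simplified model that ignores capacity and every other HA consideration. The function name `failover_targets_during_remediation` is hypothetical, and the model assumes that a host in maintenance mode cannot receive virtual machines, as stated above.

```python
def failover_targets_during_remediation(
    total_hosts: int, hosts_in_maintenance: int = 1
) -> int:
    """Count hosts that could receive VMs if one active host fails.

    Simplified model: hosts in maintenance mode cannot receive virtual
    machines, and the failed host cannot be its own failover target.
    """
    active_hosts = total_hosts - hosts_in_maintenance
    return max(active_hosts - 1, 0)
```

With two hosts and one of them in maintenance mode, the function returns 0: no host is left to take over if the remaining host fails. With three hosts it returns 1.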

Fault Tolerance

If FT is turned on for any of the virtual machines on a host within a cluster, you must temporarily turn off FT before performing any vSphere Lifecycle Manager operation on the cluster. vSphere Lifecycle Manager does not remediate a host that runs virtual machines with FT turned on. Remediate all hosts in the cluster with the same updates so that FT can be reactivated after remediation, because a primary virtual machine and its secondary virtual machine cannot reside on hosts with different ESXi versions or patch levels.
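The version constraint on FT pairs can be expressed as a simple equality check. The Python sketch below models it under stated assumptions: `HostVersion`, `ft_pair_allowed`, and `distinct_versions` are hypothetical names, and the version and patch-level strings are placeholders rather than real ESXi release identifiers.

```python
from typing import Dict, NamedTuple, Set


class HostVersion(NamedTuple):
    """Hypothetical record of a host's ESXi version and patch level."""
    esxi_version: str
    patch_level: str


def ft_pair_allowed(primary: HostVersion, secondary: HostVersion) -> bool:
    """A primary and secondary VM need hosts with identical version and patch."""
    return primary == secondary


def distinct_versions(hosts: Dict[str, HostVersion]) -> Set[HostVersion]:
    """Return the set of version/patch combinations present in the cluster."""
    return set(hosts.values())
```

If `distinct_versions` returns more than one entry after remediation, at least two hosts differ, and an FT pair placed across them would violate the constraint in this model.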