Whether you manage a vSAN cluster with baselines or with a single image, remediating hosts that are part of a vSAN cluster involves some specific behavior.

When you remediate hosts that are part of a vSAN cluster, you must be aware of the following behavior:
  • By design, in a vSAN cluster, only one host at a time can be in maintenance mode.
  • vSphere Lifecycle Manager remediates hosts that are part of a vSAN cluster sequentially.
  • Because vSphere Lifecycle Manager remediates the hosts sequentially, the host remediation process might take a long time to finish.
  • vSphere Lifecycle Manager remediates vSAN clusters with configured fault domains by upgrading all hosts from one fault domain first and then upgrading the hosts in the next fault domain.
  • For a vSAN stretched cluster, vSphere Lifecycle Manager first remediates the hosts from the preferred site and then proceeds with remediating the hosts in the secondary site.
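The ordering rules above can be sketched in a few lines of Python. This is only an illustration, not how vSphere Lifecycle Manager is implemented: the function name `remediation_order`, the `(host, fault_domain)` input format, and the `preferred` parameter are hypothetical, and the order in which the remaining fault domains are visited (first appearance in the input) is an assumption that the text above does not specify.

```python
from typing import Dict, Iterable, List, Optional, Tuple


def remediation_order(
    hosts: Iterable[Tuple[str, str]],
    preferred: Optional[str] = None,
) -> List[str]:
    """Return host names in a sequential, one-at-a-time order.

    hosts     -- (host_name, fault_domain) pairs (hypothetical input format)
    preferred -- fault domain to remediate first, for example the
                 preferred site of a stretched cluster
    """
    # Group hosts by fault domain, keeping first-appearance order
    # (an assumption; the documentation does not state the domain order).
    groups: Dict[str, List[str]] = {}
    for name, domain in hosts:
        groups.setdefault(domain, []).append(name)

    domains = list(groups)
    if preferred is not None and preferred in groups:
        domains.remove(preferred)
        domains.insert(0, preferred)

    # Finish every host in one fault domain before moving to the next.
    return [name for domain in domains for name in groups[domain]]


cluster = [("esx1", "fd-a"), ("esx2", "fd-b"), ("esx3", "fd-a"), ("esx4", "fd-b")]
print(remediation_order(cluster))
print(remediation_order(cluster, preferred="fd-b"))
```

Because the result is a flat list, it also reflects the rule that only one host at a time is in maintenance mode.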

Host Maintenance Mode and vSAN Clusters

You can remediate a host that is in a vSAN cluster in two ways, depending on how you want to handle the virtual machines on the host:

  • You can put the host in maintenance mode manually and remediate the host by using vSphere Lifecycle Manager.
  • You can have the host enter maintenance mode during the vSphere Lifecycle Manager remediation process.

In the vSphere Client, when you put a host from a vSAN cluster into maintenance mode, you can choose between multiple options: Ensure accessibility, Full data evacuation, and No data evacuation. The Ensure accessibility option is the default option, and means that when you put a host in maintenance mode, vSAN ensures that all accessible virtual machines on the host remain accessible. To learn more about each of the options, see the "Place a Member of a vSAN Cluster in Maintenance Mode" topic in the vSphere Storage documentation.

During remediation, vSphere Lifecycle Manager puts the hosts from the vSAN cluster in maintenance mode and handles the virtual machines on the host in the manner of the default Ensure accessibility option.

If a host is part of a vSAN cluster and any virtual machine on the host uses a VM storage policy with the setting "Number of failures to tolerate=0", the host might experience unusual delays when it enters maintenance mode. The delay occurs because vSAN has to migrate the virtual machine data from one disk in the vSAN datastore to another. The delay might last for hours. To work around this, set "Number of failures to tolerate=1" in the VM storage policy, which creates two copies of the virtual machine files on the vSAN datastore.
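Before you start remediation, it can help to list the virtual machines whose policy tolerates no failures. The following is a minimal sketch that works on a plain dictionary and does not call any vSphere API: the names `mirror_copies` and `vms_without_redundancy` and the input format are hypothetical, and the copy count assumes a mirroring policy, where "Number of failures to tolerate=1" yields two copies as described above.

```python
from typing import Dict, List


def mirror_copies(ftt: int) -> int:
    """Copies of the VM files kept under mirroring (assumed): FTT + 1."""
    if ftt < 0:
        raise ValueError("Number of failures to tolerate cannot be negative")
    return ftt + 1


def vms_without_redundancy(vm_ftt: Dict[str, int]) -> List[str]:
    """Return, sorted by name, the VMs whose policy has FTT=0.

    vm_ftt maps a VM name to the "Number of failures to tolerate"
    value of its storage policy (hypothetical input format).
    """
    return sorted(name for name, ftt in vm_ftt.items() if ftt == 0)


policies = {"web-01": 1, "db-01": 0, "cache-01": 0}
print(vms_without_redundancy(policies))  # VMs likely to delay maintenance mode
print(mirror_copies(1))                  # copies kept with FTT=1
```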

vSAN Health Check

vSphere Lifecycle Manager performs a remediation pre-check of vSAN clusters to ensure successful remediation. The vSAN health check is part of the remediation pre-check.

The vSAN health check gives you information about the cluster state and whether you must take extra actions to ensure successful remediation. Even if you do not take the recommended actions, you can still remediate the vSAN cluster or a host from the cluster. In that case, vSphere Lifecycle Manager puts the host in maintenance mode and applies the software updates successfully. However, the host might fail to exit maintenance mode, and the remediation process might fail. As a result, the host from the vSAN cluster is upgraded, but you must take manual steps to take the host out of maintenance mode.
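The failure case described above, where a host is upgraded but remains in maintenance mode, can be spotted by filtering a post-remediation report. The sketch below assumes you have already collected the per-host state yourself; the `HostResult` record and the `needs_manual_exit` function are hypothetical and are not part of any vSphere Lifecycle Manager interface.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class HostResult:
    """Per-host state gathered after remediation (hypothetical record)."""
    name: str
    upgraded: bool
    in_maintenance_mode: bool


def needs_manual_exit(results: Iterable[HostResult]) -> List[str]:
    """Return hosts that were upgraded but are still in maintenance mode."""
    return [r.name for r in results if r.upgraded and r.in_maintenance_mode]


report = [
    HostResult("esx1", upgraded=True, in_maintenance_mode=False),
    HostResult("esx2", upgraded=True, in_maintenance_mode=True),
    HostResult("esx3", upgraded=False, in_maintenance_mode=False),
]
print(needs_manual_exit(report))
```

Hosts returned by this filter are the ones that you must take out of maintenance mode manually.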