Whether you manage a vSAN cluster with baselines or with a single image, remediating the hosts that are part of a vSAN cluster has its specifics.

When you remediate hosts that are part of a vSAN cluster, you must be aware of the following behavior:
  • vSphere Lifecycle Manager puts only one host at a time in maintenance mode.
  • vSphere Lifecycle Manager remediates hosts that are part of a vSAN cluster sequentially.
  • Because vSphere Lifecycle Manager handles the remediation of the hosts sequentially, the host remediation process might take an extensive amount of time to finish.
  • vSphere Lifecycle Manager remediates vSAN clusters with configured fault domains by upgrading all hosts from one fault domain first and then upgrading the hosts in the next fault domain.
  • For a vSAN stretched cluster, vSphere Lifecycle Manager first remediates the hosts from the preferred site and then proceeds with remediating the hosts in the secondary site.

Host Maintenance Mode and vSAN Clusters

You can remediate a host that is in a vSAN cluster in two ways, depending on how you want to handle the virtual machines on the host:

  • You can put the host in maintenance mode manually and remediate the host by using vSphere Lifecycle Manager.
  • You can have the host enter maintenance mode during the vSphere Lifecycle Manager remediation process.

In the vSphere Client, when you put a host from a vSAN cluster into maintenance mode, you can choose between multiple options: Ensure accessibility, Full data evacuation, and No data evacuation. The Ensure accessibility option is the default option, and means that when you put a host in maintenance mode, vSAN ensures that all accessible virtual machines on the host remain accessible. To learn more about each of the options, see the "Place a Member of a vSAN Cluster in Maintenance Mode" topic in the vSphere Storage documentation.

During remediation, vSphere Lifecycle Manager puts the hosts from the vSAN cluster in maintenance mode and handles the virtual machines on each host in the manner of the default Ensure accessibility option.

If a host is part of a vSAN cluster, and any virtual machine on the host uses a VM storage policy with the setting "Number of failures to tolerate=0", the host might experience unusual delays when it enters maintenance mode. The delay occurs because vSAN has to migrate the virtual machine data from one disk on the vSAN datastore cluster to another. The delay can last for hours. You can work around this by setting "Number of failures to tolerate=1" in the VM storage policy, which results in creating two copies of the virtual machine files on the vSAN datastore.

vSAN Health Check

vSphere Lifecycle Manager performs a remediation pre-check of vSAN clusters to ensure successful remediation. The vSAN health check is part of the remediation pre-check.

The vSAN health check gives you information about the cluster state and whether you must take extra actions to ensure successful remediation. Even if you do not take the recommended actions, you can still remediate the vSAN cluster or a host from the cluster. In that case, vSphere Lifecycle Manager puts the host in maintenance mode and applies the software updates on the host successfully. However, the host might fail to exit maintenance mode, and the remediation process might fail. As a result, the host from the vSAN cluster is upgraded, but you must take manual steps to take the host out of maintenance mode.

Using vSphere Lifecycle Manager Images to Remediate vSAN Stretched Clusters

When you manage a vSAN stretched cluster or a two-node ROBO cluster with vSphere Lifecycle Manager, you can manage the hosts in the cluster with a single image that is different from the image used to upgrade the dedicated witness host. With vSphere 8.0 Update 2, you upgrade the vSAN witness host in the same way as you upgrade a standalone host.

What Is a Stretched Cluster

A stretched cluster is a deployment model in which two or more hosts are part of the same logical cluster but are located in separate geographical locations. Every vSAN stretched cluster or two-node ROBO cluster has a witness host, which is a standalone host that is not a member of the respective cluster but is associated with it. The witness host of a vSAN cluster is managed by the same vCenter Server where the respective stretched or ROBO cluster resides.

vSphere Lifecycle Manager and the vSAN Witness Hosts

The vSAN witness host is a physical or virtual ESXi host that contains the witness components of virtual machine objects stored in the vSAN cluster. The witness host does not support workloads and is not a data node. A single stretched or two-node ROBO cluster can have only one witness host.

You can use vSphere Lifecycle Manager images to manage a vSAN stretched cluster and its witness host. Starting with vSphere 8.0 Update 2, you define separate images for the vSAN cluster and for the witness host. See Upgrading vSAN Stretched Clusters by Using a vSphere Lifecycle Manager Image. The following requirements exist:
  • vCenter Server must be version 8.0 Update 2 and later.
  • The witness host must be ESXi version 7.0 Update 2 and later.
  • The witness host can be a virtual server or a physical server.
  • The witness host can be a dedicated witness host or a shared witness host.
  • The witness host must be upgraded before the hosts in the associated vSAN stretched or two-node cluster.
  • The witness host and the associated vSAN clusters must not be upgraded in parallel.
  • You cannot run virtual machines on a witness host. If vSphere Lifecycle Manager detects any stale virtual machines running on a witness host, then during the remediation of the standalone host it sets the VM power state remediation setting to Do not change power state. For more information, see Configure vSphere Lifecycle Manager Remediation Settings for Clusters or Standalone Hosts that You Manage with a Single Image.
You start using vSphere Lifecycle Manager images to manage the witness host by performing any of the following tasks:
  • You switch from using vSphere Lifecycle Manager baselines to using vSphere Lifecycle Manager images for an existing vSAN stretched or two-node ROBO cluster and for the dedicated standalone host.
    Note: The transition to using images is blocked if the witness host is of ESXi version earlier than 7.0 Update 2. In such cases, you can use baselines to upgrade the witness host to version 7.0 Update 2 or later, and then you can start managing the witness host with a single vSphere Lifecycle Manager image.
  • You convert an existing vSAN cluster that uses a single image into a stretched cluster with a virtual witness host.
  • You upgrade to version 8.0 Update 2 and later for vCenter Server and version 7.0 Update 2 or later for the witness host.
You stop using vSphere Lifecycle Manager images to manage the witness host in the following cases:
  • You convert an existing vSAN stretched cluster that uses images into a regular vSAN cluster.
  • You deactivate vSAN for an existing vSAN stretched cluster that you manage with a single image.
Important: With vSphere 8.0, you can use vSphere Lifecycle Manager images to manage standalone hosts in your vCenter Server inventory. Starting with vSphere 8.0 Update 2, you can apply a separate vSphere Lifecycle Manager image to the witness host of a vSAN cluster. You can start managing the witness host with a vSphere Lifecycle Manager image at the time of adding the host to the inventory or you can transition an existing standalone host that uses a single image to a witness host.

Upgrading vSAN Stretched Clusters by Using a vSphere Lifecycle Manager Image

You can upgrade a stretched vSAN cluster along with the associated witness host by using a single vSphere Lifecycle Manager image if the following requirements are met:

  • vCenter Server must be version 7.0 Update 3 and later.

  • The witness host must be ESXi version 7.0 Update 2 and later.

  • The witness host is a virtual server and a dedicated witness host.

Starting with vSphere 8.0 Update 2, once you upgrade the vCenter Server instance to version 8.0 Update 2, the virtual dedicated witness host is no longer managed with the vSphere Lifecycle Manager image that you define for the vSAN cluster. You can use a full vSphere Lifecycle Manager image to upgrade a witness host in the same way you upgrade a standalone host. The desired image you apply on a witness host can contain a base ESXi image, and any user components, solution components, or OEM add-ons.

With vSphere 8.0 Update 2, for stretched vSAN clusters, you must first upgrade the witness host with the separate vSphere Lifecycle Manager image that you configured. vSphere Lifecycle Manager then proceeds to remediate the hosts in the preferred site, followed by the hosts in the secondary site. If all hosts in the preferred site are in a compliant state, vSphere Lifecycle Manager skips the preferred site and starts remediating the hosts from the secondary site. If any host in the entire cluster is in an incompatible state, remediation stops. For more information about fault domain-aware remediation and the order in which vSphere Lifecycle Manager remediates the hosts in a vSAN cluster, see Using vSphere Lifecycle Manager Images to Remediate vSAN Clusters with Configured Fault Domains.
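The remediation order described above can be illustrated with a short Python sketch. It is only a model of the documented behavior, not vSphere Lifecycle Manager code: the function name, the state strings, and the host names are hypothetical. The sketch also makes three assumptions that the text does not confirm: that the witness host state is checked before the cluster is touched, that incompatible hosts are detected before any host is remediated, and that individual compliant hosts are skipped within a site.

```python
# Hypothetical state labels used only by this sketch.
COMPLIANT = "compliant"
NON_COMPLIANT = "non-compliant"
INCOMPATIBLE = "incompatible"


def plan_cluster_remediation(witness_state, preferred, secondary):
    """Return the host names to remediate, in order.

    preferred and secondary map host name -> state for each site.
    """
    # The witness host is upgraded first, with its own image.
    if witness_state != COMPLIANT:
        raise RuntimeError("upgrade the witness host with its own image first")

    # Assumption: an incompatible host anywhere in the cluster stops
    # remediation before any host is changed.
    states = list(preferred.values()) + list(secondary.values())
    if INCOMPATIBLE in states:
        raise RuntimeError("incompatible host found; remediation stops")

    # Preferred site first, then secondary site. A site whose hosts are
    # all compliant contributes nothing and is effectively skipped.
    plan = []
    for site in (preferred, secondary):
        plan.extend(host for host, state in site.items() if state == NON_COMPLIANT)
    return plan


preferred = {"esx-p1": COMPLIANT, "esx-p2": COMPLIANT}
secondary = {"esx-s1": NON_COMPLIANT, "esx-s2": NON_COMPLIANT}
plan = plan_cluster_remediation(COMPLIANT, preferred, secondary)
print(plan)  # ['esx-s1', 'esx-s2']
```

In this example the preferred site is fully compliant, so the plan contains only the hosts from the secondary site.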

To remediate the witness host against a single vSphere Lifecycle Manager image, the following requirements exist:
  • vCenter Server must be version 8.0 Update 2 and later.
  • The witness host must be ESXi version 7.0 Update 2 and later.
  • The witness host can be a virtual server or a physical server.
  • The witness host can be a dedicated witness host or a shared witness host.
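
The version requirements in this list can be expressed as a simple comparison. The following Python sketch is illustrative only: the Version type and the witness_image_blockers function are hypothetical names, and the sketch assumes that a version can be reduced to a (major, minor, update) triple. It does not query vCenter Server or any host.

```python
from typing import NamedTuple


class Version(NamedTuple):
    """Hypothetical (major, minor, update) triple, for example 8.0 Update 2."""
    major: int
    minor: int
    update: int


# Minimum versions taken from the requirements list.
MIN_VCENTER = Version(8, 0, 2)
MIN_WITNESS_ESXI = Version(7, 0, 2)


def witness_image_blockers(vcenter, witness_esxi):
    """Return the reasons why the witness host cannot be remediated
    against its own image. An empty list means no version blocker."""
    blockers = []
    # Named tuples compare field by field, like plain tuples.
    if vcenter < MIN_VCENTER:
        blockers.append("vCenter Server is earlier than 8.0 Update 2")
    if witness_esxi < MIN_WITNESS_ESXI:
        blockers.append("witness host ESXi is earlier than 7.0 Update 2")
    return blockers


blockers = witness_image_blockers(Version(8, 0, 2), Version(7, 0, 1))
print(blockers)  # ['witness host ESXi is earlier than 7.0 Update 2']
```

The witness type (virtual or physical, dedicated or shared) does not appear in the check because every combination is allowed.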

Using vSphere Lifecycle Manager Images to Remediate vSAN Clusters with Configured Fault Domains

In vSAN clusters with configured fault domains, vSphere Lifecycle Manager remediates the hosts in an order that vSphere Lifecycle Manager calculates by factoring in the defined fault domains.

What Is a Fault Domain?

A fault domain consists of one or more vSAN hosts grouped according to their physical location in the data center. When configured, fault domains enable vSAN to tolerate failures of entire physical racks as well as failures of a single host, capacity device, network link, or a network switch dedicated to a fault domain. You can configure fault domains for non-stretched and stretched vSAN clusters. For more information about configuring fault domains, see the Administering VMware vSAN documentation.

Upgrading vSAN Clusters Configured with Multiple Fault Domains

vSphere Lifecycle Manager remediates vSAN clusters with configured fault domains by remediating all hosts in one fault domain at a time. To define the order of fault domains, vSphere Lifecycle Manager calculates and assigns priority to each fault domain for the vSAN cluster.

Remediation starts with the fault domain that has the highest priority. The priority of a fault domain is determined by the number of non-compliant hosts in that fault domain. The fewer non-compliant hosts in a fault domain, the higher the priority of that fault domain. However, if multiple fault domains have the same priority, vSphere Lifecycle Manager selects the first fault domain from the list of fault domains.
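
The priority rule can be sketched in a few lines of Python. This is a model of the rule as described here, not the actual vSphere Lifecycle Manager implementation: the FaultDomain class, the remediation_order function, and the domain names are hypothetical, and the number of non-compliant hosts per fault domain is assumed to be known in advance.

```python
from dataclasses import dataclass


@dataclass
class FaultDomain:
    """Hypothetical record for one fault domain."""
    name: str
    non_compliant_hosts: int  # hosts that do not match the desired state


def remediation_order(fault_domains):
    """Order fault domains from highest to lowest priority.

    Fewer non-compliant hosts means higher priority. sorted() is stable,
    so fault domains with the same count keep their position in the
    input list, which models "selects the first fault domain from the list".
    """
    return sorted(fault_domains, key=lambda fd: fd.non_compliant_hosts)


domains = [
    FaultDomain("FD-A", 3),
    FaultDomain("FD-B", 1),
    FaultDomain("FD-C", 1),
]
order = [fd.name for fd in remediation_order(domains)]
print(order)  # ['FD-B', 'FD-C', 'FD-A']
```

FD-B and FD-C both have one non-compliant host, so FD-B is selected first because it appears earlier in the list.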

After vSphere Lifecycle Manager selects a fault domain, vSphere Lifecycle Manager uses DRS recommendations to select the optimal host within that domain to be remediated.

For fault domain-aware remediation of vSAN clusters, the following requirements exist:

  • vCenter Server must be version 7.0 Update 1 and later.
  • The ESXi hosts must be version 7.0 and later.

Upgrading vSAN Clusters Enabled with NSX or vSphere IaaS control plane

You can remediate a vSAN cluster against a vSphere Lifecycle Manager image that contains the same ESXi version as the ESXi version currently on the hosts, but the latest versions of NSX and vSphere IaaS control plane components. In that case, vSphere Lifecycle Manager upgrades only those components, without upgrading the ESXi version. Even in those cases, vSphere Lifecycle Manager still recognizes the configured fault domains for the vSAN cluster and performs the solution upgrade in accordance with the fault domain configuration.

For fault domain-aware remediation of vSAN clusters with enabled NSX or vSphere IaaS control plane, the following requirements exist:
  • vCenter Server must be version 7.0 Update 2
  • The ESXi hosts must be version 7.0 and later