When necessary, you can shut down the entire vSAN cluster.

If you plan to shut down the vSAN cluster, you do not need to manually disable vSAN on the cluster.

Procedure

  1. Shut down the vSAN cluster.
    1. Verify the vSAN health to confirm that the cluster is healthy.
    2. Power off all virtual machines (VMs) running in the vSAN cluster. If vCenter Server is hosted in the vSAN cluster, do not power off the vCenter Server VM at this point.
    3. Click the Configure tab and ensure that vSphere HA is turned off so that the cluster does not register host shutdowns as failures.
    4. Verify that all resynchronization tasks are complete.
      Click the Monitor tab and select vSAN > Resyncing Objects.
    5. For vSphere 7.0U1 and later, enable the vCLS retreat mode. For more information, see the VMware knowledge base article at https://kb.vmware.com/s/article/80472.
      Note: If vCenter Server is hosted in the vSAN cluster, ensure that all the vCLS agent VMs are cleaned up before moving to the next step.
    6. If vCenter Server is hosted in the vSAN cluster, power off the vCenter Server VM. The vSphere Client becomes unavailable.
      Make a note of the host that runs the vCenter Server VM. This is the host on which you must restart the vCenter Server VM during the restart process.
    7. Disable cluster member updates from vCenter Server by running the following command on the ESXi hosts in the cluster.
      esxcfg-advcfg -s 1 /VSAN/IgnoreClusterMemberListUpdates
      Perform this step on all the hosts.
    8. Log in to any host in the cluster other than the witness host.
    9. Run the following command only on that host. If you run the command on multiple hosts concurrently, a race condition might occur and cause unexpected results.
      python /usr/lib/vmware/vsan/bin/reboot_helper.py prepare

      The command returns and prints the following:

      Cluster preparation is done.
      Note:
      • The cluster is fully partitioned after the successful completion of the command.
      • If you encounter an error, resolve the issue based on the error message and try enabling vCLS retreat mode again.
      • If there are unhealthy or disconnected hosts in the cluster, remove the hosts and retry running the command.
    10. Place all the hosts into maintenance mode with the 'No Action' option. If vCenter Server is powered off, use the following command to place the ESXi hosts into maintenance mode with the 'No Action' option.
      esxcli system maintenanceMode set -e true -m noAction
      Perform this step on all the hosts.
    11. After all hosts have successfully entered maintenance mode, perform any necessary maintenance tasks and power off the hosts.
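The host-side commands in steps 7, 9, and 10 differ in where they run: two of them run on every host, while the prepare command runs on exactly one non-witness host. The following Python sketch only builds that ordered list of (host, command) pairs so the order can be reviewed before anything is run. It does not connect to any host. The function name, its parameters, and the host names in the usage note are hypothetical, and the sketch assumes that vCenter Server is powered off, so the esxcli form of the maintenance mode command applies.

```python
from typing import List, Optional, Tuple

# Commands copied from steps 7, 9, and 10 of the shutdown procedure.
IGNORE_UPDATES_ON = "esxcfg-advcfg -s 1 /VSAN/IgnoreClusterMemberListUpdates"
PREPARE = "python /usr/lib/vmware/vsan/bin/reboot_helper.py prepare"
ENTER_MAINTENANCE = "esxcli system maintenanceMode set -e true -m noAction"


def build_shutdown_plan(
    hosts: List[str],
    prepare_host: str,
    witness_host: Optional[str] = None,
) -> List[Tuple[str, str]]:
    """Return (host, command) pairs in the order the procedure lists them.

    Hypothetical helper: it only assembles strings and runs nothing.
    """
    if prepare_host not in hosts:
        raise ValueError("prepare_host must be one of the cluster hosts")
    if witness_host is not None and prepare_host == witness_host:
        raise ValueError("run the prepare command on a host other than the witness")

    plan: List[Tuple[str, str]] = []
    # Step 7: disable cluster member updates on all the hosts.
    plan.extend((host, IGNORE_UPDATES_ON) for host in hosts)
    # Step 9: run the prepare command on a single host only.
    plan.append((prepare_host, PREPARE))
    # Step 10: place all the hosts into maintenance mode with 'No Action'.
    plan.extend((host, ENTER_MAINTENANCE) for host in hosts)
    return plan
```

For example, `build_shutdown_plan(["esx1", "esx2", "esx3"], "esx2")` returns seven entries, with the prepare command appearing once, for `esx2`, between the two per-host passes.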
  2. Restart the vSAN cluster.
    1. Power on the ESXi hosts.
      Power on the physical machine on which ESXi is installed. The ESXi host starts, locates the VMs, and functions normally.
      If any hosts fail to come up, you must manually recover the hosts or move the failed hosts out of the vSAN cluster.
    2. After all the hosts are powered on and running, take all the hosts out of maintenance mode. If vCenter Server is powered off, use the following command on the ESXi hosts to exit maintenance mode.
      esxcli system maintenanceMode set -e false
      Perform this step on all the hosts.
    3. Log in to one of the hosts in the cluster other than the witness host.
    4. Run the following command only on that host. If you run the command on multiple hosts concurrently, a race condition might occur and cause unexpected results.
      python /usr/lib/vmware/vsan/bin/reboot_helper.py recover

      The command returns and prints the following:

      Cluster reboot/power-on is completed successfully!
    5. Verify that all the hosts are available in the cluster by running the following command on each host.
      esxcli vsan cluster get
    6. Enable cluster member updates from vCenter Server by running the following command on the ESXi hosts in the cluster.
      esxcfg-advcfg -s 0 /VSAN/IgnoreClusterMemberListUpdates
      Perform this step on all the hosts.
    7. Restart the vCenter Server VM if it is powered off. Wait until the vCenter Server VM is powered on and running. Then disable vCLS retreat mode. For instructions, see the VMware knowledge base article at https://kb.vmware.com/s/article/80472.
    8. Verify again that all the hosts are participating in the vSAN cluster by running the following command on each host.
      esxcli vsan cluster get
    9. Restart the remaining VMs through vCenter Server.
    10. Check the vSAN health service and resolve any outstanding issues.
    11. (Optional) If the vSAN cluster has vSphere Availability enabled, you must manually restart vSphere Availability to avoid the following error: Cannot find vSphere HA master agent.
      To manually restart vSphere Availability, select the vSAN cluster and navigate to:
      1. Configure > Services > vSphere Availability > EDIT > Disable vSphere HA
      2. Configure > Services > vSphere Availability > EDIT > Enable vSphere HA
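Steps 5 and 8 of the restart procedure check the output of esxcli vsan cluster get on each host. As a rough sketch, the check can be expressed as parsing that output and comparing the reported member count with the number of hosts you expect. This assumes the output is a list of `Key: Value` lines that includes a `Sub-Cluster Member Count` field. The sample text below is abbreviated and illustrative, not captured from a real host, and both function names are hypothetical.

```python
from typing import Dict

# Abbreviated, illustrative output (assumption: real output has more fields).
SAMPLE_OUTPUT = """Cluster Information
   Enabled: true
   Sub-Cluster Member Count: 3
"""


def parse_cluster_get(text: str) -> Dict[str, str]:
    """Collect 'Key: Value' lines into a dictionary, skipping other lines."""
    info: Dict[str, str] = {}
    for line in text.splitlines():
        # Split on the first colon only, so values that contain colons survive.
        key, sep, value = line.partition(":")
        if sep and value.strip():
            info[key.strip()] = value.strip()
    return info


def all_members_present(text: str, expected_hosts: int) -> bool:
    """Return True if the reported member count equals expected_hosts."""
    info = parse_cluster_get(text)
    # A missing field is treated as zero members (an assumption of this sketch).
    return int(info.get("Sub-Cluster Member Count", "0")) == expected_hosts
```

With the sample text, `all_members_present(SAMPLE_OUTPUT, 3)` is true and `all_members_present(SAMPLE_OUTPUT, 4)` is false. A false result on any host suggests that the host does not yet see every cluster member.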
  3. If there are unhealthy or disconnected hosts in the cluster, recover or remove the hosts from the vSAN cluster. Retry the above commands only after the vSAN health shows all available hosts in the green state.
    In a three-node vSAN cluster, the reboot_helper.py recover command does not work if one host has failed. As an administrator, do the following:
    1. Temporarily remove the failed host's information from the unicast agent list.
    2. Run the following command, and then add the host back to the unicast agent list.
      reboot_helper.py recover
    Use the following commands to remove the host from the unicast agent list and to add it back:
      esxcli vsan cluster unicastagent remove -a <IP Address> -t node -u <NodeUuid>
      esxcli vsan cluster unicastagent add -t node -u <NodeUuid> -U true -a <IP Address> -p 12321

    For more information, see the VMware knowledge base article at https://kb.vmware.com/s/article/70650.
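The remove and add commands above take the same IP address and node UUID, so it can help to fill in both from one set of values to avoid a mismatch. The following Python sketch does only that string substitution, using the flags exactly as they appear in the commands above. The function name is hypothetical, the values in the usage note are placeholders, and nothing is executed on a host.

```python
import ipaddress
from typing import Tuple


def unicast_agent_commands(
    ip_address: str, node_uuid: str, port: int = 12321
) -> Tuple[str, str]:
    """Return the (remove, add) command strings for one failed host.

    Hypothetical helper: it fills in the command templates and runs nothing.
    """
    # Raises ValueError if the address is not a valid IPv4 or IPv6 address.
    ipaddress.ip_address(ip_address)
    remove_cmd = (
        f"esxcli vsan cluster unicastagent remove "
        f"-a {ip_address} -t node -u {node_uuid}"
    )
    add_cmd = (
        f"esxcli vsan cluster unicastagent add "
        f"-t node -u {node_uuid} -U true -a {ip_address} -p {port}"
    )
    return remove_cmd, add_cmd
```

For example, `unicast_agent_commands("192.0.2.13", "00000000-0000-0000-0000-000000000003")` returns the two command strings for a host with that placeholder address and UUID. Run the remove command first, then reboot_helper.py recover, and then the add command, as described in the steps above.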