During cluster or host remediation, you can preserve the state of the virtual machines in the host memory and restore them from memory after the remediation finishes. Suspending virtual machines to memory and using the Quick Boot functionality significantly reduces the time for remediation, minimizes system boot time, and reduces the downtime of systems and services.
During remediation with vSphere Lifecycle Manager, migrating virtual machines from the host under remediation to another host takes a considerable amount of time. After remediation, vSphere Lifecycle Manager migrates the virtual machines back to the remediated host. However, you can configure vSphere Lifecycle Manager to suspend virtual machines to memory instead of migrating them, powering them off, or suspending them to disk.
You can use the suspend to memory functionality only for patching operations, for example, when you remediate a cluster to apply a hot patch, an express patch, and so on. You cannot use the suspend to memory option for upgrade operations, for example, when you upgrade your ESXi hosts from version 7.0 Update 2 to 7.0 Update 3.
Suspend Virtual Machines to Memory
Suspend to memory is an option that you can use only for clusters that you manage with vSphere Lifecycle Manager images. The functionality works together with the Quick Boot setting to optimize the remediation process and minimize virtual machine downtime.
You enable vSphere Lifecycle Manager to suspend virtual machines to memory when you configure the vSphere Lifecycle Manager host remediation settings. During the remediation pre-check and during remediation, vSphere Lifecycle Manager verifies that the suspend to memory option is applicable to the host or cluster under remediation. If suspend to memory is not applicable, vSphere Lifecycle Manager reports an error and prevents remediation from proceeding.
During a suspend to memory operation, virtual machines remain in a suspended state for some time. As a result, suspending virtual machines to memory might impact the workloads running on those virtual machines. The impact is similar to the impact that a suspend to disk operation might have on virtual machines and workloads.
- vSphere ESX Agent Manager (EAM) virtual machines
vSphere Lifecycle Manager powers off the EAM virtual machines after all other virtual machines are suspended. Similarly, vSphere Lifecycle Manager powers on the EAM virtual machines before any other virtual machines are resumed from memory. None of the suspended virtual machines is resumed until the EAM virtual machines are powered on.
- vSphere Cluster Services virtual machines
vSphere Lifecycle Manager first migrates the vSphere Cluster Services virtual machines to another host, and then suspends the rest of the virtual machines on the host to memory.
- vCenter Server
- vSAN witness virtual machine
- vSphere with Tanzu
- NSX-T Data Center
- VMware HCX
- vSphere Replication
- Site Recovery Manager
- VMware vRealize products
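The ordering described above for vSphere ESX Agent Manager (EAM) and vSphere Cluster Services virtual machines can be illustrated with a short sketch. This is not vSphere Lifecycle Manager code. The `VM` class, the `kind` labels, and the action names are hypothetical and exist only to make the sequence explicit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VM:
    name: str
    kind: str  # hypothetical labels: "regular", "eam", or "vcls"


def plan_pre_remediation(vms):
    """Order the actions taken on a host's VMs before it is remediated."""
    vcls = [vm for vm in vms if vm.kind == "vcls"]
    eam = [vm for vm in vms if vm.kind == "eam"]
    regular = [vm for vm in vms if vm.kind == "regular"]
    return (
        # vSphere Cluster Services VMs are migrated to another host first.
        [("migrate", vm.name) for vm in vcls]
        # The remaining VMs are then suspended to memory.
        + [("suspend_to_memory", vm.name) for vm in regular]
        # EAM VMs are powered off only after all other VMs are suspended.
        + [("power_off", vm.name) for vm in eam]
    )


def plan_post_remediation(vms):
    """Order the actions taken on the host's VMs after remediation."""
    eam = [vm for vm in vms if vm.kind == "eam"]
    regular = [vm for vm in vms if vm.kind == "regular"]
    return (
        # EAM VMs are powered on before any suspended VM is resumed.
        [("power_on", vm.name) for vm in eam]
        + [("resume", vm.name) for vm in regular]
    )


if __name__ == "__main__":
    inventory = [
        VM("app-1", "regular"),
        VM("agent-1", "eam"),
        VM("vcls-1", "vcls"),
    ]
    for step in plan_pre_remediation(inventory) + plan_post_remediation(inventory):
        print(step)
```

The sketch models only the relative order stated in this section. It does not model what happens to the migrated vSphere Cluster Services virtual machines after remediation.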
Quick Boot
Quick Boot is a setting that you can use with clusters that you manage with vSphere Lifecycle Manager images and with clusters that you manage with vSphere Lifecycle Manager baselines. Quick Boot reduces the remediation time for hosts that undergo patch and upgrade operations. Because patch and upgrade operations do not affect the hardware of a host, vSphere Lifecycle Manager skips the hardware reboot (the BIOS or UEFI firmware reboot) when the Quick Boot feature is activated. As a result, the time an ESXi host spends in maintenance mode is reduced and the risk of failures during remediation is minimized.
To configure vSphere Lifecycle Manager to suspend virtual machines to the host memory, you must activate Quick Boot. However, you can activate Quick Boot even if you decide not to use the Suspend to memory option.
Quick Boot is supported on a limited set of hardware platforms and drivers. Quick Boot is not supported on ESXi hosts that use TPM or passthrough devices. For more information about a host's compatibility with the Quick Boot setting, see the following KB article: https://kb.vmware.com/s/article/52477.
Requirements for Using Suspend to Memory
- The host supports the suspend to memory functionality.
- Quick Boot is activated for the cluster and the host under remediation supports Quick Boot.
- The remediation does not involve host upgrades or firmware upgrades.
- The host and the virtual machines meet certain requirements.
Host Requirements
- The host has enough free memory.
- The host has sufficient free low memory.
- The host has enough free memory per NUMA node to start after a reboot.
- The host has enough reservation available.
- The host does not use swapped or compressed pages of virtual machines.
Virtual Machine Requirements
- The virtual machines do not have any passthrough devices.
- The virtual machines do not have latency sensitivity set to high.
- The virtual machines are not fault tolerant.
- The virtual machines are not encrypted.
- The virtual machines do not use persistent memory.
- The virtual machines do not have virtual SGX or SEV devices.
- The virtual machines do not have the suspend feature deactivated.
- The virtual machines are not frozen source virtual machines during an Instant Clone operation.
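As a rough illustration, the requirements above can be expressed as a checklist that returns every reason suspend to memory would be blocked. This is a minimal sketch, not the actual vSphere Lifecycle Manager pre-check. The `HostState` and `VMState` classes, their field names, and the `suspend_to_memory_blockers` function are hypothetical, and each flag stands in for a condition that the real product evaluates itself.

```python
from dataclasses import dataclass


@dataclass
class HostState:
    # Hypothetical flags mirroring the host-side requirements.
    supports_suspend_to_memory: bool = True
    quick_boot_active_and_supported: bool = True
    enough_free_memory: bool = True
    enough_free_low_memory: bool = True
    enough_free_memory_per_numa_node: bool = True
    enough_reservation: bool = True
    uses_swapped_or_compressed_pages: bool = False


@dataclass
class VMState:
    # Hypothetical flags mirroring the virtual machine requirements.
    name: str
    has_passthrough_devices: bool = False
    latency_sensitivity_high: bool = False
    fault_tolerant: bool = False
    encrypted: bool = False
    uses_persistent_memory: bool = False
    has_virtual_sgx_or_sev: bool = False
    suspend_deactivated: bool = False
    frozen_instant_clone_source: bool = False


# Host flags that must be True, with the message reported when they are not.
HOST_MUST_BE_TRUE = {
    "supports_suspend_to_memory": "host does not support suspend to memory",
    "quick_boot_active_and_supported": "Quick Boot is not activated or not supported",
    "enough_free_memory": "host does not have enough free memory",
    "enough_free_low_memory": "host does not have sufficient free low memory",
    "enough_free_memory_per_numa_node": "not enough free memory per NUMA node",
    "enough_reservation": "host does not have enough reservation available",
}

# VM flags that must be False, with the message reported when they are set.
VM_MUST_BE_FALSE = {
    "has_passthrough_devices": "has passthrough devices",
    "latency_sensitivity_high": "has latency sensitivity set to high",
    "fault_tolerant": "is fault tolerant",
    "encrypted": "is encrypted",
    "uses_persistent_memory": "uses persistent memory",
    "has_virtual_sgx_or_sev": "has virtual SGX or SEV devices",
    "suspend_deactivated": "has the suspend feature deactivated",
    "frozen_instant_clone_source": "is a frozen Instant Clone source",
}


def suspend_to_memory_blockers(host, vms, involves_upgrade):
    """Return a list of reasons; an empty list means no blocker was found."""
    reasons = []
    if involves_upgrade:
        reasons.append("remediation involves a host or firmware upgrade")
    for attr, message in HOST_MUST_BE_TRUE.items():
        if not getattr(host, attr):
            reasons.append(message)
    if host.uses_swapped_or_compressed_pages:
        reasons.append("host uses swapped or compressed pages of virtual machines")
    for vm in vms:
        for attr, message in VM_MUST_BE_FALSE.items():
            if getattr(vm, attr):
                reasons.append(f"{vm.name} {message}")
    return reasons


if __name__ == "__main__":
    host = HostState()
    vms = [VMState("app-1"), VMState("db-1", encrypted=True)]
    for reason in suspend_to_memory_blockers(host, vms, involves_upgrade=False):
        print(reason)
```

Collecting all reasons instead of stopping at the first one is a design choice of the sketch. It mirrors the idea that a pre-check is most useful when it reports every blocking condition at once.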
Suspend to Memory and vSphere High Availability (HA)
- If you deactivate or reconfigure vSphere HA for the cluster during remediation, vSphere HA can no longer protect the suspended virtual machines. Before you change the vSphere HA configuration, make sure that no hosts in the cluster are in maintenance mode and the suspended virtual machines are powered on.
- If you modify the das.failoverDelayForSuspendToMemoryVmsSecs advanced option for vSphere HA after you configure vSphere Lifecycle Manager to use the suspend to memory option, the newly specified timeout value might not apply to the virtual machines. If you need to change the default value of the das.failoverDelayForSuspendToMemoryVmsSecs option, change it before you start remediation so that the new value is in effect.
- If the suspend to memory operation fails, vSphere HA determines the most appropriate failover host after the specified timeout value expires. The failover host might be the original host or another one.
- You must synchronize the time on all ESXi hosts in the cluster. If the host clocks are not synchronized, vSphere HA might not respect the specified timeout period and might initiate failover earlier or later than expected.
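The effect of unsynchronized clocks on the timeout can be illustrated with simple arithmetic. The sketch below rests on an assumed model, not on how vSphere HA is actually implemented: one host records when a virtual machine was suspended, and another host, whose clock may be offset, decides when the delay has expired. The function name and the 300-second value are hypothetical.

```python
def real_failover_delay(delay_secs, observer_clock_offset_secs):
    """Real seconds that pass before the observer treats the delay as expired.

    Assumed model: deadline = suspend timestamp + delay_secs, where the
    timestamp comes from the recording host and the comparison uses the
    observer's clock. observer_clock_offset_secs is the observer's clock
    minus the recording host's clock (positive means the observer runs ahead).
    """
    return max(0.0, delay_secs - observer_clock_offset_secs)


if __name__ == "__main__":
    configured = 300.0  # hypothetical timeout value, in seconds
    for offset in (0.0, 45.0, -45.0):
        print(offset, real_failover_delay(configured, offset))
```

Under this assumed model, an observer clock that runs 45 seconds ahead shortens a 300-second delay to 255 real seconds, and a clock that runs 45 seconds behind extends it to 345 seconds.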
For more information about using and configuring vSphere HA, see the vSphere Availability documentation.