Configuring VM Best Effort Restart Policy

This policy controls the evacuation behavior of a VM during maintenance mode.

Before a host can enter maintenance mode, often in preparation for an upgrade, all running VMs must be evacuated from it. Whether or not you are performing an upgrade, you must decide how to handle the powered-on VMs and verify that the host can enter maintenance mode. If some VMs cannot be migrated with vMotion, you can power them off before you enter maintenance mode.

You can use a compute policy to automate this evacuation. Tag the desired VMs, and the system powers them off when the host enters maintenance mode.

BestEffortRestart policy

When a host enters maintenance mode, the BestEffortRestart policy attempts to shut down the VMs. If the shutdown fails, the VMs are powered off. Instead of waiting for the host to exit maintenance mode, the policy finds the best host for each VM while the original host is still in maintenance mode. BestEffortRestart creates tasks to power on the VMs on the best hosts while the original host is entering maintenance mode, so that the VMs are running again as soon as possible. If a VM cannot be powered on for any reason, a remediation cycle runs every 3 minutes. Exiting maintenance mode is no longer a prerequisite for the VM to be running.
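The sequence described above can be sketched as a small simulation. This is an illustrative model only, not the product's implementation or API: the VM class, the evacuate and remediation_pass functions, and the guest_shutdown_ok and pick_best_host callbacks are all hypothetical names introduced for this example. The sketch also assumes that each remediation cycle retries placement, which is an interpretation of the text.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

# Remediation interval stated in the text (every 3 minutes).
REMEDIATION_INTERVAL_SECONDS = 3 * 60


@dataclass
class VM:
    """Hypothetical stand-in for a VM tagged for BestEffortRestart."""
    name: str
    host: str
    powered_on: bool = True
    stop_method: str = ""  # "guest-shutdown" or "power-off"


def evacuate(vm: VM,
             hosts: List[str],
             guest_shutdown_ok: Callable[[VM], bool],
             pick_best_host: Callable[[VM, List[str]], Optional[str]]) -> bool:
    """Stop the VM, then try to restart it on another host.

    Returns True if the VM is running again, or False if it has to
    wait for the next remediation cycle.
    """
    source = vm.host
    # Step 1: try a guest shutdown first; fall back to a hard power-off.
    vm.stop_method = "guest-shutdown" if guest_shutdown_ok(vm) else "power-off"
    vm.powered_on = False

    # Step 2: pick a target now, excluding the host that is entering
    # maintenance mode, instead of waiting for it to exit.
    candidates = [h for h in hosts if h != source]
    target = pick_best_host(vm, candidates)
    if target is None:
        return False

    # Step 3: power the VM on at the chosen host.
    vm.host = target
    vm.powered_on = True
    return True


def remediation_pass(pending: List[VM],
                     hosts: List[str],
                     pick_best_host: Callable[[VM, List[str]], Optional[str]]) -> List[VM]:
    """Run one remediation cycle; return the VMs that are still powered off."""
    still_pending = []
    for vm in pending:
        candidates = [h for h in hosts if h != vm.host]
        target = pick_best_host(vm, candidates)
        if target is None:
            still_pending.append(vm)
        else:
            vm.host = target
            vm.powered_on = True
    return still_pending
```

In a real scheduler, remediation_pass would be called once per REMEDIATION_INTERVAL_SECONDS until the pending list is empty. The point of the sketch is that the restart does not depend on the source host leaving maintenance mode.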

You can also use BestEffortRestart when you move a host out of a cluster and want its VMs to remain in the cluster rather than leave with the host. In this case, when the host enters maintenance mode, the VMs are not only powered off. The policy also finds the best host for each VM, excluding the current host, and powers the VM on there.

BestEffortRestart is also useful for vGPU VMs and VMs with other passthrough devices. vGPU VMs have large memory framebuffers, which are costly to migrate and can exceed the default 100-second vMotion timeout. Such vMotion operations are likely to time out, leaving the vGPU VMs in an undesirable state. Instead, you can power off vGPU VMs when the host enters maintenance mode. Ideally, passthrough VMs, including vGPU VMs, are powered on on a different host while the original host is entering maintenance mode, so that they do not have to wait until the host exits maintenance mode.

DRS must be activated for this policy to work properly. If DRS is deactivated at either the host or the VM level, the VM is powered off when the host enters maintenance mode.

If the BestEffortRestart policy is deleted, the outcome depends on when the deletion occurs.

If the policy is deleted before the evacuation action for a VM, the VM is treated the same as any other VM. It has no connection to the BestEffortRestart policy.
If the policy is deleted after the evacuation action, the VM can be powered off. The VM is not remediated because it is no longer associated with the BestEffortRestart policy.
If a tag associated with the BestEffortRestart policy is deleted before the evacuation action for a VM, the VM is treated the same as any other VM. It has no connection to the BestEffortRestart policy.
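
The three timing cases above can be summarized as a lookup. The following is a minimal sketch for illustration. The Removal enum and the expected_outcome function are hypothetical names that are not part of any product API, and the returned strings only paraphrase the cases listed above.

```python
from enum import Enum


class Removal(Enum):
    """What was removed, and when, relative to the VM's evacuation action."""
    NONE = "nothing removed"
    POLICY_BEFORE_EVACUATION = "policy deleted before evacuation"
    POLICY_AFTER_EVACUATION = "policy deleted after evacuation"
    TAG_BEFORE_EVACUATION = "tag deleted before evacuation"


def expected_outcome(removal: Removal) -> str:
    """Return the documented outcome for a VM in the given removal case."""
    if removal in (Removal.POLICY_BEFORE_EVACUATION,
                   Removal.TAG_BEFORE_EVACUATION):
        # The VM never comes under the policy's control.
        return "treated like any other VM"
    if removal is Removal.POLICY_AFTER_EVACUATION:
        # The VM may already be powered off, and nothing restarts it.
        return "can be powered off; not remediated"
    # Policy and tag intact: normal BestEffortRestart behavior.
    return "powered off, then powered on on the best host"
```

The case to watch is deletion after the evacuation action: the VM can be left powered off with no remediation cycle to bring it back.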

The VMs should return to the state they were in before maintenance mode. A VM that was powered on before maintenance mode is eventually powered on again.

When the BestEffortRestart policy succeeds, all VMs associated with the policy are powered off and then powered on on the best hosts.