When a datastore becomes inaccessible, VMCP might not terminate and restart the affected virtual machines.

Problem

When an All Paths Down (APD) or Permanent Device Loss (PDL) failure occurs and a datastore becomes inaccessible, VMCP might not resolve the issue for the affected virtual machines.

Cause

In an APD or PDL failure situation, VMCP might not terminate a virtual machine for the following reasons:

  • The VM is not protected by vSphere HA at the time of failure.

  • VMCP is disabled for this virtual machine.

In addition, if the failure is an APD, VMCP might not terminate a VM for any of the following reasons:

  • The APD failure is corrected before the VM is terminated.

  • There is insufficient capacity on hosts with which the virtual machine is compatible.

  • During a network partition or isolation, the host affected by the APD failure is not able to query the master host for available capacity. In such a case, vSphere HA defers to the user policy and terminates the VM if the VM Component Protection setting is aggressive.

  • vSphere HA terminates APD-affected VMs only after the following timeouts expire:

    • APD timeout (default 140 seconds).

    • APD failover delay (default 180 seconds). For faster recovery, this can be set to 0.

      Note:

      Based on these default values, vSphere HA terminates the affected virtual machine after 320 seconds (APD timeout + APD failover delay).
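
The timing arithmetic in the note can be sketched as follows. This is a minimal illustration, not a vSphere API: the function and constant names are hypothetical, and only the two default values (140 and 180 seconds) come from the text above.

```python
# Hypothetical helper for illustration only; not part of any vSphere SDK.
DEFAULT_APD_TIMEOUT_S = 140         # default APD timeout, from the text above
DEFAULT_APD_FAILOVER_DELAY_S = 180  # default APD failover delay, from the text above


def seconds_until_termination(apd_timeout_s=DEFAULT_APD_TIMEOUT_S,
                              failover_delay_s=DEFAULT_APD_FAILOVER_DELAY_S):
    """Return how many seconds after the APD begins both timers have expired.

    Simple model: the APD timeout runs first, then the failover delay.
    """
    if apd_timeout_s < 0 or failover_delay_s < 0:
        raise ValueError("timeouts must be non-negative")
    return apd_timeout_s + failover_delay_s


print(seconds_until_termination())                    # defaults: 140 + 180 = 320
print(seconds_until_termination(failover_delay_s=0))  # delay set to 0: 140
```

With the failover delay set to 0, only the APD timeout remains, which is why that setting gives faster recovery.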

Solution

To address this issue, check and adjust any of the following:

  • Capacity available to restart the virtual machine

  • User-configured timeouts and delays

  • User settings affecting VM termination

  • VM Component Protection policy

  • Host monitoring and VM restart priority, both of which must be enabled
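
As a rough aid for working through this checklist, the causes listed earlier in this topic can be modeled as a small decision function. This is a sketch under stated assumptions, not how vSphere HA is implemented: the `VmcpSnapshot` class, its field names, and `reasons_not_terminated` are all hypothetical, and the values must be gathered manually from your environment.

```python
from dataclasses import dataclass


@dataclass
class VmcpSnapshot:
    """Hypothetical, manually filled description of one VM during the failure."""
    failure_type: str                           # "APD" or "PDL"
    ha_protected: bool                          # VM protected by vSphere HA?
    vmcp_enabled: bool                          # VMCP enabled for this VM?
    apd_cleared: bool = False                   # APD corrected before termination?
    capacity_available: bool = True             # capacity on compatible hosts?
    host_partitioned_or_isolated: bool = False  # host cannot query the master host?
    aggressive_policy: bool = False             # VMCP setting is aggressive?
    seconds_since_apd: int = 0                  # time elapsed since the APD began
    apd_timeout_s: int = 140                    # default from this topic
    failover_delay_s: int = 180                 # default from this topic


def reasons_not_terminated(vm):
    """Return the documented reasons that apply to this snapshot."""
    reasons = []
    if not vm.ha_protected:
        reasons.append("VM is not protected by vSphere HA")
    if not vm.vmcp_enabled:
        reasons.append("VMCP is disabled for this VM")
    if vm.failure_type == "APD":
        if vm.apd_cleared:
            reasons.append("APD was corrected before the VM was terminated")
        if vm.host_partitioned_or_isolated:
            # Capacity cannot be queried; the user policy decides.
            if not vm.aggressive_policy:
                reasons.append("capacity unknown and policy is not aggressive")
        elif not vm.capacity_available:
            reasons.append("insufficient capacity on compatible hosts")
        if vm.seconds_since_apd < vm.apd_timeout_s + vm.failover_delay_s:
            reasons.append("APD timeout and failover delay have not both expired")
    return reasons


vm = VmcpSnapshot(failure_type="APD", ha_protected=True,
                  vmcp_enabled=True, seconds_since_apd=200)
for reason in reasons_not_terminated(vm):
    print("-", reason)
```

In this example only the timing check applies, because 200 seconds is less than the combined default of 320 seconds. An empty list means that none of the modeled causes applies.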