When a datastore becomes inaccessible, VMCP might not terminate and restart the affected virtual machines.
When an All Paths Down (APD) or Permanent Device Loss (PDL) failure occurs and a datastore becomes inaccessible, VMCP might not resolve the issue for the affected virtual machines.
In an APD or PDL failure situation, VMCP might not terminate a virtual machine for the following reasons:
VM is not protected by vSphere HA at the time of failure.
VMCP is disabled for this virtual machine.
Furthermore, if the failure is an APD, VMCP might not terminate a VM for several reasons:
APD failure is corrected before the VM is terminated.
Insufficient capacity on hosts with which the virtual machine is compatible.
During a network partition or isolation, the host affected by the APD failure is not able to query the master host for available capacity. In such a case, vSphere HA defers to the user policy and terminates the VM if the VM Component Protection setting is aggressive.
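The conditions listed above can be read as a small decision procedure. The following Python sketch is only an illustrative model of that reading, not vSphere code or a vSphere API: the class `ApdSituation`, its field names, and the function `vmcp_terminates_on_apd` are all hypothetical. It also assumes, following the text literally, that insufficient capacity blocks termination whenever the host can reach the master host.

```python
from dataclasses import dataclass


@dataclass
class ApdSituation:
    """Hypothetical snapshot of one VM during an APD failure (illustrative only)."""
    ha_protected: bool        # VM was protected by vSphere HA at the time of failure
    vmcp_enabled: bool        # VMCP is enabled for this virtual machine
    apd_cleared: bool         # APD failure was corrected before termination
    host_partitioned: bool    # host is partitioned or isolated and cannot query the master
    capacity_available: bool  # compatible hosts have enough capacity (known only via the master)
    aggressive_policy: bool   # VM Component Protection setting is aggressive


def vmcp_terminates_on_apd(s: ApdSituation) -> bool:
    """Return True if, under the rules described in the text, the VM would be terminated."""
    # General preconditions that apply to both APD and PDL failures.
    if not s.ha_protected or not s.vmcp_enabled:
        return False
    # The APD went away before termination, so there is nothing left to act on.
    if s.apd_cleared:
        return False
    # The host cannot ask the master host about capacity, so the user policy decides.
    if s.host_partitioned:
        return s.aggressive_policy
    # Assumption: with the master reachable, termination depends on available capacity.
    return s.capacity_available


if __name__ == "__main__":
    partitioned = ApdSituation(True, True, False, True, False, aggressive_policy=True)
    print(vmcp_terminates_on_apd(partitioned))
```

Walking a specific virtual machine through these checks in order can help identify which condition prevented termination.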
vSphere HA terminates APD-affected VMs only after the following timeouts expire:
APD timeout (default 140 seconds).
APD failover delay (default 180 seconds). For faster recovery, this can be set to 0.
Note: Based on these default values, vSphere HA terminates the affected virtual machine after 320 seconds (APD timeout + APD failover delay).
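As a quick check of the arithmetic, the total wait before termination is the sum of the two values. The Python sketch below is a minimal illustration; the function and constant names are hypothetical and do not correspond to vSphere setting names.

```python
# Default values taken from the text above, in seconds.
DEFAULT_APD_TIMEOUT_S = 140
DEFAULT_APD_FAILOVER_DELAY_S = 180


def seconds_until_termination(
    apd_timeout_s: int = DEFAULT_APD_TIMEOUT_S,
    apd_failover_delay_s: int = DEFAULT_APD_FAILOVER_DELAY_S,
) -> int:
    """Total time after the APD starts before vSphere HA terminates the VM."""
    if apd_timeout_s < 0 or apd_failover_delay_s < 0:
        raise ValueError("timeouts and delays must be non-negative")
    return apd_timeout_s + apd_failover_delay_s


if __name__ == "__main__":
    print(seconds_until_termination())                        # defaults: 320
    print(seconds_until_termination(apd_failover_delay_s=0))  # delay set to 0: 140
```

With the failover delay set to 0, only the APD timeout remains, so termination would follow after 140 seconds under the default timeout.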
To address this issue, check and adjust any of the following:
Insufficient capacity to restart the virtual machine
User-configured timeouts and delays
User settings affecting VM termination
VM Component Protection policy
Host monitoring or VM restart priority must be enabled