Datastore Inaccessibility Is Not Resolved for a VM

When a datastore becomes inaccessible, VM Component Protection (VMCP) might not terminate and restart the affected virtual machines.

Problem

When an All Paths Down (APD) or Permanent Device Loss (PDL) failure occurs and a datastore becomes inaccessible, VMCP might not resolve the issue for the affected virtual machines.

Cause

In an APD or PDL failure situation, VMCP might not terminate a virtual machine for the following reasons:

The virtual machine is not protected by vSphere HA at the time of the failure.
VMCP is disabled for this virtual machine.

If the failure is an APD, VMCP might also not terminate a virtual machine for these additional reasons:

The APD failure is corrected before the virtual machine is terminated.
There is insufficient capacity on the hosts with which the virtual machine is compatible.
During a network partition or isolation, the host affected by the APD failure is not able to query the primary host for available capacity. In such a case, vSphere HA defers to the user policy and terminates the VM if the VM Component Protection setting is aggressive.
vSphere HA terminates APD-affected VMs only after the following timeouts expire:
- APD timeout (default 140 seconds).
- APD failover delay (default 180 seconds). For faster recovery, this can be set to 0.
  Note: With these default values, vSphere HA terminates the affected virtual machine after 320 seconds (APD timeout + APD failover delay).
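The termination time is simply the sum of the two timeouts. The following minimal sketch makes that arithmetic explicit, using the documented defaults (140 s and 180 s). The function names are hypothetical and do not correspond to any vSphere API. The sketch considers only timing; capacity, partition, and policy conditions are ignored.

```python
# Minimal sketch: when vSphere HA may terminate an APD-affected VM.
# Names are hypothetical; only the default values come from the article.

APD_TIMEOUT_S = 140          # default APD timeout
APD_FAILOVER_DELAY_S = 180   # default APD failover delay (can be set to 0)


def seconds_until_termination(apd_timeout_s: int = APD_TIMEOUT_S,
                              failover_delay_s: int = APD_FAILOVER_DELAY_S) -> int:
    """Seconds from the start of the APD until HA may terminate the VM."""
    if apd_timeout_s < 0 or failover_delay_s < 0:
        raise ValueError("timeouts must be non-negative")
    return apd_timeout_s + failover_delay_s


def would_terminate(apd_duration_s: float, **timeouts: int) -> bool:
    """True if the APD outlasts both timeouts.

    An APD that clears earlier is one reason VMCP does not terminate the VM.
    """
    return apd_duration_s > seconds_until_termination(**timeouts)


if __name__ == "__main__":
    print(seconds_until_termination())                    # 320
    print(seconds_until_termination(failover_delay_s=0))  # 140
    print(would_terminate(300))                           # False
```

Setting the failover delay to 0 shortens the window to the APD timeout alone, which is why the article suggests it for faster recovery.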

Solution

To address this issue, check and adjust any of the following:

Insufficient capacity to restart the virtual machine
User-configured timeouts and delays
User settings affecting VM termination
VM Component Protection policy
Host monitoring or VM restart priority, which must be enabled
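When you work through the checks above, it can help to record what was observed during the failure and map it back to the causes listed in the Cause section. The sketch below is a hypothetical triage helper. The record fields are illustrative names that are not vSphere API properties, and the helper covers only the causes named in this article.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class VmcpSnapshot:
    """Hypothetical record of conditions observed during the failure.

    Field names are illustrative only; they are not vSphere API properties.
    """
    failure: str                                # "APD" or "PDL"
    ha_protected: bool                          # VM protected by vSphere HA at failure time
    vmcp_enabled: bool                          # VMCP enabled for this VM
    apd_cleared_before_timeouts: bool = False   # APD corrected before termination
    partitioned_or_isolated: bool = False       # network partition or isolation
    aggressive_policy: bool = False             # VMCP setting is aggressive
    capacity_available: bool = True             # compatible hosts can restart the VM


def reasons_not_terminated(s: VmcpSnapshot) -> List[str]:
    """Return the possible reasons, from the Cause section, that VMCP did not act."""
    reasons: List[str] = []
    if not s.ha_protected:
        reasons.append("VM not protected by vSphere HA")
    if not s.vmcp_enabled:
        reasons.append("VMCP disabled for this VM")
    if s.failure == "APD":
        if s.apd_cleared_before_timeouts:
            reasons.append("APD cleared before the timeouts expired")
        if s.partitioned_or_isolated:
            # The host cannot query the primary for capacity, so HA follows
            # the user policy and terminates only if the setting is aggressive.
            if not s.aggressive_policy:
                reasons.append("partition/isolation with a non-aggressive policy")
        elif not s.capacity_available:
            reasons.append("insufficient capacity on compatible hosts")
    return reasons


if __name__ == "__main__":
    snap = VmcpSnapshot(failure="APD", ha_protected=True, vmcp_enabled=True,
                        partitioned_or_isolated=True)
    print(reasons_not_terminated(snap))
```

For a PDL failure, only the first two checks apply, because the article lists the remaining causes as APD-specific.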