Storage Device is Failing in a Virtual SAN Cluster

Virtual SAN monitors the performance of each storage device and proactively isolates unhealthy devices. It detects gradual failure of a storage device and isolates the device before congestion builds up within the affected host and the entire Virtual SAN cluster.

If a disk experiences sustained high latency or congestion, Virtual SAN considers the device a dying disk and evacuates or rebuilds its data. No user action is required unless the cluster lacks resources or has inaccessible objects.
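The idea of "sustained" high latency can be illustrated with a small sliding-window check. This is only a sketch for intuition, not the algorithm Virtual SAN uses: the class name `DyingDiskDetector`, the 50 ms threshold, and the window size are all hypothetical values chosen for the example.

```python
from collections import deque


class DyingDiskDetector:
    """Flags a device once latency stays above a threshold for a full window."""

    def __init__(self, threshold_ms: float = 50.0, window: int = 5):
        # threshold_ms and window are illustrative, not Virtual SAN defaults.
        self.threshold_ms = threshold_ms
        self.samples = deque(maxlen=window)

    def observe(self, latency_ms: float) -> bool:
        """Record one sample; return True if every sample in a full window is high."""
        self.samples.append(latency_ms)
        window_full = len(self.samples) == self.samples.maxlen
        return window_full and all(s > self.threshold_ms for s in self.samples)


# A single spike (80, 95) followed by a recovery (20) does not trip the detector;
# only three consecutive high samples do.
detector = DyingDiskDetector(threshold_ms=50.0, window=3)
readings = [12.0, 80.0, 95.0, 20.0, 70.0, 75.0, 90.0]
flags = [detector.observe(r) for r in readings]
```

Requiring the whole window to exceed the threshold is what separates a gradually failing device from one that shows a brief latency spike.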

Component Failure State and Accessibility

The Virtual SAN components that reside on the magnetic disk or flash capacity device are marked as absent.

Behavior of Virtual SAN

Virtual SAN responds to the storage device failure in the following ways.

Parameter	Behavior
Alarms	An alarm is generated from each host whenever an unhealthy device is diagnosed. A warning is issued whenever a disk is suspected of being unhealthy.
Health check	The Overall disk health check issues a warning for the dying disk.
Health status	On the Disk Management page, the health status of the dying disk is listed as Unhealthy. When Virtual SAN completes evacuation of data, the health status is listed as DyingDiskEmpty.
Rebuilding data	Virtual SAN examines whether the hosts and the capacity devices can satisfy the space requirements and placement rules for the objects on the failed device or disk group. If a host with sufficient capacity is available, Virtual SAN starts the recovery process immediately, because the components are marked as degraded, and automatically reprotects the data.
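
The space and placement check described in the Rebuilding data row can be pictured roughly as follows. This is a simplified sketch under stated assumptions: the `Host` type, the `pick_rebuild_target` function, and the single placement rule (do not rebuild onto a host that already holds a replica of the same object) are hypothetical stand-ins for the real policy evaluation.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Set


@dataclass
class Host:
    name: str
    free_gb: float


def pick_rebuild_target(
    hosts: Iterable[Host],
    component_gb: float,
    hosts_with_replicas: Set[str],
) -> Optional[Host]:
    """Return a host that satisfies both space and placement, or None."""
    candidates = [
        h for h in hosts
        if h.name not in hosts_with_replicas   # placement rule (assumed)
        and h.free_gb >= component_gb          # space requirement
    ]
    # Prefer the host with the most free space; None means no rebuild is possible.
    return max(candidates, key=lambda h: h.free_gb, default=None)


hosts = [Host("esx-01", 500.0), Host("esx-02", 120.0),
         Host("esx-03", 40.0), Host("esx-04", 300.0)]

# esx-01 already holds a replica, so it is excluded even though it has the most space.
target = pick_rebuild_target(hosts, component_gb=100.0, hosts_with_replicas={"esx-01"})

# No eligible host has 400 GB free, so the cluster lacks resources for this rebuild.
no_target = pick_rebuild_target(hosts, component_gb=400.0, hosts_with_replicas={"esx-01"})
```

The `None` result corresponds to the case in which the cluster lacks resources and user action may be required.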

If Virtual SAN detects a disk with a permanent error, it makes a limited number of attempts to revive the disk by unmounting and remounting it.
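
A bounded revive loop of this kind might look like the sketch below. The function `revive_disk`, the `FlakyDisk` simulator, and the limit of three attempts are hypothetical; the source states only that the number of attempts is limited.

```python
from typing import Callable


def revive_disk(
    unmount: Callable[[], None],
    mount: Callable[[], bool],
    max_attempts: int = 3,  # illustrative limit, not a documented value
) -> bool:
    """Unmount and mount the disk up to max_attempts times; True if a mount succeeds."""
    for _ in range(max_attempts):
        unmount()
        if mount():
            return True
    return False


class FlakyDisk:
    """Simulated disk whose mount succeeds only from a given attempt onward."""

    def __init__(self, succeed_on_attempt: int):
        self.succeed_on_attempt = succeed_on_attempt
        self.mount_calls = 0

    def unmount(self) -> None:
        pass  # nothing to do in the simulation

    def mount(self) -> bool:
        self.mount_calls += 1
        return self.mount_calls >= self.succeed_on_attempt


recoverable = FlakyDisk(succeed_on_attempt=2)
dead = FlakyDisk(succeed_on_attempt=10)

recovered = revive_disk(recoverable.unmount, recoverable.mount, max_attempts=3)
gave_up = not revive_disk(dead.unmount, dead.mount, max_attempts=3)
```

Capping the attempts keeps a permanently failed device from being retried indefinitely.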