If your Virtual SAN cluster spans multiple racks or blade server chassis in a data center and you want to protect your hosts against rack or chassis failure, you can create fault domains and add one or more hosts to each fault domain.

A fault domain consists of one or more Virtual SAN hosts grouped together according to their physical location in the data center. When configured, fault domains enable Virtual SAN to tolerate the failure of an entire physical rack, as well as the failure of a single host, capacity device, network link, or network switch dedicated to a fault domain.

The Primary level of failures to tolerate (PFTT) policy for the cluster depends on the number of failures that a virtual machine is provisioned to tolerate. For example, when a virtual machine is configured with the Primary level of failures to tolerate set to 1 (PFTT = 1) and the cluster uses multiple fault domains, Virtual SAN can tolerate a single failure of any kind and of any component in a fault domain, including the failure of an entire rack.

When you configure fault domains and provision a new virtual machine, Virtual SAN ensures that protection objects, such as replicas and witnesses, are placed in different fault domains. For example, if a virtual machine's storage policy has the Primary level of failures to tolerate set to n (PFTT = n), Virtual SAN requires a minimum of 2*n+1 fault domains in the cluster. When virtual machines are provisioned with this policy in a cluster that uses one fault domain per rack, the copies of the associated virtual machine objects are stored across separate racks.
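
The 2*n+1 rule stated above can be written as a short calculation. The following is a minimal sketch in Python; the function names min_fault_domains and max_pftt are hypothetical helpers for illustration and are not part of any Virtual SAN API.

```python
def min_fault_domains(pftt: int) -> int:
    """Return the minimum number of fault domains for PFTT = n, using 2*n + 1."""
    if pftt < 0:
        raise ValueError("PFTT must be non-negative")
    return 2 * pftt + 1


def max_pftt(fault_domains: int) -> int:
    """Return the largest PFTT that the given number of fault domains satisfies."""
    if fault_domains < 1:
        raise ValueError("at least one fault domain is required")
    # Inverse of 2*n + 1, rounded down.
    return (fault_domains - 1) // 2
```

For example, min_fault_domains(1) evaluates to 3 and min_fault_domains(2) evaluates to 5.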

A minimum of three fault domains is required. For best results, configure four or more fault domains in the cluster. A cluster with three fault domains has the same restrictions as a three-host cluster, such as the inability to reprotect data after a failure and the inability to use the Full data migration mode. For information about designing and sizing fault domains, see Designing and Sizing Virtual SAN Fault Domains.

Consider a scenario where you have a Virtual SAN cluster with 16 hosts. The hosts are spread across 4 racks, that is, 4 hosts per rack. To tolerate an entire rack failure, create a fault domain for each rack. With four fault domains, the cluster can support virtual machines that have the Primary level of failures to tolerate set to 1. If you want the cluster to allow for virtual machines with the Primary level of failures to tolerate set to 2, you need to configure five fault domains in the cluster.
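
The 16-host scenario can be worked through in a few lines of Python. This is only an illustrative sketch: the host and rack names are made up, and the arithmetic assumes the 2*n+1 fault domain requirement.

```python
from collections import defaultdict

# Hypothetical inventory: 16 hosts spread evenly across 4 racks.
host_to_rack = {f"host-{i:02d}": f"rack-{i // 4 + 1}" for i in range(16)}

# Create one fault domain per rack.
fault_domains = defaultdict(list)
for host, rack in host_to_rack.items():
    fault_domains[rack].append(host)

domain_count = len(fault_domains)

# Largest n that satisfies 2*n + 1 <= domain_count.
supported_pftt = (domain_count - 1) // 2

# Fault domains needed to support PFTT = 2.
domains_for_pftt_2 = 2 * 2 + 1

print(domain_count, supported_pftt, domains_for_pftt_2)
```

With four racks, supported_pftt is 1, and domains_for_pftt_2 is 5, which matches the scenario.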

When a rack fails, all resources in the rack, including CPU and memory, become unavailable to the cluster. To reduce the impact of a potential rack failure, configure smaller fault domains. This increases the amount of resources that remain available in the cluster after a rack failure.
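
To illustrate why smaller fault domains leave more resources available, the following sketch estimates the fraction of hosts that remain after the largest fault domain fails. It assumes that every host contributes identical CPU and memory, which might not hold in your environment; remaining_fraction is a hypothetical helper.

```python
from typing import Sequence


def remaining_fraction(hosts_per_domain: Sequence[int]) -> float:
    """Fraction of hosts still available after the largest fault domain fails.

    Assumes all hosts are identical, so host count stands in for CPU and memory.
    """
    total = sum(hosts_per_domain)
    if total == 0:
        raise ValueError("the cluster has no hosts")
    return 1 - max(hosts_per_domain) / total


# 16 hosts as four fault domains of 4 hosts each.
four_large = remaining_fraction([4, 4, 4, 4])

# The same 16 hosts as eight fault domains of 2 hosts each.
eight_small = remaining_fraction([2] * 8)
```

Under these assumptions, four fault domains of 4 hosts leave 75 percent of the hosts available after one failure, and eight fault domains of 2 hosts leave 87.5 percent.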

When working with fault domains, follow these best practices.
  • Configure a minimum of three fault domains in the Virtual SAN cluster. For best results, configure four or more fault domains.
  • A host not included in any fault domain is considered to reside in its own single-host fault domain.
  • You do not need to assign every Virtual SAN host to a fault domain. If you decide to use fault domains to protect the Virtual SAN environment, consider creating equal-sized fault domains.
  • When moved to another cluster, Virtual SAN hosts retain their fault domain assignments.
  • When designing fault domains, configure each fault domain with the same number of hosts.

    For guidelines about designing fault domains, see Designing and Sizing Virtual SAN Fault Domains.

  • You can add any number of hosts to a fault domain. Each fault domain must contain at least one host.
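
Several of these practices can be checked mechanically against a planned layout. The following Python sketch is one possible way to do so. The function names, the input format (a mapping from host name to fault domain name, or None for an unassigned host), and the warning text are all assumptions made for illustration and do not correspond to any Virtual SAN API.

```python
from typing import Dict, List, Optional


def effective_fault_domains(
    assignments: Dict[str, Optional[str]]
) -> Dict[str, List[str]]:
    """Group hosts by fault domain.

    A host with no assignment is treated as its own single-host fault domain.
    """
    domains: Dict[str, List[str]] = {}
    for host, domain in assignments.items():
        key = domain if domain is not None else f"(implicit) {host}"
        domains.setdefault(key, []).append(host)
    return domains


def check_layout(assignments: Dict[str, Optional[str]]) -> List[str]:
    """Return warnings for a planned layout, based on the practices above."""
    domains = effective_fault_domains(assignments)
    warnings: List[str] = []
    if len(domains) < 3:
        warnings.append("fewer than three fault domains")
    elif len(domains) == 3:
        warnings.append("only three fault domains; four or more recommended")
    # Fault domains should contain the same number of hosts.
    if len({len(hosts) for hosts in domains.values()}) > 1:
        warnings.append("fault domains are not uniform in size")
    return warnings


# Hypothetical plan: two 2-host fault domains plus one unassigned host.
plan = {"h1": "A", "h2": "A", "h3": "B", "h4": "B", "h5": None}
for message in check_layout(plan):
    print(message)
```

In this plan, the unassigned host counts as a third, single-host fault domain, so the check reports both that only three fault domains exist and that the fault domains are not uniform in size.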