Workload Domains and vSphere Clusters Design

The vCenter Server functionality is distributed across a minimum of two workload domains and two vSphere clusters.

This solution uses two vCenter Server instances: one for the management workload domain and another for the first compute workload domain. The compute workload domain can contain multiple vSphere clusters.

The cluster design must consider the workloads that the cluster handles. Different cluster types in this design have different characteristics. When you design the cluster layout in vSphere, consider the following guidelines:

Use a few large-sized ESXi hosts or more small-sized ESXi hosts.
- A scale-up cluster has fewer large-sized ESXi hosts.
- A scale-out cluster has more small-sized ESXi hosts.
Compare the capital costs of purchasing a few large-sized ESXi hosts with the costs of purchasing more small-sized ESXi hosts. Costs vary between vendors and models.
Compare the operational costs of managing a few ESXi hosts with the costs of managing more ESXi hosts.
Consider the purpose of the cluster.
Consider the total number of ESXi hosts and cluster limits.

vSphere High Availability

VMware vSphere High Availability (vSphere HA) protects your VMs in case of an ESXi host failure by restarting VMs on other hosts in the cluster. During the cluster configuration, the ESXi hosts elect a primary ESXi host. The primary ESXi host communicates with the vCenter Server system and monitors the VMs and secondary ESXi hosts in the cluster.

The primary ESXi host detects different types of failure:

ESXi host failure, for example, an unexpected power failure.
ESXi host network isolation or connectivity failure.
Loss of storage connectivity.
Problems with the availability of the virtual machine guest OS.

The vSphere HA Admission Control Policy allows an administrator to configure how the cluster determines available resources. In a small vSphere HA cluster, a large proportion of the cluster resources is reserved to accommodate ESXi host failures, based on the selected policy.

The following policies are available:

Host failures the cluster tolerates: vSphere HA ensures that a specified number of ESXi hosts can fail and sufficient resources remain in the cluster to fail over all the VMs from those ESXi hosts.
Percentage of cluster resources reserved: vSphere HA reserves a specified percentage of aggregate CPU and memory resources for failover.
Specify Failover Hosts: When an ESXi host fails, vSphere HA attempts to restart its VMs on any of the specified failover ESXi hosts. If the restart is not possible, for example, the failover ESXi hosts have insufficient resources or have failed as well, then vSphere HA attempts to restart the VMs on other ESXi hosts in the cluster.
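
The relationship between cluster size and reserved capacity can be illustrated with a short calculation. The following Python sketch estimates the percentage to reserve under the percentage-based policy so that a given number of host failures can be absorbed. It assumes that all ESXi hosts in the cluster are identically sized, which the design does not state, and the function name `failover_reserve_percent` is hypothetical rather than part of any vSphere API.

```python
def failover_reserve_percent(host_count: int, host_failures_to_tolerate: int = 1) -> int:
    """Return the percentage of CPU and memory to reserve for failover.

    Assumption: every ESXi host contributes the same capacity, so tolerating
    k failures out of n hosts requires reserving k/n of the cluster.
    The result is rounded up to a whole percent.
    """
    if host_count < 2:
        raise ValueError("a vSphere HA cluster needs at least two hosts")
    if not 1 <= host_failures_to_tolerate < host_count:
        raise ValueError("host_failures_to_tolerate must be between 1 and host_count - 1")
    # Integer ceiling division avoids floating-point rounding surprises.
    return (100 * host_failures_to_tolerate + host_count - 1) // host_count


# Smaller clusters reserve a larger share of their resources for n+1 availability.
for hosts in (3, 4, 8, 16):
    print(f"{hosts} hosts: reserve {failover_reserve_percent(hosts)}%")
```

Under these assumptions, a four-host cluster reserves 25% of its resources to tolerate one host failure, while a 16-host cluster reserves 7%.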

Table 1. Recommended vSphere Cluster Design
Design Decision	Design Justification	Design Implication
Use vSphere HA to protect all VMs against failures.	Provides a robust level of protection for VM availability.	You must provide sufficient resources on the remaining hosts so that VMs can be restarted on those hosts in the event of a host outage.
Set the Host Isolation Response of vSphere HA to Power Off.	vSAN requires that the HA Isolation Response is set to Power Off, so that the VMs can restart on available ESXi hosts.	If a false positive occurs and an ESXi host is incorrectly declared isolated, the VMs on that host are powered off.
Create a single management cluster that contains all the management ESXi hosts.	Simplifies configuration by isolating management workloads from compute workloads. Ensures that the compute workloads have no impact on the management stack. You can add ESXi hosts to the cluster as needed.	Management of multiple clusters and vCenter Server instances increases operational overhead.
Create a single edge cluster per compute workload domain.	Supports running NSX Edge nodes in a dedicated cluster.	Requires an additional vSphere cluster.
Create at least one compute cluster. This cluster contains compute workloads.	The clusters can be placed close to end-users where the workloads run. The management stack has no impact on compute workloads. You can add ESXi hosts to the cluster as needed.	Management of multiple clusters and vCenter Server instances increases the operational overhead.
Create a management cluster with a minimum of four ESXi hosts.	Allocating four ESXi hosts provides full redundancy for the cluster.	Additional ESXi host resources are required for redundancy.
Create an edge cluster with a minimum of three ESXi hosts.	Supports availability for a minimum of two NSX Edge Nodes.	As Edge Nodes are added, additional ESXi hosts must be added to the cluster to maintain availability.
Create a compute cluster with a minimum of four ESXi hosts.	Allocating four ESXi hosts provides full redundancy for the cluster.	Additional ESXi host resources are required for redundancy.
Configure vSphere HA to use percentage-based failover capacity to ensure n+1 availability.	Using explicit host failover limits the total available resources in a cluster.	Reserving the resources of one ESXi host in the cluster can cause provisioning failures if the remaining resources are exhausted.
Enable VM Monitoring for each cluster.	VM Monitoring provides in-guest protection for most VM workloads.	None
Enable vSphere Distributed Resource Scheduler (DRS) in the management cluster and set it to Fully Automated, with the default setting (medium).	Provides the best trade-off between load balancing and excessive vSphere vMotion migrations.	If a vCenter Server outage occurs, mapping from VMs to ESXi hosts might be more difficult to determine.
Enable vSphere DRS in the compute clusters and set it to Manual mode.	Ensures that the latency-sensitive VMs do not move between ESXi hosts automatically.	Manual DRS mode increases the administrative overhead in ensuring that the cluster is properly balanced.
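
The minimum host counts in Table 1 can be checked against a planned layout with a few lines of code. The following Python sketch is illustrative only: the inventory structure, the cluster names, and the function `undersized_clusters` are hypothetical, and the sketch does not query vCenter Server.

```python
# Minimum ESXi host counts per cluster role, taken from Table 1.
MIN_HOSTS = {"management": 4, "edge": 3, "compute": 4}


def undersized_clusters(inventory: dict[str, tuple[str, int]]) -> list[str]:
    """Return the names of clusters with fewer hosts than the design minimum.

    `inventory` maps a cluster name to a (role, host_count) pair.
    """
    undersized = []
    for name, (role, host_count) in inventory.items():
        if role not in MIN_HOSTS:
            raise ValueError(f"unknown cluster role: {role!r}")
        if host_count < MIN_HOSTS[role]:
            undersized.append(name)
    return undersized


# Hypothetical planned layout; names and host counts are examples only.
example_inventory = {
    "mgmt-cluster": ("management", 4),
    "edge-cluster": ("edge", 2),
    "compute-cluster-01": ("compute", 6),
}
print(undersized_clusters(example_inventory))  # ['edge-cluster']
```

In this example, only the edge cluster is reported because it has two ESXi hosts instead of the minimum of three.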