This design uses VMware vSAN to implement software-defined storage as the primary storage type for the management cluster. By using vSAN, you have a high level of control over the storage subsystem.

All functional testing and validation of the design is performed on vSAN. Although VMware Validated Design uses vSAN, in particular for the clusters running management components, you can use any supported storage solution. If you select a storage solution other than vSAN, be aware that all design, deployment, and Day-2 guidance in VMware Validated Design is written in the context of vSAN, and adjust it accordingly. Your storage design must match or exceed the capacity and performance capabilities of the vSAN configuration in the design. For multiple availability zones, the vSAN configuration includes vSAN stretched clusters.

vSAN is hyper-converged storage software that is fully integrated with the hypervisor. vSAN pools the local hard disk drives and solid-state drives of the ESXi hosts in a cluster, and presents a flash-optimized, highly resilient, shared storage datastore to ESXi hosts and virtual machines. By using vSAN storage policies, you can control capacity, performance, and availability on a per-virtual-machine basis.
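To illustrate the per-virtual-machine effect of a storage policy, the following Python sketch compares the raw capacity that a single virtual machine disk consumes under common vSAN protection schemes. The multipliers reflect standard vSAN policy options; the script itself is an illustrative sketch, not a VMware tool.

```python
# Minimal sketch: how a vSAN storage policy choice changes the raw
# capacity one VM disk consumes. Multipliers reflect common vSAN
# policy options; treat this as an illustration, not a sizing tool.

POLICY_MULTIPLIERS = {
    "RAID-1, FTT=1": 2.0,    # two full mirror copies
    "RAID-1, FTT=2": 3.0,    # three full mirror copies
    "RAID-5, FTT=1": 1.33,   # 3+1 erasure coding (all-flash only)
}

vm_disk_gb = 200
for policy, multiplier in POLICY_MULTIPLIERS.items():
    print(f"{policy}: {vm_disk_gb * multiplier:,.0f} GB raw capacity")
```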

vSAN Physical Requirements and Dependencies

The software-defined storage module has the following requirements and options.

Number of hosts

  • Minimum of three ESXi hosts providing storage resources to the vSAN cluster.

vSAN configuration

vSAN is configured as hybrid storage or all-flash storage.

  • A vSAN hybrid storage configuration requires both magnetic devices and flash caching devices.

  • An all-flash vSAN configuration requires flash devices for both the caching and capacity tiers.

Requirements for individual hosts that provide storage resources

  • Minimum of one flash device. The flash-based cache tier must be at least 10% of the size of the HDD capacity tier.

  • Minimum of two additional devices for the capacity tier.

  • RAID controller that is compatible with vSAN.

  • Minimum 10-Gbps network for vSAN traffic.

  • vSphere High Availability host isolation response set to power off virtual machines. With this setting, you prevent split-brain conditions if an isolation or network partition occurs. In a split-brain condition, the same virtual machine might accidentally be powered on by two ESXi hosts.

See Design Decisions on the Admission Control Policy for the First Cluster in the Management Domain.
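As an illustration only, the following Python sketch checks a planned host disk layout against the host-level requirements listed above. The function name and input format are assumptions made for this example; they do not correspond to any VMware tool or API.

```python
# Minimal sketch: validate a planned vSAN host disk layout against the
# requirements above. All names and the input format are illustrative
# assumptions; this is not a VMware tool or API.

def validate_host_layout(cache_devices_gb, capacity_devices_gb):
    """Check one host's disk layout against the vSAN host requirements."""
    errors = []
    if len(cache_devices_gb) < 1:
        errors.append("At least one flash caching device is required.")
    if len(capacity_devices_gb) < 2:
        errors.append("At least two capacity devices are required.")
    cache_total = sum(cache_devices_gb)
    capacity_total = sum(capacity_devices_gb)
    # The cache tier must be at least 10% of the size of the capacity tier.
    if capacity_total and cache_total < 0.10 * capacity_total:
        errors.append(
            f"Cache tier {cache_total} GB is below 10% of "
            f"capacity tier {capacity_total} GB."
        )
    return errors

# Example: one 600 GB cache device and two 2.5 TB capacity devices.
problems = validate_host_layout([600], [2500, 2500])
print(problems or "Host layout meets the minimum vSAN requirements.")
```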

vSAN Hardware Considerations

While VMware supports building your own vSAN cluster from compatible components, vSAN ReadyNodes are selected for this VMware Validated Design. See Design Decisions on Server Hardware for ESXi.

vSAN Hardware Options

Build Your Own

Use hardware from the VMware Compatibility Guide for the following vSAN components:

  • Flash-based drives

  • Magnetic hard drives

  • I/O controllers, including vSAN certified driver and firmware combinations

Use VMware vSAN ReadyNodes

A vSAN ReadyNode is a server configuration that is validated in a tested, certified hardware form factor for vSAN deployment, jointly recommended by the server OEM and VMware. The vSAN Compatibility Guide for vSAN ReadyNodes provides examples of standardized configurations, including the supported number of virtual machines and the estimated number of 4K IOPS delivered. See the vSAN ReadyNode documentation.

I/O Controllers for vSAN

The I/O controllers are as important to a vSAN configuration as the selection of disk drives. vSAN supports SAS, SATA, and SCSI adapters in either pass-through or RAID 0 mode. vSAN supports multiple controllers per ESXi host.

Consider the following when choosing between a single-controller and a multi-controller configuration:

  • Multiple controllers can improve performance and confine a controller or SSD failure to a smaller number of drives or vSAN disk groups.

  • With a single controller, all disks are controlled by one device. A controller failure impacts all storage, including the boot media (if configured).

Controller queue depth is possibly the most important aspect of performance. All I/O controllers in the VMware vSAN Hardware Compatibility Guide have a minimum queue depth of 256. Consider normal day-to-day operations plus the increase in I/O caused by virtual machine deployment operations, or by resync I/O activity that results from automatic or manual fault remediation.
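As a minimal illustration of this check, the following Python sketch flags controllers in a hypothetical hardware inventory whose queue depth falls below the 256 minimum. The inventory structure and field names are assumptions; real queue depth values come from the controller vendor or the VMware vSAN Hardware Compatibility Guide.

```python
# Minimal sketch: flag I/O controllers below the vSAN minimum queue depth.
# The inventory list and field names are illustrative assumptions.

MIN_VSAN_QUEUE_DEPTH = 256

inventory = [
    {"host": "esxi-01", "controller": "ctrl-a", "queue_depth": 256},
    {"host": "esxi-02", "controller": "ctrl-b", "queue_depth": 64},
]

for entry in inventory:
    if entry["queue_depth"] < MIN_VSAN_QUEUE_DEPTH:
        print(f"{entry['host']}/{entry['controller']}: queue depth "
              f"{entry['queue_depth']} is below the minimum of "
              f"{MIN_VSAN_QUEUE_DEPTH} and is unsuitable for vSAN.")
```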

Table 1. Design Decisions on the vSAN I/O Controller Configuration

SDDC-MGMT-VI-SDS-001

  Design Decision: Ensure that the I/O controller that runs the vSAN disk groups has a minimum queue depth of 256.

  Design Justification: Controllers with lower queue depths can cause performance and stability problems when running vSAN. vSAN ReadyNodes are configured with appropriate queue depths.

  Design Implication: Limits the number of compatible I/O controllers that can be used for storage.

SDDC-MGMT-VI-SDS-002

  Design Decision: Do not use an I/O controller that runs vSAN disk groups for any other purpose.

  Design Justification: Running non-vSAN disks, for example VMFS, on an I/O controller that runs a vSAN disk group can impact vSAN performance.

  Design Implication: If non-vSAN disks are required in ESXi hosts, an additional I/O controller is needed in the host.

vSAN Flash Options

vSAN has two configuration options: all-flash and hybrid.

Hybrid Mode

In a hybrid storage architecture, vSAN pools server-attached magnetic capacity devices and flash-based caching devices, typically SSDs or PCIe devices, to create a distributed shared datastore.

All-Flash Mode

All-flash storage uses flash-based devices (SSD or PCIe) as a write cache, while other flash-based devices provide capacity and data persistence.

Table 2. Design Decisions on vSAN Mode

SDDC-MGMT-VI-SDS-003

  Design Decision: Configure vSAN in all-flash mode in the first cluster of the management domain.

  Design Justification: Meets the performance needs of the first cluster in the management domain.

  Design Implication: Using high-speed magnetic disks in a hybrid vSAN configuration can provide satisfactory performance and is supported. More disks might be required per host because flash disks are not as dense as magnetic disks.

Sizing Storage

You usually base sizing on the requirements of your IT organization. However, this design provides calculations for a single-region implementation that you then apply on a per-region basis. In this way, you can handle storage in a dual-region deployment that has failover capabilities enabled.

This sizing is calculated according to a certain node configuration per region. Although VMware Validated Design allocates enough memory capacity to handle N-1 host failures, and uses thin-provisioned swap for the vSAN configuration, the potential thin-provisioned swap capacity is factored into the calculation.

Table 3. Management Layers and Hardware Sizes

Physical Infrastructure (ESXi)
  Quantity: 4
  Memory: 1,024 GB

Virtual Infrastructure
  Quantity: 12
  Disk: 4,286 GB
  Swap: 301 GB

Identity and Access Management
  Quantity: 4
  Disk: 240 GB
  Swap: 54 GB

Cloud Operations
  Quantity: 9
  Disk: 6,180 GB
  Swap: 158 GB

Cloud Automation
  Quantity: 3
  Disk: 666 GB
  Swap: 120 GB

Total (32 management virtual machines on 4 ESXi hosts)
  Disk: 11,372 GB
  Swap: 633 GB
  Memory: 1,024 GB

Derive the storage space that is required for the vSAN capacity tier from the consumption of the management virtual machines by using the following calculations. For vSAN memory consumption by management ESXi hosts, see VMware Knowledge Base article 2113954. For additional sizing guidance, see the vSAN ReadyNode™ Sizer.

The Disk Space Usage Distribution is made up of the following components:

  • Effective Raw Capacity - Space available for the vSAN datastore.

  • Slack Space - Space reserved for vSAN-specific operations such as resync and rebuilds.

  • Dedupe Overhead - Space reserved for deduplication and compression metadata such as hash, translation, and allocation maps.

  • Disk Formatting Overhead - Reservation for file system metadata.

  • Checksum Overhead - Space used to store checksum information.

  • Physical Reservation - Physical space or raw capacity consumed by the overheads above.

11,372 GB Disk + 633 GB Swap = 12,005 GB Virtual Machine Raw Capacity Requirements
12,005 GB * 2 = 24,010 GB Total Virtual Machine Raw Capacity, using FTT=1 (RAID 1)
24,010 GB + 30% = 31,213 GB Total Raw Virtual Machine Capacity with Overheads
31,213 GB + 20% = 37,455.6 GB Total Raw Virtual Machine Capacity with Overheads and 20% Estimated Growth
37,455.6 GB / 4 hosts = 9,363.9 GB Total Raw Capacity per Host
9,363.9 GB / 2 disk groups = 4,681.95 GB per Disk Group
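The same calculation can be captured in a short Python sketch, which makes it easy to rerun the sizing with a different virtual machine footprint, overhead assumption, or host count. The function and parameter names are illustrative; the default values come from Table 3 and the design assumptions stated above.

```python
# Minimal sketch of the vSAN capacity-tier sizing calculation above.
# Parameter names are illustrative; values come from Table 3 and the
# stated design assumptions (FTT=1 with RAID 1, 30% overheads, 20% growth).

def vsan_capacity_per_disk_group(disk_gb, swap_gb, hosts=4,
                                 disk_groups_per_host=2,
                                 ftt_copies=2,       # FTT=1, RAID 1 mirror
                                 overhead=0.30,      # vSAN overheads
                                 growth=0.20):       # estimated growth
    raw = disk_gb + swap_gb               # VM raw capacity requirement
    total = raw * ftt_copies              # mirrored copies
    total *= (1 + overhead)               # slack, dedupe, metadata, checksums
    total *= (1 + growth)                 # headroom for growth
    per_host = total / hosts
    return per_host / disk_groups_per_host

print(f"{vsan_capacity_per_disk_group(11_372, 633):,.2f} GB per disk group")
# Prints: 4,681.95 GB per disk group
```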
Table 4. Design Decisions on vSAN Disk Configuration

SDDC-MGMT-VI-SDS-004

  Design Decision: Use a 600 GB or greater flash-based drive for the cache tier in each disk group.

  Design Justification: Provides enough cache for both hybrid and all-flash vSAN configurations to buffer I/O and ensure disk group performance.

  Design Implication: Additional space in the cache tier does not increase performance. Larger flash disks can increase the initial host cost.

SDDC-MGMT-VI-SDS-005

  Design Decision: Provide at least 5 TB of flash-based drives for the capacity tier in each disk group.

  Design Justification: Provides enough capacity for the management virtual machines with a minimum of 30% overhead and 20% growth when the number of primary failures to tolerate is 1.

  Design Implication: None.
