Use this design decision list for reference related to shared storage, vSAN principal storage, and NFS supplemental storage in an environment with a single or multiple VMware Cloud Foundation instances. The design also considers whether an instance contains a single or multiple availability zones.

After you set up the physical storage infrastructure, the configuration tasks for most design decisions are automated in VMware Cloud Foundation. You must perform the configuration manually only for a limited number of decisions as noted in the design implication.

For full design details, see Shared Storage Design for the Management Domain.

vSAN Deployment Specification

Table 1. Design Decisions on Storage I/O Controller Configuration for vSAN

Decision ID

Design Decision

Design Justification

Design Implication

VCF-MGMT-VSAN-CFG-001

Ensure that the storage I/O controller that runs the vSAN disk groups is compatible with vSAN and has a minimum queue depth of 256.

Storage controllers with lower queue depths can cause performance and stability problems when running vSAN.

vSAN ReadyNode servers are configured with the right queue depths for vSAN.

Limits the number of compatible I/O controllers that can be used for storage.

VCF-MGMT-VSAN-CFG-002

Do not use the storage I/O controllers that are running vSAN disk groups for another purpose.

Running non-vSAN disks, for example, disks backing VMFS datastores, on a storage I/O controller that runs a vSAN disk group can impact vSAN performance.

If non-vSAN disks are required in ESXi hosts, you must have an additional storage I/O controller in the host.
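
A quick way to act on VCF-MGMT-VSAN-CFG-001 is to compare the queue depth reported for each storage I/O controller against the 256 minimum before deployment. The following Python sketch is illustrative only; the host names, controller models, and queue-depth values are hypothetical placeholders for data you would collect from your hardware vendor's tools or the vSAN compatibility guide.

# Minimal sketch: flag storage I/O controllers whose queue depth is below the
# 256 minimum required by VCF-MGMT-VSAN-CFG-001. The inventory is example data.
MIN_QUEUE_DEPTH = 256

controller_inventory = {
    "esxi-01.example.com": {"Vendor HBA A": 9548},
    "esxi-02.example.com": {"Vendor HBA B": 128},   # below the minimum
}

def queue_depth_violations(inventory, minimum=MIN_QUEUE_DEPTH):
    """Return (host, controller, depth) tuples that do not meet the minimum."""
    return [
        (host, controller, depth)
        for host, controllers in inventory.items()
        for controller, depth in controllers.items()
        if depth < minimum
    ]

for host, controller, depth in queue_depth_violations(controller_inventory):
    print(f"{host}: {controller} has queue depth {depth}, below the required {MIN_QUEUE_DEPTH}")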

Table 2. Design Decisions on vSAN Configuration

Decision ID

Design Decision

Design Justification

Design Implication

VCF-MGMT-VSAN-CFG-003

Configure vSAN in all-flash configuration in the default management cluster.

Meets the performance needs of the default management cluster.

Using high-speed magnetic disks in a hybrid vSAN configuration can provide satisfactory performance and is supported.

All vSAN disks must be flash disks, which might cost more than magnetic disks.

Table 3. Design Decisions on the vSAN Datastore for a Single VMware Cloud Foundation Instance

Decision ID

Design Decision

Design Justification

Design Implication

VCF-MGMT-VSAN-CFG-004

Provide the default management cluster with a minimum of 13.72 TB of raw capacity for vSAN.

The management virtual machines require at least 4.4 TB of raw storage (before setting FTT to 1) and 8.8 TB when using the default vSAN storage policy.

By allocating at least 13.72 TB, initially 30% of the space is reserved for vSAN internal operations and 20% of the space remains free, which you can use for additional growth of the management virtual machines.

If you scale the environment out with more workloads, additional storage is required in the management domain.

VCF-MGMT-VSAN-CFG-005

On the vSAN datastore, ensure that at least 30% of free space is always available.

When vSAN reaches 80% usage, a rebalance task is started which can be resource-intensive.

Increases the amount of available storage needed.
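
The sizing in VCF-MGMT-VSAN-CFG-004 and the free-space rule in VCF-MGMT-VSAN-CFG-005 follow from simple arithmetic: the default vSAN storage policy mirrors each object (RAID-1, failures to tolerate of 1), which doubles raw consumption, and the datastore must still keep at least 30% free. The following Python sketch reproduces that arithmetic with the figures from Table 3; the figures for multiple VMware Cloud Foundation instances in Table 4 work the same way, and the same free-space check with a 20% threshold applies to the supplemental NFS datastores in Table 15.

# Minimal sketch of the sizing arithmetic behind VCF-MGMT-VSAN-CFG-004 and the
# free-space rule in VCF-MGMT-VSAN-CFG-005, assuming the default vSAN storage
# policy (RAID-1 mirroring, failures to tolerate = 1).

def raw_consumed(usable_tb, ftt=1):
    """RAID-1 mirroring keeps ftt + 1 full copies of each object."""
    return usable_tb * (ftt + 1)

def free_fraction(datastore_tb, used_tb):
    return 1 - used_tb / datastore_tb

base_tb = 4.4                         # management VM footprint before protection (Table 3)
protected_tb = raw_consumed(base_tb)  # 8.8 TB with the default policy
datastore_tb = 13.72                  # minimum raw capacity from VCF-MGMT-VSAN-CFG-004

print(f"Raw consumption with FTT=1: {protected_tb:.1f} TB")
if free_fraction(datastore_tb, protected_tb) < 0.30:
    print("Less than 30% free: rebalancing can start once usage reaches 80%")
else:
    print(f"Free space on the datastore: {free_fraction(datastore_tb, protected_tb):.0%}")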

Table 4. Design Decisions on the vSAN Datastore for Multiple VMware Cloud Foundation Instances

Decision ID

Design Decision

Design Justification

Design Implication

VCF-MGMT-VSAN-CFG-006

Provide the default management cluster with a minimum of 19.86 TB of raw capacity for vSAN.

The management virtual machines require at least 6.36 TB of raw storage (before setting FTT to 1) and 12.73 TB when using the default vSAN storage policy.

By allocating at least 19.86 TB, initially 30% of the space is reserved for vSAN internal operations and 20% of the space remains free, which you can use for additional growth of the management virtual machines.

NFS is used as secondary shared storage for some management components, for example, for backups and log archives.

If you scale the environment out with more workloads, additional storage is required in the management domain.

Table 5. Design Decision on the vSAN Cluster Size in a Management Domain with a Single Availability Zone

Decision ID

Design Decision

Design Justification

Design Implication

VCF-MGMT-VSAN-CFG-007

The default management cluster requires a minimum of 4 ESXi hosts to support vSAN.

  • Having 4 ESXi hosts addresses the availability and sizing requirements.

  • You can take an ESXi host offline for maintenance or upgrades without impacting the overall vSAN cluster health.

The availability requirements for the management cluster might cause underutilization of the cluster's ESXi hosts.

Table 6. Design Decision on the vSAN Cluster Size in a Management Domain with Multiple Availability Zones

Decision ID

Design Decision

Design Justification

Design Implication

VCF-MGMT-VSAN-CFG-008

To support a vSAN stretched cluster, the default management cluster requires a minimum of 8 ESXi hosts (4 in each availability zone).

  • Having 8 ESXi hosts addresses the availability and sizing requirements.

  • You can take an availability zone offline for maintenance or upgrades without impacting the overall vSAN cluster health.

The capacity of the additional 4 hosts is not added to the capacity of the cluster. They are only used to provide additional availability.
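
One way to reason about the host-count minimums in Tables 5 and 6, assuming RAID-1 mirroring: tolerating one host failure requires 2 x 1 + 1 = 3 hosts, and one more host lets you take a host offline for maintenance while vSAN can still rebuild components, giving four hosts per availability zone. The following Python sketch is a simple restatement of that reasoning, not a sizing tool.

# Minimal sketch of the host-count reasoning behind VCF-MGMT-VSAN-CFG-007 and
# VCF-MGMT-VSAN-CFG-008, assuming RAID-1 mirroring: tolerating ftt host failures
# needs 2 * ftt + 1 hosts, plus one spare host so maintenance does not prevent
# vSAN from rebuilding components.

def min_hosts_per_zone(ftt=1, maintenance_spare=1):
    return 2 * ftt + 1 + maintenance_spare

def min_hosts_stretched(zones=2, ftt=1, maintenance_spare=1):
    return zones * min_hosts_per_zone(ftt, maintenance_spare)

print(min_hosts_per_zone())    # 4 ESXi hosts for a single availability zone (Table 5)
print(min_hosts_stretched())   # 8 ESXi hosts across two availability zones (Table 6)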

Table 7. Design Decisions on the vSAN Disk Groups per ESXi Host

Decision ID

Design Decision

Design Justification

Design Implication

VCF-MGMT-VSAN-CFG-009

Configure vSAN with a minimum of two disk groups per ESXi host.

Reduces the size of the fault domain and spreads the I/O load over more disks for better performance.

Multiple disk groups require more disks in each ESXi host.

Table 8. Design Decisions on the vSAN Disk Configuration for a Management Domain for a Single VMware Cloud Foundation Instance

Decision ID

Design Decision

Design Justification

Design Implication

VCF-MGMT-VSAN-CFG-010

For the caching tier in each disk group, use a flash-based drive of at least 600 GB.

Provides enough cache for both hybrid and all-flash vSAN configurations to buffer I/O and ensure disk group performance.

Additional space in the cache tier does not increase performance.

Larger flash disks can increase initial host cost.

VCF-MGMT-VSAN-CFG-011

Allocate at least 2.3 TB of flash-based drives for the capacity tier in each disk group.

Provides enough capacity for the management virtual machines with a minimum of 30% overhead and 20% growth headroom when the number of primary failures to tolerate is 1.

None.

Table 9. Design Decisions on the vSAN Disk Configuration for a Management Domain for Multiple VMware Cloud Foundation Instances

Decision ID

Design Decision

Design Justification

Design Implication

VCF-MGMT-VSAN-CFG-012

Allocate at least 3.31 TB of flash-based drives for the capacity tier in each disk group.

Provides enough capacity for the management virtual machines with a minimum of 30% overhead and 20% growth headroom when the number of primary failures to tolerate is 1.

None.
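
The capacity-tier sizes in Tables 8 and 9 can be cross-checked against the cluster-level raw capacity minimums in Tables 3 and 4. The following Python sketch assumes two disk groups per host (Table 7) and the four-host minimum from Table 5; adjust the counts for your own cluster.

# Minimal sketch relating the capacity-tier sizes in Tables 8 and 9 to the
# cluster-level raw capacity minimums in Tables 3 and 4. Assumes two disk
# groups per host (Table 7) and the four-host minimum from Table 5.

def cluster_raw_tb(capacity_per_disk_group_tb, disk_groups_per_host=2, hosts=4):
    return capacity_per_disk_group_tb * disk_groups_per_host * hosts

# Single VMware Cloud Foundation instance: 2.3 TB per disk group (Table 8).
print(f"{cluster_raw_tb(2.3):.2f} TB")    # 18.40 TB, above the 13.72 TB minimum in Table 3

# Multiple VMware Cloud Foundation instances: 3.31 TB per disk group (Table 9).
print(f"{cluster_raw_tb(3.31):.2f} TB")   # 26.48 TB, above the 19.86 TB minimum in Table 4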

Table 10. Design Decisions on the vSAN Storage Policy in a Management Domain with a Single Availability Zone

Decision ID

Design Decision

Design Justification

Design Implication

VCF-MGMT-VSAN-CFG-013

Use the default VMware vSAN storage policy.

Provides the level of redundancy that is needed in the management cluster.

Provides a level of performance that is sufficient for the individual management components.

You might need additional policies for third-party virtual machines hosted in these clusters because their performance or availability requirements might differ from what the default VMware vSAN policy supports.

VCF-MGMT-VSAN-CFG-014

Leave the default virtual machine swap file as a sparse object on VMware vSAN.

Sparse virtual swap files consume capacity on vSAN only as they are accessed. As a result, consumption on the vSAN datastore is reduced as long as virtual machines do not experience memory overcommitment, which would require the use of the virtual swap file.

None.
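
The capacity effect of VCF-MGMT-VSAN-CFG-014 is easy to illustrate: with thick-provisioned swap, each virtual machine pre-allocates swap space equal to its configured memory minus any memory reservation, whereas sparse swap objects consume that space only under memory overcommitment. The following Python sketch uses hypothetical virtual machine names and sizes.

# Minimal sketch of the capacity effect behind VCF-MGMT-VSAN-CFG-014.
# VM names, memory sizes, and reservations are hypothetical example data.

vms = [
    {"name": "vcenter-mgmt", "memory_gb": 28, "reservation_gb": 0},
    {"name": "nsx-mgr-01", "memory_gb": 48, "reservation_gb": 48},   # fully reserved
    {"name": "sddc-manager", "memory_gb": 16, "reservation_gb": 0},
]

thick_swap_gb = sum(vm["memory_gb"] - vm["reservation_gb"] for vm in vms)
print(f"Swap space pre-allocated with thick swap objects: {thick_swap_gb} GB")
print("Sparse swap objects consume this capacity only under memory overcommitment")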

Table 11. Design Decisions on the vSAN Storage Policy in a Management Domain with Multiple Availability Zones

Decision ID

Design Decision

Design Justification

Design Implication

VCF-MGMT-VSAN-CFG-015

Add the following setting to the default vSAN storage policy:

Secondary Failures to Tolerate = 1

Provides the necessary protection for virtual machines in each availability zone, with the ability to recover from an availability zone outage.

You might need additional policies if third-party virtual machines are to be hosted in these clusters because their performance or availability requirements might differ from what the default VMware vSAN policy supports.

VCF-MGMT-VSAN-CFG-016

Configure two fault domains, one for each availability zone. Assign each host to the fault domain for its availability zone.

Fault domains are mapped to availability zones to provide logical host separation and ensure a copy of vSAN data is always available even when an availability zone goes offline.

Additional raw storage is required when the secondary failure to tolerate option and fault domains are enabled.
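
The additional raw storage noted in this implication follows from the policy itself. Assuming RAID-1 mirroring both across availability zones (primary failures to tolerate of 1) and within each zone (secondary failures to tolerate of 1), every object is stored as two copies per zone, four copies in total. The following Python sketch applies that multiplier to the management footprint from Table 4; it is an illustration of the overhead, not a full sizing calculation.

# Minimal sketch of the raw-capacity effect noted in VCF-MGMT-VSAN-CFG-015 and
# VCF-MGMT-VSAN-CFG-016, assuming RAID-1 mirroring across availability zones
# (PFTT = 1) and within each zone (SFTT = 1).

def raw_multiplier(pftt=1, sftt=1):
    return (pftt + 1) * (sftt + 1)

usable_tb = 6.36   # management VM footprint before protection (Table 4)
print(f"Raw capacity consumed: {usable_tb * raw_multiplier():.2f} TB")   # 25.44 TB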

vSAN Network Design

Table 12. Design Decisions on the Virtual Switch Configuration for vSAN

Decision ID

Design Decision

Design Justification

Design Implication

VCF-MGMT-VSAN-NET-001

Use the existing vSphere Distributed Switch instances in the default management cluster.

Provides guaranteed performance for vSAN traffic, if network contention occurs, by using existing networking components.

All traffic paths are shared over common uplinks.

VCF-MGMT-VSAN-NET-002

Configure jumbo frames on the VLAN for vSAN traffic.

  • Simplifies configuration because jumbo frames are also used to improve the performance of vSphere vMotion and NFS storage traffic.

  • Reduces the CPU overhead resulting from high network usage.

Every device in the network must support jumbo frames.
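
In VMware Cloud Foundation, the MTU of the management domain distributed switch is normally applied automatically during bring-up, and every physical device in the path must also be configured for jumbo frames. Purely as an illustration of the vSphere-side setting behind VCF-MGMT-VSAN-NET-002, the following hedged pyVmomi sketch raises the MTU of an existing vSphere Distributed Switch to 9000. The vCenter Server address, credentials, and switch name are placeholders; verify the property and method names against the vSphere API reference for your release.

# Hedged sketch: set an MTU of 9000 on an existing vSphere Distributed Switch
# with pyVmomi. Connection details and the switch name are placeholders.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

ctx = ssl._create_unverified_context()   # lab only; validate certificates in production
si = SmartConnect(host="vcenter.example.com", user="administrator@vsphere.local",
                  pwd="changeme", sslContext=ctx)
content = si.RetrieveContent()

# Find the distributed switch by name (placeholder name).
view = content.viewManager.CreateContainerView(
    content.rootFolder, [vim.DistributedVirtualSwitch], True)
dvs = next(d for d in view.view if d.name == "mgmt-vds01")

# Reconfigure the switch with a jumbo-frame MTU.
spec = vim.dvs.VmwareDistributedVirtualSwitch.ConfigSpec()
spec.configVersion = dvs.config.configVersion   # required for any reconfiguration
spec.maxMtu = 9000                              # jumbo frames for vSAN, vMotion, and NFS traffic
dvs.ReconfigureDvs_Task(spec)

Disconnect(si)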

vSAN Witness Design

Table 13. Design Decisions for the vSAN Witness Appliance for Multiple Availability Zones

Decision ID

Design Decision

Design Justification

Design Implication

VCF-MGMT-VSAN-WTN-001

Deploy a vSAN witness appliance in a location that is not local to the ESXi hosts in any of the availability zones.

The witness appliance has the following functions:

  • Acts as a tiebreaker if network isolation between the availability zones occurs.

  • Hosts the witness components that are required to form the RAID-1 configuration on vSAN, that is, one copy of the data in each availability zone and the witness components at the witness site.

A third, physically separate location that runs a vSphere environment is required. Another VMware Cloud Foundation instance in a separate physical location might be an option.

VCF-MGMT-VSAN-WTN-002

Deploy a medium-size witness appliance.

A medium-size witness appliance supports up to 500 virtual machines, which is sufficient for high availability of the management components of the SDDC.

The vSphere environment at the witness location must satisfy the resource requirements of the witness appliance.

Table 14. Design Decisions on the Network Configuration of the vSAN Witness Appliance for Multiple Availability Zones

Decision ID

Design Decision

Design Justification

Design Implication

VCF-MGMT-VSAN-WTN-003

Connect the first VMkernel adapter of the vSAN witness appliance to the management network in the witness site.

Connects the witness appliance to the vCenter Server instance and ESXi hosts in both availability zones.

The management networks in both availability zones must be routed to the management network in the witness site.

VCF-MGMT-VSAN-WTN-004

Configure the vSAN witness appliance to use the first VMkernel adapter, that is, the management interface, for vSAN witness traffic.

Separates the witness traffic from the vSAN data traffic. Witness traffic separation provides the following benefits:

  • Removes the requirement to have static routes from the vSAN networks in both availability zones to the witness site.

  • Removes the requirement to have jumbo frames enabled on the path between both availability zones and the witness site because witness traffic can use a regular MTU size of 1500 bytes.

The management networks in both availability zones must be routed to the management network in the witness site.

VCF-MGMT-VSAN-WTN-005

Place witness traffic on the management VMkernel adapter of all the ESXi hosts in the management domain.

Separates the witness traffic from the vSAN data traffic. Witness traffic separation provides the following benefits:

  • Removes the requirement to have static routes from the vSAN networks in both availability zones to the witness site.

  • Removes the requirement to have jumbo frames enabled on the path between both availability zones and the witness site because witness traffic can use a regular MTU size of 1500 bytes.

The management networks in both availability zones must be routed to the management network in the witness site.

VCF-MGMT-VSAN-WTN-006

Allocate a statically assigned IP address and host name to the management adapter of the vSAN witness appliance.

Simplifies maintenance and tracking, and implements a DNS configuration.

Requires precise IP address management.

VCF-MGMT-VSAN-WTN-007

Configure forward and reverse DNS records for the vSAN witness appliance, assigning the records to the child domain for the VMware Cloud Foundation instance.

Enables connecting the vSAN witness appliance to the management domain vCenter Server by FQDN instead of IP address.

You must provide DNS records for the vSAN witness appliance.

VCF-MGMT-VSAN-WTN-008

Configure time synchronization by using an internal NTP time source for the vSAN witness appliance.

Prevents failures in the stretched cluster configuration that are caused by a time mismatch between the vSAN witness appliance, the ESXi hosts in both availability zones, and the management domain vCenter Server.

  • An operational NTP service must be available in the environment.

  • All firewalls between the vSAN witness appliance and the NTP servers must allow NTP traffic on the required network ports.
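
Before deploying the witness appliance, you can confirm the static IP address and the forward and reverse DNS records that VCF-MGMT-VSAN-WTN-006 and VCF-MGMT-VSAN-WTN-007 call for. The following Python sketch uses only the standard library; the FQDN and IP address are placeholders for your own values.

# Minimal sketch: verify the forward and reverse DNS records for the vSAN
# witness appliance (VCF-MGMT-VSAN-WTN-006/007). FQDN and IP are placeholders.
import socket

witness_fqdn = "vsan-witness.mgmt.example.com"   # placeholder FQDN
expected_ip = "192.0.2.201"                      # placeholder static IP address

forward_ip = socket.gethostbyname(witness_fqdn)
reverse_fqdn = socket.gethostbyaddr(expected_ip)[0]

print("forward record OK" if forward_ip == expected_ip
      else f"forward record mismatch: {forward_ip}")
print("reverse record OK" if reverse_fqdn.rstrip(".").lower() == witness_fqdn.lower()
      else f"reverse record mismatch: {reverse_fqdn}")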

NFS Deployment Specification

Table 15. Design Decisions on Supplemental NFS Storage Sizing

Decision ID

Design Decision

Design Justification

Design Implication

VCF-MGMT-NFS-CFG-001

Ensure that at least 20% of free space is always available on all non-vSAN datastores.

If a datastore runs out of free space, applications and services in the management domain running on the NFS datastores fail.

Monitoring and capacity management must be proactive operations.

Table 16. Design Decisions on the NFS Version

Decision ID

Design Decision

Design Justification

Design Implication

VCF-MGMT-NFS-CFG-002

Use NFS version 3 for all NFS datastores.

You cannot use Storage I/O Control with NFS version 4.1 datastores.

NFS version 3 does not support Kerberos authentication.
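
As a hedged illustration of VCF-MGMT-NFS-CFG-002, the following pyVmomi sketch mounts an NFS version 3 export as a datastore on each ESXi host in the inventory. The vCenter Server address, credentials, NFS server, export path, and datastore name are placeholders, and the property names should be verified against the vSphere API reference for your release.

# Hedged sketch: mount an NFS version 3 export as a datastore with pyVmomi.
# All names, paths, and credentials below are placeholders.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

ctx = ssl._create_unverified_context()   # lab only; validate certificates in production
si = SmartConnect(host="vcenter.example.com", user="administrator@vsphere.local",
                  pwd="changeme", sslContext=ctx)
content = si.RetrieveContent()

view = content.viewManager.CreateContainerView(content.rootFolder, [vim.HostSystem], True)
for host in view.view:
    spec = vim.host.NasVolume.Specification()
    spec.remoteHost = "nfs-array.example.com"   # NFS server exporting the volume
    spec.remotePath = "/exports/mgmt-backup"    # dedicated backup export (see Table 19)
    spec.localPath = "mgmt-backup01"            # datastore name as seen by vSphere
    spec.accessMode = "readWrite"
    spec.type = "NFS"                           # NFS version 3, per VCF-MGMT-NFS-CFG-002
    host.configManager.datastoreSystem.CreateNasDatastore(spec)

Disconnect(si)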

Table 17. Design Decisions on NFS Hardware

Decision ID

Design Decision

Design Justification

Design Implication

VCF-MGMT-NFS-CFG-003

  • Consider 10K SAS drives the baseline performance requirement. Greater performance might be needed depending on the scale and growth profile of the environment.

  • Consider the number and performance of disks backing supplemental storage NFS volumes.

10K SAS drives provide a balance between performance and capacity. You can use faster drives. Backups based on vStorage APIs for Data Protection require high-performance datastores to meet backup SLAs.

10K SAS drives are more expensive than other alternatives.

Table 18. Design Decision on vStorage APIs for Array Integration

Decision ID

Design Decision

Design Justification

Design Implication

VCF-MGMT-NFS-CFG-004

Select an array that supports vStorage APIs for Array Integration (VAAI) over NAS (NFS).

  • VAAI offloads tasks to the array itself, enabling the ESXi hypervisor to use its resources for application workloads and not become a bottleneck in the storage subsystem.

  • VAAI is required to support the target number of virtual machine life cycle operations in this design.

Not all arrays support VAAI over NFS. For arrays that support VAAI, you must install a plug-in from the array vendor to enable VAAI over NFS.

Table 19. Design Decisions on NFS Volume Assignment

Decision ID

Design Decision

Design Justification

Design Implication

VCF-MGMT-NFS-CFG-005

Use a dedicated NFS volume to support image-level backup requirements.

The backup and restore process is I/O intensive. Using a dedicated NFS volume ensures that the process does not impact the performance of other management components.

Dedicated volumes add management overhead for storage administrators. Dedicated volumes might use more disks, depending on the array and the type of RAID.

VCF-MGMT-NFS-CFG-006

Use a shared volume for other management component datastores.

Non-backup related management applications can share a common volume because of the lower I/O profile of these applications.

Enough storage space for shared volumes and their associated application data must be available.

Table 20. Design Decisions on NFS Exports

Decision ID

Design Decision

Design Justification

Design Implication

VCF-MGMT-NFS-CFG-007

For each export, limit access to only the application virtual machines or hosts that require the ability to mount the storage.

Limiting access helps ensure the security of the underlying data.

Securing exports individually can introduce operational overhead.

Table 21. Design Decisions on Storage Policies and Controls

Decision ID

Design Decision

Design Justification

Design Implication

VCF-MGMT-NFS-CFG-008

Enable Storage I/O Control with the default values on all supplemental NFS datastores.

Ensures that all virtual machines on a datastore receive an equal amount of I/O capacity.

Virtual machines that use more I/O are throttled to allow other virtual machines access to the datastore only when contention occurs on the datastore.