vSAN storage policies define storage requirements for your virtual machines. These policies guarantee the required level of service for your VMs because they determine how storage is allocated to the VM.

VMware Cloud on AWS includes two vSAN datastores, one for the management VMs (vsanDatastore) and one for the workload VMs (WorkloadDatastore). Both datastores share the same underlying storage devices and consume from the same pool of free space.

Each virtual machine deployed to a vSAN datastore is assigned at least one virtual machine storage policy. You can assign storage policies when you create or reconfigure virtual machines.

Storage policies have availability attributes and advanced attributes.

Availability Attributes for vSAN VM Storage Policies

Site disaster tolerance
Defines the data redundancy method used by stretched clusters to handle a site failure. This attribute applies to stretched clusters. If you have a standard vSAN cluster, choose None (standard cluster).
The options are:
  • None (standard cluster)
  • Dual site mirroring (stretched cluster)
  • None - Keep data on primary (stretched cluster)
  • None - Keep data on secondary (stretched cluster)
Failures to tolerate
Defines the number of host and device failures that a virtual machine can tolerate. You can choose to have no data redundancy, or select a RAID configuration optimized for either performance (Mirroring) or capacity (Erasure Coding).
  • RAID-1 uses more disk space to place the components of objects but provides better performance for accessing the objects.
  • RAID-5/6 (Erasure Coding) uses less disk space, but the performance is reduced.
Table 1. RAID Configurations, FTT, and Host Requirements
RAID Configuration                      Failures to Tolerate (FTT)    Minimum Hosts Required
RAID-1 (Mirroring) (default setting)    1                             3
RAID-5 (Erasure Coding)                 1                             4
RAID-1 (Mirroring)                      2                             5
RAID-6 (Erasure Coding)                 2                             6
RAID-1 (Mirroring)                      3                             7
Note: VMs with FTT = 0 (No Data Redundancy) might experience data loss if there is a failure or if the VM becomes unresponsive.
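
The rows in Table 1 can be expressed as a small lookup. The following Python sketch is illustrative only: the function name min_hosts_required is hypothetical and is not part of any VMware API, and the 2 × FTT + 1 formula is simply the pattern that the three RAID-1 rows share.

```python
def min_hosts_required(raid: str, ftt: int) -> int:
    """Return the minimum host count listed in Table 1.

    Hypothetical helper for illustration; not a VMware API.
    """
    if raid == "RAID-1":
        if ftt not in (1, 2, 3):
            raise ValueError("Table 1 lists RAID-1 with FTT of 1, 2, or 3")
        # The mirroring rows in Table 1 (3, 5, 7 hosts) fit 2 * FTT + 1.
        return 2 * ftt + 1
    if raid == "RAID-5" and ftt == 1:
        return 4
    if raid == "RAID-6" and ftt == 2:
        return 6
    raise ValueError(f"Combination not listed in Table 1: {raid}, FTT={ftt}")


if __name__ == "__main__":
    for raid, ftt in [("RAID-1", 1), ("RAID-5", 1), ("RAID-1", 2),
                      ("RAID-6", 2), ("RAID-1", 3)]:
        print(raid, ftt, min_hosts_required(raid, ftt))
```

Combinations that the table does not list, such as RAID-5 with FTT = 2, raise an error rather than returning a guess.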

The Managed Storage Policy Profile determines the initial RAID configuration of a cluster. When a Managed Storage Policy Profile is applied to the cluster, the RAID configuration is updated automatically as the cluster size changes. See VMware Cloud on AWS Managed Storage Policy Profiles for details.

Advanced Attributes for vSAN VM Storage Policies

Number of disk stripes per object
Minimum number of capacity devices across which each replica of a virtual machine object is striped. A value higher than 1 might result in better performance, but also results in higher use of system resources. Default value is 1. Maximum value is 12. Change the default value only when recommended by VMware support.
IOPS limit for object
Defines the IOPS limit for an object, such as a VMDK. IOPS is calculated as the number of I/O operations, using a weighted size. If the system uses the default base size of 32 KB, a 64-KB I/O represents two I/O operations.

When calculating IOPS, read and write are considered equivalent, but cache hit ratio and sequentiality are not considered. If a disk’s IOPS exceeds the limit, I/O operations are throttled. If the IOPS limit for object is set to 0, IOPS limits are not enforced.

vSAN allows the object to double the rate of the IOPS limit during the first second of operation or after a period of inactivity.
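
The weighted IOPS calculation can be sketched in Python as follows. This is a rough illustration, not vSAN's implementation: the function names are hypothetical, and rounding a partial base-size unit up to a whole operation is an assumption, because the text above only gives the 64-KB example.

```python
import math

BASE_IO_SIZE_KB = 32  # default base size described above


def weighted_io_count(io_size_kb: float, base_kb: int = BASE_IO_SIZE_KB) -> int:
    """Number of operations one I/O counts as against the IOPS limit.

    Assumption: a partial base-size unit is rounded up, so any I/O
    counts as at least one operation.
    """
    if io_size_kb <= 0:
        raise ValueError("I/O size must be positive")
    return max(1, math.ceil(io_size_kb / base_kb))


def is_throttled(weighted_iops: int, iops_limit: int) -> bool:
    """Return True if the object exceeds its limit. A limit of 0 disables enforcement."""
    if iops_limit == 0:
        return False
    return weighted_iops > iops_limit


if __name__ == "__main__":
    # Reads and writes are treated the same, so only the size matters here.
    print(weighted_io_count(64))    # a 64-KB I/O counts as two operations
    print(is_throttled(1200, 1000))
```

The sketch does not model the temporary burst allowance described above.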

Object space reservation
This setting defines the percentage of the logical size of the virtual machine disk (VMDK) object that must be reserved (provisioned) when deploying virtual machines. The default reservation value in VMware Cloud on AWS is 0% (Thin provisioning). You can specify Thick provisioning to reserve capacity for larger-than-expected vSAN writes. However, the underlying VMDK structure remains the same as in the Thin provisioning configuration, and is not the same as the Thick provision eager zeroed provisioning model available on-premises.
Flash read cache reservation
This setting is ignored in VMware Cloud on AWS. In hybrid vSAN deployments, it designates how much flash capacity is reserved as read cache.
Disable object checksum
If the option is set to No, the object calculates checksum information to ensure the integrity of its data. If this option is set to Yes, the object does not calculate checksum information.

vSAN uses end-to-end checksum to ensure the integrity of data by confirming that each copy of a file is exactly the same as the source file. The system checks the validity of the data during read/write operations, and if an error is detected, vSAN repairs the data or reports the error.

If a checksum mismatch is detected, vSAN automatically repairs the data by overwriting the incorrect data with the correct data. Checksum calculation and error-correction are performed as background operations.

The default setting for all objects in the cluster is No, which means that checksum is enabled.

Force provisioning
If the option is set to Yes, the object is provisioned even if the Primary level of failures to tolerate, Number of disk stripes per object, and Flash read cache reservation policies specified in the storage policy cannot be satisfied by the datastore. Use this parameter in bootstrapping scenarios and during an outage when standard provisioning is no longer possible.

The default No is acceptable for most production environments. vSAN fails to provision a virtual machine when the policy requirements are not met, but it successfully creates the user-defined storage policy.

VMware Cloud on AWS Managed Storage Policy Profiles

When you create a cluster in your SDDC, VMware Cloud on AWS creates a managed storage policy profile that is applied as the default storage policy to VMs that you create in the cluster. This storage policy profile is named "VMC Workload Storage Policy - cluster name". The policy settings are configured to ensure that the cluster meets the requirements outlined in the Service Level Agreement for VMware Cloud on AWS (the SLA).

The managed storage policy settings are based on the cluster configuration as follows:

  • Single host SDDCs are not covered by the SLA. They use a No data redundancy policy.
  • Single-AZ clusters use thin provisioning and the failure tolerance depends on the size of the cluster and the host instance type:
    • Clusters using Elastic vSAN storage use the 1 failure - RAID-1 (Mirroring) policy, regardless of cluster size.
    • Clusters using non-Elastic vSAN storage containing 3 to 5 hosts use 1 failure - RAID-1 (Mirroring).
    • Clusters using non-Elastic vSAN storage containing 6 or more hosts use 2 failures - RAID-6 (Erasure Coding).
  • Stretched clusters use 1 failure - RAID-1 (Mirroring), but also have Site Disaster Tolerance set to Dual Site Mirroring.

Because the managed storage policy for non-Elastic vSAN clusters varies based on cluster size, adding or removing hosts triggers a storage policy reconfiguration whenever the new cluster size requires a different policy. For example, if you add a host to a cluster containing five i3.metal hosts, the storage policy for that cluster is reconfigured from 1 failure - RAID-1 (Mirroring) to 2 failures - RAID-6 (Erasure Coding). The reverse happens if the extra host is removed and the number of hosts is reduced from six to five.
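
The single-AZ selection rules and the resize behavior described above can be sketched in Python. This is a hypothetical illustration of the documented rules, not the service's actual logic: the function names are invented, stretched clusters are left out, and two-host clusters raise an error because the list above does not describe them.

```python
def managed_policy(host_count: int, elastic_vsan: bool = False) -> str:
    """Managed storage policy for a single-AZ cluster, per the list above.

    Hypothetical helper; not a VMware API.
    """
    if host_count < 1:
        raise ValueError("host_count must be at least 1")
    if host_count == 1:
        return "No data redundancy"
    if elastic_vsan:
        # Elastic vSAN clusters keep RAID-1 regardless of cluster size.
        return "1 failure - RAID-1 (Mirroring)"
    if 3 <= host_count <= 5:
        return "1 failure - RAID-1 (Mirroring)"
    if host_count >= 6:
        return "2 failures - RAID-6 (Erasure Coding)"
    raise ValueError("two-host clusters are not described in the list above")


def reconfiguration_needed(old_hosts: int, new_hosts: int,
                           elastic_vsan: bool = False) -> bool:
    """True if resizing the cluster changes the managed policy."""
    return (managed_policy(old_hosts, elastic_vsan)
            != managed_policy(new_hosts, elastic_vsan))


if __name__ == "__main__":
    print(managed_policy(5), "->", managed_policy(6))
    print(reconfiguration_needed(5, 6))
```

For a non-Elastic vSAN cluster, only a resize that crosses the five-to-six host boundary changes the result. Resizing from three to four hosts, for example, does not.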

Note: When you make a change to a cluster that triggers a managed storage policy reconfiguration, the reconfiguration temporarily requires additional storage. If the cluster is close to 75% storage capacity, this might trigger an EDRS scale out event, adding a host to the cluster. After the reconfiguration is completed, EDRS might not remove the additional host. Check your clusters after storage reconfiguration, and remove the additional host if necessary.

For a non-Elastic vSAN cluster with 6 or more hosts, you cannot remove a host if the cluster storage utilization is greater than 40% of the total storage capacity. For all other types of cluster, VMware strongly recommends that you do not remove a host if the cluster storage utilization is greater than 40% of the total storage capacity.
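
As a pre-check before removing a host, the 40% rule can be written as a short Python function. This is a sketch under stated assumptions: the function name and the three return values are hypothetical, and the capacity figures are values you supply yourself rather than values read from any VMware API.

```python
def host_removal_check(used_tb: float, total_tb: float, host_count: int,
                       elastic_vsan: bool = False) -> str:
    """Classify a host removal as 'allowed', 'not recommended', or 'blocked'.

    Hypothetical helper that applies the 40% utilization rule above.
    """
    if total_tb <= 0 or used_tb < 0:
        raise ValueError("capacity values must be positive")
    # "Greater than 40%" compared without floating-point division.
    over_threshold = used_tb * 100 > total_tb * 40
    if not over_threshold:
        return "allowed"
    if not elastic_vsan and host_count >= 6:
        # Non-Elastic vSAN clusters with 6 or more hosts: removal is refused.
        return "blocked"
    # All other cluster types: removal is possible but strongly discouraged.
    return "not recommended"


if __name__ == "__main__":
    print(host_removal_check(used_tb=45, total_tb=100, host_count=6))
    print(host_removal_check(used_tb=45, total_tb=100, host_count=4))
    print(host_removal_check(used_tb=30, total_tb=100, host_count=6))
```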

If you remove one or more hosts from a cluster, and that triggers a managed storage policy reconfiguration, the reconfiguration must complete before the host or hosts are removed. If your workloads use a large amount of storage, this reconfiguration could take anywhere from hours to days to complete. During this time, any hosts you have designated to be removed remain usable and you are still billed for host usage. After the storage policy reconfiguration completes, the host or hosts are removed and you are no longer billed for the host usage.

Note: Do not edit the managed storage policies that VMware Cloud on AWS creates for your clusters. If you rename the policy, it is no longer managed by VMware Cloud on AWS. If you edit the settings of the managed storage policy, your changes are overwritten at the next storage policy reconfiguration.

If you do not want to use the managed storage policy, you can define your own storage policy and assign it as the default for the workload datastore. See https://docs.vmware.com/en/VMware-vSphere/7.0/com.vmware.vsphere.vsan.doc/GUID-F52F0AE9-FB31-4236-B566-D9610B14C670.html.

VM Templates and Managed Storage Policies

If a VM template is associated with a VMware Cloud on AWS managed storage policy, the template's policy is not automatically updated if the cluster's policy is reconfigured. After the cluster's storage policy is reconfigured, the VM template compliance status is "Out of Date". To make the template policy status "Compliant", you must convert the template to a VM, reapply the VM storage policy, and then convert the VM back to a template.

When you deploy a VM from a template, VMware recommends that you select Datastore Default for the VM Storage Policy in order to ensure that the VM is deployed with the current cluster managed storage policy.

Storage Policies and SLA Requirements

When working with virtual machine storage policies, it's important to understand how they affect the consumption of storage capacity in the vSAN cluster and whether they meet the requirements defined in the Service Level Agreement for VMware Cloud on AWS (the SLA).

The managed storage policy is initially configured based on the number of hosts in the cluster. For example, a three-host cluster defaults to FTT=1 using the RAID-1 Mirroring policy. Clusters with six or more i3.metal hosts in a single AZ default to 2 failures - RAID-6 (Erasure Coding). You can create custom policies that align data availability with the needs of your underlying data, but workload VMs with storage policies that do not meet the requirements set forth in the Service Level Agreement may not qualify for SLA Credits. The VM Storage Policy must be configured with the appropriate level of protection. Ephemeral workloads may use the No Data Redundancy policy to save capacity, forgoing any SLA guarantees of availability.

Important:

When scaling an i3.metal cluster up from five to six hosts, the failure tolerance for the underlying policy must be updated to 2 failures - RAID-6 (Erasure Coding) or 2 failures - RAID-1 (Mirroring) to compensate for the larger failure pool. Clusters using the managed storage policy will be reconfigured automatically, but you must manually update any clusters that use custom policies. Continued use of a failure tolerance of 1 for this host configuration means that VMware cannot guarantee availability per the service definition guidance. Clusters of r5.metal hosts using Elastic vSAN are able to sustain the SLA with a failure tolerance of 1 for any cluster size of three hosts or more.
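
For clusters that use custom policies, the failure tolerance guidance above can be checked with a short Python sketch. It is an informal reading of this section, not the SLA itself: the function names are hypothetical, stretched clusters are not modeled, and clusters with fewer than three hosts are rejected because this section gives no rule for them.

```python
def min_ftt_for_sla(host_count: int, elastic_vsan: bool) -> int:
    """Minimum failures to tolerate suggested by this section.

    Hypothetical helper; consult the SLA for the authoritative rules.
    """
    if host_count < 3:
        raise ValueError("this sketch covers clusters of three or more hosts")
    if elastic_vsan:
        # Elastic vSAN clusters sustain the SLA with FTT = 1 at any size.
        return 1
    # Non-Elastic (for example, i3.metal) clusters need FTT = 2 from six hosts.
    return 2 if host_count >= 6 else 1


def policy_meets_sla(policy_ftt: int, host_count: int,
                     elastic_vsan: bool = False) -> bool:
    """True if a custom policy's FTT is at least the suggested minimum."""
    return policy_ftt >= min_ftt_for_sla(host_count, elastic_vsan)


if __name__ == "__main__":
    # A custom FTT = 1 policy stops qualifying when i3.metal grows to six hosts.
    print(policy_meets_sla(1, 5), policy_meets_sla(1, 6))
```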

For more information about designing and sizing considerations of storage policies, see the Administering VMware vSAN documentation.