You can manage the Elastic DRS policy for each SDDC cluster to optimize cluster scaling to meet your workloads' needs.

For any policy, scale-out is triggered when a cluster reaches the high threshold for any resource. Regardless of the policy you choose, the storage scale-out threshold cannot be set to greater than 80%. Scale-in is triggered only after all of the low thresholds have been reached. See How the Elastic DRS Algorithm Works for more information about Elastic DRS (EDRS) scale-out and scale-in logic. The VMware Cloud on AWS console has an Elasticity tab that displays a grid with a row showing the EDRS policy currently applied to each cluster. To view or edit policy details, expand a row.
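The trigger rule described above can be sketched in Python. This is a minimal illustration, not how the service is implemented. `Thresholds` and `edrs_decision` are hypothetical names, and utilization is assumed to be a percentage per resource. The sketch ignores the additional checks described in How the Elastic DRS Algorithm Works. The example thresholds are taken from the Optimize for Best Performance table later in this section.

```python
from dataclasses import dataclass


@dataclass
class Thresholds:
    """Hypothetical per-resource thresholds, in percent."""
    high: dict[str, float]  # scale-out thresholds, e.g. {"cpu": 90, ...}
    low: dict[str, float]   # scale-in thresholds; empty if the policy never scales in


def edrs_decision(util: dict[str, float], t: Thresholds) -> str:
    """Return "scale-out", "scale-in", or "none" for one utilization sample."""
    # Scale-out: ANY resource has reached its high threshold.
    if any(util[r] >= t.high[r] for r in t.high):
        return "scale-out"
    # Scale-in: ALL resources have reached their low thresholds.
    # A policy with no low thresholds (like the baseline) never scales in.
    if t.low and all(util[r] <= t.low[r] for r in t.low):
        return "scale-in"
    return "none"


# Example using the Optimize for Best Performance thresholds.
best_perf = Thresholds(
    high={"cpu": 90, "memory": 80, "storage": 80},
    low={"cpu": 50, "memory": 50, "storage": 20},
)
print(edrs_decision({"cpu": 92, "memory": 40, "storage": 10}, best_perf))  # scale-out
print(edrs_decision({"cpu": 45, "memory": 40, "storage": 15}, best_perf))  # scale-in
```

A single resource over its high threshold is enough to add hosts. Removing hosts needs every resource to be low, which keeps scale-in conservative.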

Note: In two-host SDDCs and stretched clusters with fewer than six hosts, only the Elastic DRS Baseline policy is available.
vSAN Component and Storage Utilization

Elastic DRS monitors vSAN component and storage utilization and sends warning notifications when utilization becomes high.

For the baseline policy, we notify you when vSAN component utilization exceeds 75% or storage utilization exceeds 70%. If vSAN component utilization exceeds 85% or storage utilization exceeds 80%, we also send you a "host addition" notification and auto-remediate the cluster by adding a host to maintain optimal SDDC health. Additional hosts are billed to your account at "on-demand" rates if you do not have an available subscription for the instance type and region.

For all other policies, we send a warning notification when vSAN component or resource utilization is within 10% of the resource threshold.
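The baseline-policy notification rules can be expressed as a small classifier. `baseline_alerts` is a hypothetical helper written for illustration. It takes utilization as percentages and models only the baseline thresholds quoted above, not the 10% warning margin used by other policies.

```python
def baseline_alerts(component_pct: float, storage_pct: float) -> list[str]:
    """Hypothetical classifier for baseline-policy vSAN alerts (percent inputs)."""
    alerts = []
    # Warning: vSAN component utilization above 75% or storage above 70%.
    if component_pct > 75 or storage_pct > 70:
        alerts.append("warning")
    # Host addition (auto-remediation): components above 85% or storage above 80%.
    if component_pct > 85 or storage_pct > 80:
        alerts.append("host-addition")
    return alerts


print(baseline_alerts(60, 65))  # []
print(baseline_alerts(80, 65))  # ['warning']
print(baseline_alerts(60, 82))  # ['warning', 'host-addition']
```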

Read VMware Knowledge Base article 74695 to learn more about vSAN component utilization.
The following Elastic DRS policies are available:
Elastic DRS Baseline
This is the default policy for a new cluster. It cannot be disabled and always applies to a cluster. If another policy is selected, it might add hosts faster than the baseline policy would, or enable automatic cluster scale-in operations (host removal), but the baseline policy always determines the upper threshold. All policies add hosts after storage utilization reaches 80% or vSAN component utilization reaches 85%. All policies will exceed the maximum host count if needed to maintain vSAN slack space. If an AWS Availability Zone failure occurs in a multi-AZ SDDC, hosts added to remediate the outage are considered replacements and are not billable. This policy has the following thresholds:
Resource   High Threshold     Low Threshold
CPU        N/A                N/A
Memory     N/A                N/A
Storage    80% utilization    N/A
Optimize for Best Performance
This policy adds hosts as needed to maintain performance and removes them only when resource consumption is significantly reduced. It does not remove hosts if it determines that the removal would degrade performance and force a near-term scale-out. It has the following thresholds:
Resource   High Threshold     Low Threshold
CPU        90% utilization    50% utilization
Memory     80% utilization    50% utilization
Storage    80% utilization    20% utilization
Optimize for Lowest Cost

When scaling in, this policy removes hosts quickly, maintaining baseline performance while keeping host count to a practical minimum. It removes hosts only if it anticipates that the resulting storage utilization will not force a scale-out in the near term. It has the following thresholds:

Resource   High Threshold     Low Threshold
CPU        90% utilization    60% utilization
Memory     80% utilization    60% utilization
Storage    80% utilization    40% utilization
Rapid Scaling

This policy adds multiple hosts at a time when needed for memory or CPU, and adds hosts incrementally when needed for storage. By default, hosts are added four at a time. You can specify a larger scale-out increment (8 or 12) if you need faster scaling for disaster recovery, Virtual Desktop Infrastructure (VDI), and similar use cases. As with any EDRS policy, scale-out time increases with increment size. When the increment is large (12 hosts), it can take up to 40 minutes to complete in some configurations.
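Here is a minimal sketch of how many hosts one Rapid Scaling scale-out event adds. It follows the rules above and the Scale Increment definition later in this section. `rapid_scale_out_hosts` is a hypothetical helper. It does not model the maximum cluster size or stretched clusters.

```python
ALLOWED_INCREMENTS = (4, 8, 12)  # 4 is the default


def rapid_scale_out_hosts(trigger: str, increment: int = 4) -> int:
    """Hosts added by one Rapid Scaling scale-out event (hypothetical helper)."""
    if increment not in ALLOWED_INCREMENTS:
        raise ValueError(f"increment must be one of {ALLOWED_INCREMENTS}")
    if trigger == "storage":
        # Storage-driven scale-out adds hosts one at a time.
        return 1
    if trigger in ("cpu", "memory"):
        # CPU- or memory-driven scale-out adds the whole increment at once.
        return increment
    raise ValueError(f"unknown trigger: {trigger!r}")


print(rapid_scale_out_hosts("memory"))      # 4
print(rapid_scale_out_hosts("cpu", 12))     # 12
print(rapid_scale_out_hosts("storage", 8))  # 1
```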

When scaling in, this policy removes hosts rapidly, maintaining baseline performance while keeping host count to a practical minimum. It does not remove hosts if it anticipates that doing so would degrade performance and force a near-term scale-out. Scale-in stops when the cluster reaches the minimum host count or the number of hosts in the scale-out increment has been removed. This policy has the following thresholds:

Resource   High Threshold     Low Threshold
CPU        80% utilization    50% utilization
Memory     80% utilization    50% utilization
Storage    80% utilization    40% utilization
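To compare the three predefined policies, their thresholds can be held in one table-like structure. The sketch below is an illustration only. `PRESETS`, `triggered_resources`, and `scale_in_eligible` are hypothetical names, and the values are transcribed from the policy tables above.

```python
# (high, low) thresholds in percent, transcribed from the policy tables above.
PRESETS = {
    "Optimize for Best Performance": {"cpu": (90, 50), "memory": (80, 50), "storage": (80, 20)},
    "Optimize for Lowest Cost": {"cpu": (90, 60), "memory": (80, 60), "storage": (80, 40)},
    "Rapid Scaling": {"cpu": (80, 50), "memory": (80, 50), "storage": (80, 40)},
}


def triggered_resources(policy: str, util: dict[str, float]) -> list[str]:
    """Resources at or above their high (scale-out) threshold."""
    return sorted(r for r, (high, _) in PRESETS[policy].items() if util[r] >= high)


def scale_in_eligible(policy: str, util: dict[str, float]) -> bool:
    """True only when every resource is at or below its low threshold."""
    return all(util[r] <= low for r, (_, low) in PRESETS[policy].items())


sample = {"cpu": 55, "memory": 55, "storage": 30}
for name in PRESETS:
    print(f"{name}: scale-in eligible = {scale_in_eligible(name, sample)}")
```

With this sample, only Optimize for Lowest Cost is eligible to scale in, because its 60% CPU and memory low thresholds are the only ones above 55%. Raising CPU to 85% in the same sample would trigger a CPU scale-out under Rapid Scaling (80% high threshold) but not under the other two policies (90%).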
Custom Managed EDRS Policy

This policy allows you to configure policy parameters independently to meet performance requirements while optimizing cost. You can set the high and low thresholds for all resources.

Scale-out is evaluated per resource type (CPU, memory, and storage). Scale-out is triggered when any of the high thresholds is reached, but scale-in requires all resource types to be below their low thresholds. Scale-out can be disabled for CPU and memory but not for storage. Scale-in can be disabled for the cluster.
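The Custom Managed EDRS Policy rules above can be sketched as follows. `CustomPolicy` and `custom_decision` are hypothetical names, and thresholds are percentages. The sketch assumes that a resource whose scale-out is disabled still takes part in the scale-in check. This section does not state that behavior.

```python
from dataclasses import dataclass


@dataclass
class CustomPolicy:
    """Hypothetical model of a Custom Managed EDRS Policy (percent thresholds)."""
    high: dict[str, float]
    low: dict[str, float]
    scale_out_disabled: frozenset[str] = frozenset()  # may contain "cpu" and/or "memory"
    scale_in_enabled: bool = True

    def __post_init__(self) -> None:
        # Scale-out can be disabled for CPU and memory, but never for storage.
        if not self.scale_out_disabled <= {"cpu", "memory"}:
            raise ValueError("scale-out can be disabled only for cpu and memory")


def custom_decision(util: dict[str, float], p: CustomPolicy) -> str:
    # Scale-out: any resource with scale-out enabled has reached its high threshold.
    if any(util[r] >= p.high[r] for r in p.high if r not in p.scale_out_disabled):
        return "scale-out"
    # Scale-in: only when enabled, and every resource is at or below its low threshold.
    if p.scale_in_enabled and all(util[r] <= p.low[r] for r in p.low):
        return "scale-in"
    return "none"


policy = CustomPolicy(
    high={"cpu": 85, "memory": 80, "storage": 75},
    low={"cpu": 30, "memory": 30, "storage": 30},
    scale_out_disabled=frozenset({"cpu"}),
)
print(custom_decision({"cpu": 95, "memory": 50, "storage": 50}, policy))  # none
print(custom_decision({"cpu": 50, "memory": 85, "storage": 50}, policy))  # scale-out
```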

By default, this policy adds multiple hosts in parallel when needed for memory or CPU, and adds hosts one at a time when needed for storage. As with any EDRS policy, scale-out time increases with increment size. When the increment is large (12 hosts), it can take up to 40 minutes to complete in some configurations. This policy has the following threshold ranges:

Resource   High Threshold Range                Low Threshold Range
CPU        60%-95% utilization (when enabled)  5%-60% utilization
Memory     60%-95% utilization (when enabled)  5%-60% utilization
Storage    70%-80% utilization                 5%-40% utilization
Note:

As a best practice, keep the gap between the high and low thresholds at 15 percentage points or more. Larger gaps make host add and remove events less likely. When setting scale-in thresholds, keep in mind that the data evacuation required when removing a host can temporarily impose additional load on the remaining hosts. Also be careful when configuring a Custom Managed EDRS Policy for a cluster whose size approaches the 6-host SLA threshold for moving between FTT=1 and FTT=2. Any change in FTT forces background rebuilds of data, which impose additional load on the cluster.
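Before saving a Custom Managed EDRS Policy, you could check candidate thresholds against the allowed ranges and the 15-point best practice. `check_custom_thresholds` is a hypothetical helper. It treats out-of-range values as errors and narrow gaps as advice. It represents disabled CPU or memory scale-out by omitting that resource from `high`.

```python
# (high range, low range) in percent for the Custom Managed EDRS Policy,
# transcribed from the threshold-range table above.
RANGES = {
    "cpu": ((60, 95), (5, 60)),
    "memory": ((60, 95), (5, 60)),
    "storage": ((70, 80), (5, 40)),
}
MIN_GAP = 15  # best-practice gap in percentage points (advisory only)


def check_custom_thresholds(high: dict[str, float], low: dict[str, float]) -> list[str]:
    """Return a list of problems; an empty list means the thresholds look fine."""
    problems = []
    if "storage" not in high:
        problems.append("error: storage scale-out cannot be disabled")
    for r, ((hmin, hmax), (lmin, lmax)) in RANGES.items():
        lo = low[r]
        if not lmin <= lo <= lmax:
            problems.append(f"error: {r} low threshold {lo}% is outside {lmin}-{lmax}%")
        if r not in high:
            continue  # scale-out disabled for this resource (cpu/memory only)
        hi = high[r]
        if not hmin <= hi <= hmax:
            problems.append(f"error: {r} high threshold {hi}% is outside {hmin}-{hmax}%")
        if hi - lo < MIN_GAP:
            problems.append(f"advice: {r} gap of {hi - lo} points is under {MIN_GAP}")
    return problems


print(check_custom_thresholds(
    high={"cpu": 80, "memory": 80, "storage": 80},
    low={"cpu": 40, "memory": 40, "storage": 30},
))  # []
```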

Elastic DRS policies are governed by three parameters:
Minimum cluster size
The smallest host count EDRS will scale in to regardless of resource utilization. When minimum cluster size is reached, EDRS can no longer perform a scale-in operation. You can still remove hosts manually as long as storage utilization remains below the minimum threshold and cluster size doesn't fall below minimum requirements (generally two hosts for a conventional cluster and six for a stretched cluster).
Maximum cluster size
The largest host count EDRS will scale out to regardless of resource utilization. Once maximum cluster size is reached, EDRS can no longer perform a scale-out operation for CPU or memory consumption, but can continue to add hosts for storage. You can always add hosts manually as long as cluster size doesn't exceed the maximum allowed for your organization.
Scale Increment
(Custom and Rapid Scaling policies only) The number of hosts added during a scale-out event or removed during a scale-in event for CPU and memory. The scale increment for storage is always a single host (1). In a conventional cluster (single AZ), the Custom Managed EDRS Policy supports increments of 1-6. In a stretched cluster, it supports even-numbered increments in the range 2-12.
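The three parameters can be illustrated with a short sketch. The increment rule for the Custom Managed EDRS Policy and the storage exception to the maximum cluster size come from the definitions above. `valid_custom_increment`, `hosts_after_scale_out`, and `hosts_after_scale_in` are hypothetical helpers. Clamping a CPU or memory scale-out to the maximum cluster size, which adds fewer hosts than the increment, is an assumption and not documented behavior.

```python
def valid_custom_increment(increment: int, stretched: bool) -> bool:
    """Custom Managed EDRS Policy increment: 1-6 (single AZ), even 2-12 (stretched)."""
    if stretched:
        return 2 <= increment <= 12 and increment % 2 == 0
    return 1 <= increment <= 6


def hosts_after_scale_out(current: int, increment: int, max_size: int, trigger: str) -> int:
    """Host count after one scale-out; any trigger other than storage is CPU/memory."""
    if trigger == "storage":
        # Storage always scales by one host and may exceed the maximum cluster size.
        return current + 1
    # CPU/memory: never exceed the maximum (assumption: a partial increment is added).
    return max(current, min(current + increment, max_size))


def hosts_after_scale_in(current: int, increment: int, min_size: int) -> int:
    """Host count after one scale-in; EDRS never goes below the minimum cluster size."""
    if current <= min_size:
        return current
    return max(current - increment, min_size)


print(hosts_after_scale_out(12, 4, 12, "storage"))  # 13
print(hosts_after_scale_in(5, 4, 3))                 # 3
```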

Procedure

  1. Log in to the VMware Cloud Console at https://vmc.vmware.com.
  2. Click Inventory > SDDCs, then pick an SDDC and click VIEW DETAILS.
  3. Select a cluster and specify the Elastic DRS policy you want it to use.

    On the card for the cluster, click ACTIONS and choose Edit Elastic DRS Settings. You can also start from the Elasticity tab: expand the row for the cluster to view or edit its policy details.

    The Elastic DRS Baseline policy has no parameters. For other policies, specify a Minimum cluster size of 2 or more and a Maximum cluster size consistent with your expected workload resource consumption. The Maximum cluster size applies to CPU and memory scaling. When needed to maintain storage capacity and ensure data durability, the service can add more hosts than the number specified in Maximum cluster size.

    The Custom Managed EDRS Policy provides default values for all resources. You can edit these to suit the needs of your workloads. You can also disable scaling for memory and CPU.

  4. Click SAVE.

What to do next

All EDRS policy changes are logged in the SDDC Activity Log.