Manage Elasticity in SDDC Clusters

You can manage the Elastic DRS policy for each SDDC cluster to optimize cluster scaling to meet your workloads' needs.

For any policy, scale-out is triggered when a cluster reaches the high threshold for any resource. Regardless of the policy you choose, the storage scale-out threshold cannot be set to greater than 80%. Scale-in is triggered only after all of the low thresholds have been reached. See How the Elastic DRS Algorithm Works for more information about Elastic DRS (EDRS) scale-out and scale-in logic. The VMware Cloud on AWS console has an Elasticity tab that displays a grid with a row showing the EDRS policy currently applied to each cluster. To view or edit policy details, expand a row.

Note: In two-host SDDCs and stretched clusters with fewer than six hosts, only the Elastic DRS Baseline policy is available.
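The trigger rule described above (any high threshold starts a scale-out, while every low threshold must be reached before a scale-in) can be sketched in a few lines of Python. This is an illustrative model only, not the EDRS implementation: the names `Thresholds`, `LOWEST_COST`, and `edrs_decision` are hypothetical, the comparison operators (at or above, at or below) are assumptions, and the real algorithm also considers factors such as cluster size limits and the timing of recent scaling events. The threshold values are copied from the Optimize for Lowest Cost table later in this topic.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Thresholds:
    high: Optional[float]  # None models "N/A" (no scale-out trigger)
    low: Optional[float]   # None models "N/A" (no scale-in trigger)


# Values copied from the "Optimize for Lowest Cost" table in this topic.
LOWEST_COST: Dict[str, Thresholds] = {
    "cpu": Thresholds(high=90, low=60),
    "memory": Thresholds(high=80, low=60),
    "storage": Thresholds(high=80, low=40),
}


def edrs_decision(utilization: Dict[str, float],
                  policy: Dict[str, Thresholds]) -> str:
    """Return 'scale-out', 'scale-in', or 'none' (hypothetical helper)."""
    # Scale-out: ANY resource at or above its high threshold (assumed >=).
    if any(t.high is not None and utilization[r] >= t.high
           for r, t in policy.items()):
        return "scale-out"
    # Scale-in: ALL resources at or below their low thresholds (assumed <=).
    # A resource with no low threshold (N/A) blocks scale-in entirely.
    if policy and all(t.low is not None and utilization[r] <= t.low
                      for r, t in policy.items()):
        return "scale-in"
    return "none"
```

In this model, a cluster at 92% CPU scales out even if memory and storage are nearly idle, whereas a cluster at 55% CPU and 35% storage does not scale in while memory sits at 70%.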

vSAN Component and Storage Utilization

Elastic DRS monitors vSAN component and storage utilization and sends warning notifications when utilization becomes high.

For the baseline policy, we notify you when vSAN component utilization exceeds 75% or storage utilization exceeds 70%. If vSAN component utilization exceeds 85% or storage utilization exceeds 80%, we also send you a "host addition" notification and auto-remediate the cluster by adding a host to maintain optimal SDDC health. If you do not have an available subscription for the instance type and region, you must purchase one within 48 hours. If a host is added beyond your available subscriptions and you do not purchase sufficient subscription coverage within 48 hours, you will lose access to your SDDC and workloads until you do. After 48 hours of excess usage, the SDDC is isolated, and soon after that, any hosts or SDDCs not covered by a subscription are deleted to bring your usage in line with your existing subscription purchases.

For all other policies, we send a warning notification when vSAN component or resource utilization is within 10% of the resource threshold.

Read VMware Knowledge Base article 74695 to learn more about vSAN component utilization.
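As a rough illustration of the baseline notification levels described above, the following Python sketch maps vSAN component and storage utilization to an outcome. The function name `baseline_vsan_action` and its return labels are hypothetical. The strict "exceeds" comparisons follow the wording of this section; the service may treat the exact boundary values differently.

```python
def baseline_vsan_action(component_pct: float, storage_pct: float) -> str:
    """Classify utilization under the baseline policy (illustrative only)."""
    # Above 85% component or 80% storage utilization: a "host addition"
    # notification is sent and a host is added automatically.
    if component_pct > 85 or storage_pct > 80:
        return "add-host"
    # Above 75% component or 70% storage utilization: warning only.
    if component_pct > 75 or storage_pct > 70:
        return "warn"
    return "ok"
```

For example, a cluster at 76% component utilization and 10% storage utilization falls in the warning band, while 10% component utilization with 81% storage utilization triggers a host addition.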

The following Elastic DRS policies are available:

Elastic DRS Baseline

This is the default policy for a new cluster. It cannot be disabled and always applies to a cluster. If another policy is selected, it might add hosts faster than the baseline policy would, or enable automatic cluster scale-in operations (host removal), but the baseline policy always determines the upper threshold. All policies add hosts after storage utilization reaches 80% or vSAN component utilization reaches 85%. All policies will exceed the maximum host count if needed to maintain vSAN slack space. If an AWS Availability Zone failure occurs in a multi-AZ SDDC, hosts added to remediate the outage are considered replacements and are not billable. This policy has the following thresholds:


Resource	High Threshold	Low Threshold
CPU	N/A	N/A
Memory	N/A	N/A
Storage	80% utilization	N/A

Optimize for Best Performance

This policy adds hosts as needed to maintain performance and removes them only when resource consumption is significantly reduced. It does not remove hosts if it determines that the removal would degrade performance and force a near-term scale-out. It has the following thresholds:


Resource	High Threshold	Low Threshold
CPU	90% utilization	50% utilization
Memory	80% utilization	50% utilization
Storage	80% utilization	20% utilization

Optimize for Lowest Cost

When scaling in, this policy removes hosts quickly to maintain baseline performance while keeping host counts to a practical minimum. It removes hosts only if it anticipates that storage utilization after host removal would not trigger a scale-out in the near term. It has the following thresholds:


Resource	High Threshold	Low Threshold
CPU	90% utilization	60% utilization
Memory	80% utilization	60% utilization
Storage	80% utilization	40% utilization

Rapid Scaling

This policy adds multiple hosts at a time when needed for memory or CPU as long as sufficient subscriptions are available for the hosts. By default, hosts are added four at a time. You can specify a larger scale-out increment (8 or 12) if you need faster scaling for disaster recovery, Virtual Desktop Infrastructure (VDI), and similar use cases. As with any EDRS policy, scale-out time increases with increment size. When the increment is large (12 hosts), it can take up to 40 minutes to complete in some configurations.

Rapid scaling does not apply to storage. EDRS adds hosts incrementally when needed for storage.

When scaling in, this policy removes hosts rapidly, maintaining baseline performance while keeping host count to a practical minimum. It does not remove hosts if it anticipates that doing so would degrade performance and force a near-term scale-out. Scale-in stops when the cluster reaches the minimum host count or the number of hosts in the scale-out increment has been removed. This policy has the following thresholds:


Resource	High Threshold	Low Threshold
CPU	80% utilization	50% utilization
Memory	80% utilization	50% utilization
Storage	80% utilization	40% utilization

Custom Managed EDRS Policy

This policy allows you to configure policy parameters independently to ensure performance standards while optimizing the cost. You can set the high and low thresholds for all resources.

Scale-out is based on resource type (CPU, memory, and storage). Scale-out is triggered when any of the high thresholds has been reached, but scale-in requires all resource types to have reached their low thresholds. Scale-out can be disabled for CPU and memory but not for storage. Scale-in can be disabled on the cluster.

By default, this policy adds multiple hosts in parallel when needed for memory or CPU, and adds hosts one at a time when needed for storage. As with any EDRS policy, scale-out time increases with increment size. When the increment is large (12 hosts), it can take up to 40 minutes to complete in some configurations. This policy has the following threshold ranges:


Resource	High Threshold Range	Low Threshold Range
CPU	60%-95% utilization when enabled	5%-60% utilization
Memory	60%-95% utilization when enabled	5%-60% utilization
Storage	70%-80% utilization	5%-40% utilization

Note:

As a best practice, the gap between high and low thresholds should not be less than 15 percentage points. Larger gaps are less likely to force add/remove events for hosts. When setting scale-in thresholds, keep in mind that the data evacuation required when removing a host can temporarily impose additional load on the remaining hosts. Also be careful when configuring a Custom Managed EDRS Policy for a cluster whose size approaches the 6-host SLA threshold for moving between FTT=1 and FTT=2. Any change in FTT forces background rebuilds of data, which impose additional load on the cluster.
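Before saving a custom policy, it can help to check proposed thresholds against the ranges in the table above and the 15-point gap recommendation. The following Python sketch does this. It is a hypothetical helper, not part of any VMware tool or API: `RANGES`, `MIN_GAP`, and `check_custom_policy` are illustrative names, and a high threshold of `None` is an assumed way to represent "scale-out disabled" for CPU or memory.

```python
from typing import Dict, List, Optional, Tuple

# (high range, low range) per resource, copied from the table in this topic.
RANGES = {
    "cpu": ((60, 95), (5, 60)),
    "memory": ((60, 95), (5, 60)),
    "storage": ((70, 80), (5, 40)),
}
MIN_GAP = 15  # recommended minimum gap, in percentage points


def check_custom_policy(
    policy: Dict[str, Tuple[Optional[float], float]]
) -> Tuple[List[str], List[str]]:
    """Return (errors, warnings) for {resource: (high, low)} thresholds."""
    errors: List[str] = []
    warnings: List[str] = []
    for res, ((hmin, hmax), (lmin, lmax)) in RANGES.items():
        high, low = policy[res]
        if high is None:
            # Scale-out can be disabled for CPU and memory, not for storage.
            if res == "storage":
                errors.append("storage: scale-out cannot be disabled")
        elif not hmin <= high <= hmax:
            errors.append(f"{res}: high {high}% outside {hmin}%-{hmax}%")
        if not lmin <= low <= lmax:
            errors.append(f"{res}: low {low}% outside {lmin}%-{lmax}%")
        if high is not None and high - low < MIN_GAP:
            warnings.append(f"{res}: gap {high - low} is below {MIN_GAP} points")
    return errors, warnings
```

For instance, a CPU pair of 60% high and 50% low is within range but produces a warning, because the 10-point gap makes frequent add/remove events more likely.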

Elastic DRS policies are governed by three variables:

Minimum cluster size: The smallest host count EDRS will scale in to regardless of resource utilization. When minimum cluster size is reached, EDRS can no longer perform a scale-in operation. You can still remove hosts manually as long as storage utilization remains below the minimum threshold and cluster size doesn't fall below minimum requirements (generally two hosts for a conventional cluster and six for a stretched cluster).
Maximum cluster size: The largest host count EDRS will scale out to regardless of resource utilization. Once maximum cluster size is reached, EDRS can no longer perform a scale-out operation for CPU or memory consumption, but can continue to add hosts for storage. You can always add hosts manually as long as cluster size doesn't exceed the maximum allowed for your organization.
Scale Increment: (Custom and Rapid Scaling policies only) The number of hosts added during a scale-out event or removed during a scale-in event for CPU and memory. The scale increment for storage is always a single host (1). In a conventional cluster (single AZ) the Custom Managed EDRS Policy supports increments of 1-6. In a stretched cluster, it supports even-numbered increments in the range 2-12.
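The three variables above interact in a way that is easy to model. The Python sketch below checks a Custom Managed EDRS Policy scale increment against the documented ranges and estimates the host count after a scaling event. All function names are hypothetical. Two behaviors are assumptions rather than documented facts: that a CPU or memory scale-out adds a partial increment when the full increment would exceed the maximum cluster size, and that a scale-in removes a partial increment when the full increment would go below the minimum cluster size.

```python
def valid_custom_increment(increment: int, stretched: bool) -> bool:
    """Check a Custom Managed EDRS Policy scale increment."""
    if stretched:
        # Stretched clusters: even-numbered increments in the range 2-12.
        return 2 <= increment <= 12 and increment % 2 == 0
    # Conventional (single-AZ) clusters: increments of 1-6.
    return 1 <= increment <= 6


def hosts_after_scale_out(current: int, increment: int,
                          max_size: int, for_storage: bool) -> int:
    """Estimate host count after one scale-out event (illustrative)."""
    if for_storage:
        # Storage scale-out adds a single host and may exceed max_size.
        return current + 1
    # ASSUMPTION: CPU/memory scale-out is capped at max_size, and never
    # shrinks a cluster that storage scale-out already pushed past it.
    return max(current, min(current + increment, max_size))


def hosts_after_scale_in(current: int, increment: int, min_size: int) -> int:
    """Estimate host count after one scale-in event (illustrative)."""
    # ASSUMPTION: scale-in stops at min_size rather than being skipped.
    return min(current, max(current - increment, min_size))
```

Under these assumptions, a 10-host cluster with an increment of 4 and a maximum of 12 grows to 12 hosts for CPU or memory pressure, while a 12-host cluster at its maximum can still grow to 13 hosts for storage.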

Procedure

Log in to the VMware Cloud Console at https://vmc.vmware.com.
Click Inventory > SDDCs, then pick an SDDC and click VIEW DETAILS.
Select a cluster and specify the Elastic DRS policy you want it to use.

On the card for the cluster, click ACTIONS and choose Edit Elastic DRS Settings. You can also start by opening the Elasticity tab, which displays a grid with a row showing the EDRS policy currently applied to each cluster. To view or edit policy details, expand the row.

The Elastic DRS Baseline policy has no parameters. For other policies, specify a Minimum cluster size of 2 or more and a Maximum cluster size consistent with your expected workload resource consumption. The Maximum cluster size applies to CPU and Memory. When needed to maintain storage capacity and ensure data durability, the service can add more hosts than the number specified in Maximum cluster size.

The Custom Managed EDRS Policy provides default values for all resources. You can edit these to suit the needs of your workloads. You can also disable scaling for memory and CPU.
Click SAVE.

What to do next

All EDRS policy changes are logged in the SDDC Activity Log.