A cluster is a collection of ESXi hosts and associated virtual machines with shared resources and a shared management interface. Before you can obtain the benefits of cluster-level resource management you must create a cluster and activate DRS.
Depending on whether or not Enhanced vMotion Compatibility (EVC) is activated, DRS behaves differently when you use vSphere Fault Tolerance (vSphere FT) virtual machines in your cluster.
| EVC | DRS (Load Balancing) | DRS (Initial Placement) |
|---|---|---|
| Enabled | Enabled (Primary and Secondary VMs) | Enabled (Primary and Secondary VMs) |
| Disabled | Disabled (Primary and Secondary VMs) | Disabled (Primary VMs) Fully Automated (Secondary VMs) |
Admission Control and Initial Placement
When you attempt to power on a single virtual machine or a group of virtual machines in a DRS-enabled cluster, vCenter Server performs admission control. It checks that there are enough resources in the cluster to support the virtual machine(s).
If the cluster does not have sufficient resources to power on a single virtual machine, or any of the virtual machines in a group power-on attempt, a message appears. Otherwise, for each virtual machine, DRS generates a recommendation of a host on which to run the virtual machine and takes one of the following actions:
- Automatically executes the placement recommendation.
- Displays the placement recommendation, which the user can then choose to accept or override.
Note: No initial placement recommendations are given for virtual machines on standalone hosts or in non-DRS clusters. When powered on, they are placed on the host where they currently reside.
DRS also considers network bandwidth. By calculating host network saturation, DRS can make better placement decisions, helping to avoid performance degradation of virtual machines through a more comprehensive understanding of the environment.
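The admission-control check described above can be sketched as a simple capacity test. This is a simplified model, not a real vCenter API: the `ClusterCapacity` type, the reservation figures, and the `can_power_on` helper are all illustrative.

```python
from dataclasses import dataclass

@dataclass
class ClusterCapacity:
    """Simplified model of a cluster's unreserved resources."""
    free_cpu_mhz: int
    free_mem_mb: int

def can_power_on(cluster: ClusterCapacity, vms: list) -> bool:
    """Admission control: verify the cluster can back every VM's
    reservation before any of them is powered on."""
    need_cpu = sum(vm["cpu_reservation_mhz"] for vm in vms)
    need_mem = sum(vm["mem_reservation_mb"] for vm in vms)
    return need_cpu <= cluster.free_cpu_mhz and need_mem <= cluster.free_mem_mb

# Illustrative figures: a group power-on of two VMs against one cluster.
cluster = ClusterCapacity(free_cpu_mhz=10000, free_mem_mb=32768)
group = [
    {"cpu_reservation_mhz": 2000, "mem_reservation_mb": 8192},
    {"cpu_reservation_mhz": 4000, "mem_reservation_mb": 16384},
]
```

A group power-on attempt fails admission control as a whole if the combined reservations of the group exceed the cluster's unreserved capacity.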
Single Virtual Machine Power On
In a DRS cluster, you can power on a single virtual machine and receive initial placement recommendations.
When you power on a single virtual machine, you have two types of initial placement recommendations:
- A single virtual machine is being powered on and no prerequisite steps are needed. You are presented with a list of mutually exclusive initial placement recommendations for the virtual machine; you can select only one.
- A single virtual machine is being powered on, but prerequisite actions are required. These actions include powering on a host in standby mode or migrating other virtual machines between hosts. In this case, the recommendation has multiple lines, one for each prerequisite action. You can either accept the entire recommendation or cancel powering on the virtual machine.
Group Power-on
You can attempt to power on multiple virtual machines at the same time (group power-on).
Virtual machines selected for a group power-on attempt do not have to be in the same DRS cluster. They can be selected across clusters but must be within the same data center. It is also possible to include virtual machines located in non-DRS clusters or on standalone hosts. These virtual machines are powered on automatically and not included in any initial placement recommendation.
The initial placement recommendations for group power-on attempts are provided on a per-cluster basis. If all the placement-related actions for a group power-on attempt are in automatic mode, the virtual machines are powered on with no initial placement recommendation given. If placement-related actions for any of the virtual machines are in manual mode, the powering on of all the virtual machines (including the virtual machines that are in automatic mode) is manual. These actions are included in an initial placement recommendation.
For each DRS cluster that the virtual machines being powered on belong to, there is a single recommendation, which contains all the prerequisites (or no recommendation). All such cluster-specific recommendations are presented together under the Power On Recommendations tab.
When a nonautomatic group power-on attempt is made, and virtual machines not subject to an initial placement recommendation (that is, virtual machines on standalone hosts or in non-DRS clusters) are included, vCenter Server attempts to power them on automatically. If these power-ons are successful, they are listed under the Started Power-Ons tab. Any virtual machines that fail to power on are listed under the Failed Power-Ons tab.
Example: Group Power-on
The user selects three virtual machines in the same data center for a group power-on attempt. The first two virtual machines (VM1 and VM2) are in the same DRS cluster (Cluster1), while the third virtual machine (VM3) is on a standalone host. VM1 is in automatic mode and VM2 is in manual mode. For this scenario, the user is presented with an initial placement recommendation for Cluster1 (under the Power On Recommendations tab) which consists of actions for powering on VM1 and VM2. An attempt is made to power on VM3 automatically and, if successful, it is listed under the Started Power-Ons tab. If this attempt fails, it is listed under the Failed Power-Ons tab.
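The partitioning logic of a group power-on attempt can be sketched as follows. The function and field names are illustrative, not vCenter API calls; the example reproduces the VM1/VM2/VM3 scenario above.

```python
from collections import defaultdict

def partition_group_power_on(vms):
    """Split a group power-on selection the way the text describes:
    VMs in DRS clusters get one per-cluster recommendation (manual if
    any member VM is manual); the rest are powered on immediately."""
    per_cluster = defaultdict(list)
    auto_power_on = []
    for vm in vms:
        if vm["cluster"] is None:  # standalone host or non-DRS cluster
            auto_power_on.append(vm["name"])
        else:
            per_cluster[vm["cluster"]].append(vm)
    recommendations = {
        cluster: {
            "vms": [vm["name"] for vm in members],
            # one manual VM makes the whole cluster's power-on manual
            "mode": "manual" if any(vm["mode"] == "manual" for vm in members)
                    else "automatic",
        }
        for cluster, members in per_cluster.items()
    }
    return recommendations, auto_power_on

# The scenario from the example: VM1 (automatic) and VM2 (manual) in
# Cluster1, VM3 on a standalone host.
recs, started = partition_group_power_on([
    {"name": "VM1", "cluster": "Cluster1", "mode": "automatic"},
    {"name": "VM2", "cluster": "Cluster1", "mode": "manual"},
    {"name": "VM3", "cluster": None, "mode": "automatic"},
])
```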
Virtual Machine Migration
Although DRS performs initial placements so that load is balanced across the cluster, changes in virtual machine load and resource availability can cause the cluster to become unbalanced. To correct such imbalances, DRS generates migration recommendations.
If DRS is enabled on the cluster, load can be distributed more uniformly to reduce the degree of this imbalance. For example, the three hosts on the left side of the following figure are unbalanced. Assume that Host 1, Host 2, and Host 3 have identical capacity, and all virtual machines have the same configuration and load (which includes reservation, if set). However, because Host 1 has six virtual machines, its resources might be overused while ample resources are available on Host 2 and Host 3. DRS migrates (or recommends the migration of) virtual machines from Host 1 to Host 2 and Host 3. On the right side of the diagram, the properly load balanced configuration of the hosts that results appears.
When a cluster becomes unbalanced, DRS makes recommendations or migrates virtual machines, depending on the default automation level:
- If the cluster or any of the virtual machines involved are manual or partially automated, vCenter Server does not take automatic actions to balance resources. Instead, the Summary page indicates that migration recommendations are available and the DRS Recommendations page displays recommendations for changes that make the most efficient use of resources across the cluster.
- If the cluster and virtual machines involved are all fully automated, vCenter Server migrates running virtual machines between hosts as needed to ensure efficient use of cluster resources.
Note: Even in an automatic migration setup, users can explicitly migrate individual virtual machines, but vCenter Server might move those virtual machines to other hosts to optimize cluster resources.
By default, automation level is specified for the whole cluster. You can also specify a custom automation level for individual virtual machines.
DRS Migration Threshold
The DRS migration threshold allows you to specify which recommendations are generated and then applied (when the virtual machines involved in the recommendation are in fully automated mode) or shown (if in manual mode). This threshold is a measure of how aggressive DRS is in recommending migrations to improve VM happiness.
You can move the threshold slider to use one of five settings, ranging from Conservative to Aggressive. The higher the aggressiveness setting, the more frequently DRS might recommend migrations to improve VM happiness. The Conservative setting generates only priority-one recommendations (mandatory recommendations).
After a recommendation receives a priority level, this level is compared to the migration threshold you set. If the priority level is less than or equal to the threshold setting, the recommendation is either applied (if the relevant virtual machines are in fully automated mode) or displayed to the user for confirmation (if in manual or partially automated mode).
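The priority-versus-threshold decision can be summarized in a few lines. This is a sketch; the function name and the string labels are illustrative.

```python
def act_on_recommendation(priority: int, threshold: int, automation: str) -> str:
    """Compare a recommendation's priority (1 = mandatory ... 5) to the
    migration threshold and decide what happens, per the text above."""
    if priority > threshold:
        return "dropped"  # above the threshold cutoff: not generated
    # at or below the threshold: applied or shown, depending on mode
    return "applied" if automation == "fully-automated" else "displayed"
```

With the Conservative setting (threshold 1), only priority-one (mandatory) recommendations survive the comparison.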
DRS Score
Each migration recommendation is computed using the VM happiness metric, which measures execution efficiency. This metric is displayed as DRS Score in the cluster's Summary tab in the vSphere Client. DRS load balancing recommendations attempt to improve the DRS score of a VM. The cluster DRS score is a weighted average of the VM DRS scores of all the powered-on VMs in the cluster, and it is shown in the gauge component. The color of the filled-in section changes to match the corresponding bar in the VM DRS Score histogram; the bars in the histogram show the percentage of VMs that have a DRS score in each range. To view the list of VMs with server-side sorting and filtering, select the Monitor tab of the cluster and select vSphere DRS; the VMs in the cluster are listed sorted by their DRS score in ascending order.
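The cluster-level score can be modeled as a weighted average of per-VM scores. The exact weighting DRS uses is not specified here, so this sketch defaults to equal weights (a plain average); `cluster_drs_score` is an illustrative helper, not a vSphere API.

```python
def cluster_drs_score(vm_scores, weights=None):
    """Cluster DRS score as a weighted average of the DRS scores of
    powered-on VMs. Weights default to equal -- an assumption made for
    illustration, since the actual weighting scheme is internal to DRS."""
    if weights is None:
        weights = [1.0] * len(vm_scores)
    total = sum(weights)
    return sum(s * w for s, w in zip(vm_scores, weights)) / total

# Three powered-on VMs with illustrative per-VM scores (0-100).
scores = [80.0, 90.0, 100.0]
```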
Migration Recommendations
If you create a cluster with a default manual or partially automated mode, vCenter Server displays migration recommendations on the DRS Recommendations page.
The system supplies as many recommendations as necessary to enforce rules and balance the resources of the cluster. Each recommendation includes the virtual machine to be moved, current (source) host and destination host, and a reason for the recommendation. The reason can be one of the following:
- Balance average CPU loads or reservations.
- Balance average memory loads or reservations.
- Satisfy resource pool reservations.
- Satisfy an affinity rule.
- Host is entering maintenance mode or standby mode.
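A migration recommendation record, as described above, might be modeled like this (an illustrative data structure, not the vCenter object model):

```python
from dataclasses import dataclass

# Reason codes are illustrative shorthand for the reasons listed above.
REASONS = {
    "cpu-balance": "Balance average CPU loads or reservations",
    "mem-balance": "Balance average memory loads or reservations",
    "pool-reservation": "Satisfy resource pool reservations",
    "affinity": "Satisfy an affinity rule",
    "maintenance": "Host is entering maintenance mode or standby mode",
}

@dataclass
class MigrationRecommendation:
    """One recommendation: the VM to move, where from, where to, and why."""
    vm: str
    source_host: str
    destination_host: str
    reason: str  # one of the REASONS keys

rec = MigrationRecommendation("VM1", "Host1", "Host2", "cpu-balance")
```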
DRS Cluster Requirements
Hosts that are added to a DRS cluster must meet certain requirements to use cluster features successfully.
Shared Storage Requirements
A DRS cluster has certain shared storage requirements.
Ensure that the managed hosts use shared storage. Shared storage is typically on a SAN, but can also be implemented using NAS shared storage.
See the vSphere Storage documentation for information about other shared storage.
Shared VMFS Volume Requirements
A DRS cluster has certain shared VMFS volume requirements.
Configure all managed hosts to use shared VMFS volumes.
- Place the disks of all virtual machines on VMFS volumes that are accessible by source and destination hosts.
- Ensure the VMFS volume is sufficiently large to store all virtual disks for your virtual machines.
- Ensure all VMFS volumes on source and destination hosts use volume names, and all virtual machines use those volume names for specifying the virtual disks.
- Ensure that virtual machine swap files are placed on a VMFS accessible to both source and destination hosts (just like .vmdk virtual disk files). This requirement does not apply if all source and destination hosts are ESX Server 3.5 or higher and use host-local swap. In that case, vMotion with swap files on unshared storage is supported. Swap files are placed on a VMFS datastore by default, but administrators might override the file location using advanced virtual machine configuration options.
Processor Compatibility Requirements
A DRS cluster has certain processor compatibility requirements.
To avoid limiting the capabilities of DRS, you should maximize the processor compatibility of source and destination hosts in the cluster.
vMotion transfers the running architectural state of a virtual machine between underlying ESXi hosts. vMotion compatibility means that the processors of the destination host must be able to resume execution using the equivalent instructions where the processors of the source host were suspended. Processor clock speeds and cache sizes might vary, but processors must come from the same vendor class (Intel versus AMD) and the same processor family to be compatible for migration with vMotion.
Processor families are defined by the processor vendors. You can distinguish different processor versions within the same family by comparing the processors’ model, stepping level, and extended features.
Sometimes, processor vendors have introduced significant architectural changes within the same processor family (such as 64-bit extensions and SSE3). VMware identifies these exceptions if it cannot guarantee successful migration with vMotion.
vCenter Server provides features that help ensure that virtual machines migrated with vMotion meet processor compatibility requirements. These features include:
- Enhanced vMotion Compatibility (EVC) – You can use EVC to help ensure vMotion compatibility for the hosts in a cluster. EVC ensures that all hosts in a cluster present the same CPU feature set to virtual machines, even if the actual CPUs on the hosts differ. This prevents migrations with vMotion from failing due to incompatible CPUs.
Configure EVC from the Cluster Settings dialog box. The hosts in a cluster must meet certain requirements for the cluster to use EVC. For information about EVC and EVC requirements, see the vCenter Server and Host Management documentation.
- CPU compatibility masks – vCenter Server compares the CPU features available to a virtual machine with the CPU features of the destination host to determine whether to allow or disallow migrations with vMotion. By applying CPU compatibility masks to individual virtual machines, you can hide certain CPU features from the virtual machine and potentially prevent migrations with vMotion from failing due to incompatible CPUs.
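The compatibility decision can be modeled as a subset check over CPU feature sets, with a compatibility mask hiding features from the VM. The feature names below are illustrative, and the helper is a sketch, not how vCenter actually encodes CPU features.

```python
def vmotion_cpu_compatible(vm_features, dest_features, mask=frozenset()):
    """A vMotion is allowed only if every CPU feature visible to the VM
    is present on the destination host. A CPU compatibility mask hides
    features from the VM, relaxing the check."""
    visible = set(vm_features) - set(mask)
    return visible <= set(dest_features)

# Illustrative feature set the VM was started with.
vm_features = {"sse3", "avx", "nx"}
```

Masking `avx` from the VM, for example, allows migration to a destination host that lacks that feature.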
vMotion Requirements for DRS Clusters
A DRS cluster has certain vMotion requirements.
To enable the use of DRS migration recommendations, the hosts in your cluster must be part of a vMotion network. If the hosts are not in the vMotion network, DRS can still make initial placement recommendations.
To be configured for vMotion, each host in the cluster must meet the following requirements:
- vMotion does not support raw disks or migration of applications clustered using Microsoft Cluster Service (MSCS).
- vMotion requires a private Gigabit Ethernet migration network between all of the vMotion enabled managed hosts. When vMotion is enabled on a managed host, configure a unique network identity object for the managed host and connect it to the private migration network.
Configuring DRS with Virtual Flash
DRS can manage virtual machines that have virtual flash reservations.
Virtual flash capacity appears as a statistic that is regularly reported from the host to the vSphere Client. Each time DRS runs, it uses the most recent capacity value reported.
You can configure one virtual flash resource per host. This means that during virtual machine power-on time, DRS does not need to select between different virtual flash resources on a given host.
DRS selects a host that has sufficient available virtual flash capacity to start the virtual machine. If DRS cannot satisfy the virtual flash reservation of a virtual machine, the virtual machine cannot be powered on. DRS treats a powered-on virtual machine with a virtual flash reservation as having a soft affinity with its current host, so it does not recommend such a virtual machine for vMotion except for mandatory reasons, such as putting a host in maintenance mode or reducing the load on an overutilized host.
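The placement rule above amounts to filtering hosts by their most recently reported virtual flash capacity. This sketch is illustrative; the tie-break by largest headroom is an assumption, as the text does not specify how DRS chooses among hosts with sufficient capacity.

```python
def pick_host_for_vflash(hosts, reservation_gb):
    """Placement filter in the spirit of the text: only hosts whose most
    recently reported virtual flash capacity covers the VM's reservation
    are candidates; return None if the VM cannot be powered on."""
    candidates = [h for h in hosts if h["vflash_free_gb"] >= reservation_gb]
    if not candidates:
        return None  # reservation unsatisfiable: power-on fails
    # Tie-break by most headroom (an illustrative choice, not DRS's rule).
    return max(candidates, key=lambda h: h["vflash_free_gb"])["name"]

# Illustrative hosts with their reported virtual flash capacity.
hosts = [
    {"name": "host-a", "vflash_free_gb": 10},
    {"name": "host-b", "vflash_free_gb": 40},
]
```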
Create a Cluster
A cluster is a group of hosts. When a host is added to a cluster, the host's resources become part of the cluster's resources. The cluster manages the resources of all hosts within it.
Prerequisites
- Verify that you have sufficient permissions to create a cluster object.
- Verify that a data center exists in the inventory.
- If you want to use vSAN, it must be enabled before you configure vSphere HA.
Procedure
Results
The cluster is added to the inventory.
What to do next
Edit Cluster Settings
When you add a host to a DRS cluster, the host’s resources become part of the cluster’s resources. In addition to this aggregation of resources, with a DRS cluster you can support cluster-wide resource pools and enforce cluster-level resource allocation policies.
The following cluster-level resource management capabilities are also available.
- Load Balancing
- The distribution and usage of CPU and memory resources for all hosts and virtual machines in the cluster are continuously monitored. DRS compares these metrics to an ideal resource usage given the attributes of the cluster’s resource pools and virtual machines, the current demand, and the imbalance target. DRS then provides recommendations or performs virtual machine migrations accordingly. See Virtual Machine Migration. When you power on a virtual machine in the cluster, DRS attempts to maintain proper load balancing by either placing the virtual machine on an appropriate host or making a recommendation. See Admission Control and Initial Placement.
- Power management
- When the vSphere Distributed Power Management (DPM) feature is enabled, DRS compares cluster- and host-level capacity to the demands of the cluster's virtual machines, including recent historical demand. DRS then recommends that you place hosts in standby power mode, or places hosts in standby power mode, when sufficient excess capacity is found. DRS powers on hosts if capacity is needed. Depending on the resulting host power state recommendations, virtual machines might need to be migrated to and from the hosts as well. See Managing Power Resources.
- Affinity Rules
- You can control the placement of virtual machines on hosts within a cluster, by assigning affinity rules. See Using Affinity Rules with vSphere DRS.
Prerequisites
Procedure
What to do next
You can view memory utilization for DRS in the vSphere Client.
Set a Custom Automation Level for a Virtual Machine
After you create a DRS cluster, you can customize the automation level for individual virtual machines to override the cluster’s default automation level.
For example, you can select Manual for specific virtual machines in a cluster with full automation, or Partially Automated for specific virtual machines in a manual cluster.
If a virtual machine is set to Disabled, vCenter Server does not migrate that virtual machine or provide migration recommendations for it.
Procedure
Results
Other VMware products or features, such as vSphere vApp and vSphere Fault Tolerance, might override the automation levels of virtual machines in a DRS cluster. Refer to the product-specific documentation for details.
Deactivate DRS
You can turn off DRS for a cluster.
When DRS is deactivated:
- DRS affinity rules are not removed but are not applied until DRS is reactivated.
- Host and VM groups are not removed but are not applied until DRS is reactivated.
- Resource pools are permanently removed from the cluster. To avoid losing the resource pools, save a snapshot of the resource pool tree on your local machine. You can use the snapshot to restore the resource pool when you activate DRS.
Procedure
- Browse to the cluster in the vSphere Client.
- Click the Configure tab and click Services.
- Under vSphere DRS, click Edit.
- Deselect the Turn On vSphere DRS check box.
- Click OK to turn off DRS.
- (Optional) Choose an option to save the resource pool.
- Click Yes to save a resource pool tree snapshot on a local machine.
- Click No to turn off DRS without saving a resource pool tree snapshot.
Results
Restore a Resource Pool Tree
You can restore a previously saved resource pool tree snapshot.
Prerequisites
- vSphere DRS must be turned ON.
- You can restore a snapshot only on the same cluster on which it was taken.
- No other resource pools are present in the cluster.
- Backup and restore must always be performed on the same version of vCenter and ESXi.
Procedure
- Browse to the cluster in the vSphere Client.
- Right-click on the cluster and select Restore Resource Pool Tree.
- Click Browse, and locate the snapshot file on your local machine.
- Click Open.
- Click OK to restore the resource pool tree.
DRS Awareness of vSAN Stretched Cluster
DRS Awareness of vSAN Stretched Cluster is available on stretched clusters with DRS enabled. A vSAN stretched cluster has read locality, in which the VM reads data from a local site; fetching reads from a remote site can affect VM performance. With DRS Awareness of vSAN Stretched Cluster, DRS is fully aware of VM read locality and places the VM on a site that can fully satisfy the read locality. This behavior is automatic; there are no configurable options. DRS Awareness of vSAN Stretched Cluster works with existing affinity rules, and it also works with VMware Cloud on AWS.
vSAN Stretched Cluster with vSphere HA and vSphere DRS provide resiliency by having two copies of data spread across two fault domains and a witness node in a third fault domain in case of failures. The two active fault domains provide replication of data so that both fault domains have a current copy of the data.
vSAN Stretched Cluster provides an automated method of moving workloads within the two fault domains. In case of a full site failure, VMs are restarted on the secondary site by vSphere HA. This ensures that there is no downtime for critical production workloads. Once the primary site is back online, DRS immediately rebalances the VMs back to their soft-affined hosts on the primary site. This process causes the VMs to read and write from the secondary site while their data components are still rebuilding, which might reduce VM performance.
In releases prior to vSphere 7.0 U2, we recommend that you change DRS from fully automated to partially automated mode, to avoid VMs migrating while resynchronization is in progress to the primary site. Set DRS back to fully automated only after the resynchronization is complete.
DRS Awareness of vSAN Stretched Cluster introduces a fully automated read locality solution for recovering from failures on a vSAN stretched cluster. The read locality information indicates the hosts the VM has full access to, and DRS uses this information when placing a VM on a host on vSAN Stretched Clusters. DRS prevents VMs from failing back to the primary site when vSAN resynchronization is still in progress during the site recovery phase. DRS automatically migrates a VM back to the primary affined site when its data components have achieved full read locality. This allows you to operate DRS in fully automatic mode in case of full site failures.
In case of partial site failures, if a VM loses read locality because it loses a number of data components greater than or equal to its Failures to Tolerate setting, vSphere DRS identifies the VMs that consume very high read bandwidth and tries to rebalance them to the secondary site. This ensures that the performance of VMs with read-heavy workloads does not degrade during partial site failures. Once the primary site is back online and the data components have completed resynchronization, the VM is moved back to the site to which it is affined.
DRS Placement of vGPUs
DRS distributes vGPU VMs across a cluster's hosts.
DRS distributes vGPU VMs in a breadth-first manner across a cluster's hosts. Fractional vGPU profile allocation for a VM may be subject to homogeneous profile mutual exclusion rules. To maximize vGPU capacity in the cluster, consider the following:
- Manually migrate vGPU VMs to a desired host to open up unused Physical GPU capacity.
- Use the same vGPU profile configuration in all vGPU VMs in a Cluster.
- Enable the host GPU Consolidation setting. See Configuring Host Graphics for more information.
- If DRS automation is active, consider putting the cluster or VM into partially automated mode. See Edit Cluster Settings for more information.
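Breadth-first vGPU placement combined with the homogeneous-profile exclusion rule can be sketched as below. The host names, profile strings, and counts are illustrative, and the selection function is a simplified model, not DRS's actual algorithm.

```python
def place_vgpu_vm_breadth_first(hosts, profile):
    """Breadth-first vGPU placement sketch: prefer the host running the
    fewest vGPU VMs, but exclude hosts already committed to a different
    vGPU profile (the homogeneous-profile mutual exclusion rule)."""
    eligible = [h for h in hosts if h["profile"] in (None, profile)]
    if not eligible:
        return None  # no host can accept this profile
    return min(eligible, key=lambda h: h["vgpu_vm_count"])["name"]

# Illustrative cluster state: each host's committed profile (None if no
# vGPU VMs are running there yet) and its current vGPU VM count.
hosts = [
    {"name": "esxi-1", "profile": "grid_t4-8q", "vgpu_vm_count": 2},
    {"name": "esxi-2", "profile": None, "vgpu_vm_count": 3},
    {"name": "esxi-3", "profile": "grid_t4-4q", "vgpu_vm_count": 1},
]
```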
DRS Overhead Memory Management for VMs
In vSphere 8.0 U3, DRS introduces enhanced overhead memory management for VMs that are being reconfigured.
In VMware vSphere, overhead memory refers to the memory that ESXi uses to manage a virtual machine (VM). This memory is necessary for ESXi to perform its functions and is separate from the guest memory allocated to the VM. The amount of overhead memory depends on several factors, including the number of virtual CPUs (vCPUs), the amount of memory allocated to the VM, and the VM's configuration and hardware version. More vCPUs and larger memory allocations lead to higher overhead memory consumption. DRS works with ESXi memory management to keep each VM's overhead memory usage optimal: DRS manages overhead memory by setting the VM's overhead memory limit, and ESXi is allowed to consume overhead memory within that limit.
Reconfiguring a VM in VMware vSphere can directly impact the overhead memory that ESXi requires to manage the VM. When you change a VM's configuration, such as modifying the number of vCPUs, the amount of assigned RAM, or adding virtual hardware like network adapters or disk controllers, the overhead memory requirements can change. For example, reconfiguring a VM's memory reservation from 250GB to 0GB requires about 25MB of additional overhead memory, which ESXi allocates to manage the page-table mapping between virtual and physical pages. vSphere monitors and manages these changes. However, previous vSphere releases did not automatically accommodate these overhead memory increases: if the new overhead memory requirement exceeded the overhead limit, the reconfiguration could fail.
In vSphere 8.0 U3, DRS proactively updates the VM's overhead memory limit before any reconfiguration. DRS checks various factors, including the VM's resource specifications, IO filters, and other elements impacting overhead memory. DRS ensures that the new overhead limit accommodates the expected increase in overhead memory due to the updated VM specifications after reconfiguration, which optimizes VM performance and stability.
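The proactive limit update can be modeled in one step: before the reconfiguration is applied, raise the limit to cover the expected post-reconfigure overhead. The helper and its figures are illustrative; vSphere computes the expected overhead internally from the VM's new specification.

```python
def update_overhead_limit_before_reconfigure(
    current_limit_mb, expected_overhead_mb, headroom_mb=0
):
    """Return the overhead memory limit to set before a reconfiguration:
    never lower the existing limit, but raise it if the expected
    post-reconfigure overhead (plus optional headroom) exceeds it."""
    return max(current_limit_mb, expected_overhead_mb + headroom_mb)
```

For instance, if a VM's current limit is 100 MB and dropping a large reservation is expected to add about 25 MB of page-table overhead, the limit is raised to 125 MB before the reconfiguration proceeds, so the reconfiguration no longer fails on the overhead check.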
DRS enhanced overhead memory management can help to prevent reconfiguration failures by intelligently managing overhead memory limits before a VM undergoes reconfiguration, which dramatically reduces the risk of reconfiguration failures. This proactive approach ensures a more reliable experience. By optimizing VM performance and stability, virtual environments can run efficiently and without interruption, particularly during critical reconfiguration processes. This enhancement seamlessly integrates into your existing vSphere environment, while increasing performance and reliability.