A cluster is a collection of ESXi hosts and associated virtual machines with shared resources and a shared management interface. Before you can obtain the benefits of cluster-level resource management you must create a cluster and activate DRS.

Depending on whether or not Enhanced vMotion Compatibility (EVC) is activated, DRS behaves differently when you use vSphere Fault Tolerance (vSphere FT) virtual machines in your cluster.

Table 1. DRS Behavior with vSphere FT Virtual Machines and EVC
EVC | DRS (Load Balancing) | DRS (Initial Placement)
Enabled | Enabled (Primary and Secondary VMs) | Enabled (Primary and Secondary VMs)
Disabled | Disabled (Primary and Secondary VMs) | Disabled (Primary VMs); Fully Automated (Secondary VMs)

Admission Control and Initial Placement

When you attempt to power on a single virtual machine or a group of virtual machines in a DRS-enabled cluster, vCenter Server performs admission control. It checks that there are enough resources in the cluster to support the virtual machine(s).

If the cluster does not have sufficient resources to power on a single virtual machine, or any of the virtual machines in a group power-on attempt, a message appears. Otherwise, for each virtual machine, DRS generates a recommendation of a host on which to run the virtual machine and takes one of the following actions:

  • Automatically executes the placement recommendation.
  • Displays the placement recommendation, which the user can then choose to accept or override.
    Note: No initial placement recommendations are given for virtual machines on standalone hosts or in non-DRS clusters. When powered on, they are placed on the host where they currently reside.

DRS also considers network bandwidth when it makes placement decisions. By calculating host network saturation, DRS gains a more comprehensive view of the environment, which helps it avoid placements that would degrade virtual machine performance.
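The admission check and placement choice described above can be sketched as a toy model. This is not the DRS algorithm: the `Host` and `VM` classes, the resource units, and the "most free memory wins" tie-break are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Host:
    name: str
    free_cpu_mhz: int  # hypothetical unit for unreserved CPU
    free_mem_mb: int   # hypothetical unit for unreserved memory


@dataclass
class VM:
    name: str
    cpu_mhz: int  # CPU the VM needs in order to power on
    mem_mb: int   # memory the VM needs in order to power on


def recommend_host(vm: VM, hosts: List[Host]) -> Optional[Host]:
    """Admission control: return a host that fits the VM, or None if none fits."""
    candidates = [
        h for h in hosts
        if h.free_cpu_mhz >= vm.cpu_mhz and h.free_mem_mb >= vm.mem_mb
    ]
    if not candidates:
        return None
    # Assumed policy: prefer the candidate with the most free memory.
    return max(candidates, key=lambda h: h.free_mem_mb)


def power_on(vm: VM, hosts: List[Host], automatic: bool) -> str:
    """Either apply the placement (automatic) or only display it (manual)."""
    host = recommend_host(vm, hosts)
    if host is None:
        return f"{vm.name}: insufficient resources"
    if automatic:
        return f"{vm.name}: placed on {host.name}"
    return f"{vm.name}: recommend {host.name}"
```

In manual mode the caller would present the returned recommendation to the user, who can accept or override it.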

Single Virtual Machine Power On

In a DRS cluster, you can power on a single virtual machine and receive initial placement recommendations.

When you power on a single virtual machine, you have two types of initial placement recommendations:

  • A single virtual machine is being powered on and no prerequisite steps are needed.

    The user is presented with a list of mutually exclusive initial placement recommendations for the virtual machine. You can select only one.

  • A single virtual machine is being powered on, but prerequisite actions are required.

    These actions include powering on a host in standby mode or the migration of other virtual machines from one host to another. In this case, the recommendations provided have multiple lines, showing each of the prerequisite actions. The user can either accept this entire recommendation or cancel powering on the virtual machine.

Group Power-on

You can attempt to power on multiple virtual machines at the same time (group power-on).

Virtual machines selected for a group power-on attempt do not have to be in the same DRS cluster. They can be selected across clusters but must be within the same data center. It is also possible to include virtual machines located in non-DRS clusters or on standalone hosts. These virtual machines are powered on automatically and not included in any initial placement recommendation.

The initial placement recommendations for group power-on attempts are provided on a per-cluster basis. If all the placement-related actions for a group power-on attempt are in automatic mode, the virtual machines are powered on with no initial placement recommendation given. If placement-related actions for any of the virtual machines are in manual mode, the powering on of all the virtual machines (including the virtual machines that are in automatic mode) is manual. These actions are included in an initial placement recommendation.

For each DRS cluster that the virtual machines being powered on belong to, there is a single recommendation, which contains all the prerequisites (or no recommendation). All such cluster-specific recommendations are presented together under the Power On Recommendations tab.

When a nonautomatic group power-on attempt is made, and virtual machines not subject to an initial placement recommendation (that is, the virtual machines on standalone hosts or in non-DRS clusters) are included, vCenter Server attempts to power them on automatically. If these power-ons are successful, they are listed under the Started Power-Ons tab. Any virtual machines that fail to power on are listed under the Failed Power-Ons tab.

Example: Group Power-on

The user selects three virtual machines in the same data center for a group power-on attempt. The first two virtual machines (VM1 and VM2) are in the same DRS cluster (Cluster1), while the third virtual machine (VM3) is on a standalone host. VM1 is in automatic mode and VM2 is in manual mode. For this scenario, the user is presented with an initial placement recommendation for Cluster1 (under the Power On Recommendations tab) which consists of actions for powering on VM1 and VM2. An attempt is made to power on VM3 automatically and, if successful, it is listed under the Started Power-Ons tab. If this attempt fails, it is listed under the Failed Power-Ons tab.
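The scenario above can be reproduced with a small sketch. It assumes that the "any VM in manual mode makes the whole power-on manual" rule is evaluated per DRS cluster, which is one reading of the text; the `VM` class and `plan_group_power_on` function are hypothetical names, not part of any vSphere API.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class VM:
    name: str
    drs_cluster: Optional[str]  # None for standalone hosts or non-DRS clusters
    automatic: bool = True      # placement mode; ignored when drs_cluster is None


def plan_group_power_on(vms: List[VM]) -> Tuple[Dict[str, List[str]], List[str]]:
    """Return (one recommendation per DRS cluster, VMs powered on automatically)."""
    by_cluster: Dict[str, List[VM]] = defaultdict(list)
    auto_started: List[str] = []
    for vm in vms:
        if vm.drs_cluster is None:
            # No initial placement recommendation outside DRS clusters.
            auto_started.append(vm.name)
        else:
            by_cluster[vm.drs_cluster].append(vm)

    recommendations: Dict[str, List[str]] = {}
    for cluster, members in by_cluster.items():
        if all(vm.automatic for vm in members):
            auto_started.extend(vm.name for vm in members)
        else:
            # One manual VM makes the power-on of every member manual.
            recommendations[cluster] = [vm.name for vm in members]
    return recommendations, auto_started


recs, started = plan_group_power_on([
    VM("VM1", "Cluster1", automatic=True),
    VM("VM2", "Cluster1", automatic=False),
    VM("VM3", None),
])
```

Here `recs` holds a single Cluster1 recommendation covering VM1 and VM2, and `started` contains only VM3.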

Virtual Machine Migration

Although DRS performs initial placements so that load is balanced across the cluster, changes in virtual machine load and resource availability can cause the cluster to become unbalanced. To correct such imbalances, DRS generates migration recommendations.

If DRS is enabled on the cluster, load can be distributed more uniformly to reduce the degree of this imbalance. For example, the three hosts on the left side of the following figure are unbalanced. Assume that Host 1, Host 2, and Host 3 have identical capacity, and all virtual machines have the same configuration and load (which includes reservation, if set). However, because Host 1 has six virtual machines, its resources might be overused while ample resources are available on Host 2 and Host 3. DRS migrates (or recommends the migration of) virtual machines from Host 1 to Host 2 and Host 3. The right side of the figure shows the resulting load-balanced configuration of the hosts.

Figure 1. Load Balancing

This figure shows how DRS rebalances a cluster.
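As a rough illustration of the rebalancing idea, the following sketch greedily moves one virtual machine at a time from the fullest host to the emptiest. It relies on the same simplification as the example (identical hosts, identical virtual machines) and is not the DRS algorithm. The function name is hypothetical, and only the count of six virtual machines on Host 1 comes from the text; the other counts are made up.

```python
from typing import Dict, List, Tuple


def rebalance(vm_counts: Dict[str, int]) -> Tuple[List[Tuple[str, str]], Dict[str, int]]:
    """Return (list of (source, destination) moves, final VM count per host)."""
    counts = dict(vm_counts)
    moves: List[Tuple[str, str]] = []
    while True:
        src = max(counts, key=counts.get)  # most loaded host
        dst = min(counts, key=counts.get)  # least loaded host
        if counts[src] - counts[dst] <= 1:
            return moves, counts  # already as even as whole VMs allow
        counts[src] -= 1
        counts[dst] += 1
        moves.append((src, dst))


# Hypothetical starting point: only Host 1's six VMs are given in the text.
moves, final = rebalance({"Host 1": 6, "Host 2": 2, "Host 3": 1})
```

In a fully automated cluster the moves would correspond to migrations; in manual or partially automated mode they would be shown as recommendations.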

When a cluster becomes unbalanced, DRS makes recommendations or migrates virtual machines, depending on the default automation level:

  • If the cluster or any of the virtual machines involved are manual or partially automated, vCenter Server does not take automatic actions to balance resources. Instead, the Summary page indicates that migration recommendations are available and the DRS Recommendations page displays recommendations for changes that make the most efficient use of resources across the cluster.
  • If the cluster and virtual machines involved are all fully automated, vCenter Server migrates running virtual machines between hosts as needed to ensure efficient use of cluster resources.

    Note: Even in an automatic migration setup, users can explicitly migrate individual virtual machines, but vCenter Server might move those virtual machines to other hosts to optimize cluster resources.

By default, automation level is specified for the whole cluster. You can also specify a custom automation level for individual virtual machines.

DRS Migration Threshold

The DRS migration threshold allows you to specify which recommendations are generated and then applied (when the virtual machines involved in the recommendation are in fully automated mode) or shown (if in manual mode). This threshold is a measure of how aggressive DRS is in recommending migrations to improve VM happiness.

You can move the threshold slider to use one of five settings, ranging from Conservative to Aggressive. The higher the aggressiveness setting, the more frequently DRS might recommend migrations to improve VM happiness. The Conservative setting generates only priority-one recommendations (mandatory recommendations).

After a recommendation receives a priority level, this level is compared to the migration threshold you set. If the priority level is less than or equal to the threshold setting, the recommendation is either applied (if the relevant virtual machines are in fully automated mode) or displayed to the user for confirmation (if in manual or partially automated mode).
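The comparison can be written as a short sketch. It assumes that both the priority level and the threshold setting are integers from 1 to 5, with 1 corresponding to the Conservative setting (priority-one recommendations only); the function name and the returned strings are illustrative.

```python
def handle_recommendation(priority: int, threshold: int, fully_automated: bool) -> str:
    """Decide what happens to a migration recommendation.

    priority:  1 (mandatory) to 5, assumed range
    threshold: 1 (Conservative) to 5 (Aggressive), assumed mapping of the slider
    """
    if not 1 <= priority <= 5 or not 1 <= threshold <= 5:
        raise ValueError("priority and threshold must be between 1 and 5")
    if priority > threshold:
        return "not generated"  # recommendation falls outside the threshold
    return "applied" if fully_automated else "displayed"
```

For example, with the threshold at 1, a priority-two recommendation is not generated, while a priority-one recommendation is applied in fully automated mode.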

DRS Score

Each migration recommendation is computed using the VM happiness metric, which measures execution efficiency. This metric is displayed as the DRS Score in the cluster's Summary tab in the vSphere Client. DRS load balancing recommendations attempt to improve the DRS Score of a VM. The Cluster DRS Score is a weighted average of the VM DRS Scores of all the powered-on VMs in the cluster and is shown in the gauge component. The color of the filled-in section of the gauge changes with the value to match the corresponding bar in the VM DRS Score histogram. The bars in the histogram show the percentage of VMs that have a DRS Score in each range. To view the list of VMs in the cluster sorted by DRS Score in ascending order, with server-side sorting and filtering, select the Monitor tab of the cluster and select vSphere DRS.
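The weighted average and the histogram percentages can be illustrated as follows. The text does not say how the weights are chosen, so this sketch takes them as an input; the 20-point bucket width is also an assumption, and both function names are hypothetical.

```python
from typing import List, Sequence


def cluster_drs_score(vm_scores: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted average of per-VM scores (0-100) over powered-on VMs."""
    if not vm_scores or len(vm_scores) != len(weights):
        raise ValueError("need one weight per VM score")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(s * w for s, w in zip(vm_scores, weights)) / total


def score_histogram(vm_scores: Sequence[float], bucket_width: int = 20) -> List[float]:
    """Percentage of VMs whose score falls in each bucket (assumed 20-point ranges)."""
    if not vm_scores:
        raise ValueError("no VM scores")
    buckets = [0] * (100 // bucket_width)
    for score in vm_scores:
        # A score of exactly 100 belongs to the last bucket.
        index = min(int(score // bucket_width), len(buckets) - 1)
        buckets[index] += 1
    return [100 * count / len(vm_scores) for count in buckets]
```

With scores of 80 and 40 and weights of 1 and 3, `cluster_drs_score` returns 50.0.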

Migration Recommendations

If you create a cluster with a default manual or partially automated mode, vCenter Server displays migration recommendations on the DRS Recommendations page.

The system supplies as many recommendations as necessary to enforce rules and balance the resources of the cluster. Each recommendation includes the virtual machine to be moved, current (source) host and destination host, and a reason for the recommendation. The reason can be one of the following:

  • Balance average CPU loads or reservations.
  • Balance average memory loads or reservations.
  • Satisfy resource pool reservations.
  • Satisfy an affinity rule.
  • Host is entering maintenance mode or standby mode.
Note: If you are using the vSphere Distributed Power Management (DPM) feature, in addition to migration recommendations, DRS provides host power state recommendations.

DRS Cluster Requirements

Hosts that are added to a DRS cluster must meet certain requirements to use cluster features successfully.

Note: vSphere DRS is a critical feature of vSphere which is required to maintain the health of the workloads running inside vSphere Cluster. Starting with vSphere 7.0 Update 1, DRS depends on the availability of vCLS VMs. See vSphere Cluster Services for more information.

Shared Storage Requirements

A DRS cluster has certain shared storage requirements.

Ensure that the managed hosts use shared storage. Shared storage is typically on a SAN, but can also be implemented using NAS shared storage.

See the vSphere Storage documentation for information about other shared storage.

Shared VMFS Volume Requirements

A DRS cluster has certain shared VMFS volume requirements.

Configure all managed hosts to use shared VMFS volumes.

  • Place the disks of all virtual machines on VMFS volumes that are accessible by source and destination hosts.
  • Ensure the VMFS volume is sufficiently large to store all virtual disks for your virtual machines.
  • Ensure all VMFS volumes on source and destination hosts use volume names, and all virtual machines use those volume names for specifying the virtual disks.
Note: Virtual machine swap files also need to be on a VMFS accessible to source and destination hosts (just like .vmdk virtual disk files). This requirement does not apply if all source and destination hosts are ESX Server 3.5 or higher and using host-local swap. In that case, vMotion with swap files on unshared storage is supported. Swap files are placed on a VMFS by default, but administrators might override the file location using advanced virtual machine configuration options.

Processor Compatibility Requirements

A DRS cluster has certain processor compatibility requirements.

To avoid limiting the capabilities of DRS, you should maximize the processor compatibility of source and destination hosts in the cluster.

vMotion transfers the running architectural state of a virtual machine between underlying ESXi hosts. vMotion compatibility means that the processors of the destination host must be able to resume execution using the equivalent instructions where the processors of the source host were suspended. Processor clock speeds and cache sizes might vary, but processors must come from the same vendor class (Intel versus AMD) and the same processor family to be compatible for migration with vMotion.

Processor families are defined by the processor vendors. You can distinguish different processor versions within the same family by comparing the processors’ model, stepping level, and extended features.

Sometimes, processor vendors have introduced significant architectural changes within the same processor family (such as 64-bit extensions and SSE3). VMware identifies these exceptions if it cannot guarantee successful migration with vMotion.

vCenter Server provides features that help ensure that virtual machines migrated with vMotion meet processor compatibility requirements. These features include:

  • Enhanced vMotion Compatibility (EVC) – You can use EVC to help ensure vMotion compatibility for the hosts in a cluster. EVC ensures that all hosts in a cluster present the same CPU feature set to virtual machines, even if the actual CPUs on the hosts differ. This prevents migrations with vMotion from failing due to incompatible CPUs.

    Configure EVC from the Cluster Settings dialog box. The hosts in a cluster must meet certain requirements for the cluster to use EVC. For information about EVC and EVC requirements, see the vCenter Server and Host Management documentation.

  • CPU compatibility masks – vCenter Server compares the CPU features available to a virtual machine with the CPU features of the destination host to determine whether to allow or disallow migrations with vMotion. By applying CPU compatibility masks to individual virtual machines, you can hide certain CPU features from the virtual machine and potentially prevent migrations with vMotion from failing due to incompatible CPUs.
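The two mechanisms can be pictured with plain set operations. This is a deliberately simplified model: real EVC modes are predefined baselines rather than an arbitrary intersection, and the feature names and function names below are placeholders.

```python
from typing import Iterable, List, Set


def common_feature_set(host_features: List[Set[str]]) -> Set[str]:
    """Features every host offers (a stand-in for an EVC-style baseline)."""
    if not host_features:
        raise ValueError("at least one host is required")
    result = set(host_features[0])
    for features in host_features[1:]:
        result &= set(features)
    return result


def apply_mask(vm_features: Set[str], hidden: Iterable[str]) -> Set[str]:
    """Hide selected CPU features from a virtual machine."""
    return set(vm_features) - set(hidden)


def can_migrate(vm_features: Set[str], destination_features: Set[str]) -> bool:
    """Migration is allowed only if the destination offers everything the VM sees."""
    return set(vm_features) <= set(destination_features)


newer_host = {"sse2", "sse3", "avx2"}  # placeholder feature names
older_host = {"sse2", "sse3"}
baseline = common_feature_set([newer_host, older_host])
```

A virtual machine that sees all of `newer_host` cannot move to `older_host`, but one restricted to `baseline`, or masked with `apply_mask(newer_host, {"avx2"})`, can.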

vMotion Requirements for DRS Clusters

A DRS cluster has certain vMotion requirements.

To enable the use of DRS migration recommendations, the hosts in your cluster must be part of a vMotion network. If the hosts are not in the vMotion network, DRS can still make initial placement recommendations.

To be configured for vMotion, each host in the cluster must meet the following requirements:

  • vMotion does not support raw disks or migration of applications clustered using Microsoft Cluster Service (MSCS).
  • vMotion requires a private Gigabit Ethernet migration network between all of the vMotion enabled managed hosts. When vMotion is enabled on a managed host, configure a unique network identity object for the managed host and connect it to the private migration network.

Configuring DRS with Virtual Flash

DRS can manage virtual machines that have virtual flash reservations.

Virtual flash capacity appears as a statistic that is regularly reported from the host to the vSphere Client. Each time DRS runs, it uses the most recent capacity value reported.

You can configure one virtual flash resource per host. This means that during virtual machine power-on time, DRS does not need to select between different virtual flash resources on a given host.

DRS selects a host that has sufficient available virtual flash capacity to start the virtual machine. If DRS cannot satisfy the virtual flash reservation of a virtual machine, the virtual machine cannot be powered on. DRS treats a powered-on virtual machine with a virtual flash reservation as having a soft affinity with its current host. DRS does not recommend such a virtual machine for vMotion except for mandatory reasons, such as putting a host in maintenance mode, or to reduce the load on an overutilized host.
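A minimal sketch of these two rules follows. The capacity values stand for the most recent figure reported by each host, and the choice of the host with the most available capacity is an assumption; the text only requires that the capacity be sufficient. Function names are hypothetical.

```python
from typing import Dict, Optional


def pick_host_for_flash(reservation_gb: float, available_gb: Dict[str, float]) -> Optional[str]:
    """Return a host that can satisfy the virtual flash reservation, else None."""
    candidates = {
        name: capacity
        for name, capacity in available_gb.items()
        if capacity >= reservation_gb
    }
    if not candidates:
        return None  # the virtual machine cannot be powered on
    # Assumed policy: take the host with the most available capacity.
    return max(candidates, key=candidates.get)


def may_recommend_vmotion(has_flash_reservation: bool,
                          mandatory: bool,
                          host_overutilized: bool) -> bool:
    """Soft affinity: move a flash-reserved VM only when there is a strong reason."""
    if not has_flash_reservation:
        return True
    return mandatory or host_overutilized
```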

Create a Cluster

A cluster is a group of hosts. When a host is added to a cluster, the host's resources become part of the cluster's resources. The cluster manages the resources of all hosts within it.

Clusters enable the vSphere High Availability (HA) and vSphere Distributed Resource Scheduler (DRS) solutions.

Prerequisites

  • Verify that you have sufficient permissions to create a cluster object.
  • Verify that a data center exists in the inventory.
  • If you want to use vSAN, it must be enabled before you configure vSphere HA.

Procedure

  1. Browse to a data center in the vSphere Client.
  2. Right-click the data center and select New Cluster.
  3. Enter a name for the cluster.
  4. Select DRS and vSphere HA cluster features.
    Option Description
    To use DRS with this cluster
    1. Select the DRS Turn ON check box.
    2. Select an automation level and a migration threshold.
    To use HA with this cluster
    1. Select the vSphere HA Turn ON check box.
    2. Select whether to enable host monitoring and admission control.
    3. If admission control is enabled, specify a policy.
    4. Select a VM Monitoring option.
    5. Specify the virtual machine monitoring sensitivity.
  5. Select an Enhanced vMotion Compatibility (EVC) setting.
    EVC ensures that all hosts in a cluster present the same CPU feature set to virtual machines, even if the actual CPUs on the hosts differ. This prevents migrations with vMotion from failing due to incompatible CPUs.
  6. Click OK.

Results

The cluster is added to the inventory.

What to do next

Add hosts and resource pools to the cluster.
Note: Under the Cluster Summary page, you can see Cluster Services which displays vSphere Cluster Services health status.

Edit Cluster Settings

When you add a host to a DRS cluster, the host’s resources become part of the cluster’s resources. In addition to this aggregation of resources, with a DRS cluster you can support cluster-wide resource pools and enforce cluster-level resource allocation policies.

The following cluster-level resource management capabilities are also available.

Load Balancing
The distribution and usage of CPU and memory resources for all hosts and virtual machines in the cluster are continuously monitored. DRS compares these metrics to an ideal resource usage given the attributes of the cluster’s resource pools and virtual machines, the current demand, and the imbalance target. DRS then provides recommendations or performs virtual machine migrations accordingly. See Virtual Machine Migration. When you power on a virtual machine in the cluster, DRS attempts to maintain proper load balancing by either placing the virtual machine on an appropriate host or making a recommendation. See Admission Control and Initial Placement.
Power management
When the vSphere Distributed Power Management (DPM) feature is enabled, DRS compares cluster and host-level capacity to the demands of the cluster’s virtual machines, including recent historical demand. When sufficient excess capacity is found, DRS either recommends that you place hosts in standby power mode or places them in standby power mode itself. DRS powers on hosts if capacity is needed. Depending on the resulting host power state recommendations, virtual machines might need to be migrated to and from the hosts as well. See Managing Power Resources.
Affinity Rules
You can control the placement of virtual machines on hosts within a cluster, by assigning affinity rules. See Using DRS Affinity Rules.

Prerequisites

You can create a cluster without a special license, but you must have a license to enable a cluster for vSphere DRS or vSphere HA.

Procedure

  1. Browse to a cluster in the vSphere Client.
  2. Click the Configure tab and click Services.
  3. Under vSphere DRS click Edit.
  4. Under DRS Automation, select a default automation level for DRS.
    Automation Level Action
    Manual
    • Initial placement: Recommended host is displayed.
    • Migration: Recommendation is displayed.
    Partially Automated
    • Initial placement: Automatic.
    • Migration: Recommendation is displayed.
    Fully Automated
    • Initial placement: Automatic.
    • Migration: Recommendation is run automatically.
  5. Set the Migration Threshold for DRS.
  6. Select the Predictive DRS check box. In addition to real-time metrics, DRS responds to forecasted metrics provided by vRealize Operations server. You must also configure Predictive DRS in a version of vRealize Operations that supports this feature.
  7. Select the Virtual Machine Automation check box to enable individual virtual machine automation levels.
    Override for individual virtual machines can be set from the VM Overrides page.
  8. Under Additional Options, select a check box to enforce one of the default policies.
    Option | Description
    VM Distribution | For availability, distribute a more even number of virtual machines across hosts. This is secondary to DRS load balancing.
    Memory Metric for Load Balancing | Load balance based on consumed memory of virtual machines rather than active memory. This setting is only recommended for clusters where host memory is not over-committed. Note: This setting is no longer supported and will not be displayed in vCenter 7.0.
    CPU Over-Commitment | Control CPU over-commitment in the cluster.
    Scalable Shares | Enable scalable shares for the resource pools on this cluster.
  9. Under Power Management, select Automation Level.
  10. If DPM is enabled, set the DPM Threshold.
  11. Click OK.

What to do next

Note: Under the Cluster Summary page, you can see Cluster Services which displays vSphere Cluster Services health status.

You can view memory utilization for DRS in the vSphere Client.

Set a Custom Automation Level for a Virtual Machine

After you create a DRS cluster, you can customize the automation level for individual virtual machines to override the cluster’s default automation level.

For example, you can select Manual for specific virtual machines in a cluster with full automation, or Partially Automated for specific virtual machines in a manual cluster.

If a virtual machine is set to Disabled, vCenter Server does not migrate that virtual machine or provide migration recommendations for it.

Procedure

  1. Browse to the cluster in the vSphere Client.
  2. Click the Configure tab and click Services.
  3. Under Services, select vSphere DRS and click Edit. Expand DRS Automation.
  4. Select the Enable individual virtual machine automation levels check box.
  5. To temporarily deactivate any individual virtual machine overrides, deselect the Enable individual virtual machine automation levels check box.
    Virtual machine settings are restored when the check box is selected again.
  6. To temporarily suspend all vMotion activity in a cluster, put the cluster in manual mode and deselect the Enable individual virtual machine automation levels check box.
  7. Select one or more virtual machines.
  8. Click the Automation Level column and select an automation level from the drop-down menu.
    Option | Description
    Manual | Placement and migration recommendations are displayed, but do not run until you manually apply the recommendation.
    Fully Automated | Placement and migration recommendations run automatically.
    Partially Automated | Initial placement is performed automatically. Migration recommendations are displayed, but do not run.
    Disabled | vCenter Server does not migrate the virtual machine or provide migration recommendations for it.

  9. Click OK.

Results

Note:

Other VMware products or features, such as vSphere vApp and vSphere Fault Tolerance, might override the automation levels of virtual machines in a DRS cluster. Refer to the product-specific documentation for details.
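The override behavior in this topic can be summarized in a short sketch: a per-VM level applies only while the Enable individual virtual machine automation levels check box is selected, and otherwise the cluster default is used. The enum and function names are illustrative and do not correspond to a vSphere API.

```python
from enum import Enum
from typing import Optional


class Level(Enum):
    MANUAL = "Manual"
    PARTIALLY_AUTOMATED = "Partially Automated"
    FULLY_AUTOMATED = "Fully Automated"
    DISABLED = "Disabled"


def effective_level(cluster_default: Level,
                    vm_override: Optional[Level],
                    overrides_enabled: bool) -> Level:
    """Per-VM override wins only while individual automation levels are enabled."""
    if overrides_enabled and vm_override is not None:
        return vm_override
    return cluster_default


def migration_action(level: Level) -> str:
    """What happens to a migration recommendation at a given automation level."""
    if level is Level.DISABLED:
        return "none"  # no migration and no recommendation
    if level is Level.FULLY_AUTOMATED:
        return "run automatically"
    return "display recommendation"  # Manual and Partially Automated
```

Deselecting the check box does not discard the stored override. In this model, passing `overrides_enabled=True` again returns the same per-VM level.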

Deactivate DRS

You can turn off DRS for a cluster.

When DRS is deactivated, the cluster’s resource pool hierarchy and affinity rules are not reestablished when DRS is turned back on. If you deactivate DRS, the resource pools are removed from the cluster. To avoid losing the resource pools, save a snapshot of the resource pool tree on your local machine. You can use the snapshot to restore the resource pool when you activate DRS.

Procedure

  1. Browse to the cluster in the vSphere Client.
  2. Click the Configure tab and click Services.
  3. Under vSphere DRS, click Edit.
  4. Deselect the Turn On vSphere DRS check box.
  5. Click OK to turn off DRS.
  6. (Optional) Choose an option to save the resource pool.
    • Click Yes to save a resource pool tree snapshot on a local machine.
    • Click No to turn off DRS without saving a resource pool tree snapshot.

Results

DRS is turned off.
Note: vSphere DRS is a critical feature of vSphere which is required to maintain the health of the workloads running inside vSphere Cluster. Starting with vSphere 7.0 Update 1, DRS depends on the availability of vCLS VMs. See vSphere Cluster Services for more information.

Restore a Resource Pool Tree

You can restore a previously saved resource pool tree snapshot.

Prerequisites

  • vSphere DRS must be turned ON.
  • You can restore a snapshot only on the same cluster on which it was taken.
  • No other resource pools are present in the cluster.

Procedure

  1. Browse to the cluster in the vSphere Client.
  2. Right-click on the cluster and select Restore Resource Pool Tree.
  3. Click Browse, and locate the snapshot file on your local machine.
  4. Click Open.
  5. Click OK to restore the resource pool tree.

DRS Awareness of vSAN Stretched Cluster

DRS Awareness of vSAN Stretched Cluster is available on stretched clusters with DRS enabled using vSphere 7.0 U2. A vSAN stretched cluster has read locality, where the VM reads data from a local site. Fetching reads from a remote site can affect VM performance. In releases prior to vSphere 7.0 U2, DRS had no awareness of read locality for vSAN stretched clusters and might inadvertently place a VM on a remote site with no read locality. With DRS Awareness of vSAN Stretched Cluster, DRS is now fully aware of VM read locality and places the VM on a site that can fully satisfy the read locality. This behavior is automatic, and there are no configurable options. DRS Awareness of vSAN Stretched Cluster works with existing affinity rules. It works with vSphere 7.0 U2 and VMware Cloud on AWS.

vSAN Stretched Cluster with vSphere HA and vSphere DRS provides resiliency by having two copies of data spread across two fault domains and a witness node in a third fault domain in case of failures. The two active fault domains provide replication of data so that both fault domains have a current copy of the data.

vSAN Stretched Cluster provides an automated method of moving workloads within the two fault domains. In case of full site failures, VMs are restarted on the secondary site by vSphere HA. This ensures that there is no downtime for critical production workloads. Once the primary site is back online, DRS immediately rebalances the VMs back to the primary site with soft affinity hosts. This process causes the VM to read and write from the secondary site while the VM data components are still rebuilding and might reduce VM performance.

In releases prior to vSphere 7.0 U2, we recommend that you change DRS from fully automated to partially automated mode, to avoid VMs migrating while resynchronization is in progress to the primary site. Set DRS back to fully automated only after the resynchronization is complete.

With vSphere 7.0 U2, DRS Awareness of vSAN Stretched Cluster introduces a fully automated read locality solution for recovering from failures on a vSAN stretched cluster. The read locality information indicates the hosts the VM has full access to, and DRS uses this information when placing a VM on a host on vSAN Stretched Clusters. DRS prevents VMs from failing back to the primary site when vSAN resynchronization is still in progress during the site recovery phase. DRS automatically migrates a VM back to the primary affined site when its data components have achieved full read locality. This allows you to operate DRS in fully automatic mode in case of full site failures.

In case of partial site failures, if a VM loses read locality due to loss of data components greater than or equal to its Failures to Tolerate, vSphere DRS identifies the VMs that consume a very high read bandwidth and tries to rebalance them to the secondary site. This ensures that VMs with read-heavy workloads do not suffer during partial site failures. Once the primary site is back online and the data components have completed resynchronization, the VM is moved back to the site it is affined to.