vSphere Cluster Services (vCLS) is activated by default and runs in all vSphere clusters. vCLS ensures that if vCenter Server becomes unavailable, cluster services remain available to maintain the resources and health of the workloads that run in the clusters. vCenter Server is still required to run DRS and HA.

vCLS is activated when you upgrade to vSphere 7.0 Update 3 or when you have a new vSphere 7.0 Update 3 or later deployment. vCLS is upgraded as part of the vCenter Server upgrade.

vCLS uses agent virtual machines to maintain cluster services health. The vCLS agent virtual machines (vCLS VMs) are created when you add hosts to clusters. Up to three vCLS VMs are required to run in each vSphere cluster, distributed across the hosts in the cluster. vCLS is also activated on clusters that contain only one or two hosts; in these clusters the number of vCLS VMs is one and two, respectively.

New anti-affinity rules are applied automatically. Every three minutes, a check is performed: if multiple vCLS VMs are located on a single host, they are automatically redistributed to different hosts.
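The three-minute redistribution check can be pictured with the following Python sketch. It is not the vCenter Server implementation: the input shape (a mapping of vCLS VM name to host name), the VM and host names, and the rule of moving extra VMs onto hosts that run no vCLS VM are illustrative assumptions.

```python
from __future__ import annotations

from collections import Counter


def hosts_needing_redistribution(placement: dict[str, str]) -> set[str]:
    """Return the hosts that currently run more than one vCLS VM."""
    counts = Counter(placement.values())
    return {host for host, n in counts.items() if n > 1}


def redistribute(placement: dict[str, str], hosts: list[str]) -> dict[str, str]:
    """Move extra vCLS VMs off shared hosts onto hosts that run no vCLS VM (hypothetical rule)."""
    result = dict(placement)
    occupied = Counter(result.values())
    # Hosts in the cluster that do not run any vCLS VM yet, in the given order.
    free_hosts = [h for h in hosts if occupied[h] == 0]
    seen: set[str] = set()
    for vm in sorted(result):
        host = result[vm]
        if host in seen and free_hosts:
            # A vCLS VM already sits on this host: move this one to an empty host.
            result[vm] = free_hosts.pop(0)
        else:
            seen.add(host)
    return result
```

If no empty host is available, the sketch leaves the extra VMs where they are, which matches the idea that redistribution needs enough hosts to spread onto.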

Table 1. Number of vCLS Agent VMs in Clusters
Number of Hosts in a Cluster | Number of vCLS Agent VMs
1                            | 1
2                            | 2
3 or more                    | 3
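Table 1 reduces to a one-line rule. A minimal Python sketch, where the function name `vcls_vm_count` is hypothetical and only clusters with at least one host are considered:

```python
def vcls_vm_count(num_hosts: int) -> int:
    """Number of vCLS agent VMs for a cluster with num_hosts hosts, per Table 1."""
    if num_hosts < 1:
        # Table 1 only covers clusters that contain at least one host.
        raise ValueError("num_hosts must be at least 1")
    # One VM per host, capped at three.
    return min(num_hosts, 3)
```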

vCLS VMs run in every cluster even if cluster services like vSphere DRS or vSphere HA are not activated on the cluster. The life cycle operations of vCLS VMs are managed by vCenter Server services like ESX Agent Manager and Workload Control Plane. vCLS VMs do not support NICs.

A cluster activated with vCLS can contain ESXi hosts of different versions if the ESXi versions are compatible with vCenter Server. vCLS works with vSphere Lifecycle Manager clusters.

vSphere DRS and vCLS VMs

vSphere DRS is a critical feature of vSphere that is required to maintain the health of the workloads running inside a vSphere cluster. DRS depends on the availability of vCLS VMs.

Note: If you try to activate DRS on a cluster where there are issues with the vCLS VMs, a warning message is displayed on the Cluster Summary page.
Note: If DRS is on but there are issues with the vCLS VMs, you must resolve these issues for DRS to operate. A warning message is displayed on the Cluster Summary page.

If DRS is non-functional, this does not mean that DRS is deactivated. Existing DRS settings and resource pools survive the loss of the vCLS VM quorum. vCLS health turns Unhealthy only in a DRS-activated cluster, when vCLS VMs are not running and the first instance of DRS is skipped because of this. On a cluster without DRS activated, vCLS health stays Degraded when at least one vCLS VM is not running.

Datastore selection for vCLS VMs

The datastore for vCLS VMs is automatically selected based on ranking all the datastores connected to the hosts inside the cluster.

A datastore is more likely to be selected if there are hosts in the cluster with free reserved DRS slots connected to the datastore. The algorithm tries to place vCLS VMs in a shared datastore if possible before selecting a local datastore. A datastore with more free space is preferred and the algorithm tries not to place more than one vCLS VM on the same datastore. You can only change the datastore of vCLS VMs after they are deployed and powered on.
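The ranking preferences above can be approximated as a sort key. This Python sketch is hypothetical: the `Datastore` fields, the strict priority order (connected hosts with free reserved DRS slots, then shared over local, then free space), and the round-robin reuse of datastores are assumptions, since the exact weighting is not documented here.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Datastore:
    name: str
    shared: bool                    # True for shared storage, False for a local datastore
    free_gb: float                  # free space on the datastore
    hosts_with_free_drs_slots: int  # connected cluster hosts that still have free reserved DRS slots


def rank_datastores(candidates: list[Datastore]) -> list[Datastore]:
    """Order datastores from most to least preferred (assumed strict priority order)."""
    return sorted(
        candidates,
        key=lambda ds: (ds.hosts_with_free_drs_slots > 0, ds.shared, ds.free_gb),
        reverse=True,
    )


def choose_datastores(candidates: list[Datastore], vm_count: int) -> list[str]:
    """Pick one datastore per vCLS VM, reusing a datastore only when there are too few."""
    ranked = rank_datastores(candidates)
    if not ranked:
        return []
    # Cycle through the ranking so no datastore is reused until every one has been used once.
    return [ranked[i % len(ranked)].name for i in range(vm_count)]
```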

If you want to move the VMDKs for vCLS VMs to a different datastore or attach a different storage policy, you can reconfigure vCLS VMs. A warning message is displayed when you perform this operation.

You can perform a storage vMotion to migrate vCLS VMs to a different datastore. You can tag vCLS VMs or attach custom attributes if you want to group them separately from workload VMs, for example if you have a specific metadata strategy for all VMs that run in a data center.

Note: When a datastore is placed in maintenance mode, if the datastore hosts vCLS VMs, you must manually apply storage vMotion to the vCLS VMs to move them to a new location or put the cluster in retreat mode. A warning message is displayed:
    The enter maintenance mode task will start but cannot finish because there is 1 virtual machine residing on the datastore. You can always cancel the task in your Recent Tasks if you decide to continue.
    The selected datastore might be storing vSphere Cluster Services VMs which cannot be powered off. To ensure the health of vSphere Cluster Services, these VMs have to be manually vMotioned to a different datastore within the cluster prior to taking this datastore down for maintenance. Refer to this KB article: KB 79892.
To proceed, select the check box "Let me migrate storage for all virtual machines and continue entering maintenance mode after migration."

vCLS Datastore Placement

You can override default vCLS VM datastore placement.

By default, the vSphere Cluster Services (vCLS) VM datastore location is chosen by the default datastore selection logic. To override the default vCLS VM datastore placement for a cluster, you can specify a set of allowed datastores by browsing to the cluster and clicking ADD under Configure > vSphere Cluster Service > Datastores. Some datastores cannot be selected for vCLS because they are blocked by solutions such as SRM, or by vSAN maintenance mode, where vCLS cannot be configured. Users cannot add or remove solution-blocked datastores for vCLS VMs.
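The effect of an override can be sketched as a set operation. In this hypothetical Python helper, `allowed=None` stands for "no override configured"; treating the override as an intersection with the connected datastores and then removing solution-blocked datastores is an assumption about how the two rules combine.

```python
from __future__ import annotations


def candidate_datastores(
    connected: set[str],
    allowed: set[str] | None,
    solution_blocked: set[str],
) -> set[str]:
    """Datastores the placement logic may consider for vCLS VMs in one cluster (sketch).

    connected: datastores connected to hosts in the cluster
    allowed: the user-specified set, or None when no override is configured
    solution_blocked: datastores blocked by solutions such as SRM or vSAN maintenance mode
    """
    # With an override, only allowed datastores that the cluster can reach are kept.
    pool = connected if allowed is None else connected & allowed
    # Solution-blocked datastores are never candidates, whatever the user selected.
    return pool - solution_blocked
```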

Monitoring vSphere Cluster Services

You can monitor the resources consumed by vCLS VMs and their health status.

vCLS VMs are not displayed in the inventory tree in the Hosts and Clusters tab. vCLS VMs from all clusters within a data center are placed inside a separate VMs and templates folder named vCLS. This folder and the vCLS VMs are visible only in the VMs and Templates tab of the vSphere Client. These VMs are identified by a different icon than regular workload VMs. You can view information about the purpose of the vCLS VMs in the Summary tab of the vCLS VMs.

You can monitor the resources consumed by vCLS VMs in the Monitor tab.

Table 2. vCLS VM Resource Allocation
Property             | Size
VMDK size            | 245 MB (thin disk)
Memory               | 128 MB
CPU                  | 1 vCPU
Hard disk            | 2 GB
Storage on datastore | 480 MB (thin disk)
Note: Each vCLS VM has 100 MHz and 100 MB of capacity reserved in the cluster. Depending on the number of vCLS VMs running in the cluster, a maximum of 400 MHz and 400 MB of capacity can be reserved for these VMs.

You can monitor the health status of vCLS in the Cluster Services portlet displayed in the Summary tab of the cluster.

Table 3. Health status of vCLS
Status    | Color coding | Summary
Healthy   | Green        | If there is at least one vCLS VM running, the status remains healthy, regardless of the number of hosts in the cluster.
Degraded  | Yellow       | If there is no vCLS VM running for less than 3 minutes (180 seconds), the status is degraded.
Unhealthy | Red          | If there is no vCLS VM running for 3 minutes or more, the status is unhealthy in a DRS-enabled cluster.
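Table 3 can be read as a small decision function. The Python sketch below is an illustration, not the vCenter Server health check; the names are hypothetical, and it assumes that a cluster without DRS activated stays Degraded rather than turning Unhealthy.

```python
from __future__ import annotations

from enum import Enum


class VclsHealth(Enum):
    HEALTHY = "Green"
    DEGRADED = "Yellow"
    UNHEALTHY = "Red"


UNHEALTHY_AFTER_SECONDS = 180  # 3 minutes, from Table 3


def vcls_health(running_vms: int, seconds_without_vms: float, drs_enabled: bool) -> VclsHealth:
    """Map observed vCLS state to the status shown in the Cluster Services portlet (sketch)."""
    if running_vms >= 1:
        return VclsHealth.HEALTHY
    if drs_enabled and seconds_without_vms >= UNHEALTHY_AFTER_SECONDS:
        return VclsHealth.UNHEALTHY
    # No vCLS VM running: Degraded for the first 3 minutes, and (assumed) indefinitely without DRS.
    return VclsHealth.DEGRADED
```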

Maintaining Health of vSphere Cluster Services

vCLS VMs are always powered on because vSphere DRS depends on the availability of these VMs. These VMs should be treated as system VMs. Only administrators can perform selected operations on vCLS VMs. To avoid failure of cluster services, avoid performing any configuration or operations on the vCLS VMs.

vCLS VMs are protected from accidental deletion. Cluster VMs and folders are protected from modification by users, including administrators.

Only users who are part of the Administrators SSO group can perform the following operations:

  • ReadOnly access for vCLS VMs
  • Console access to vCLS VMs
  • Relocate vCLS VMs to either new storage, compute resource or both using cold or hot migration
  • Use tags and custom attributes for vCLS VMs

Operations that might disrupt the healthy functioning of vCLS VMs:

  • Changing the power state of the vCLS VMs
  • Resource reconfiguration of the vCLS VMs such as changing CPU, Memory, Disk size, Disk placement
  • VM encryption
  • Triggering vMotion of the vCLS VMs
  • Changing the BIOS
  • Removing the vCLS VMs from the inventory
  • Deleting the vCLS VMs from disk
  • Enabling FT on vCLS VMs
  • Cloning vCLS VMs
  • Configuring PMem
  • Moving vCLS VMs to a different folder
  • Renaming the vCLS VMs
  • Renaming the vCLS folders
  • Enabling DRS rules and overrides on vCLS VMs
  • Enabling HA admission control policy on vCLS VMs
  • Enabling HA overrides on vCLS VMs
  • Moving vCLS VMs to a resource pool
  • Recovering vCLS VMs from a snapshot

When you perform any disruptive operation on the vCLS VMs, a warning dialog box appears.

Troubleshooting

The health of vCLS VMs, including their power state, is managed by the EAM and WCP services. If vCLS VMs fail to power on, or if the first instance of DRS for a cluster is skipped due to lack of quorum of vCLS VMs, a banner appears on the cluster summary page along with a link to a Knowledge Base article to help troubleshoot the error state.

Because vCLS VMs are treated as system VMs, you do not need to back up or snapshot these VMs. The health state of these VMs is managed by vCenter services.

Putting a Cluster in Retreat Mode

When a datastore is placed in maintenance mode, if the datastore hosts vCLS VMs, you must manually migrate the vCLS VMs to a new location with storage vMotion or put the cluster in retreat mode.

This task explains how to put a cluster in retreat mode.

Procedure

  1. Log in to the vSphere Client.
  2. Navigate to the cluster on which vCLS must be deactivated.
  3. Copy the cluster domain ID from the URL of the browser. It should be similar to domain-c(number).
    Note: Only copy the numbers to the left of the colon in the URL.
  4. Navigate to the vCenter Server Configure tab.
  5. Under Advanced Settings, click the Edit Settings button.
  6. Add a new entry config.vcls.clusters.domain-c(number).enabled. Use the domain ID copied in step 3.
  7. Set the Value to False.
  8. Click Save.

Results

The vCLS monitoring service runs every 30 seconds. Within 1 minute, all the vCLS VMs in the cluster are cleaned up and the Cluster Services health is set to Degraded. If the cluster has DRS activated, DRS stops functioning and an additional warning is displayed in the Cluster Summary. DRS remains non-functional, even if it is activated, until vCLS is reconfigured by removing the cluster from Retreat Mode.

While the cluster is in Retreat Mode, vSphere HA does not perform optimal placement during a host failure scenario, because HA depends on DRS for placement recommendations. HA still powers on the VMs, but they might be placed on a less optimal host.

To remove Retreat Mode from the cluster, change the value in step 7 to True.
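Steps 3 through 7 amount to simple string handling. This Python sketch builds the advanced setting name and value from a vSphere Client URL; the function name and the URL in the usage example are hypothetical. Matching `domain-c` followed by digits stops at the colon, which mirrors the note in step 3.

```python
from __future__ import annotations

import re

# Cluster domain ID as it appears in the vSphere Client URL, e.g. "domain-c8".
_DOMAIN_ID = re.compile(r"domain-c\d+")


def retreat_mode_setting(client_url: str, retreat: bool = True) -> tuple[str, str]:
    """Return the (name, value) pair for the vCenter Server advanced setting.

    retreat=True puts the cluster in Retreat Mode (value False);
    retreat=False removes Retreat Mode again (value True).
    """
    match = _DOMAIN_ID.search(client_url)
    if match is None:
        raise ValueError("no cluster domain ID (domain-c<number>) found in URL")
    key = f"config.vcls.clusters.{match.group(0)}.enabled"
    return key, "False" if retreat else "True"
```

For example, for a URL containing `ClusterComputeResource:domain-c8:` followed by the vCenter Server UUID, the function returns `("config.vcls.clusters.domain-c8.enabled", "False")`. The setting itself is still entered under Advanced Settings as described in the procedure.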

Retrieving Password for vCLS VMs

You can retrieve the password to log in to the vCLS VMs.

To ensure cluster services health, avoid accessing the vCLS VMs. This procedure is intended only for explicit diagnostics of vCLS VMs.

Procedure

  1. Use SSH to log in to the vCenter Server Appliance.
  2. Run the following Python script:
    /usr/lib/vmware-wcp/decrypt_clustervm_pw.py
  3. Read the output for the password.

    Read key from file
    Connected to PSQL
    PWD: (password displayed here)
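If you capture the script output in a diagnostic tool rather than reading it on screen, the password line can be picked out as shown below. This is a hypothetical Python helper; it assumes the output format shown above, with the password on a line starting with `PWD:` and no meaningful leading or trailing whitespace.

```python
def extract_vcls_password(script_output: str) -> str:
    """Return the value of the 'PWD:' line printed by decrypt_clustervm_pw.py (assumed format)."""
    for line in script_output.splitlines():
        stripped = line.strip()
        if stripped.startswith("PWD:"):
            # Assumes the password itself has no leading or trailing whitespace.
            return stripped[len("PWD:"):].strip()
    raise ValueError("no 'PWD:' line found in script output")
```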

Results

With the retrieved password, you can log into the vCLS VMs.

vCLS VM Anti-Affinity Policies

vSphere supports anti-affinity between vCLS VMs and another group of workload VMs.

Compute policies provide a way to specify how the vSphere Distributed Resource Scheduler (DRS) should place VMs on hosts in a resource pool. Use the vSphere Compute Policies editor to create and delete compute policies. You can create or delete, but not modify, a compute policy. If you delete a category tag used in the definition of the policy, the policy is also deleted. Open the VM Summary page in the vSphere Client to view the compute policies that apply to a VM and its compliance status with each policy. You can create a compute policy for a group of workload VMs that is anti-affine to the group of vCLS VMs. A vCLS anti-affinity policy uses a single user-visible tag for the group of workload VMs; the other group, the vCLS VMs, is recognized internally.

Create or Delete a vCLS VM Anti-Affinity Policy

A vCLS VM anti-affinity policy describes a relationship between a category of VMs and vCLS system VMs.

A vCLS VM anti-affinity policy discourages placement of vCLS VMs and application VMs on the same host. This kind of policy can be useful when you do not want vCLS VMs and virtual machines running critical workload to run on the same host. Some best practices for running critical workloads such as SAP HANA require dedicated hosts. After the policy is created, the placement engine attempts to place vCLS VMs on the hosts where policy VMs are not running.

Enforcement of a vCLS VM anti-affinity policy can be affected in several ways:
  • If the policy applies to multiple VMs on different hosts and there are not enough hosts without policy VMs to distribute the vCLS VMs, the vCLS VMs are consolidated onto the hosts without policy VMs.
  • If a provisioning operation specifies a destination host, that specification is always honored even if it violates the policy. DRS will try to move the vCLS VMs to a compliant host in a subsequent remediation cycle.
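The placement behavior described above, spreading vCLS VMs over hosts without policy VMs and consolidating them when there are too few, can be sketched as follows. The function name, the round-robin assignment, and the fallback when every host runs policy VMs are assumptions for illustration; DRS itself weighs many more factors.

```python
from __future__ import annotations


def place_vcls_vms(hosts: list[str], policy_vm_hosts: set[str], vm_count: int) -> list[str]:
    """Return one host per vCLS VM, avoiding hosts that run policy (tagged) VMs (sketch)."""
    if not hosts:
        return []
    compliant = [h for h in hosts if h not in policy_vm_hosts]
    if not compliant:
        # Hypothetical fallback: the policy discourages, rather than forbids, co-location.
        compliant = list(hosts)
    # Spread across compliant hosts; with fewer compliant hosts than VMs, VMs are consolidated.
    return [compliant[i % len(compliant)] for i in range(vm_count)]
```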

Procedure

  1. Create a category and tag for each group of VMs that you want to include in a vCLS VM anti-affinity policy.
  2. Tag the VMs that you want to include.
  3. Create a vCLS VM anti-affinity policy.
    1. From the vSphere Client, click Policies and Profiles > Compute Policies.
    2. Click Add to open the New Compute Policy Wizard.
    3. Fill in the policy Name and choose vCLS VM anti affinity from the Policy type drop-down control.
      The policy Name must be unique.
    4. Provide a Description of the policy, then use VM tag to choose the Category and Tag to which the policy applies.
      Unless you have multiple VM tags associated with a category, the wizard fills in the VM tag after you select the tag Category.
    5. Click Create to create the policy.
  4. (Optional) To delete a compute policy, open the vSphere Client and click Policies and Profiles > Compute Policies to show each policy as a card. Click DELETE to delete a policy.