This documentation page describes key things to know about the maintenance of the VMware software components that comprise the deployed Horizon Cloud pod.

Brief Introduction

The system's maintenance activities include an automated update of the pod's software components to include new features, fixes, and improvements for service supportability and resiliency.

To complete an update of the pod and gateway appliances with near-zero downtime, the system uses counts of the end-user sessions. It uses those counts to determine the optimal time to complete the update, which is when a low number of users are connected to the environment with active sessions.

The maintenance activity that updates an existing pod to a newer manifest is system-initiated from the cloud plane to occur at a system-determined day and time.

To indicate your preference that any such system maintenance activity should start at a particular hour and day of the week, you use the console to specify each pod's preferred maintenance window.

A pod without a preferred maintenance window specified in the console is taken to mean that VMware can schedule maintenance on that pod at any time at VMware's convenience.

Note: As described in this documentation page, starting in early calendar year 2022, the service enhanced the upgrade code to programmatically use VMware offers that VMware provides in the Azure Marketplace. When the upgrade pre-checks determine that programmatic use of those VMware offers is prevented in the subscription, you must complete the actions as described in that documentation page to resolve the update-blocking errors.

For example, if the Horizon Cloud service principal associated with the subscriptions used for the pod and its gateway configurations is using a custom role (atypical), please ensure that custom role includes these two permissions. The enhanced upgrade API code relies on these permissions to retrieve the offers list from the Marketplace and obtain the VMware offers. If the custom role does not already include these two permissions, please have them added to the custom role before the pod and gateway upgrade process takes place.

Microsoft.MarketplaceOrdering/offertypes/publishers/offers/plans/agreements/read
Microsoft.MarketplaceOrdering/offertypes/publishers/offers/plans/agreements/write
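
If you want to confirm ahead of time that a custom role carries both permissions, a check along the following lines might help. This is a minimal sketch, not part of the service: the function name and the `role_actions` argument are hypothetical, and the sketch assumes you have already copied the role's list of allowed actions out of its definition. It matches exact strings only and does not expand wildcard entries.

```python
# The two permissions named above, copied verbatim from this page.
REQUIRED_ACTIONS = (
    "Microsoft.MarketplaceOrdering/offertypes/publishers/offers/plans/agreements/read",
    "Microsoft.MarketplaceOrdering/offertypes/publishers/offers/plans/agreements/write",
)


def missing_marketplace_actions(role_actions):
    """Return the required actions that are absent from a custom role's action list.

    Simplification (an assumption of this sketch): only exact,
    case-insensitive matches count. Wildcard entries are not expanded.
    """
    granted = {action.lower() for action in role_actions}
    return [action for action in REQUIRED_ACTIONS if action.lower() not in granted]
```

An empty result means both permissions are present in the list you supplied. Any returned entry is a permission to add to the custom role before the pod and gateway upgrade takes place.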

When the software components used in a deployed pod are updated to a new version, the manifest number for the pod increases to a higher version number, such as 2632.0. If there are improvements considered important for pod serviceability and support operations, VMware can create a new manifest that is a point version, such as 2632.1. The console displays a pod's manifest on the Capacity page.
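
Because several requirements on this page depend on whether a pod's manifest is earlier than a given manifest, such as 3328, it can be useful to compare manifest numbers in a script. The following is a minimal sketch that assumes a manifest is written as a whole number with an optional point version, as in the examples above. The function names are hypothetical and are not part of any VMware tooling.

```python
def parse_manifest(text):
    """Split a manifest string such as "2632.1" into (major, point) integers.

    A manifest written without a point version, such as "3328", is treated
    as point version 0. This is an assumption of the sketch.
    """
    major, _, point = text.strip().partition(".")
    return int(major), int(point or 0)


def is_earlier_than(manifest, threshold):
    """Return True when `manifest` is a lower version than `threshold`."""
    return parse_manifest(manifest) < parse_manifest(threshold)
```

For example, `is_earlier_than("2632.1", "3328")` is true, so a pod on manifest 2632.1 falls under the guidance for pods updating from manifests earlier than 3328.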

Important Information About Updating a Pod From Manifests Earlier Than 3328

Starting in February 2022, the NICs for the pod manager VMs follow the same infrastructure design pattern as the NICs for the Unified Access Gateway VMs.

In new pod deployments starting at that time and in pod updates from manifests earlier than manifest 3328, the deployer instantiates all of the networking needed to run the pod and to support subsequent updates over time. The pod's resource group will now have 8 NICs:

  • 4 NICs that reserve 4 IP addresses from the pod's management subnet
  • 4 NICs that reserve 4 IP addresses from the pod's primary VM subnet (historically called the tenant subnet)

These 8 pod NICs will persist and continue to reserve their assigned IP addresses for the life of the pod.

This design supports faster and more resilient pod updates. Prior to this design, a pod update required creating new NICs as part of the green pod build-out and obtaining IP addresses for those NICs from the pod's subnets at the time of the update. With that design, timeouts in Azure could occur and disrupt the update process.

In this design where the deployer instantiates all of the necessary networking up front, the NICs and their IP addresses from the management and VM (tenant) subnets are preserved to be used in subsequent pod updates. This design aligns with the pattern used for the Unified Access Gateway instances.

When your pod does not yet have the 8 NICs in the pod's resource group and the pod is scheduled to be updated to manifest 3328 or later, you must take these actions.

Before updating that pod, ensure that the IP addresses of the pod's management subnet and primary VM (tenant) subnet are taken only by items that the Horizon Cloud on Microsoft Azure deployment creates and configures.
  • Management subnet - Only the Horizon Cloud on Microsoft Azure deployment's specific NICs that the pod deployer created and configured should be using IP addresses from the pod's management subnet. Those NICs are the pod managers' NICs and the pod's Unified Access Gateway instances' NICs. The pod's management subnet must not have any non-pod-deployed resources or items attached to it or taking IP addresses from it.
  • Tenant subnet - Only the Horizon Cloud on Microsoft Azure deployment's specific NICs and load balancers that the pod deployer creates and configures should be using IP addresses from the pod's tenant subnet. The pod's tenant subnet must not have any non-deployment resources or items attached to it or taking IP addresses from it.

The Deployment Guide precisely states that the subnets used by the pod should have zero additional resources attached to them other than the pod deployment's resources. If you have manually created resources and assigned IP addresses from the pod's management or tenant subnet to such additional resources, then you must remove those IP addresses from those resources before the pod update runs. Otherwise, the pod update will fail and require VMware Support.

After updating the pod, ensure that you add all of the IP addresses reserved by the deployer-created NICs in the pod's resource group to the firewall rules that you had in place prior to the update.
You might have existing firewall rules that govern the traffic from the pod manager VMs' NICs' IP addresses. So that this traffic works after the pod update as it did before the update, you must ensure that all 8 of the IP addresses reserved by the NICs in the pod's resource group are reflected in your firewall rules after the update.
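
One way to verify this after the update is to compare the NICs' IP addresses against the sources your firewall rules allow. The sketch below assumes you have exported both lists yourself. The function name is hypothetical, and any addresses you pass in are your own values, not values taken from this page. Allowed sources can be single addresses or CIDR ranges.

```python
import ipaddress


def addresses_missing_from_rules(nic_addresses, allowed_sources):
    """Return the NIC IP addresses not covered by any allowed source.

    `allowed_sources` holds single IP addresses or CIDR ranges, as strings.
    """
    networks = [ipaddress.ip_network(source, strict=False) for source in allowed_sources]
    missing = []
    for text in nic_addresses:
        address = ipaddress.ip_address(text)
        if not any(address in network for network in networks):
            missing.append(text)
    return missing
```

An empty result means every NIC address you supplied is covered by at least one allowed source. Any address that is returned still needs to be added to your firewall rules.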

Things to Know About Pod Maintenance

The maintenance of the VMware software components that comprise the deployed Horizon Cloud pod is a necessary and required operation to maintain the health and stability of the virtual desktops and applications provisioned by that pod. As described in the VMware Horizon Cloud Service - Additional Service Details (87894) KB, VMware is responsible for the software components that reside on the pod and which are downloaded to that pod from the control plane. The VMware Horizon Cloud Service - Additional Service Details PDF attached to that KB article describes:

  • VMware's roles and responsibilities around the change management procedures for maintaining the health of the software components that are downloaded into the pod. Maintenance activities include updating the pod's software components.
  • The customer's (your) role and responsibilities around the change management procedures, including cooperating with VMware when a scheduled or emergency maintenance is required.

The VMware Horizon Cloud Service - Additional Service Details document contains a definition of scheduled maintenance, maintenance windows, and emergency maintenance. See that document for details. In the case of any discrepancies between this documentation page's contents and the contents of the VMware Horizon Cloud Service - Additional Service Details document, the VMware Horizon Cloud Service - Additional Service Details document takes precedence.

Attention: Before the pod is updated, you must ensure the pod's image VMs, farm VMs, and VDI desktop VMs all have the latest agent that is available for the pod. If you do not update them to the latest agent prior to the pod update, then after the pod update, they might be running incompatible agent versions, which will put the pod into an unsupported state. How can you tell if you need to update any of the agents? In the console, see if there are any blue dots next to an image or assignment. If you see any blue dots, the goal is to make all of the blue dots disappear from the console before the pod update. See Horizon Cloud Pod Updates — Steps For Continued Agents Compatibility and Support.

Specifying the Pod's Preferred Maintenance Window

To indicate your preference for any maintenance activity on your pod to start at a particular hour and day of the week, you use the console to specify what is called the preferred maintenance window for that pod. From the Capacity page, navigate to the Maintenance tab in the pod's details page. Look for the label Preferred Maintenance Time and then follow the on-screen controls to choose a weekday name and time (UTC) in that day. You can only choose from the displayed system pre-defined defaults.

Specify each pod's preferred maintenance time separately in each pod's details page in the console.

Note: A pod without a preferred maintenance window specified in the console will be taken to mean you allow VMware to schedule a maintenance on that pod at any time at VMware's convenience.

The system reads the weekday and time that you specify in the console and incorporates that data into its scheduling algorithm. When a new pod manifest is set as the default in the cloud plane, the system's scheduler calculates the actual day and time at which the update can happen on each pod in your pod fleet. Although the system will do its best to accommodate the preferred maintenance start time specified in a pod's Maintenance tab, there is no guarantee that it will be able to do so for a specific update operation.

As of this writing, the system's scheduler allots four (4) hours for the duration of the maintenance activity. The typical pod update takes less time than this allotted duration.
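
To see what a preferred weekday and UTC hour would mean on the calendar, you could compute the next matching window yourself. The following is a minimal sketch under stated assumptions: it is not the service's scheduling algorithm, the function name is hypothetical, and the four-hour length simply mirrors the allotted duration mentioned above. The system might schedule the actual maintenance at a different time.

```python
from datetime import datetime, timedelta, timezone

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]
ALLOTTED_DURATION = timedelta(hours=4)  # allotted duration stated on this page


def next_preferred_window(weekday_name, hour_utc, now):
    """Return (start, end) of the next preferred window that begins after `now`.

    `now` must be a timezone-aware datetime in UTC.
    """
    target = WEEKDAYS.index(weekday_name)
    start = now.replace(hour=hour_utc, minute=0, second=0, microsecond=0)
    start += timedelta(days=(target - now.weekday()) % 7)
    if start <= now:
        # The preferred time already passed this week, so use next week's.
        start += timedelta(days=7)
    return start, start + ALLOTTED_DURATION
```

For example, with `now` set to noon UTC on Wednesday, July 1, 2020, a preference of Tuesday at hour 20 yields a window that starts on July 7, 2020 at 20:00 UTC and ends four hours later.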

Maintenance Alerts and Notifications

The system will alert and notify your tenant environment's administrators when the system has scheduled a specific calendar date and time for a given pod's specific maintenance to occur. These alerts and notifications include the following:

Within the console
  • A persistent banner along the top of the console. The maintenance time shown in the banner is local to your browser time zone, as you view the console. The following screenshot is an example where the pod's update is scheduled to occur at 4 PM Eastern Time in the United States, on July 7, 2020. Use the View button to click through to the pod's details page and see more information about the scheduled maintenance on the pod's Maintenance tab.
    Screenshot example of the banner in the console that provides information about a pod's scheduled maintenance
  • In the pod's Audit Logs tab and in the console's Activity > Audit Logs, an audit log will state that an upgrade of the pod is scheduled by VMware Operations. The audit log line will include the pod's UUID.
  • On the pod's Maintenance tab, the Scheduled Maintenance section will display information about the scheduled maintenance.
Emails
The system will send emails about the pod's maintenance to your tenant environment's administrators, the ones specified in the console's General Settings > My VMware Accounts settings. One email is sent when the system has set the scheduled maintenance's specific calendar date and time. Other emails include periodic reminders in the days and weeks ahead of that scheduled date and time, a notice when the maintenance activity has begun, and a notice when it is complete.
Note: If you want to reschedule a scheduled maintenance date and time, you must contact VMware Support.
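
Because the console banner shows the maintenance time in your browser's time zone, administrators in different locations see different clock times for the same scheduled maintenance. The sketch below illustrates that conversion with a fixed UTC offset. The function name is hypothetical, and the 20:00 UTC value is an assumption chosen to match the banner example of 4 PM Eastern Time on July 7, 2020, when US Eastern Time is four hours behind UTC.

```python
from datetime import datetime, timedelta, timezone


def to_viewer_local(scheduled_utc, utc_offset_hours):
    """Convert a UTC maintenance time to a viewer's fixed-offset local time."""
    viewer_zone = timezone(timedelta(hours=utc_offset_hours))
    return scheduled_utc.astimezone(viewer_zone)


# Assumed UTC time behind the banner example described above.
scheduled = datetime(2020, 7, 7, 20, 0, tzinfo=timezone.utc)
eastern = to_viewer_local(scheduled, -4)  # US Eastern daylight time in July
print(eastern.strftime("%Y-%m-%d %H:%M"))  # prints 2020-07-07 16:00
```

A fixed offset keeps the example self-contained. It does not account for daylight saving transitions, so treat it as an illustration only.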

System Pre-Checks Prior to Performing Pod Maintenance

If you receive a notification email that says a pod has pod update errors, or you see the console report pod update errors for a pod, you must take action to rectify the situation. Follow the console's on-screen guidance or the email's instructions. Resolving such errors typically involves taking steps in the Microsoft Azure Portal, in the pod's subscription. For additional information about remedies for typical pod update errors, see Horizon Cloud Pods — Remedies for Common Pre-Check Failures.

What is the purpose of these pre-checks? The maintenance activity for a pod update takes place in the pod's Microsoft Azure subscription and resource groups. A short time prior to the system scheduling a particular calendar date and time for a specific update on a given pod, the system runs a pre-check operation to determine whether any conditions exist that would block a pod from a successful update. As an example of one of these pre-checks, the system checks to see if your Microsoft Azure subscription has enough vCores of the appropriate VM Series to satisfy the update's requirements. If one of the pre-checks fails and the condition requires your action to fix, the following things occur:

  • A notification email is sent to you that alerts you to this fact and contains details about the actions required to rectify the error.
  • The console displays a visual alert that actions are required by you to rectify the pre-check errors for that pod.
Important: If you receive any notifications about pod upgrade errors, take the specified actions to remedy the errors in a timely fashion. Time is of the essence. If you do not remedy those errors in the time that VMware requires, the pod will go into an unsupported state due to the failure to remedy the pod update errors.
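
The vCores pre-check mentioned above comes down to simple arithmetic: the cores still available in the quota for the VM series must cover what the update needs. The following sketch shows that arithmetic only. The function name and all three inputs are hypothetical, because this page does not state how many vCores an update requires or how the system reads your quota.

```python
def vcore_shortfall(quota_limit, cores_in_use, cores_required):
    """Return how many additional vCores are needed, or 0 if the quota suffices.

    All three values are for one VM series in one subscription and region.
    """
    available = quota_limit - cores_in_use
    return max(0, cores_required - available)
```

A result of 0 means the quota you supplied would cover the requirement you supplied. A positive result is the size of the quota increase to request in your Microsoft Azure subscription.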

Pod Updates — High-Level Overview

When the maintenance activity is a pod's update to a newer manifest version, the system appropriately moves the pod's current infrastructure components to a higher software manifest level. The infrastructure components are primarily the pod manager VMs and any Unified Access Gateway VMs that are configured for the pod. For example, a pod update can include updates for the pod management software or for the Unified Access Gateway software or for both.

The pod update process is patterned after a software industry technique known as blue-green deployment. The existing to-be-updated pod components are considered the blue components.


Conceptual illustration of the blue-green update process.

Though in most ways the pod update follows an industry blue-green pattern, there are some minor differences from a canonical blue-green update. The pod update does not 100% duplicate every single blue resource in the green build-out. Some of the existing blue resources get reused in the new green build-out, such as the NICs for the Unified Access Gateway instances. Another difference is that in the pod update process, when the newer instances are created alongside the existing ones, the newer ones are powered up and remain running until the pod has completed migrating to the new instances. Also, after the system migrates the pod to the green build-out and validates that the pod is successfully running on the new manifest version, the older blue VMs are deleted from the resource group. (A canonical blue-green update would typically retain the older blue artifacts after the switch to green, keeping the older ones in an idle state.)

  • The existing to-be-updated pod components — like the pod manager VMs and Unified Access Gateway VMs — are considered the blue components.
  • The service automatically builds the necessary green set of components for the pod in your Microsoft Azure subscription — new green pod manager VMs, Unified Access Gateway VMs, and the gateway connector VM (if your external gateway is deployed on its own VNet).
  • The newly-created components in the green build-out are created alongside the blue components, in the same resource groups.
  • The process of creating the green build-out does not cause any downtime or data loss, and the parallel VMs do not affect the pod's operations.
  • The green set is a parallel environment, waiting ready for the scheduled maintenance activity that will make the switch from the blue to the green. The way the system schedules maintenance activity on a pod is covered in the preceding sections.
  • These green VMs are started and kept running until the scheduled maintenance activity that migrates the blue to the green is completed.
  • After the scheduled maintenance activity for migrating to the green build-out has completed and the pod is successfully running on the new instances, the system deletes the blue VMs from the pod's resource groups. Some resources, such as NICs for the Unified Access Gateway instances, remain to preserve configuration values that will be needed in the next pod update.
Note: You must avoid making changes in the Microsoft Azure Portal and in the pod's subscription that will impact the system's build-out of the green components or will impact the system's pod update and maintenance processes.
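
The blue-green flow described in the preceding list can be pictured with a small model. This is an illustrative sketch only, not the service's implementation: the `Pod` class, its fields, and the VM and NIC names are all hypothetical. It shows the three properties the list describes: green VMs run alongside blue ones, the blue VMs are removed after the switch, and the reused NICs persist.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Pod:
    manifest: str                         # manifest the pod currently runs on (blue)
    roles: tuple                          # for example ("pod-manager", "gateway")
    nics: tuple                           # resources that are reused across updates
    green_manifest: Optional[str] = None  # set while a green build-out exists

    def running_vms(self):
        """List the blue VMs plus any green VMs built alongside them."""
        vms = [f"{role}@{self.manifest}" for role in self.roles]
        if self.green_manifest is not None:
            vms += [f"{role}@{self.green_manifest}" for role in self.roles]
        return vms


def build_green(pod, new_manifest):
    """Create the green set next to the blue set. The blue set keeps serving."""
    pod.green_manifest = new_manifest


def switch_to_green(pod):
    """Migrate to the green set and drop the blue VMs. NICs are untouched."""
    if pod.green_manifest is None:
        raise ValueError("green build-out does not exist yet")
    pod.manifest, pod.green_manifest = pod.green_manifest, None
```

After `build_green`, the model lists twice as many running VMs as before. After `switch_to_green`, only the VMs on the new manifest remain, and `pod.nics` is unchanged.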

Maintenance Activity Sequence

This sequence describes the migration to the green build-out — the switch from blue to green in the pod update.

  1. The system checks the pod's preferred maintenance window that you specified in the console, to use that information in its scheduler's algorithm to schedule the actual calendar date and time for the pod's maintenance activity.
  2. The system's scheduler chooses the actual calendar date and time for the maintenance to occur. As described in the preceding sections, the console visually displays the scheduled date and time, and an email is sent to the tenant's administrators.
  3. Important: Before the scheduled maintenance runs:
    • Ensure the pod's image VMs, farm VMs, and VDI desktop VMs all have the latest agent that is available for the pod. If you see any blue dots in the console, the goal is to make all of the blue dots disappear from the console before the pod update occurs. See Horizon Cloud Pod Updates — Steps For Continued Agents Compatibility and Support.
    • Remove any management locks in Microsoft Azure that you might have set on any of the pod's virtual machines (VMs). Any VMs with names that have a portion like vmw-hcs-podID, where podID is the pod's ID value, belong to the pod. Microsoft Azure provides an ability to use the Microsoft Azure portal to lock resources to prevent changes to them. Such management locks can be applied on an entire resource group or on individual resources. If you or your organization has applied management locks on the pod's VMs, those locks must be removed before the update runs. Otherwise, the update process will not successfully complete. You can locate the pod's ID value in the pod's details page from the Capacity page.

    If required by your organization's needs, you can request a different scheduled date for the maintenance by contacting VMware Support at any time prior to the scheduled maintenance time.

    Important: The scheduled time that appears in the console is local to your browser time zone.
  4. At the scheduled maintenance time, the service starts the update activity. The full process typically takes between 20 and 30 minutes from start to finish for pods that have both an external and internal Unified Access Gateway configuration.
    Note: During the 20 to 30 minutes that the process takes to complete, the console prevents you from performing administrative tasks on the pod that is undergoing the update. For example, until the pod manager appliances notify the cloud plane that the update is completed, the Edit action in the pod details page is unavailable, so you cannot change characteristics of that pod.
    About end-user sessions and the update activity on the Unified Access Gateway appliances
    To achieve near-zero downtime for end-user sessions, within the overall maintenance activity time, the system uses counts of the end-user sessions on the appliances to determine the optimal timing for completing the update of those appliances.

    The completion time is optimized to occur when a low number of users are connected to the environment with active sessions.

    In that near-zero time window, end users with active sessions will have those sessions disconnected. Then in a few minutes, those users can reconnect.

    No data loss occurs, except in the scenario where you have set the Immediately option for the timeout handling in the farms and VDI desktop assignments. In that scenario, users with active sessions are disconnected and those sessions are also logged off immediately, in accordance with that setting. In those conditions, any in-progress user work is lost. To avoid loss of in-progress end-user data in this scenario, before the maintenance activity starts, adjust the Logoff Disconnected Sessions setting in the farms and VDI desktop assignments to a time value that will give those users time to save their work. Then after the update is finished, you can change the setting back to what it was before.

    Also during that near-zero time window, an end user who does not already have a connected session to their virtual desktop or remote application from the pod and who attempts to connect will not be able to connect until the process completes.

  5. After the maintenance activity is complete, the system deletes the components that are no longer needed. These are the blue components that are not reused in the green build-out, such as the pod manager VMs and Unified Access Gateway VMs. Some artifacts, such as certain NICs for the pod manager instances and Unified Access Gateway instances, remain to preserve configuration values that are needed for the next maintenance.
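
Step 4 of the sequence above says that the system uses counts of end-user sessions to pick a completion time with few active users. This page does not describe the actual algorithm, so the following is only a sketch of the general idea: given session counts sampled at intervals within the maintenance window, choose the sample with the fewest sessions. The function name and the sampling approach are assumptions.

```python
def pick_completion_slot(session_counts):
    """Return the index of the sample with the fewest sessions.

    When several samples tie for the minimum, the earliest one is chosen.
    """
    if not session_counts:
        raise ValueError("no session samples")
    return min(range(len(session_counts)), key=session_counts.__getitem__)
```

For example, for the samples `[12, 7, 3, 3, 9]` the function returns index 2, the first of the two quietest samples.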

After the Maintenance Activity

When the maintenance activity finishes, you can perform administrative tasks on the pod. To see the software version that a pod is currently running, select Settings > Capacity and click the pod to open its summary page. The page displays the current software version running.

  • After a pod update, ensure that the agents in all existing images, farms, and VDI assignments are updated to the latest available version. If those VMs' installed agents are not updated, the pod will be in an unsupported configuration. The maintenance process does not automatically update those installed agents. If you see any blue dots in the console for your images or assignments, it means that the agents need updating. See Horizon Cloud Pod Updates — Steps For Continued Agents Compatibility and Support.
  • If this update was from a manifest earlier than manifest 3328, after updating the pod, ensure that you add to your firewall rules all of the IP addresses of the deployer-created NICs that are in the pod's resource group. You might have existing firewall rules that govern the traffic from the pod manager VMs' NICs' IP addresses. So that the traffic communication will work after this pod update and subsequent updates as it worked prior to this update, you must ensure that all of the 8 IP addresses reserved by the NICs in the pod's resource group are reflected in your firewall rules after this update.
  • If your configured two-factor authentication server is deployed in the same VNet, then after the maintenance activity, you must update the settings on your two-factor authentication server to accept the new private IP addresses for the new internal Unified Access Gateway VMs. This is a one-time requirement for the first update on the pod, and does not have to be repeated for that pod's future updates. Refer to Update Your Two-Factor Authentication System with the Required Gateway Information.
  • Starting with the September 2019 quarterly service release, the pod architecture is updated to support high availability (HA). Even when the high availability feature is not enabled, the new HA-capable architecture includes a Microsoft Azure load balancer in front of the pod's manager VM. After you update your pod to manifest 1600, if your pod was configured for direct connections, you should remap your DNS settings to point to the pod manager's Azure load balancer IP address that will be newly displayed in the updated pod's details page. Until you update the DNS mapping, those direct user connections will still work, but if the active pod manager VM goes down, those connections won't have the high availability failover that an HA-enabled pod is designed to provide. For this use case, you map an FQDN to the IP address in the Pod Manager Load Balancer IP field that is displayed on the pod's details page, as described in Configure SSL Certificates Directly on the Pod Manager VMs, Such as When Integrating the Workspace ONE Access Connector Appliance with the Horizon Cloud Pod in Microsoft Azure, So that Connector Can Trust Connections to the Pod Manager VMs. Prior to pod manifest 1600, that IP was the one assigned to the pod's manager VM's NIC on the tenant subnet. Starting with pod manifest 1600, the pod's IP address to map is the private IP address of the Microsoft Azure load balancer used for the pod's manager VMs. For existing pods that are updated to this release's manifest version, if you had configured a DNS name to point to the tenant appliance IP address for a pod of manifest 1493.1 or earlier, you should remap your DNS settings to point to the IP address displayed for the Pod Manager Load Balancer IP label in the updated pod's details page.
  • Prior to the 2474.x manifest, the system did not check your registered Active Directory servers for clock skew. With 2474.x, a check of the clock skew was introduced. If your registered Active Directory servers currently have any time synchronization issues (clockSkew > 4 minutes), then when the pod is upgraded to 2474.x or later, this system validation will start on that pod. As a result, Active Directory server discovery might start failing until you address the clock skew issue, and the failing discovery will impact the end-user desktop connection requests to that pod.
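
To get a sense of whether a server would exceed the 4-minute clock skew threshold mentioned in the last item above, you could compare its reported time with a trusted reference time. This is a minimal sketch, not the system's validation: the function name is hypothetical, and how you obtain the two timestamps is up to you.

```python
from datetime import datetime, timedelta, timezone

MAX_SKEW = timedelta(minutes=4)  # threshold stated on this page


def skew_exceeds_limit(reference_time, server_time, limit=MAX_SKEW):
    """Return True when the two clocks differ by more than `limit`.

    Both arguments must be timezone-aware datetimes.
    """
    return abs(server_time - reference_time) > limit
```

A server whose clock is 5 minutes ahead of or behind the reference exceeds the limit, while one that is 3 minutes off does not.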