This article describes the system architecture of Horizon Cloud Connector 2.0 and later, which relies on Kubernetes pods running on primary and worker nodes in a cluster. It explains how this architecture supports high availability features for nodes and fault tolerance features for core Horizon Cloud Connector services including the Horizon Universal License.

Beginning with version 2.0, Horizon Cloud Connector provides support for dual-node clusters, node-level high availability, and service-level fault tolerance. In Horizon Cloud Connector 2.0 and later, all services run as Kubernetes pods on the nodes.

Note: This release supports dual-node clusters, node-level high availability, and service-level fault tolerance only for appliances paired with the following types of pods:
  • Horizon pods deployed on premises
  • Horizon pods deployed in VMware Cloud on AWS with all-in-SDDC architecture

Horizon pods deployed in all other environments support only single-node clusters, which consist of a primary node alone, and do not support node-level high availability or service-level fault tolerance.
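
The support rules above can be summarized in a short Python sketch. The names `PodDeployment`, `DUAL_NODE_DEPLOYMENTS`, `max_cluster_nodes`, and `supports_ha_and_fault_tolerance` are hypothetical and are not part of any Horizon Cloud Connector API. The sketch only encodes the two deployment types listed in the note.

```python
from enum import Enum


class PodDeployment(Enum):
    """Hypothetical labels for where the paired Horizon pod is deployed."""
    ON_PREMISES = "on-premises"
    VMC_ON_AWS_ALL_IN_SDDC = "vmc-on-aws-all-in-sddc"
    OTHER = "other"


# Deployment types that the note lists as supporting dual-node clusters.
DUAL_NODE_DEPLOYMENTS = {
    PodDeployment.ON_PREMISES,
    PodDeployment.VMC_ON_AWS_ALL_IN_SDDC,
}


def max_cluster_nodes(deployment: PodDeployment) -> int:
    """Return 2 (primary plus worker) where dual-node clusters are supported, else 1."""
    return 2 if deployment in DUAL_NODE_DEPLOYMENTS else 1


def supports_ha_and_fault_tolerance(deployment: PodDeployment) -> bool:
    """Node-level HA and service-level fault tolerance follow dual-node support."""
    return deployment in DUAL_NODE_DEPLOYMENTS
```

For example, `max_cluster_nodes(PodDeployment.OTHER)` returns 1, which reflects a cluster that consists of the primary node only.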

What is a Horizon Cloud Connector cluster?

A Horizon Cloud Connector cluster consists of the following members:

  • The primary node of the Horizon Cloud Connector virtual appliance
  • The worker node of the Horizon Cloud Connector virtual appliance

At a minimum, a cluster must contain the primary node. You can add a worker node to, or remove it from, an existing cluster that contains the primary node.

What is a primary node?

The primary node is the virtual machine (VM) of the Horizon Cloud Connector appliance that runs the control plane services required to manage the Horizon Cloud Connector cluster.

The primary node also runs the primary instance of the following services:

  • Connector Client Service
  • Cloud Proxy Service
  • Connection Server Proxy Service

To deploy a primary node and pair it with your Horizon pod, follow the guidelines described in High-Level Workflow When You are Onboarding an Existing Horizon Pod That is Deployed in a VMware SDDC as Your First Pod to Your Horizon Cloud Tenant Environment.

What is a worker node?

The worker node is a secondary VM of the Horizon Cloud Connector appliance that runs replica instances of the following services:

  • Connector Client Service
  • Cloud Proxy Service
  • Connection Server Proxy Service

By adding a worker node to the Horizon Cloud Connector cluster, you can scale up these services to support increased workloads, which are load-balanced across primary and replica instances of the services. If you remove the worker node from the cluster, services scale down to a single instance running on the primary node.

Note: In this release, the worker node supports replica instances of the Horizon Cloud Connector application services only. All other services, including CBCS, CSMS, ILS, and the cluster-management services, run as a single instance on the primary node.
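
The scale-up and scale-down behavior can be illustrated with a minimal Python model. This is a sketch under stated assumptions, not how the appliance is implemented: the `Cluster` class and its methods are hypothetical, and the set of replicated services is assumed to be the three services that this article names as running on both nodes.

```python
from dataclasses import dataclass

# Assumption: these are the services that gain a replica on the worker node.
REPLICATED_SERVICES = (
    "Connector Client Service",
    "Cloud Proxy Service",
    "Connection Server Proxy Service",
)
# Services that the note describes as single-instance on the primary node.
SINGLE_INSTANCE_SERVICES = ("CBCS", "CSMS", "ILS")


@dataclass
class Cluster:
    """Hypothetical model: a primary node is always present, a worker is optional."""
    has_worker: bool = False

    def add_worker(self) -> None:
        if self.has_worker:
            raise ValueError("cluster already contains a worker node")
        self.has_worker = True

    def remove_worker(self) -> None:
        if not self.has_worker:
            raise ValueError("cluster has no worker node to remove")
        self.has_worker = False

    def instance_count(self, service: str) -> int:
        """Number of running instances of a service in a healthy cluster."""
        if service in REPLICATED_SERVICES:
            return 2 if self.has_worker else 1
        if service in SINGLE_INSTANCE_SERVICES:
            return 1
        raise KeyError(f"unknown service: {service}")
```

In this model, calling `add_worker()` raises the instance count of `"Cloud Proxy Service"` from 1 to 2, while `"CBCS"` stays at 1 regardless of whether a worker node is present.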

To deploy a worker node, follow the steps described in Horizon Cloud Connector 2.0 and Later - Add a Worker Node to a Horizon Cloud Connector Cluster. To remove the worker node from a cluster, follow the steps described in Horizon Cloud Connector 2.0 and Later - Remove the Worker Node from a Horizon Cloud Connector Cluster.

What is node-level high availability and how does it work?

For complete information, see Horizon Cloud Connector 2.0 and Later - Set Up Node-Level High Availability.

How does service-level fault tolerance work in different outage scenarios?

This section describes how a dual-node Horizon Cloud Connector cluster provides fault tolerance and keeps the Horizon Universal License available under different outage conditions.

Note: In this release, Horizon Cloud Connector supports fault tolerance only for the Horizon Cloud Connector application services, as detailed in the previous section. All other services run as a single instance on the primary node and become unavailable if that single instance fails.

  1. If a framework service fails

    As described earlier, the Horizon Cloud Connector framework services (Connector Client Service, Cloud Proxy Service, Connection Server Proxy Service) run as dual instances on the primary and worker nodes. If a framework service fails on one node, the replica instance of that service continues running on the other node, which keeps the Horizon Cloud Connector framework services and the Horizon Universal License fully operational.

    For example, if the Cloud Proxy Service fails on the primary node, the replica instance of the Cloud Proxy Service on the worker node continues to run. The fully operational framework services ensure that the Horizon Cloud license service can continue to sync with the pod every 24 hours.

  2. If the worker node fails

    Note: This outage scenario applies only if you have not configured node-level HA. When you configure node-level HA as described in Horizon Cloud Connector 2.0 and Later - Set Up Node-Level High Availability, vSphere HA ensures the high availability of the worker node.

    If the entire worker node loses operation, all services continue running without interruption as single instances on the primary node and the Horizon Universal License remains fully operational.

    The Horizon Cloud Connector application services scale down temporarily until the worker node is restored to full operation.

  3. If the primary node fails

    Note: This outage scenario applies only if you have not configured node-level HA. When you configure node-level HA as described in Horizon Cloud Connector 2.0 and Later - Set Up Node-Level High Availability, vSphere HA ensures the high availability of the primary node.

    If the entire primary node loses operation, the Horizon Universal License enters a 25-day sync grace period. During this period, the license remains valid and the pod remains fully operational. For more information, see Monitoring the Horizon Universal License.

    You can continue to monitor and perform administrative tasks on the pod using the Horizon Universal Console. However, the following limitations apply:

    • The Horizon Cloud Connector cluster enters an error state.
    • You cannot access the Horizon Cloud Connector configuration portal from the worker node.
    • The Universal Broker, Cloud Monitoring Service, and Horizon Image Management Service become temporarily unavailable.
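
The three outage scenarios can be sketched in Python as follows. All names here (`Outage`, `framework_service_available`, `enters_sync_grace_period`, `grace_period_end`) are hypothetical. The sketch also assumes that node-level HA is not configured and that the caller supplies the date on which the 25-day sync grace period begins, because this article does not state exactly when the period starts.

```python
from datetime import date, timedelta
from enum import Enum

# Length of the license sync grace period stated in this article.
SYNC_GRACE_PERIOD = timedelta(days=25)


class Outage(Enum):
    """The three outage scenarios described above (node-level HA not configured)."""
    FRAMEWORK_SERVICE = 1
    WORKER_NODE = 2
    PRIMARY_NODE = 3


def framework_service_available(primary_instance_up: bool,
                                worker_instance_up: bool) -> bool:
    """A replicated service stays available while at least one instance runs."""
    return primary_instance_up or worker_instance_up


def enters_sync_grace_period(outage: Outage) -> bool:
    """Only the loss of the primary node puts the license into the grace period."""
    return outage is Outage.PRIMARY_NODE


def grace_period_end(grace_start: date) -> date:
    """Date on which the grace period ends.

    Assumption: grace_start is provided by the caller; the article does not
    define the exact starting point of the period.
    """
    return grace_start + SYNC_GRACE_PERIOD
```

Under these assumptions, a grace period that starts on `date(2024, 1, 1)` ends on `date(2024, 1, 26)`, which gives a rough deadline for restoring the primary node before the license is affected.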