This article describes the system architecture of Horizon Cloud Connector 2.0 and later, which relies on Kubernetes pods running on primary and worker nodes in a cluster. It explains how this architecture supports high availability features for nodes and fault tolerance features for core Horizon Cloud Connector services including the Horizon Universal License.
Beginning with version 2.0, Horizon Cloud Connector provides support for dual-node clusters, node-level high availability, and service-level fault tolerance. In Horizon Cloud Connector 2.0 and later, all services run as Kubernetes pods on the nodes. These features are supported for the following types of Horizon pods:
- Horizon pods deployed on premises
- Horizon pods deployed in VMware Cloud on AWS with all-in-SDDC architecture
Horizon pods deployed in all other environments support only single-node clusters consisting of a primary node; those deployments do not support node-level high availability or service-level fault tolerance.
What is a Horizon Cloud Connector cluster?
A Horizon Cloud Connector cluster consists of the following members:
- The primary node of the Horizon Cloud Connector virtual appliance
- The worker node of the Horizon Cloud Connector virtual appliance
At a minimum, a cluster must contain the primary node as a member. You can add a worker node to, and remove it from, an existing cluster that contains the primary node.
What is a primary node?
The primary node is the virtual machine (VM) of the Horizon Cloud Connector appliance that runs the control plane services required to manage the Horizon Cloud Connector cluster.
The primary node also runs the primary instance of the following services:
- Horizon Cloud Connector application services, which encompass the following services as listed in the appliance configuration portal:
  - Connector Client Service
  - Cloud Proxy Service
  - Connection Server Proxy Service
- Cloud Broker Client Service (CBCS), which supports Universal Broker
- Connection Server Monitoring Service (CSMS)
- Image Locality Service (ILS), which supports the optional Horizon Image Management Service
- Services made available after onboarding your Horizon Cloud tenant to VMware Cloud Services Engagement Platform. For more information, see Onboard Your Horizon Cloud Tenant to VMware Cloud Services Engagement Platform and VMware Cloud Services Using the Horizon Universal Console.
To deploy a primary node and pair it with your Horizon pod, follow the guidelines described in High-Level Workflow When You are Onboarding an Existing Horizon Pod That is Deployed in a VMware SDDC as Your First Pod to Your Horizon Cloud Tenant Environment.
What is a worker node?
The worker node is a secondary VM of the Horizon Cloud Connector appliance that runs replica instances of the following services:
- Horizon Cloud Connector application services, which encompass the following services as listed in the Horizon Cloud Connector configuration portal:
  - Connector Client Service
  - Cloud Proxy Service
  - Connection Server Proxy Service
- Services made available after onboarding your Horizon Cloud tenant to VMware Cloud Services Engagement Platform. For more information, see Onboard Your Horizon Cloud Tenant to VMware Cloud Services Engagement Platform and VMware Cloud Services Using the Horizon Universal Console.
By adding a worker node to the Horizon Cloud Connector cluster, you can scale up these services to support increased workloads, which are load-balanced across primary and replica instances of the services. If you remove the worker node from the cluster, services scale down to a single instance running on the primary node.
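Because the application services run as Kubernetes workloads, this scale-out behaves like a standard Deployment whose replica count follows the cluster size. The following sketch is illustrative only; the resource names, labels, and image are hypothetical, not the actual Horizon Cloud Connector manifests. A two-replica Deployment with pod anti-affinity places one instance on each node, so load balances across the primary and worker.

```yaml
# Hypothetical sketch of the scale-out pattern; not the actual
# Horizon Cloud Connector manifests.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cloud-proxy-service          # hypothetical name
spec:
  replicas: 2                        # primary instance + replica on the worker node
  selector:
    matchLabels:
      app: cloud-proxy-service
  template:
    metadata:
      labels:
        app: cloud-proxy-service
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchLabels:
                app: cloud-proxy-service
            topologyKey: kubernetes.io/hostname   # at most one pod per node
      containers:
      - name: cloud-proxy
        image: example/cloud-proxy:2.0            # hypothetical image
```

In this pattern, removing the worker node corresponds to scaling the Deployment back to a single replica running on the primary node.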
To deploy a worker node, follow the steps described in Horizon Cloud Connector 2.0 and Later - Add a Worker Node to a Horizon Cloud Connector Cluster. To remove the worker node from a cluster, follow the steps described in Horizon Cloud Connector 2.0 and Later - Remove the Worker Node from a Horizon Cloud Connector Cluster.
What is node-level high availability and how does it work?
Node-level high availability relies on vSphere HA to keep the primary and worker nodes available if a node goes down. For complete information, see Horizon Cloud Connector 2.0 and Later - Set Up Node-Level High Availability.
How does service-level fault tolerance work in different outage scenarios?
This section describes how a dual-node Horizon Cloud Connector cluster supports fault tolerance and the continued availability of the Horizon Universal License under various outage conditions.
- If a framework service fails
As described earlier, the Horizon Cloud Connector framework services (Connector Client Service, Cloud Proxy Service, Connection Server Proxy Service) run as dual instances on the primary and worker nodes. If a framework service fails on one node, the replica instance of that service continues running on the other node to ensure full operation of the Horizon Cloud Connector framework services and the Horizon Universal License.
For example, if the Cloud Proxy Service fails on the primary node, the replica instance of the Cloud Proxy Service on the worker node continues to run. The fully operational framework services ensure that the Horizon Cloud license service can continue to sync with the pod every 24 hours.
- If the worker node fails
Note: This outage scenario is only applicable if you have not configured node-level HA. When you configure node-level HA as described in Horizon Cloud Connector 2.0 and Later - Set Up Node-Level High Availability, vSphere HA ensures the high availability of the worker node.
If the entire worker node goes down, all services continue running without interruption as single instances on the primary node, and the Horizon Universal License remains fully operational.
The Horizon Cloud Connector application services scale down temporarily until the worker node is restored to full operation.
- If the primary node fails
Note: This outage scenario is only applicable if you have not configured node-level HA. When you configure node-level HA as described in Horizon Cloud Connector 2.0 and Later - Set Up Node-Level High Availability, vSphere HA ensures the high availability of the primary node.
If the entire primary node goes down, the Horizon Universal License enters a 25-day sync grace period. During this period, the license remains valid and the pod remains fully operational. For more information, see Monitoring the Horizon Universal License.
You can continue to monitor and perform administrative tasks on the pod using the Horizon Universal Console. However, the following limitations apply:
- The Horizon Cloud Connector cluster goes into an error state.
- You cannot access the Horizon Cloud Connector configuration portal from the worker node.
- The Universal Broker, Cloud Monitoring Service, and Horizon Image Management Service become temporarily unavailable.
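The failover behavior in these scenarios follows from standard Kubernetes service routing: a Service only forwards traffic to pods that pass their readiness probes, so when an instance fails on one node, requests automatically flow to the surviving replica on the other node. The following is a minimal, hypothetical sketch of that mechanism; the names, ports, health endpoint, and image are illustrative, not the product's actual manifests.

```yaml
# Hypothetical sketch of the failover mechanism; not the actual manifests.
apiVersion: v1
kind: Service
metadata:
  name: cloud-proxy-service          # hypothetical name
spec:
  selector:
    app: cloud-proxy-service         # matches the pods on both nodes
  ports:
  - port: 443
    targetPort: 8443                 # hypothetical container port
---
# Each pod advertises health through a readiness probe. A pod that
# fails the probe is removed from the Service endpoints, leaving
# the replica on the other node to carry the traffic.
apiVersion: v1
kind: Pod
metadata:
  name: cloud-proxy-primary          # hypothetical name
  labels:
    app: cloud-proxy-service
spec:
  containers:
  - name: cloud-proxy
    image: example/cloud-proxy:2.0   # hypothetical image
    readinessProbe:
      httpGet:
        path: /healthz               # hypothetical health endpoint
        port: 8443
      periodSeconds: 10
```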