vSphere with Tanzu transforms the vSphere cluster into a platform for running Kubernetes workloads in dedicated resource pools. When vSphere with Tanzu is enabled on a vSphere cluster, it creates a Kubernetes control plane directly in the hypervisor layer. You can then run containerized workloads by creating upstream Kubernetes clusters through the VMware Tanzu Kubernetes Grid Service and run your applications inside these clusters.
This document provides a reference design for deploying VMware Tanzu for Kubernetes Operations (informally known as TKO) on vSphere with Tanzu.
The following reference design is based on the architecture and components described in VMware Tanzu for Kubernetes Operations Reference Architecture.
Supervisor Cluster: When Workload Management is enabled on a vSphere cluster, it creates a Kubernetes layer within the ESXi hosts that are part of the cluster. A cluster that is enabled for Workload Management is called a Supervisor Cluster. You run containerized workloads by creating upstream Kubernetes clusters on the Supervisor Cluster through the Tanzu Kubernetes Grid Service.
The Supervisor Cluster runs on top of an SDDC layer that consists of ESXi for compute, vSphere Distributed Switch for networking, and vSAN or another shared storage solution.
vSphere Namespaces: A vSphere Namespace is a tenancy boundary within vSphere with Tanzu. A vSphere Namespace allows for sharing vSphere resources (compute, networking, storage) and enforcing resource limits on the underlying objects, such as Tanzu Kubernetes clusters. For each namespace, you configure role-based access control (policies and permissions), the image library, and virtual machine classes.
Tanzu Kubernetes Grid Service: Tanzu Kubernetes Grid Service (TKGS) allows you to create and manage ubiquitous Kubernetes clusters on a VMware vSphere infrastructure using the Kubernetes Cluster API. The Cluster API provides declarative, Kubernetes-style APIs for the creation, configuration, and management of the Tanzu Kubernetes Cluster.
Tanzu Kubernetes Grid Service also provides self-service lifecycle management of Tanzu Kubernetes clusters.
Tanzu Kubernetes Cluster (Workload Cluster): Tanzu Kubernetes clusters are Kubernetes workload clusters in which your application workloads run. These clusters can be attached to SaaS solutions such as Tanzu Mission Control, Tanzu Observability, and Tanzu Service Mesh, which are part of Tanzu for Kubernetes Operations.
VM Class in vSphere with Tanzu: A VM class is a template that defines CPU, memory, and reservations for VMs. VM classes are used for VM deployment in a Supervisor Namespace. VM classes can be used by standalone VMs that run in a Supervisor Namespace and by VMs hosting a Tanzu Kubernetes cluster.
VM classes in vSphere with Tanzu are broadly categorized into two groups: guaranteed and best effort. A guaranteed class fully reserves its configured CPU and memory, whereas a best effort class allows those resources to be overcommitted.
vSphere with Tanzu offers several default VM classes. You can use them as is or you can create new VM classes. The following screenshot shows the default VM classes that are available in vSphere with Tanzu.
Storage Classes in vSphere with Tanzu: A StorageClass provides a way for administrators to describe the classes of storage they offer. Different classes can map to quality-of-service levels, to backup policies, or to arbitrary policies determined by the cluster administrators.
You can deploy vSphere with Tanzu with an existing default StorageClass or the vSphere Administrator can define StorageClass objects (Storage policy) that let cluster users dynamically create PVC and PV objects with different storage types and rules.
The following table provides recommendations for configuring VM Classes/Storage Classes in a vSphere with Tanzu environment.
Decision ID | Design Decision | Design Justification | Design Implications |
---|---|---|---|
TKO-TKGS-001 | Create custom Storage Classes/Profiles/Policies | To provide different levels of QoS and SLA for prod and dev/test K8s workloads. To isolate Supervisor clusters from workload clusters. | The default storage policy might not be adequate if deployed applications have different performance and availability requirements. |
TKO-TKGS-002 | Create custom VM Classes | To facilitate deployment of K8s workloads with specific compute/storage requirements. | Default VM classes in vSphere with Tanzu are not adequate to run a wide variety of K8s workloads. |
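To illustrate TKO-TKGS-002, the following is a minimal sketch of how a custom VM class surfaces in the Supervisor Cluster as a VirtualMachineClass object. The class name and sizes are hypothetical, and custom VM classes are normally created from the vSphere Client (Workload Management > VM Classes) rather than applied directly with kubectl; field names can also vary by vSphere release.

```yaml
# Illustrative sketch only -- class name and sizes are hypothetical.
apiVersion: vmoperator.vmware.com/v1alpha1
kind: VirtualMachineClass
metadata:
  name: custom-medium-guaranteed   # hypothetical custom class name
spec:
  hardware:
    cpus: 8        # vCPUs allocated to node VMs that use this class
    memory: 32Gi   # memory allocated to node VMs that use this class
  # A "guaranteed" style class additionally carries full CPU and memory
  # reservations under spec.policies, while "best effort" classes do not.
```

Once a class is created and associated with a vSphere Namespace, it can be referenced by name in Tanzu Kubernetes cluster specifications.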
The following diagram shows a high-level architecture of vSphere with Tanzu.
The Supervisor Cluster consists of the following components:
Kubernetes control plane VM: Three Kubernetes control plane VMs in total are created on the hosts that are part of the Supervisor cluster. The three control plane VMs are load balanced, as each one of them has its own IP address.
Cluster API and Tanzu Kubernetes Grid Service: These modules run on the Supervisor cluster and enable the provisioning and management of Tanzu Kubernetes clusters.
The following diagram shows the general architecture of the Supervisor cluster.
After a Supervisor cluster is created, the vSphere administrator creates vSphere namespaces. When initially created, vSphere namespaces have unlimited resources within the Supervisor cluster. The vSphere administrator defines the limits for CPU, memory, and storage, as well as the number of Kubernetes objects, such as deployments, replica sets, and persistent volumes, that can run within the namespace. These limits are configured for each vSphere namespace.
For more information about the maximum supported number, see the vSphere with Tanzu Configuration Maximums guide.
To provide tenants access to namespaces, the vSphere administrator assigns permission to users or groups available within an identity source that is associated with vCenter Single Sign-On.
Once the permissions are assigned, tenants can access the namespace to create Tanzu Kubernetes Clusters using YAML files and the Cluster API.
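As an illustration of this workflow, the following is a minimal TanzuKubernetesCluster manifest that a tenant could apply with kubectl against their vSphere namespace on the Supervisor cluster. The namespace, VM class, storage class, and Tanzu Kubernetes release names are placeholders for values available in your environment, and the API version shown (run.tanzu.vmware.com/v1alpha3) can differ between vSphere releases.

```yaml
# Minimal example cluster; all names are placeholders for values in your environment.
apiVersion: run.tanzu.vmware.com/v1alpha3
kind: TanzuKubernetesCluster
metadata:
  name: tkc-dev-01                   # hypothetical cluster name
  namespace: dev-namespace           # hypothetical vSphere namespace
spec:
  topology:
    controlPlane:
      replicas: 1
      vmClass: best-effort-small     # VM class associated with the namespace
      storageClass: vsan-default-storage-policy   # storage class exposed to the namespace
      tkr:
        reference:
          name: v1.24.9---vmware.1-tkg.4   # placeholder Tanzu Kubernetes release name
    nodePools:
    - name: worker-pool-01
      replicas: 2
      vmClass: best-effort-medium
      storageClass: vsan-default-storage-policy
```

Applying the manifest while logged in to the Supervisor cluster context triggers the Tanzu Kubernetes Grid Service and Cluster API to provision the cluster nodes.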
Here are some recommendations for using namespaces in a vSphere with Tanzu environment.
Decision ID | Design Decision | Design Justification | Design Implications |
---|---|---|---|
TKO-TKGS-003 | Create namespaces to logically separate K8s workloads. | Create dedicated namespaces for the type of workloads (prod/dev/test) that you intend to run. | All Kubernetes clusters created under a namespace share the same access policy/quotas/network resources. |
TKO-TKGS-004 | Enable self-service namespaces. | Enable DevOps/Cluster admin users to provision namespaces in a self-service manner. | The vSphere administrator must publish a namespace template to the LDAP users/groups to enable them to create namespaces. |
TKO-TKGS-005 | Register external identity source (AD/LDAP) with vCenter. | Limit access to a namespace to authorized users/groups. | A prod namespace can be accessed by a handful of users, whereas a dev/test namespace can be exposed to a wider audience. |
Software Components | Version |
---|---|
Tanzu Kubernetes Release | 1.24.9 |
VMware vSphere ESXi | 8.0 U1 or later |
VMware vCenter (VCSA) | 8.0 U1 or later |
NSX Advanced Load Balancer | 22.1.3 |
vSphere with Tanzu integrates with shared datastores available in the vSphere infrastructure. The following types of shared datastores are supported: VMFS, NFS, vSAN, and vVols.
vSphere with Tanzu uses storage policies to integrate with shared datastores. The policies represent datastores and manage the storage placement of objects such as control plane VMs, container images, and persistent storage volumes.
Before you enable vSphere with Tanzu, create storage policies to be used by the Supervisor Cluster and namespaces. Depending on your vSphere storage environment, you can create several storage policies to represent different classes of storage.
vSphere with Tanzu is agnostic about which storage option you choose. For Kubernetes stateful workloads, vSphere with Tanzu installs the vSphere Container Storage Interface (vSphere CSI) to automatically provision Kubernetes persistent volumes for pods.
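For example, a stateful workload in a Tanzu Kubernetes cluster requests storage by referencing one of the storage classes exposed from the namespace's storage policies, and the vSphere CSI driver provisions the backing volume. The storage class name below is a placeholder for a class available in your cluster.

```yaml
# The storage class name is a placeholder; "kubectl get storageclass" lists
# the classes derived from the storage policies assigned to the namespace.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: app-data
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: vsan-gold-storage-policy   # hypothetical storage class
  resources:
    requests:
      storage: 20Gi
```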
A Tanzu Kubernetes cluster provisioned by the Tanzu Kubernetes Grid Service supports two Container Network Interface (CNI) options: Antrea and Calico.
The CNI options are open-source software that provide networking for cluster pods, services, and ingress.
When you deploy a Tanzu Kubernetes cluster using the default configuration of Tanzu CLI, Antrea CNI is automatically enabled in the cluster.
To provision a Tanzu Kubernetes cluster using Calico CNI, see Deploy Tanzu Kubernetes clusters with Calico.
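As a rough sketch of what that looks like (assuming the same TanzuKubernetesCluster API version as in the earlier example), overriding the default CNI amounts to adding a network setting to the cluster spec; the exact field layout can vary by API version.

```yaml
# Excerpt of a TanzuKubernetesCluster spec selecting Calico instead of the
# default Antrea CNI.
spec:
  settings:
    network:
      cni:
        name: calico
```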
Each CNI is suitable for a different use case. The following table lists some common use cases for the CNI options that Tanzu Kubernetes Grid supports. This table will help you select the most appropriate CNI for your Tanzu Kubernetes Grid implementation.
CNI | Use Case | Pros and Cons |
---|---|---|
Antrea | Enable Kubernetes pod networking with IP overlay networks using VXLAN or Geneve for encapsulation. Optionally encrypt node-to-node communication using IPsec packet encryption. Antrea supports advanced network use cases like kernel bypass and network service mesh. | Pros: Antrea leverages Open vSwitch as the networking data plane, and Open vSwitch supports both Linux and Windows. VMware supports the latest conformant Kubernetes and stable releases of Antrea. |
Calico | Calico is used in environments where factors like network performance, flexibility, and power are essential. For routing packets between nodes, Calico leverages the BGP routing protocol instead of an overlay network. This eliminates the need to wrap packets with an encapsulation layer, resulting in increased network performance for Kubernetes workloads. | Pros: Support for network policies, high network performance, and SCTP support. Cons: No multicast support. |
You can deploy vSphere with Tanzu on various networking stacks, including:
VMware NSX-T Data Center Networking.
vSphere Virtual Distributed Switch (VDS) Networking with NSX Advanced Load Balancer.
Note: The scope of this discussion is limited to vSphere Networking (VDS) with NSX Advanced Load Balancer.
In a vSphere with Tanzu environment, a Supervisor Cluster configured with vSphere networking uses distributed port groups to provide connectivity to Kubernetes control plane VMs, services, and workloads. All hosts in a cluster that is enabled for vSphere with Tanzu are connected to the distributed switch, which provides connectivity to Kubernetes workloads and control plane VMs.
You can use one or more distributed port groups as Workload Networks. The network that provides connectivity to the Kubernetes Control Plane VMs is called Primary Workload Network. You can assign this network to all the namespaces on the Supervisor Cluster, or you can use different networks for each namespace. The Tanzu Kubernetes clusters connect to the Workload Network that is assigned to the namespace.
The Supervisor Cluster leverages NSX Advanced Load Balancer (NSX ALB) to provide L4 load balancing for the Tanzu Kubernetes clusters control-plane HA. Users access the applications by connecting to the Virtual IP address (VIP) of the applications provisioned by NSX Advanced Load Balancer.
The following diagram shows a general overview for vSphere with Tanzu on vSphere Networking.
NSX Advanced Load Balancer is deployed in write access mode in a vSphere environment. This mode grants NSX Advanced Load Balancer Controllers full write access to the vCenter, which helps in automatically creating, modifying, and removing SEs and other resources as needed to adapt to changing traffic needs. The following are the core components of NSX Advanced Load Balancer:
NSX Advanced Load Balancer Controller: The NSX Advanced Load Balancer Controller manages Virtual Service objects and interacts with the vCenter Server infrastructure to manage the lifecycle of the Service Engines (SEs). It is the central repository for the configurations and policies related to services and management, and it provides the portal for viewing the health of virtual services and SEs and the associated analytics that NSX Advanced Load Balancer provides.
NSX Advanced Load Balancer Service Engine: NSX Advanced Load Balancer Service Engines (SEs) are lightweight VMs that handle all data plane operations by receiving and executing instructions from the controller. The SEs perform load balancing and all client and server-facing network interactions.
Avi Kubernetes Operator (AKO): Avi Kubernetes Operator is a Kubernetes operator that runs as a pod in the Supervisor Cluster. It provides ingress and load balancing functionality. Avi Kubernetes Operator translates the required Kubernetes objects to NSX Advanced Load Balancer objects and automates the implementation of ingresses/routes/services on the Service Engines (SE) via the NSX Advanced Load Balancer Controller.
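For example, exposing an application in a workload cluster with a standard Kubernetes Service of type LoadBalancer is all that is needed for the load balancer integration to allocate a VIP and program an L4 virtual service on the Service Engines; no NSX Advanced Load Balancer specific fields are required in the manifest (the service and selector names below are hypothetical).

```yaml
apiVersion: v1
kind: Service
metadata:
  name: web-frontend        # hypothetical application service
spec:
  type: LoadBalancer        # realized as an L4 virtual service with a VIP from the data network
  selector:
    app: web-frontend
  ports:
    - port: 80
      targetPort: 8080
      protocol: TCP
```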
Each environment configured in NSX Advanced Load Balancer is referred to as a cloud. Each cloud in NSX Advanced Load Balancer maintains networking and NSX Advanced Load Balancer Service Engine settings. Each cloud is configured with one or more VIP networks to provide IP addresses to L4 load balancing virtual services created under that cloud.
Virtual services can span multiple Service Engines if the associated Service Engine Group is configured in Active/Active HA mode. A Service Engine can belong to only one Service Engine Group at a time.
IP address allocation for virtual services can be over DHCP or via the NSX Advanced Load Balancer built-in IPAM functionality. The VIP networks created or configured in NSX Advanced Load Balancer are associated with the IPAM profile.
To deploy vSphere with Tanzu, build separate networks for the Tanzu Kubernetes Grid management (Supervisor) cluster, Tanzu Kubernetes Grid workload clusters, NSX Advanced Load Balancer components, and the Tanzu Kubernetes Grid control plane HA.
The network reference design can be mapped into this general framework.
Note: The network/portgroup designated for the workload cluster carries both data and control traffic. Firewalls cannot be used to segregate traffic between workload clusters; instead, the underlying CNI must be employed as the main filtering mechanism. Antrea CNI provides Custom Resource Definitions (CRDs) for firewall rules that can be enforced before Kubernetes network policies are applied.
Based on your requirements, you can create additional networks for your workload clusters. These networks are also referred to as vSphere with Tanzu workload secondary networks.
This topology enables the following benefits:
Isolate and separate SDDC management components (vCenter, ESX) from the vSphere with Tanzu components. This reference design allows only the minimum connectivity between the Tanzu Kubernetes Grid clusters and NSX Advanced Load Balancer to the vCenter Server.
Isolate and separate the NSX Advanced Load Balancer management network from the supervisor cluster network and the Tanzu Kubernetes Grid workload networks.
Separate vSphere Admin and Tenant access to the supervisor cluster. This prevents tenants from attempting to connect to the supervisor cluster.
Allow tenants to access only their own workload cluster(s) and restrict access to this cluster from other tenants. This separation can be achieved by assigning permissions to the supervisor namespaces.
Depending on the workload cluster type and use case, multiple workload clusters may leverage the same workload network or new networks can be used for each workload cluster.
Network Requirements
As per the reference architecture, the list of required networks is as follows:
Network Type | DHCP Service | Description |
---|---|---|
NSX Advanced Load Balancer Management Network | Optional | NSX Advanced Load Balancer controllers and SEs will be attached to this network. |
TKG Management Network | Optional | Supervisor Cluster nodes will be attached to this network. |
TKG Workload Network (Primary) | Optional | Control plane and worker nodes of TKG workload clusters will be attached to this network. The second interface of the Supervisor nodes is also attached to this network. |
TKG Cluster VIP/Data Network | No | Virtual Services (L4) for Control plane HA of all TKG clusters (Supervisor and Workload). Reserve sufficient IPs depending on the number of TKG clusters planned to be deployed in the environment. |
For the purpose of demonstration, this document uses the following subnet CIDRs for the TKO deployment.
Network Type | Segment Name | Gateway CIDR | DHCP Pool | NSX Advanced Load Balancer IP Pool |
---|---|---|---|---|
NSX Advanced Load Balancer Mgmt Network | NSX-Advanced Load Balancer-Mgmt | 192.168.10.1/27 | NA | 192.168.10.14 - 192.168.10.30 |
Supervisor Cluster Network | TKG-Management | 192.168.40.1/28 | 192.168.40.2 - 192.168.40.14 | NA |
TKG Workload Primary Network | TKG-Workload-PG01 | 192.168.60.1/24 | 192.168.60.2 - 192.168.60.251 | NA |
TKG Cluster VIP/Data Network | TKG-Cluster-VIP | 192.168.80.1/26 | NA | SE Pool: 192.168.80.2 - 192.168.80.20 TKG Cluster VIP Range: 192.168.80.21 - 192.168.80.60 |
To prepare the firewall, gather the following information: client machine IP addresses, the vCenter Server IP, the TKG cluster VIP range, the TKG management and workload network CIDRs, the NSX Advanced Load Balancer management network and Controller node IPs, the image registry FQDN/IP (if private), and the DNS and NTP server IPs.
The following table provides a list of firewall rules based on the assumption that there is no firewall within a subnet/VLAN.
Source | Destination | Protocol:Port | Description |
---|---|---|---|
Client Machine | NSX Advanced Load Balancer Controller Nodes and VIP | TCP:443 | Access NSX Advanced Load Balancer portal for configuration. |
Client Machine | vCenter Server | TCP:443 | Access and configure WCP in vCenter. |
Client Machine | TKG Cluster VIP Range | TCP:6443, TCP:443, TCP:80 | TKG cluster access, HTTPS workload access, and HTTP workload access. |
Client Machine (optional) | *.tmc.cloud.vmware.com, console.cloud.vmware.com | TCP:443 | Access the TMC portal, and so on. |
TKG Management and Workload Cluster CIDR | DNS Server, NTP Server | TCP/UDP:53, UDP:123 | DNS service and time synchronization. |
TKG Management Cluster CIDR | vCenter IP | TCP:443 | Allow components to access vCenter to create VMs and Storage Volumes. |
TKG Management and Workload Cluster CIDR | NSX Advanced Load Balancer controller nodes | TCP:443 | Allow Avi Kubernetes Operator (AKO) and AKO Operator (AKOO) access to NSX Advanced Load Balancer Controller. |
TKG Management and Workload Cluster CIDR | TKG Cluster VIP Range | TCP:6443 | Allow Supervisor cluster to configure workload clusters. |
TKG Management and Workload Cluster CIDR | Image Registry (Harbor) (If Private) | TCP:443 | Allow components to retrieve container images. |
TKG Management and Workload Cluster CIDR | wp-content.vmware.com, *.tmc.cloud.vmware.com, projects.registry.vmware.com | TCP:443 | Sync content library, pull TKG binaries, and interact with TMC. |
TKG Management cluster CIDR | TKG Workload Cluster CIDR | TCP:6443 | VM Operator and TKC VM communication. |
TKG Workload Cluster CIDR | TKG Management Cluster CIDR | TCP:6443 | Allow the TKG workload cluster to register with the Supervisor cluster. |
NSX Advanced Load Balancer Management Network | vCenter and ESXi Hosts | TCP:443 | Allow NSX Advanced Load Balancer to discover vCenter objects and deploy SEs as required. |
NSX Advanced Load Balancer Controller Nodes | DNS Server, NTP Server | TCP/UDP:53, UDP:123 | DNS service and time synchronization. |
TKG Cluster VIP Range | TKG Management Cluster CIDR | TCP:6443 | To interact with the Supervisor cluster. |
TKG Cluster VIP Range | TKG Workload Cluster CIDR | TCP:6443, TCP:443, TCP:80 | To interact with workload clusters and Kubernetes applications. |
vCenter Server | TKG Management Cluster CIDR | TCP:443, TCP:6443, TCP:22 (optional) | |
Note: For TMC, if the firewall does not allow wildcards, all IP addresses of [account].tmc.cloud.vmware.com and extensions.aws-usw2.tmc.cloud.vmware.com need to be whitelisted.
Starting with vSphere 8, when you enable vSphere with Tanzu, you can configure either a one-zone Supervisor mapped to one vSphere cluster or a three-zone Supervisor mapped to three vSphere clusters.
A Supervisor deployed on a single vSphere cluster has three control plane VMs, which reside on the ESXi hosts that are part of the cluster. A single zone is created for the Supervisor automatically, or you can use a zone that is created in advance. In a single-zone deployment, cluster-level high availability is provided through vSphere HA, and you can scale the vSphere with Tanzu setup by adding physical hosts to the vSphere cluster that maps to the Supervisor. You can run workloads through vSphere Pods, Tanzu Kubernetes Grid clusters, and VMs when the Supervisor is enabled with the NSX networking stack.
Configure each vSphere cluster as an independent failure domain and map it to a vSphere zone. In a three-zone deployment, all three vSphere clusters become one Supervisor, providing high availability and protection against the failure of an entire vSphere cluster.
For more information, see Supervisor Architecture and Components.
A vSphere with Tanzu deployment starts with deploying the Supervisor Cluster (enabling Workload Management). The deployment is done directly from the vCenter user interface (UI). The Get Started page lists the prerequisites for the deployment.
In the current version, the vCenter UI lets you install vSphere with Tanzu with either the VDS networking stack or NSX-T Data Center as the networking solution.
This installation process takes you through the steps of deploying the Supervisor Cluster in your vSphere environment. Once the Supervisor Cluster is deployed, you can use either Tanzu Mission Control or the kubectl utility to deploy Tanzu Kubernetes shared services and workload clusters.
The following table provides recommendations for configuring NSX Advanced Load Balancer in a vSphere with Tanzu environment.
Decision ID | Design Decision | Design Justification | Design Implications |
---|---|---|---|
TKO-Advanced Load Balancer-001 | Deploy NSX Advanced Load Balancer controller cluster nodes on a network dedicated to NSX-Advanced Load Balancer. | To isolate NSX Advanced Load Balancer traffic from infrastructure management traffic and Kubernetes workloads. | Allows for ease of management of controllers. Additional Network (VLAN) is required. |
TKO-Advanced Load Balancer-002 | Deploy 3 NSX Advanced Load Balancer controllers nodes. | To achieve high availability for the NSX Advanced Load Balancer platform. In clustered mode, NSX Advanced Load Balancer availability is not impacted by an individual controller node failure. The failed node can be removed from the cluster and redeployed if recovery is not possible. | Clustered mode requires more compute and storage resources. |
TKO-Advanced Load Balancer-003 | Configure vCenter settings in Default-Cloud. | Using a non-default vCenter cloud is not supported with vSphere with Tanzu. | Using a non-default cloud can lead to deployment failures. |
TKO-Advanced Load Balancer-004 | Use static IPs for the NSX Advanced Load Balancer controllers if DHCP cannot guarantee a permanent lease. | The NSX Advanced Load Balancer Controller cluster uses management IP addresses to form and maintain quorum for the control plane cluster. Any changes would be disruptive. | The NSX Advanced Load Balancer control plane might go down if the management IPs of the controller nodes change. |
TKO-Advanced Load Balancer-005 | Use NSX Advanced Load Balancer IPAM for Service Engine data network and virtual service IP assignment. | Guarantees IP address assignment for Service Engine data NICs and virtual services. | Removes the corner-case scenario where the DHCP server runs out of leases or is down. |
TKO-Advanced Load Balancer-006 | Reserve an IP in the NSX Advanced Load Balancer management subnet to be used as the Cluster IP for the Controller Cluster. | NSX Advanced Load Balancer portal is always accessible over Cluster IP regardless of a specific individual controller node failure. | NSX Advanced Load Balancer administration is not affected by an individual controller node failure. |
TKO-Advanced Load Balancer-007 | Use default Service Engine Group for load balancing of TKG clusters control plane. | Using a non-default Service Engine Group for hosting L4 virtual service created for TKG control plane HA is not supported. | Using a non-default Service Engine Group can lead to Service Engine VM deployment failure. |
TKO-Advanced Load Balancer-008 | Share Service Engines for the same type of workload (dev/test/prod) clusters. | Minimize the licensing cost. | Each Service Engine contributes to the CPU core capacity associated with a license. Sharing Service Engines can help reduce the licensing cost. |
TKO-Advanced Load Balancer-009 | Configure anti-affinity rules for the NSX ALB controller cluster. | Ensures that no two controllers end up on the same ESXi host, avoiding a single point of failure. | Anti-affinity rules need to be created manually. |
TKO-Advanced Load Balancer-010 | Configure backup for the NSX ALB Controller cluster. | Backups are required if the NSX ALB Controller becomes inoperable or if the environment needs to be restored from a previous state. | To store backups, an SCP-capable backup location is needed. SCP is the only supported protocol currently. |
TKO-Advanced Load Balancer-011 | Perform the initial setup on only one of the three deployed NSX ALB controller VMs when creating the NSX ALB controller cluster. | The NSX ALB controller cluster is created from an initialized NSX ALB controller, which becomes the cluster leader. Follower NSX ALB controller nodes need to be uninitialized to join the cluster. | NSX ALB controller cluster creation fails if more than one NSX ALB controller is initialized. |
TKO-Advanced Load Balancer-012 | Configure remote logging for the NSX ALB Controller to send events to syslog. | For operations teams to centrally monitor NSX ALB and escalate alerts, events must be sent from the NSX ALB Controller. | Additional operational overhead. Additional infrastructure resources. |
TKO-Advanced Load Balancer-013 | Use LDAP/SAML-based authentication for NSX ALB. | Helps to maintain role-based access control. | Additional configuration is required. |
The following are the key network recommendations for a production-grade vSphere with Tanzu deployment:
Decision ID | Design Decision | Design Justification | Design Implications |
---|---|---|---|
TKO-NET-001 | Use separate networks for the Supervisor cluster and workload clusters. | To have flexible firewall and security policies. | Sharing the same network for multiple clusters can complicate the creation of firewall rules. |
TKO-NET-002 | Use distinct port groups for network separation of K8s workloads. | Isolate production Kubernetes clusters from dev/test clusters by placing them on distinct port groups. | Network mapping is done at the namespace level. All Kubernetes clusters created in a namespace connect to the same port group. |
TKO-NET-003 | Use routable networks for Tanzu Kubernetes clusters. | Allow connectivity between the TKG clusters and infrastructure components. | Networks that are used for Tanzu Kubernetes cluster traffic must be routable between each other and the Supervisor Cluster Management Network. |
Decision ID | Design Decision | Design Justification | Design Implications |
---|---|---|---|
TKO-TKGS-001 | Create a Subscribed Content Library. | A Subscribed Content Library can automatically pull the latest OVAs used by the Tanzu Kubernetes Grid Service to build cluster nodes. Using a Subscribed Content Library facilitates template management, as new versions can be pulled by initiating the library sync. | A Local Content Library would require manual upload of images and is suitable for air-gapped or Internet-restricted environments. |
TKO-TKGS-002 | Deploy Supervisor cluster control plane nodes in large form factor. | The large form factor should suffice to integrate the Supervisor cluster with TMC and a Velero deployment. | Consumes more resources from the infrastructure. |
TKO-TKGS-003 | Register the Supervisor cluster with Tanzu Mission Control. | Tanzu Mission Control automates the creation of Tanzu Kubernetes clusters and manages the life cycle of all clusters centrally. | Requires outbound connectivity to the internet for TMC registration. |
Note: SaaS endpoints here refer to Tanzu Mission Control, Tanzu Service Mesh, and Tanzu Observability.
Decision ID | Design Decision | Design Justification | Design Implications |
---|---|---|---|
TKO-TKC-001 | Deploy Tanzu Kubernetes clusters with prod plan and multiple worker nodes. | The prod plan provides high availability for the control plane. | Consumes more resources from the infrastructure. |
TKO-TKC-002 | Use guaranteed VM class for Tanzu Kubernetes clusters. | Guarantees compute resources are always available for containerized workloads. | Could prevent automatic migration of nodes by DRS. |
TKO-TKC-003 | Implement RBAC for Tanzu Kubernetes clusters. | To avoid the usage of administrator credentials for managing the clusters. | External AD/LDAP needs to be integrated with vCenter or SSO groups need to be created manually. |
TKO-TKC-004 | Deploy Tanzu Kubernetes clusters from Tanzu Mission Control. | Tanzu Mission Control provides life-cycle management for the Tanzu Kubernetes clusters and automatic integration with Tanzu Service Mesh and Tanzu Observability. | Only the Antrea CNI is supported on workload clusters created from the TMC portal. |
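Putting TKO-TKC-001 and TKO-TKC-002 together, a production-grade cluster topology might look like the following excerpt. The VM class names, storage class name, and replica counts are illustrative, and the field layout assumes the same TanzuKubernetesCluster API version as in the earlier examples.

```yaml
# Excerpt of a TanzuKubernetesCluster topology reflecting the recommendations above:
# an HA control plane and guaranteed VM classes that reserve CPU and memory.
spec:
  topology:
    controlPlane:
      replicas: 3                               # highly available control plane
      vmClass: guaranteed-medium                # guaranteed class for control plane nodes
      storageClass: vsan-gold-storage-policy    # hypothetical storage class
    nodePools:
    - name: workers
      replicas: 3
      vmClass: guaranteed-large                 # guaranteed class for workload nodes
      storageClass: vsan-gold-storage-policy
```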
vSphere with Tanzu does not ship with a default ingress controller. Any Tanzu-supported ingress controller can be used.
One example of an ingress controller is Contour, an open-source controller for Kubernetes ingress routing. Contour is part of a Tanzu package and can be installed on any Tanzu Kubernetes cluster. Deploying Contour is a prerequisite for deploying Prometheus, Grafana, and Harbor on a workload cluster.
For more information about Contour, see the Contour site and Implementing Ingress Control with Contour.
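As a sketch of what the package-based installation looks like, the tanzu CLI (tanzu package install) creates a kapp-controller PackageInstall object similar to the one below on the target cluster. The package version, namespace, service account, and values secret names are placeholders; check the package versions available in your environment before installing.

```yaml
# Hypothetical sketch of the PackageInstall that a "tanzu package install contour"
# command creates; names and version are placeholders.
apiVersion: packaging.carvel.dev/v1alpha1
kind: PackageInstall
metadata:
  name: contour
  namespace: tkg-packages                     # hypothetical namespace for package installs
spec:
  serviceAccountName: contour-package-sa      # account with rights to create Contour resources
  packageRef:
    refName: contour.tanzu.vmware.com
    versionSelection:
      constraints: "1.22.3+vmware.1-tkg.1"    # placeholder package version
  values:
  - secretRef:
      name: contour-values                    # optional data values secret
```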
Tanzu Service Mesh also offers an Ingress controller based on Istio.
Each ingress controller has its own pros and cons. The following table provides general recommendations on when to use a specific ingress controller for your Kubernetes environment.
Ingress Controller | Use Cases |
---|---|
Contour | Use Contour when only north-south traffic is needed in a Kubernetes cluster. You can apply security policies for the north-south traffic by defining the policies in the manifest file for the application. Contour is a reliable solution for simple Kubernetes workloads. |
Istio | Use Istio ingress controller when you need to provide security, traffic direction, and insight within the cluster (east-west traffic) and between the cluster and the outside world (north-south traffic). |
Regardless of NSX Advanced Load Balancer Controller configuration, each controller cluster can achieve up to 5,000 virtual services; 5,000 is a hard limit. For more information, see Avi Controller Sizing.
Controller Size | VM Configuration | Virtual Services | Avi SE Scale |
---|---|---|---|
Essentials | 4 vCPUS, 24 GB RAM | 0-50 | 0-10 |
Small | 6 vCPUS, 24 GB RAM | 0-200 | 0-100 |
Medium | 10 vCPUS, 32 GB RAM | 200-1000 | 100-200 |
Large | 16 vCPUS, 48 GB RAM | 1000-5000 | 200-400 |
See Sizing Service Engines for guidance on sizing your SEs.
Performance metric | 1 vCPU core |
---|---|
Throughput | 4 Gb/s |
Connections/s | 40k |
SSL Throughput | 1 Gb/s |
SSL TPS (RSA2K) | ~600 |
SSL TPS (ECC) | 2500 |
Multiple performance vectors or features may have an impact on performance. For example, to achieve 1 Gb/s of SSL throughput and 2000 TPS of SSL with EC certificates, NSX Advanced Load Balancer recommends two cores.
NSX Advanced Load Balancer Service Engines may be configured with as little as 1 vCPU core and 2 GB RAM, or up to 64 vCPU cores and 256 GB RAM. It is recommended for a Service Engine to have at least 4 GB of memory when GeoDB is in use.
VMware Tanzu for Kubernetes Operations using vSphere with Tanzu includes Harbor as a container registry. Harbor is an open-source, trusted, cloud-native container registry that stores, signs, and scans content.
The initial configuration and setup of the platform does not require any external registry because the required images are delivered through vCenter. Customers can choose any existing registry and, if required, can deploy a Harbor registry for storing images.
When vSphere with Tanzu is deployed on VDS networking, you can deploy an external container registry (Harbor) for Tanzu Kubernetes clusters.
You may use one of the following methods to install Harbor:
Tanzu Kubernetes Grid Package deployment - VMware recommends this installation method for general use cases. The Tanzu packages, including Harbor, must either be pulled directly from VMware or be hosted in an internal registry.
VM-based deployment using OVA - VMware recommends this installation method in cases where Tanzu Kubernetes Grid is being installed in an air-gapped or Internet-restricted environment, and no pre-existing image registry exists to host the Tanzu Kubernetes Grid system images. VM-based deployments are only supported by VMware Global Support Services to host the system images for air-gapped or Internet-restricted deployments. Do not use this method for hosting application images.
When deploying Harbor with self-signed certificates or certificates signed by internal CAs, it is necessary for the Tanzu Kubernetes cluster to establish trust with the registry’s certificate. To do so, follow the procedure in Trust Custom CA Certificates on Cluster Nodes.
The SaaS products in the VMware Tanzu portfolio are on the critical path for securing systems at the heart of your IT infrastructure. VMware Tanzu Mission Control provides a centralized control plane for Kubernetes, and Tanzu Service Mesh provides a global control plane for service mesh networks. Tanzu Observability features include Kubernetes monitoring, application observability, and service insights.
To learn more about Tanzu Kubernetes Grid integration with Tanzu SaaS, see Tanzu SaaS Services.
Tanzu Observability provides various out-of-the-box dashboards. You can customize the dashboards for your particular deployment. For information about customizing Tanzu Observability dashboards for Tanzu for Kubernetes Operations, see Customize Tanzu Observability Dashboard for Tanzu for Kubernetes Operations.
vSphere with Tanzu on hyper-converged hardware offers high-performance potential and convenience and addresses the challenges of creating, testing, and updating on-premises Kubernetes platforms in a consolidated production environment. This validated approach results in a production installation with all the application services needed to serve combined or uniquely separated workload types via a combined infrastructure solution.
This plan meets many Day-0 needs for quickly aligning product capabilities to full-stack infrastructure, including networking, configuring firewall rules, load balancing, workload compute alignment, and other capabilities.
For instructions on how to deploy this reference design, see Deploy Tanzu for Kubernetes Operations using vSphere with Tanzu.