VMware Tanzu Standard for Telco

Tanzu Standard for Telco provisions and manages the lifecycle of Tanzu Kubernetes clusters.

A Tanzu Kubernetes cluster is an opinionated installation of Kubernetes open-source software that is built and supported by VMware. With Tanzu Standard for Telco, administrators provision and use Tanzu Kubernetes clusters in a declarative manner that is familiar to Kubernetes operators and developers.

Figure 1. Tanzu Standard for Telco Architecture

Cluster API for Life Cycle Management

The Cluster API (CAPI) brings declarative, Kubernetes style APIs for cluster creation, configuration, and management. CAPI uses native Kubernetes manifests and APIs to manage bootstrapping and life cycle management of Kubernetes.

CAPI relies on a pre-defined cluster YAML specification that describes the desired state of the cluster and attributes such as the class of VM, size, and the total number of nodes in the Kubernetes cluster.

Based on the cluster YAML specification, the Kubernetes admin can bootstrap a fully redundant, highly available management cluster. From the management cluster, additional workload clusters can be deployed on top of the Telco Cloud Platform based on the CAPI extensions. Binaries required to deploy the cluster are bundled as VM images (OVAs) generated using the image-builder tool.

The following figure shows the overview of bootstrapping a cluster to deploy Management and Workload clusters using CAPI.

Tanzu Standard for Telco Management and Workload Clusters

Tanzu Kubernetes Management Cluster is a Kubernetes cluster that functions as the primary management and operational center for the Tanzu Standard for Telco instance. In this management cluster, the Cluster API for vSphere (CAPV) runs to create Tanzu Kubernetes clusters and you configure the shared and in-cluster services that the clusters use.

Tanzu Kubernetes Workload Cluster is a Kubernetes cluster that is deployed from the Tanzu Kubernetes management cluster. Tanzu Kubernetes clusters can run different versions of Kubernetes, depending on the CNF workload requirements. Tanzu Kubernetes clusters support multiple types of CNIs for Pod-to-Pod networking, with Antrea as the default CNI and the vSphere CSI provider for storage by default. When deployed through Telco Cloud Automation, VMware NodeConfig Operator is bundled into every workload cluster to handle the node Operating System (OS) configuration, performance tuning, and OS upgrades required for various types of Telco CNF workloads.

Tanzu Kubernetes Cluster Architecture

The following diagram shows different hosts and components of the Tanzu Standard for Telco cluster architecture.

Management and Workload Kubernetes Clusters

A Kubernetes cluster in the Telco Cloud Platform consists of etcd and the Kubernetes control and data planes.

Etcd: Etcd must run in the cluster with an odd number of cluster members to establish a quorum. A 3-node cluster tolerates the loss of a single member, while a 5-node cluster tolerates the loss of two members. Etcd availability requirements determine the number of Kubernetes Control Plane nodes.

Control Plane node: The Kubernetes control plane must run in redundant mode to avoid a single point of failure. To improve API availability, Kube-VIP is placed in front of the Control Plane nodes. The load balancer must perform health checks to ensure the API server availability. The following table lists the HA characteristics of the Control Plane node components.


Component	Availability
API Server	Active/Active
Kube-controller-manager	Active/Passive
Kube-scheduler	Active/Passive

Worker Node

NUMA Topology: When deploying Kubernetes worker nodes that host high bandwidth applications, ensure that the processor, memory, and vNIC are vertically aligned and remain within a single NUMA boundary.

The topology manager is a new component in the Kubelet and provides NUMA awareness to Kubernetes at the pod admission time. The topology manager figures out the best locality of resources by pulling topology hints from the Device Manager and the CPU manager. Pods are then placed based on the topology information to ensure optimal performance.

CPU Core Affinity: CPU pinning can be achieved in different ways. Kubernetes built-in CPU manager is the most common. The CPU manager implementation is based on cpuset. When a VM host initializes, host CPU resources are assigned to a shared CPU pool. All non-exclusive CPU containers run on the CPUs in the shared pool. When the Kubelet creates a container requesting a guaranteed CPU, CPUs for that container are removed from the shared pool and assigned exclusively for the life cycle of the container. When a container with exclusive CPUs is terminated, its CPUs are added back to the shared CPU pool.

The CPU manager includes the following two policies:

None: Default policy. The kubelet uses the CFS quota to enforce pod CPU limits. The workload can move between different CPU cores depending on the load on the Pod and the available capacity on the worker node.

Static: With the static policy enabled, the CPU request results in the container getting allocated the whole CPU and no other container can schedule on that CPU.

CPU Manager for Kubernetes (CMK) is another tool used by selective CNF vendors to assign the core and NUMA affinity for data plane workloads. Unlike the built-in CPU manager, CMK is not bundled with Kubernetes and requires separate download and installation. CMK must be used over the built-in CPU manager if required by the CNF vendor.

Huge Pages: For Telco workloads, the default huge page size can be 2 MB or 1 GB. To report its huge page capacity, the worker node determines the supported huge page sizes by parsing the /sys/kernel/mm/hugepages/hugepages-{size} directory on the host. Huge pages must be set to pre-allocated for maximum performance. Pre-allocated huge pages reduce the amount of available memory on a worker node. A node can only pre-allocate huge pages for the default size. The Transport Huge Pages must be disabled.

Container workloads requiring huge pages use hugepages-<hugepagesize> in the Pod specification. As of Kubernetes 1.18, multiple huge page sizes are supported per Pod. Huge Pages allocation occurs at the pod level.


Design Decision	Design Justification	Design Implication
Deploy Tanzu Kubernetes clusters using three Control Plane nodes to ensure full redundancy	A three-node cluster tolerates the loss of a single member	Each Control Plane node requires CPU and memory resources. CPU/Memory overhead is high for small sized Kubernetes clusters.
Install and activate the NTP clock synchronization service with custom NTP servers.	Kubernetes and its components rely on the system clock to track events, logs, and state.	None
Disable Swap on all Kubernetes Cluster Nodes.	Swap causes a decrease in the overall performance of the cluster.	None
Vertically align Processor, memory, and vNIC and keep them within a single NUMA boundary for data plane intensive workloads.	High packet throughput can be maintained for data transfer across vNICs within the same NUMA zone than in different NUMA zones. Latency_sensitivity must be enabled for best effort NUMA placement.	Requires an extra configuration step on the vCenter to ensure NUMA alignment. Note: This is not required for generic workloads such as web services, lightweight databases, and monitoring dashboards.
Set the CPU manager policy to `static` for data plane intensive workloads.	Static mode is required to guarantee exclusive CPU cores on the worker node for data-intensive workloads.	Requires an extra configuration step for CPU Manager through NodeConfig Operator. Note: This is not required for generic workloads such as web services, lightweight databases, monitoring dashboards, and so on.
When enabling a static CPU manager policy, set aside sufficient CPU resources for the kubelet operation.	The kubelet requires a CPU reservation to ensure that the shared CPU pool is not exhausted under load. The amount of CPU to be reserved depends on the pod density per node.	Requires an extra configuration step for CPU Manager. Less CPU reservation can impact the Kubernetes cluster stability. Note: This is not required for generic workloads such as web services, lightweight databases, monitoring dashboards, and so on.
Enable huge page allocation at boot time.	Huge pages reduce the TLB miss. Huge page allocation at boot time prevents memory from becoming unavailable later due to fragmentation. Update VM setting for Worker Nodes 1G Hugepage. Enable IOMMUs to protect system memory between I/O devices.	Pre-allocated huge pages reduce the amount of available memory on a worker node. Requires an extra configuration step in the worker node VM GRUB configuration. Enabling huge pages requires a VM reboot. Note: This is not required for generic workloads such as web services, lightweight databases, monitoring dashboards, and so on.
Set the default huge page size to 1 GB. Set the overcommit size to 0.	For 64-bit applications, use 1 GB huge pages if the platform supports them. Overcommit size defaults to 0, no actions required.	For 1 GB pages, the huge page memory cannot be reserved after the system boot. Note: This is not required for generic workloads such as web services, lightweight databases, monitoring dashboards, and so on.
Mount the file system type hugetlbfs on the root file system.	The file system of type hugetlbfs is required by the mmap system call. Create an entry in fstab so the mount point persists after a reboot.	Perform an extra configuration step in the worker node VM configuration. Note: This is not required for generic workloads such as web services, lightweight databases, monitoring dashboards, and so on.