Each type of workload places different requirements on the Tanzu Standard for Telco Kubernetes cluster or worker node. This section provides the Kubernetes worker node and cluster sizing based on workload characteristics.

The ETSI NFV Performance & Portability Best Practices (GS NFV-PER 001) classifies NFV workloads into different classes. The characteristics distinguishing the workload classes are as follows:

Workload Classes

Workload Characteristics

Data plane workloads

Data plane workloads cover all tasks related to packet handling in an end-to-end communication between Telco applications. These tasks are intensive in I/O operations and memory R/W operations.

Control plane workloads

  • Control plane workloads cover any other Network Function communication that is not directly related to the end-to-end data communication between edge applications. This category of communication includes session management, routing, and authentication.

  • Compared to data plane workloads, control plane workloads are less intensive in terms of transactions per second, while the complexity of the transactions might be high.

Signal processing workloads

Signal processing workloads cover all tasks related to digital processing, such as the FFT decoding and encoding in a cellular base station. These tasks are intensive in CPU processing capacity and are highly latency-sensitive.

Storage workloads

Storage workloads cover all tasks related to disk storage.

Data plane workloads are further categorized into the following profiles:

Workload Profile

Workload Description

Profile 1

Low data rate with numerous endpoints, best-effort bandwidth, jitter, or latency.

Example: Massive Machine-type Communication (MMC) application

Profile 2

High data rate with bandwidth guarantee only, no jitter or latency.

Example: Enhanced Mobile Broadband (eMBB) application

Profile 3

High data rate with bandwidth, latency, and jitter guarantee.

Examples: factory automation, virtual and augmented reality applications

Worker nodes for profile 3 may require Kernel module updates such as huge pages and SCTP with complete vertical NUMA alignment. Profiles 2 and 3 might not require specialized hardware but have requirements for kernel module updates. When you size the Kubernetes cluster to meet the SLA requirements, size the cluster based on workload characteristics.

Control Plane Node Sizing

By default, dedicate Control Plane nodes to the control plane components only. All control plane nodes deployed have a taint applied to them. This taint instructs the Kubernetes scheduler not to schedule any user workloads on the control plane nodes. Assigning nodes in this way improves security, stability, and management of the control plane. Isolating the control plane from other workloads significantly reduces the attack surface as user workloads no longer share a kernel with Kubernetes components.

By following the above recommendation, the size of the Kubernetes control Plane node depends on the size of the Kubernetes cluster. When sizing the control Plane node for CPU, memory, and disk, consider both the number of worker nodes and the total number of Pods.

The following table lists the Kubernetes control Plane node sizing estimations based on the cluster size:

Cluster Size

Control Plane Node vCPU

Control Plane Node Memory

Up to 10 Worker Nodes

2

8 GB

Up to 100 Worker Nodes

4

16 GB

Up to 250 Worker Nodes

8

32 GB

Up to 500 Worker Nodes

16

64 GB

Worker Node Sizing

When sizing a worker node, consider the number of Pods per node. For low-performance pods (Profile 1 workload), ensuring Kubernetes pods are running and constant reporting of pod status to the Control Plane node contributes to majority of the kubelet utilization. High pod counts lead to higher kubelet utilization. Kubernetes official documentation recommends limiting Pods per node to less than 100. This limit can be set high for Profile 1 workload, based on the hardware configuration.

Profile 2 and Profile 3 require dedicated CPU, memory, and vNIC to ensure throughput, latency, or jitter. When considering the number of pods per node for high-performance pods, use the hardware resource limits as the design criteria.

As a rule, allocate 1 vCPU and 10 GB memory for every 10 running Pods for CPU and memory for generic workloads. In scenarios where the CNF vendor has specific vCPU and memory requirements, the worker node must be sized based on CNF vendor recommendations.

Tanzu Kubernetes Cluster Sizing

When you design a Tanzu Kubernetes cluster for workload with high availability, consider the following:

  • Number of failed nodes to be tolerated at once

  • Number of nodes available after a failure

  • Remaining capacity to reschedule pods of the failed node

  • Remaining Pod density after rescheduling

Based on failure tolerance, Pods per node, workload characteristics, worker nodes to deploy for each workload profile can be generalized using the following formula:

Worker Node (Z)= (pods * D) + (N+1)

  • Z specifies workload profile 1 - 3

  • N specifies the number of failed nodes that can be tolerated

  • D specifies the max density. 1/110 is the K8s recommendation for generic workloads.

For example, if each node supports 100 Pods, building a cluster that supports 100 pods with a failure tolerance of one node requires two worker nodes. Supporting 200 pods with a failure tolerance of one node requires three worker nodes.

Total Worker node = Worker Nodes (profile 1) + Worker Nodes (profile 2) + Worker Nodes (profile 3)

Total worker node per cluster is the sum of worker nodes required for each profile. If a cluster supports only one profile, the number of worker nodes required to support other two profiles is set to zero.

Design Decision

Design Justification

Design Implication

Dedicate the Kubernetes Control Plane node to Kubernetes control components only.

Improved security, stability, and management of the control plane.

  • Special taints must be configured on the Control Plane nodes to indicate that scheduling is not allowed on Control Plane nodes.

  • Special toleration must be applied to override the taint for control plane pods such as the cloud provider.

Kubernetes Control Plane node VMs in the same cluster must be sized identically based on the maximum number of Pods and nodes in a cluster.

  • Control Plane node sizing is based on the size of the cluster and pod.

  • Insufficient CPU/Memory/Disk in the Control Plane node can lead to unpredictable performance.

Idle resources as only the Kubernetes API server runs active/active under the steady state.

Set the maximum number of Pods per worker node based on the HW profile.

  • Not all clusters are built using the same CPU/Memory/Disk sizing.

  • Larger nodes reduce the number of nodes to manage by supporting more pod per node. Smaller nodes minimize the blast radius but cannot support numerous Pods per node.

K8s cluster-wide setting does not work properly in a heterogeneous cluster.

Use failure tolerance, workload type, and pod density to plan the minimum cluster size.

  • Kubernetes enforces a cluster-wide pod/worker node ratio.

  • To honor SLA, the remaining capacity after the node failure must be greater than the number of pods needing to reschedule.

High failure tolerance can lead to low server utilization.