Data plane intensive CNFs require CPU pinning to avoid CPU scheduling delays and to ensure that a deterministic CPU capacity is consistently available to the data plane CNF. This consistent CPU capacity leads to deterministic network performance, which is the desired outcome for data plane intensive workloads.

You must apply CPU pinning for data plane CNFs at two distinct levels in the CaaS layer:

  • The Worker Node CPUs are pinned (with exclusive affinity) to the CPU cores (Physical or Hyperthreaded) in the ESXi Hypervisor.

  • The data plane containers are pinned to the Worker Node CPUs.

Applying both levels of pinning ensures that the data plane containers have exclusive access to the CPU cores and can meet the low latency and high throughput requirements.

Note:

Depending on the characteristics of the data plane intensive CNF, you can pin the Worker Node CPUs to either Physical or Hyperthreaded CPU cores in the ESXi Hypervisor. Ensure that you do not assign all cores in a NUMA node, particularly NUMA node 0, to the Worker Node. Some compute capacity must remain available for ESXi and other system services.

Pin all Worker Node CPUs to Physical CPU cores

For the data plane intensive CNFs that are extremely sensitive to latency, set LatencySensitivity=High on the Worker Nodes. This setting enables CPU pinning for all Worker Node CPUs by applying exclusive affinity. It also enables complete core isolation for the pinned cores by leaving the hyperthread siblings idle in ESXi. CPU pinning and core isolation eliminate interference and improve the performance of the workloads running in the Worker Node.

When the LatencySensitivity=High setting is applied on the Worker Node, its CPU and memory resources are NUMA aligned. Because the hyperthread siblings remain idle in ESXi, the Worker Nodes in a single NUMA node can use only half of the logical cores available in a CPU.

Note:

The following node customization is required to apply the LatencySensitivity=High configuration on the Worker Nodes. For more details, see Node Customization in the Telco Cloud Automation documentation. Latency Sensitivity can also be set to High in the Network Function Designer in TCA.

  • Add the node component latency_sensitivity with the value high.
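For reference, this customization might take a shape similar to the following sketch in the CNF descriptor. Only the latency_sensitivity component name and its value come from this section; the wrapper keys are assumptions that must be verified against Node Customization in the Telco Cloud Automation documentation.

    # Illustrative node customization sketch (wrapper keys are assumptions;
    # verify the exact schema in the Node Customization documentation).
    infra_requirements:
      node_components:
        latency_sensitivity: high    # pins all Worker Node CPUs with exclusive affinity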

The NUMA Alignment option in the TCA Network Function Designer marks the Network Function for instantiation on ESXi hosts that guarantee strict NUMA alignment of pNICs, vCPUs, and vNICs on the worker node. If no nodes that meet these requirements are found, the network function fails to deploy.

  • This setting is needed only for CNFs that require SR-IOV interfaces. Do not use this setting when data plane acceleration is achieved using EDP.

Virtual Hyper Threading

The Virtual Hyper Threading feature, also known as virtual Simultaneous Multi-Threading (SMT), was introduced in ESXi 8.0. This feature exposes the underlying NUMA and HT topology to the guest operating system, allowing it to use CPU resources more efficiently.

Before this feature was introduced, the Guest OS interpreted each vCPU as a single-threaded virtual core. For example, without Virtual Hyper Threading, a VM created with 8 vCPUs appears as a single-socket machine with 8 cores per socket and 1 thread per core. With Virtual Hyper Threading enabled, the same VM appears as a single-socket machine with 4 cores per socket and 2 threads per core.

To guarantee that the virtual HT pairs are vertically aligned with the host physical hyperthreads, use this feature in conjunction with latency_sensitivity=high.

Note:

The following node customizations are required to apply the LatencySensitivity=High with enableSMT=true configuration on the Worker Nodes:

  • Add the node component latency_sensitivity with the value high

  • Add the node component enable_SMT with the value true

Virtual SMT can also be set to Yes in the Network Function Designer in TCA. For more details, see Node Customization in the Telco Cloud Automation documentation.
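Building on the previous sketch, the two node components might be combined as follows. Only the component names and values come from this section; the wrapper keys remain illustrative and should be verified against the Node Customization documentation.

    # Illustrative sketch: LatencySensitivity=High together with virtual SMT enabled.
    infra_requirements:
      node_components:
        latency_sensitivity: high    # CPU pinning with exclusive affinity
        enable_SMT: true             # exposes the NUMA and HT topology to the guest OS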

Pin Worker Node CPUs to Hyperthreaded CPU cores

For CPU-bound data plane intensive CNFs that are not extremely sensitive to latency, pin the Worker Node CPUs to the Hyperthreaded CPU cores in ESXi. This ensures that the Worker Nodes in a single NUMA node can use the maximum number of logical cores available in a CPU.

Do not set LatencySensitivity=High on the Worker Node when pinning its CPUs to the Hyperthreaded cores in ESXi. CPU pinning is still enabled by applying exclusive affinity in ESXi, but core isolation is not complete because hyperthread siblings share hardware resources.

Note:

Do not pin all the CPUs of the Worker Node to the Hyperthreaded CPU cores in ESXi. Leave some CPUs in the Worker Node for its OS and Kubernetes tasks. These unpinned CPUs use shared resources allocated by ESXi.

Node Customization in Telco Cloud Automation ensures that the CPU and memory resources of the Worker Node are fully reserved to apply exclusive affinity and are NUMA aligned in a multi-socket server.

Note:

The following node customizations are required for pinning Worker Node CPUs to the Hyperthreaded CPU cores in ESXi:

  • Define the kernel argument isolcpus under kernel: kernel_args

  • Add the node component isNumaConfigNeeded with the value true

For more details, see Node Customization in the Telco Cloud Automation documentation.
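A possible shape of these customizations is sketched below. The isolcpus range is an example value only and must be sized for the workload; the kernel_args layout and wrapper keys are assumptions to be verified against the Node Customization documentation.

    # Illustrative sketch: pinning to Hyperthreaded cores without LatencySensitivity=High.
    infra_requirements:
      node_components:
        isNumaConfigNeeded: true       # fully reserves CPU/memory and keeps the node NUMA aligned
        kernel:
          kernel_args:
            - key: isolcpus
              value: "2-15"            # example range; leave some CPUs unpinned for OS and Kubernetes tasks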

Pin Data Plane Container to Worker Node CPUs

When you deploy the Workload Cluster with a dedicated Node Pool for the data plane intensive CNF, set the CPU Manager Policy to 'Static'. This setting enables the static CPU management policy in the Kubernetes CPU Manager. The static policy is required to allow CPU affinity for the data plane containers.
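In Telco Cloud Automation, the CPU Manager Policy is selected for the Node Pool when you deploy the Workload Cluster. Underneath, the selection corresponds to the standard kubelet setting shown in this minimal upstream-Kubernetes sketch; it is not the exact configuration that TCA generates.

    # Minimal kubelet sketch of the static CPU Manager policy (upstream Kubernetes).
    apiVersion: kubelet.config.k8s.io/v1beta1
    kind: KubeletConfiguration
    cpuManagerPolicy: static
    # The static policy requires an explicit CPU reservation for system and Kubernetes
    # daemons; exclusive cores are then granted only from the remaining pool.
    kubeReserved:
      cpu: "1"
    systemReserved:
      cpu: "1"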

For deterministic performance, the data plane containers must be exclusively pinned to the Worker Node CPUs. To apply this setting, you must define the container as part of a Guaranteed QoS Pod (CPU and memory requests equal to limits) with integer CPU requests, as shown in the sketch below.
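For example, the static CPU Manager grants exclusive cores to a container only when its Pod falls into the Guaranteed QoS class and its CPU request is an integer, as in the following sketch (the names, image, and sizes are placeholders):

    # Guaranteed QoS Pod with an integer CPU request: the static CPU Manager
    # pins this container to four exclusive Worker Node CPUs.
    apiVersion: v1
    kind: Pod
    metadata:
      name: dataplane-cnf                               # placeholder name
    spec:
      containers:
      - name: dataplane-container
        image: registry.example.com/dataplane-app:1.0   # placeholder image
        resources:
          requests:
            cpu: "4"            # integer CPU count, equal to the limit
            memory: "8Gi"
          limits:
            cpu: "4"
            memory: "8Gi"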