This section describes the physical specifications of ESXi servers used to create workload clusters or standalone hosts (for RAN deployments) for the successful deployment and operation of the Telco Cloud.

Physical Design Specification Fundamentals

The configuration and assembly process for each system is standardized, with all components installed in the same way on each ESXi host. The standardization of the physical configuration across the ESXi hosts helps you operate an easily manageable and supportable infrastructure. This standardization applies to the ESXi hosts for each cluster in the workload domain. Components of each workload domain might have different physical requirements based on the applications.

As an example, RAN Distributed Unit (DU) workloads deployed to standalone hosts use look-aside or in-line accelerator cards. Installing these PCI cards in the Core or VNF workload pods that reside in the same workload domain is expensive and unnecessary.

Therefore, ESXi hosts in a cluster must have identical configurations, including storage and networking configurations. For example, consistent PCI card slot placement, especially for network controllers, is essential for the accurate alignment of physical to virtual I/O resources. By using identical configurations, you can balance the VM storage components across storage and compute resources.

The sizing of physical servers running ESXi requires special considerations depending on the workload. Traditional clusters use vSAN as the primary storage system, so the vSAN requirements must also be considered.

This section outlines the generic and workload-specific recommendations for each workload type, such as 5G Core and RAN.

ESXi Host Memory

The amount of memory required for vSphere compute clusters varies according to the workloads running in clusters. When sizing the memory of hosts in a compute cluster, consider the admission control setting (n+1) that reserves the resources of a host for failover or maintenance. In addition, leave a memory budget of 8-12 GB for ESXi host operations.

The number of vSAN disk groups and disks managed by an ESXi host determines the memory requirements. To support the maximum number of disk groups, up to 100 GB RAM can be required for vSAN. For more information about the vSAN configuration maximums, see VMware Configuration Maximums.
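
As a rough illustration of these guidelines, the following Python sketch estimates the memory available to workloads in an n+1 cluster. The host size, the 32 GB vSAN allowance, and the example values are assumptions for illustration only; substitute measured figures from your environment.

  # Illustrative per-host memory sizing for a vSAN-backed compute cluster.
  # All input values are assumptions; replace them with measured figures.

  def usable_cluster_memory_gb(hosts, ram_per_host_gb,
                               esxi_overhead_gb=12,   # 8-12 GB budget for ESXi host operations
                               vsan_overhead_gb=32):  # depends on disk groups; can reach ~100 GB
      """Memory left for workloads with n+1 admission control (one host reserved)."""
      per_host = ram_per_host_gb - esxi_overhead_gb - vsan_overhead_gb
      return per_host * (hosts - 1)  # reserve the capacity of one host for failover or maintenance

  # Example: four hosts with 768 GB RAM each -> 2172 GB available for workloads.
  print(usable_cluster_memory_gb(hosts=4, ram_per_host_gb=768))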

ESXi Boot Device

The following considerations apply when you select a boot device type and size for vSAN:

  • vSAN does not support stateless vSphere Auto Deploy.

  • Device types supported as ESXi boot devices:

    • SATADOM devices. The size of the boot device per host must be at least 32 GB.

    • USB or SD embedded devices. The USB or SD flash drive must be at least 8 GB.

With vSphere 8.0, booting from SD cards is deprecated. A dedicated boot disk of at least 128 GB is required for optimal support of the ESX-OSData partition. This local disk can also be mirrored with RAID 1 to provide boot disk resiliency; the mirroring is handled at the server layer.
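
The boot device minimums above can be captured in a simple validation helper. This is only a sketch of such a check; the device-type names are illustrative and the thresholds mirror the sizes listed in this section.

  # Minimum boot device sizes from this section, keyed by illustrative device-type names.
  BOOT_DEVICE_MIN_GB = {
      "local_disk": 128,  # dedicated boot disk, recommended for ESX-OSData
      "satadom": 32,
      "usb_sd": 8,        # booting from SD is deprecated in vSphere 8.0
  }

  def boot_device_ok(device_type, size_gb):
      """Return True when the device meets the minimum size for its type."""
      return size_gb >= BOOT_DEVICE_MIN_GB.get(device_type, float("inf"))

  print(boot_device_ok("local_disk", 240))  # True
  print(boot_device_ok("satadom", 16))      # False, below the 32 GB minimum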

Design Recommendation: Use vSAN ready nodes.
Design Justification: Ensures full compatibility with vSAN.
Design Implication: Hardware choices might be limited.
Domain Applicability: Management domain, compute clusters, VNF, CNF, C-RAN, Near/Far Edge, and NSX Edge. Not applicable to RAN sites.

Design Recommendation: Ensure that all ESXi hosts have a uniform configuration across a cluster in a workload environment.
Design Justification: A balanced cluster has more predictable performance, even during hardware failures, and the impact on performance during a re-sync or rebuild is minimal.
Design Implication: As new server models become available, the deployed model is phased out, which makes it difficult to keep a uniform cluster when adding hosts later.
Domain Applicability: Management domain, compute clusters, VNF, CNF, C-RAN, Near/Far Edge, and NSX Edge. For RAN sites, ensure that all RAN sites in a workload domain have a uniform configuration.

Design Recommendation: Set up the management cluster with a minimum of four ESXi hosts.
Design Justification: Provides full redundancy for the management cluster when vSAN is used.
Design Implication: Additional ESXi host resources are required for redundancy.
Domain Applicability: Management domain.

Design Recommendation: Set up the workload and edge clusters with a minimum of four ESXi hosts.
Design Justification: Provides full redundancy for the workload cluster when vSAN is used.
Design Implication: Additional ESXi host resources are required for redundancy. If vSAN is not used, the clusters can be minimally sized according to the workload requirements.
Domain Applicability: Compute clusters, VNF, CNF, C-RAN, Near/Far Edge, and NSX Edge deployments.

Design Recommendation: Set up each ESXi host in the management cluster with a minimum of 256 GB RAM.
Design Justification: Ensures that the management components have enough memory to run during a single host failure and provides a buffer for future management or monitoring components in the management cluster.
Design Implication: In a four-node cluster, only the resources of three ESXi hosts are available because the resources of one host are reserved for vSphere HA. Depending on the products deployed and their configuration, more memory per host (or more hosts) might be required.
Domain Applicability: Management domain.

Design Recommendation: Size the RAM on workload clusters and hosts appropriately for the planned applications.
Design Justification: Ensures that the network functions have enough memory to run during a single host failure and provides a buffer for scaling or for deploying additional network functions to the cluster.
Design Implication: In a four-node cluster, only the resources of three ESXi hosts are available because the resources of one host are reserved for vSphere HA. Depending on the functions deployed and their configuration, more memory per host might be required.
Domain Applicability: Compute clusters, VNF, CNF, C-RAN, Near/Far Edge, and NSX Edge deployments.

Design Recommendation: Use a disk or a redundant Boot Optimized Server Storage unit as the ESXi boot disk.
Design Justification: Enables a more stable deployment than USB or SD cards and stores coredumps and rotating logs.
Design Implication: Requires an additional disk for the ESXi boot device.
Domain Applicability: All domains.

Management Cluster

The management cluster runs multiple VMs, ranging from multiple vCenter and NSX instances to scaled deployments of Aria Operations, Aria Operations for Logs, and so on.

When deploying new functions to the management domain, ensure that the management cluster has the failover capacity to handle host component failures or downtime caused by host upgrades.
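
A minimal sketch of such a pre-deployment check follows, assuming illustrative VM memory figures; the component sizes below are not prescriptive.

  # Check that the planned management VMs still fit when one host is unavailable
  # (host failure or a rolling upgrade). All sizes are illustrative assumptions.

  def has_failover_capacity(host_count, ram_per_host_gb, esxi_overhead_gb, vm_ram_gb):
      usable_ram = (host_count - 1) * (ram_per_host_gb - esxi_overhead_gb)
      return sum(vm_ram_gb) <= usable_ram

  # Example: vCenter, three NSX Managers, Aria Operations, Aria Operations for Logs.
  planned_vms_gb = [40, 24, 24, 24, 48, 37]
  print(has_failover_capacity(host_count=4, ram_per_host_gb=256,
                              esxi_overhead_gb=12, vm_ram_gb=planned_vms_gb))  # True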

Compute Clusters

The compute cluster sizing depends on various factors. Control plane and user plane workloads are deployed to separate compute clusters (different vSphere clusters). To maximize overall performance, the user plane workloads require more resources for features such as Latency Sensitivity, NUMA awareness, and infrastructure awareness.

A compute pod design must be rack-aligned, with 16, 24, or 32 servers per rack depending on the server RU size, the power available to the rack, and so on.
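
One way to derive the per-rack server count is to take the minimum of what fits by rack space and by power budget, as in the following sketch. The rack units reserved for switching and the power figures are assumptions, not prescriptive values.

  # Servers per rack, bounded by both rack space and the rack power budget.
  # Reserved RU (ToR switches, PDUs, patch panels) and power values are assumptions.

  def servers_per_rack(rack_ru, server_ru, rack_power_w, server_power_w, reserved_ru=2):
      by_space = (rack_ru - reserved_ru) // server_ru
      by_power = rack_power_w // server_power_w
      return min(by_space, by_power)

  # Example: 42U rack, 1U servers drawing ~800 W, 26 kW available -> 32 servers (power-bound).
  print(servers_per_rack(rack_ru=42, server_ru=1, rack_power_w=26000, server_power_w=800))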

Network Edge Clusters

Network Edge clusters are smaller than the Management or Core Compute clusters, which consume an entire rack. The network edge cluster primarily hosts NSX Edge gateway VMs and Avi Load Balancer Service Engines, and it must be sized according to the edge domain requirements.

Near/Far Edge Clusters

Near/Far Edge clusters are smaller in size than the Management or Core Compute clusters. The Near/Far Edge clusters are used in a distributed environment. The use cases for this cluster type include:

  • C-RAN Deployments (hosting multiple DUs in a single location)

  • Edge services (UPF Breakout)

  • Multi-Access Edge Computing (MEC) use cases for enterprise verticals or co-located offerings

The hardware installed in the Near/Far Edge clusters might also differ from the general-purpose compute clusters, based on the service offerings. GPUs can be deployed for AI/ML services, and in-line or look-aside accelerators can be deployed for C-RAN services.

Note:

The standardization of physical specifications across the cluster applies to all cluster types.

RAN Cell Sites

RAN Cell Sites run fewer workloads than other cluster types; however, these workloads require low latency and real-time access to resources.

ESXi host memory must be sized to accommodate the Tanzu Kubernetes Grid RAN worker nodes while ensuring that memory remains available for the hypervisor. Standalone hosts do not require RAM overhead for vSAN.

The minimum reservation for RAN worker nodes is 12 GB of RAM and 2 physical cores per NUMA node. Subtract these resources from the overall host capacity when performing function sizing and capacity planning, and include overhead for ESXi and for the guest OS or Kubernetes processing requirements.

Note:

Include PaaS component dimensioning when performing CPU and memory calculations.
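
A minimal arithmetic sketch of this capacity planning follows. Only the 12 GB and 2 physical cores per NUMA node reservation comes from this section; the host size and the ESXi, guest OS, and PaaS allowances are assumptions to be replaced with real dimensioning data.

  # Illustrative RAN cell-site capacity calculation for a standalone ESXi host.
  # Only the per-NUMA-node worker reservation (2 cores, 12 GB) is from this section;
  # the host size and the ESXi, guest OS, and PaaS allowances are assumptions.

  def ran_function_capacity(numa_nodes, cores_per_numa, ram_per_numa_gb,
                            esxi_ram_gb=12, guest_os_paas_ram_gb=8,
                            reserved_cores_per_numa=2, reserved_ram_per_numa_gb=12):
      """Cores and RAM remaining for RAN functions after reservations and overheads."""
      cores = numa_nodes * (cores_per_numa - reserved_cores_per_numa)
      ram = (numa_nodes * (ram_per_numa_gb - reserved_ram_per_numa_gb)
             - esxi_ram_gb - guest_os_paas_ram_gb)
      return cores, ram

  # Example: dual-socket host with 32 cores and 192 GB per NUMA node -> (60, 340).
  print(ran_function_capacity(numa_nodes=2, cores_per_numa=32, ram_per_numa_gb=192))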