Deploy a workload cluster using the VMware Telco Cloud Automation user interface.

Prerequisites

  • You require a role with Infrastructure Lifecycle Management privileges.

  • You must have uploaded to VMware Telco Cloud Automation the virtual machine template for the specific Kubernetes version that the cluster will run. Refer to Import New BYOI Templates into vSphere.

  • You must have onboarded a vSphere VIM.

  • You must have created a Management cluster or uploaded a Workload cluster template.

  • A network must be present with a DHCP range and a static IP address in the same subnet as that range.

  • When you enable multi-zone, ensure that:

    • For region: The vSphere data center must have tags attached for the selected category.

    • For zone: The vSphere cluster, or the hosts under the vSphere cluster, must have tags attached for the selected category. Ensure that the vSphere cluster and the hosts under it do not share the same tags.

Procedure

  1. Log in to the VMware Telco Cloud Automation web interface.
  2. Go to Infrastructure > CaaS Infrastructure and click Deploy Cluster.
  3. Destination Info: Provide the following information:
    • Management Cluster - Select the Management cluster from the drop-down menu. You can also select a Management cluster deployed in a different vCenter.

    • Destination Cloud - Select a cloud on which you want to deploy the Kubernetes cluster.

    • Datacenter - Select a data center that is associated with the cloud.

    Advanced Options - Provide the secondary cloud information here. These options are applicable when creating stretched clusters.

    • Secondary Cloud (Optional) - Select the secondary cloud. It is required for stretched cluster creation.

    • Secondary Data Center (Optional) - Select the secondary data center.

      Note: The secondary cloud and data center are not applicable for classy standard and classy single node clusters.

    • NF Orchestration VIM (Optional) - Provide the details of the VIM. VMware Telco Cloud Automation uses this VIM and the associated Control Planes for NF life cycle management.

  4. Cluster Type: Select one of the following cluster types:
    • Standard Cluster - A cluster managed by legacy declarative cluster APIs.
      Note: New features are not supported in the legacy cluster APIs.
    • Classy Standard Cluster - A standard cluster based on ClusterClass APIs where the APIs manage the control plane and node pools.

    • Classy Single Node Cluster - A standard cluster based on ClusterClass APIs where the control plane and worker node are in a single Kubernetes node. This cluster type is suitable for CNFs running on RAN edge sites.
      ClusterClass is implemented on top of existing interfaces to streamline cluster lifecycle management while maintaining the same underlying API. ClusterClass templates can be modified to change the cluster topology without having to change the specification files. You can customize ClusterClass templates by using variables. Variables are name:value pairs defined in the Topology section.
      Note: By default, classy standard and classy single node clusters are hardened to the levels described in STIG Results and Exceptions and CIS Results and Exceptions, when the cluster's Kubernetes version is 1.26.14 or higher.
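      For illustration, a topology variable ultimately lands in the cluster's Cluster API manifest as a name-value pair under spec.topology.variables. A minimal sketch, assuming a hypothetical ClusterClass name and using the ntpServers variable described in step 6:

        apiVersion: cluster.x-k8s.io/v1beta1
        kind: Cluster
        spec:
          topology:
            class: tkg-vsphere-default    # hypothetical ClusterClass name
            version: v1.26.8
            variables:
              - name: ntpServers
                value: "192.0.2.10,192.0.2.11"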
  5. Click Next.
  6. Cluster Info: Provide the following information:
    • Name - Enter a name for the Workload cluster. The cluster name must be compliant with DNS hostname requirements as outlined in RFC-952 and amended in RFC-1123.
      Note: The following names are reserved. Do not use them as cluster names:
      • capi-kubeadm-bootstrap-system
      • capi-kubeadm-control-plane-system
      • capi-system
      • capv-system
      • cert-manager
      • default
      • fluent-system
      • istio-system
      • kube-node-lease
      • kube-public
      • kube-system
      • metallb-system
      • postgres-operator-system
      • tanzu-package-repo-global
      • tanzu-system
      • tca-services
      • tca-mgr
      • tca-system
      • tkg-system
      • tkg-system-public
      • tkg-system-telemetry
      • tkr-system
    • TCA BOM Release - The TCA BOM Release file contains information about the Kubernetes version and add-on versions. You can select multiple BOM release files.
      Note: After you select the BOM release file, the Security Options section is made available.
    • CNI - Select a Container Network Interface (CNI) such as Antrea or Calico.

    • Proxy Repository Access - Available only when the selected management cluster uses a proxy repository. Select the proxy repository from the drop-down list.

    • Airgap Repository Access - Available only when the selected management cluster uses an airgap repository. Select the airgap repository from the drop-down list.

    • IP Version - The IP version specified in the Management cluster is displayed here.
      Note:

      The IP version can be IPv4, IPv6, or 'IPv6 and IPv4'. The 'IPv6 and IPv4' option requires that the workload cluster type is Classy Standard Cluster, the Kubernetes version is v1.28 or higher, and the IP version of the management cluster is 'IPv6 and IPv4'.

    • Cluster Endpoint IP - Enter the IP of the API server load balancer.
      Note:
      • Assign an IP address that is not within your DHCP range, but in the same subnet as your DHCP range.
      • If the IP version of the workload cluster is IPv6 or 'IPv6 and IPv4', the virtual IP address must be in IPv6 format.
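      In Cluster API terms, this value populates the cluster's control plane endpoint. A minimal sketch, using a hypothetical address in the DHCP subnet but outside the DHCP range:

        spec:
          controlPlaneEndpoint:
            host: 192.0.2.50    # hypothetical static IP outside the DHCP range
            port: 6443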
    • Cluster (pods) CIDR - Enter the CIDR range for pods. VMware Telco Cloud Automation uses the CIDR pool to assign IP addresses to pods in the cluster.
      Note: Cluster (pods) CIDR includes both IPv6 and IPv4 CIDRs when the IP version of the cluster is 'IPv6 and IPv4'. The default is fd10:100:64::/48, 100.96.0.0/11.
    • Service CIDR - Enter the CIDR range for services. VMware Telco Cloud Automation uses the CIDR pool to assign IP addresses to the services in the cluster.
      Note: Service CIDR includes both IPv6 and IPv4 CIDRs when the IP version of the cluster is 'IPv6 and IPv4'. The default is fd10:100:96::/108, 100.64.0.0/13.
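      In Cluster API terms, the pods and services CIDRs populate the cluster network block. A sketch using the dual-stack defaults listed above:

        clusterNetwork:
          pods:
            cidrBlocks: ["fd10:100:64::/48", "100.96.0.0/11"]
          services:
            cidrBlocks: ["fd10:100:96::/108", "100.64.0.0/13"]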
    • Topology Variable - Click Add Variable and perform the following:
      Note: Topology variables are not applicable for standard clusters.
      1. Select the required variable from the drop-down list.
        Note: Ensure that you have selected a TCA BOM Release to populate the variables in the drop-down menu.
      2. Enter the value for the variable.

        The following list describes the variables, with the input type and an example input for each.

        • vipNetworkInterface (String) - The network interface name, for example, an Ethernet interface. The default value is eth0.

        • aviAPIServerHAProvider (Boolean) - Selects the Control Plane API Server endpoint provider.
          • True: Enables NSX Advanced Load Balancer as the Control Plane API Server endpoint.
          • False: Enables Kube-Vip as the Control Plane API Server endpoint.

        • kubeVipLoadBalancerProvider (Boolean) - Selects the load balancer for workloads.
          • True: Enables Kube-Vip as the load balancer.
          • False: Directs the load balancer traffic to an external load balancer.

        • ntpServers (String) - Configures the cluster's NTP servers if you are deploying clusters in vSphere without DHCP Option 42. Enter the NTP server IP addresses.
          Note: Separate multiple NTP servers with commas.

        • controlPlaneTaint (Boolean) - Kubeadm applies a taint to control plane nodes so that only specific pods can be scheduled on them. This ensures proper workload placement and prevents pods that do not tolerate the taint from being placed on those nodes.
          Note: This variable is not applicable for Classy Single Node Clusters.
          • True: Control plane nodes allow only critical workloads to be scheduled onto them.
          • False: Control plane nodes allow all workloads to be scheduled onto them.

        • etcdExtraArgs (Object, YAML code) - Specifies the etcd flags. For example, if the cluster has more than 500 nodes or the storage performance is poor, you can increase the heartbeat interval to 300 and the election timeout to 2000:

          election-timeout: '2000'
          heartbeat-interval: '300'

        • apiServerExtraArgs (Object, YAML code) - Specifies the kube-apiserver flags. For example, set the minimum TLS version to VersionTLS12 and the cipher suites to TLS_RSA_WITH_AES_256_GCM_SHA384:

          tls-min-version: 'VersionTLS12'
          tls-cipher-suites: 'TLS_RSA_WITH_AES_256_GCM_SHA384'

        • kubeSchedulerExtraArgs (Object, YAML code) - Specifies the kube-scheduler flags. For example, enable Single Pod Access Mode by setting feature-gates to ReadWriteOncePod=true:

          feature-gates: 'ReadWriteOncePod=true'

        • kubeControllerManagerExtraArgs (Object, YAML code) - Specifies the kube-controller-manager flags. For example, turn off performance profiling:

          profiling: 'false'

        • controlPlaneKubeletExtraArgs (Object, YAML code) - Specifies the control plane kubelet flags. For example, limit the number of pods on control plane nodes to 50:

          max-pods: '50'
          read-only-port: '10255'
          max-open-files: '100000'

        • workerKubeletExtraArgs (Object, YAML code) - Specifies the worker kubelet flags. For example, limit the number of pods on worker nodes to 50:
          Note: This variable is not applicable for Classy Single Node Clusters.

          max-pods: '50'
          read-only-port: '10255'
          max-open-files: '100000'

        • identityRef (Object, YAML code) - A reference to a Secret or VSphereClusterIdentity containing the identity to use when reconciling the cluster. Example of a reference to a VSphereClusterIdentity:

          kind: VSphereClusterIdentity
          name: "identity name"

        • pci (Object, YAML code) - Configures PCI passthrough on all control plane or worker machines. Example configuring PCI passthrough devices on the control plane and worker nodes:

          controlPlane:
            devices:
              - vendorId: 0x10DE
                deviceId: 0x1EB8
            hardwareVersion: vmx-15
          worker:
            devices:
              - vendorId: 0x10DE
                deviceId: 0x1EB9
            hardwareVersion: vmx-17

        • eventRateLimitConf (String) - Enables and configures an EventRateLimit admission controller to moderate traffic to the Kubernetes API server. The input is a base64-encoded EventRateLimit configuration file, for example:

          YXBpVmVyc2lvbjogZXZlbnRyYXRlbGltaXQuYWRtaXNzaW9uLms4cy5pby92MWFscGhhMQpraW5kOiBDb25maWd1cmF0aW9uCmxpbWl0czoKLSB0eXBlOiBOYW1lc3BhY2UKICBxcHM6IDUwCiAgYnVyc3Q6IDEwMAogIGNhY2hlU2l6ZTogMjAwMAotIHR5cGU6IFVzZXIKICBxcHM6IDEwCiAgYnVyc3Q6IDUwCg==

        • security (Object, YAML code) - Specifies security-related configurations. Example that sets the minimum TLS protocol version to 1.2:

          fileIntegrityMonitoring:
            enabled: false
          imagePolicy:
            pullAlways: false
            webhook:
              enabled: false
              spec:
                allowTTL: 50
                defaultAllow: true
                denyTTL: 60
                retryBackoff: 500
          kubeletOptions:
            eventQPS: 50
            streamConnectionIdleTimeout: 4h0m0s
          systemCryptoPolicy: default
          minimumTLSProtocol: tls_1.2
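        For reference, the sample base64 string in the eventRateLimitConf entry decodes to the following EventRateLimit configuration file:

          apiVersion: eventratelimit.admission.k8s.io/v1alpha1
          kind: Configuration
          limits:
          - type: Namespace
            qps: 50
            burst: 100
            cacheSize: 2000
          - type: User
            qps: 10
            burst: 50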
    • Enable Autoscaler - Click the toggle button to activate the autoscaler feature.
      Note: The autoscaler feature is unavailable for Classy Single Node clusters.

      The autoscaler feature automatically controls the replica count on the node pool by increasing or decreasing the replica count based on the workload. If you activate this feature for a particular cluster, you cannot deactivate it after the deployment. When you activate the autoscaler feature, the following fields are displayed:

      Note:

      The values in these fields are automatically populated from the cluster. However, you can edit the values.

      • Min Size - Sets the minimum number of worker nodes to which the autoscaler can scale down.

      • Max Size - Sets the maximum number of worker nodes to which the autoscaler can scale up.

      • Max Node - Sets the maximum total number of worker and control plane nodes beyond which the autoscaler does not scale up. The default value is 0.

      • Max Node Provision Time - Sets the maximum time that autoscaler should wait for the nodes to be provisioned. The default value is 15 minutes.

      • Delay After Add - Sets the time limit for the autoscaler to start the scale-down operation after a scale-up operation. For example, if you specify the time as 10 minutes, autoscaler resumes the scale-down scan after 10 minutes of adding a node.

      • Delay After Failure - Sets the time limit for the autoscaler to restart the scale-down operation after a scale-down operation fails. For example, if you specify the time as 3 minutes and there is a scale-down failure, the next scale-down operation starts after 3 minutes.

      • Delay After Delete - Sets the time limit for the autoscaler to start the scale-down operation after deleting a node. For example, if you specify the time as 10 minutes, autoscaler resumes the scale-down scan after 10 minutes of deleting a node.

      • Unneeded Time - Sets how long a node must remain unused before the autoscaler scales it down. For example, if you specify 10 minutes, an unused node is scaled down only after it has been unused for 10 minutes.
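      Under the hood, Cluster API exposes these bounds to the cluster autoscaler through annotations on the node pool's MachineDeployment. A minimal sketch, assuming a hypothetical node pool name (the annotation keys are the standard cluster-autoscaler Cluster API provider keys):

        apiVersion: cluster.x-k8s.io/v1beta1
        kind: MachineDeployment
        metadata:
          name: np1    # hypothetical node pool name
          annotations:
            cluster.x-k8s.io/cluster-api-autoscaler-node-group-min-size: "1"
            cluster.x-k8s.io/cluster-api-autoscaler-node-group-max-size: "5"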

  7. Click Next.
  8. Security Options
    • Click the Enable toggle button to apply a customized audit configuration. Otherwise, the default audit configuration is applied to the workload cluster.
    • Click the POD Security Default Policy toggle button to apply the POD security policies to the workload cluster.
      • POD Security Standard Audit: Policy violation adds an audit annotation to the event recorded in the audit log, but does not reject the POD.
      • POD Security Standard Warn: Policy violation displays an error message on the UI, but does not reject the POD.
      • POD Security Standard Enforce: Policy violation rejects the POD.
        Select one of the following options from the preceding drop-down lists:
        • Restricted: A fully restrictive policy that follows the current POD security hardening best practices for providing permissions.
        • Baseline: A minimal restrictive policy that prevents known privilege escalations. Allows the default Pod configurations.
        • Privileged: An unrestrictive policy providing the widest possible permissions. Allows known privilege escalations.
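        Conceptually, the Audit, Warn, and Enforce drop-downs map to the Kubernetes Pod Security admission levels. A sketch of an equivalent cluster-wide AdmissionConfiguration, shown for illustration only (how VMware Telco Cloud Automation applies these settings internally may differ):

          apiVersion: apiserver.config.k8s.io/v1
          kind: AdmissionConfiguration
          plugins:
            - name: PodSecurity
              configuration:
                apiVersion: pod-security.admission.config.k8s.io/v1
                kind: PodSecurityConfiguration
                defaults:
                  enforce: "baseline"    # reject pods that violate the Baseline policy
                  audit: "restricted"    # record audit annotations for Restricted violations
                  warn: "restricted"     # surface warnings for Restricted violations
                exemptions:
                  namespaces: ["kube-system"]    # hypothetical exemption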
  9. Control Plane Info
    • To configure Control Plane node placement, click the Settings icon in the Control Plane Node Placement table.

      • Name - Enter the name of the Control Plane node.

      • Destination Cloud - The destination cloud is selected by default. To make a different selection, use the drop-down menu.

      VM Placement

      • Datacenter - Select a data center for the Control Plane node.

      • Resource Pool - Select the default resource pool on which the Control Plane node is deployed.

      • VM Folder - Select the virtual machine folder on which the Control Plane node is placed.

      • Datastore - Select the default datastore for the Control Plane node.

      • VM Template - Select a VM template.
        Note: Based on the TCA BOM file you select, relevant templates are available for selection. For example, if you select the Kubernetes version v1.26.8 as the BOM file, the template with version v1.26.8 is available for selection. Therefore, select the right VM template based on your cluster type.

      VM Size

      • Number of Replicas - The number of control plane node VMs to create. The ideal number of replicas for a production or staging deployment is 3.
        Note: For a Classy Single Node Cluster, by default the replica count is 1 and you cannot change the count.
      • Number of vCPUs - Provide an even vCPU count if the underlying ESXi host has hyperthreading enabled and the network function requires NUMA alignment and CPU reservation. This ensures that both logical threads of a physical CPU core are used by the same node.

      • Cores Per Socket (Optional) - Enter the number of cores per socket if you require more than 64 cores.

      • Memory - Enter the memory in GB.

      • Disk Size - Enter the disk size in GB.
        Note: For Classy Single Node clusters, the minimum disk size must be 70 GB.
    • Network

      • Management Network - Select the Management network.

      • MTU - Enter the maximum transmission unit in bytes.

      • DNS - Provide comma-separated primary and secondary DNS servers.

      • IPAM Type - Select DHCP or IP Pool.
        Note: You must provide a DNS server for the IP pool. It is optional for DHCP.
      • IP Pool - Select the IP pool that you want to use for the workload cluster.
        Note: The IP addresses that you added to the management cluster's IP pool are available for selection. Therefore, ensure that the management cluster you selected in the Destination Info section has an IP pool. IP pools are not supported for IPv6 or 'IPv6 and IPv4' clusters.
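        A management cluster IP pool is typically backed by a Cluster API IPAM in-cluster provider resource. A minimal sketch, assuming hypothetical names and addresses (the exact resource kind and apiVersion used by VMware Telco Cloud Automation may differ):

          apiVersion: ipam.cluster.x-k8s.io/v1alpha2
          kind: InClusterIPPool
          metadata:
            name: wc-ip-pool    # hypothetical pool name
          spec:
            addresses:
              - 192.0.2.100-192.0.2.150    # hypothetical static range outside the DHCP range
            prefix: 24
            gateway: 192.0.2.1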

      Labels

      • To add the appropriate labels for this profile, click Add Label. These labels are added to the Kubernetes node.

      Advanced Options

      • Clone Mode - Specify the type of clone operation. Linked Clone is supported on templates that have at least one snapshot. Otherwise, the clone mode defaults to Full Clone.

      • Certificate Expiry Days - Specify the number of days before expiry at which TKG automatically renews the cluster certificate. By default, the certificate expires 365 days after issuance. For example, if you specify 50, the certificate is renewed 50 days before its expiry, that is, 315 days after issuance.
        The default value is 90 days. The minimum number of days you can specify is 7 and the maximum is 180.
        Note: You cannot edit the number of days after you deploy the cluster.
      • Kubeadmin Config Template (YAML) - Activate or deactivate the Kubeadmin Config Template YAML.
        Note: The Kubeadmin Config Template (YAML) field is enabled by default for a Classy Single Node Cluster because this cluster type is deployed at the RAN edge site and used to instantiate vDU CNF pods. Therefore, you must configure the static CPU manager policy on the Kubernetes node. The following YAML code configures the Kubernetes node:
        joinConfiguration:
          nodeRegistration:
            kubeletExtraArgs:
              cpu-manager-policy: static
              system-reserved: 'cpu=1,memory=1Gi'

        For information on controlling CPU Management Policies on the nodes, see the Kubernetes documentation at https://kubernetes.io/docs/tasks/administer-cluster/cpu-management-policies/.
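        As an illustration only, the same template can carry additional kubelet flags for RAN profiles; topology-manager-policy is a standard kubelet flag, but whether to set it here depends on your CNF requirements:

        joinConfiguration:
          nodeRegistration:
            kubeletExtraArgs:
              cpu-manager-policy: static
              topology-manager-policy: single-numa-node    # hypothetical addition
              system-reserved: 'cpu=1,memory=1Gi'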

    • Click Apply.

  10. Add-Ons

    To deploy an add-on such as NFS Client or Harbor, click Deploy Add-on.

    1. From the Select Add-On wizard, select the add-on and click Next.

    2. For add-on configuration information, see #GUID-18C37109-38A5-431D-A130-DD45C6E6AE96.

  11. Click Next.
  12. Node Pools
    Note: The Node Pools section is not applicable for a Classy Single Node Cluster as it has only one node.
    • A node pool is a set of nodes that have similar properties. Pooling is useful when you want to group the VMs based on the number of CPUs, storage capacity, memory capacity, and so on. You can add one node pool to a Management cluster and multiple node pools to a Workload cluster, with different groups of VMs. To add a Worker node pool, click Add Node Pool.

      • Name - Enter the name of the node pool.

      • Destination Cloud - The destination cloud is selected by default. To make a different selection, use the drop-down menu.

      VM Placement

      • Datacenter - Select a data center for the node pool.

      • Resource Pool - Select the default resource pool on which the node pool is deployed.

      • VM Folder - Select the virtual machine folder on which the node pool is placed.

      • Datastore - Select the default datastore for the node pool.

      • VM Template - Select a VM template.

      • Enable Autoscaler - This field is available only if autoscaler is enabled for the associated cluster. At the node level, you can activate or deactivate autoscaler based on your requirement.

        The following field values are automatically populated from the cluster.

        • Min Size (Optional) - Sets the minimum number of worker nodes to which the autoscaler can scale down. Edit the value, as required.

        • Max Size (Optional) - Sets the maximum number of worker nodes to which the autoscaler can scale up. Edit the value, as required.

          Note:
          • Using autoscaler on a cluster does not automatically change its node group size. Therefore, changing the maximum or minimum size does not scale up or scale down the cluster size. When editing the autoscaler-configured maximum size of the node pool, ensure that the maximum size limit of the node pool is less than or equal to the current replica count.
          • When a scale-down is in progress, editing the maximum size of the cluster is not recommended.
          • You can view the scale-up and scale-down events under the Events tab of the Telco Cloud Automation portal.

      VM Size

      • Number of Replicas - The number of node pool VMs to create. The ideal number of replicas for a production or staging deployment is 3.

        Note:

        The Number of Replicas field is unavailable if autoscaler is enabled for the node.

      • Number of vCPUs - Provide an even vCPU count if the underlying ESXi host has hyperthreading enabled and the network function requires NUMA alignment and CPU reservation. This ensures that both logical threads of a physical CPU core are used by the same node.

      • Cores Per Socket (Optional) - Enter the number of cores per socket if you require more than 64 cores.

      • Memory - Enter the memory in GB.

      • Disk Size - Enter the disk size in GB.

      Network

      • Management Network - Select the Management network.

      • MTU - Enter the maximum transmission unit in bytes.

      • DNS - Provide comma-separated primary and secondary DNS servers.

      • IPAM Type - The IPAM type defaults to DHCP or IP Pool based on your selection in the Control Plane Info section.
      • IP Pool - Displays the IP pool you selected in the Control Plane Info section. However, you can select a different IP pool.
        Note: When you change the IP pool, it causes a rolling update to the node pool.
      • ADD NETWORK DEVICE - To add a dedicated NFS interface to the node pool, click this button, select the interface, and then enter the following:

        • Interface Name - Enter the interface name as tkg-nfs to reach the NFS server.

      Labels

      • To add the appropriate labels for this profile, click Add Label. These labels are added to the Kubernetes node.

      Advanced Options

      • Clone Mode - Specify the type of clone operation. Linked Clone is supported on templates that have at least one snapshot. Otherwise, the clone mode defaults to Full Clone.

      • To enable Machine Health Check, select Configure Machine Health Check.

      • Kubeadmin Config Template (YAML) - Activate or deactivate the Kubeadmin Config Template YAML.

      • Node Pool Upgrade Strategy (YAML) - Activate or deactivate the node pool upgrade strategy YAML; see the sketch below.
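        A minimal sketch of a node pool upgrade strategy, assuming the field accepts the Cluster API MachineDeployment rolling-update format:

          type: RollingUpdate
          rollingUpdate:
            maxSurge: 1          # create at most one extra node during the upgrade
            maxUnavailable: 0    # keep all existing nodes available while upgrading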
    • Click Apply.

  13. Ready to Deploy - Click Deploy.

Results

The cluster details page displays the status of the overall deployment and the deployment status of each component.