Starting with VMware Cloud Director 10.3.1, you can create Tanzu Kubernetes Grid clusters by using the Kubernetes Container Clusters plug-in.

Prerequisites

  • Verify that your service provider published the Kubernetes Container Clusters plug-in to your organization. Kubernetes Container Clusters is the VMware Cloud Director Container Service Extension plug-in for VMware Cloud Director. You can find the plug-in on the top navigation bar under More > Kubernetes Container Clusters.
  • Verify that your service provider completed the VMware Cloud Director Container Service Extension 4.1.x server setup, which automatically assigns the Kubernetes Clusters rights bundle.
  • Verify that your organization administrator has assigned the Kubernetes Cluster Author role to you. This role allows you to perform cluster management functions, such as creating, upgrading, and deleting clusters.

Procedure

  1. Log in to VMware Cloud Director, and from the top navigation bar, select More > Kubernetes Container Clusters > New.
  2. Select the VMware Tanzu Kubernetes Grid runtime option, and click Next.
  3. Enter a name for the cluster.
  4. Select a Kubernetes template from the list, and click Next.
  5. In the VDC & Network window, select the organization VDC to which you want to deploy a Tanzu Kubernetes Grid cluster, select a VDC network for the cluster, and click Next.
  6. In the Control Plane window, select the number of nodes and the disk size, optionally select a sizing policy, a placement policy, and a storage profile, and click Next.
    Note: The number of nodes setting allows a cluster to have multiple control plane nodes.
  7. In the Worker Pools window, enter a name, the number of nodes, and the disk size, optionally select a sizing policy, a placement policy, and a storage profile, and click Next. For more information on worker node pools, see Working with Worker Node Pools.
    Note:
    • To configure vGPU settings, select the Activate GPU toggle and select a vGPU policy. For more information on vGPU configuration, see Configuring vGPU on Tanzu Kubernetes Grid Clusters to allow Artificial Intelligence and Machine Learning Workloads.
    • When you create clusters with vGPU functionality, it is recommended to increase the disk size to between 40 GB and 50 GB, because vGPU libraries occupy a large amount of storage space.
    • You can select a sizing policy in this workflow or separately in the VMware Cloud Director Container Service Extension server configuration. When you select a sizing policy in conjunction with a vGPU policy that contains VM sizing, the sizing information in the vGPU policy takes precedence over the selected sizing policy. It is recommended to include sizing in your vGPU policy and to specify only the vGPU policy, leaving the Sizing Policy field empty.
  8. (Optional) To create additional worker node pools, click Add New Worker Node Pool, and configure worker node pool settings.
  9. Click Next.
  10. In the Kubernetes Storage window, activate the Create Default Storage Class toggle, select a storage profile, and enter a storage class name.
  11. (Optional) Configure the Reclaim Policy and Filesystem settings. For a sketch that verifies the resulting storage class from inside the cluster, see the example after this procedure.
  12. In the Kubernetes Network window, specify a range of IP addresses for Kubernetes services and a range for Kubernetes pods, and click Next.

    Classless Inter-Domain Routing (CIDR) is a method for IP routing and IP address allocation.

    Option Description
    Pods CIDR: Specifies a range of IP addresses to use for Kubernetes pods. The default value is 100.96.0.0/11. The pods subnet size must be equal to or larger than /24. You can enter one IP range.
    Services CIDR: Specifies a range of IP addresses to use for Kubernetes services. The default value is 100.64.0.0/13. You can enter one IP range.
    Control Plane IP: You can specify your own IP address as the control plane endpoint. You can use an external IP from the gateway or an internal IP from a subnet that is different from the routed IP range. If you do not specify an IP address as the control plane endpoint, the VMware Cloud Director Container Service Extension server selects one of the unused IP addresses from the associated tenant gateway.
    Virtual IP Subnet: You can specify a subnet CIDR from which one unused IP address is assigned as the control plane endpoint. The subnet must represent a set of addresses that are present in the gateway. The same CIDR is also propagated as the subnet CIDR for the ingress services on the cluster.

    You can use the following IP addresses as the Control Plane IP. For a sketch that checks a candidate address against these rules, see the example after this procedure.
    IP Type Description
    External IP addresses: Any of the IP addresses in the external gateway that connect to the OVDC network.
    Internal IP addresses: Any private IP address that is internal to the tenant, with the following exceptions:
    • IP addresses in the LB network service definition, usually 192.168.255.1/24.
    • IP addresses that are in the organization VDC IP subnet.
    • IP addresses that are already in use.
    Note: If you specify an IP address that does not meet these requirements, the following behavior occurs:
    • If the IP address is already in use and VMware Cloud Director detects the usage, an error appears in the logs during load balancer creation.
    • If the IP address is already in use and VMware Cloud Director does not detect the usage, the behavior is undefined.
  13. In the Debug Settings window, activate or deactivate the Auto Repair on Errors toggle and the Node Health Check toggle.
    Toggle Description
    Auto Repair on Errors: This toggle applies to failures that occur during the cluster creation process. If you activate this toggle, the VMware Cloud Director Container Service Extension server attempts to recreate clusters that enter an error state during the cluster creation process. If you deactivate this toggle, the VMware Cloud Director Container Service Extension server leaves the cluster in an error state for manual troubleshooting.
    Note: This toggle is deactivated by default in VMware Cloud Director Container Service Extension 4.1.x.
    Node Health Check: In contrast to Auto Repair on Errors, where remediation applies only during cluster creation, Node Health Check remediation begins after the cluster reaches the Available state. If any nodes become unhealthy during the lifetime of the cluster, Node Health Check detects and remediates them. For more information, see Node Health Check Configuration. For a sketch that inspects the node Ready condition from inside the cluster, see the example after this procedure.
    Note: This toggle is deactivated by default in VMware Cloud Director Container Service Extension 4.1.x.
  14. Enter an SSH public key.
  15. Click Next.
  16. Review the cluster settings, and click Finish.
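
The default storage class configured in steps 10 and 11 can be inspected after the cluster becomes Available. The following is a minimal sketch, not part of VMware Cloud Director Container Service Extension, that assumes you have downloaded the cluster kubeconfig and installed the kubernetes Python package. It lists the storage classes in the cluster and prints their reclaim policy and parameters.

    # Minimal sketch: list the storage classes in a newly created cluster and show
    # which one is marked as the default. Assumes the cluster kubeconfig has been
    # downloaded and $KUBECONFIG points to it, and the "kubernetes" package is installed.
    from kubernetes import client, config

    config.load_kube_config()  # reads the kubeconfig referenced by $KUBECONFIG

    for sc in client.StorageV1Api().list_storage_class().items:
        annotations = sc.metadata.annotations or {}
        is_default = annotations.get("storageclass.kubernetes.io/is-default-class", "false")
        print(
            f"name={sc.metadata.name} "
            f"reclaimPolicy={sc.reclaim_policy} "
            f"default={is_default} "
            f"parameters={sc.parameters}"  # filesystem-related settings, if any, typically appear here
        )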
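
The CIDR ranges and control plane IP rules from step 12 can be sanity-checked before you submit the wizard by using the Python standard library ipaddress module. This is a minimal sketch rather than VMware tooling; the organization VDC subnet in it is a hypothetical placeholder that you replace with your own network, and the LB service subnet reflects the usual 192.168.255.1/24 definition mentioned above.

    # Minimal sketch: sanity-check the Kubernetes Network values from step 12.
    import ipaddress

    # Defaults shown in the Kubernetes Network window.
    pods_cidr = ipaddress.ip_network("100.96.0.0/11")
    services_cidr = ipaddress.ip_network("100.64.0.0/13")

    # The pods subnet size must be equal to or larger than /24, and the two ranges
    # must not overlap.
    assert pods_cidr.prefixlen <= 24
    assert not pods_cidr.overlaps(services_cidr)

    # Ranges an internal control plane IP must avoid. The organization VDC subnet below
    # is a hypothetical placeholder; the LB network service definition is usually
    # 192.168.255.1/24.
    org_vdc_subnet = ipaddress.ip_network("192.168.100.0/24")
    lb_service_subnet = ipaddress.ip_network("192.168.255.1/24", strict=False)

    def is_allowed_internal_control_plane_ip(candidate: str) -> bool:
        """Return True if the candidate avoids the excluded ranges (it must also be unused)."""
        ip = ipaddress.ip_address(candidate)
        return ip not in org_vdc_subnet and ip not in lb_service_subnet

    print(is_allowed_internal_control_plane_ip("192.168.255.10"))  # False: inside the LB service range
    print(is_allowed_internal_control_plane_ip("10.10.10.5"))      # True, if the address is not already in use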
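
Node Health Check, described in step 13, remediates nodes that report an unhealthy condition after the cluster is Available. The following minimal sketch, which assumes the kubernetes Python package and a downloaded cluster kubeconfig, shows how to inspect the same Ready condition yourself; it only reads node status and performs no remediation.

    # Minimal sketch: print the Ready condition of every node in the cluster.
    # Assumes the cluster kubeconfig has been downloaded and $KUBECONFIG points to it.
    from kubernetes import client, config

    config.load_kube_config()

    for node in client.CoreV1Api().list_node().items:
        ready = next(
            (condition.status for condition in node.status.conditions if condition.type == "Ready"),
            "Unknown",
        )
        print(f"{node.metadata.name}: Ready={ready}")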

Review Cluster Status

When you create a Tanzu Kubernetes Grid cluster in VMware Cloud Director Container Service Extension, the cluster reports one of the following statuses:

Table 1. Cluster Status
Cluster Status Description
Pending: The cluster request has not yet been processed by the VMware Cloud Director Container Service Extension server.
Creating: The cluster is currently being processed by the VMware Cloud Director Container Service Extension server.
Available: The cluster is ready for users to operate on and host workloads.
Deleting: The cluster is being deleted.
Error: The cluster is in an error state.
Note: If you want to manually debug a cluster, deactivate Auto Repair on Errors mode.
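
Pending, Creating, and Deleting are transient statuses, while Available and Error are the states that you typically wait for. The following is a minimal polling sketch; fetch_cluster_status is a hypothetical placeholder for however you retrieve the status shown in the Kubernetes Container Clusters plug-in, and is not an API call provided by VMware Cloud Director or VMware Cloud Director Container Service Extension.

    # Minimal sketch of waiting for a cluster to settle. fetch_cluster_status is a
    # hypothetical placeholder, not a real VMware Cloud Director or Container Service
    # Extension API call; replace it with your own status lookup.
    import time

    def fetch_cluster_status(cluster_name: str) -> str:
        raise NotImplementedError("replace with your own status lookup")

    TRANSIENT = {"Pending", "Creating", "Deleting"}

    def wait_for_cluster(cluster_name: str, poll_seconds: int = 60) -> str:
        """Poll until the cluster leaves its transient state, then return the final status."""
        while True:
            status = fetch_cluster_status(cluster_name)
            print(f"{cluster_name}: {status}")
            if status not in TRANSIENT:
                return status  # typically Available or Error
            time.sleep(poll_seconds)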