Clusters on vSphere

The following sections describe how to configure Tanzu Kubernetes Grid (TKG) workload clusters to use features that are specific to vSphere with a standalone management cluster. The features are not entirely configurable in the cluster’s flat configuration file or Kubernetes-style object spec.

For information about how to configure workload clusters on vSphere using configuration files and object specs, see:

Deploy a Cluster with a Custom OVA Image

If you are using a single custom OVA image for each version of Kubernetes to deploy clusters on one operating system, you import the OVA into vSphere and then specify it for tanzu cluster create with the --tkr option.

However, if you are using multiple custom OVA images for the same Kubernetes version, then the --tkr value is ambiguous. This happens when the OVAs for the same Kubernetes version:

Have different operating systems, for example, created by make build-node-ova-vsphere-ubuntu-1804, make build-node-ova-vsphere-photon-3, and make build-node-ova-vsphere-rhel-7.
Have the same name but reside in different vCenter folders.

To resolve this ambiguity, set the VSPHERE_TEMPLATE option to the desired OVA image before you run tanzu cluster create:

If the OVA template image name is unique, set VSPHERE_TEMPLATE to just the image name.
If multiple images share the same name, set VSPHERE_TEMPLATE to the full inventory path of the image in vCenter. This path follows the form /MY-DC/vm/MY-FOLDER-PATH/MY-IMAGE, where:
- MY-DC is the datacenter containing the OVA template image.
- MY-FOLDER-PATH is the path to the image from the datacenter, as shown in the vCenter VMs and Templates view.
- MY-IMAGE is the image name.
For example:
```
VSPHERE_TEMPLATE: "/TKG_DC/vm/TKG_IMAGES/ubuntu-2004-kube-v1.29.9-vmware.1"
```
You can determine the image’s full vCenter inventory path manually or use the govc CLI:
1. Install govc. For installation instructions, see the govmomi repository on GitHub.
2. Set environment variables for govc to access your vCenter:
  - export GOVC_USERNAME=VCENTER-USERNAME
  - export GOVC_PASSWORD=VCENTER-PASSWORD
  - export GOVC_URL=VCENTER-URL
  - export GOVC_INSECURE=1
3. Run govc find / -type m and find the image name in the output, which lists objects by their complete inventory paths.

For more information about custom OVA images, see Build Machine Images.

Deploy a Cluster with Region and Zone Tags for CSI

You can specify a region and zone for your workload cluster, to integrate it with region and zone tags configured for vSphere CSI (Cloud Storage Interface). For clusters that span multiple zones, this lets worker nodes find and use shared storage, even if they run in zones that have no storage pods, for example in a telecommunications Radio Access Network (RAN).

To deploy a workload cluster with region and zone tags that enable shared storage with vSphere CSI:

Create tags on vCenter Server:
1. Create tag categories on vCenter Server following Create and Edit a Tag Category. For example, k8s-region and k8s-zone.
2. Follow Create and Edit a vSphere Tag to create tags within the region and zone categories in the datacenter, as shown in this table:
  
  Category Tags
  
  k8s-zone zone-a
  zone-b
  zone-c
  
  k8s-region region-1

Category	Tags
`k8s-zone`	`zone-a` `zone-b` `zone-c`
`k8s-region`	`region-1`

Create corresponding tags to the clusters and to the datacenter following Assign or Remove a vSphere Tag as indicated in the table.


vSphere Objects	Tags
`datacenter`	`region-1`
`cluster1`	`zone-a`
`cluster2`	`zone-b`
`cluster3`	`zone-c`

To enable custom regions and zones for a vSphere workload cluster’s CSI driver, set the variables VSPHERE_REGION and VSPHERE_ZONE in the cluster configuration file to the tags above. For example:
```
VSPHERE_REGION: region-1
VSPHERE_ZONE: zone-a
```
When the Tanzu CLI creates a workload cluster with these variables set, it labels each cluster node with the topology keys failure-domain.beta.kubernetes.io/zone and failure-domain.beta.kubernetes.io/region.
Run tanzu cluster create to create the workload cluster, as described in Create a Plan-Based or a TKC Cluster.
After you create the cluster, and with the kubectl context set to the cluster, you can check the region and zone labels by doing one of the following:
- Run kubectl get nodes -L failure-domain.beta.kubernetes.io/zone -L failure-domain.beta.kubernetes.io/region and confirm that the output lists the cluster nodes.
- Run kubectl get csinodes -o jsonpath='{range .items\[\*\]}{.metadata.name} {.spec}{"\\n"}{end}' and confirm that the region and zone are enabled on vsphere-csi.

For more information on configuring vSphere CSI, see vSphere CSI Driver - Deployment with Topology.

Clusters on Different vSphere Accounts

Tanzu Kubernetes Grid can run workload clusters on multiple target platform accounts, for example to split cloud usage among different teams or apply different security profiles to production, staging, and development workloads.

To deploy workload clusters to an alternative vSphere account, different from the one used to deploy their management cluster, do the following:

Set the context of kubectl to your management cluster:
```
kubectl config use-context MY-MGMT-CLUSTER@MY-MGMT-CLUSTER
```
Where MY-MGMT-CLUSTER is the name of your management cluster.
Create a secret.yaml file with the following contents:
```
apiVersion: v1
kind: Secret
metadata:
  name: SECRET-NAME
  namespace: CAPV-MANAGER-NAMESPACE
stringData:
  username: VSPHERE-USERNAME
  password: VSPHERE-PASSWORD
```
Where:
- SECRET-NAME is a name that you give to the client secret.
- CAPV-MANAGER-NAMESPACE is namespace where the capv-manager pod is running. Default: capv-system.
- VSPHERE-USERNAME and VSPHERE-PASSWORD are login credentials that enable access to the alternative vSphere account.
Use the file to create the Secret object:
```
kubectl apply -f secret.yaml
```

Create an identity.yaml file with the following contents:

apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: VSphereClusterIdentity
metadata:
  name: EXAMPLE-IDENTITY
spec:
  secretName: SECRET-NAME
  allowedNamespaces:
    selector:
      matchLabels: {}

Where:

EXAMPLE-IDENTITY is the name to use for the VSphereClusterIdentity object.
SECRET-NAME is the name you gave to the client secret, above.

Use the file to create the VsphereClusterIdentity object:
```
kubectl apply -f identity.yaml
```

The management cluster can now deploy workload clusters to the alternative account.

To deploy a workload cluster to the account:

Create a cluster manifest by running tanzu cluster create --dry-run.

Edit the VSphereCluster definition in the manifest to set the spec.identityRef.name value for its VSphereClusterIdentity object to the EXAMPLE-IDENTITY you created above:

...
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: VSphereCluster
metadata:
 name: new-workload-cluster
spec:
 identityRef:
   kind: VSphereClusterIdentity
   name: EXAMPLE-IDENTITY
...

Run kubectl apply -f my-cluster-manifest.yaml to create the workload cluster.

After you create the workload cluster, log in to vSphere with the alternative account credentials, and you should see it running.

For more information, see Identity Management in the Cluster API Provider vSphere repository.

Deploy a Cluster that Uses a Datastore Cluster

Note
This feature does not work as expected; if you tag multiple datastores in a datastore cluster, as the basis for a workload cluster’s storage policy, the workload cluster uses only one of the datastores.

To enable a workload cluster to use a datastore cluster instead of a single datastore set up a storage policy that targets all datastores within the datastore cluster as follows:

Create a tag and associate it with the relevant datastores:
1. Follow the procedures in vSphere Tags to create tag categories on vCenter Server. Ensure the category has Datastore as an associable object type.
2. Follow the other procedures in vSphere Tags to create a tag within the category created in the previous step and to associate the new tag with all of the datastores belonging to the datastore cluster.
Follow Create a VM Storage Policy for Tag-Based Placement to create a tag-based storage policy.
In the cluster configuration file:
- Set VSPHERE_STORAGE_POLICY_ID to the name of the storage policy created in the previous step.
- Ensure that VSPHERE_DATASTORE is not set. A VSPHERE_DATASTORE setting would override the storage policy setting.

Deploy a Multi-OS Workload Cluster

To deploy a multi-OS workload cluster that has both Windows- and Linux-based worker nodes, you create a custom Windows machine image, deploy a Windows workload cluster, and then add a Linux MachineDeployment to convert the Windows-only workload cluster into a multi-OS cluster.

Multi-OS clusters can host both Windows and Linux workloads, while running Linux-based TKG components on worker nodes, where they belong.

Create a Windows machine image by following all the procedures in Windows Custom Machine Images.

Create a YAML file, for example, win-osimage.yaml to add an OSImage in the management cluster that points to the template when you created a Windows machine image.

You can use the following sample YAML file. Change the spec.image.ref.template value to the location of the Windows template you created. The path is specific to your vSphere environment.

apiVersion: run.tanzu.vmware.com/v1alpha3
kind: OSImage
metadata:
 name: v1.25.7---vmware.2-tkg.1-windows
spec:
 image:
   ref:
     template: /dc0/vm/windows-2019-kube-v1.25.7+vmware.2-tkg.1
     version: v1.25.7+vmware.2-tkg.1-windows
   type: ova
 kubernetesVersion: v1.25.7+vmware.2
 os:
   arch: amd64
   name: windows
   type: windows
   version: "2019"

Run kubectl apply -f win-osimage.yaml to add the OSImage.
Add the TKR version to spec.osImages so that the TKR resolution and webhook validation occurs successfully. Use the following command to add the TKR version to spec.osImages.
```
$ kubectl edit tkr v1.25.7---vmware.2-tkg.1
apiVersion: run.tanzu.vmware.com/v1alpha3
kind: TanzuKubernetesRelease
metadata:
  name: v1.25.7---vmware.2-tkg.1
spec:
  bootstrapPackages:
  # Keep the other packages listed here.
  - name: tkg-windows.tanzu.vmware.com.0.29.0+vmware.1
  osImages:
  # Keep the other images listed here.
  - name: v1.25.7---vmware.2-tkg.1-windows
```
On the TKR, enable the tkg-windows package by adding a new item in the spec.bootstrapPackages. The package can be found in the official repository with tanzu package available list tkg-windows.tanzu.vmware.com. The following is an example of a working TKR:
Create a class-based cluster object spec by running the following command:
```
tanzu cluster create my-cluster --file my-cluster-config.yaml --dry-run > my-cluster-spec.yaml
```
Where: * WINDOWS-CLUSTER is the name of the Windows cluster. * CLUSTER-CONFIG is the name of the configuration file.

Add the new tkg-worker machine deployment class to the cluster object in my-cluster-spec.yaml. Ensure that the annotation is correct so that TKG can search for the OSImage object.

You can add the new tkg-worker specification to spec.workers.machineDeployments similar to the following example:

apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  name: WINDOWS-CLUSTER
spec:
  workers:
    machineDeployments:
    - class: tkg-worker
        metadata:
        annotations:
            run.tanzu.vmware.com/resolve-os-image: image-type=ova,os-name=photon
        name: md-0-l
        replicas: 1
    - class: tkg-worker-windows
        metadata:
        annotations:
            run.tanzu.vmware.com/resolve-os-image: image-type=ova,os-name=windows
        name: md-0
        replicas: 1

Deploy the multi-OS cluster by running the following command:

tanzu cluster create my-cluster -f my-cluster-spec.yaml

The nodes are labeled and tainted with the OS information in Well-Known Labels, Annotations and Taints.

Note
Backup and restore of multi-OS workload clusters is not supported.

Windows Antrea CNI reliability

HNS Network is not persistent on Windows. After the Windows node reboots, the HNS Network created by antrea-agent is removed and the Open vSwitch Extension is disabled by default. Therefore, after the Windows node reboots, remove the stale OVS bridge and ports. You can use the help script Clean-AntreaNetwork.ps1 to clean the OVS bridge.

Use one of the following methods to install the help script:

Manual installation
Automated installation

Manual installation

To install the help script manually on each isolated workload node:

Download the Clean-AntreaNetwork.ps1 installation script from this code sample. The downloaded installation script snippet.ps1 installs Clean-AntreaNetwork.ps1.
SSH into the node and run the following command.
```
powershell snippet.ps1
```

Automated installation

To create a custom ClusterClass that automatically installs the help script on every new workload node:

Follow the steps in Create a ClusterClass to create your custom ClusterClass object.
Apply the patch from this code sample using YTT, and apply the specification on your management cluster:
```
ytt -f custom-cluster-class.yaml -f snippet.yaml | kubectl apply -f -
```

Notes on Distributed Port Group Security

If you deploy Windows or MultiOS clusters, you must make sure that the Distributed Port Groups have certain security policies set to Reject. For example, if promiscuous mode is set to Accept, nodes can alternate between Ready and NotReady states.

In the vSphere Client, select the network that you use for the Windows nodes, go to virtual Distributed Switch > Distributed Portgroup Security Policy settings and set these policies to Reject:

Promiscuous mode
MAC address changes
Forged transmits