This topic describes ways of configuring Tanzu Kubernetes Grid (TKG) workload clusters to use features that are specific to Microsoft Azure, and that are not entirely configurable in the cluster’s flat configuration file or Kubernetes-style object spec.
For information about how to configure workload clusters on Azure using configuration files and object specs, see Azure Cluster Configuration Files.
Important: Tanzu Kubernetes Grid v2.4.x is the last version of TKG that supports the creation of TKG workload clusters on Azure. The ability to create TKG workload clusters on Azure will be removed in the Tanzu Kubernetes Grid v2.5 release.
Going forward, VMware recommends that you use Tanzu Mission Control to create native Azure AKS clusters instead of creating new TKG workload clusters on Azure. For information about how to create native Azure AKS clusters with Tanzu Mission Control, see Managing the Lifecycle of Azure AKS Clusters in the Tanzu Mission Control documentation.
For more information, see Deprecation of TKG Management and Workload Clusters on AWS and Azure in the VMware Tanzu Kubernetes Grid v2.4 Release Notes.
By default, Azure management and workload clusters are public. But you can also configure them to be private, which means their API server uses an Azure internal load balancer (ILB) and is therefore only accessible from within the cluster’s own VNet or peered VNets.
To make an Azure cluster private, include the following in its configuration file:
Set AZURE_ENABLE_PRIVATE_CLUSTER to true.
(Optional) Set AZURE_FRONTEND_PRIVATE_IP to an internal address for the cluster’s load balancer, for example 10.0.0.100.
Set AZURE_VNET_NAME, AZURE_VNET_CIDR, AZURE_CONTROL_PLANE_SUBNET_NAME, AZURE_CONTROL_PLANE_SUBNET_CIDR, AZURE_NODE_SUBNET_NAME, and AZURE_NODE_SUBNET_CIDR to the VNet and subnets that you use for other Azure private clusters.
(Optional) Set AZURE_ENABLE_CONTROL_PLANE_OUTBOUND_LB and AZURE_ENABLE_NODE_OUTBOUND_LB to true if you require the control plane and worker nodes to be able to access the internet via an Azure internet connection.
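For example, a private-cluster excerpt of a configuration file might look like the following sketch; the VNet name, subnet names, CIDR ranges, and frontend IP are placeholder values that you replace with your own:
AZURE_ENABLE_PRIVATE_CLUSTER: "true"
AZURE_FRONTEND_PRIVATE_IP: 10.0.0.100
AZURE_VNET_NAME: my-private-vnet
AZURE_VNET_CIDR: 10.0.0.0/16
AZURE_CONTROL_PLANE_SUBNET_NAME: my-private-cp-subnet
AZURE_CONTROL_PLANE_SUBNET_CIDR: 10.0.0.0/24
AZURE_NODE_SUBNET_NAME: my-private-node-subnet
AZURE_NODE_SUBNET_CIDR: 10.0.1.0/24
AZURE_ENABLE_CONTROL_PLANE_OUTBOUND_LB: "true"
AZURE_ENABLE_NODE_OUTBOUND_LB: "true"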
By default, Azure Private Clusters create a Public IP address for each Kubernetes service of type Load Balancer. To configure the load balancer service to instead use a private IP address, add the following annotation to your deployment manifest:
---
metadata:
  annotations:
    service.beta.kubernetes.io/azure-load-balancer-internal: "true"
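For example, a complete Service of type LoadBalancer that receives a private IP address might look like the following sketch, where the service name, ports, and selector are placeholder values:
apiVersion: v1
kind: Service
metadata:
  name: my-internal-service
  annotations:
    service.beta.kubernetes.io/azure-load-balancer-internal: "true"
spec:
  type: LoadBalancer
  selector:
    app: my-app
  ports:
  - port: 80
    targetPort: 8080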
For more information, see API Server Endpoint in the Cluster API Provider Azure documentation.
Tanzu Kubernetes Grid can run workload clusters on multiple target platform accounts, for example to split cloud usage among different teams or apply different security profiles to production, staging, and development workloads.
To deploy workload clusters to an alternative Azure Service Principal account, different from the one used to deploy their management cluster, do the following:
Create the alternative Azure account. You use the details of this account to create an AzureClusterIdentity in a later step. For information about creating an Azure Service Principal Account, see How to: Use the portal to create an Azure AD application and service principal that can access resources in the Azure documentation.
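As a sketch, one way to create such a Service Principal is with the Azure CLI; the display name, role, and subscription scope below are example values, and your organization may require a different role assignment:
az ad sp create-for-rbac --name "tkg-workload-sp" --role Contributor --scopes "/subscriptions/MY-SUBSCRIPTION-ID"
The command output includes the appId (client ID), password (client secret), and tenant values that you use in the steps that follow.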
Set the context of kubectl to your management cluster:
kubectl config use-context MY-MGMT-CLUSTER@MY-MGMT-CLUSTER
Where MY-MGMT-CLUSTER is the name of your management cluster.
Create a secret.yaml file with the following contents:
apiVersion: v1
kind: Secret
metadata:
  name: SECRET-NAME
type: Opaque
data:
  clientSecret: CLIENT-SECRET
Where:
SECRET-NAME is the secret name for the client password.
CLIENT-SECRET is the client secret of your Service Principal Identity. The client secret must be base64-encoded.
Use the file to create the Secret object:
kubectl apply -f secret.yaml
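The clientSecret value in secret.yaml must be base64-encoded. As a sketch, you can either encode the raw client secret yourself or let kubectl create the Secret for you; the secret value shown is a placeholder:
# Encode the raw client secret for pasting into secret.yaml
echo -n 'MY-RAW-CLIENT-SECRET' | base64
# Alternatively, create the Secret directly; kubectl base64-encodes the value for you
kubectl create secret generic SECRET-NAME --from-literal=clientSecret='MY-RAW-CLIENT-SECRET'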
Create an identity.yaml file with the following contents:
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: AzureClusterIdentity
metadata:
  name: EXAMPLE-IDENTITY
  namespace: EXAMPLE-NAMESPACE
spec:
  type: ManualServicePrincipal
  tenantID: AZURE-TENANT-ID
  clientID: CLIENT-ID
  clientSecret: {"name":"SECRET-NAME","namespace":"default"}
  allowedNamespaces:
    list:
    - CLUSTER-NAMESPACE-1
    - CLUSTER-NAMESPACE-2
Where:
EXAMPLE-IDENTITY is the name to use for the AzureClusterIdentity.
EXAMPLE-NAMESPACE is the namespace for your AzureClusterIdentity.
AZURE-TENANT-ID is your Azure tenant ID.
CLIENT-ID is the client ID (also known as an AppID) for the Azure AD application.
SECRET-NAME is the secret name for the client password.
CLUSTER-NAMESPACE-1 and CLUSTER-NAMESPACE-2 are the Kubernetes namespaces that the clusters are allowed to use identities from. You specify these namespaces as an array.
Use the file to create the AzureClusterIdentity object:
kubectl apply -f identity.yaml
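To confirm that the identity exists, you can list AzureClusterIdentity objects in the namespace that you used, for example:
kubectl get azureclusteridentity -n EXAMPLE-NAMESPACE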
The management cluster can now deploy workload clusters to the alternative account by using the new AzureClusterIdentity object.
To create workload clusters that use the alternative Azure account, include the following variables in the cluster configuration file:
AZURE_IDENTITY_NAME: EXAMPLE-IDENTITY
AZURE_IDENTITY_NAMESPACE: EXAMPLE-NAMESPACE
Where:
EXAMPLE-IDENTITY is the name of your AzureClusterIdentity.
EXAMPLE-NAMESPACE is the namespace for your AzureClusterIdentity.
After you create the workload cluster, sign in to the Azure Portal using the alternative account, and you should see the cluster running.
There are two ways of deploying NVIDIA GPU-enabled workload clusters on Azure:
Deploy a workload cluster and configure it manually to use GPU-enabled VMs.
Install a ClusterResourceSet (CRS) on the management cluster to create one or more GPU-enabled workload clusters automatically.
The subsections below explain these two approaches, and how to test the GPU-enabled clusters.
To deploy a workload cluster and configure it manually to take advantage of NVIDIA GPU VMs available on Azure:
In the configuration file for the cluster, set AZURE_NODE_MACHINE_TYPE for worker nodes to a GPU-compatible VM type, such as Standard_NC4as_T4_v3.
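For example, the relevant line in the configuration file might read:
AZURE_NODE_MACHINE_TYPE: Standard_NC4as_T4_v3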
Deploy the cluster with the cluster configuration file:
tanzu cluster create MY-GPU-CLUSTER -f MY-GPU-CONFIG
Where MY-GPU-CLUSTER is a name that you give to the cluster and MY-GPU-CONFIG is the cluster configuration file.
Install a GPU cluster policy and GPU operator on the cluster:
Set the kubectl context to the cluster, if it is not already the current context.
Download the required NVIDIA GPU resources from the Cluster API Provider Azure repository, and save them to your current directory:
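For example, assuming the cluster policy and GPU operator files are named clusterpolicy-crd.yaml and gpu-operator-components.yaml as in the steps that follow, the download might look like the following sketch, where the URL is a placeholder for the actual raw-file path in the Cluster API Provider Azure repository:
wget https://CAPZ-REPO-RAW-URL/clusterpolicy-crd.yaml
wget https://CAPZ-REPO-RAW-URL/gpu-operator-components.yaml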
Apply the cluster policy:
kubectl apply -f clusterpolicy-crd.yaml
Apply the GPU operator:
kubectl apply -f gpu-operator-components.yaml
Run kubectl get pods -A. You should see listings for gpu-operator- pods in the default namespace, and nvidia- pods in the gpu-operator-resources namespace.
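As a quick check, you can filter the pod listing for the GPU-related pods, for example:
kubectl get pods -A | grep -E 'gpu-operator|nvidia'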
Note: This feature is in the unsupported Technical Preview state; see TKG Feature States.
You can configure the management cluster to create GPU-enabled workload clusters automatically whenever you add gpu: nvidia to the labels in the cluster manifest. To do this, you install a ClusterResourceSet (CRS) and activate it as follows:
To configure the management cluster to create GPU clusters:
Search the Broadcom Communities for GPU CRS for TKG and download the gpu-crs.yaml file for Tanzu Kubernetes Grid v1.4.
Set the context of kubectl to the context of your management cluster:
kubectl config use-context my-management-cluster-admin@my-management-cluster
Apply the CRS file to the management cluster, using the --server-side option to handle the large size of ConfigMap data:
kubectl apply -f gpu-crs.yaml --server-side
To create a GPU workload cluster:
In the configuration file for the cluster, set AZURE_NODE_MACHINE_TYPE for worker nodes to a GPU-compatible VM type, such as Standard_NC4as_T4_v3.
Use tanzu cluster create with the --dry-run option to generate a deployment manifest from the cluster configuration file:
tanzu cluster create MY-GPU-CLUSTER -f MY-GPU-CONFIG --dry-run > MY-GPU-CLUSTER-MANIFEST
Where MY-GPU-CLUSTER is a name that you give to the cluster and MY-GPU-CONFIG is the cluster configuration file.
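If the generated manifest does not already include it, add the gpu: nvidia label described above to the Cluster object so that the CRS is applied. A sketch of the relevant portion of the manifest, with a placeholder cluster name:
apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  name: my-gpu-cluster
  labels:
    gpu: nvidia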
Create the cluster by passing the manifest to kubectl apply:
kubectl apply -f MY-GPU-CLUSTER-MANIFEST
Run kubectl get pods -A. You should see listings for gpu-operator- pods in the default namespace, and nvidia- pods in the gpu-operator-resources namespace.
To test a GPU-enabled cluster:
Test GPU processing by running the CUDA VectorAdd vector addition test in the NVIDIA documentation; a sample pod spec for this test is sketched after this procedure.
Test the GPU operator:
Scale up the workload cluster’s worker node count:
tanzu cluster scale MY-GPU-CLUSTER -w 2
Run kubectl get pods -A again. You should see additional gpu-operator- and nvidia- pods listed for the added nodes.
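For the CUDA VectorAdd test referenced above, a minimal pod spec commonly used for this purpose looks like the following sketch; the image reference may differ from the one in the current NVIDIA documentation:
apiVersion: v1
kind: Pod
metadata:
  name: cuda-vector-add
spec:
  restartPolicy: OnFailure
  containers:
  - name: cuda-vector-add
    image: "k8s.gcr.io/cuda-vector-add:v0.1"
    resources:
      limits:
        nvidia.com/gpu: 1
If the GPU and operator are working, the pod completes and its logs report that the test passed.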