When you deploy Tanzu Kubernetes (workload) clusters to Microsoft Azure, you must specify options in the cluster configuration file to connect to your Azure account and identify the resources that the cluster will use.

For the basic process of deploying workload clusters, see Deploy a Workload Cluster: Basic Process.

For the full list of options that you must specify when deploying workload clusters to Azure, see the Tanzu CLI Configuration File Variable Reference.

Tanzu Kubernetes Cluster Template

The template below includes all of the options that are relevant to deploying Tanzu Kubernetes clusters on Azure. You can copy this template and update it to deploy workload clusters to Azure.

Mandatory options are uncommented. Optional settings are commented out. Default values are included where applicable.

The way in which you configure the variables for Tanzu Kubernetes clusters that are specific to Azure is identical for both management clusters and workload clusters. For information about how to configure the variables, see Create a Management Cluster Configuration File and Management Cluster Configuration for Azure.

#! ---------------------------------------------------------------------
#! Cluster creation basic configuration
#! ---------------------------------------------------------------------

# CLUSTER_NAME:
CLUSTER_PLAN: dev
NAMESPACE: default
CNI: antrea
IDENTITY_MANAGEMENT_TYPE: oidc

#! ---------------------------------------------------------------------
#! Node configuration
#! ---------------------------------------------------------------------

# SIZE:
# CONTROLPLANE_SIZE:
# WORKER_SIZE:
# AZURE_CONTROL_PLANE_MACHINE_TYPE: "Standard_D2s_v3"
# AZURE_NODE_MACHINE_TYPE: "Standard_D2s_v3"
# CONTROL_PLANE_MACHINE_COUNT: 1
# WORKER_MACHINE_COUNT: 1
# WORKER_MACHINE_COUNT_0:
# WORKER_MACHINE_COUNT_1:
# WORKER_MACHINE_COUNT_2:
# AZURE_CONTROL_PLANE_DATA_DISK_SIZE_GIB: ""
# AZURE_CONTROL_PLANE_OS_DISK_SIZE_GIB: ""
# AZURE_CONTROL_PLANE_OS_DISK_STORAGE_ACCOUNT_TYPE: ""
# AZURE_ENABLE_NODE_DATA_DISK: ""
# AZURE_NODE_DATA_DISK_SIZE_GIB: ""
# AZURE_NODE_OS_DISK_SIZE_GIB: ""
# AZURE_NODE_OS_DISK_STORAGE_ACCOUNT_TYPE: ""

#! ---------------------------------------------------------------------
#! Azure Configuration
#! ---------------------------------------------------------------------

AZURE_ENVIRONMENT: "AzurePublicCloud"
AZURE_TENANT_ID:
AZURE_SUBSCRIPTION_ID:
AZURE_CLIENT_ID:
AZURE_CLIENT_SECRET:
AZURE_LOCATION:
AZURE_SSH_PUBLIC_KEY_B64:
# AZURE_CONTROL_PLANE_SUBNET_NAME: ""
# AZURE_CONTROL_PLANE_SUBNET_CIDR: ""
# AZURE_NODE_SUBNET_NAME: ""
# AZURE_NODE_SUBNET_CIDR: ""
# AZURE_RESOURCE_GROUP: ""
# AZURE_VNET_RESOURCE_GROUP: ""
# AZURE_VNET_NAME: ""
# AZURE_VNET_CIDR: ""
# AZURE_CUSTOM_TAGS: ""
# AZURE_ENABLE_PRIVATE_CLUSTER: ""
# AZURE_FRONTEND_PRIVATE_IP: ""
# AZURE_ENABLE_ACCELERATED_NETWORKING: ""

#! ---------------------------------------------------------------------
#! Machine Health Check configuration
#! ---------------------------------------------------------------------

ENABLE_MHC:
ENABLE_MHC_CONTROL_PLANE: true
ENABLE_MHC_WORKER_NODE: true
MHC_UNKNOWN_STATUS_TIMEOUT: 5m
MHC_FALSE_STATUS_TIMEOUT: 12m

#! ---------------------------------------------------------------------
#! Common configuration
#! ---------------------------------------------------------------------

# TKG_CUSTOM_IMAGE_REPOSITORY: ""
# TKG_CUSTOM_IMAGE_REPOSITORY_CA_CERTIFICATE: ""

# TKG_HTTP_PROXY: ""
# TKG_HTTPS_PROXY: ""
# TKG_NO_PROXY: ""

ENABLE_AUDIT_LOGGING: true
ENABLE_DEFAULT_STORAGE_CLASS: true

CLUSTER_CIDR: 100.96.0.0/11
SERVICE_CIDR: 100.64.0.0/13

# OS_NAME: ""
# OS_VERSION: ""
# OS_ARCH: ""

#! ---------------------------------------------------------------------
#! Autoscaler configuration
#! ---------------------------------------------------------------------

ENABLE_AUTOSCALER: false
# AUTOSCALER_MAX_NODES_TOTAL: "0"
# AUTOSCALER_SCALE_DOWN_DELAY_AFTER_ADD: "10m"
# AUTOSCALER_SCALE_DOWN_DELAY_AFTER_DELETE: "10s"
# AUTOSCALER_SCALE_DOWN_DELAY_AFTER_FAILURE: "3m"
# AUTOSCALER_SCALE_DOWN_UNNEEDED_TIME: "10m"
# AUTOSCALER_MAX_NODE_PROVISION_TIME: "15m"
# AUTOSCALER_MIN_SIZE_0:
# AUTOSCALER_MAX_SIZE_0:
# AUTOSCALER_MIN_SIZE_1:
# AUTOSCALER_MAX_SIZE_1:
# AUTOSCALER_MIN_SIZE_2:
# AUTOSCALER_MAX_SIZE_2:

#! ---------------------------------------------------------------------
#! Antrea CNI configuration
#! ---------------------------------------------------------------------

# ANTREA_NO_SNAT: false
# ANTREA_TRAFFIC_ENCAP_MODE: "encap"
# ANTREA_PROXY: false
# ANTREA_POLICY: true
# ANTREA_TRACEFLOW: false

Create a Network Security Group for Each Cluster

Each workload cluster on Azure requires a Network Security Group (NSG) for its worker nodes named CLUSTER-NAME-node-nsg, where CLUSTER-NAME is the name of the cluster.

For more information, see Network Security Groups on Azure.
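As a sketch of how the required NSG name is formed, and of creating it with the Azure CLI, the snippet below derives the expected name; the cluster name, resource group, and location are hypothetical placeholders, so adjust them to your environment:

```shell
# Tanzu Kubernetes Grid expects the worker-node NSG to be named
# CLUSTER-NAME-node-nsg. Derive that name for a hypothetical cluster:
CLUSTER_NAME="my-workload-cluster"   # hypothetical cluster name
NSG_NAME="${CLUSTER_NAME}-node-nsg"
echo "${NSG_NAME}"

# A sketch of creating the NSG with the Azure CLI; the resource group
# and location below are assumptions:
# az network nsg create --resource-group my-cluster-rg --location eastus --name "${NSG_NAME}"
```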

Azure Private Clusters

By default, Azure management and workload clusters are public. However, you can also configure them to be private, which means that their API server uses an Azure internal load balancer (ILB) and is therefore accessible only from within the cluster's own VNET or peered VNETs.

To make an Azure cluster private, include the following in its configuration file:

  • Set AZURE_ENABLE_PRIVATE_CLUSTER to true.

  • (Optional) Set AZURE_FRONTEND_PRIVATE_IP to an internal address for the cluster's load balancer.

    • This address must be within the range of its control plane subnet and must not be used by another component.
    • If not set, this address defaults to 10.0.0.100.
  • Set AZURE_VNET_NAME, AZURE_VNET_CIDR, AZURE_CONTROL_PLANE_SUBNET_NAME, AZURE_CONTROL_PLANE_SUBNET_CIDR, AZURE_NODE_SUBNET_NAME, and AZURE_NODE_SUBNET_CIDR to the VNET and subnets that you use for other Azure private clusters.

    • Because Azure private clusters are not accessible outside their VNET, the management cluster and any workload and shared services clusters that it manages must be in the same private VNET.
    • The bootstrap machine, where you run the Tanzu CLI to create and use the private clusters, must also be in the same private VNET.
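Put together, the private-cluster settings in a cluster configuration file might look like the following; all names and CIDR ranges here are illustrative placeholders:

```yaml
AZURE_ENABLE_PRIVATE_CLUSTER: "true"
AZURE_FRONTEND_PRIVATE_IP: "10.0.0.100"
AZURE_VNET_NAME: "my-private-vnet"
AZURE_VNET_CIDR: "10.0.0.0/16"
AZURE_CONTROL_PLANE_SUBNET_NAME: "my-control-plane-subnet"
AZURE_CONTROL_PLANE_SUBNET_CIDR: "10.0.0.0/24"
AZURE_NODE_SUBNET_NAME: "my-node-subnet"
AZURE_NODE_SUBNET_CIDR: "10.0.1.0/24"
```

Note that the frontend IP, 10.0.0.100, falls within the control plane subnet range, as required.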

For more information, see API Server Endpoint in the Cluster API Provider Azure documentation.

Clusters on Different Azure Accounts

Tanzu Kubernetes Grid can run workload clusters on multiple infrastructure provider accounts, for example to split cloud usage among different teams or apply different security profiles to production, staging, and development workloads.

To deploy workload clusters to an alternative Azure Service Principal account, different from the one used to deploy their management cluster, do the following:

  1. Create the alternative Azure account. You use the details of this account to create an AzureClusterIdentity in a later step. For information about creating an Azure Service Principal Account, see How to: Use the portal to create an Azure AD application and service principal that can access resources in the Azure documentation.

  2. Set the context of kubectl to your management cluster:

    kubectl config use-context MY-MGMT-CLUSTER-admin@MY-MGMT-CLUSTER
    

    Where MY-MGMT-CLUSTER is the name of your management cluster.

  3. Create a secret.yaml file with the following contents:

    apiVersion: v1
    kind: Secret
    metadata:
      name: SECRET-NAME
    type: Opaque
    data:
      clientSecret: CLIENT-SECRET
    

    Where:

    • SECRET-NAME is the secret name for the client password.
    • CLIENT-SECRET is the client secret of your Service Principal Identity. The client secret must be base64-encoded.
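A quick way to produce the base64-encoded value for the clientSecret field; the secret string below is a hypothetical placeholder for your real client secret:

```shell
# Base64-encode a service principal client secret for use in secret.yaml.
# "s3cr3t" is a hypothetical placeholder; substitute your real client secret.
# -n prevents echo from appending a trailing newline to the encoded value.
echo -n 's3cr3t' | base64
```

Paste the resulting string into the clientSecret field under data in secret.yaml.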
  4. Use the file to create the Secret object:

    kubectl apply -f secret.yaml
    
  5. Create an identity.yaml file with the following contents:

    apiVersion: infrastructure.cluster.x-k8s.io/v1alpha4
    kind: AzureClusterIdentity
    metadata:
      name: EXAMPLE-IDENTITY
      namespace: EXAMPLE-NAMESPACE
    spec:
      type: ServicePrincipal
      tenantID: AZURE-TENANT-ID
      clientID: CLIENT-ID
      clientSecret: {"name":"SECRET-NAME","namespace":"default"}
      allowedNamespaces:
        - CLUSTER-NAMESPACE-1
        - CLUSTER-NAMESPACE-2
    

    Where:

    • EXAMPLE-IDENTITY is the name to use for the AzureClusterIdentity.
    • EXAMPLE-NAMESPACE is the namespace for your AzureClusterIdentity.
    • AZURE-TENANT-ID is your Azure tenant ID.
    • CLIENT-ID is the client ID of your Service Principal Identity.
    • SECRET-NAME is the name of the Secret object that holds the client password.
    • CLUSTER-NAMESPACE-1 and CLUSTER-NAMESPACE-2 are the Kubernetes namespaces from which clusters are allowed to use this identity. You can list any number of namespaces in this array.
  6. Use the file to create the AzureClusterIdentity object:

    kubectl apply -f identity.yaml
    

The management cluster can now deploy workload clusters to the alternative account by using the new AzureClusterIdentity object.

To create workload clusters that use the alternative Azure account, set the following variables in the cluster configuration file:

AZURE_IDENTITY_NAME: EXAMPLE-IDENTITY
AZURE_IDENTITY_NAMESPACE: EXAMPLE-NAMESPACE

Where:

  • EXAMPLE-IDENTITY is the name to use for the AzureClusterIdentity.
  • EXAMPLE-NAMESPACE is the namespace for your AzureClusterIdentity.

After you create the workload cluster, sign in to the Azure Portal using the alternative account, and you should see the cluster running.

Deploy GPU-Enabled Clusters

There are two ways of deploying NVIDIA GPU-enabled workload clusters on Azure:

  • Create a workload cluster with GPU workers, and manually install a GPU policy and operator onto the cluster
  • (Experimental) Configure the management cluster with a ClusterResourceSet (CRS) to create one or more GPU-enabled workload clusters automatically

The subsections below explain these two approaches, and how to test the GPU-enabled clusters.

Deploy and GPU-Enable a Single Cluster

To deploy a workload cluster and configure it manually to take advantage of NVIDIA GPU VMs available on Azure:

  1. In the configuration file for the cluster, set AZURE_NODE_MACHINE_TYPE, for worker nodes, to a GPU-compatible VM type, such as Standard_NC4as_T4_v3.

  2. Deploy the cluster with the cluster configuration file:

    tanzu cluster create MY-GPU-CLUSTER -f MY-GPU-CONFIG
    

    Where MY-GPU-CLUSTER is a name that you give to the cluster.

  3. Install a GPU cluster policy and GPU operator on the cluster:

    1. Set the kubectl context to the cluster, if it is not already the current context.

    2. Download the required NVIDIA GPU resources, clusterpolicy-crd.yaml and gpu-operator-components.yaml, from the Cluster API Provider Azure repository, and save them to your current directory.

    3. Apply the cluster policy:

      kubectl apply -f clusterpolicy-crd.yaml
      
    4. Apply the GPU operator:

      kubectl apply -f gpu-operator-components.yaml
      
  4. Run kubectl get pods -A. You should see listings for gpu-operator- pods in the default namespace, and nvidia- pods in the gpu-operator-resources namespace.

Configure the Management Cluster for GPU Cluster Deploys (Experimental)

You can configure the management cluster to create GPU-enabled workload clusters automatically whenever you add gpu: nvidia to the labels in the cluster manifest. To do this, you install a ClusterResourceSet (CRS) and activate it as follows:

  1. To configure the management cluster to create GPU clusters:

    1. Search the VMware {code} Sample Exchange for GPU CRS for TKG and download the gpu-crs.yaml file for Tanzu Kubernetes Grid v1.4.

    2. Set the context of kubectl to the context of your management cluster:

      kubectl config use-context my-management-cluster-admin@my-management-cluster
      
    3. Apply the CRS file to the management cluster, using the --server-side option to handle the large size of the ConfigMap data:

      kubectl apply -f gpu-crs.yaml --server-side
      
  2. To create a GPU workload cluster:

    1. In the configuration file for the cluster, set AZURE_NODE_MACHINE_TYPE, for worker nodes, to a GPU-compatible VM type, such as Standard_NC4as_T4_v3.

    2. Use tanzu cluster create with the --dry-run option to generate a deployment manifest from the cluster configuration file:

      tanzu cluster create MY-GPU-CLUSTER -f MY-GPU-CONFIG --dry-run > MY-GPU-CLUSTER-MANIFEST
      

      Where MY-GPU-CLUSTER is a name that you give to the cluster.

    3. Create the cluster by passing it to kubectl apply:

      kubectl apply -f MY-GPU-CLUSTER-MANIFEST
      
    4. Run kubectl get pods -A. You should see listings for gpu-operator- pods in the default namespace, and nvidia- pods in the gpu-operator-resources namespace.

Test GPU-Enabled Clusters

To test a GPU-enabled cluster:

  1. Test GPU processing by running the CUDA VectorAdd vector addition test in the NVIDIA documentation.

  2. Test the GPU operator:

    1. Scale up the workload cluster's worker node count:

      tanzu cluster scale MY-GPU-CLUSTER -w 2
      
    2. Run kubectl get pods -A again. You should see additional gpu-operator- and nvidia- pods listed for the added nodes.
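The CUDA VectorAdd test mentioned in step 1 is typically run as a pod that requests one GPU. A minimal sketch of such a pod manifest follows; the sample image name and tag are assumptions based on the NVIDIA documentation, so check that documentation for the current image:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: cuda-vectoradd
spec:
  restartPolicy: OnFailure
  containers:
  - name: cuda-vectoradd
    image: "nvidia/samples:vectoradd-cuda11.2.1"   # assumed sample image; verify in the NVIDIA docs
    resources:
      limits:
        nvidia.com/gpu: 1
```

Apply the manifest with kubectl apply, then check the pod's logs for the test result.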

What to Do Next

Advanced options that are applicable to all infrastructure providers are described in separate topics.

After you have deployed your cluster, see Manage Clusters.
