Deploy Tanzu Kubernetes Clusters to Multiple Availability Zones on vSphere (Experimental)

To enable multiple availability zones for Tanzu Kubernetes clusters on vSphere, two new custom resource definitions (CRDs) have been introduced in Cluster API Provider vSphere (CAPV).

  • The VSphereFailureDomain CRD captures the region- and zone-specific tagging information and the topology definition, which includes the vSphere datacenter, cluster, host, and datastore information.
  • The VSphereDeploymentZone CRD captures the association of a VSphereFailureDomain with placement constraint information for Kubernetes nodes.

IMPORTANT: Support for multiple availability zones on vSphere is an experimental feature in Tanzu Kubernetes Grid 1.4.0, intended for testing and proof-of-concept purposes. VMware welcomes your feedback on this experimental feature, but its use is not supported in production environments in this release.

The configurations in this topic spread the Kubernetes control plane and worker nodes across vSphere objects: either across multiple compute clusters within a datacenter, or across multiple hosts within a single compute cluster.

Spread Nodes Across Multiple Compute Clusters in a Datacenter

The example in this section shows how to achieve multiple availability zones by spreading nodes across multiple compute clusters.

  1. Create the custom resources for defining the region and zones.

    To spread the Kubernetes nodes for a Tanzu Kubernetes cluster across multiple compute clusters within a datacenter, you must create custom resources. This example defines three deployment zones, named us-west-1a, us-west-1b, and us-west-1c, each of which is a compute cluster with its own network and storage parameters.

    ---
    apiVersion: infrastructure.cluster.x-k8s.io/v1alpha3
    kind: VSphereFailureDomain
    metadata:
     name: us-west-1a
    spec:
     region:
       name: us-west-1
       type: Datacenter
       tagCategory: k8s-region
     zone:
       name: us-west-1a
       type: ComputeCluster
       tagCategory: k8s-zone
     topology:
       datacenter: dc0
       computeCluster: cluster1
       datastore: ds-c1
       networks:
       - net1
       - net2
    ---
    apiVersion: infrastructure.cluster.x-k8s.io/v1alpha3
    kind: VSphereFailureDomain
    metadata:
     name: us-west-1b
    spec:
     region:
       name: us-west-1
       type: Datacenter
       tagCategory: k8s-region
     zone:
       name: us-west-1b
       type: ComputeCluster
       tagCategory: k8s-zone
     topology:
       datacenter: dc0
       computeCluster: cluster2
       datastore: ds-c2
       networks:
       - net3
    ---
    apiVersion: infrastructure.cluster.x-k8s.io/v1alpha3
    kind: VSphereFailureDomain
    metadata:
     name: us-west-1c
    spec:
     region:
       name: us-west-1
       type: Datacenter
       tagCategory: k8s-region
     zone:
       name: us-west-1c
       type: ComputeCluster
       tagCategory: k8s-zone
     topology:
       datacenter: dc0
       computeCluster: cluster3
       datastore: ds-c3
       networks:
       - net4
    ---
    apiVersion: infrastructure.cluster.x-k8s.io/v1alpha3
    kind: VSphereDeploymentZone
    metadata:
     name: us-west-1a
    spec:
     server: vcenter.sddc-54-70-161-229.com
     failureDomain: us-west-1a
     placementConstraint:
       resourcePool: pool1
       folder: foo
    ---
    apiVersion: infrastructure.cluster.x-k8s.io/v1alpha3
    kind: VSphereDeploymentZone
    metadata:
     name: us-west-1b
    spec:
     server: vcenter.sddc-54-70-161-229.com
     failureDomain: us-west-1b
     placementConstraint:
       resourcePool: pool2
       folder: bar
    ---
    apiVersion: infrastructure.cluster.x-k8s.io/v1alpha3
    kind: VSphereDeploymentZone
    metadata:
     name: us-west-1c
    spec:
     server: vcenter.sddc-54-70-161-229.com
     failureDomain: us-west-1c
     placementConstraint:
       resourcePool: pool3
       folder: baz 
    

    If different compute clusters have identically-named resource pools, set the VSphereDeploymentZone objects’ spec.placementConstraint.resourcePool to a full resource path, not just the name.
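
    For example, a fully qualified resource pool path uses the vSphere inventory path format. The following sketch reuses the dc0, cluster1, and pool1 names from the example above and assumes that pool1 sits directly under the cluster's root Resources pool:

     placementConstraint:
       resourcePool: /dc0/host/cluster1/Resources/pool1
       folder: foo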

  2. Tag the vSphere objects.

    From the first VSphereFailureDomain CR, named us-west-1a, use govc to apply the following tags to the datacenter dc0 and the compute cluster cluster1.

    $ govc tags.attach -c k8s-region us-west-1 /dc0
    
    $ govc tags.attach -c k8s-zone us-west-1a /dc0/host/cluster1
    

    Similarly, perform the following tagging operations for the other compute clusters.

    $ govc tags.attach -c k8s-zone us-west-1b /dc0/host/cluster2
    
    $ govc tags.attach -c k8s-zone us-west-1c /dc0/host/cluster3
    

    You can skip this step if spec.region.autoConfigure and spec.zone.autoConfigure are set to true when creating the VSphereFailureDomain CRs.
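
    If you tag the objects manually and the k8s-region and k8s-zone tag categories do not already exist in vCenter, create the categories and tags before running the tags.attach commands above. The commands below are a sketch based on the example values in this section; they assume that govc is already configured with your vCenter credentials, and the tags.create command must be repeated for each remaining zone:

    $ govc tags.category.create -t Datacenter k8s-region
    
    $ govc tags.category.create -t ClusterComputeResource k8s-zone
    
    $ govc tags.create -c k8s-region us-west-1
    
    $ govc tags.create -c k8s-zone us-west-1a
    

    You can confirm where a tag is attached by listing its objects, for example with govc tags.attached.ls us-west-1a.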

For the next steps to deploy the cluster, see Deploy a Workload Cluster with Nodes Spread Across Availability Zones.

Spread Nodes Across Multiple Hosts in a Single Compute Cluster

The example in this section spreads workload cluster nodes across three host groups in a single compute cluster.

  1. In vCenter Server, create Host groups, for example rack1, and VM groups, for example rack1-vm-group, for each failure domain.

    • Create Host and VM groups from Configure > VM/Host Groups > Add…
    • The number of host groups should match the number of availability zones you plan to use.
    • To create a VM group, you may have to create a dummy VM to add as a group member.
    • Alternatively, you can use govc to create host and VM groups by running commands similar to the following, without having to create a dummy VM:
      govc cluster.group.create -cluster=RegionA01-MGMT -name=rack1 -host esx-01a.corp.tanzu esx-02a.corp.tanzu
      
      govc cluster.group.create -cluster=RegionA01-MGMT -name=rack1-vm-group -vm
      
  2. Add affinity rules between the VM groups and Host groups that you created, so that the VMs in each VM group must run on the hosts in the corresponding Host group.

    • Set Type to Virtual Machines to Hosts and include the rule Must run on hosts in group.
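    • Alternatively, you can create the affinity rule with govc instead of the vSphere Client. The command below is a sketch that reuses the RegionA01-MGMT cluster and rack1 group names from the previous step; the rule name rack1-affinity-rule is an arbitrary placeholder:
      govc cluster.rule.create -cluster=RegionA01-MGMT -name=rack1-affinity-rule -enable -mandatory -vm-host -vm-group rack1-vm-group -host-affine-group rack1
      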
  3. Create the VSphereFailureDomain and VSphereDeploymentZone custom resource (CR) definitions in a file vsphere-zones.yaml.

    • Everything under spec.region, spec.zone, and spec.topology must match what you have configured in vCenter.
    • For VSphereDeploymentZone objects, the spec.failureDomain value must match one of the metadata.name values of the VSphereFailureDomain definitions.
    • The spec.server value in the VSphereDeploymentZone objects must match the vCenter server address (IP or FQDN) entered for VCENTER SERVER in the installer interface IaaS Provider pane or the VSPHERE_SERVER setting in the management cluster configuration file.
    • metadata.name values must be all lowercase.

    For example, the following vsphere-zones.yaml file defines three zones within a region room1, where each zone is a rack of hosts within the same compute cluster.

    apiVersion: infrastructure.cluster.x-k8s.io/v1alpha3
    kind: VSphereFailureDomain
    metadata:
     name: rack1
    spec:
     region:
       name: room1
       type: ComputeCluster
       tagCategory: k8s-region
     zone:
       name: rack1
       type: HostGroup
       tagCategory: k8s-zone
     topology:
       datacenter: dc0
       computeCluster: cluster1
       hosts:
         vmGroupName: rack1-vm-group
         hostGroupName: rack1
       datastore: ds-r1
       networks:
       - net1
    ---
    apiVersion: infrastructure.cluster.x-k8s.io/v1alpha3
    kind: VSphereFailureDomain
    metadata:
     name: rack2
    spec:
     region:
       name: room1
       type: ComputeCluster
       tagCategory: k8s-region
     zone:
       name: rack2
       type: HostGroup
       tagCategory: k8s-zone
     topology:
       datacenter: dc0
       computeCluster: cluster1
       hosts:
         vmGroupName: rack2-vm-group
         hostGroupName: rack2
       datastore: ds-r2
       networks:
       - net2
    ---
    apiVersion: infrastructure.cluster.x-k8s.io/v1alpha3
    kind: VSphereFailureDomain
    metadata:
     name: rack3
    spec:
     region:
       name: room1
       type: ComputeCluster
       tagCategory: k8s-region
     zone:
       name: rack3
       type: HostGroup
       tagCategory: k8s-zone
     topology:
       datacenter: dc0
       computeCluster: cluster1
       hosts:
         vmGroupName: rack3-vm-group
         hostGroupName: rack3
       datastore: ds-c3
       networks:
       - net3
    ---
    apiVersion: infrastructure.cluster.x-k8s.io/v1alpha3
    kind: VSphereDeploymentZone
    metadata:
     name: rack1
    spec:
     server: vcenter.sddc-54-70-161-229.com
     failureDomain: rack1
     placementConstraint:
       resourcePool: pool1
       folder: foo
    ---
    apiVersion: infrastructure.cluster.x-k8s.io/v1alpha3
    kind: VSphereDeploymentZone
    metadata:
     name: rack2
    spec:
     server: vcenter.sddc-54-70-161-229.com
     failureDomain: rack2
     placementConstraint:
       resourcePool: pool2
       folder: bar
    ---
    apiVersion: infrastructure.cluster.x-k8s.io/v1alpha3
    kind: VSphereDeploymentZone
    metadata:
     name: rack3
    spec:
     server: vcenter.sddc-54-70-161-229.com
     failureDomain: rack3
     placementConstraint:
       resourcePool: pool3
       folder: baz
    
  4. Apply the CR definitions file to create the VSphereFailureDomain and VSphereDeploymentZone objects:

    kubectl apply -f vsphere-zones.yaml
    
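    Optionally, verify that the objects exist. This assumes that your kubectl context points to the cluster where you applied the file, typically the management cluster:

     kubectl get vspherefailuredomains,vspheredeploymentzones
     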
  5. Use govc to create tag categories and tags for regions and zones, to apply to the compute clusters and hosts listed in your VSphereFailureDomain CRs.

    • Create a region tag category and tags for compute clusters, for example:

      govc tags.category.create -t ClusterComputeResource k8s-region
      
      govc tags.create -c k8s-region room1
      

      Repeat for all regions:

      govc tags.create -c k8s-region REGION
      
    • Create a zone tag category and tags for hosts, for example:

      govc tags.category.create -t HostSystem k8s-zone
      
      govc tags.create -c k8s-zone rack1
      

      Repeat for all zones:

      govc tags.create -c k8s-zone ZONE
      

    Alternatively, you can perform the tag operations in this and the following steps from the Tags & Custom Attributes pane in vCenter.
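
    You can list the categories and tags that you created with govc, for example:

      govc tags.category.ls
      
      govc tags.ls -c k8s-zone
      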

  6. Attach the region tags to all the compute clusters listed in the CR definitions, for example:

    $ govc tags.attach -c k8s-region room1 /dc1/host/room1-mgmt
    

    Use the full path for each compute cluster.

  7. Attach the zone tags to all the host objects listed in the CR definitions, for example:

    $ govc tags.attach -c k8s-zone rack1 /dc1/host/room1-mgmt/esx-01a.corp.tanzu
    

    Use the full path for each host.
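
    To verify the attachments, list the objects that each tag is attached to, for example:

    $ govc tags.attached.ls room1
    
    $ govc tags.attached.ls rack1
    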

For the next steps to deploy the cluster, see Deploy a Workload Cluster with Nodes Spread Across Availability Zones.

Deploy a Workload Cluster with Nodes Spread Across Availability Zones

After you have performed the steps in Spread Nodes Across Multiple Compute Clusters in a Datacenter or Spread Nodes Across Multiple Hosts in a Single Compute Cluster, you can deploy a workload cluster with its nodes spread across multiple availability zones.

  1. To spread the worker nodes across the zones, use an overlay that adds machine deployment definitions for the additional zones and sets the name of a failure domain in each MachineDeployment spec.

    To do this, edit the ~/.config/tanzu/tkg/providers/infrastructure-vsphere/ytt/vsphere-overlay.yaml file to include the following:

    #! Please add any overlays specific to vSphere provider under this file.
    #@ load("@ytt:overlay", "overlay")
    #@ load("@ytt:data", "data")
    
    #@overlay/match by=overlay.subset({"kind":"MachineDeployment", "metadata":{"name": "{}-md-0".format(data.values.CLUSTER_NAME)}})
    ---
    spec:
     template:
       spec:
         #@overlay/match missing_ok=True
         failureDomain: FAILURE-DOMAIN-1
    
    ---
    apiVersion: infrastructure.cluster.x-k8s.io/v1alpha3
    kind: VSphereMachineTemplate
    metadata:
     name: #@ "{}-worker-1".format(data.values.CLUSTER_NAME)
    spec:
     template:
       spec:
         cloneMode:  #@ data.values.VSPHERE_CLONE_MODE
         datacenter: #@ data.values.VSPHERE_DATACENTER
         datastore: #@ data.values.VSPHERE_DATASTORE
         storagePolicyName: #@ data.values.VSPHERE_STORAGE_POLICY_ID
         diskGiB: #@ data.values.VSPHERE_WORKER_DISK_GIB
         folder: #@ data.values.VSPHERE_FOLDER
         memoryMiB: #@ data.values.VSPHERE_WORKER_MEM_MIB
         network:
           devices:
           #@ if data.values.TKG_IP_FAMILY == "ipv6":
           #@overlay/match by=overlay.index(0)
           #@overlay/replace
           - dhcp6: true
             networkName: #@ data.values.VSPHERE_NETWORK
           #@ else:
           #@overlay/match by=overlay.index(0)
           #@overlay/replace
           - dhcp4: true
             networkName: #@ data.values.VSPHERE_NETWORK
           #@ end
         numCPUs: #@ data.values.VSPHERE_WORKER_NUM_CPUS
         resourcePool: #@ data.values.VSPHERE_RESOURCE_POOL
         server: #@ data.values.VSPHERE_SERVER
         template: #@ data.values.VSPHERE_TEMPLATE
    
    ---
    apiVersion: bootstrap.cluster.x-k8s.io/v1alpha3
    kind: KubeadmConfigTemplate
    metadata:
     name: #@ "{}-md-1".format(data.values.CLUSTER_NAME)
     namespace: '${ NAMESPACE }'
    spec:
     template:
       spec:
         useExperimentalRetryJoin: true
         joinConfiguration:
           nodeRegistration:
             criSocket: /var/run/containerd/containerd.sock
             kubeletExtraArgs:
               cloud-provider: external
               tls-cipher-suites: TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305,TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305,TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384
             name: '{{ ds.meta_data.hostname }}'
         preKubeadmCommands:
         - hostname "{{ ds.meta_data.hostname }}"
         - echo "::1         ipv6-localhost ipv6-loopback" >/etc/hosts
         - echo "127.0.0.1   localhost" >>/etc/hosts
         - echo "127.0.0.1   {{ ds.meta_data.hostname }}" >>/etc/hosts
         - echo "{{ ds.meta_data.hostname }}" >/etc/hostname
         files: []
         users:
         - name: capv
           sshAuthorizedKeys:
           - '${ VSPHERE_SSH_AUTHORIZED_KEY }'
           sudo: ALL=(ALL) NOPASSWD:ALL
    
    ---
    apiVersion: cluster.x-k8s.io/v1alpha3
    kind: MachineDeployment
    metadata:
     labels:
       cluster.x-k8s.io/cluster-name: #@ data.values.CLUSTER_NAME
     name: #@ "{}-md-1".format(data.values.CLUSTER_NAME)
    spec:
     clusterName: #@ data.values.CLUSTER_NAME
     replicas: #@ data.values.WORKER_MACHINE_COUNT
     selector:
       matchLabels:
         cluster.x-k8s.io/cluster-name: #@ data.values.CLUSTER_NAME
     template:
       metadata:
         labels:
           cluster.x-k8s.io/cluster-name: #@ data.values.CLUSTER_NAME
           node-pool: #@ "{}-worker-pool".format(data.values.CLUSTER_NAME)
       spec:
         bootstrap:
           configRef:
             apiVersion: bootstrap.cluster.x-k8s.io/v1alpha3
             kind: KubeadmConfigTemplate
             name: #@ "{}-md-1".format(data.values.CLUSTER_NAME)
         clusterName: #@ data.values.CLUSTER_NAME
         infrastructureRef:
           apiVersion: infrastructure.cluster.x-k8s.io/v1alpha3
           kind: VSphereMachineTemplate
           name: #@ "{}-worker-1".format(data.values.CLUSTER_NAME)
         version: #@ data.values.KUBERNETES_VERSION
         failureDomain: FAILURE-DOMAIN-2
    
    ---
    apiVersion: infrastructure.cluster.x-k8s.io/v1alpha3
    kind: VSphereMachineTemplate
    metadata:
     name: #@ "{}-worker-2".format(data.values.CLUSTER_NAME)
    spec:
     template:
       spec:
         cloneMode:  #@ data.values.VSPHERE_CLONE_MODE
         datacenter: #@ data.values.VSPHERE_DATACENTER
         datastore: #@ data.values.VSPHERE_DATASTORE
         storagePolicyName: #@ data.values.VSPHERE_STORAGE_POLICY_ID
         diskGiB: #@ data.values.VSPHERE_WORKER_DISK_GIB
         folder: #@ data.values.VSPHERE_FOLDER
         memoryMiB: #@ data.values.VSPHERE_WORKER_MEM_MIB
         network:
           devices:
           #@ if data.values.TKG_IP_FAMILY == "ipv6":
           #@overlay/match by=overlay.index(0)
           #@overlay/replace
           - dhcp6: true
             networkName: #@ data.values.VSPHERE_NETWORK
           #@ else:
           #@overlay/match by=overlay.index(0)
           #@overlay/replace
           - dhcp4: true
             networkName: #@ data.values.VSPHERE_NETWORK
           #@ end
         numCPUs: #@ data.values.VSPHERE_WORKER_NUM_CPUS
         resourcePool: #@ data.values.VSPHERE_RESOURCE_POOL
         server: #@ data.values.VSPHERE_SERVER
         template: #@ data.values.VSPHERE_TEMPLATE
    
    ---
    apiVersion: bootstrap.cluster.x-k8s.io/v1alpha3
    kind: KubeadmConfigTemplate
    metadata:
     name: #@ "{}-md-2".format(data.values.CLUSTER_NAME)
     namespace: '${ NAMESPACE }'
    spec:
     template:
       spec:
         useExperimentalRetryJoin: true
         joinConfiguration:
           nodeRegistration:
             criSocket: /var/run/containerd/containerd.sock
             kubeletExtraArgs:
               cloud-provider: external
               tls-cipher-suites: TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305,TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305,TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384
             name: '{{ ds.meta_data.hostname }}'
         preKubeadmCommands:
         - hostname "{{ ds.meta_data.hostname }}"
         - echo "::1         ipv6-localhost ipv6-loopback" >/etc/hosts
         - echo "127.0.0.1   localhost" >>/etc/hosts
         - echo "127.0.0.1   {{ ds.meta_data.hostname }}" >>/etc/hosts
         - echo "{{ ds.meta_data.hostname }}" >/etc/hostname
         files: []
         users:
         - name: capv
           sshAuthorizedKeys:
           - '${ VSPHERE_SSH_AUTHORIZED_KEY }'
           sudo: ALL=(ALL) NOPASSWD:ALL
    
    ---
    apiVersion: cluster.x-k8s.io/v1alpha3
    kind: MachineDeployment
    metadata:
     labels:
       cluster.x-k8s.io/cluster-name: #@ data.values.CLUSTER_NAME
     name: #@ "{}-md-2".format(data.values.CLUSTER_NAME)
    spec:
     clusterName: #@ data.values.CLUSTER_NAME
     replicas: #@ data.values.WORKER_MACHINE_COUNT
     selector:
       matchLabels:
         cluster.x-k8s.io/cluster-name: #@ data.values.CLUSTER_NAME
     template:
       metadata:
         labels:
           cluster.x-k8s.io/cluster-name: #@ data.values.CLUSTER_NAME
           node-pool: #@ "{}-worker-pool".format(data.values.CLUSTER_NAME)
       spec:
         bootstrap:
           configRef:
             apiVersion: bootstrap.cluster.x-k8s.io/v1alpha3
             kind: KubeadmConfigTemplate
             name: #@ "{}-md-2".format(data.values.CLUSTER_NAME)
         clusterName: #@ data.values.CLUSTER_NAME
         infrastructureRef:
           apiVersion: infrastructure.cluster.x-k8s.io/v1alpha3
           kind: VSphereMachineTemplate
           name: #@ "{}-worker-2".format(data.values.CLUSTER_NAME)
         version: #@ data.values.KUBERNETES_VERSION
         failureDomain: FAILURE-DOMAIN-3
    
    #@overlay/match by=overlay.subset({"kind":"KubeadmConfigTemplate"}), expects="1+"
    ---
    spec:
     template:
       spec:
         users:
         #@overlay/match by=overlay.index(0)
         #@overlay/replace
         - name: capv
           sshAuthorizedKeys:
           - #@ data.values.VSPHERE_SSH_AUTHORIZED_KEY
           sudo: ALL=(ALL) NOPASSWD:ALL
    

    The overlay above assigns the worker nodes to failure domains: replace the FAILURE-DOMAIN-1, FAILURE-DOMAIN-2, and FAILURE-DOMAIN-3 placeholders with the names of your VSphereFailureDomain objects, which are rack1, rack2, and rack3 in the single-compute-cluster example above. Because there are three failure domains, the example creates two extra sets of MachineDeployment, VSphereMachineTemplate, and KubeadmConfigTemplate objects. Each MachineDeployment spec includes the name of a failure domain, which controls the placement of its machines.

    The new VSphereMachineTemplate objects follow the naming pattern CLUSTER-NAME-worker-#, and the corresponding KubeadmConfigTemplate and MachineDeployment objects follow the pattern CLUSTER-NAME-md-#; each MachineDeployment references the VSphereMachineTemplate and KubeadmConfigTemplate objects with the matching number.

  2. Create the cluster configuration file for the workload cluster you are deploying.

    • Set VSPHERE_REGION and VSPHERE_ZONE to the region and zone tag categories, k8s-region and k8s-zone in the example above.
    • WORKER_MACHINE_COUNT sets the number of workers per availability zone. The cluster’s total number of workers is the WORKER_MACHINE_COUNT setting times the number of zones.

    For the full list of options that you must specify when deploying workload clusters to vSphere, see the Tanzu CLI Configuration File Variable Reference.
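
    For example, an excerpt of the availability zone settings in a cluster configuration file might look like the following. The values match the single-compute-cluster example above; the other required vSphere settings are omitted from this sketch:

     VSPHERE_REGION: k8s-region
     VSPHERE_ZONE: k8s-zone
     WORKER_MACHINE_COUNT: 1   # one worker per zone; total workers = this value x number of zones
     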

  3. Run tanzu cluster create to create the workload cluster. For the basic process of deploying workload clusters, see Deploy a Workload Cluster: Basic Process.

    • During the cluster creation process, you can see its VMs and other resources appear in vCenter.
    • If you created a dummy VM in vCenter in order to create a VM group, you can delete the dummy VM or remove it from the VM groups once the cluster is running.
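    • To check how the worker nodes were distributed, you can inspect the failure domain recorded on each Machine object. The following sketch assumes that kubectl points to the management cluster and that the workload cluster was created in the default namespace:
      kubectl get machines -o custom-columns=NAME:.metadata.name,FAILUREDOMAIN:.spec.failureDomain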