You can scale a TKG Service cluster horizontally by changing the number of nodes, or vertically by changing the virtual machine class hosting the nodes. You can also scale volumes attached to cluster nodes.

Supported Manual Scaling Operations

The table lists the supported scaling operations for TKG clusters.
Table 1. Supported Scaling Operations for TKGS Clusters
Node            Horizontal Scale Out   Horizontal Scale In   Vertical Scale   Volume Scale
Control Plane   Yes                    No                    Yes              Yes*
Worker          Yes                    Yes                   Yes              Yes

* Scaling control plane node volumes requires vSphere 8 U3 or later and a compatible TKr.
Keep in mind the following considerations:
  • The number of control plane nodes must be odd, either 1 or 3. Scaling out the control plane is supported, but scaling in the control plane is not supported. See Scale Out the Control Plane.
  • When you vertically scale a cluster node, workloads might no longer be able to run on the node because of a lack of available resources. For this reason, horizontal scaling is generally the preferred approach. See Scale Out Worker Nodes.
  • VM classes are not immutable. If you scale out a TKG cluster after editing a VM class used by that cluster, new cluster nodes use the updated class definition, but existing cluster nodes continue to use the initial class definition, resulting in a mismatch. You can check which VM class a cluster currently specifies by using the commands shown after this list. See About VM Classes.
  • Worker node volumes can be changed after provisioning; control plane node volumes can likewise be changed, provided you are using vSphere 8 U3 or later and a compatible TKr. See Scale Cluster Node Volumes.
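
For example, to check the VM class currently specified for the control plane and for each worker node pool of a v1alpha3 API cluster, you can query the cluster object with jsonpath. This is a minimal sketch; the cluster name tkg-cluster-1 and namespace tkg-cluster-ns are placeholders for your own values.

kubectl get tanzukubernetescluster tkg-cluster-1 -n tkg-cluster-ns \
  -o jsonpath='{.spec.topology.controlPlane.vmClass}{"\n"}'
kubectl get tanzukubernetescluster tkg-cluster-1 -n tkg-cluster-ns \
  -o jsonpath='{range .spec.topology.nodePools[*]}{.name}{"\t"}{.vmClass}{"\n"}{end}'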

Scaling Prerequisite: Configure Kubectl Editing

To scale a TKG cluster, you update the cluster manifest using the command kubectl edit CLUSTER-KIND/CLUSTER-NAME. When you save the manifest changes, the cluster is updated with the changes. See Configure a Text Editor for Kubectl.

For example:
kubectl edit tanzukubernetescluster/tkg-cluster-1
tanzukubernetescluster.run.tanzu.vmware.com/tkg-cluster-1 edited
To cancel changes, close the editor without saving.
kubectl edit tanzukubernetescluster/tkg-cluster-1
Edit cancelled, no changes made.
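
The editor that kubectl edit opens is controlled by the KUBE_EDITOR or EDITOR environment variable. For example, to use nano for the current shell session (assuming nano is installed on the machine where you run kubectl):

export KUBE_EDITOR="nano"
kubectl edit tanzukubernetescluster/tkg-cluster-1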

Scale Out the Control Plane

Scale out a TKG cluster by increasing the number of control plane nodes from 1 to 3.
Note: Production clusters require 3 control plane nodes.
  1. Log in to Supervisor.
    kubectl vsphere login --server=SUPERVISOR-IP-ADDRESS --vsphere-username USERNAME
  2. Switch context to the vSphere Namespace where the TKG cluster is running.
    kubectl config use-context tkg-cluster-ns
  3. List the Kubernetes clusters running in the vSphere Namespace.

    Use the following syntax:

    kubectl get CLUSTER-KIND -n tkg-cluster-ns
    For example, for a v1alpha3 API cluster:
    kubectl get tanzukubernetescluster -n tkg-cluster-ns
    For example, for a v1beta1 API cluster:
    kubectl get cluster -n tkg-cluster-ns
  4. Get the number of nodes running in the target cluster.

    For a v1alpha3 API cluster:

    kubectl get tanzukubernetescluster tkg-cluster-1
    The TKG cluster has 1 control plane node and 3 worker nodes.
    NAMESPACE             NAME             CONTROL PLANE   WORKER   TKR NAME                   AGE     READY
    tkg-cluster-ns        tkg-cluster-1    1               3        v1.24.9---vmware.1-tkg.4   5d12h   True
    

    For a v1beta1 API cluster:

    kubectl get cluster tkg-cluster-1
  5. Load the cluster manifest for editing using the kubectl edit command.
    For a v1alpha3 API cluster:
    kubectl edit tanzukubernetescluster/tkg-cluster-1
    For a v1beta1 API cluster:
    kubectl edit cluster/tkg-cluster-1

    The cluster manifest opens in the text editor defined by your KUBE_EDITOR or EDITOR environment variables.

  6. Increase the number of control plane nodes from 1 to 3 in the spec.topology.controlPlane.replicas section of the manifest.
    For a v1alpha3 API cluster:
    ...
    spec:
      topology:
        controlPlane:
          replicas: 1
    ...
    
    ...
    spec:
      topology:
        controlPlane:
          replicas: 3
    ...
    
    For a v1beta1 API cluster:
    ...
    spec:
      ...
      topology:
        class: tanzukubernetescluster
        controlPlane:
          metadata: {}
          replicas: 1
        variables:
    ...
    
    ...
    spec:
      ...
      topology:
        class: tanzukubernetescluster
        controlPlane:
          metadata: {}
          replicas: 3
        variables:
    ...
    
  7. Save the file in the text editor to apply the changes. (To cancel, close the editor without saving.)

    When you save the manifest changes, kubectl applies the changes to the cluster. In the background, the Virtual Machine Service on Supervisor provisions the new control plane nodes.

  8. Verify that the new nodes are added.

    For a v1alpha3 API cluster:

    kubectl get tanzukubernetescluster tkg-cluster-1
    The scaled out control plane now has 3 nodes.
    NAMESPACE             NAME             CONTROL PLANE   WORKER   TKR NAME                   AGE     READY
    tkg-cluster-ns        tkg-cluster-1    3               3        v1.24.9---vmware.1-tkg.4   5d12h   True
    

    For a v1beta1 API cluster:

    kubectl get cluster tkg-cluster-1
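
Alternatively, if you prefer a non-interactive change, you can set the control plane replica count with kubectl patch instead of kubectl edit. The following commands are a sketch; they assume the field paths shown in step 6 and a cluster named tkg-cluster-1 in the current vSphere Namespace.

For a v1alpha3 API cluster:

kubectl patch tanzukubernetescluster tkg-cluster-1 --type merge -p '{"spec":{"topology":{"controlPlane":{"replicas":3}}}}'

For a v1beta1 API cluster:

kubectl patch cluster tkg-cluster-1 --type merge -p '{"spec":{"topology":{"controlPlane":{"replicas":3}}}}'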

Scale Out Worker Nodes

You can scale out a TKG cluster by increasing the number of worker nodes.

  1. Log in to Supervisor.
    kubectl vsphere login --server=SVC-IP-ADDRESS --vsphere-username USERNAME
  2. Switch context to the vSphere Namespace where the TKG cluster is running.
    kubectl config use-context tkg-cluster-ns
  3. List the Kubernetes clusters running in the vSphere Namespace.

    Use the following syntax:

    kubectl get CLUSTER-KIND -n tkg-cluster-ns
    For example, for a v1alpha3 API cluster:
    kubectl get tanzukubernetescluster -n tkg-cluster-ns
    For example, for a v1beta1 API cluster:
    kubectl get cluster -n tkg-cluster-ns
  4. Get the number of nodes running in the target cluster.

    For a v1alpha3 API cluster:

    kubectl get tanzukubernetescluster tkg-cluster-1
    For example, the following cluster has 3 control plane nodes and 3 worker nodes.
    NAMESPACE             NAME             CONTROL PLANE   WORKER   TKR NAME                   AGE     READY
    tkg-cluster-ns        tkg-cluster-1    3               3        v1.24.9---vmware.1-tkg.4   5d12h   True
    
    For a v1beta1 API cluster:
    kubectl get cluster tkg-cluster-1
  5. Load the cluster manifest for editing using the kubectl edit command.

    For a v1alpha3 API cluster:

    kubectl edit tanzukubernetescluster/tkg-cluster-1
    For a v1beta1 API cluster:
    kubectl edit cluster/tkg-cluster-1

    The cluster manifest opens in the text editor defined by your KUBE_EDITOR or EDITOR environment variables.

  6. Increase the number of worker nodes by editing the spec.topology.nodePools.NAME.replicas value for the target worker node pool.
    For a v1alpha3 API cluster:
    ...
    spec:
      topology:
        ...
        nodePools:
        - name: worker-1
          replicas: 3
    ...
    ...
    spec:
      topology:
        ...
        nodePools:
        - name: worker-1
          replicas: 4
    ...
    For a v1beta1 API cluster:
    ...
    apiVersion: cluster.x-k8s.io/v1beta1
    kind: Cluster
    metadata:
      ...
    spec:
      ...
      topology:
        ...
        class: tanzukubernetescluster
        controlPlane:
        ...
        workers:
          machineDeployments:
          - class: node-pool
            metadata: {}
            name: node-pool-1
            replicas: 3
    ...
    ...
    apiVersion: cluster.x-k8s.io/v1beta1
    kind: Cluster
    metadata:
      ...
    spec:
      ...
      topology:
        ...
        class: tanzukubernetescluster
        controlPlane:
        ...
        workers:
          machineDeployments:
          - class: node-pool
            metadata: {}
            name: node-pool-1
            replicas: 4
    ...
  7. To apply the changes, save the file in the text editor. To cancel the changes, close the editor without saving.

    When you save the file, kubectl applies the changes to the cluster. In the background, the Virtual Machine Service on Supervisor provisions the new worker node.

  8. Verify that the new nodes are added.

    For a v1alpha3 API cluster:

    kubectl get tanzukubernetescluster tkg-cluster-1
    After scaling out, the cluster has 4 worker nodes.
    NAMESPACE             NAME             CONTROL PLANE   WORKER   TKR NAME                   AGE     READY
    tkg-cluster-ns        tkg-cluster-1    3               4        v1.24.9---vmware.1-tkg.4   5d12h   True
    

    For a v1beta1 API cluster:

    kubectl get cluster tkg-cluster-1
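
As with the control plane, you can scale a worker node pool non-interactively with kubectl patch instead of kubectl edit. Because node pools are list entries, the sketch below uses a JSON patch with an explicit list index and assumes the pool you want to scale is the first entry; adjust the index for your cluster.

For a v1alpha3 API cluster:

kubectl patch tanzukubernetescluster tkg-cluster-1 --type json -p '[{"op":"replace","path":"/spec/topology/nodePools/0/replicas","value":4}]'

For a v1beta1 API cluster:

kubectl patch cluster tkg-cluster-1 --type json -p '[{"op":"replace","path":"/spec/topology/workers/machineDeployments/0/replicas","value":4}]'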

Scale In Worker Nodes

You can scale in a TKG cluster by decreasing the number of worker nodes.

  1. Log in to Supervisor.
    kubectl vsphere login --server=SVC-IP-ADDRESS --vsphere-username USERNAME
  2. Switch context to the vSphere Namespace where the TKG cluster is running.
    kubectl config use-context tkg-cluster-ns
  3. List the Kubernetes clusters running in the vSphere Namespace.

    Use the following syntax:

    kubectl get CLUSTER-KIND -n tkg-cluster-ns
    For example, for a v1alpha3 API cluster:
    kubectl get tanzukubernetescluster -n tkg-cluster-ns
    For example, for a v1beta1 API cluster:
    kubectl get cluster -n tkg-cluster-ns
  4. Get the number of nodes running in the target cluster.

    For a v1alpha3 API cluster:

    kubectl get tanzukubernetescluster tkg-cluster-1
    For example, the following cluster has 3 control plane nodes and 4 worker nodes.
    NAMESPACE             NAME             CONTROL PLANE   WORKER   TKR NAME                   AGE     READY
    tkg-cluster-ns        tkg-cluster-1    3               4        v1.24.9---vmware.1-tkg.4   5d12h   True
    
    For a v1beta1 API cluster:
    kubectl get cluster tkg-cluster-1
  5. Load the cluster manifest for editing using the kubectl edit command.

    For a v1alpha3 API cluster:

    kubectl edit tanzukubernetescluster/tkg-cluster-1
    For a v1beta1 API cluster:
    kubectl edit cluster/tkg-cluster-1

    The cluster manifest opens in the text editor defined by your KUBE_EDITOR or EDITOR environment variables.

  6. Decrease the number of worker nodes by editing the spec.topology.nodePools.NAME.replicas value for the target worker node pool.
    For a v1alpha3 API cluster:
    ...
    spec:
      topology:
        ...
        nodePools:
        - name: worker-1
          replicas: 4
    ...
    ...
    spec:
      topology:
        ...
        nodePools:
        - name: worker-1
          replicas: 3
    ...
    For a v1beta1 API cluster:
    ...
    apiVersion: cluster.x-k8s.io/v1beta1
    kind: Cluster
    metadata:
      ...
    spec:
      ...
      topology:
        ...
        class: tanzukubernetescluster
        controlPlane:
        ...
        workers:
          machineDeployments:
          - class: node-pool
            metadata: {}
            name: node-pool-1
            replicas: 4
    ...
    ...
    apiVersion: cluster.x-k8s.io/v1beta1
    kind: Cluster
    metadata:
      ...
    spec:
      ...
      topology:
        ...
        class: tanzukubernetescluster
        controlPlane:
        ...
        workers:
          machineDeployments:
          - class: node-pool
            metadata: {}
            name: node-pool-1
            replicas: 3
    ...
  7. To apply the changes, save the file in the text editor. To cancel the changes, close the editor without saving.

    When you save the file, kubectl applies the changes to the cluster. In the background, the Virtual Machine Service on Supervisor removes the excess worker node.

  8. Verify that the worker nodes were removed.

    For a v1alpha3 API cluster:

    kubectl get tanzukubernetescluster tkg-cluster-1
    After scaling in, the cluster has 3 worker nodes.
    NAMESPACE             NAME             CONTROL PLANE   WORKER   TKR NAME                   AGE     READY
    tkg-cluster-ns        tkg-cluster-1    3               3        v1.24.9---vmware.1-tkg.4   5d12h   True
    

    For a v1beta1 API cluster:

    kubectl get cluster tkg-cluster-1
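
To observe the scale-in operation while it runs, you can watch the Cluster API and virtual machine objects in the vSphere Namespace; the excess Machine and its backing virtual machine are removed as part of the operation. These commands are a sketch and assume the resources are visible in your vSphere Namespace.

kubectl get machines -n tkg-cluster-ns
kubectl get virtualmachines -n tkg-cluster-ns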

Scale a Cluster Vertically

TKG on Supervisor supports vertical scaling for cluster control plane and worker nodes. You scale a TKG cluster vertically by changing the virtual machine class used for cluster nodes. The VM class you use must be bound to the vSphere Namespace where the TKG cluster is provisioned.

TKG on Supervisor supports vertical scaling through the rolling update mechanism built into the system. When you change the VirtualMachineClass definition, the system rolls out new nodes with that new class and spins down the old nodes. See Updating TKG Service Clusters.

  1. Log in to Supervisor.
    kubectl vsphere login --server=SVC-IP-ADDRESS --vsphere-username USERNAME
  2. Switch context to the vSphere Namespace where the TKG cluster is running.
    kubectl config use-context tkg-cluster-ns
  3. List the Kubernetes clusters running in the vSphere Namespace.

    Use the following syntax:

    kubectl get CLUSTER-KIND -n tkg-cluster-ns
    For example, for a v1alpha3 API cluster:
    kubectl get tanzukubernetescluster -n tkg-cluster-ns
    For example, for a v1beta1 API cluster:
    kubectl get cluster -n tkg-cluster-ns
  4. Describe the target TKG cluster and check the VM class.

    For a v1alpha3 API cluster:

    kubectl describe tanzukubernetescluster tkg-cluster-1

    For example, the following cluster is using the best-effort-medium VM class.

    spec:
      topology:
        controlPlane:
          replicas: 3
          vmClass: best-effort-medium
          ...
        nodePools:
        - name: worker-nodepool-a1
          replicas: 3
          vmClass: best-effort-medium
          ...
    

    For a v1beta1 API cluster:

    kubectl describe cluster tkg-cluster-1

    For example, the following cluster is using the best-effort-medium VM class.

    ...
    Topology:
          ...
        Variables:
          ...
          Name:   vmClass
          Value:  best-effort-medium
        ...
    Note: For v1beta1 API clusters, by default the vmClass is set globally as a single variable. You can override this setting and use a different VM class for control plane and worker nodes. See vmClass in the API reference and the override sketch after this procedure.
  5. List and describe the available VM classes.
    kubectl get virtualmachineclass
    kubectl describe virtualmachineclass
    Note: The VM class must be bound to the vSphere Namespace. See Using VM Classes with TKG Service Clusters.
  6. Open the target cluster manifest for editing.

    For a v1alpha3 API cluster:

    kubectl edit tanzukubernetescluster/tkg-cluster-1
    For a v1beta1 API cluster:
    kubectl edit cluster/tkg-cluster-1

    The cluster manifest opens in the text editor defined by your KUBE_EDITOR or EDITOR environment variables.

  7. Edit the manifest by changing the VM class.
    For a v1alpha3 API cluster, change the VM class for the control plane to guaranteed-medium and the VM class for worker nodes to guaranteed-large.
    spec:
      topology:
        controlPlane:
          replicas: 3
          vmClass: guaranteed-medium
          ...
        nodePools:
        - name: worker-nodepool-a1
          replicas: 3
          vmClass: guaranteed-large
          ...
    
    For a v1beta1 API cluster, change the VM class to guaranteed-large.
    ...
    Topology:
          ...
        Variables:
          ...
          Name:   vmClass
          Value:  guaranteed-large
        ...
  8. To apply the changes, save the file in the text editor. To cancel the changes, close the editor without saving.

    When you save the file, kubectl applies the changes to the cluster. In the background, TKG on Supervisor performs a rolling update of the TKG cluster.

  9. Verify that the TKG cluster is updated with the new VM class.

    For a v1alpha3 API cluster:

    kubectl describe tanzukubernetescluster tkg-cluster-1

    For a v1beta1 API cluster:

    kubectl describe cluster tkg-cluster-1
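
The Note in step 4 mentions that for v1beta1 API clusters you can override the global vmClass variable for a specific node pool. The following excerpt is a sketch of such an override using the Cluster API variables overrides mechanism; the pool name node-pool-1 and the classes shown are examples only. Refer to vmClass in the API reference for the authoritative syntax.

...
spec:
  topology:
    class: tanzukubernetescluster
    controlPlane:
      ...
    variables:
    - name: vmClass
      value: guaranteed-medium
    workers:
      machineDeployments:
      - class: node-pool
        metadata: {}
        name: node-pool-1
        replicas: 3
        variables:
          overrides:
          - name: vmClass
            value: guaranteed-large
...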

Scale Cluster Node Volumes

In the TKG cluster specification, you can optionally declare one or more persistent volumes for a node. Declaring a node volume is useful for high-churn components such as the container runtime and kubelet on worker nodes.

If you want to add or change one or more node volumes after cluster creation, keep in mind the following considerations:

Worker node volume changes are allowed

After a TKG cluster is provisioned, you can add or update worker node volumes. When you initiate a rolling update, the cluster is updated with the new or changed volume.

Warning: If you scale the worker node with a new or changed volume, data in the current volume is deleted during the rolling update. Refer to the explanation that follows.

A volume declared for a TKG cluster node is treated as ephemeral. A TKG cluster uses a persistent volume claim (PVC) in the vSphere Namespace so that the volume capacity is counted against the TKG cluster's storage quota. If you increase the capacity of a node volume, the Kubernetes Cluster API (CAPI) rolls out new worker nodes with a new PVC. TKG does not migrate data in this case, but Kubernetes reschedules workload pods on the new nodes.

Control plane node volume changes are allowed if you are using vSphere 8 U3 or later

If you are using vSphere 8 U3 or later and a compatible Tanzu Kubernetes release, you can add or update a control plane node volume after a TKG Service cluster has been provisioned.

If you are not using vSphere 8 U3 or later, the Kubernetes Cluster API (CAPI) forbids post-creation changes to spec.topology.controlPlane.volumes.

If you attempt to add or change a control plane volume after cluster creation, the request is denied and you receive the error message "updates to immutable fields are not allowed."
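
Because each declared node volume is backed by a persistent volume claim (PVC) in the vSphere Namespace, you can list the claims in that namespace to see the volumes that TKG has provisioned for cluster nodes and their requested capacity. For example:

kubectl get pvc -n tkg-cluster-ns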

The following is an excerpt of a cluster specification based on the v1alpha3 API with declared node volumes. As needed, refer to the complete TKG cluster example from which this excerpt is taken: v1alpha3 Example: TKC with Default Storage and Node Volumes. For a v1beta1 API cluster example, see v1beta1 Example: Custom Cluster Based on the Default ClusterClass.

apiVersion: run.tanzu.vmware.com/v1alpha3
kind: TanzuKubernetesCluster
...
spec:
   topology:
     controlPlane:
       replicas: 3
       storageClass: tkg-storage-policy
       vmClass: guaranteed-medium
       tkr:
         reference:
           name: v1.24.9---vmware.1-tkg.4
     nodePools:
     - name: worker-nodepool-a1
       replicas: 3
       storageClass: tkg-storage-policy
       vmClass: guaranteed-large
       tkr:
         reference:
           name: v1.24.9---vmware.1-tkg.4
       volumes:
       - name: containerd
         mountPath: /var/lib/containerd
         capacity:
           storage: 50Gi
       - name: kubelet
         mountPath: /var/lib/kubelet
         capacity:
           storage: 50Gi
     - name: worker-nodepool-a2
       ...
   settings:
     ...
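
As noted above, node volume capacity counts against the TKG cluster's storage quota in the vSphere Namespace. If storage limits are configured on the namespace, they surface as Kubernetes ResourceQuota objects, so you can review current usage and limits with the following commands. This is a sketch; the namespace name is a placeholder.

kubectl get resourcequota -n tkg-cluster-ns
kubectl describe resourcequota -n tkg-cluster-ns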