Manage Node Pools of Different VM Types

This topic explains how to create, update and delete node pools in a workload cluster. Node pools enable a single workload cluster to contain and manage different types of nodes, to support the diverse needs of different applications.

For example, a cluster can use nodes with high storage capacity to run a datastore, and thinner nodes to process application requests.

Note

You cannot create node pools in single-node clusters.
To create and use node pools in workload clusters created by vSphere with Tanzu, you need to be running vSphere v7.0 U3 or later.
If you use vSphere with Tanzu, your vSphere with Tanzu clusters must use the v1alpha2 API to run the node-pool commands successfully. For more information, see the vSphere with Tanzu documentation.

About Node Pools

Node pools define properties for the sets of worker nodes used by a workload cluster.

Some node pool properties depend on the VM options that are available in the underlying infrastructure, but all node pools on all cloud infrastructures share the following properties:

  • name: a unique identifier for the node pool, used for operations like updates and deletion.
  • replicas: the number of nodes in the pool, all of which share the same properties.
  • labels: key/value pairs set as metadata on the nodes, to match workloads to nodes in the pool. For more information, and example labels, see Labels and Selectors in the Kubernetes documentation.
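
For example, a minimal node pool definition that sets only these shared properties might look like the following sketch; the pool name is illustrative:

    name: example-np-1
    replicas: 3
    labels:
      key1: value1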

For workload clusters on vSphere, standalone management clusters by default follow anti-affinity rules to deploy node pool workers and control plane nodes to different ESXi hosts. To deactivate the anti-affinity rules for node placement, see Deactivate Anti-Affinity Rules.

All workload clusters are created with a first, original node pool. When you create additional node pools for a cluster, as described below, the first node pool provides default values for properties not set in the new node pool definitions.

List Node Pools

To inspect the node pools currently available in a cluster, run:

tanzu cluster node-pool list CLUSTER-NAME

The command returns a list of all of the node pools in the cluster and the state of the replicas in each node pool.
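
For example, to list the node pools in a cluster named my-wc, you might run the following; the cluster name is illustrative, and the --namespace flag is assumed to behave as it does for the other node-pool commands described below:

    tanzu cluster node-pool list my-wc --namespace default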

Create a Node Pool

To create a node pool in a cluster:

  1. Create a configuration file for the node pool. See Sample Configuration for a sample configuration file.

    For a full list of configuration properties, see Configuration Properties.

  2. Create the node pool defined by the configuration file:

    tanzu cluster node-pool set CLUSTER-NAME -f /PATH/TO/CONFIG-FILE
    

    Options:

    • --namespace specifies the namespace of the cluster. The default value is default.
    • (Legacy clusters) --base-machine-deployment specifies the base MachineDeployment object from which to create the new node pool.
      • Set this value as a MachineDeployment identifier as listed in the output of tanzu cluster get under Details.
      • The default value is the first in the cluster’s array of worker node MachineDeployment objects, represented internally as workerMDs[0].
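
    For example, the following command (with an illustrative cluster name and configuration file path) creates the node pool in the my-wc cluster in the default namespace:

    tanzu cluster node-pool set my-wc -f /tmp/my-node-pool.yaml --namespace default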

Sample Configuration

The following sections provide example node pool configurations for each underlying infrastructure.

vSphere Configuration
In addition to the required name, replicas, and labels properties, configuration files for node pools on vSphere can include a vsphere block to define optional properties specific to configuring VMs on vSphere.

Example node pool definition for a vSphere cluster:

    name: tkg-wc-oidc-md-1
    replicas: 4
    labels:
      key1: value1
      key2: value2
    vsphere:
      memoryMiB: 8192
      diskGiB: 64
      numCPUs: 4
      datacenter: dc0
      datastore: iscsi-ds-0
      storagePolicyName: name
      folder: vmFolder
      resourcePool: rp-1
      vcIP: 10.0.0.1
      template: templateName
      cloneMode: clone-mode
      network: network-name

Any values not set in the vsphere block inherit from the values in the cluster’s first node pool.

For the vcIP value, workload cluster node pools must run in the same vCenter as the cluster’s control plane.

For clusters deployed by a vSphere with Tanzu Supervisor, define storageClass, tkr, and vmClass. For more information about these properties, see TanzuKubernetesCluster v1alpha3 API – Annotated in the vSphere with Tanzu documentation.
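
For example, a node pool definition for a Supervisor-deployed cluster might look like the following sketch; the pool name, VM class, storage class, and TKR reference are illustrative and must match values available in your environment:

    name: tkg-tkc-np-1
    replicas: 3
    labels:
      key1: value1
    vmClass: best-effort-medium
    storageClass: my-storage-class
    tkr:
      reference: v1.27.5---vmware.2-tkg.2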

By default, workload clusters on vSphere are deployed following anti-affinity rules that spread the control plane nodes and worker nodes within each node pool across multiple ESXi hosts. To deactivate or reactivate the anti-affinity rules, see Deactivate Anti-Affinity Rules (vSphere) below.

AWS Configuration
In addition to the required name, replicas, and labels properties, configuration files for node pools on AWS support the following optional properties:
  • az: Availability Zone
  • nodeMachineType: Instance type

These settings may be omitted, in which case their values inherit from the cluster’s first node pool.

Example node pool definition for an AWS cluster:

    name: tkg-aws-wc-np-1
    replicas: 2
    az: us-west-2b
    nodeMachineType: t3.large
    labels:
      key1: value1
      key2: value2

Note

Workload cluster node pools on AWS must be in the same availability zone as the standalone management cluster.

Azure Configuration
In addition to the required name, replicas, and labels properties above, configuration files for node pools on Microsoft Azure support the following optional properties:
  • az: Availability Zone
  • nodeMachineType: Instance type

If the settings are omitted, their values inherit from the cluster’s first node pool.

Example node pool definition for an Azure cluster:

    name: tkg-azure-wc-np-1
    replicas: 2
    az: 2
    nodeMachineType: Standard_D2s_v3
    labels:
      key1: value1
      key2: value2

Configuration Properties

The following table lists all of the properties that you can define in a node pool configuration file for workload clusters.

Note

You cannot add to or otherwise change an existing node pool’s labels, az, nodeMachineType or vSphere properties. To change these properties, you must create a new node pool in the cluster with the desired properties, migrate workloads to the new node pool, and delete the original.

| Name | Type | Cluster Object | Provider | Notes |
| --- | --- | --- | --- | --- |
| name | string | Any | All | Name of the node pool to create or update. |
| replicas | integer | Any | All | Number of nodes in the node pool. |
| az | string | Any | TKG on AWS or Azure | AZ to place the nodes in. |
| nodeMachineType | string | Any | TKG on AWS or Azure | instanceType or vmSize for the node in AWS and Azure, respectively. |
| labels | map[string]string | Any | All | Labels to be set on the node using kubeletExtraArgs (--node-labels). |
| vmClass | string | Any | All | Name of a Kubernetes vmClass. Matches the vmClass defined in the TKC cluster. This class sets the CPU and memory available to the node. To list available VM classes, run kubectl describe virtualmachineclasses. |
| storageClass | string | Any | All | Name of a Kubernetes StorageClass to use for the node pool. This class applies to the disks that store the root filesystems of the nodes. To list available storage classes, run kubectl describe storageclasses. |
| volumes: name (string), mountPath (string), capacity (corev1.ResourceList), storageClass (string) | []object | Any | All | Volumes to use for the nodes. |
| tkr: reference (string) | object | TKC-based | All | Name of the TKR to use for the node pool. For example, v1.27.5---vmware.2-tkg.2. |
| tkrResolver | string | Class-based | All | Required for class-based clusters. Value of the run.tanzu.vmware.com/resolve-os-image annotation from the Cluster resource. |
| nodeDrainTimeout | metav1.Duration | Any | All | Node drain timeout. |
| vsphere | object | Any | All | See Configuration Properties (vSphere only) below. |
| workerClass | string | Class-based | All | Required for class-based clusters. The workerClass from the cluster's ClusterClass that you want the node pool to use. |
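
For example, a node pool definition for a class-based cluster might look like the following sketch; the pool name, workerClass value, and tkrResolver value are illustrative and must come from your cluster's ClusterClass and Cluster resources:

    name: tkg-wc-np-1
    replicas: 2
    labels:
      key1: value1
    workerClass: tkg-worker
    tkrResolver: os-name=ubuntu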

Configuration Properties (vSphere only)

For information about the VSPHERE_* configuration variables, see vSphere in Configuration File Variable Reference.

| Name | Type | Management Cluster Type | Notes |
| --- | --- | --- | --- |
| cloneMode | string | Standalone | Same as VSPHERE_CLONE_MODE. |
| datacenter | string | Standalone | Same as VSPHERE_DATACENTER. |
| datastore | string | Standalone | Same as VSPHERE_DATASTORE. |
| storagePolicyName | string | Standalone | Same as VSPHERE_STORAGE_POLICY_NAME. |
| taints | []corev1.Taint | Supervisor | Taints to apply to the node. |
| folder | string | Standalone | Same as VSPHERE_FOLDER. |
| network | string | Standalone | Same as VSPHERE_NETWORK. |
| nameservers | []string | Standalone | Same as VSPHERE_WORKER_NAMESERVERS. |
| tkgIPFamily | string | Standalone | Same as TKG_IP_FAMILY. |
| resourcePool | string | Standalone | Same as VSPHERE_RESOURCE_POOL. |
| vcIP | string | Standalone | Same as VSPHERE_SERVER. |
| template | string | Standalone | Same as VSPHERE_TEMPLATE. |
| memoryMiB | integer | Standalone | Same as VSPHERE_WORKER_MEM_MIB. |
| diskGiB | integer | Standalone | Same as VSPHERE_WORKER_DISK_GIB. |
| numCPUs | integer | Standalone | Same as VSPHERE_WORKER_NUM_CPUS. |

Assign Workloads to a Node Pool

To assign a workload to a node pool:

  1. In the Kubernetes workload resource or resources that manage your pods, set nodeSelector to the value of labels that you defined in your node pool configuration file. For information about Kubernetes workload resources, see Workloads in the Kubernetes documentation.
  2. Apply your configuration by running the kubectl apply -f command.
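
For example, the following Deployment manifest (with illustrative names and image) schedules its pods onto nodes from a node pool that was created with the label key1: value1:

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: example-app
    spec:
      replicas: 2
      selector:
        matchLabels:
          app: example-app
      template:
        metadata:
          labels:
            app: example-app
        spec:
          nodeSelector:
            key1: value1
          containers:
          - name: example-app
            image: example-registry/example-app:latest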

To reassign a workload to a different node pool:

  1. In the Kubernetes workload resource or resources that manage your pods, update the value of nodeSelector to the new value.
  2. Apply your configuration update by running the kubectl apply -f command.

Update Node Pools

If you only need to change the number of nodes in a node pool, use the Tanzu CLI command in Scale Nodes Only. If you want to add labels, follow the procedure in Add Labels and Scale Nodes.

Caution

With these procedures, do not change existing labels, the availability zone, node instance type (on AWS or Azure), or virtual machine properties (on vSphere) of the node pool. This can have severe negative impacts on running workloads. To change these properties, create a new node pool with these properties and reassign workloads to the new node pool before deleting the original. For instructions, see Assign Workloads to a Node Pool above.

Scale Nodes Only

To change the number of nodes in a node pool, run:

tanzu cluster scale CLUSTER-NAME -p NODE-POOL-NAME -w NODE-COUNT

Where:

  • CLUSTER-NAME is the name of the workload cluster.
  • NODE-POOL-NAME is the name of the node pool.
  • NODE-COUNT is the number of nodes, as an integer, that belong in this node pool.
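
For example, the following command (with illustrative names) scales the tkg-wc-oidc-md-1 node pool in the my-wc cluster to five nodes:

    tanzu cluster scale my-wc -p tkg-wc-oidc-md-1 -w 5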

Add Labels and Scale Nodes

You can add labels to a node pool and scale its nodes at the same time through the node pool configuration file.

  1. Open the configuration file for the node pool you want to update.

  2. If you are increasing or decreasing the number of nodes in this node pool, update the number after replicas.

  3. If you are adding labels, indent them below labels. For example:

    labels:
      key1: value1
      key2: value2
    
  4. Save the node pool configuration file.

  5. In a terminal, run:

    tanzu cluster node-pool set CLUSTER-NAME -f /PATH/TO/CONFIG-FILE
    

    If the CLUSTER-NAME in the command and name in the configuration file match a node pool in the cluster, this command updates the existing node pool instead of creating a new one.
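
For example, to scale the vSphere node pool from the earlier sample to five replicas and add a new label, the top of its updated configuration file might look like the following sketch; the key3 label is illustrative, and the existing vsphere block stays as it was:

    name: tkg-wc-oidc-md-1
    replicas: 5
    labels:
      key1: value1
      key2: value2
      key3: value3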

Delete Node Pools

To delete a node pool, run:

tanzu cluster node-pool delete CLUSTER-NAME -n NODE-POOL-NAME

Where CLUSTER-NAME is the name of the workload cluster and NODE-POOL-NAME is the name of the node pool.

Optionally, use --namespace to specify the namespace of the cluster. The default value is default.
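
For example, the following command (with illustrative names) deletes the tkg-wc-oidc-md-1 node pool from the my-wc cluster in the default namespace:

    tanzu cluster node-pool delete my-wc -n tkg-wc-oidc-md-1 --namespace default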

Caution

Reassign any workloads on these nodes to other node pools before performing this operation. tanzu cluster node-pool delete does not migrate workloads off of nodes before deleting them. For instructions, see Assign Workloads to a Node Pool above.

Deactivate Anti-Affinity Rules (vSphere)

By default, workload clusters on vSphere are deployed following anti-affinity rules that spread the control plane nodes and worker nodes within each node pool across multiple ESXi hosts.

Note

TKG applies anti-affinity rules in the following way:

  • TKG anti-affinity is not visible in the vSphere Client
  • TKG anti-affinity is soft rather than hard anti-affinity
  • TKG anti-affinity does not work between machine-deployment node VMs, so these nodes may end up running on the same ESXi host.

Do the following to deactivate or reactivate the anti-affinity rules during cluster creation:

  1. Set the kubectl context to the management cluster:

    kubectl config use-context MGMT-CLUSTER-NAME-admin@MGMT-CLUSTER-NAME
    

    Where MGMT-CLUSTER-NAME is the name of the cluster.

  2. Run kubectl get deployment on the CAPV controller to collect the args values for its manager container, for example:

    kubectl get deployment -n capv-system capv-controller-manager -o=json | jq '.spec.template.spec.containers[] | select(.name=="manager") | .args'
    [
      "--leader-elect",
      "--logtostderr",
      "--v=4",
      "--feature-gates=NodeAntiAffinity=true,NodeLabeling=true"
    ]
    
  3. With the output copied from the previous step, change the --feature-gates values and pass the arguments list to a kubectl patch command that revises their values in the object. For example, to set the NodeAntiAffinity and NodeLabeling feature gates to false, which deactivates the node anti-affinity rules:

    kubectl patch deployment -n capv-system capv-controller-manager --type=json -p '[{"op":"replace", "path":"/spec/template/spec/containers/0/args", "value": [
    "--leader-elect",
    "--logtostderr",
    "--v=4",
    "--feature-gates=NodeAntiAffinity=false,NodeLabeling=false"
    ]}]'
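
  4. To confirm that the change took effect, you can rerun the command from step 2 and check that the --feature-gates argument now shows the values you set, for example NodeAntiAffinity=false,NodeLabeling=false:

    kubectl get deployment -n capv-system capv-controller-manager -o=json | jq '.spec.template.spec.containers[] | select(.name=="manager") | .args'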
    