VMware Tanzu Kubernetes Grid v2.1 Release Notes

Except where noted, these release notes apply to all v2.1.x patch versions of Tanzu Kubernetes Grid (TKG).

TKG v2.1 is distributed as a downloadable Tanzu CLI package that deploys a versioned TKG standalone management cluster. TKG v2.1 is the first version of TKG that supports creating and managing class-based workload clusters with a standalone management cluster that can run on multiple infrastructures, including vSphere, AWS, and Azure.

Tanzu Kubernetes Grid v2.x and vSphere with Tanzu Supervisor in vSphere 8

Important

The vSphere with Tanzu Supervisor in vSphere 8.0.1c and later runs TKG v2.2. Earlier versions of vSphere 8 run TKG v2.0, which was not released independently of Supervisor. Standalone management clusters that run TKG 2.x are available from TKG 2.1 onwards. Later TKG releases will be embedded in Supervisor in future vSphere update releases. Consequently, the version of TKG that is embedded in the latest vSphere with Tanzu version at a given time might not be the same as the standalone version of TKG that you are using. However, the versions of the Tanzu CLI that are compatible with all TKG v2.x releases are fully supported for use with Supervisor in all releases of vSphere 8.

Tanzu Kubernetes Grid v2.1 and vSphere with Tanzu in vSphere 7

Caution

The versions of the Tanzu CLI that are compatible with TKG 2.x and with the vSphere with Tanzu Supervisor in vSphere 8 are not compatible with the Supervisor Cluster in vSphere 7. To use the Tanzu CLI with a vSphere with Tanzu Supervisor Cluster on vSphere 7, use the Tanzu CLI version from TKG v1.6. To use the versions of the Tanzu CLI that are compatible with TKG 2.x with Supervisor, upgrade to vSphere 8. You can deploy a standalone TKG 2.x management cluster to vSphere 7 if a vSphere with Tanzu Supervisor Cluster is not present. For information about compatibility between the Tanzu CLI and VMware products, see the Tanzu CLI Documentation.

What’s New

Tanzu Kubernetes Grid v2.1.x includes the following new features.

Tanzu Kubernetes Grid v2.1.1

New features in Tanzu Kubernetes Grid v2.1.1:

  • Supports using NSX Advanced Load Balancer v22.1.2 or later on vSphere 8 with a TKG standalone management cluster and its workload clusters.
  • You can install the FIPS version of TKG v2.1.1. For more information, see FIPS-Enabled Versions in VMware Tanzu Compliance.
  • Configuration variables:
    • Machine health checks: MHC_MAX_UNHEALTHY_CONTROL_PLANE and MHC_MAX_UNHEALTHY_WORKER_NODE; see the configuration example after this list. For more information, see Machine Health Checks in Configuration File Variable Reference.
    • Support for tdnf server with custom cert: CUSTOM_TDNF_REPOSITORY_CERTIFICATE (Technical Preview). For more information, see Node Configuration in Configuration File Variable Reference.
    • Support for node level proxy settings: TKG_NODE_SYSTEM_WIDE_PROXY (Technical Preview). For more information, see Proxy Configuration in Configuration File Variable Reference.
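
For example, the new machine health check thresholds above might be set in a cluster configuration file as follows. This is a hedged sketch only; the file name and threshold values are illustrative, not defaults you must use:

cat >> cluster-config.yaml <<'EOF'
MHC_MAX_UNHEALTHY_CONTROL_PLANE: 100%
MHC_MAX_UNHEALTHY_WORKER_NODE: 100%
EOF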

Tanzu Kubernetes Grid v2.1.0

New features in Tanzu Kubernetes Grid v2.1.0:

  • TKG 2.x support: On vSphere, AWS, or Azure with a standalone management cluster, configure and create class-based clusters as described in Workload Cluster Types.
  • You can install the FIPS version of TKG v2.1.0. For more information, see FIPS-Enabled Versions in VMware Tanzu Compliance.
  • Tanzu CLI:
    • package plugin uses kctrl-style commands by default. See tanzu package in the Tanzu CLI Command Reference.
    • isolated-cluster plugin download-bundle and upload-bundle commands retrieve and transfer all container images needed by TKG, as described in Prepare an Internet-Restricted Environment.
    • -A and --all-namespaces options for tanzu cluster list include clusters in all namespaces that are managed by the management cluster and for which the user has View permissions or greater, not just the default namespace.
    • context command group lets users set and manage contexts for the Tanzu CLI, which include the server to target and the kubeconfig to apply. See tanzu context in the Tanzu CLI Command Reference.
      • Future versions will deprecate the tanzu login command in favor of tanzu context commands.
    • Target category for plugins changes CLI behavior and adds functionality reserved for future use, as described in Behavior Changes in Tanzu Kubernetes Grid v2.1, below.
    • auto-apply-generated-clusterclass-based-configuration feature auto-applies the class-based cluster configuration generated by the Tanzu CLI when you pass a legacy cluster configuration file to tanzu cluster create. The feature is set to false by default. See Features in Tanzu CLI Architecture and Configuration.
    • allow-legacy-cluster feature allows you to create plan-based clusters. The feature is set to false by default. See Features in Tanzu CLI Architecture and Configuration.
    • tanzu mc credentials update and tanzu cluster credentials update commands add options for Azure. This includes --azure-client-id, --azure-client-secret, and --azure-tenant-id.
  • The following cluster configuration variables are supported for class-based clusters and standalone management clusters, as described in Configuration File Variable Reference and shown in the example after this list:
    • Node configuration: CONTROL_PLANE_NODE_LABELS, CONTROL_PLANE_NODE_NAMESERVERS, CONTROL_PLANE_NODE_SEARCH_DOMAINS, WORKER_NODE_NAMESERVERS, WORKER_NODE_SEARCH_DOMAINS
    • ExtraArgs property: APISERVER_EXTRA_ARGS, CONTROLPLANE_KUBELET_EXTRA_ARGS, ETCD_EXTRA_ARGS, KUBE_CONTROLLER_MANAGER_EXTRA_ARGS, KUBE_SCHEDULER_EXTRA_ARGS, WORKER_KUBELET_EXTRA_ARGS
    • Rate limiting and synchronization: NTP_SERVERS, APISERVER_EVENT_RATE_LIMIT_CONF_BASE64
  • Clusters can automatically renew control plane node VM certificates; see Control Plane Node Certificate Auto-Renewal.
  • (vSphere) You can deploy multi-OS workload clusters that run both Windows- and Linux-based worker nodes, as described in Deploy a Multi-OS Workload Cluster. In this release, multi-OS workload clusters replace Windows workload clusters. For more information, see Behavior Changes in Tanzu Kubernetes Grid v2.1.
  • (vSphere) Class-based workload clusters can be configured with in-cluster IP Address Management (IPAM) over an allocated IP pool, eliminating the need to configure DHCP reservations when node counts or instances change.
  • (vSphere) Cluster node Machine object labels identify the address of their ESXi host, to support using nodeSelector to run specific workloads on specialized hardware.
  • (vSphere) Ubuntu OVA images use the Unified Extensible Firmware Interface (UEFI) mode for booting, replacing traditional BIOS firmware mode. UEFI mode enables Graphic Processing Unit (GPU) workloads and enhances node security. For more information about UEFI on Ubuntu, see UEFI in Ubuntu documentation.
  • (Azure) Azure Disk CSI driver is automatically installed on newly-created workload clusters.
  • You can use Kube-VIP as an L4 LoadBalancer service for workloads; see Kube-VIP Load Balancer (Technical Preview).
  • You can deploy single-node workload clusters that run both hosted workloads and control plane infrastructure on a single ESXi host, for edge applications as described in Single-Node Clusters on vSphere (Technical Preview).
    • You can deploy minimal single-node clusters based on tiny TKrs that minimize their footprint.
  • You can back up and deploy cluster infrastructure as described in Back Up and Restore Management and Workload Cluster Infrastructure (Technical Preview).
  • Supports Pod Security Admission (PSA) controllers to replace Pod Security Policies, as described in Pod Security Admission Controller (Technical Preview).
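
For example, a hedged sketch of a cluster configuration file that sets some of the node configuration and synchronization variables listed above; the file name and all addresses are illustrative placeholders:

cat >> cluster-config.yaml <<'EOF'
CONTROL_PLANE_NODE_NAMESERVERS: 10.10.10.10,10.10.10.11
WORKER_NODE_NAMESERVERS: 10.10.10.10,10.10.10.11
NTP_SERVERS: time1.example.com,time2.example.com
EOF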

Supported Kubernetes Versions in Tanzu Kubernetes Grid v2.1

Each version of Tanzu Kubernetes Grid adds support for the Kubernetes version of its management cluster, plus additional Kubernetes versions, distributed as Tanzu Kubernetes releases (TKrs).

Any version of Tanzu Kubernetes Grid supports all TKr versions from the previous two minor lines of Kubernetes, except where noted as a Known Issue. For example, TKG v2.1.x supports the Kubernetes versions v1.23.x and v1.22.x listed below, but not v1.21.x or prior.

Tanzu Kubernetes Grid Version  Management Cluster Kubernetes Version  Provided Kubernetes (TKr) Versions
2.1.1                          1.24.10                                1.24.10, 1.23.16, 1.22.17
2.1.0                          1.24.9                                 1.24.9, 1.23.15, 1.22.17
1.6.1                          1.23.10                                1.23.10, 1.22.13, 1.21.14
1.6.0                          1.23.8                                 1.23.8, 1.22.11, 1.21.14
1.5.4                          1.22.9                                 1.22.9, 1.21.11, 1.20.15
1.5.3                          1.22.8                                 1.22.8, 1.21.11, 1.20.15
1.5.2, 1.5.1, 1.5.0            1.22.5                                 1.22.5, 1.21.8, 1.20.14

Product Snapshot for Tanzu Kubernetes Grid v2.1

Tanzu Kubernetes Grid v2.1 supports the following infrastructure platforms and operating systems (OSs), as well as cluster creation and management, networking, storage, authentication, backup and migration, and observability components. The component versions listed in parentheses are included in Tanzu Kubernetes Grid v2.1.1. For more information, see Component Versions.

Infrastructure platform:
  • vSphere: vSphere 6.7U3, vSphere 7, vSphere 8, VMware Cloud on AWS**, Azure VMware Solution, Oracle Cloud VMware Solution (OCVS), Google Cloud VMware Engine (GCVE)
  • AWS: Native AWS
  • Azure: Native Azure

CLI, API, and package infrastructure (all platforms): Tanzu Framework v0.28.1

Cluster creation and management:
  • vSphere: Core Cluster API (v1.2.8), Cluster API Provider vSphere (v1.5.3)
  • AWS: Core Cluster API (v1.2.8), Cluster API Provider AWS (v2.0.2)
  • Azure: Core Cluster API (v1.2.8), Cluster API Provider Azure (v1.6.3)

Kubernetes node OS distributed with TKG:
  • vSphere: Photon OS 3, Ubuntu 20.04
  • AWS: Amazon Linux 2, Ubuntu 20.04
  • Azure: Ubuntu 18.04, Ubuntu 20.04

Build your own image:
  • vSphere: Photon OS 3, Red Hat Enterprise Linux 7*** and 8, Ubuntu 18.04, Ubuntu 20.04, Windows 2019
  • AWS: Amazon Linux 2, Ubuntu 18.04, Ubuntu 20.04
  • Azure: Ubuntu 18.04, Ubuntu 20.04

Container runtime (all platforms): Containerd (v1.6.6)

Container networking (all platforms): Antrea (v1.7.2), Calico (v3.24.1)

Container registry (all platforms): Harbor (v2.6.3)

Ingress:
  • vSphere: NSX Advanced Load Balancer Essentials and Avi Controller**** (v21.1.3-v21.1.6, v22.1.1, v22.1.2), Contour (v1.22.3)
  • AWS: Contour (v1.22.3)
  • Azure: Contour (v1.22.3)

Storage:
  • vSphere: vSphere Container Storage Interface (v2.5.2*) and vSphere Cloud Native Storage
  • AWS: Amazon EBS CSI driver (v1.8.0) and in-tree cloud providers
  • Azure: Azure Disk CSI driver (v1.19.0), Azure File CSI driver (v1.21.0), and in-tree cloud providers

Authentication (all platforms): OIDC via Pinniped (v0.12.1), LDAP via Pinniped (v0.12.1) and Dex

Observability (all platforms): Fluent Bit (v1.9.5), Prometheus (v2.37.0), Grafana (v7.5.17)

Backup and migration (all platforms): Velero (v1.9.5)

* Version of vsphere_csi_driver. For a full list of vSphere Container Storage Interface components included in the Tanzu Kubernetes Grid v2.1 release, see Component Versions.

** For a list of VMware Cloud on AWS SDDC versions that are compatible with this release, see the VMware Product Interoperability Matrix.

*** Tanzu Kubernetes Grid v1.6 is the last release that supports building Red Hat Enterprise Linux 7 images.

**** On vSphere 8, to use NSX Advanced Load Balancer with a TKG standalone management cluster and its workload clusters, you need NSX ALB v22.1.2 or later and TKG v2.1.1 or later.

For a full list of Kubernetes versions that ship with Tanzu Kubernetes Grid v2.1, see Supported Kubernetes Versions in Tanzu Kubernetes Grid v2.1 above.

Component Versions

The Tanzu Kubernetes Grid v2.1.x releases include the following software component versions:

Component TKG v2.1.1 TKG v2.1.0
aad-pod-identity v1.8.13+vmware.2* v1.8.13+vmware.1*
addons-manager v2.1+vmware.1-tkg.3 v2.1+vmware.1-tkg.3
ako-operator v1.7.0+vmware.3 v1.7.0+vmware.3*
alertmanager v0.24.0+vmware.2* v0.24.0+vmware.1
antrea v1.7.2+vmware.1-advanced v1.7.2+vmware.1-advanced*
aws-ebs-csi-driver v1.8.0+vmware.2 v1.8.0+vmware.2*
azuredisk-csi-driver v1.19.0+vmware.1 v1.19.0+vmware.1*
azurefile-csi-driver* v1.21.0+vmware.1 v1.21.0+vmware.1
calico_all v3.24.1+vmware.1 v3.24.1+vmware.1*
capabilities-package v0.28.1-dev-capabilities* v0.28.0-dev-capabilities*
carvel-secretgen-controller v0.11.2+vmware.1 v0.11.2+vmware.1*
cloud-provider-azure v1.1.26+vmware.1,
v1.23.23+vmware.1,
v1.24.10+vmware.1
v1.1.26+vmware.1*,
v1.23.23+vmware.1*,
v1.24.9+vmware.1*
cloud_provider_vsphere v1.24.3+vmware.1 v1.24.3+vmware.1*
cluster-api-provider-azure v1.6.3_vmware.1* v1.6.1_vmware.1*
cluster_api v1.2.8+vmware.1 v1.2.8+vmware.1*
cluster_api_aws v2.0.2+vmware.1 v2.0.2+vmware.1*
cluster_api_vsphere v1.5.3+vmware.1* v1.5.1+vmware.1*
cni_plugins v1.1.1+vmware.18* v1.1.1+vmware.16*
configmap-reload v0.7.1+vmware.2* v0.7.1+vmware.1
containerd v1.6.6+vmware.3* v1.6.6+vmware.1*
contour v1.22.3+vmware.1 v1.22.3+vmware.1*
coredns v1.8.6+vmware.17* v1.8.6+vmware.15*
crash-diagnostics v0.3.7+vmware.6 v0.3.7+vmware.6*
cri_tools v1.23.0+vmware.8* v1.23.0+vmware.7*
csi_attacher v3.5.0+vmware.1,
v3.4.0+vmware.1,
v3.3.0+vmware.1
v3.5.0+vmware.1*,
v3.4.0+vmware.1,
v3.3.0+vmware.1
csi_livenessprobe v2.7.0+vmware.1,
v2.6.0+vmware.1,
v2.5.0+vmware.1,
v2.4.0+vmware.1
v2.7.0+vmware.1*,
v2.6.0+vmware.1,
v2.5.0+vmware.1,
v2.4.0+vmware.1
csi_node_driver_registrar v2.5.1+vmware.1,
v2.5.0+vmware.1,
v2.3.0+vmware.1
v2.5.1+vmware.1,
v2.5.0+vmware.1,
v2.3.0+vmware.1
csi_provisioner v3.2.1+vmware.1,
v3.1.0+vmware.2,
v3.0.0+vmware.1
v3.2.1+vmware.1*,
v3.1.0+vmware.2,
v3.0.0+vmware.1
dex v2.35.3+vmware.2 v2.35.3+vmware.2*
envoy v1.23.3+vmware.2 v1.23.3+vmware.2*
external-dns v0.12.2+vmware.4 v0.12.2+vmware.4*
external-snapshotter v6.0.1+vmware.1,
v5.0.1+vmware.1
v6.0.1+vmware.1,
v5.0.1+vmware.1
etcd v3.5.6+vmware.6* v3.5.6+vmware.3*
fluent-bit v1.9.5+vmware.1 v1.9.5+vmware.1*
gangway v3.2.0+vmware.2 v3.2.0+vmware.2
grafana v7.5.17+vmware.1* v7.5.16+vmware.1
guest-cluster-auth-service v1.2.0* v1.1.0*
harbor v2.6.3+vmware.1 v2.6.3+vmware.1*
image-builder v0.1.13+vmware.2 v0.1.13+vmware.2*
image-builder-resource-bundle v1.24.10+vmware.1-tkg.1* v1.24.9+vmware.1-tkg.1*
imgpkg v0.31.1+vmware.1 v0.31.1+vmware.1*
jetstack_cert-manager v1.10.1+vmware.1 v1.10.1+vmware.1*
k8s-sidecar v1.15.6+vmware.3*,
v1.12.1+vmware.5*
v1.15.6+vmware.2,
v1.12.1+vmware.3*
k14s_kapp v0.53.2+vmware.1 v0.53.2+vmware.1*
k14s_ytt v0.43.1+vmware.1 v0.43.1+vmware.1*
kapp-controller v0.41.5+vmware.1,
v0.38.5+vmware.2
v0.41.5+vmware.1*,
v0.38.5+vmware.2*
kbld v0.35.1+vmware.1 v0.35.1+vmware.1*
kube-state-metrics v2.6.0+vmware.2* v2.6.0+vmware.1*
kube-vip v0.5.7+vmware.1 v0.5.7+vmware.1*
kube-vip-cloud-provider* v0.0.4+vmware.2 v0.0.4+vmware.2
kube_rbac_proxy v0.11.0+vmware.2 v0.11.0+vmware.2
kubernetes v1.24.10+vmware.1* v1.24.9+vmware.1*
kubernetes-csi_external-resizer v1.4.0+vmware.1,
v1.3.0+vmware.1
v1.4.0+vmware.1*,
v1.3.0+vmware.1
kubernetes-sigs_kind v1.24.10+vmware.1-tkg.1_v0.17.0* v1.24.9+vmware.1-tkg.1_v0.17.0*
kubernetes_autoscaler v1.24.0+vmware.1 v1.24.0+vmware.1*
load-balancer-and-ingress-service (AKO) v1.8.2+vmware.1 v1.8.2+vmware.1*
metrics-server v0.6.2+vmware.1 v0.6.2+vmware.1*
multus-cni v3.8.0+vmware.2 v3.8.0+vmware.2*
pinniped v0.12.1+vmware.1-tkg.1 v0.12.1+vmware.1-tkg.1
pinniped-post-deploy v0.12.1+vmware.2-tkg.3 v0.12.1+vmware.2-tkg.3*
prometheus v2.37.0+vmware.2* v2.37.0+vmware.1*
prometheus_node_exporter v1.4.0+vmware.2* v1.4.0+vmware.1*
pushgateway v1.4.3+vmware.2* v1.4.3+vmware.1
sonobuoy v0.56.13+vmware.1 v0.56.13+vmware.1*
standalone-plugins-package v0.28.1-dev-standalone-plugins* v0.28.1-dev-standalone-plugins*
tanzu-framework v0.28.1* v0.28.0*
tanzu-framework-addons v0.28.1* v0.28.0*
tanzu-framework-management-packages v0.28.1-tf* v0.28.0-tf*
tkg-bom v2.1.1* v2.1.0*
tkg-core-packages v1.24.10+vmware.1-tkg.1* v1.24.9+vmware.1-tkg.1*
tkg-standard-packages v2.1.1* v2.1.0*
tkg-storageclass-package v0.28.1-tkg-storageclass* v0.28.0-tkg-storageclass*
tkg_telemetry v2.1.1+vmware.1* v2.1.0+vmware.1*
velero v1.9.5+vmware.1 v1.9.5+vmware.1*
velero-mgmt-cluster-plugin* v0.1.0+vmware.1 v0.1.0+vmware.1
velero-plugin-for-aws v1.5.3+vmware.1 v1.5.3+vmware.1*
velero-plugin-for-csi v0.3.3+vmware.1 v0.3.3+vmware.1*
velero-plugin-for-microsoft-azure v1.5.3+vmware.1 v1.5.3+vmware.1*
velero-plugin-for-vsphere v1.4.2+vmware.1 v1.4.2+vmware.1*
vendir v0.30.1+vmware.1 v0.30.1+vmware.1*
vsphere_csi_driver v2.6.2+vmware.2 v2.6.2+vmware.2*
whereabouts v0.5.4+vmware.1 v0.5.4+vmware.1*

* Indicates a new component or version bump since the previous release. TKG v2.1.0 is previous to v2.1.1, and v1.6.1 is previous to v2.1.0.

For a complete list of software component versions that ship with TKG v2.1, use imgpkg to pull the repository bundle and then list its contents. For TKG v2.1.1, for example:

imgpkg pull -b projects.registry.vmware.com/tkg/packages/standard/repo:v2.1.1 -o standard-2.1.1
cd standard-2.1.1/packages
tree

Local BOM files such as the following also list package versions, but may not be current:

  • ~/.config/tanzu/tkg/bom/tkg-bom-v2.1.yaml
  • ~/.config/tanzu/tkg/bom/tkr-bom-v1.24.10+vmware.1-tkg.1.yaml

Supported Upgrade Paths

In the TKG upgrade path, v2.1 immediately follows v1.6. TKG v2.0 is not a downloadable version of TKG: it is the version of TKG that is embedded in the vSphere with Tanzu Supervisor in vSphere 8.

You can only upgrade to Tanzu Kubernetes Grid v2.1.x from v1.6.x. If you want to upgrade to Tanzu Kubernetes Grid v2.1.x from a version earlier than v1.6.x, you must upgrade to v1.6.x first.

When upgrading Kubernetes versions on workload clusters, you cannot skip minor versions. For example, you cannot upgrade a Tanzu Kubernetes cluster directly from v1.21.x to v1.23.x. You must upgrade a v1.21.x cluster to v1.22.x before upgrading the cluster to v1.23.x.
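
For example, a hedged sketch of stepping a workload cluster from Kubernetes v1.21.x to v1.23.x one minor version at a time; the cluster name and TKr names are placeholders, so list the available TKrs first:

tanzu kubernetes-release get
tanzu cluster upgrade my-cluster --tkr TKR-FOR-v1.22.x
tanzu cluster upgrade my-cluster --tkr TKR-FOR-v1.23.x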

Release Dates

Tanzu Kubernetes Grid v2.1 release dates are:

  • v2.1.0: January 29, 2023
  • v2.1.1: March 21, 2023

Behavior Changes in Tanzu Kubernetes Grid v2.1

Tanzu Kubernetes Grid v2.1 introduces the following new behaviors compared with v1.6.1, which is the latest previous release.

  • The --include-management-cluster option to tanzu cluster list requires the -A option to list a standalone management cluster. With the -A option, the command lists clusters in all namespaces.
  • The Tanzu CLI package plugin uses kctrl-style commands by default. See tanzu package with kctrl in the Tanzu CLI Command Reference.

    • In TKG v1.6, the package plugin ran by default with kctrl mode deactivated, called legacy mode below.
    • kctrl mode and legacy mode commands differ as follows:

      • To create a default values file for package configuration, kctrl-style tanzu package available get commands use the flag --generate-default-values-file instead of --default-values-file-output.
      • The --create-namespace flag is removed. If you use -n or --namespace to specify a target namespace, the namespace must already exist.
      • The --create flag is removed for package repository update.
      • The --package-name flag is renamed to --package for package installed create and package install.
      • The --install flag is removed for package installed update.
      • The --verbose global flag is removed.
      • The --poll-interval and --poll-timeout flags are renamed to --wait-interval and --wait-timeout.
      • In package available get output, an additional table lists available versions for the package.
      • In package available list output, the LATEST-VERSION column is removed and SHORT-DESCRIPTION is not displayed by default; use the --wide flag to display it.
      • In package repository list output, REPOSITORY and TAG columns are replaced by a SOURCE column that consists of source type (e.g. imgpkg), repository URL, and tag.
    • See the topics under CLI-Managed Packages in the TKG v1.6 documentation for how the tanzu package plugin works with kctrl mode deactivated.
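
    For example, a hedged before-and-after sketch of two of the flag changes listed above; the package name and version are placeholders:

      # TKG v1.6, kctrl mode deactivated (legacy mode):
      tanzu package available get PACKAGE-NAME.tanzu.vmware.com/PACKAGE-VERSION --default-values-file-output values.yaml
      # TKG v2.1, kctrl mode:
      tanzu package available get PACKAGE-NAME.tanzu.vmware.com/PACKAGE-VERSION --generate-default-values-file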

  • The tanzu-standard package repository is not pre-installed on class-based clusters. To add the package repository, see Add a Package Repository.
  • The Tanzu CLI management cluster creation process no longer supports creating a new VPC. The installer interface does not include an option to create a new VPC, and cluster configuration files no longer support the AWS_* options for creating a new VPC for a specified CIDR. If you want to use a new VPC, before deploying a standalone management cluster on AWS, you must create a VPC for your TKG deployment using the AWS Console.
  • The Tanzu CLI uses a new Targets abstraction to associate different command groups with the type of server that the commands apply to. The tanzu context list command refers to the same concept as context type, with the --target flag. Because command groups are based on CLI plugins:

    • Plugins that define commands for Kubernetes clusters have the Target k8s
    • Plugins that define commands for Tanzu Mission Control (TMC) have the Target tmc, which is reserved for future use
    • Plugins that define context-independent commands have no Target
    • Identically-named plugins for different targets let the Tanzu CLI tailor commands under command groups such as tanzu cluster to fit the context.

    In TKG v2.1, the only supported Target or context type is k8s, which is also indicated by:

    • Kubernetes cluster operations in the output of tanzu help commands
    • kubernetes in the TARGET column in the output of tanzu plugin list
  • VMware does not recommend deploying Windows workload clusters that have only Windows-based workers as described in the TKG v1.6 documentation. Instead, VMware recommends creating multi-OS clusters as described in Deploy a Multi-OS Workload Cluster. Multi-OS clusters can run both Windows and Linux containers, thus supporting both Linux-based TKG components and Linux workloads.
  • Because of certificate-handling changes in Go v1.18, MacOS bootstrap machines need the vCenter certificate added to their keychains before they can run tanzu cluster create with thumbprint verification; see Prerequisites for Cluster Deployment.
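
    One possible way to add the certificate, as a hedged sketch rather than the documented procedure; the vCenter address and file name are placeholders:

      # Retrieve the vCenter certificate, then trust it in the System keychain:
      openssl s_client -connect VCENTER-ADDRESS:443 </dev/null 2>/dev/null | openssl x509 > vcenter.crt
      sudo security add-trusted-cert -d -r trustRoot -k /Library/Keychains/System.keychain vcenter.crt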

User Documentation

A new publication, Deploying and Managing TKG 2.1 Standalone Management Clusters, includes topics specific to standalone management clusters that are not relevant to using TKG with a vSphere with Tanzu Supervisor.

For more information, see Find the Right TKG Docs for Your Deployment on the VMware Tanzu Kubernetes Grid Documentation page.

Resolved Issues

Resolved in v2.1.1

The following issues that were documented as Known Issues in Tanzu Kubernetes Grid v2.1.0 are resolved in Tanzu Kubernetes Grid v2.1.1.

  • Cannot deploy a custom CNI

    The CNI:none option does not work on workload clusters deployed from a standalone management cluster. The only available choices are antrea (default) and calico.

  • TKG user account creates idle vCenter sessions

    The vSphere account for TKG creates idle vCenter sessions, as listed in vSphere > Hosts and Clusters inventory > your vCenter > Monitor tab > Sessions.

    Workaround: Remove idle vCenter sessions by starting and stopping all sessions:

    1. ssh in to vCenter as root
    2. If prompted, type shell
    3. Run service-control --stop --all
    4. Wait for services to show as Stopped
    5. Run service-control --start --all
  • LoadBalancer services for class-based workload clusters on Azure need manual gateway or frontend configuration

    Because of a name mismatch between AzureClusterName and ClusterName, services of type LoadBalancer that are deployed for use by apps on class-based Azure workload clusters are not accessible from the internet.

    Workaround: Provide your own route for the load balancer service, for example via a NAT gateway, proxy or other internal routing, to allow nodes behind the load balancer to access the internet.

    VMware recommends using a NAT gateway, if available, for outbound connectivity. If a NAT gateway is not available:

    1. From the Azure portal, navigate to the LoadBalancer resource that is created by CAPZ, which should have the same name as that of the AzureCluster.
    2. Select Frontend IP configuration, click Add.
    3. Create a new public IP address for the load balancer service.
    4. Configure and create the service using the spec below, setting the loadBalancerIP value to the public IP address where indicated by IP-ADDRESS:

        apiVersion: v1
        kind: Service
        metadata:
          name: frontend
          labels:
            app: guestbook
            tier: frontend
          namespace: sample-app
        spec:
          # Add the frontend public IP here
          loadBalancerIP: IP-ADDRESS
          type: LoadBalancer
          ports:
          - port: 80
          selector:
            app: guestbook
            tier: frontend
      
  • Upgrading clusters does not update Kube-VIP version

    Upgrading standalone management and workload clusters to v2.1 does not upgrade their kube-vip to the current version.

    Workaround: For upgraded clusters that use Kube-VIP for their control plane endpoint, as configured with AVI_CONTROL_PLANE_HA_PROVIDER = false, update the kube-vip component:

    1. Retrieve the current TKr BoM file used for the cluster upgrade. Find a local copy of this file in ~/.config/tanzu/tkg/bom/ with a filename starting with tkr-. For example, tkr-bom-v1.24.10+vmware.1-tkg.1.yaml.

    2. List the current kube-vip version from the BoM file, for example:

      $ cat ~/.config/tanzu/tkg/bom/tkr-bom-v1.24.10+vmware.1-tkg.1.yaml | yq '.components.kube-vip'
      - version: v0.5.7+vmware.1
        images:
          kubeVipImage:
            imagePath: kube-vip
            tag: v0.5.7_vmware.1
      
    3. Get the kcp object for the cluster. The name of this object has the form CLUSTER-NAME-control-plane.

      • Management cluster objects are created in the tkg-system namespace.
      • Workload cluster objects are in the namespace used for cluster creation or default if NAMESPACE was not set.
    4. Run kubectl edit to edit the kcp object and update the path of the kube-vip to match the current version from the BoM image. Find the location of this setting by running:

      kubectl get kcp <cluster-name>-control-plane -o jsonpath='{.spec.kubeadmConfigSpec.files[0]}' | jq
      
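    Putting steps 3 and 4 together for a workload cluster in the default namespace, a hedged sketch; the cluster name is a placeholder:

      kubectl get kcp -n default                                  # find CLUSTER-NAME-control-plane
      kubectl get kcp CLUSTER-NAME-control-plane -n default -o jsonpath='{.spec.kubeadmConfigSpec.files[0]}' | jq
      kubectl edit kcp CLUSTER-NAME-control-plane -n default      # update the kube-vip image tag to the version from the BoM file
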
  • Upgrading management clusters from v1.5.x to v2.1.0 causes node network error due to null avi_ingress_node_network_list in AKO operator secret

    With standalone management clusters originally created in TKG v1.5 or earlier, upgrading to v2.1.0 sets a null value for avi_ingress_node_network_list in the AKO operator secret. This causes a node network error when upgrading to v2.1.0 and generates missing Avi configuration errors in the logs.

    Workaround: After upgrading Tanzu CLI to v2.1.0 but before running tanzu mc upgrade:

    1. Switch to the management cluster context:

      kubectl config use-context <MGMT-CLUSTER>-admin@<MGMT-CLUSTER>
      
    2. Retrieve the AKO operator secret and decode its data values:

      kubectl get secret <MGMT-CLUSTER>-ako-operator-addon -n tkg-system -o jsonpath="{.data.values\.yaml}" | base64 --decode > values.yaml
      
    3. Open the values.yaml file with a text editor. The avi_ingress_node_network_list setting will look like this:

      avi_ingress_node_network_list: '""'
      
    4. Change the setting to something like the following, using the address range of your cluster node network:

      avi_ingress_node_network_list: '[{"networkName":"VM Network", "cidrs":["10.191.176.0/20"]}]'
      
    5. base64-encode the new data values and record the output string:

      base64 -w 0 values.yaml
      
    6. Edit the AKO operator secret:

      kubectl edit secret <MGMT-CLUSTER>-ako-operator-addon -n tkg-system
      
    7. Paste in the new, encoded data values string as the value of values.yaml in the secret. Save and exit.
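
    Alternatively, a hedged sketch that patches the secret in one step instead of editing it interactively, mirroring the kubectl patch approach used elsewhere in these notes:

      kubectl patch secret <MGMT-CLUSTER>-ako-operator-addon -n tkg-system --type merge -p '{"data": {"values.yaml": "'"$(base64 -w 0 values.yaml)"'"}}'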

  • TMC cannot deploy class-based clusters with service engines not in Default-Group SEG.

    Tanzu Mission Control, which integrates with TKG, cannot deploy new class-based clusters that use NSX ALB and are configured with service engines not in the Default-Group Service Engine Group in NSX ALB. This limitation does not affect upgrading existing workload clusters configured with custom service engines.

    For more information, see the Tanzu Mission Control Release Notes.

  • TMC Catalog not supported for listing and deploying packages

    You cannot use the Catalog feature of Tanzu Mission Control (TMC) to list or install packages to TKG v2.1 workload clusters as described in View Packages in the TMC documentation. The TMC UI will show the package repository stuck in a reconciling state.

Resolved in v2.1.0

The following issues that were documented as Known Issues in Tanzu Kubernetes Grid v1.6.1 are resolved in Tanzu Kubernetes Grid v2.1.0.

  • Cluster and pod operations that delete pods may fail if DaemonSet configured to auto-restore persistent volumes

    In installations where a DaemonSet uses persistent volumes (PVs), machine deletion may fail because the drain process ignores DaemonSets by default and the system waits indefinitely for the volumes to be detached from the node. Affected cluster operations include upgrade, scale down, and delete.

  • On vSphere with Tanzu, tanzu cluster list generates error for DevOps users

    When a user with the DevOps engineer role, as described in vSphere with Tanzu User Roles and Workflows, runs tanzu cluster list, they may see an error resembling Error: unable to retrieve combined cluster info: unable to get list of clusters. User cannot list resource "clusters" at the cluster scope.

    This happens because the tanzu cluster command without a -n option attempts to access all namespaces, some of which may not be accessible to a DevOps engineer user.

    Workaround: When running tanzu cluster list, include a --namespace value to specify a namespace that the user can access.

Known Issues

The following are known issues in Tanzu Kubernetes Grid v2.1.x. Any known issues that were present in v2.1.1 that have been resolved in a subsequent v2.1.x patch release are listed under the Resolved Issues for the patch release in which they were fixed.

Upgrade

Known in v2.1.1

The following are known upgrade issues in v2.1.1.

  • Upgrading from v2.1 to v2.1.1 on vSphere fails

    On vSphere, upgrading from v2.1 to v2.1.1 fails with the error Reconcile failed:Error. The error occurs because the tkg-clusterclass-vsphere package fails to reconcile, which blocks the installation.

    Workaround: Unset the following vSphere resource variables if they are set in the local environment:

    unset VSPHERE_CLONE_MODE
    unset VSPHERE_DATACENTER
    unset VSPHERE_DATASTORE
    unset VSPHERE_FOLDER
    unset VSPHERE_NETWORK
    unset VSPHERE_RESOURCE_POOL
    unset VSPHERE_SERVER
    unset VSPHERE_STORAGE_POLICY_ID
    unset VSPHERE_TEMPLATE
    unset VSPHERE_WORKER_DISK_GIB
    unset VSPHERE_WORKER_MEM_MIB
    unset VSPHERE_WORKER_NUM_CPUS
    

Known in v2.1.x

The following are known upgrade issues in v2.1.x.

  • You cannot upgrade multi-OS clusters

    You cannot use the tanzu cluster upgrade command to upgrade clusters with Windows worker nodes as described in Deploy a Multi-OS Workload Cluster.

  • Upgrading clusters on Azure fails

    On Azure, upgrading management clusters and workload clusters fails with errors such as context deadline exceeded or unable to upgrade management cluster: error waiting for kubernetes version update for kubeadm control plane. This happens because operations on Azure sometimes take longer than on other platforms.

    Workaround: Run the tanzu management-cluster upgrade or tanzu cluster upgrade command again, specifying a longer timeout in the --timeout flag. The default timeout is 30m0s.
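
    For example, a hedged sketch with an illustrative timeout value:

      tanzu management-cluster upgrade --timeout 60m0s
      # or, for a workload cluster:
      tanzu cluster upgrade CLUSTER-NAME --timeout 60m0s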

  • Upgrade fails for standalone management clusters originally created in TKG v1.3 or earlier

    In TKG v2.1, the components that turn a generic cluster into a TKG standalone management cluster are packaged in a Carvel package tkg-pkg. Standalone management clusters that were originally created in TKG v1.3 or earlier lack a configuration secret that the upgrade process requires in order to install tkg-pkg, causing upgrade to fail.

    Workaround: Perform the additional steps listed in Upgrade Standalone Management Clusters for standalone management clusters created in TKG v1.3 or earlier.

  • Upgrade fails for clusters created with the wildcard character (*) in TKG_NO_PROXY setting

    TKG v1.6 does not allow the wildcard character (*) in cluster configuration file settings for TKG_NO_PROXY. Clusters created by previous TKG versions with this setting require special handling before upgrading, in order to avoid the error workload cluster configuration validation failed: invalid string '*' in TKG_NO_PROXY.

    Workaround: Depending on the type of cluster you are upgrading:

    • Management cluster:

      1. Switch to management cluster kubectl context.
      2. Edit the configMap kapp-controller-config:

        kubectl edit cm kapp-controller-config -n tkg-system
        
      3. Find the data.noProxy field and change its wildcard hostname by removing *. For example, change *.vmware.com to .vmware.com

      4. Save and exit. The cluster is ready to upgrade.

    • Workload cluster:

      1. Switch to workload cluster kubectl context
      2. Set environment variables for your cluster name and namespace, for example:

        CLUSTER_NAME=my-test-cluster
        NS=my-test-namespace
        
      3. Obtain and decode the kapp controller data values for the workload cluster:

        kubectl get secret "${CLUSTER_NAME}-kapp-controller-data-values" -n $NS -o json | jq -r '.data."values.yaml"' | base64 -d > "${CLUSTER_NAME}-${NS}-kapp-controller-data-values"
        
      4. Edit the ${CLUSTER_NAME}-${NS}-kapp-controller-data-values file by removing * from its kappController.config.noProxy setting. For example, change *.vmware.com to .vmware.com.

      5. Save and quit.
      6. Re-encode the data values file ${CLUSTER_NAME}-${NS}-kapp-controller-data-values:

        cat "${CLUSTER_NAME}-${NS}-kapp-controller-data-values" | base64 -w 0
        
      7. Edit the ${CLUSTER_NAME}-${NS}-kapp-controller-data-values secret and update its data.values.yaml setting by pasting in the newly-encoded data values string.

        kubectl edit secret "${CLUSTER_NAME}-kapp-controller-data-values" -n "${NS}"
        
      8. Save and exit. The cluster is ready to upgrade.

  • Older version TKrs are not available immediately after standalone management cluster upgrade

    Upgrading a standalone management cluster from TKG v1.6 to v2.1 replaces the TKr source controller with a newer version that supports class-based clusters, and then re-synchronizes TKrs. As a result, once the tanzu mc upgrade command completes, tanzu cluster available-upgrades get and tanzu kubernetes-release get may not show all valid TKr versions, and the Tanzu CLI may not be able to immediately upgrade workload clusters.

    Workaround: Wait a few minutes for TKrs to re-download.

  • Before upgrade, you must manually update a changed Avi certificate in the tkg-system package and ako-operator-addon secret values

    Management cluster upgrade fails if you have rotated an Avi Controller certificate, even if you have updated its value in the management cluster’s secret/avi-controller-ca as described in Modify the Avi Controller Credentials.

    Failure occurs because updating secret/avi-controller-ca does not copy the new value into the management cluster’s tkg-system package and ako-operator-addon secret values, and TKG uses the certificate value from those secrets during upgrade.

    Workaround: Before upgrading TKG, check if the Avi certificate in tkg-pkg-tkg-system-values is up-to-date, and patch it if needed:

    1. In the management cluster context, get the certificate from avi-controller-ca:
      kubectl get secret avi-controller-ca -n tkg-system-networking -o jsonpath="{.data.certificateAuthorityData}"
      
    2. In the tkg-pkg-tkg-system-values secret, get and decode the package values string:
      kubectl get secret tkg-pkg-tkg-system-values -n tkg-system -o jsonpath="{.data.tkgpackagevalues\.yaml}" | base64 --decode
      
    3. In the decoded package values, check the value for avi_ca_data_b64 under akoOperatorPackage.akoOperator.config. If it differs from the avi-controller-ca value, update tkg-pkg-tkg-system-values and ako-operator-addon with the new value:

      1. In a copy of the decoded package values string, paste in the new certificate from avi-controller-ca as the avi_ca_data_b64 value under akoOperatorPackage.akoOperator.config.
      2. Run base64 to re-encode the entire package values string.
      3. Patch the tkg-pkg-tkg-system-values secret with the new, encoded string:
        kubectl patch secret/tkg-pkg-tkg-system-values -n tkg-system -p '{"data": {"tkgpackagevalues.yaml": "BASE64-ENCODED STRING"}}'
        
      4. In the ako-operator-addon secret, get and decode the values string:
        kubectl get secret MANAGEMENT-CLUSTER-NAME-ako-operator-addon -n tkg-system -o jsonpath="{.data.values\.yaml}" | base64 --decode
        
      5. In a copy of the decoded values string, paste in the new certificate from avi-controller-ca as the avi_ca_data_b64 value.
      6. Run base64 to re-encode the entire ako-operator-addon values string.
      7. Patch the ako-operator-addon secret with the new, encoded string:
        kubectl patch secret/MGMT-CLUSTER-NAME-ako-operator-addon -n tkg-system -p '{"data": {"values.yaml": "BASE64-ENCODED STRING"}}'
        

Packages

  • Configuration update requires upgrade for some packages

    Known Issue In: v2.1.1

    The Tanzu Standard package repository for TKG v2.1.1 lacks the following package versions that are in the v2.1.0 repository:

    • cert-manager: 1.10.1+vmware.1-tkg.1.yml, 1.5.3+vmware.7-tkg.1.yml, and 1.7.2+vmware.3-tkg.1.yml
    • external-dns: 0.10.0+vmware.1-tkg.3.yml, 0.11.0+vmware.1-tkg.3.yml, and 0.12.2+vmware.4-tkg.1.yml
    • grafana: 7.5.16+vmware.1-tkg.2.yml

    Because of this, after you upgrade a workload cluster from TKG v2.1.0 to v2.1.1, you cannot run tanzu package installed update to update these packages’ configurations without also upgrading the packages to the latest versions:

    • cert-manager: 1.10.1+vmware.1-tkg.2.yml
    • external-dns: 0.12.2+vmware.4-tkg.2.yml
    • grafana: 7.5.17+vmware.1-tkg.1.yml

    This issue only arises if you need to change package configurations; the installed packages continue to run without upgrading.

    Workaround: Do either one of the following:

    • If you need to update your cert-manager, external-dns, or grafana package configuration:

      1. Run tanzu package installed get to retrieve the version of the package.
      2. As listed above, if the v2.1.1 repo lacks the installed package version, pass the latest version to the -v flag when running tanzu package installed update.
    • After upgrading workload clusters to TKG v2.1.1, update the three packages to the versions above.

    For tanzu package commands, see Install and Manage Packages.
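
    For example, a hedged sketch for cert-manager; the installed package name and namespace are placeholders for your installation:

      tanzu package installed get INSTALLED-PACKAGE-NAME -n PACKAGE-NAMESPACE
      tanzu package installed update INSTALLED-PACKAGE-NAME -v 1.10.1+vmware.1-tkg.2 -n PACKAGE-NAMESPACE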

  • Multus CNI fails on medium and smaller pods with NSX Advanced Load Balancer

    On vSphere, workload clusters with medium or smaller worker nodes running the Multus CNI package with NSX ALB can fail with Insufficient CPU or other errors.

    Workaround: To use Multus CNI with NSX ALB, deploy workload clusters with worker nodes of size large or extra-large.

  • TKG BoM file contains extraneous cert-manager package version

    The TKG Bill of Materials (BoM) file that the Tanzu CLI installs into ~/.config/tanzu/tkg lists both v1.5.3 and v1.7.2 versions for the cert-manager (jetstack_cert-manager) package. The correct version to install is v1.5.3, as described in Install cert-manager.

    Workaround: Install v1.5.3 of cert-manager.

  • Deactivating Pinniped requires manual Secret delete on legacy clusters

    When you deactivate external identity management on a management cluster, the unused Pinniped Secret object remains present on legacy workload clusters.

    If a user then tries to access the cluster using an old kubeconfig, a login popup will appear and fail.

    Workaround: Manually delete the legacy cluster’s Pinniped Secret as described in Deactivate Identity Management.

  • Harbor CVE export may fail when execution ID exceeds 1000000

    Harbor v2.6.3, which is the version packaged for TKG v2.1, has a known issue in which CVE reports fail to export with the error “404 page not found” when the execution primary key auto-increment ID grows to 1000000 or higher.

    This Harbor issue is resolved in later versions of Harbor that are slated for inclusion in later versions of TKG.

  • No Harbor proxy cache support

    You cannot use Harbor’s proxy cache feature for running Tanzu Kubernetes Grid v2.1 in an internet-restricted environment. You can still use a Harbor proxy cache to proxy images from prior versions of Tanzu Kubernetes Grid, and non-Tanzu images such as application images.

    Workaround: None

  • Packages do not comply with default baseline PSA profile

    With PSA controllers on TKG, in unsupported Technical Preview state, some TKG packages do not comply with the default baseline profile.

    Workaround: Set the audit=privileged and warn=privileged label in affected package namespaces as described in Pod Security Admission Controller (Technical Preview).
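
    For example, a hedged sketch using the standard Pod Security Admission namespace labels; the namespace name is a placeholder:

      kubectl label namespace PACKAGE-NAMESPACE pod-security.kubernetes.io/audit=privileged pod-security.kubernetes.io/warn=privileged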

  • Adding standard repo fails for single-node clusters

    Running tanzu package repository add to add the tanzu-standard repo to a single-node cluster of the type described in Single-Node Clusters on vSphere (Technical Preview) may fail.

    This happens because single-node clusters boot up with cert-manager as a core add-on, which conflicts with the different cert-manager package in the tanzu-standard repo.

    Workaround: Before adding the tanzu-standard repo, patch the cert-manager package annotations as described in Install cert-manager.

Cluster Operations

Known in v2.1.1

The following are known cluster operations issues in v2.1.1.

  • Cannot create new workload clusters based on non-current TKr versions with Antrea CNI

    You cannot create a new workload cluster that uses Antrea CNI and runs Kubernetes versions shipped with prior versions of TKG, such as Kubernetes v1.23.10, the default Kubernetes version in TKG v1.6.1, as listed in Supported Kubernetes Versions in Tanzu Kubernetes Grid v2.1.

    TKG v2.1.1 fully supports existing clusters that run older versions of Kubernetes.

    Workaround: Create a workload cluster that runs Kubernetes 1.24.10, 1.23.16, or 1.22.17. The Kubernetes project recommends that you run components on the most recent patch version of any current minor version.

Known in v2.1

  • tanzu cluster create does not correctly validate generated cluster specs with non-default Kubernetes versions

    When you create a class-based workload cluster from a configuration file using one of the two-step processes described in Create a Class-Based Cluster and specify a --tkr value in the first step to base the cluster on a non-default version of Kubernetes, the second step may fail with validation errors.

    Workaround: In the second step, when you run tanzu cluster create a second time and pass in the generated cluster manifest, specify the same --tkr values and other options that you did in the first step, as described in Create a Class-Based Cluster.
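
    A hedged sketch of the two-step flow with a pinned TKr; the cluster name, file paths, and TKr name are placeholders:

      tanzu cluster create my-cluster --file legacy-config.yaml --tkr TKR-NAME --dry-run > my-cluster-spec.yaml
      tanzu cluster create --file my-cluster-spec.yaml --tkr TKR-NAME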

  • Autoscaler for class-based clusters requires manual annotations

    Due to a label propagation issue in Cluster API, AUTOSCALER_MIN_SIZE_* and AUTOSCALER_MAX_SIZE_* settings in the cluster configuration file for class-based workload clusters are not set in the cluster’s MachineDeployment objects.

    Workaround: After creating a class-based workload cluster with Cluster Autoscaler enabled, manually add the min- and max- machine count setting for each AZ as described in Manually Add Min and Max Size Annotations.
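
    A hedged sketch, assuming the upstream Cluster API autoscaler annotations; the MachineDeployment name, namespace, and counts are placeholders:

      kubectl annotate machinedeployment MD-NAME -n NAMESPACE cluster.x-k8s.io/cluster-api-autoscaler-node-group-min-size="1" cluster.x-k8s.io/cluster-api-autoscaler-node-group-max-size="5"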

  • Node pool labels and other configuration properties cannot be changed

    You cannot add to or otherwise change an existing node pool’s labels, az, nodeMachineType or vSphere properties, as listed in Configuration Properties.

    Workaround: Create a new node pool in the cluster with the desired properties, migrate workloads to the new node pool, and delete the original.

  • You cannot scale management cluster control plane nodes to an even number

    If you run tanzu cluster scale on a management cluster and pass an even number to the --controlplane-machine-count option, TKG does not scale the control plane nodes, and the CLI does not output an error. To maintain quorum, control plane node counts should always be odd.

    Workaround: Do not scale control plane node counts to an even number.

  • Class-based cluster names have 25 character limit with NSX ALB as load balancer service or ingress controller

    When NSX Advanced Load Balancer (ALB) is used as a class-based cluster’s load balancer service or ingress controller with a standalone management cluster, its application names include both the cluster name and load-balancer-and-ingress-service, the internal name for the AKO package. When the combined name exceeds the 64-character limit for Avi Controller apps, the tanzu cluster create command may fail with an error that the avi-system namespace was not found.

    Workaround: Limit class-based cluster name length to 25 characters or less when using NSX ALB as a load balancer or ingress controller.

  • Orphan vSphereMachine objects after cluster upgrade or scale

    Due to a known issue in the cluster-api-provider-vsphere (CAPV) project, standalone management clusters on vSphere may leave orphaned VSphereMachine objects behind after cluster upgrade or scale operations.

    This issue is fixed in newer versions of CAPV, which future patch versions of TKG will incorporate.

    Workaround: To find and delete orphaned CAPV VM objects:

    1. List all VSphereMachine objects and identify the orphaned ones, which do not have any PROVIDERID value:
      kubectl get vspheremachines -A
      
    2. For each orphaned VSphereMachine object:

      1. List the object and retrieve its machine ID:
        kubectl get vspheremachines VSPHEREMACHINE -n NAMESPACE -o yaml
        

        Where VSPHEREMACHINE is the machine NAME and NAMESPACE is its namespace.

      2. Check if the VSphereMachine has an associated Machine object:
        kubectl get machines -n NAMESPACE |  grep MACHINE-ID
        

        Run kubectl delete machine to delete any Machine object associated with the VSphereMachine.

      3. Delete the VSphereMachine object:
        kubectl delete vspheremachines VSPHEREMACHINE -n NAMESPACE
        
      4. From vCenter, check if the VSphereMachine VM still appears; it may be present, but powered off. If so, delete it in vCenter.
      5. If the deletion hangs, patch its finalizer:
        kubectl patch vspheremachines VSPHEREMACHINE -n NAMESPACE -p '{"metadata": {"finalizers": null}}' --type=merge
        

Networking

Note

For v4.0+, VMware NSX-T Data Center is renamed to “VMware NSX.”

  • Creating a ClusterClass config file from a legacy config file and --dry-run includes empty Antrea configuration

    Creating a ClusterClass config file by using tanzu cluster create --dry-run -f with a legacy config file that includes an ANTREA_NODEPORTLOCAL entry results in an autogenerated Antrea configuration that does not include any labels, which causes Antrea not to reconcile successfully. This happens because in TKG 2.1.1, AntreaConfig resources need the tkg.tanzu.vmware.com/package-name label for the Add-on manager to install Antrea in the designated workload cluster. This issue does not apply to 2.1.0.

    Workaround: Add the missing labels to the AntreaConfig in the ClusterClass config file and attempt to create the cluster again:

    labels:
      tkg.tanzu.vmware.com/cluster-name: rito
      tkg.tanzu.vmware.com/package-name: antrea.tanzu.vmware.com.1.7.2---vmware.1-tkg.1-advanced
    
  • IPv6 networking is not supported on vSphere 8

    TKG v2.1 does not support IPv6 networking on vSphere 8, although it supports single-stack IPv6 networking using Kube-Vip on vSphere 7 as described in IPv6 Networking.

    Workaround: If you need or are currently using TKG in an IPv6 environment on vSphere, do not install or upgrade to vSphere 8.

  • NSX ALB NodePortLocal ingress mode is not supported for management cluster

    In TKG v2.1, you cannot run NSX Advanced Load Balancer (ALB) as a service type with ingress mode NodePortLocal for traffic to the management cluster.

    This issue does not affect support for NodePortLocal ingress to workload clusters, as described in L7 Ingress in NodePortLocal Mode.

    Workaround: Configure management clusters with AVI_INGRESS_SERVICE_TYPE set to either NodePort or ClusterIP. Default is NodePort.

  • Management cluster create fails or performance slow with older NSX-T versions and Photon 3 or Ubuntu with Linux kernel 5.8 VMs

    Deploying a management cluster with the following infrastructure and configuration may fail or result in restricted traffic between pods:

    • vSphere with any of the following versions of NSX-T:
      • NSX-T v3.1.3 with Enhanced Datapath enabled
      • NSX-T v3.1.x lower than v3.1.3
      • NSX-T v3.0.x lower than v3.0.2 hot patch
      • NSX-T v2.x. This includes Azure VMware Solution (AVS) v2.0, which uses NSX-T v2.5
    • Base image: Photon 3 or Ubuntu with Linux kernel 5.8

    This combination exposes a checksum issue between older versions of NSX-T and Antrea CNI.

    TMC: If the management cluster is registered with Tanzu Mission Control (TMC) there is no workaround to this issue. Otherwise, see the workarounds below.

    Workarounds:

    • Deploy workload clusters configured with ANTREA_DISABLE_UDP_TUNNEL_OFFLOAD set to "true". This setting deactivates Antrea’s UDP checksum offloading, which avoids the known issues with some underlay network and physical NIC network drivers.
    • Upgrade to NSX-T v3.0.2 Hot Patch, v3.1.3, or later, without Enhanced Datapath enabled
    • Use an Ubuntu base image with Linux kernel 5.9 or later.
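
    For the first workaround, a hedged sketch of adding the setting to a workload cluster configuration file; the file name is a placeholder:

      echo 'ANTREA_DISABLE_UDP_TUNNEL_OFFLOAD: "true"' >> workload-cluster-config.yaml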

Identity and Access Management

Storage

  • Changing default StorageClass object causes reconcile failure in workload clusters

    Modifying the properties of a default StorageClass object included in TKG causes a package reconcile failure in workload clusters that use the storage class.

    Workaround: To customize a storage class, create a new StorageClass definition with a different name instead of modifying the default object definition, and reconfigure the cluster to use the new storage class.
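
    One possible approach, as a hedged sketch; the storage class name and file name are placeholders:

      kubectl get storageclass DEFAULT-STORAGECLASS-NAME -o yaml > custom-storage-class.yaml
      # Edit custom-storage-class.yaml: set a new metadata.name, remove the default-class annotation
      # and server-generated fields, and adjust parameters as needed, then:
      kubectl apply -f custom-storage-class.yaml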

  • Workload cluster cannot distribute storage across multiple datastores

    You cannot enable a workload cluster to distribute storage across multiple datastores as described in Deploy a Cluster that Uses a Datastore Cluster. If you tag multiple datastores in a datastore cluster as the basis for a workload cluster’s storage policy, the workload cluster uses only one of the datastores.

    Workaround: None

CLI

  • Non-alphanumeric characters cannot be used in HTTP/HTTPS proxy passwords

    When deploying a management cluster with the CLI, the non-alphanumeric characters # ` ^ | / ? % ^ { [ ] } \ " < > cannot be used in HTTP/HTTPS proxy passwords. In addition, when deploying a management cluster with the UI, no non-alphanumeric characters can be used in HTTP/HTTPS proxy passwords.

    Workaround: You can use non-alphanumeric characters other than # ` ^ | / ? % ^ { [ ] } \ " < > in HTTP/HTTPS proxy passwords when deploying a management cluster with the CLI.

  • Tanzu CLI does not work on macOS machines with ARM processors

    Tanzu CLI v0.11.6 does not work on macOS machines with ARM (Apple M1) chips, as identified under Finder > About This Mac > Overview.

    Workaround: Use a bootstrap machine with a Linux or Windows OS, or a macOS machine with an Intel processor.

  • Tanzu CLI lists tanzu management-cluster osimage

    The management-cluster command group lists tanzu management-cluster osimage. This feature is currently in development and reserved for future use.

    Workaround: Do not use tanzu management-cluster osimage.

  • Validation error when running tanzu cluster create

    By default, when you pass a flat key-value configuration file to the --file option of tanzu cluster create, the command converts the configuration file into a Kubernetes-style object spec file and then exits. This behavior is controlled by the auto-apply-generated-clusterclass-based-configuration feature, which is set to false by default. In some cases, when you pass the Kubernetes-style object spec file generated by the --file option to tanzu cluster create, the command fails with an error similar to the following:

    Error: workload cluster configuration validation failed...
    

    This error may also occur when you pass a Kubernetes-style object spec file generated by the --dry-run option to tanzu cluster create.

    Workaround: Set the configuration parameter or parameters listed in the error output as local environment variables. Alternatively, to avoid this error, you can create class-based clusters in one step, without previewing their configuration, by setting the auto-apply-generated-clusterclass-based-configuration feature to true and then running tanzu cluster create. To set auto-apply-generated-clusterclass-based-configuration to true, run:

    tanzu config set features.cluster.auto-apply-generated-clusterclass-based-configuration true
    

    This configures the Tanzu CLI to always create class-based clusters in one step. For more information, see Create a Class-Based Cluster.

  • --default-values-file-output option of tanzu package available get outputs an incomplete configuration template file for the Harbor package

    Running tanzu package available get harbor.tanzu.vmware.com/PACKAGE-VERSION --default-values-file-output FILE-PATH creates an incomplete configuration template file for the Harbor package. To get a complete file, use the imgpkg pull command as described in Install Harbor for Service Registry.

  • Windows CMD: Extraneous characters in CLI output column headings

    In the Windows command prompt (CMD), Tanzu CLI command output that is formatted in columns includes extraneous characters in column headings.

    The issue does not occur in Windows Terminal or PowerShell.

    Workaround: On Windows bootstrap machines, run the Tanzu CLI from Windows Terminal.

  • Ignorable AKODeploymentConfig error during management cluster creation

    Running tanzu management-cluster create to create a management cluster with NSX ALB outputs the following error: no matches for kind "AKODeploymentConfig" in version "networking.tkg.tanzu.vmware.com/v1alpha1". The error can be ignored. For more information, see this article in the KB.

  • Ignorable machinehealthcheck and clusterresourceset errors during workload cluster creation on vSphere

    When a workload cluster is deployed to vSphere by using the tanzu cluster create command through vSphere with Tanzu, the output might include errors related to running machinehealthcheck and accessing the clusterresourceset resources, as shown below:

    Error from server (Forbidden): error when creating "/tmp/kubeapply-3798885393": machinehealthchecks.cluster.x-k8s.io is forbidden: User "sso:[email protected]" cannot create resource "machinehealthchecks" in API group "cluster.x-k8s.io" in the namespace "tkg"
    ...
    Error from server (Forbidden): error when retrieving current configuration of: Resource: "addons.cluster.x-k8s.io/v1beta1, Resource=clusterresourcesets", GroupVersionKind: "addons.cluster.x-k8s.io/v1beta1, Kind=ClusterResourceSet"
    ...
    

    The workload cluster is successfully created. You can ignore the errors.

  • CLI temporarily misreports status of recently deleted nodes when MHCs are deactivated

    When machine health checks (MHCs) are deactivated, then Tanzu CLI commands such as tanzu cluster status may not report up-to-date node state while infrastructure is being recreated.

    Workaround: None

vSphere

  • Node pools created with small nodes may stall at Provisioning

    Node pools created with node SIZE configured as small may become stuck in the Provisioning state and never proceed to Running.

    Workaround: Configure node pool with at least medium size nodes.

  • With NSX ALB, cannot create clusters with identical names

    If you are using NSX Advanced Load Balancer for workloads (AVI_ENABLE) or the control plane (AVI_CONTROL_PLANE_HA_PROVIDER), the Avi Controller may fail to distinguish between identically-named clusters.

    Workaround: Set a unique CLUSTER_NAME value for each cluster:

    • Management clusters: Do not create multiple management clusters with the same CLUSTER_NAME value, even from different bootstrap machines.

    • Workload clusters: Do not create multiple workload clusters that have the same CLUSTER_NAME and are also in the same management cluster namespace, as set by their NAMESPACE value.

  • Adding external identity management to an existing deployment may require setting dummy VSPHERE_CONTROL_PLANE_ENDPOINT value

    Integrating an external identity provider with an existing TKG deployment may require setting a dummy VSPHERE_CONTROL_PLANE_ENDPOINT value in the management cluster configuration file used to create the add-on secret, as described in Generate the Pinniped Add-on Secret for the Management Cluster.

AWS

  • CAPA resource tagging issue causes reconciliation failure during AWS management cluster deploy and upgrade.

    Due to a resource tagging issue in upstream Cluster API Provider AWS (CAPA), offline deployments cannot access the ResourceTagging API, causing reconciliation failures during management cluster creation or upgrade.

    Workaround: In an offline AWS environment, set EXP_EXTERNAL_RESOURCE_GC=false in your local environment or in the management cluster configuration file before running tanzu mc create or tanzu mc upgrade.
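
    For example, a hedged sketch; the configuration file name is a placeholder:

      export EXP_EXTERNAL_RESOURCE_GC=false
      tanzu mc create --file mgmt-cluster-config.yaml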

  • Workload cluster node pools on AWS must be in the same availability zone as the standalone management cluster.

    When creating a node pool configured with an az that is different from where the management cluster is located, the new node pool may remain stuck with status ScalingUp, as listed by tanzu cluster node-pool list, and never reach the Ready state.

    Workaround: Only create node pools in the same AZ as the standalone management cluster.

  • Deleting cluster on AWS fails if cluster uses networking resources not deployed with Tanzu Kubernetes Grid.

    The tanzu cluster delete and tanzu management-cluster delete commands may hang with clusters that use networking resources created by the AWS Cloud Controller Manager independently from the Tanzu Kubernetes Grid deployment process. Such resources may include load balancers and other networking services, as listed in The Service Controller in the Kubernetes AWS Cloud Provider documentation.

    For more information, see the Cluster API issue Drain workload clusters of service Type=Loadbalancer on teardown.

    Workaround: Use kubectl delete to delete services of type LoadBalancer from the cluster. Or if that fails, use the AWS console to manually delete any LoadBalancer and SecurityGroup objects created for this service by the Cloud Controller manager.

    Caution

    Do not delete load balancers or security groups managed by Tanzu, which have the tags key: sigs.k8s.io/cluster-api-provider-aws/cluster/CLUSTER-NAME, value: owned.

Azure

  • Cluster delete fails when storage volume uses account with private endpoint

    With an Azure workload cluster in an unmanaged resource group, when the Azure CSI driver creates a persistent volume (PV) that uses a storage account with private endpoint, it creates privateEndpoint and vNet resources that are not deleted when the PV is deleted. As a result, deleting the cluster fails with an error like subnets failed to delete. err: failed to delete resource ... Subnet management-cluster-node-subnet is in use.

    Workaround: Before deleting the Azure cluster, manually delete the network interface for the storage account private endpoint:

    1. From a browser, log in to Azure Resource Explorer.
    2. Click subscriptions at left, and expand your subscription.
    3. Under your subscription, expand resourceGroups at left, and expand your TKG deployment’s resource group.
    4. Under the resource group, expand providers > Microsoft.Network > networkinterfaces.
    5. Under networkinterfaces, select the NIC resource that is failing to delete.
    6. Click the Read/Write button at the top, and then the Actions(POST, DELETE) tab just underneath.
    7. Click Delete.
    8. Once the NIC is deleted, delete the Azure cluster.

Windows and Multi-OS Workload Clusters

  • You cannot create a Windows machine image on a MacOS machine

    Due to an issue with the open-source packer utility used by Kubernetes Image Builder, you cannot build a Windows machine image on a MacOS machine as described in Windows Custom Machine Images.

    Workaround: Use a Linux machine to build your custom Windows machine images.

  • Backup and restore is not supported for Windows and multi-OS workload clusters

    You cannot back up and restore workload clusters with Windows-based worker nodes.

    Workaround: None

Image-Builder

  • Ignorable goss test failures during image-build process

    When you run Kubernetes Image Builder to create a custom Linux machine image, the goss tests python-netifaces, python-requests, and ebtables fail. Command output reports the failures. The errors can be ignored; they do not prevent a successful image build.

AVS

  • vSphere CSI volume deletion may fail on AVS

    On Azure VMware Solution (AVS), vSphere CSI Persistent Volume (PV) deletion may fail. Deleting a PV requires the cns.searchable permission. The default admin account for AVS, [email protected], is not created with this permission. For more information, see vSphere Roles and Privileges.

    Workaround: To delete a vSphere CSI PV on AVS, contact Azure support.
