VMware Tanzu Kubernetes Grid | 09 SEP 2021 | Build 18560687

Check for additions and updates to these release notes.

What's New in Tanzu Kubernetes Grid 1.4

Here are the key new features and capabilities of VMware Tanzu Kubernetes Grid 1.4.

Version Updates

Tanzu Kubernetes Release Enhancements

  • Tanzu Kubernetes releases are no longer tied to Tanzu Kubernetes Grid releases. See Tanzu Kubernetes Releases.
  • Tanzu Kubernetes releases include a version of Antrea and a version of Kubernetes that are compatible with each other. When you upgrade to a new Tanzu Kubernetes release, the Antrea and Kubernetes versions are also updated to the latest compatible versions. For more information, see Tanzu Kubernetes releases and Antrea Versions.
  • Federal Information Processing Standards (FIPS)-compliant Tanzu Kubernetes releases are provided for Kubernetes v1.20.x.

Extensions Replaced by Packages

Proxy Enhancements

  • When you run the tanzu init command in a proxy environment, the TKG_*_PROXY configuration settings are automatically applied to the local bootstrap cluster.
  • In a proxy environment, you can enable communication of a cluster VM with vCenter Server through an insecure connection. See Configure Proxies.
  • Tanzu Kubernetes Grid does not proxy traffic from cluster VMs to vCenter Server. In a proxied vSphere environment, you must either use insecure communication to vSphere or add the vCenter IP address or hostname to the TKG_NO_PROXY list. See Configure Proxies and Configure the Kubernetes Network and Proxies.
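As a sketch, excluding vCenter from proxying can look like the following shell environment setup before running the Tanzu CLI; the proxy URL, vCenter hostname, and CIDRs below are illustrative placeholders, not defaults.

```shell
# Illustrative proxy settings for a proxied vSphere environment.
# Hostnames and CIDRs are placeholders, not defaults.
export TKG_HTTP_PROXY="http://proxy.example.com:3128"
export TKG_HTTPS_PROXY="http://proxy.example.com:3128"
# Include the vCenter hostname (or IP) so cluster VMs reach it directly:
export TKG_NO_PROXY="vcenter.example.com,10.0.0.0/8,.svc,.svc.cluster.local"
```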

vSphere Enhancements

AWS Enhancements

Azure Enhancements

VMware NSX Advanced Load Balancer Enhancements

  • Supports NSX Advanced Load Balancer service in workload clusters, which is available through the Avi Kubernetes Operator (AKO). See Configure NSX ALB in Workload Clusters.
  • Supports L7 ingress using VMware NSX Advanced Load Balancer. See Configure L7 Ingress with NSX Advanced Load Balancer for Workload Clusters.
  • Supports NSX Advanced Load Balancer as a control plane endpoint provider.
  • Avi Kubernetes Operator (AKO) and AKO Operator are provided as core packages. See Core Packages.
  • New configuration variables:
    • AVI_MANAGEMENT_CLUSTER_VIP_NETWORK_NAME - The name of the network where you associate a floating IP subnet or IP pool to a load balancer for the management cluster and workload cluster control plane (if using NSX ALB to provide control plane HA). See the NSX Advanced Load Balancer reference list for more information.
    • AVI_MANAGEMENT_CLUSTER_VIP_NETWORK_CIDR - The CIDR of the subnet to use for the management cluster and workload cluster’s control plane (if using NSX ALB to provide control plane HA) load balancer VIP. See the NSX Advanced Load Balancer reference list for more information.
    • AVI_DISABLE_STATIC_ROUTE_SYNC - Specifies whether AKO syncs static routes between the pod networks and the NSX ALB service engines. See the NSX Advanced Load Balancer reference list for more information.
    • AVI_INGRESS_NODE_NETWORK_LIST - Specifies the name of the port group (PG) network that your nodes are a part of and the associated CIDR that the CNI allocates to each node for that node to assign to its pods. See the NSX Advanced Load Balancer reference list for more information.
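Taken together, a management cluster configuration file that uses these variables might include an excerpt like the following sketch. Every value is illustrative, and the AVI_INGRESS_NODE_NETWORK_LIST value in particular is shown in an assumed JSON-list format; consult the NSX Advanced Load Balancer reference list for the authoritative syntax.

```yaml
# Illustrative excerpt; all values are placeholders.
AVI_CONTROL_PLANE_HA_PROVIDER: "true"
AVI_MANAGEMENT_CLUSTER_VIP_NETWORK_NAME: avi-mgmt-vip-network
AVI_MANAGEMENT_CLUSTER_VIP_NETWORK_CIDR: 10.10.10.0/24
AVI_DISABLE_STATIC_ROUTE_SYNC: "false"
AVI_INGRESS_NODE_NETWORK_LIST: '[{"networkName": "node-port-group", "cidrs": ["10.20.20.0/24"]}]'
```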

Deployment and Configuration Enhancements

  • In addition to selecting vSphere Base OS images, Tanzu Kubernetes Grid 1.4 allows you to select from multiple image templates present in your AWS and Azure regions when deploying Tanzu Kubernetes Grid clusters. See Select the Base OS Image.
  • Adds LDAP configuration verification in the installer interface. See Configure Identity Management.
  • Support for node pools for workload clusters. See Manage Node Pools of Different VM Types.
  • General improvements in the installation experience and improved accessibility.

Product Snapshot for Tanzu Kubernetes Grid v1.4

Tanzu Kubernetes Grid v1.4 supports the following infrastructure platforms and operating systems (OSs), as well as cluster creation and management, networking, storage, authentication, backup and migration, and observability components. The component versions listed in parentheses are included in Tanzu Kubernetes Grid v1.4. For more information, see Component Versions.

  Component | vSphere | Amazon EC2 | Azure
Infrastructure platform | vSphere 6.7U3 and later, vSphere 7, VMware Cloud on AWS****, Azure VMware Solution | Native AWS* | Native Azure*
Cluster creation and management | Core Cluster API (v0.3.22), Cluster API Provider vSphere (v0.7.10) | Core Cluster API (v0.3.22), Cluster API Provider AWS (v0.6.6) | Core Cluster API (v0.3.22), Cluster API Provider Azure (v0.4.15)
Kubernetes node OS distributed with TKG | Photon OS 3, Ubuntu 20.04 | Amazon Linux 2, Ubuntu 20.04 | Ubuntu 18.04, Ubuntu 20.04
Build your own image | Photon OS 3, Red Hat Enterprise Linux 7, Ubuntu 18.04, Ubuntu 20.04 | Amazon Linux 2, Ubuntu 18.04, Ubuntu 20.04 | Ubuntu 18.04, Ubuntu 20.04
Container runtime | Containerd (v1.4.6) | Containerd (v1.4.6) | Containerd (v1.4.6)
Container networking | Antrea (v0.13.3), Calico (v3.11.3) | Antrea (v0.13.3), Calico (v3.11.3) | Antrea (v0.13.3), Calico (v3.11.3)
Container registry | Harbor (v2.2.3) | Harbor (v2.2.3) | Harbor (v2.2.3)
Ingress | NSX Advanced Load Balancer Essentials (v20.1.3)**, Contour (v1.17.1), Avi Kubernetes Operator (AKO) (v1.4.3_vmware.1), Avi Controller (v20.1.3 - v20.1.6) | Contour (v1.17.1) | Contour (v1.17.1)
Storage | vSphere Container Storage Interface (v2.3.0***) and vSphere Cloud Native Storage | In-tree cloud providers only | In-tree cloud providers only
Authentication | OIDC via Pinniped (v0.4.4), LDAP via Pinniped (v0.4.4) and Dex | OIDC via Pinniped (v0.4.4), LDAP via Pinniped (v0.4.4) and Dex | OIDC via Pinniped (v0.4.4), LDAP via Pinniped (v0.4.4) and Dex
Observability | Fluent Bit (v1.7.5), Prometheus (v2.27.0), Grafana (v7.5.7) | Fluent Bit (v1.7.5), Prometheus (v2.27.0), Grafana (v7.5.7) | Fluent Bit (v1.7.5), Prometheus (v2.27.0), Grafana (v7.5.7)
Backup and migration | Velero (v1.6.2) | Velero (v1.6.2) | Velero (v1.6.2)


  • * See Supported AWS and Azure Regions below.
  • ** NSX Advanced Load Balancer Essentials is supported on vSphere 6.7U3, vSphere 7, and VMware Cloud on AWS.
  • *** Version of vsphere_csi_driver. For a full list of vSphere Container Storage Interface components included in this release, see Component Versions.
  • **** For a list of VMware Cloud on AWS SDDC versions that are compatible with this release, see the VMware Product Interoperability Matrix.

For a full list of Kubernetes versions that ship with Tanzu Kubernetes Grid v1.4, see Supported Kubernetes Versions in Tanzu Kubernetes Grid v1.4 below.

Supported Component Versions

Each version of Tanzu Kubernetes Grid provides specific versions of Kubernetes and updates the components that ship with the product.

Supported Kubernetes Versions in Tanzu Kubernetes Grid v1.4

Each version of Tanzu Kubernetes Grid adds support for new Kubernetes versions. This version also supports versions of Kubernetes from previous versions of Tanzu Kubernetes Grid.

Tanzu Kubernetes Grid Version | Provided Kubernetes Versions | Supported in v1.4?
1.4.0 | 1.21.2 | YES
1.3.1 | 1.20.5 | YES
1.3.0 | 1.20.4 | YES
1.2.1 | 1.19.3 | YES
1.2 | 1.19.1 | YES
1.1.3 | 1.18.6 | YES
1.1.2 | 1.18.3 | YES
1.1.0 | 1.18.2 | NO
1.0.0 | 1.17.3 | NO

Component Versions

The Tanzu Kubernetes Grid v1.4 release includes the following software component versions:

  • ako-operator: v1.4.0+vmware.1
  • alertmanager: v0.22.2+vmware.1
  • antrea: v0.13.3+vmware.1
  • cadvisor: v0.39.1+vmware.1
  • calico_all: v3.11.3+vmware.1
  • cloud-provider-azure: v0.7.4+vmware.1
  • cloud_provider_vsphere: v1.21.0+vmware.1
  • cluster-api-provider-azure: v0.4.15+vmware.1
  • cluster_api: v0.3.22+vmware.1
  • cluster_api_aws: v0.6.6+vmware.1
  • cluster_api_vsphere: v0.7.10+vmware.1
  • configmap-reload: v0.5.0+vmware.1
  • contour: v1.17.1+vmware.1
  • crash-diagnostics: v0.3.3+vmware.1
  • csi_attacher: v3.2.0+vmware.1
  • csi_livenessprobe: v2.2.0+vmware.1
  • csi_node_driver_registrar: v2.1.0+vmware.1
  • csi_provisioner: v2.2.0+vmware.1
  • dex: v2.27.0+vmware.1
  • envoy: v1.18.3+vmware.1
  • external-dns: v0.8.0+vmware.1
  • fluent-bit: v1.7.5+vmware.1
  • gangway: v3.2.0+vmware.2
  • grafana: v7.5.7+vmware.1
  • harbor: v2.2.3+vmware.1
  • imgpkg: v0.10.0+vmware.1
  • jetstack_cert-manager: v1.1.0+vmware.1
  • k8s-sidecar: v1.12.1+vmware.1
  • k14s_kapp: v0.37.0+vmware.1
  • k14s_ytt: v0.34.0+vmware.1
  • kapp-controller: v0.23.0+vmware.1*
  • kbld: v0.30.0+vmware.1
  • kube-state-metrics: v1.9.8+vmware.1
  • kube-vip: v0.3.3+vmware.1
  • kube_rbac_proxy: v0.8.0+vmware.1
  • kubernetes-csi_external-resizer: v1.1.0+vmware.1
  • kubernetes-sigs_kind: v1.21.2+vmware.1
  • kubernetes_autoscaler: v1.21.0+vmware.1, v1.20.0+vmware.1, v1.19.1+vmware.1
  • load-balancer-and-ingress-service: v1.4.3+vmware.1
  • metrics-server: v0.4.0+vmware.1
  • pinniped: v0.4.4+vmware.1
  • prometheus: v2.27.0+vmware.1
  • prometheus_node_exporter: v1.1.2+vmware.1
  • pushgateway: v1.4.0+vmware.1
  • sonobuoy: v0.20.0+vmware.1
  • tanzu_core: v1.4.0
  • tkg-bom: v1.4.0
  • tkg_telemetry: v1.4.0+vmware.1
  • velero: v1.6.2+vmware.1
  • velero-plugin-for-aws: v1.2.1+vmware.1
  • velero-plugin-for-microsoft-azure: v1.2.1+vmware.1
  • velero-plugin-for-vsphere: v1.1.1+vmware.1
  • vsphere_csi_driver: v2.3.0+vmware.1

*The version of kapp-controller depends on the Kubernetes version you are running and on which cloud provider the cluster is deployed upon.

For a complete list of software component versions that ship with Tanzu Kubernetes Grid v1.4, see ~/.config/tanzu/tkg/bom/bom-v1.4.0.yaml and ~/.config/tanzu/tkg/bom/tkr-bom-v1.20.5+vmware.2-tkg.1.yaml.

Supported AWS and Azure Regions

You can use Tanzu Kubernetes Grid v1.4 to deploy clusters to the following AWS regions:

  • ap-northeast-1
  • ap-northeast-2
  • ap-south-1
  • ap-southeast-1
  • ap-southeast-2
  • eu-central-1
  • eu-west-1
  • eu-west-2
  • eu-west-3
  • sa-east-1
  • us-east-1
  • us-east-2
  • us-gov-east-1
  • us-gov-west-1
  • us-west-2

You can use Tanzu Kubernetes Grid v1.4 to deploy clusters to all Microsoft Azure regions within the AzurePublicCloud and AzureUSGovernment cloud environments.

Supported Upgrade Paths

For the supported upgrade paths, see the VMware Product Interoperability Matrix.

User Documentation

The Tanzu Kubernetes Grid 1.4 documentation applies to all of the 1.4.x releases.

Resolved Issues

  • On vSphere 7, offline volume expansion for vSphere CSI storage used by workload clusters does not work.

    Cluster storage interface (CSI) lacks the csi-resizer pod needed to resize storage volumes.

  • The tanzu CLI truncates workload cluster names or does not perform cluster operations.

    Workload cluster names must be 42 characters or less.

  • Telemetry for the Customer Experience Improvement Program (CEIP) does not run on AWS.

    Telemetry pods fail with an error like the following:

    "ERROR workspace/main.go:48 the individual labels are formed incorrectly. e.g. --labels=<key1>=<value1>,<key2>=<value2> with no ',' and '=' allowed in keys and values"

    This issue only affects management clusters created with the CEIP Participation enabled in the installer interface, or ENABLE_CEIP_PARTICIPATION absent in the configuration file or set to true (the default).

  • Management clusters that run Photon OS deploy workload clusters that run Ubuntu by default

    If you use a Photon OS OVA image when you deploy a management cluster to vSphere from the installer interface, the OS_NAME setting is not written into the configuration file. Consequently, if you use a copy of the management cluster configuration file to deploy workload clusters, the workload cluster OS defaults to Ubuntu, unless you explicitly set the OS_NAME variable to photon in the configuration file. If the Ubuntu image is not present in your vSphere inventory, deployment of workload clusters will fail.
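In that situation, a minimal safeguard is to pin the OS explicitly in the copied workload cluster configuration file, for example:

```yaml
# Explicitly pin the node OS so the workload cluster matches the
# Photon-based management cluster (illustrative excerpt).
OS_NAME: photon
OS_VERSION: "3"
```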

  • Running Tanzu commands on Windows fails with a certificate error

    Attempts to run the Tanzu CLI on Windows for the first time result in Error: unable to ensure tkg BOM file: failed to download default bom files from the registry: [...] certificate signed by unknown authority

  • Management Cluster creation fails if vSphere password starts with special characters

    If the password for vSphere starts with %, !, &, *, or #, deployment fails with the following error:

    Error: unable to set up management cluster: unable to build management cluster configuration: unable to get template: Extracting data value from KV: Deserializing value for key 'VSPHERE_PASSWORD': Deserializing YAML value: yaml: line 1: could not find expected directive name
  • May 2021 Linux security patch causes kind clusters to fail during management cluster creation

    If you run Tanzu CLI commands on a machine with a recent Linux kernel, for example Linux 5.11 or 5.12 on Fedora, kind clusters do not operate. This happens because kube-proxy attempts to change the nf_conntrack_max sysctl, which the May 2021 Linux security patch made read-only, so kube-proxy enters a CrashLoopBackoff state. The security patch is being backported to all LTS kernels from 4.9 onwards, so as operating system updates ship, including for Docker Machine on macOS and Windows Subsystem for Linux, kind clusters will fail, resulting in management cluster deployment failure.

  • The kind bootstrap cluster cannot pull container images when the management cluster and infrastructure (such as vCenter) are on different networks and behind different proxies.

    This is a limitation in proxied, internet-restricted environments.

Known Issues

The known issues are grouped as follows.

Azure Issues
  • vSphere CSI volume deletion may fail on Azure VMware Solution (AVS)

    On AVS, deletion of vSphere CSI Persistent Volumes (PVs) may fail. Deleting a PV requires the cns.searchable privilege. For more information, see vSphere Roles and Privileges. The default admin account for AVS, cloudadmin@vsphere.local, is not created with this privilege. To delete a vSphere CSI PV on AVS, contact Azure support.

    Workaround: None

CLI Issues
  • Windows CMD: Extraneous characters in CLI output column headings

    In the Windows command prompt (CMD), Tanzu CLI command output that is formatted in columns includes extraneous characters in column headings.

    The issue does not occur in Windows Terminal or PowerShell.

    Workaround: On Windows bootstrap machines, run the Tanzu CLI from Windows Terminal.

  • With MHC disabled, CLI temporarily misreports status of deleted nodes.

    If machine health checks (MHCs) are disabled, then Tanzu CLI commands such as tanzu cluster status may not report up-to-date node state while infrastructure is being recreated.

    Workaround: None

  • The tanzu cluster create command shows errors related to running machinehealthcheck and accessing the clusterresourceset resources on vSphere

    When a workload cluster is deployed to vSphere by using the tanzu cluster create command through Tanzu Kubernetes Grid Service (TKGS), the output might include errors related to running machinehealthcheck and accessing the clusterresourceset resources, as shown below:

    Error from server (Forbidden): error when creating "/tmp/kubeapply-3798885393": machinehealthchecks.cluster.x-k8s.io is forbidden: User "sso:Administrator@vsphere.local" cannot create resource "machinehealthchecks" in API group "cluster.x-k8s.io" in the namespace "tkg"
    Error from server (Forbidden): error when retrieving current configuration of:
     Resource: "addons.cluster.x-k8s.io/v1alpha3, Resource=clusterresourcesets", GroupVersionKind: "addons.cluster.x-k8s.io/v1alpha3, Kind=ClusterResourceSet"


    The workload cluster is successfully created. You can ignore the errors.

Cluster Lifecycle and Package Issues
  • Deleting management clusters fails with image pull errors when using a custom registry certificate

    If you set the TKG_CUSTOM_IMAGE_REPOSITORY_CA_CERTIFICATE variable during management cluster deployment, but did not set the TKG_CUSTOM_IMAGE_REPOSITORY variable, images are pulled successfully from the Tanzu Kubernetes Grid image repository during management cluster creation, but if you attempt to delete the management cluster, the image pull operation fails with a certificate error:

    Failed to pull image [...] proxyconnect tcp: x509: certificate signed by unknown authority

    Workaround: To successfully delete management clusters, you must set both of the TKG_CUSTOM_IMAGE_REPOSITORY_CA_CERTIFICATE and TKG_CUSTOM_IMAGE_REPOSITORY variables.

    • Set the TKG_CUSTOM_IMAGE_REPOSITORY_CA_CERTIFICATE variable to the base64-encoded CA certificate of the HTTPS proxy.
    • Set the TKG_CUSTOM_IMAGE_REPOSITORY variable to the standard Tanzu Kubernetes image repository.
      export TKG_CUSTOM_IMAGE_REPOSITORY=projects.registry.vmware.com/tkg
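As a sketch, populating both variables might look like the following; the certificate content here is a throwaway placeholder standing in for the real proxy CA certificate.

```shell
# A throwaway PEM stands in for the real proxy CA certificate here.
printf '%s\n' '-----BEGIN CERTIFICATE-----' 'placeholder' '-----END CERTIFICATE-----' > /tmp/ca.crt
# The variable takes the certificate in base64-encoded form:
export TKG_CUSTOM_IMAGE_REPOSITORY_CA_CERTIFICATE="$(base64 < /tmp/ca.crt | tr -d '\n')"
# Point the repository variable at the standard TKG image repository:
export TKG_CUSTOM_IMAGE_REPOSITORY="projects.registry.vmware.com/tkg"
```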
  • Deletion fails for Contour or Harbor packages upgraded from v1.3.

    Running tanzu package installed delete to delete a Contour or Harbor package that was upgraded from a TKG v1.3 extension fails with an error resembling:

    Error: package install deletion failed: Preparing kapp: Getting service account: serviceaccounts "contour-tanzu-system-ingress-sa" not found

    The error occurs with Contour packages installed in namespace tanzu-system-ingress or Harbor packages installed in namespace tanzu-system-registry. These namespaces are system namespaces inherited from v1.3 extensions.

    Workaround: Before running package installed delete, remove the finalizer: setting from the namespace so that namespace deletion will clean up all resources in the namespace:


    1. Run kubectl edit namespace tanzu-system-ingress for Contour, or kubectl edit namespace tanzu-system-registry for Harbor.
    2. At the end of the Namespace spec, change the spec.finalizers setting from:
            finalizers:
            - kubernetes
       to:
            finalizers: []
    3. Save and exit.


  • tanzu kubernetes-release available-upgrades does not show available upgrades for a given Kubernetes version

    If you run tanzu kubernetes-release available-upgrades to get the list of available upgrades for a given Kubernetes version, the command does not return any result, even if there is a compatible upgrade available.

    Workaround:


    1. Run tanzu cluster available-upgrades get CLUSTER-NAME to get the list of available Tanzu Kubernetes versions compatible with your cluster.
    2. Run tanzu cluster upgrade with the desired Tanzu Kubernetes release version.
      tanzu cluster upgrade my-cluster --tkr v1.21.2---vmware.1-tkg.1
  • Running tanzu kubernetes-release get can take a long time

    If you run tanzu kubernetes-release get to get the Kubernetes version of a workload cluster, it can take several minutes to obtain the response.

    Workaround: None.

  • Harbor package fails to run

    A freshly-deployed or upgraded 1.4 Harbor package fails to run because its harbor-notary-signer pod enters a CrashLoopBackOff state.

    To diagnose this, run kubectl -n tanzu-system-registry get pods and look at the harbor-notary-signer-ID pod listing.

    Workaround: After installing the Harbor package, apply an overlay to patch it, as described in the Knowledge Base article The harbor-notary-signer pod fails to start....

  • KCP remediation remains in the WaitingForRemediation state, preventing management cluster creation

    When bootstrapping a cluster, if KubeadmControlPlane (KCP) remediation is triggered for any reason, it remains in the WaitingForRemediation state even after the control plane machine returns to a healthy state. This prevents the management cluster bootstrap process from proceeding, and it eventually times out. This happens because the OwnerRemediated condition is not reset when the machine returns to health.

    Workaround: None

  • Updating cluster credentials does not restart the pods that use the credentials

    Running tanzu management-cluster credentials update cluster_name on either management clusters or workload clusters updates the cluster credentials but does not restart the cluster pods. Consequently, the new credentials are not consumed by the pods, and the pods continue to use the old credentials. This results in authentication errors in the logs:

    controller-runtime/controller "msg"="Reconciler error" "error"="ServerFaultCode: Cannot complete login due to an incorrect user name or password." "controller"="vspherevm"

    Workaround: Manually restart the pod so it can obtain the new credentials.

    kubectl -n capv-system delete pod --selector=control-plane=controller-manager
  • Worker nodes cannot join cluster if cluster name contains period (.)

    If you deploy a Tanzu Kubernetes cluster and specify a name that includes the period character (.), the cluster appears to be created but only the control plane nodes are visible. Worker nodes are unable to join the cluster, and their names are truncated to exclude any text included after the period.

    Workaround: Do not include period characters in cluster names.

  • Deleting shared services cluster without removing registry webhook causes cluster deletion to stop indefinitely

    If you created a shared services cluster and deployed Harbor as a shared service with the Tanzu Kubernetes Grid Connectivity API, and then you created one or more Tanzu Kubernetes clusters, attempting to delete both the shared services cluster and the Tanzu Kubernetes clusters results in machines being deleted but both clusters remaining indefinitely in the deleting status.

    Workaround: Delete the registry admission webhook so that the cluster deletion process can complete. 

  • Cannot use tanzu login selector in Git Bash on Windows

    If you use Git Bash to run the tanzu login command on Windows systems, you see Error: Incorrect function and you cannot use the arrow keys to select a management cluster.

    Workaround: Run the following command in Git Bash before you run any Tanzu CLI commands:

    alias tanzu='winpty -Xallow-non-tty tanzu'
  • Management cluster fails to deploy when the Tanzu CLI is run on a macOS system

    A management cluster fails to deploy when the Tanzu CLI is invoked from a macOS system in the following circumstances:

    • The macOS system where the tkg/tanzu CLI is launched is running Docker Desktop version 3.3.1 or earlier.
    • You see messages similar to the following in the capv-controller-manager logs in the bootstrap cluster:

    E0510 16:16:51.320061 1 controller.go:257] controller-runtime/controller "msg"="Reconciler error" "error"="failed to create vSphere session: Post EOF" "controller"="vspherevm" "name"="cluster-name-control-plane-pqrjq" "namespace"="tkg-system"

    Workaround: Upgrade Docker Desktop to version 3.3.3 or later.

  • Non-admin access fails for workload clusters

    If you configure identity management (LDAP or OIDC) in your management cluster, the settings do not carry over automatically to your workload clusters.

    Workaround: Set the IDENTITY_MANAGEMENT_TYPE variable to either ldap or oidc in two places:

    • In the configuration file for the management cluster
    • In the configuration file for the workload cluster.

    The values must match. For more information about identity management variables, see Tanzu CLI Configuration File Variable Reference.
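A sketch of the matching setting in both configuration files, with an assumed OIDC choice:

```yaml
# Must be identical in the management cluster and workload cluster
# configuration files; "oidc" here is an illustrative choice.
IDENTITY_MANAGEMENT_TYPE: oidc
```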

  • Private image repository ignored in cluster configuration file

    If you are deploying Tanzu Kubernetes Grid in an Internet-restricted environment and set the TKG_CUSTOM_IMAGE_REPOSITORY variable in the cluster configuration file, it is ignored. Even though the output from the command tanzu management-cluster create -f foo-vsphere.yaml -v 6 indicates the images are coming from the private image repository, they are not.

    Workaround: Set the TKG_CUSTOM_IMAGE_REPOSITORY variable as a local environment variable. See Prepare an Internet-Restricted Environment.

  • Installer ignores test names entered for LDAP check

    The Tanzu Kubernetes Grid installer interface ignores what you enter in the Test User Name (Optional) and Test Group Name (Optional) fields when verifying LDAP configuration. Instead, it uses cn for the test user name and ou for the test group name when running its LDAP check.


  • Shared Services cluster does not work with TKGS

    Tanzu Kubernetes Grid Service (TKGS) does not support deploying packages to a shared services cluster. Workload clusters deployed by TKGS can only use packaged services deployed to the workload clusters themselves.

    Workaround: None

Deployment Issues
  • Management cluster creation fails on Linux or macOS bootstrap machines running cgroups v2 in their Linux kernel with Docker Desktop v4.3.0 or later.

    Due to the version of kind that the v1.4.0 CLI uses to build its container image, bootstrap machines running cgroups v2 fail to run the image.

    Workaround: Pre-create the kind image as described in Use an Existing Bootstrap Cluster to Deploy and Delete Management Clusters or patch the bootstrap machine to run cgroups v1 as described in the Prerequisites section of Install the Tanzu CLI and Other Tools.

  • After selecting an Amazon EC2 instance type under AZ1 Worker Node Instance Type, AZ2 Worker Node Instance Type, and AZ3 Worker Node Instance Type in the Production view, the installer deploys the management cluster with only one worker node, of the AZ1 Worker Node Instance Type, instead of three worker nodes.

    In v1.4.0, for management clusters, the prod plan deploys three control plane nodes and one worker node.

    Workaround: None.

  • Workload cluster fails to deploy on Tanzu Kubernetes Grid Service when Machine Health Checks are not disabled in cluster configuration file.

    Running tanzu cluster create with a Tanzu Kubernetes Grid Supervisor cluster fails if the cluster configuration file does not set ENABLE_MHC to false.

    Workaround: Set ENABLE_MHC to false in the cluster configuration file. Beyond allowing deployment to proceed, the setting has no further effect: Tanzu Kubernetes Grid Service runs its own health checks and ignores MHC settings in a Tanzu Kubernetes Grid cluster configuration file.
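The required line in the workload cluster configuration file is simply:

```yaml
# Required when deploying through Tanzu Kubernetes Grid Service;
# TKGS runs its own health checks and otherwise ignores MHC settings.
ENABLE_MHC: false
```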

Tanzu Mission Control Issues
  • Registering a management cluster with Tanzu Mission Control is not supported

    It is not currently possible to register management clusters with Tanzu Mission Control.

    Workaround: You can attach workload clusters to Tanzu Mission Control without registering the management cluster as described in Attach an Existing Cluster in the Tanzu Mission Control documentation.

Upgrade Issues
  • Package repo missing and ClusterResourceSet fails to reconcile

    If you upgrade a management cluster from Tanzu Kubernetes Grid v1.3 to v1.4.x, which moves the cluster from extensions to packages, the standard package repository is not created and the ClusterResourceSet fails to reconcile.

    Workaround: Manually add the tanzu-package-repo-global namespace by following step 4 of the Procedure section in Upgrade Management Clusters.

  • On vSphere, upgrading a workload cluster with multiple control plane nodes stalls as VIP of the kube-apiserver is lost

    When upgrading a workload cluster with multiple control plane nodes on vSphere, the VIP on the first control plane is lost when the second control plane node starts, as described in Creating a workload cluster with multiple control plane nodes stalls as VIP of the kube-apiserver is lost, in the vSphere Issues below.

    Workaround: See the workaround described in Creating a workload cluster with multiple control plane nodes stalls as VIP of the kube-apiserver is lost, in the vSphere Issues below.

  • Pinniped authentication error on workload cluster after management cluster upgrade

    When attempting to authenticate to a workload cluster associated with the upgraded management cluster, you receive an error message similar to the following:

    Error: could not complete Pinniped login: could not perform OIDC discovery for "https://IP:PORT": Get "https://IP:PORT/.well-known/openid-configuration": x509: certificate signed by unknown authority

    Workaround: See Pinniped Authentication Error on Workload Cluster After Management Cluster Upgrade.

  • List of clusters shows incorrect Kubernetes version after unsuccessful upgrade attempt

    If you attempt to upgrade a Tanzu Kubernetes cluster and the upgrade fails, and if you subsequently run tanzu cluster list or tanzu cluster get to see the list of deployed clusters and their versions, the cluster for which the upgrade failed shows the upgraded version of Kubernetes.

    Workaround: None

  • Workload cluster upgrade may hang or fail due to undetached persistent volumes

    If you are upgrading your Tanzu Kubernetes clusters from Tanzu Kubernetes Grid v1.4.x to v1.5.x and you have applications on the cluster that use persistent volumes, the volumes may fail to detach and re-attach during upgrade, causing the upgrade process to hang or fail.

    Workaround: Follow the steps in Persistent volumes cannot attach to a new node if previous node is deleted (85213) in the VMware Knowledge Base.

vSphere Issues
  • Unable to get the non-admin kubeconfig for cluster if you use NSX Advanced Load Balancer as the control plane load balancer

    On vSphere, if you set AVI_CONTROL_PLANE_HA_PROVIDER: "true" to configure a cluster to use NSX Advanced Load Balancer as the control plane load balancer instead of Kube VIP, attempts to obtain the non-administrator kubeconfig of the cluster fail with the following error.

    $ tanzu management-cluster kubeconfig get
    Error: failed to get cluster-info from cluster: failed to get cluster-info from the end-point: Get "https://:0/api/v1/namespaces/kube-public/configmaps/cluster-info": dial tcp :0: connect: connection refused

    This happens due to a port mismatch between the Pinniped authentication service and NSX Advanced Load Balancer.

    Workaround: If you use an NSX Advanced Load Balancer as the control plane load balancer, see Enable Non-Admin Authentication in Configure Identity Management After Management Cluster Deployment.

  • Cannot use Velero to back up Kubernetes 1.20 clusters with persistent volumes on vSphere

    If you attempt to use Velero 1.5.3 to back up Kubernetes 1.20 clusters running on vSphere that have persistent volumes, the backup fails with the error in the backup logs:

    time="2021-04-02T16:29:05Z" level=info msg="1 errors encountered backup up item" backup=velero/nginx-backup logSource="pkg/backup/ backup.go:427" name=nginx-deployment-66689547d-d7n6c time="2021-04-02T16:29:05Z" level=error msg="Error backing up item" backup=velero/nginx-backup error="error executing custom action (groupResource=persistentvolumeclaims, namespace=nginx-example, name=nginx-logs): rpc error: code = Unknown desc = Failed during IsObjectBlocked check: Could not translate selfLink to CRD name" logSource="pkg/backup/backup.go:431" name=nginx-deployment-66689547d-d7n6

    This occurs because Kubernetes 1.20 deprecated selfLink.

    Workaround: See https://kb.vmware.com/s/article/83314.

  • Creating a workload cluster with multiple control plane nodes stalls as VIP of the kube-apiserver is lost

    When creating a workload cluster with multiple control plane nodes on vSphere, the following happens:

    • The first control plane node starts successfully.
    • When the second control plane node starts, the VIP on the first control plane is lost.
    • No IP addresses appear on the first control plane node in the vSphere Client.
    • The following event appears in the logs:
      [2021-04-22T19:43:16.516Z] [ warning] [guestinfo] *** WARNING: GuestInfo collection interval longer than expected; actual=511 sec, expected=30 sec. ***
    • The node shows an alert about high CPU utilization in the vSphere Client
    • The first control plane node becomes intermittently responsive, with a high load average:
      root@nv8-wl-02-control-plane-q58z7 [ ~ ]# uptime
      17:58:05 up 26 min,  1 user,  load average: 22.74, 84.23, 69.62 

    Workaround: Tune the kube-vip leader election parameters by updating the vSphere configuration with the following ytt overlay.

    1. Open the file ~/.tanzu/tkg/providers/infrastructure-vsphere/ytt/vsphere-overlay.yaml in a text editor.
    2. Paste the following into vsphere-overlay.yaml:
      #@ load("@ytt:overlay", "overlay")
      #@ load("@ytt:data", "data")
      #@ load("lib/helpers.star", "get_bom_data_for_tkr_name", "get_default_tkg_bom_data", "kubeadm_image_repo", "get_image_repo_for_component", "get_vsphere_thumbprint")
      #@ load("@ytt:yaml", "yaml")

      #@ bomData = get_default_tkg_bom_data()

      #@ def kube_vip_pod():
      apiVersion: v1
      kind: Pod
      metadata:
        creationTimestamp: null
        name: kube-vip
        namespace: kube-system
      spec:
        containers:
        - args:
          - start
          env:
          - name: vip_arp
            value: "true"
          - name: vip_leaderelection
            value: "true"
          - name: address
            value: #@ data.values.VSPHERE_CONTROL_PLANE_ENDPOINT
          - name: vip_interface
            value: #@ data.values.VIP_NETWORK_INTERFACE
          - name: vip_leaseduration
            value: "30"
          - name: vip_renewdeadline
            value: "20"
          - name: vip_retryperiod
            value: "4"
          image: #@ "{}/{}:{}".format(get_image_repo_for_component(bomData.components["kube-vip"][0].images.kubeVipImage), bomData.components["kube-vip"][0].images.kubeVipImage.imagePath, bomData.components["kube-vip"][0].images.kubeVipImage.tag)
          imagePullPolicy: IfNotPresent
          name: kube-vip
          resources: {}
          securityContext:
            capabilities:
              add:
              - NET_ADMIN
              - SYS_TIME
          volumeMounts:
          - mountPath: /etc/kubernetes/admin.conf
            name: kubeconfig
        hostNetwork: true
        volumes:
        - hostPath:
            path: /etc/kubernetes/admin.conf
            type: FileOrCreate
          name: kubeconfig
      status: {}
      #@ end

      #@overlay/match by=overlay.subset({"kind":"KubeadmControlPlane"})
      ---
      apiVersion: controlplane.cluster.x-k8s.io/v1alpha3
      kind: KubeadmControlPlane
      metadata:
        name: #@ "{}-control-plane".format(data.values.CLUSTER_NAME)
      spec:
        kubeadmConfigSpec:
          files:
          #@overlay/match by=overlay.index(0)
          - content: #@ yaml.encode(kube_vip_pod())
    3. Save and close the file.
    4. Attempt to deploy the workload cluster again.
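
    After the cluster deploys, you can check that the tuned leader-election values took effect by inspecting the kube-vip static pod manifest. A verification sketch; `vip_settings` is a hypothetical helper name, and the pod name suffix varies per node:

```shell
# Print the kube-vip leader-election settings from a pod manifest.
# Usage: kubectl -n kube-system get pod kube-vip-NODE-NAME -o yaml | vip_settings
vip_settings() {
  grep -A1 -E 'vip_(leaseduration|renewdeadline|retryperiod)'
}
```

    The output should show the values "30", "20", and "4" set by the overlay.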

  • Cannot delete cluster if AKO agent pod is not running correctly

    If you use NSX Advanced Load Balancer, attempts to use tanzu cluster delete to delete a workload cluster fail if the AVI Kubernetes Operator (AKO) agent pod is in the CreateContainerConfigError status:

    kubectl get po -n avi-system
     NAME  READY STATUS                     RESTARTS AGE
     ako-0 0/1   CreateContainerConfigError 0        94s

    The deletion process waits indefinitely for the AKO agent to clean up its related items.

    Workaround:

    1. Edit the cluster configuration:
      kubectl edit cluster cluster-name
    2. Under finalizers, remove the ako-operator.networking.tkg.tanzu.vmware.com line (line 18 in this example):
       16   finalizers:
       17   - cluster.cluster.x-k8s.io
       18   - ako-operator.networking.tkg.tanzu.vmware.com

    The cluster will be successfully removed after a short time.
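
    The same edit can be made non-interactively with a JSON patch. A sketch; the finalizer index 1 matches the listing above and may differ in your cluster, so verify it first:

```shell
# JSON patch that removes the second finalizer (index 1), which is the
# ako-operator finalizer in the listing above. Verify the index first with:
#   kubectl get cluster CLUSTER-NAME -o jsonpath='{.metadata.finalizers}'
PATCH='[{"op":"remove","path":"/metadata/finalizers/1"}]'
echo "$PATCH"
# Apply with: kubectl patch cluster CLUSTER-NAME --type json -p "$PATCH"
```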

  • Management cluster creation fails or performance is slow with older NSX-T versions and Photon 3 or Ubuntu VMs with Linux kernel 5.8

    Deploying a management cluster with the following infrastructure and configuration may fail or result in restricted traffic between pods:

    • vSphere with any of the following versions of NSX-T:

      • NSX-T v3.1.3 with Enhanced Datapath enabled

      • NSX-T v3.1.x lower than v3.1.3

      • NSX-T v3.0.x lower than v3.0.2 hot patch

      • NSX-T v2.x. This includes Azure VMware Solution (AVS) v2.0, which uses NSX-T v2.5

    • Base image: Photon 3 or Ubuntu with Linux kernel 5.8

    This combination exposes a checksum issue between older versions of NSX-T and Antrea CNI.


    Workaround: Do one of the following:

    • Upgrade to NSX-T v3.0.2 Hot Patch, v3.1.3, or later, without Enhanced Datapath enabled.
    • Use an Ubuntu base image with Linux kernel 5.9 or later.
    • If the management cluster deploys successfully, run the following on all of its nodes:
      • ethtool -K eth0 tx-udp_tnl-segmentation off && ethtool -K eth0 tx-udp_tnl-csum-segmentation off
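
    The two ethtool commands above can be scripted across all nodes. A sketch, assuming SSH access as capv, the default node user on vSphere; `disable_udp_tnl_offload` is a hypothetical helper name:

```shell
# Disable the UDP tunnel segmentation offloads on every node of the
# current cluster. Assumes SSH access as the capv user.
disable_udp_tnl_offload() {
  for ip in $(kubectl get nodes -o jsonpath='{.items[*].status.addresses[?(@.type=="InternalIP")].address}'); do
    ssh capv@"$ip" 'sudo ethtool -K eth0 tx-udp_tnl-segmentation off && sudo ethtool -K eth0 tx-udp_tnl-csum-segmentation off'
  done
}
```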

    This issue has easier workarounds in v1.4.2 and v1.5.

  • Cluster nodes are reprovisioned during upgrade

    Moving or renaming the virtual machine template for the current Kubernetes version and then running tanzu cluster upgrade causes the cluster nodes to be reprovisioned.

    Workaround: If you have existing clusters in your vCenter, do not change the name or location of their base image template.  

  • System clock is not synchronized with NTP server

    Cluster VMs do not pick up your NTP configuration provided via DHCP Option 42. This issue may affect both Photon and Ubuntu.

    Workaround: If you are experiencing this issue, contact Support.
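
    To check whether a node's clock is actually synchronized, you can query systemd's time service on the node. A diagnostic sketch; both Photon and Ubuntu ship timedatectl, and `check_ntp_sync` is a hypothetical helper name:

```shell
# Report the NTP synchronization status of the local clock.
# Run on each cluster node, for example over SSH as the capv user.
check_ntp_sync() {
  timedatectl status | grep -E -i 'synchronized|NTP'
}
```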

  • You cannot use node pools with Tanzu Kubernetes clusters created by using the Tanzu Kubernetes Grid service.

    Node pools are a feature of the Tanzu CLI, but clusters created by the Tanzu Kubernetes Grid Service do not support the way the CLI's node pool feature templates out Cluster API resources.

    Workaround: None

  • Host-network pods and one node use the wrong IP in IPv6 clusters.

    When you deploy IPv6 clusters with multiple control plane nodes on vSphere and the clusters use Kubernetes 1.20.x or 1.21.x, one of your nodes, as well as the etcd, kube-apiserver, and kube-proxy pods, may take on the IP address that you set for VSPHERE_CONTROL_PLANE_ENDPOINT instead of an IP of their own. You might not see an error, but this can cause networking problems for these pods and prevent proper failover of the control plane nodes. To confirm this is your issue:

    1. Connect to the cluster and run kubectl get pods -A -o wide.
    2. Note the IPs for the etcd, kube-apiserver, and kube-proxy pods.
    3. Run kubectl get nodes -o wide.
    4. Note the IP for the first node in the output. 
    5. Compare the IPs for the pods and node to see if they match the VSPHERE_CONTROL_PLANE_ENDPOINT you set in the cluster configuration file. 


  • When AVI_LABELS is set, the ako-operator package causes high latency on the AVI Controller

    Due to a bug in the ako-operator package, setting the AVI_LABELS variable or configuring Cluster Labels (Optional) in the Configure VMware NSX Advanced Load Balancer section of the installer interface when creating the management cluster results in the package attempting to reconcile indefinitely. This generates a high volume of events on the AVI Controller.

    Workaround: If you are experiencing this issue, follow the steps below:

    1. Pause the reconciliation of the ako-operator package:
      kubectl patch pkgi ako-operator -n tkg-system --type "json" -p '[{"op":"replace","path":"/spec/paused","value":true}]'
    2. Remove the cluster selector in the default AKODeploymentConfig custom resource:
      kubectl patch adc install-ako-for-all --type "json" -p='[{"op":"remove","path":"/spec/clusterSelector"}]'
    3. Remove the labels that you defined in AVI_LABELS or Cluster Labels (Optional) from each affected workload cluster. To remove a label, append - to its key:
      kubectl label cluster CLUSTER-NAME YOUR-AVI-LABELS-

      For example:

      kubectl label cluster my-workload-cluster tkg.tanzu.vmware.com/ako-enabled-

    The ako-operator package must remain in the paused state to persist this change.
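
    You can verify that the package stays paused by checking its spec on the management cluster. A verification sketch; `ako_operator_paused` is a hypothetical helper name:

```shell
# Prints "true" if reconciliation of the ako-operator package is paused.
ako_operator_paused() {
  kubectl get pkgi ako-operator -n tkg-system -o jsonpath='{.spec.paused}'
}
```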
