VMware recommends that you install or upgrade to Tanzu Kubernetes Grid (TKG) v1.5.4, not previous v1.5 patch versions.
Except where noted, these release notes apply to all patch versions of Tanzu Kubernetes Grid, v1.5.0 through v1.5.4.
- tanzu apps commands.
- pinniped.supervisor_svc_external_dns configuration setting supports using an FQDN as the callback URL for a Pinniped Supervisor (in v1.5.3+).
- NodePortLocal mode. See L7 Ingress in NodePortLocal Mode.
- tanzu secret registry commands manage secrets to enable cluster access to a private container registry. See Configure Authentication to a Private Container Registry.
- tanzu config set and tanzu config unset commands activate and deactivate CLI features and manage persistent environment variables. See Tanzu CLI Configuration.
- tanzu plugin sync command discovers and downloads new CLI plugins that are associated with either a newer version of Tanzu Kubernetes Grid, or a package installed on your management cluster that your local CLI does not know about, for example if another user installed it. See Sync New Plugins.
- mc alias for management-cluster. See Tanzu CLI Command Reference.
- -v and -f flags to tanzu package installed update enable updating package configuration without updating the version.
- -p flag to tanzu cluster scale lets you specify a node pool when scaling node-pool nodes. See Update Node Pools.
- --machine-deployment-base option to tanzu cluster node-pool set specifies a base MachineDeployment object from which to create a new node pool.
- tanzu management-cluster permissions aws generate-cloudformation-template command retrieves the CloudFormation template to create the IAM resources required by Tanzu Kubernetes Grid's account on AWS. See Permissions Set by Tanzu Kubernetes Grid.
- VSPHERE_INSECURE: true in the cluster configuration file.
- AVI_DISABLE_STATIC_ROUTE_SYNC disables the static route sync for AKO.
- AVI_MANAGEMENT_CLUSTER_SERVICE_ENGINE_GROUP specifies the group name of the service engine that is to be used by AKO in the management cluster.
- AVI_INGRESS_NODE_NETWORK_LIST describes the details of the network and the CIDRs that are used in the pool placement network for vCenter Cloud.
- CONTROL_PLANE_MACHINE_COUNT and WORKER_MACHINE_COUNT configuration variables customize management clusters, in addition to workload clusters.
- CLUSTER_API_SERVER_PORT sets the port number of the Kubernetes API server, overriding the default 6443, for deployments without NSX Advanced Load Balancer.
- ANTREA_DISABLE_UDP_TUNNEL_OFFLOAD disables Antrea's UDP checksum offloading, to avoid known issues with underlay network and physical NIC network drivers. See Antrea CNI Configuration.
- AZURE_ENABLE_ACCELERATED_NETWORKING toggles Azure accelerated networking. Defaults to true and can be set to false on VMs with more than 4 CPUs.

Each version of Tanzu Kubernetes Grid adds support for the Kubernetes version of its management cluster, plus additional Kubernetes versions, distributed as Tanzu Kubernetes releases (TKrs).
Any version of Tanzu Kubernetes Grid supports all TKr versions from the previous two minor lines of Kubernetes. For example, TKG v1.5.4 supports the Kubernetes versions v1.22.x, v1.21.x, and v1.20.x listed below, but not v1.19.x, v1.18.x, or v1.17.x.
Tanzu Kubernetes Grid Version | Kubernetes Version of Management Cluster | Provided Kubernetes (TKr) Versions |
---|---|---|
1.5.4 | 1.22.9 | 1.22.9, 1.21.11, 1.20.15 |
1.5.3 | 1.22.8 | 1.22.8, 1.21.11, 1.20.15 |
1.5.2, 1.5.1, 1.5.0 | 1.22.5 | 1.22.5, 1.21.8, 1.20.14 |
1.4.2 | 1.21.8 | 1.21.8, 1.20.14, 1.19.16 |
1.4.0, 1.4.1 | 1.21.2 | 1.21.2, 1.20.8, 1.19.12 |
1.3.1 | 1.20.5 | 1.20.5, 1.19.9, 1.18.17 |
1.3.0 | 1.20.4 | 1.20.4, 1.19.8, 1.18.16, 1.17.16 |
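To see which of these TKrs your management cluster currently makes available, and which upgrades are possible from a given version, you can query the Tanzu CLI. This is a quick sketch; the exact output columns depend on your TKG version, and TKR-NAME is a placeholder for a name returned by the first command:

# list available Tanzu Kubernetes releases (TKrs) and their compatibility status
tanzu kubernetes-release get
# list the TKrs that a specific TKr can be upgraded to
tanzu kubernetes-release available-upgrades get TKR-NAME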
Tanzu Kubernetes Grid v1.5 supports the following infrastructure platforms and operating systems (OSs), as well as cluster creation and management, networking, storage, authentication, backup and migration, and observability components. The component versions listed in parentheses are included in Tanzu Kubernetes Grid v1.5. For more information, see Component Versions.
 | vSphere | Amazon EC2 | Azure |
---|---|---|---|
Infrastructure platform | | Native AWS | Native Azure |
CLI, API, and package infrastructure | Tanzu Framework v0.11.6 | ||
Cluster creation and management | Core Cluster API (v1.0.1), Cluster API Provider vSphere (v1.0.2) | Core Cluster API (v1.0.1), Cluster API Provider AWS (v1.2.0) | Core Cluster API (v1.0.1), Cluster API Provider Azure (v1.0.1) |
Kubernetes node OS distributed with TKG | Photon OS 3, Ubuntu 20.04 | Amazon Linux 2, Ubuntu 20.04 | Ubuntu 18.04, Ubuntu 20.04 |
Build your own image | Photon OS 3, Red Hat Enterprise Linux 7, Ubuntu 18.04, Ubuntu 20.04, Windows 2019 | Amazon Linux 2, Ubuntu 18.04, Ubuntu 20.04 | Ubuntu 18.04, Ubuntu 20.04 |
Container runtime | Containerd (v1.5.7) | ||
Container networking | Antrea (v1.2.3), Calico (v3.19.1) | ||
Container registry | Harbor (v2.3.3) | ||
Ingress | NSX Advanced Load Balancer Essentials and Avi Controller (v20.1.3, v20.1.6, v20.1.7)*, Contour (v1.18.2, v1.17.2) | Contour (v1.18.2, v1.17.2) | Contour (v1.18.2, v1.17.2) |
Storage | vSphere Container Storage Interface (v2.4.1**) and vSphere Cloud Native Storage | In-tree cloud providers only | In-tree cloud providers only |
Authentication | OIDC via Pinniped (v0.12.1), LDAP via Pinniped (v0.12.1) and Dex | ||
Observability | Fluent Bit (v1.7.5), Prometheus (v2.27.0), Grafana (v7.5.7) | ||
Backup and migration | Velero (v1.8.1) |
NOTES:
For a full list of Kubernetes versions that ship with Tanzu Kubernetes Grid v1.5, see Supported Kubernetes Versions in Tanzu Kubernetes Grid above.
The Tanzu Kubernetes Grid v1.5 patch releases include the following software component versions. A blank space indicates that the component version is the same one that is listed to the left for a later patch release.
Component | TKG v1.5.4 | TKG v1.5.3 | TKG v1.5.2 | TKG v1.5.1 |
---|---|---|---|---|
aad-pod-identity | v1.8.0+vmware.1* | |||
addons-manager | v1.5.0_vmware.1-tkg.5 | v1.5.0_vmware.1-tkg.4 | v1.5.0_vmware.1-tkg.3 | |
ako-operator | v1.5.0_vmware.6-tkg.1 | v1.5.0_vmware.5 | v1.5.0_vmware.4* | |
alertmanager | v0.22.2+vmware.1 | |||
antrea | v1.2.3+vmware.4* | |||
cadvisor | v0.39.1+vmware.1 | |||
calico_all | v3.19.1+vmware.1* | |||
carvel-secretgen-controller | v0.7.1+vmware.1* | |||
cloud-provider-azure | v0.7.4+vmware.1 | |||
cloud_provider_vsphere | v1.22.4+vmware.1* | |||
cluster-api-provider-azure | v1.0.2+vmware.1 | v1.0.1+vmware.1* | ||
cluster_api | v1.0.1+vmware.1* | |||
cluster_api_aws | v1.2.0+vmware.1* | |||
cluster_api_vsphere | v1.0.3+vmware.1 | v1.0.2+vmware.1* | ||
cni_plugins | v1.1.1+vmware.2 | v0.9.1+vmware.8* | ||
configmap-reload | v0.5.0+vmware.2* | |||
containerd | v1.5.11+vmware.1 | v1.5.7+vmware.1 | ||
contour | v1.18.2+vmware.1, v1.17.2+vmware.1* |||
coredns | v1.8.4_vmware.9 | v1.8.4_vmware.7* | ||
crash-diagnostics | v0.3.7+vmware.5 | v0.3.7+vmware.3 | ||
cri_tools | v1.21.0+vmware.7* | |||
csi_attacher | v3.3.0+vmware.1* | |||
csi_livenessprobe | v2.4.0+vmware.1* | |||
csi_node_driver_registrar | v2.3.0+vmware.1* | |||
csi_provisioner | v3.0.0+vmware.1* | |||
dex | v2.30.2+vmware.1* | |||
envoy | v1.19.1+vmware.1, v1.18.4+vmware.1 |||
external-dns | v0.10.0+vmware.1* | |||
etcd | v3.5.4_vmware.2 | v3.5.2_vmware.3 | v3.5.0+vmware.7* | |
fluent-bit | v1.7.5+vmware.2* | |||
gangway | v3.2.0+vmware.2 | |||
grafana | v7.5.7+vmware.2* | |||
harbor | v2.3.3+vmware.1* | |||
image-builder | v0.1.11+vmware.3 | |||
imgpkg | v0.22.0+vmware.1 | v0.18.0+vmware.1* | ||
jetstack_cert-manager | v1.5.3+vmware.2* | |||
k8s-sidecar | v1.12.1+vmware.2* | |||
k14s_kapp | v0.42.0+vmware.2 | v0.42.0+vmware.1* | ||
k14s_ytt | v0.37.0+vmware.1 | v0.35.1+vmware.1* | ||
kapp-controller | v0.30.1_vmware.1-tkg.2 | v0.30.0+vmware.1-tkg.2 | v0.30.0+vmware.1-tkg.1* | |
kbld | v0.31.0+vmware.1* | |||
kube-state-metrics | v1.9.8+vmware.1 | |||
kube-vip | v0.3.3+vmware.1 | |||
kube_rbac_proxy | v0.8.0+vmware.1 | |||
kubernetes | v1.22.9+vmware.1 | v1.22.8+vmware.1 | v1.22.5+vmware.1-tkg.4 | v1.22.5+vmware.1-tkg.3* |
kubernetes-csi_external-resizer | v1.3.0+vmware.1* | |||
kubernetes-sigs_kind | v1.22.9+vmware.1-tkg.1_v0.11.1 | v1.22.8+vmware.1-tkg.1_v0.11.1 | v1.22.5+vmware.1_v0.11.1* | |
kubernetes_autoscaler | v1.22.0+vmware.1* | |||
load-balancer-and-ingress-service (AKO) | v1.6.1+vmware.4 | v1.6.1+vmware.2* | ||
metrics-server | v0.5.1+vmware.1* | |||
multus-cni** | v3.7.1_vmware.2* | |||
pinniped | v0.12.1+vmware.1 | v0.12.0+vmware.1* | ||
prometheus | v2.27.0+vmware.1 | |||
prometheus_node_exporter | v1.1.2+vmware.1 | |||
pushgateway | v1.4.0+vmware.1 | |||
standalone-plugins-package | v0.11.6-1-standalone-plugins | v0.11.4-1-standalone-plugins | v0.11.2-standalone-plugins | v0.11.1-standalone-plugins* |
sonobuoy | v0.54.0+vmware.1 | |||
tanzu-framework | v0.11.6-1 | v0.11.4-1 | v0.11.2 | v0.11.1*† |
tanzu-framework-addons | v0.11.6-1 | v0.11.4-1 | v0.11.2 | v0.11.1*† |
tanzu-framework-management-packages | v0.11.6-1 | v0.11.4-1 | v0.11.2 | v0.11.1*† |
tkg-bom | v1.5.4 | v1.5.3 | v1.5.2 | v1.5.1* |
tkg-core-packages | v1.22.9+vmware.1-tkg.1 | v1.22.8+vmware.1-tkg.1 | v1.22.5+vmware.1-tkg.4 | v1.22.5+vmware.1-tkg.3* |
tkg-standard-packages | v1.5.4 | v1.5.4 | v1.5.2 | v1.5.1* |
tkg_telemetry | v1.5.0+vmware.1* | |||
velero | v1.8.1+vmware.1 | v1.7.0+vmware.1* | ||
velero-plugin-for-aws | v1.4.1+vmware.1 | v1.3.0+vmware.1* | ||
velero-plugin-for-microsoft-azure | v1.4.1+vmware.1 | v1.3.0+vmware.1* | ||
velero-plugin-for-vsphere | v1.3.1+vmware.1 | v1.3.0+vmware.1* | ||
vendir | v0.23.1+vmware.1 | v0.23.0+vmware.1* | ||
vsphere_csi_driver | v2.4.1+vmware.1* | |||
windows-resource-bundle | v1.22.9+vmware.1-tkg.1 | v1.22.8+vmware.1-tkg.1 | v1.22.5+vmware.1-tkg.1 | v1.22.5+vmware.1-tkg.1 |
* Indicates a version bump or new component since v1.4.2, which is the latest release prior to v1.5.1.
† The version numbering scheme for Tanzu Framework changed when the project became open-source. Previously, Tanzu Framework version numbers matched the Tanzu Kubernetes Grid versions that included them.
For a complete list of software component versions that ship with Tanzu Kubernetes Grid v1.5.4, see ~/.config/tanzu/tkg/bom/tkg-bom-v1.5.4.yaml and ~/.config/tanzu/tkg/bom/tkr-bom-v1.22.9+vmware.1-tkg.1.yaml. For component versions in previous releases, see the tkg-bom- and tkr-bom- YAML files that install with those releases.
Caution: VMware recommends not installing or upgrading to Tanzu Kubernetes Grid v1.5.0-v1.5.3, due to a bug in the versions of etcd included in the versions of Kubernetes that Tanzu Kubernetes Grid v1.5.0-v1.5.3 uses. Tanzu Kubernetes Grid v1.5.4 resolves this problem by incorporating a fixed version of etcd. For more information, see Resolved Issues below.
You can only upgrade to Tanzu Kubernetes Grid v1.5.x from v1.4.x. If you want to upgrade to Tanzu Kubernetes Grid v1.5.x from a version earlier than v1.4.x, you must upgrade to v1.4.x first.
When upgrading Kubernetes versions on Tanzu Kubernetes clusters, you cannot skip minor versions. For example, you cannot upgrade a Tanzu Kubernetes cluster directly from v1.20.x to v1.22.x. You must upgrade a v1.20.x cluster to v1.21.x before upgrading the cluster to v1.22.x.
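For example, to move a workload cluster from a v1.20.x TKr to a v1.22.x TKr, you upgrade in two steps. This is a sketch; the cluster name is hypothetical, and you would substitute the exact TKr names reported by tanzu kubernetes-release get:

# step 1: upgrade the cluster to a v1.21.x Tanzu Kubernetes release
tanzu cluster upgrade my-cluster --tkr TKR-V1.21-NAME
# step 2: upgrade the same cluster to a v1.22.x Tanzu Kubernetes release
tanzu cluster upgrade my-cluster --tkr TKR-V1.22-NAME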
Tanzu Kubernetes Grid v1.5 release dates are:
Tanzu Kubernetes Grid v1.5 introduces the following new behaviors compared with v1.4.
- Run tanzu init before deploying a management cluster.
- Before the Tanzu CLI connects to a management cluster, the tanzu kubernetes-release and tanzu cluster commands, such as tanzu cluster list, are unavailable.
- Run tanzu plugin sync to install Tanzu CLI plugins for Tanzu Kubernetes Grid.

The Tanzu Kubernetes Grid v1.5 documentation applies to all of the v1.5.x releases. It includes information about the following subjects:
The following issues are resolved in Tanzu Kubernetes Grid v1.5 patch versions as indicated.
The following issues are resolved in Tanzu Kubernetes Grid v1.5.4.
Workload cluster upgrade may hang or fail due to undetached persistent volumes
If you are upgrading your Tanzu Kubernetes clusters from Tanzu Kubernetes Grid v1.4.x to v1.5.x and you have applications on the cluster that use persistent volumes, the volumes may fail to detach and re-attach during upgrade, causing the upgrade process to hang or fail.
The same issue may manifest when you try to scale a cluster up or down, as described in Workload cluster scaling may hang or fail due to undetached persistent volumes.
Workaround: Follow the steps in Persistent volumes cannot attach to a new node if previous node is deleted (85213) in the VMware Knowledge Base.
Workload cluster scaling may hang or fail due to undetached persistent volumes
If you scale a workload cluster up or down, and you have applications on the cluster that use persistent volumes, the volumes may fail to detach and re-attach during the scaling process, causing an AttachVolume.Attach failed
error and timeout.
The same issue may manifest when you try to upgrade a cluster, as described in Workload cluster upgrade may hang or fail due to undetached persistent volumes.
Workaround: Follow the steps in Persistent volumes cannot attach to a new node if previous node is deleted (85213) in the VMware Knowledge Base.
Host network pods and node use the wrong IP in IPv6 clusters
The issue described in Host network pods and node use the wrong IP in IPv6 clusters below has been resolved in TKG v1.5.4 for IPv6 clusters based on Kubernetes v1.22.x.
It is still a Known Issue in TKG v1.5.4 for clusters based on Kubernetes v1.20.x and v1.21.x.
etcd v3.5.0-3.5.2 data inconsistency issue in Kubernetes v1.22.0-1.22.8
Kubernetes versions 1.22.0-1.22.8, which are included in Tanzu Kubernetes Grid v1.5.0-v1.5.3, use etcd
versions 3.5.0-3.5.2. These versions of etcd
have a bug that can result in data corruption. VMware recommends not installing or upgrading to Tanzu Kubernetes Grid v1.5.0-v1.5.3. Fixes for this bug are incorporated into Tanzu Kubernetes Grid v1.5.4.
If you encounter etcd
data inconsistency in Tanzu Kubernetes Grid v1.5.0-v1.5.3 or for additional information and a diagnostic procedure, see the VMware Knowledge Base article etcd v3.5.0-3.5.2 can corrupt data in TKG v1.5.0-1.5.3.
You must run the tanzu plugin sync
command after deploying a management cluster
After a management cluster has been deployed in Tanzu Kubernetes Grid v1.5.0, v1.5.1, v1.5.2, or v1.5.3, you need to run tanzu plugin sync
to install Tanzu CLI plugins.
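As a sketch of that workflow, after the management cluster is up you would run:

tanzu plugin sync   # discovers and installs plugins published by the management cluster
tanzu plugin list   # confirm that the expected plugins are now installed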
Unstable file naming convention prevents automated retrieval of image binaries
Automated retrieval of signed image binaries fails due to unstable file naming convention.
The following issues are resolved in Tanzu Kubernetes Grid v1.5.3.
Unexpected VIP network separation after upgrading TKG management cluster to v1.5.2
For management clusters that were created with AVI_MANAGEMENT_CLUSTER_VIP_NETWORK_NAME
and _CIDR
set different from AVI_DATA_NETWORK
and _CIDR
, upgrading to v1.5.2 changes internal settings that cause subsequently-created workload clusters’ control plane VIP networks to be different from the management cluster’s VIP network. This issue causes no loss of connectivity.
Workaround: Specify the correct control plane network for NSX ALB to discover by running the following, in the management cluster kubeconfig
context:
kubectl patch akodeploymentconfig install-ako-for-all --type "json" -p '[
{"op":"replace","path":"/spec/controlPlaneNetwork/name","value":AVI_MANAGEMENT_CLUSTER_VIP_NETWORK_NAME},
{"op":"replace","path":"/spec/controlPlaneNetwork/cidr","value":AVI_MANAGEMENT_CLUSTER_VIP_NETWORK_CIDR}]'
Where AVI_MANAGEMENT_CLUSTER_VIP_NETWORK_NAME
and _CIDR
are the name and CIDR range of the control plane network that you want to assign to your management cluster’s load balancers.
You cannot deploy or upgrade to TKG v1.5 if you are using a registry with a self-signed certificate
Management cluster deployment or upgrade fails when accessing an image registry with a custom certificate, such as in an internet-restricted environment.
Management and workload cluster upgrades fail on Azure
On Azure, running tanzu management-cluster upgrade
and tanzu cluster upgrade
fails if AZURE_CLIENT_SECRET
is not set as an environment variable.
Workaround: Before running tanzu management-cluster upgrade
or tanzu cluster upgrade
, set the AZURE_CLIENT_SECRET
environment variable. For more information, see this VMware Knowledge Base article.
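A minimal sketch of that workaround on a Linux or macOS bootstrap machine; the secret value is a placeholder:

# set the client secret for the current shell session before upgrading
export AZURE_CLIENT_SECRET='YOUR-AZURE-CLIENT-SECRET'
tanzu management-cluster upgrade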
load-balancer-and-ingress-service
(AKO) package fails to reconcile on newly created Windows workload cluster
After creating a Windows workload cluster that uses NSX Advanced Load Balancer, you may see the following error message:
Reconcile failed: Error (see .status.usefulErrorMessage for details)
avi-system ako-0 - Pending: Unschedulable (message: 0/2 nodes are available: 1 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate, 1 node(s) had taint {os: windows}, that the pod didn't tolerate.)
Workaround: Add a tolerations setting to the AKO pod specifications by following the procedure in Add AKO Overlay in Windows Custom Machine Images.
Azure private workload cluster upgrades fail
Upgrading private workload clusters on Azure from v1.4.x fails with error failed to pull and unpack image
because the upgrade process does not correctly reattach the control plane nodes to the load balancer, preventing external network access for the upgraded nodes.
Cannot delete upgraded workload cluster
When you delete a workload cluster that has been upgraded from v1.4.x, a finalizer on the VSphereIdentitySecret
is not deleted, resulting in the workload cluster not being deleted.
Running Tanzu commands on Windows fails with a certificate error
When using the Tanzu CLI on Windows OS, all registry operations fail with an x509: certificate signed by unknown authority
error.
Workaround:
Obtain a correct base64 encoded root certificate in PEM format from DigiCert Trusted Root Authority Certificates.
Supply it as an environmental override via the TKG_CUSTOM_IMAGE_REPOSITORY_CA_CERTIFICATE
configuration variable. To do so, use the below command before running tanzu init
or tanzu plugin
:
$env:TKG_CUSTOM_IMAGE_REPOSITORY_CA_CERTIFICATE="LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSURyekNDQXBlZ0F3SUJBZ0lRQ0R2Z1ZwQkNSckdoZFdySldaSEhTakFOQmdrcWhraUc5dzBCQVFVRkFEQmgKTVFzd0NRWURWUVFHRXdKVlV6RVZNQk1HQTFVRUNoTU1SR2xuYVVObGNuUWdTVzVqTVJrd0Z3WURWUVFMRXhCMwpkM2N1WkdsbmFXTmxjblF1WTI5dE1TQXdIZ1lEVlFRREV4ZEVhV2RwUTJWeWRDQkhiRzlpWVd3Z1VtOXZkQ0JEClFUQWVGdzB3TmpFeE1UQXdNREF3TURCYUZ3MHpNVEV4TVRBd01EQXdNREJhTUdFeEN6QUpCZ05WQkFZVEFsVlQKTVJVd0V3WURWUVFLRXd4RWFXZHBRMlZ5ZENCSmJtTXhHVEFYQmdOVkJBc1RFSGQzZHk1a2FXZHBZMlZ5ZEM1agpiMjB4SURBZUJnTlZCQU1URjBScFoybERaWEowSUVkc2IySmhiQ0JTYjI5MElFTkJNSUlCSWpBTkJna3Foa2lHCjl3MEJBUUVGQUFPQ0FROEFNSUlCQ2dLQ0FRRUE0anZoRVhMZXFLVFRvMWVxVUtLUEMzZVF5YUtsN2hMT2xsc0IKQ1NETUFaT25UakMzVS9kRHhHa0FWNTNpalNMZGh3WkFBSUVKenM0Ymc3L2Z6VHR4UnVMV1pzY0ZzM1luRm85NwpuaDZWZmU2M1NLTUkydGF2ZWd3NUJtVi9TbDBmdkJmNHE3N3VLTmQwZjNwNG1WbUZhRzVjSXpKTHYwN0E2RnB0CjQzQy9keEMvL0FIMmhkbW9SQkJZTXFsMUdOWFJvcjVINGlkcTlKb3orRWtJWUl2VVg3UTZoTCtocWtwTWZUN1AKVDE5c2RsNmdTemVSbnR3aTVtM09GQnFPYXN2K3piTVVaQmZIV3ltZU1yL3k3dnJUQzBMVXE3ZEJNdG9NMU8vNApnZFc3alZnL3RSdm9TU2lpY05veEJOMzNzaGJ5VEFwT0I2anRTajFldFgramtNT3ZKd0lEQVFBQm8yTXdZVEFPCkJnTlZIUThCQWY4RUJBTUNBWVl3RHdZRFZSMFRBUUgvQkFVd0F3RUIvekFkQmdOVkhRNEVGZ1FVQTk1UU5WYlIKVEx0bThLUGlHeHZEbDdJOTBWVXdId1lEVlIwakJCZ3dGb0FVQTk1UU5WYlJUTHRtOEtQaUd4dkRsN0k5MFZVdwpEUVlKS29aSWh2Y05BUUVGQlFBRGdnRUJBTXVjTjZwSUV4SUsrdDFFbkU5U3NQVGZyZ1QxZVhrSW95UVkvRXNyCmhNQXR1ZFhIL3ZUQkgxakx1RzJjZW5Ubm1DbXJFYlhqY0tDaHpVeUltWk9Na1hEaXF3OGN2cE9wLzJQVjVBZGcKMDZPL25Wc0o4ZFdPNDFQMGptUDZQNmZidEdiZlltYlcwVzVCamZJdHRlcDNTcCtkV09JcldjQkFJKzB0S0lKRgpQbmxVa2lhWTRJQklxRGZ2OE5aNVlCYmVyT2dPelc2c1JCYzRMMG5hNFVVK0tyazJVODg2VUFiM0x1akVWMGxzCllTRVkxUVN0ZUR3c09vQnJwK3V2RlJUcDJJbkJ1VGhzNHBGc2l2OWt1WGNsVnpEQUd5U2o0ZHpwMzBkOHRiUWsKQ0FVdzdDMjlDNzlGdjFDNXFmUHJtQUVTcmNpSXhwZzBYNDBLUE1icDFaV1ZiZDQ9Ci0tLS0tRU5EIENFUlRJRklDQVRFLS0tLS0KCg=="
Tanzu CLI secret
and package
commands return error
If you deployed clusters using an alternative to Tanzu Kubernetes Grid, such as Spectro Cloud or Tanzu Kubernetes Grid Integrated Edition, you may see the following error when running Tanzu CLI package
and secret
commands on an authenticated package repository:
Error: Unable to set up rest mapper: no Auth Provider found for name "oidc"
With multiple vCenters, recreating node can lose IP address
In vSphere deployments with multiple datacenters, shutting down and restarting a node from the vCenter UI sometimes lost the node’s IP address, causing its Antrea pod to crash.
Workaround: Delete the vsphere-cloud-controller-manager
pod so that it re-creates. Retrieve the pod name with kubectl get pods --namespace=kube-system
and delete it with kubectl delete pod
.
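A sketch of that workaround; the pod name suffix shown is hypothetical, so use the name returned by the first command:

kubectl get pods --namespace=kube-system | grep vsphere-cloud-controller-manager
# delete the pod so that it is re-created; substitute the actual pod name
kubectl delete pod --namespace=kube-system vsphere-cloud-controller-manager-abc12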
Pinniped remote authentication does not support Chrome 98 browser
Running the procedure Authenticate Users on a Machine Without a Browser with a local machine that has Chrome 98 as the default browser generates an error instead of opening up an IdP login page.
credentials update needed for vSphere passwords that contain single-quote (') character
Upgrading TKG on a vSphere account with a password containing the single-quote character requires running tanzu mc credentials update
before running tanzu mc upgrade.
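A sketch of that credential refresh; the cluster name and credentials are placeholders:

tanzu mc credentials update MGMT-CLUSTER-NAME --vsphere-user "VSPHERE-USERNAME" --vsphere-password "VSPHERE-PASSWORD"
tanzu mc upgrade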
The following issues are resolved in Tanzu Kubernetes Grid v1.5.2.
NSX ALB setup lists identical names when multiple clusters share networks
The Avi Controller retrieves port group information from vCenter, which does not include the port groups' T1 and segment associations that are set in the NSX-T dashboard. When setting up load-balancing for deployments where multiple clusters share networks via NSX-T, this can mean having to choose port groups from lists with identical names in the Avi Controller UI.
Tanzu Kubernetes Grid v1.5.2+ supports enabling the Avi Controller to retrieve network information from NSX-T. This lets users disambiguate port groups that have identical names but are attached to different T1 routers.
Note: For v4.0+, VMware NSX-T Data Center is renamed to “VMware NSX.”
Node pool operations do not work in proxied environments
Tanzu CLI tanzu cluster node-pool
commands do not work in proxied environments.
Commands tanzu cluster node-pool scale
and tanzu cluster node-pool delete
target the wrong node pool.
Because of an internal regex mismatch, the commands tanzu cluster node-pool scale
and tanzu cluster node-pool delete
sometimes operate on a node pool other than the one specified in the command.
The following issues are resolved in Tanzu Kubernetes Grid v1.5.1.
Tanzu CLI does not support node-pool
commands for Tanzu Kubernetes Grid service (TKGS) clusters.
Tanzu CLI tanzu cluster node-pool
commands are not supported for Tanzu Kubernetes clusters created by using the Tanzu Kubernetes Grid service on vSphere 7.0 U3.
Management cluster installation and upgrade fail in airgapped environment
In an airgapped environment, running tanzu management-cluster create
or tanzu management-cluster upgrade
fails when the kind
process attempts to retrieve a pause
v3.5 image from k8s.gcr.io
.
Management cluster upgrade fails on AWS with Ubuntu v20.04
On Amazon EC2, with a management cluster based on Ubuntu v20.04 nodes, running tanzu management-cluster upgrade
fails after the kind
process retrieves an incompatible pause
version (v3.6) image from k8s.gcr.io
.
Editing cluster resources on AWS with Calico CNI produces errors.
Known Issue In: v1.4.0, v1.4.1
Adding or removing a resource for a workload cluster on AWS on a TKG deployment that uses the Calico CNI produces errors if you do not manually add an ingress role.
CAPV controller parses datacenter incorrectly in multi-datacenter vSphere environment.
Known Issue In: v1.4.0, v1.4.1
During upgrade to Tanzu Kubernetes Grid v1.4.1 on vSphere, if you have multiple datacenters running within a single vCenter, the CAPV controller failed to find datacenter contents, causing upgrade failure and possible loss of data.
Management cluster create fails with Linux or MacOS bootstrap machines running cgroups v2 in their Linux kernel, and Docker Desktop v4.3.0 or later.
Known Issue In: v1.4.0, v1.4.1
Due to the version of kind
that the v1.4.0 CLI uses to build its container image, bootstrap machines running cgroups v2
fail to run the image.
Upgrading management cluster does not automatically create tanzu-package-repo-global
namespace.
Known Issue In: v1.4.x
When you upgrade to Tanzu Kubernetes Grid v1.4.x, you need to manually create the tanzu-package-repo-global
namespace and associated package repository.
Changing name or location of virtual machine template for current Kubernetes version reprovisions cluster nodes when running tanzu cluster upgrade.
Known Issue In: v1.3.x, v1.4.x
Moving or renaming the virtual machine template in your vCenter and then running tanzu cluster upgrade
causes cluster nodes to be reprovisioned with new IP addresses.
The following are known issues in Tanzu Kubernetes Grid v1.5.4. See Resolved Issues for issues that are resolved in v1.5.4, but that applied to earlier patch versions of TKG v1.5.
kapp-controller crashes when upgrading a management cluster from TKG v1.5.1-2 to v1.5.3.
In TKG v1.5.1 and v1.5.2 there is a known custom certificate airgap issue that makes kapp-controller
enter a crashLoopBackoff
state when upgrading a management cluster to v1.5.3. To resolve this issue, you must perform the below workaround before upgrading from v1.5.1-2 to v1.5.3.
Workaround: See step 3 in Upgrade Management Cluster.
kapp-controller
generates ctrl-change
ConfigMap objects, even if there is no change
The CustomResourceDefinition
objects that define configurations for Calico, AKO Operator, and other packages include a status
field. When the kapp-controller
reconciles these CRD objects every five minutes, it interprets their status
as having changed even when the package configuration did not change. This causes the kapp-controller
to generate unnecessary, duplicate ctrl-change
ConfigMap
objects, which soon overrun their history buffer because each package saves a maximum of 200 ctrl-change
ConfigMap
records.
Workaround: None
Tanzu Standard repository v1.5.x packages do not work with vSphere with Tanzu
You cannot install Tanzu Standard repository v1.5.x packages into Tanzu Kubernetes clusters created by using vSphere with Tanzu. The packages have not been validated for vSphere with Tanzu clusters.
Workaround: If you are running Tanzu Kubernetes Grid v1.4 packages on Tanzu Kubernetes clusters created by using the Tanzu Kubernetes Grid service, do not upgrade to v1.5 until a fix is available.
Shared Services Cluster Does Not Work with TKGS
Tanzu Kubernetes Grid Service (TKGS) does not support deploying packages to a shared services cluster. Workload clusters deployed by TKGS can only use packaged services deployed to the workload clusters themselves.
Workaround: None
Pinniped authentication error on workload cluster after upgrading management cluster
When attempting to authenticate to a workload cluster associated with the upgraded management cluster, you receive an error message similar to the following:
Error: could not complete Pinniped login: could not perform OIDC discovery for "https://IP:PORT": Get "https://IP:PORT/.well-known/openid-configuration": x509: certificate signed by unknown authority
Workaround: See Pinniped Authentication Error on Workload Cluster After Management Cluster Upgrade.
Kapp Controller crashLoopBackoff
state requires recreating Carvel API after offline upgrade from v1.5.1 or v1.5.2
The kapp-controller
component may enter a crashLoopBackoff
state when upgrading TKG from v1.5.1 or v1.5.2 in an internet-restricted environment with a private registry that uses a custom certificate.
Workaround: Run kubectl delete apiservice v1alpha1.data.packaging.carvel.dev
and then run the tanzu mc upgrade
command again.
Upgrade ignores custom bootstrap token duration
Management cluster upgrade ignores CAPBK_BOOTSTRAP_TOKEN_TTL
setting configured in TKG v1.4.2+ to extend the bootstrap token TTL during cluster initialization. This may cause a timeout. The default TTL is 15m.
If you no longer have the management cluster’s configuration file, you can use kubectl
to determine its CAPBK_BOOTSTRAP_TOKEN_TTL
setting:
Run kubectl get pods -n capi-kubeadm-bootstrap-system to list the bootstrap pods.
Find the capi-kubeadm-bootstrap-controller-manager pod and output its description in YAML. For example:
kubectl get pod -n capi-kubeadm-bootstrap-system capi-kubeadm-bootstrap-controller-manager-7ffb6dc8fc-hzm7l -o yaml
In the output, check spec.containers[0].args. If the argument --bootstrap-token-ttl is present and is set to something other than 15m (the default value), then the value was customized and requires the workaround below.

Workaround: Before running tanzu mc upgrade, set CAPBK_BOOTSTRAP_TOKEN_TTL as an environment variable. For example:
export CAPBK_BOOTSTRAP_TOKEN_TTL=30m
After upgrading a cluster with fewer than 3 control plane nodes, such as dev
plan clusters, kapp-controller
fails to reconcile its CSI package
When viewing the status of the CSI PackageInstall
resource, you may see Reconcile failed
:
kubectl get pkgi -A
In CSI v2.4, the default number of CSI replicas changed from 1
to 3
. For clusters with fewer than 3 control plane nodes (for example, 1, to preserve quorum), this change prevents kapp-controller
from matching the current number of CSI replicas to the desired state.
Workaround: Patch deployment_replicas
to match the number of control plane nodes:
Retrieve the data values file corresponding to the secret for CLUSTER-NAME-vsphere-csi-addon
in the management cluster. For example:
kubectl get secrets tkg-mgmt-vsphere-csi-addon -n tkg-system -o jsonpath={.data.values\\.yaml} | base64 -d > values.yaml
Add this line to the end of the new values.yaml
file:
deployment_replicas: 1
Update the CLUSTER-NAME-vsphere-csi-addon
secret. For example:
kubectl create secret generic tkg-mgmt-vsphere-csi-addon -n tkg-system --type=tkg.tanzu.vmware.com/addon --from-file=values.yaml=values.yaml --dry-run=client -o yaml | kubectl replace -f -
Add labels to CLUSTER-NAME-vsphere-csi-addon
as follows:
kubectl label secret CLUSTER_NAME-vsphere-csi-addon tkg.tanzu.vmware.com/cluster-name=tkg-mgmt
kubectl label secret CLUSTER_NAME-vsphere-csi-addon tkg.tanzu.vmware.com/addon-name=vsphere-csi
Wait about 10 minutes for kapp to reconcile the change, and then confirm that vsphere-csi has 1 replica and status Reconcile succeeded:
kubectl get pkgi vsphere-csi -n tkg-system
Verify that vsphere-csi has DESCRIPTION Reconcile succeeded.
kubectl get pods -n kube-system
Verify only one vsphere-csi-controller-* pod, with multiple READY containers.
kubectl get rs -n kube-system
Verify 1 desired replica of the vsphere-csi-controller-* pod.
On vSphere with Tanzu, tanzu cluster list
generates error for DevOps users
When a user with the DevOps engineer role, as described in vSphere with Tanzu User Roles and Workflows, runs tanzu cluster list
, they may see an error resembling Error: unable to retrieve combined cluster info: unable to get list of clusters. User cannot list resource "clusters" at the cluster scope
.
This happens because the tanzu cluster command
without a -n
option attempts to access all namespaces, some of which may not be accessible to a DevOps engineer user.
Workaround: When running tanzu cluster list
, include a --namespace
value to specify a namespace that the user can access.
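For example, with a namespace that the DevOps engineer user can access (the namespace name is hypothetical):

tanzu cluster list --namespace dev-team-1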
Non-alphanumeric characters cannot be used in HTTP/HTTPS proxy passwords
When deploying management clusters with the CLI, the non-alphanumeric characters # ` ^ | / ? % ^ { [ ] } \ " < > cannot be used in passwords. Also, when deploying a management cluster with the installer interface, no non-alphanumeric characters can be used in HTTP/HTTPS proxy passwords.
Workaround: You can use non-alphanumeric characters other than # ` ^ | / ? % ^ { [ ] } \ " < > in passwords when deploying a management cluster with the CLI.
Tanzu CLI does not work on macOS machines with ARM processors
Tanzu CLI v0.11.6 does not work on macOS machines with ARM (Apple M1) chips, as identified under Finder > About This Mac > Overview.
Workaround: Use a bootstrap machine with a Linux or Windows OS, or a macOS machine with an Intel processor.
Windows CMD: Extraneous characters in CLI output column headings
In the Windows command prompt (CMD), Tanzu CLI command output that is formatted in columns includes extraneous characters in column headings.
The issue does not occur in Windows Terminal or PowerShell.
Workaround: On Windows bootstrap machines, run the Tanzu CLI from Windows Terminal.
Ignorable AKODeploymentConfig
error during management cluster creation
Running tanzu management-cluster create
to create a management cluster with NSX ALB outputs the following error: no matches for kind “AKODeploymentConfig” in version “networking.tkg.tanzu.vmware.com/v1alpha1”
. The error can be ignored. For more information, see this article in the KB.
Ignorable machinehealthcheck
and clusterresourceset
errors during workload cluster creation on vSphere
When a workload cluster is deployed to vSphere by using the tanzu cluster create
command through vSphere with Tanzu, the output might include errors related to running machinehealthcheck
and accessing the clusterresourceset
resources, as shown below:
Error from server (Forbidden): error when creating "/tmp/kubeapply-3798885393": machinehealthchecks.cluster.x-k8s.io is forbidden: User "sso:Administrator@vsphere.local" cannot create resource "machinehealthchecks" in API group "cluster.x-k8s.io" in the namespace "tkg"
...
Error from server (Forbidden): error when retrieving current configuration of: Resource: "addons.cluster.x-k8s.io/v1beta1, Resource=clusterresourcesets", GroupVersionKind: "addons.cluster.x-k8s.io/v1beta1, Kind=ClusterResourceSet"
...
The workload cluster is successfully created. You can ignore the errors.
CLI temporarily misreports status of recently deleted nodes when MHCs are disabled
When machine health checks (MHCs) are disabled, Tanzu CLI commands such as tanzu cluster status
may not report up-to-date node state while infrastructure is being recreated.
Workaround: None
Workload cluster cannot distribute storage across multiple datastores
You cannot enable a workload cluster to distribute storage across multiple datastores as described in Deploy a Cluster that Uses a Datastore Cluster. If you tag multiple datastores in a datastore cluster as the basis for a workload cluster’s storage policy, the workload cluster uses only one of the datastores.
Workaround: None
Node pool labels
and other configuration properties cannot be changed
You cannot add to or otherwise change an existing node pool’s labels
, az
, nodeMachineType
or vSphere properties, as listed in Configuration Properties.
Workaround: Create a new node pool in the cluster with the desired properties, migrate workloads to the new node pool, and delete the original.
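A sketch of that rotation with the Tanzu CLI, using hypothetical cluster, file, and node pool names:

# create a new node pool from a definition file with the desired labels and properties
tanzu cluster node-pool set my-cluster -f new-node-pool.yaml
# after migrating workloads to the new pool, delete the original node pool
tanzu cluster node-pool delete my-cluster -n original-node-pool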
Node pools created with small
nodes may stall at Provisioning
Node pools created with node SIZE
configured as small
may become stuck in the Provisioning
state and never proceed to Running
.
Workaround: Configure node pool with at least medium
size nodes.
Host network pods and node use the wrong IP in IPv6 clusters
This issue is resolved in Tanzu Kubernetes Grid v1.5.4 for IPv6 clusters based on Kubernetes v1.22.x.
When you deploy IPv6 clusters based on Kubernetes v1.20.x or v1.21.x with multiple control plane nodes on vSphere, one of your nodes as well as the etcd
, kube-apiserver
, and kube-proxy
pods may take on the IP you set for the VSPHERE_CONTROL_PLANE_ENDPOINT
instead of an IP of their own. You might not see an error, but this could cause networking problems for these pods and prevent the control plane nodes from proper failover. To confirm this is your issue:
Run kubectl get pods -A -o wide.
Check the IP addresses of the etcd, kube-apiserver, and kube-proxy pods.
Run kubectl get nodes -o wide.
Check whether any node, or any of these pods, has taken on the IP of the VSPHERE_CONTROL_PLANE_ENDPOINT you set in the cluster configuration file.

Workaround: Use TKG v1.5.4 with clusters based on Kubernetes v1.22.x.
When AVI_LABELS
is set, ako-operator
causes high latency on the AVI Controller
Due to a bug in the ako-operator
package, setting the AVI_LABELS
variable or configuring Cluster Labels (Optional) in the Configure VMware NSX Advanced Load Balancer section of the installer interface when creating the management cluster results in the package attempting to reconcile indefinitely. This generates a high volume of events on the AVI Controller.
Workaround: If you are experiencing this issue, follow the steps below:
Pause the reconciliation of the ako-operator
package:
kubectl patch pkgi ako-operator -n tkg-system --type "json" -p '[{"op":"replace","path":"/spec/paused","value":true}]'
Remove the cluster selector in the default AKODeploymentConfig
custom resource:
kubectl patch adc install-ako-for-all --type "json" -p='[{"op":"remove","path":"/spec/clusterSelector"}]'
Remove the labels that you defined in AVI_LABELS
or Cluster Labels (Optional) from each affected workload cluster:
kubectl label cluster CLUSTER-NAME YOUR-AVI-LABELS-
For example:
kubectl label cluster my-workload-cluster tkg.tanzu.vmware.com/ako-enabled-
The ako-operator
package must remain in the paused state to persist this change.
With NSX ALB, cannot create cluster in NAMESPACE
that has name beginning with numeric character
On vSphere with NSX Advanced Load Balancer, creating a workload cluster from Tanzu Mission Control or by running tanzu cluster create
fails if its management namespace, set by the NAMESPACE
configuration variable, begins with a numeric character (0
-9
).
Workaround: Deploy workload clusters to management namespaces that do not start with numeric characters.
With NSX ALB, cannot create clusters with identical names
If you are using NSX Advanced Load Balancer for workloads (AVI_ENABLE
) or the control plane (AVI_CONTROL_PLANE_HA_PROVIDER
) the Avi Controller may fail to distinguish between identically-named clusters.
Workaround: Set a unique CLUSTER_NAME
value for each cluster:
Management clusters: Do not create multiple management clusters with the same CLUSTER_NAME
value, even from different bootstrap machines.
Workload clusters: Do not create multiple workload clusters that have the same CLUSTER_NAME
and are also in the same management cluster namespace, as set by their NAMESPACE
value.
Adding external identity management to an existing deployment may require setting dummy VSPHERE_CONTROL_PLANE_ENDPOINT
value
Integrating an external identity provider with an existing TKG deployment may require setting a dummy VSPHERE_CONTROL_PLANE_ENDPOINT
value in the management cluster configuration file used to create the add-on secret, as described in Generate the Pinniped Add-on Secret for the Management Cluster.
Deleting cluster on AWS fails if cluster uses networking resources not deployed with Tanzu Kubernetes Grid.
The tanzu cluster delete
and tanzu management-cluster delete
commands may hang with clusters that use networking resources created by the AWS Cloud Controller Manager independently from the Tanzu Kubernetes Grid deployment process. Such resources may include load balancers and other networking services, as listed in The Service Controller in the Kubernetes AWS Cloud Provider documentation.
For more information, see the Cluster API issue Drain workload clusters of service Type=Loadbalancer on teardown.
Workaround: Use kubectl delete
to delete services of type LoadBalancer
from the cluster. Or if that fails, use the AWS console to manually delete any LoadBalancer
and SecurityGroup
objects created for this service by the Cloud Controller manager. Warning: Do not to delete load balancers or security groups managed by Tanzu, which have the tags key: sigs.k8s.io/cluster-api-provider-aws/cluster/CLUSTER-NAME
, value: owned
.
You cannot upgrade Windows workload clusters to v1.5
You cannot upgrade Windows workload clusters from TKG v1.4 to v1.5.
Workaround: After upgrading your management cluster to TKG v1.5, create a new Windows workload cluster with a Windows Server 2019 ISO image and migrate all workloads from the TKG v1.4 cluster to the v1.5 cluster.
You cannot create a Windows machine image on a MacOS machine
Due to an issue with the open-source packer
utility used by Kubernetes Image Builder, you cannot build a Windows machine image on a MacOS machine as described in Windows Custom Machine Images.
Workaround: Use a Linux machine to build your custom Windows machine images.
Pinniped fails to reconcile on newly created Windows workload cluster
After creating a Windows workload cluster that uses an external identity provider, you may see the following error message:
Reconcile failed: Error (see .status.usefulErrorMessage for details)
pinniped-supervisor pinniped-post-deploy-job - Waiting to complete (1 active, 0 3h failed, 0 succeeded)^ pinniped-post-deploy-job--1-kfpr5 - Pending: Unschedulable (message: 0/2 nodes are available: 1 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate, 1 node(s) had taint {os: windows}, that the pod didn't tolerate.)
Workaround: Add a tolerations setting to the Pinniped secret by following the procedure in Add Pinniped Overlay in Windows Custom Machine Images.
Note: For v4.0+, VMware NSX-T Data Center is renamed to “VMware NSX.”
Management cluster create fails or performance slow with older NSX-T versions and Photon 3 or Ubuntu with Linux kernel 5.8 VMs
Deploying a management cluster on older NSX-T versions, with VMs running Photon 3 or Ubuntu with Linux kernel 5.8, may fail or result in restricted traffic between pods:
This combination exposes a checksum issue between older versions of NSX-T and Antrea CNI.
TMC: If the management cluster is registered with Tanzu Mission Control (TMC), there is no workaround to this issue. Otherwise, see the workaround below.
Workaround: Deploy the management cluster with ANTREA_DISABLE_UDP_TUNNEL_OFFLOAD set to "true". This setting disables Antrea's UDP checksum offloading, which avoids the known issues with some underlay network and physical NIC network drivers.

vSphere CSI volume deletion may fail on AVS
On Azure VMware Solution (AVS), deletion of vSphere CSI Persistent Volumes (PVs) may fail. Deleting a PV requires the cns.searchable permission. The default admin account for AVS, cloudadmin@vsphere.local, is not created with this permission. For more information, see vSphere Roles and Privileges.
Workaround: To delete a vSphere CSI PV on AVS, contact Azure support.
You cannot use Harbor in proxy cache mode for running Tanzu Kubernetes Grid in an internet-restricted environment. Prior versions of Tanzu Kubernetes Grid supported the Harbor proxy cache feature.
Workaround: None