Except where noted, these release notes apply to all v1.6.x patch versions of Tanzu Kubernetes Grid.
Tanzu Kubernetes Grid 1.6 adds support for vSphere 8 as described in the following sections.
You can use the versions of the Tanzu CLI included in Tanzu Kubernetes Grid 1.6.0 and 1.6.1 to connect to the Supervisor in vSphere with Tanzu on vSphere 8. Once connected to the Supervisor, you can use the Tanzu CLI to create class-based workload clusters. For more information, see Create and Manage TKG 2.1 Clusters with the Tanzu CLI in the TKG 2.1 documentation.
Using a standalone Tanzu Kubernetes Grid management cluster of a version up to and including v1.6.0 on vSphere 8 is not supported. If you are using vSphere 8 without a Supervisor, you must use Tanzu Kubernetes Grid v1.6.1 to deploy standalone management clusters. For more information, see Deploy Management Clusters.
If you have an existing 1.x management cluster that is already running on vSphere 7 without a Supervisor Cluster and you want to upgrade vSphere to vSphere 8, upgrade the management cluster and its workload clusters to Tanzu Kubernetes Grid v1.6.1, ideally before you upgrade from vSphere 7 to vSphere 8. If you are running a standalone management cluster on vSphere 7 and your vSphere instance is upgraded to vSphere 8 before you upgrade to Tanzu Kubernetes Grid v1.6.1, you must upgrade the management cluster and workload clusters to v1.6.1 as soon as possible. For more information, see Upgrade Tanzu Kubernetes Grid.
Tanzu Kubernetes Grid v1.6.1 includes the following new features:
Important: Machine images for TKG 1.6.1 are not hardened to Security Technical Implementation Guides (STIG) or Center for Internet Security (CIS) standards.
Tanzu Kubernetes Grid v1.6.0 includes the following new features:
- On vSphere, VSPHERE_CONTROL_PLANE_ENDPOINT can be set to an FQDN instead of an IP address without having to apply an overlay.
- Tanzu CLI:
  - tanzu feature activate, tanzu feature deactivate, and tanzu feature list manage features that are available in your target management cluster. For more information, see tanzu feature.
  - tanzu config set edition sets the Tanzu CLI edition. For more information, see CLI Configuration.
  - The --generate-default-values-file flag of the tanzu package available get command creates a configuration file with default values for the specified package. See Get the Details of an Available Package.
  - The --local flag to tanzu plugin install lets you install plugins to your local machine for use in airgapped environments.
  - kctrl-based tanzu package commands:
    - The Tanzu CLI can invoke kctrl, described in the Carvel docs. kctrl mode is deactivated by default.
    - With kctrl mode enabled, tanzu package commands work identically to kctrl package commands.
    - kctrl mode adds the following commands, extending observability and debugging functionality:
      - package installed pause pauses the reconciliation of a package install.
      - package installed kick triggers the reconciliation of a package install.
      - package [...] status, appended to the install, installed create, and installed update commands, tails the status of the command operation.
    - Activate kctrl mode for tanzu package commands by running tanzu config set features.package.kctrl-package-command-tree true.
    - For differences in the tanzu package command group when kctrl mode is activated, see tanzu package with kctrl.
  - The telemetry plugin replaces tanzu mc ceip-participation commands with tanzu telemetry commands. See Manage Participation in CEIP for details.
  - tanzu cluster list
  - tanzu cluster get
  - tanzu mc get
Cluster configuration variables:

- On vSphere, you can set VSPHERE_CONTROL_PLANE_ENDPOINT to an FQDN without needing a custom overlay.
- The AVI_LABELS variable is supported for workload clusters. For more information, see NSX Advanced Load Balancer.
- The AVI_CONTROLLER_VERSION cluster configuration variable is no longer needed because the AKO operator automatically detects the Avi Controller version that is in use.
Cluster configuration variables expand control of Antrea behavior: ANTREA_EGRESS, ANTREA_EGRESS_EXCEPT_CIDRS, ANTREA_ENABLE_USAGE_REPORTING, ANTREA_FLOWEXPORTER, ANTREA_FLOWEXPORTER_ACTIVE_TIMEOUT, ANTREA_FLOWEXPORTER_COLLECTOR_ADDRESS, ANTREA_FLOWEXPORTER_IDLE_TIMEOUT, ANTREA_FLOWEXPORTER_POLL_INTERVAL, ANTREA_IPAM, ANTREA_KUBE_APISERVER_OVERRIDE, ANTREA_MULTICAST, ANTREA_MULTICAST_INTERFACES, ANTREA_NETWORKPOLICY_STATS, ANTREA_NODEPORTLOCAL_ENABLED, ANTREA_NODEPORTLOCAL_PORTRANGE, ANTREA_PROXY_ALL, ANTREA_PROXY_LOAD_BALANCER_IPS, ANTREA_PROXY_NODEPORT_ADDRS, ANTREA_PROXY_SKIP_SERVICES, ANTREA_SERVICE_EXTERNALIP, ANTREA_TRANSPORT_INTERFACE, and ANTREA_TRANSPORT_INTERFACE_CIDRS. For more information, see Antrea CNI Configuration.
Cluster configuration variables allow setting up GPU-enabled clusters: VSPHERE_CONTROL_PLANE_CUSTOM_VMX_KEYS, VSPHERE_CONTROL_PLANE_PCI_DEVICES, VSPHERE_IGNORE_PCI_DEVICES_ALLOW_LIST, VSPHERE_WORKER_PCI_DEVICES, VSPHERE_WORKER_CUSTOM_VMX_KEYS, and WORKER_ROLLOUT_STRATEGY. For more information, see GPU-Enabled Clusters.
Package configuration variables:

- The Contour package supports setting loadBalancerIP for the Envoy service. For more information, see Implement Ingress Control with Contour.
- The calico.config.skipCNIBinaries value, if set to true, prevents Calico from overwriting the settings of existing CNI plugins during cluster upgrade. For more information, see Updating Package Configuration.

Each version of Tanzu Kubernetes Grid adds support for the Kubernetes version of its management cluster, plus additional Kubernetes versions, distributed as Tanzu Kubernetes releases (TKrs).
Any version of Tanzu Kubernetes Grid supports all TKr versions from the previous two minor lines of Kubernetes. For example, TKG v1.6.x supports the Kubernetes versions v1.23.x, v1.22.x, and v1.21.x listed below, but not v1.20.x, v1.19.x, or v1.18.x.
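The support window described above can be expressed as a small version check. The helper below is an illustrative sketch, not part of any TKG tooling; it encodes only the "current minor line plus the two previous lines" rule.

```python
# Illustrative sketch of the two-previous-minor-lines support rule.

def supported_minor_lines(mgmt_k8s_version: str) -> list[str]:
    """Return the Kubernetes minor lines a TKG release supports,
    given the Kubernetes version of its management cluster."""
    major, minor = mgmt_k8s_version.split(".")[:2]
    return [f"{major}.{int(minor) - i}" for i in range(3)]

def tkr_supported(mgmt_k8s_version: str, tkr_version: str) -> bool:
    """Check whether a TKr's minor line falls inside the window."""
    tkr_minor = ".".join(tkr_version.split(".")[:2])
    return tkr_minor in supported_minor_lines(mgmt_k8s_version)

# TKG v1.6.x management clusters run Kubernetes 1.23.x:
print(supported_minor_lines("1.23.10"))     # ['1.23', '1.22', '1.21']
print(tkr_supported("1.23.10", "1.21.14"))  # True
print(tkr_supported("1.23.10", "1.20.15"))  # False
```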
Tanzu Kubernetes Grid Version | Kubernetes Version of Management Cluster | Provided Kubernetes (TKr) Versions |
---|---|---|
1.6.1 | 1.23.10 | 1.23.10, 1.22.13, 1.21.14 |
1.6.0 | 1.23.8 | 1.23.8, 1.22.11, 1.21.14 |
1.5.4 | 1.22.9 | 1.22.9, 1.21.11, 1.20.15 |
1.5.3 | 1.22.8 | 1.22.8, 1.21.11, 1.20.15 |
1.5.2, 1.5.1, 1.5.0 | 1.22.5 | 1.22.5, 1.21.8, 1.20.14 |
1.4.2 | 1.21.8 | 1.21.8, 1.20.14, 1.19.16 |
1.4.0, 1.4.1 | 1.21.2 | 1.21.2, 1.20.8, 1.19.12 |
Tanzu Kubernetes Grid v1.6.1 supports the following infrastructure platforms and operating systems (OSs), as well as cluster creation and management, networking, storage, authentication, backup and migration, and observability components. The component versions listed in parentheses are included in Tanzu Kubernetes Grid v1.6.1. For more information, see Component Versions.
 | vSphere | AWS | Azure |
---|---|---|---|
Infrastructure platform | vSphere 6.7U3 and later, vSphere 7, vSphere 8, VMware Cloud on AWS*** | Native AWS | Native Azure |
CLI, API, and package infrastructure | Tanzu Framework v0.25.4 | | |
Cluster creation and management | Core Cluster API (v1.1.5), Cluster API Provider vSphere (v1.3.5) | Core Cluster API (v1.1.5), Cluster API Provider AWS (v1.2.0) | Core Cluster API (v1.1.5), Cluster API Provider Azure (v1.4.5) |
Kubernetes node OS distributed with TKG | Photon OS 3, Ubuntu 20.04 | Amazon Linux 2, Ubuntu 20.04 | Ubuntu 18.04, Ubuntu 20.04 |
Build your own image | Photon OS 3, Red Hat Enterprise Linux 7**** and 8, Ubuntu 18.04, Ubuntu 20.04, Windows 2019 | Amazon Linux 2, Ubuntu 18.04, Ubuntu 20.04 | Ubuntu 18.04, Ubuntu 20.04 |
Container runtime | Containerd (v1.6.6) | ||
Container networking | Antrea (v1.5.3, in VMware Container Networking with Antrea 1.4.0), Calico (v3.22.1) | ||
Container registry | Harbor (v2.6.1) | ||
Ingress | NSX Advanced Load Balancer Essentials and Avi Controller (v20.1.6, v20.1.7, v20.1.8, v20.1.9, v21.1.3, v21.1.4)*, Contour (v1.20.2) | Contour (v1.20.2) | Contour (v1.20.2) |
Storage | vSphere Container Storage Interface (v2.5.2**) and vSphere Cloud Native Storage | Amazon EBS CSI driver (v1.8.0) and in-tree cloud providers | Azure Disk CSI driver for Kubernetes (v1.19.0) and in-tree cloud providers |
Authentication | OIDC via Pinniped (v0.12.1), LDAP via Pinniped (v0.12.1) and Dex | ||
Observability | Fluent Bit (v1.8.15), Prometheus (v2.36.2), Grafana (v7.5.16) | ||
Backup and migration | Velero (v1.9.2) |
Note* NSX Advanced Load Balancer Essentials is supported on vSphere 6.7U3, vSphere 7, and VMware Cloud on AWS. You can download it from the Download VMware Tanzu Kubernetes Grid page.
** Version of vsphere_csi_driver. For a full list of vSphere Container Storage Interface components included in the Tanzu Kubernetes Grid v1.6 release, see Component Versions.
*** For a list of VMware Cloud on AWS SDDC versions that are compatible with this release, see the VMware Product Interoperability Matrix.
**** Tanzu Kubernetes Grid v1.6 is the last release that supports building Red Hat Enterprise Linux 7 images.
For a full list of Kubernetes versions that ship with Tanzu Kubernetes Grid v1.6, see Supported Kubernetes Versions in Tanzu Kubernetes Grid v1.6 above.
The Tanzu Kubernetes Grid v1.6.x releases include the following software component versions:
Component | TKG v1.6.1 | TKG v1.6.0 |
---|---|---|
aad-pod-identity | v1.8.13+vmware.1* | v1.8.0+vmware.1 |
addons-manager | v1.6.0+vmware.1-tkg.3* | v1.5.0_vmware.1-tkg.5 |
ako-operator | v1.6.0_vmware.17-tkg.2* | v1.6.0+vmware.16* |
alertmanager | v0.24.0+vmware.1 | v0.24.0+vmware.1* |
antrea | v1.5.3+vmware.3* | v1.5.3_tkg.1* (in VMware Container Networking with Antrea 1.4.0) |
aws-ebs-csi-driver* | v1.8.0+vmware.1 | v1.8.0+vmware.1 |
azuredisk-csi-driver* | v1.19.0+vmware.1 | v1.19.0+vmware.1 |
byoh-k8s-ubuntu-2004* | v1.23.10+vmware.1* | v1.23.10+vmware.2-tkg.1 |
calico_all | v3.22.1+vmware.1 | v3.22.1+vmware.1* |
capabilities-package* | v0.25.4-tf-capabilities* | v0.25.0-23-g6288c751-capabilities |
carvel-secretgen-controller | v0.9.1+vmware.1 | v0.9.1+vmware.1* |
cloud-provider-azure | v0.7.4+vmware.1 | v0.7.4+vmware.1 |
cloud_provider_vsphere | v1.23.1+vmware.1 | v1.23.1+vmware.1* |
cluster-api-provider-azure | v1.4.5+vmware.1* | v1.4.0+vmware.2* |
cluster_api | v1.1.5+vmware.1 | v1.1.5+vmware.1* |
cluster_api_aws | v1.2.0+vmware.1 | v1.2.0+vmware.1 |
cluster_api_vsphere | v1.3.5+vmware.1* | v1.3.1+vmware.1* |
cni_plugins | v1.1.1+vmware.7* | v1.1.1+vmware.6* |
configmap-reload | v0.7.1+vmware.1 | v0.7.1+vmware.1* |
containerd | v1.6.6+vmware.2 | v1.6.6+vmware.2* |
contour | v1.20.2+vmware.2*, v1.18.2+vmware.1, v1.17.2+vmware.1 | v1.20.2+vmware.1*, v1.18.2+vmware.1, v1.17.2+vmware.1 |
coredns | v1.8.6+vmware.11* | v1.8.6+vmware.7* |
crash-diagnostics | v0.3.7+vmware.5 | v0.3.7+vmware.5 |
cri_tools | v1.22.0+vmware.9* | v1.22.0+vmware.8* |
csi_attacher | v3.4.0+vmware.1, | v3.4.0+vmware.1*, v3.3.0+vmware.1, |
csi_livenessprobe | v2.6.0+vmware.1, v2.5.0+vmware.1, v2.4.0+vmware.1 | v2.6.0+vmware.1*, v2.5.0+vmware.1*, v2.4.0+vmware.1 |
csi_node_driver_registrar | v2.5.1+vmware.1, v2.5.0+vmware.1, v2.3.0+vmware.1 | v2.5.1+vmware.1*, v2.5.0+vmware.1*, v2.3.0+vmware.1 |
csi_provisioner | v3.1.0+vmware.2*, v3.0.0+vmware.1 | v3.0.0+vmware.1 |
dex | v2.30.2+vmware.1 | v2.30.2+vmware.1 |
envoy | v1.21.3+vmware.1, v1.19.1+vmware.1, v1.18.4+vmware.1 | v1.21.3+vmware.1*, v1.19.1+vmware.1, v1.18.4+vmware.1 |
external-dns | v0.11.0+vmware.1* | v0.10.0+vmware.1 |
external-snapshotter* | v6.0.1+vmware.1, v5.0.1+vmware.1 | v6.0.1+vmware.1*, v5.0.1+vmware.1 |
etcd | v3.5.4+vmware.7* | v3.5.4_vmware.6* |
fluent-bit | v1.8.15+vmware.1 | v1.8.15+vmware.1* |
gangway | v3.2.0+vmware.2 | v3.2.0+vmware.2 |
grafana | v7.5.16+vmware.1 | v7.5.16+vmware.1* |
guest-cluster-auth-service* | v1.0.0 | v1.0.0 |
harbor | v2.6.1+vmware.1* | v2.5.3 (deprecated) |
image-builder | v0.1.13+vmware.2* | v0.1.12+vmware.2* |
image-builder-resource-bundle* | v1.23.10+vmware.2-tkg.1 | v1.23.10+vmware.2-tkg.1 |
imgpkg | v0.29.0+vmware.1 | v0.29.0+vmware.1* |
jetstack_cert-manager | v1.7.2+vmware.1*, v1.5.3+vmware.6* | v1.5.3+vmware.4* |
k8s-sidecar | v1.15.6+vmware.1 | v1.15.6+vmware.1* |
k14s_kapp | v0.49.0+vmware.1 | v0.49.0+vmware.1* |
k14s_ytt | v0.41.1+vmware.1 | v0.41.1+vmware.1* |
kapp-controller | v0.38.5+vmware.2* | v0.38.4+vmware.1* |
kbld | v0.34.0+vmware.1 | v0.34.0+vmware.1* |
kube-state-metrics | v2.5.0+vmware.1 | v2.5.0+vmware.1* |
kube-vip | v0.4.2+vmware.1* | v0.4.2+vmware.1* |
kube_rbac_proxy | v0.11.0+vmware.2 | v0.11.0+vmware.2* |
kubernetes | v1.23.10+vmware.1 | v1.23.8+vmware.2* |
kubernetes-csi_external-resizer | v1.4.0+vmware.1, v1.3.0+vmware.1 | v1.4.0+vmware.1*, v1.3.0+vmware.1 |
kubernetes-sigs_kind | v1.23.10+vmware.1-tkg.1_v0.11.1 | v1.23.10+vmware.1-tkg.1_v0.11.1* |
kubernetes_autoscaler | v1.23.0+vmware.1 | v1.23.0+vmware.1* |
load-balancer-and-ingress-service (AKO) | v1.7.3+vmware.1* | v1.7.2+vmware.2* |
metrics-server | v0.6.1+vmware.1 | v0.6.1+vmware.1* |
multus-cni | v3.8.0+vmware.1 | v3.8.0+vmware.1* |
pinniped | v0.12.1+vmware.1-tkg.1 | v0.12.1+vmware.1-tkg.1 |
pinniped-post-deploy* | v0.12.1+vmware.2-tkg.2* | v0.12.1+vmware.1-tkg.1 |
prometheus | v2.36.2+vmware.1 | v2.36.2+vmware.1* |
prometheus_node_exporter | v1.3.1+vmware.1 | v1.3.1+vmware.1* |
pushgateway | v1.4.3+vmware.1 | v1.4.3+vmware.1* |
standalone-plugins-package | v0.25.4-tf-standalone-plugins* | v0.25.0-standalone-plugins* |
sonobuoy | v0.56.6+vmware.1 | v0.56.6+vmware.1* |
tanzu-framework | v0.25.4* | v0.25.0* |
tanzu-framework-addons | v0.25.4-tf* | v0.25.0-23-g6288c751* |
tanzu-framework-management-packages | v0.25.4-tf* | v0.25.0* |
tkg-bom | v1.6.1-tf-v0.25.4* | v1.6.0* |
tkg-core-packages | v1.23.10+vmware.1-tkg.1* | v1.23.8+vmware.2-tkg.1* |
tkg-standard-packages | v1.6.1-tf-v0.25.4* | v1.6.0* |
tkg-storageclass-package | v0.25.4-tkg-storageclass* | v0.25.0-23-g6288c751-tkg-storageclass* |
tkg_telemetry | v1.6.0+vmware.1 | v1.6.0+vmware.1* |
velero | v1.9.2+vmware.1* | v1.8.1+vmware.1 |
velero-plugin-for-aws | v1.5.1+vmware.1* | v1.4.1+vmware.1 |
velero-plugin-for-csi | v0.3.1+vmware.1* | N/A |
velero-plugin-for-microsoft-azure | v1.5.1+vmware.1* | v1.4.1+vmware.1 |
velero-plugin-for-vsphere | v1.4.0+vmware.1* | v1.3.1+vmware.1 |
vendir | v0.27.0+vmware.1 | v0.27.0+vmware.1* |
vsphere_csi_driver | v2.5.2+vmware.1 | v2.5.2+vmware.1* |
whereabouts* | v0.5.1+vmware.2 | v0.5.1+vmware.2 |
* Indicates a new component or version bump since the previous release. For TKG v1.6.0, the latest previous release was v1.5.4.
For a complete list of software component versions that ship with Tanzu Kubernetes Grid v1.6.1, see ~/.config/tanzu/tkg/bom/tkg-bom-v1.6.1.yaml and ~/.config/tanzu/tkg/bom/tkr-bom-v1.23.10+vmware.1-tkg.1.yaml. For component versions in previous releases, see the tkg-bom- and tkr-bom- YAML files that install with those releases.
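If you need the component list programmatically, the BoM files can be scanned for version entries. The sketch below is illustrative only: the inline snippet merely mimics the components-section layout of tkg-bom-v1.6.1.yaml, and a real parser should use a YAML library rather than regular expressions.

```python
import re

# A tiny stand-in for the components section of a TKG BoM YAML file.
# The real file has many more entries; this sample is made up.
BOM_SNIPPET = """\
components:
  contour:
  - version: v1.20.2+vmware.2
  velero:
  - version: v1.9.2+vmware.1
"""

def component_versions(bom_text: str) -> dict[str, str]:
    """Map component names to their first listed version string."""
    versions, current = {}, None
    for line in bom_text.splitlines():
        name = re.match(r"  (\S+):$", line)          # "  contour:"
        ver = re.match(r"  - version: (\S+)", line)  # "  - version: ..."
        if name:
            current = name.group(1)
        elif ver and current:
            versions.setdefault(current, ver.group(1))
    return versions

print(component_versions(BOM_SNIPPET))
# {'contour': 'v1.20.2+vmware.2', 'velero': 'v1.9.2+vmware.1'}
```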
You can only upgrade to Tanzu Kubernetes Grid v1.6.x from v1.5.x, or to a v1.6.y patch version from v1.6.x. If you want to upgrade to Tanzu Kubernetes Grid v1.6.x from a version earlier than v1.5.x, you must upgrade to v1.5.x first.
When upgrading Kubernetes versions on workload clusters, you cannot skip minor versions. For example, you cannot upgrade a Tanzu Kubernetes cluster directly from v1.21.x to v1.23.x. You must upgrade a v1.21.x cluster to v1.22.x before upgrading the cluster to v1.23.x.
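The no-skip rule above amounts to stepping through each intermediate minor line. This helper is an illustrative sketch, not a TKG tool:

```python
# Illustrative sketch: compute the minor-version upgrade path for a
# workload cluster, stepping through every minor line (no skipping).

def upgrade_path(current: str, target: str) -> list[str]:
    """Return the inclusive sequence of minor lines to upgrade through."""
    major, cur_minor = map(int, current.split(".")[:2])
    tgt_minor = int(target.split(".")[1])
    if tgt_minor < cur_minor:
        raise ValueError("downgrade not supported")
    return [f"{major}.{m}.x" for m in range(cur_minor, tgt_minor + 1)]

# A v1.21.x cluster must pass through v1.22.x before reaching v1.23.x:
print(upgrade_path("1.21.14", "1.23.10"))  # ['1.21.x', '1.22.x', '1.23.x']
```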
Tanzu Kubernetes Grid v1.6 release dates are:
Tanzu Kubernetes Grid v1.6.0 introduces the following new behavior compared with v1.5.4, which is the latest previous release.
- The telemetry plugin replaces tanzu mc ceip-participation commands with tanzu telemetry commands. See Manage Participation in CEIP for the new commands.

TKG 2.0: Several new publications cover Tanzu Kubernetes Grid 2.0, which refers to vSphere 8 and the Tanzu CLI v1.6. More specifically, TKG 2.0 includes improved Cluster object definitions that are backed by a new ClusterClass object type in vSphere with Tanzu Supervisor clusters and supported by the v1.6 Tanzu CLI. One of these publications, About Tanzu Kubernetes Grid, explains how TKG 2.x compares with v1.x versions of Tanzu Kubernetes Grid. For more information, see Find the Right TKG Docs for Your Deployment on the VMware Tanzu Kubernetes Grid Documentation page.
CLI Reference: A new publication VMware Tanzu CLI Reference describes the Tanzu CLI and includes a command reference organized by Tanzu CLI command group. Much of this content was previously published in the Tanzu Kubernetes Grid product documentation and the Tanzu Application Platform product documentation.
The Tanzu Kubernetes Grid 1.6 documentation applies to all of the 1.6.x releases. It includes information about the following subjects:
The following issues that were documented as Known Issues in Tanzu Kubernetes Grid v1.6.0 are resolved in Tanzu Kubernetes Grid v1.6.1.
Node passwords on vSphere expire in 60 or 90 days
Photon and Ubuntu node images based on TKr v1.23.8, v1.22.11, and v1.21.14, used in v1.6.0 management clusters and updated workload clusters on vSphere, have passwords that expire 60 days after creation for Ubuntu OS, and 90 days after creation for Photon OS.
These expiration times are set to adhere to CIS and STIG security standards, but they prevent ssh
login to the nodes after the password has expired.
Workaround: See the Knowledge Base article capv user password expiry in TKG v1.6.0.
The following issues that were documented as Known Issues in Tanzu Kubernetes Grid v1.5.4 are resolved in Tanzu Kubernetes Grid v1.6.0. For details of issues that were resolved in 1.5.x patch releases up to and including v1.5.4, see the v1.5.x Release Notes.
kapp-controller generates ctrl-change ConfigMap objects, even if there is no change

The CustomResourceDefinition objects that define configurations for Calico, AKO Operator, and other packages include a status field. When kapp-controller reconciles these CRD objects every five minutes, it interprets their status as having changed even when the package configuration did not change. This causes kapp-controller to generate unnecessary, duplicate ctrl-change ConfigMap objects, which soon overrun their history buffer, because each package saves a maximum of 200 ctrl-change ConfigMap records.
Workaround: None
Host network pods and node use the wrong IP in IPv6 clusters

When you deploy IPv6 clusters with multiple control plane nodes on vSphere and the clusters use Kubernetes 1.20.x or 1.21.x, one of your nodes, as well as the etcd, kube-apiserver, and kube-proxy pods, may take on the IP that you set for VSPHERE_CONTROL_PLANE_ENDPOINT instead of an IP of their own. You might not see an error, but this could cause networking problems for these pods and prevent the control plane nodes from failing over properly.
When AVI_LABELS is set, ako-operator causes high latency on the AVI Controller

Due to a bug in the ako-operator package, setting the AVI_LABELS variable or configuring Cluster Labels (Optional) in the Configure VMware NSX Advanced Load Balancer section of the installer interface when creating the management cluster results in the package attempting to reconcile indefinitely. This generates a high volume of events on the AVI Controller.
Workaround: If you are experiencing this issue, follow the steps below:
Pause the reconciliation of the ako-operator
package:
kubectl patch pkgi ako-operator -n tkg-system --type "json" -p '[{"op":"replace","path":"/spec/paused","value":true}]'
Remove the cluster selector in the default AKODeploymentConfig
custom resource:
kubectl patch adc install-ako-for-all --type "json" -p='[{"op":"remove","path":"/spec/clusterSelector"}]'
Remove the labels that you defined in AVI_LABELS
or Cluster Labels (Optional) from each affected workload cluster:
kubectl label cluster CLUSTER-NAME YOUR-AVI-LABELS-
For example:
kubectl label cluster my-workload-cluster tkg.tanzu.vmware.com/ako-enabled-
The ako-operator
package must remain in the paused state to persist this change.
With NSX ALB, cannot create cluster in NAMESPACE that has name beginning with numeric character

On vSphere with NSX Advanced Load Balancer, creating a workload cluster from Tanzu Mission Control or by running tanzu cluster create fails if its management namespace, set by the NAMESPACE configuration variable, begins with a numeric character (0-9).
The following are known issues in Tanzu Kubernetes Grid v1.6.x. Any known issues that were present in 1.6.0 that have been resolved in a subsequent v1.6.x patch release are listed under the Resolved Issues for the patch release in which they were fixed.
Upgrading management cluster behind Azure internal load balancer stalls or fails with etcd error
When upgrading a management cluster from TKG v1.5.4 to v1.6.1 on Azure with an internal load balancer, etcd
health check updates between the control plane node and the load balancer fail due to incorrect control plane tolerations settings. This causes the upgrades to stall or fail with errors like Failed to connect to the etcd pod
.
For more information, see the Knowledge Base article TKG Management clusters fail to upgrade or upgrade slowly when behind Azure internal load balancers.
Workaround: Before running tanzu management-cluster upgrade, manually add a node affinity rule to the KubeadmControlPlane (KCP) controller:
Edit the KCP controller deployment spec:
kubectl edit deployment capi-kubeadm-control-plane-controller-manager -n capi-kubeadm-control-plane-system
Add the following affinity:
affinity:
  nodeAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
    - preference:
        matchExpressions:
        - key: node-role.kubernetes.io/control-plane
          operator: DoesNotExist
      weight: 100
Retrieve the name of the KCP controller pod by listing all pods and looking for the one with capi-kubeadm-control-plane-controller-manager
in the name:
kubectl get po -A
Delete the KCP controller pod so that it recreates:
kubectl delete pod KCP-POD-NAME
Verify that the pod has recreated:
kubectl get po -A -l cluster.x-k8s.io/provider=control-plane-kubeadm -o wide
If the pod is not recreated, verify that the worker node it is scheduled on has sufficient capacity.
Upgrading clusters on Azure fails with timeout errors
On Azure, upgrading management clusters and workload clusters fails with errors such as context deadline exceeded
or unable to upgrade management cluster: error waiting for kubernetes version update for kubeadm control plane
. This happens because operations on Azure sometimes take longer than on other platforms.
Workaround: Run tanzu management-cluster upgrade or tanzu cluster upgrade again, specifying a longer timeout with the --timeout flag. The default timeout is 30m0s.
Upgrade fails for clusters created with the wildcard character (*) in TKG_NO_PROXY setting

TKG v1.6 does not allow the wildcard character (*) in cluster configuration file settings for TKG_NO_PROXY. Clusters created by previous TKG versions with this setting require special handling before upgrading, in order to avoid the error workload cluster configuration validation failed: invalid string '*' in TKG_NO_PROXY.
Workaround: Depending on the type of cluster you are upgrading:
Management cluster:

Set the kubectl context to the management cluster. Edit the configMap kapp-controller-config:

kubectl edit cm kapp-controller-config -n tkg-system

Find the data.noProxy field and change its wildcard hostname by removing *. For example, change *.vmware.com to .vmware.com.

Save and exit. The cluster is ready to upgrade.
Workload cluster:

Set the kubectl context to the workload cluster. Set environment variables for your cluster name and namespace, for example:

CLUSTER_NAME=my-test-cluster
NS=my-test-namespace
Obtain and decode the kapp controller data values for the workload cluster:
kubectl get secret "${CLUSTER_NAME}-kapp-controller-data-values" -n $NS -o json | jq -r '.data."values.yaml"' | base64 -d > "${CLUSTER_NAME}-${NS}-kapp-controller-data-values"
Edit the ${CLUSTER_NAME}-${NS}-kapp-controller-data-values file by removing * from its kappController.config.noProxy setting. For example, change *.vmware.com to .vmware.com.
Re-encode the data values file ${CLUSTER_NAME}-${NS}-kapp-controller-data-values:
cat "${CLUSTER_NAME}-${NS}-kapp-controller-data-values" | base64 -w 0
Edit the ${CLUSTER_NAME}-${NS}-kapp-controller-data-values secret and update its data.values.yaml setting by pasting in the newly encoded data values string:
kubectl edit secret "${CLUSTER_NAME}-kapp-controller-data-values" -n "${NS}"
Save and exit. The cluster is ready to upgrade.
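The decode-edit-re-encode cycle above can be scripted. This is an illustrative sketch only: the sample noProxy value is made up, and the simple substring replacement assumes wildcard entries always take the *.hostname form shown above.

```python
import base64

def strip_noproxy_wildcards(encoded_values: str) -> str:
    """Decode base64 data values, strip '*' from wildcard noProxy
    hostnames (e.g. '*.vmware.com' -> '.vmware.com'), re-encode."""
    text = base64.b64decode(encoded_values).decode()
    fixed_lines = []
    for line in text.splitlines():
        if "noProxy" in line:
            line = line.replace("*.", ".")
        fixed_lines.append(line)
    return base64.b64encode("\n".join(fixed_lines).encode()).decode()

# Made-up sample standing in for the decoded kapp-controller data values:
sample = base64.b64encode(b"noProxy: 10.0.0.0/8,*.vmware.com").decode()
fixed = strip_noproxy_wildcards(sample)
print(base64.b64decode(fixed).decode())  # noProxy: 10.0.0.0/8,.vmware.com
```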
Multus CNI fails on medium and smaller pods with NSX Advanced Load Balancer

On vSphere, workload clusters with medium or smaller worker nodes running the Multus CNI package with NSX ALB can fail with Insufficient CPU or other errors.

Workaround: To use Multus CNI with NSX ALB, deploy workload clusters with worker nodes of size large or extra-large.
TKG BoM file contains extraneous cert-manager package version
The TKG Bill of Materials (BoM) file that the Tanzu CLI installs into ~/.config/tanzu/tkg lists both v1.5.3 and v1.7.2 versions for the cert-manager (jetstack_cert-manager) package. The correct version to install is v1.5.3, as described in Install cert-manager.
Harbor CVE export may fail when execution ID exceeds 1000000+
Harbor v2.6.1, which is the version packaged for TKG v1.6.1, has a known issue where CVE report exports fail with the error “404 page not found” when the execution primary key auto-increment ID grows to 1000000 or higher.
This Harbor issue is resolved in later versions of Harbor that are slated for inclusion in later versions of TKG.
Cluster and pod operations that delete pods may fail if DaemonSet configured to auto-restore persistent volumes
In installations where a DaemonSet uses persistent volumes (PVs), machine deletion may fail because, by default, the drain process ignores DaemonSets and the system waits indefinitely for the volumes to be detached from the node. Affected cluster operations include upgrade, scale down, and delete.
Workaround: To address this issue, do one of the following to each worker node in the cluster before upgrading, scaling down, or deleting the cluster:
Set a spec.NodeDrainTimeout
value for the node. This lets the machine controller delete the node once the timeout expires, even if it has volumes attached.
Manually delete each pod in the node.
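For the first option, in Cluster API the timeout lives at nodeDrainTimeout in the Machine spec, which appears under spec.template.spec on a MachineDeployment. The fragment below is an illustrative sketch: the object name my-cluster-md-0 is hypothetical, and you should confirm the field path against your cluster's actual objects.

```yaml
apiVersion: cluster.x-k8s.io/v1beta1
kind: MachineDeployment
metadata:
  name: my-cluster-md-0        # hypothetical object name
spec:
  template:
    spec:
      # Delete the node once the timeout expires, even if
      # volumes are still attached (the behavior described above).
      nodeDrainTimeout: 5m
```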
Workload cluster cannot distribute storage across multiple datastores
You cannot enable a workload cluster to distribute storage across multiple datastores as described in Deploy a Cluster that Uses a Datastore Cluster. If you tag multiple datastores in a datastore cluster as the basis for a workload cluster’s storage policy, the workload cluster uses only one of the datastores.
Workaround: None
On vSphere with Tanzu, tanzu cluster list generates error for DevOps users

When a user with the DevOps engineer role, as described in vSphere with Tanzu User Roles and Workflows, runs tanzu cluster list, they may see an error resembling Error: unable to retrieve combined cluster info: unable to get list of clusters. User cannot list resource "clusters" at the cluster scope.

This happens because the tanzu cluster command without a -n option attempts to access all namespaces, some of which may not be accessible to a DevOps engineer user.

Workaround: When running tanzu cluster list, include a --namespace value to specify a namespace that the user can access.
Non-alphanumeric characters cannot be used in HTTP/HTTPS proxy passwords
When deploying management clusters with the CLI, the non-alphanumeric characters # ` ^ | / ? % ^ { [ ] } \ " < > cannot be used in passwords. Also, no non-alphanumeric characters can be used in HTTP/HTTPS proxy passwords when deploying a management cluster with the UI.

Workaround: You can use non-alphanumeric characters other than # ` ^ | / ? % ^ { [ ] } \ " < > in passwords when deploying a management cluster with the CLI.
Tanzu CLI does not work on macOS machines with ARM processors
Tanzu CLI v0.11.6 does not work on macOS machines with ARM (Apple M1) chips, as identified under Finder > About This Mac > Overview.
Workaround: Use a bootstrap machine with a Linux or Windows OS, or a macOS machine with an Intel processor.
--generate-default-values-file option of tanzu package available get outputs an incomplete configuration template file for the Harbor package

Running tanzu package available get harbor.tanzu.vmware.com/PACKAGE-VERSION --generate-default-values-file creates an incomplete configuration template file for the Harbor package. To get a complete file, use the imgpkg pull command as described in Deploy Harbor into a Workload or a Shared Services Cluster.
Windows CMD: Extraneous characters in CLI output column headings
In the Windows command prompt (CMD), Tanzu CLI command output that is formatted in columns includes extraneous characters in column headings.
The issue does not occur in Windows Terminal or PowerShell.
Workaround: On Windows bootstrap machines, run the Tanzu CLI from Windows Terminal.
Ignorable AKODeploymentConfig error during management cluster creation

Running tanzu management-cluster create to create a management cluster with NSX ALB outputs the following error: no matches for kind "AKODeploymentConfig" in version "networking.tkg.tanzu.vmware.com/v1alpha1". The error can be ignored. For more information, see this article in the KB.
Ignorable machinehealthcheck and clusterresourceset errors during workload cluster creation on vSphere

When a workload cluster is deployed to vSphere by using the tanzu cluster create command through vSphere with Tanzu, the output might include errors related to running machinehealthcheck and accessing the clusterresourceset resources, as shown below:
Error from server (Forbidden): error when creating "/tmp/kubeapply-3798885393": machinehealthchecks.cluster.x-k8s.io is forbidden: User "sso:Administrator@vsphere.local" cannot create resource "machinehealthchecks" in API group "cluster.x-k8s.io" in the namespace "tkg"
...
Error from server (Forbidden): error when retrieving current configuration of: Resource: "addons.cluster.x-k8s.io/v1beta1, Resource=clusterresourcesets", GroupVersionKind: "addons.cluster.x-k8s.io/v1beta1, Kind=ClusterResourceSet"
...
The workload cluster is successfully created. You can ignore the errors.
CLI temporarily misreports status of recently deleted nodes when MHCs are deactivated

When machine health checks (MHCs) are deactivated, Tanzu CLI commands such as tanzu cluster status may not report up-to-date node state while infrastructure is being recreated.

Workaround: None
Node pool labels and other configuration properties cannot be changed

You cannot add to or otherwise change an existing node pool’s labels, az, nodeMachineType, or vSphere properties, as listed in Configuration Properties.
Workaround: Create a new node pool in the cluster with the desired properties, migrate workloads to the new node pool, and delete the original.
TKG user account creates idle vCenter sessions
The vSphere account for TKG creates idle vCenter sessions, as listed in vSphere > Hosts and Clusters inventory > your vCenter > Monitor tab > Sessions.
Workaround: Remove idle vCenter sessions by starting and stopping all sessions:
1. ssh in to vCenter as root.
2. If a command prompt appears, enter shell.
3. Run service-control --stop --all.
4. Wait for all services to be flagged as Stopped.
5. Run service-control --start --all.
Node pools created with small nodes may stall at Provisioning

Node pools created with node SIZE configured as small may become stuck in the Provisioning state and never proceed to Running.

Workaround: Configure node pools with at least medium size nodes.
With NSX ALB, cannot create clusters with identical names
If you are using NSX Advanced Load Balancer for workloads (AVI_ENABLE) or the control plane (AVI_CONTROL_PLANE_HA_PROVIDER), the Avi Controller may fail to distinguish between identically-named clusters.
Workaround: Set a unique CLUSTER_NAME
value for each cluster:
Management clusters: Do not create multiple management clusters with the same CLUSTER_NAME
value, even from different bootstrap machines.
Workload clusters: Do not create multiple workload clusters that have the same CLUSTER_NAME
and are also in the same management cluster namespace, as set by their NAMESPACE
value.
Adding external identity management to an existing deployment may require setting a dummy `VSPHERE_CONTROL_PLANE_ENDPOINT` value
Integrating an external identity provider with an existing TKG deployment may require setting a dummy `VSPHERE_CONTROL_PLANE_ENDPOINT` value in the management cluster configuration file used to create the add-on secret, as described in Generate the Pinniped Add-on Secret for the Management Cluster.
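A minimal sketch of such a configuration file follows; the endpoint value is a placeholder (a documentation-range address), not a reachable endpoint, and the identity settings shown are assumptions that depend on your provider:

```yaml
# Management cluster configuration used to generate the Pinniped add-on secret
VSPHERE_CONTROL_PLANE_ENDPOINT: "192.0.2.1"   # dummy placeholder value
IDENTITY_MANAGEMENT_TYPE: "oidc"              # or "ldap", per your provider
```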
Workload cluster node pools on AWS must be in the same availability zone as the standalone management cluster
When you create a node pool configured with an `az` that is different from the management cluster's availability zone, the new node pool may remain stuck with status `ScalingUp`, as listed by `tanzu cluster node-pool list`, and never reach the `Ready` state.
Workaround: Only create node pools in the same AZ as the standalone management cluster.
Deleting cluster on AWS fails if cluster uses networking resources not deployed with Tanzu Kubernetes Grid
The `tanzu cluster delete` and `tanzu management-cluster delete` commands may hang with clusters that use networking resources created by the AWS Cloud Controller Manager independently from the Tanzu Kubernetes Grid deployment process. Such resources may include load balancers and other networking services, as listed in The Service Controller in the Kubernetes AWS Cloud Provider documentation.
For more information, see the Cluster API issue Drain workload clusters of service Type=Loadbalancer on teardown.
Workaround: Use `kubectl delete` to delete services of type `LoadBalancer` from the cluster. If that fails, use the AWS console to manually delete any `LoadBalancer` and `SecurityGroup` objects created for this service by the Cloud Controller Manager. Warning: Do not delete load balancers or security groups managed by Tanzu, which have the tag key `sigs.k8s.io/cluster-api-provider-aws/cluster/CLUSTER-NAME` and value `owned`.
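If the cluster has many services, a small helper like the following can list which ones are of type `LoadBalancer` before you delete them. This is a hypothetical sketch, not part of TKG; it assumes you have saved the output of `kubectl get services --all-namespaces -o json` and parses that JSON:

```python
import json


def loadbalancer_services(services_json: str):
    """Return (namespace, name) pairs for Services of type LoadBalancer.

    services_json is the JSON text produced by:
        kubectl get services --all-namespaces -o json
    """
    items = json.loads(services_json).get("items", [])
    return [
        (svc["metadata"]["namespace"], svc["metadata"]["name"])
        for svc in items
        if svc.get("spec", {}).get("type") == "LoadBalancer"
    ]
```

You can then run `kubectl delete service NAME --namespace NAMESPACE` for each pair before retrying `tanzu cluster delete`, taking care to skip any services backed by Tanzu-managed load balancers as warned above.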
Ignorable `goss` test failures during image-build process
When you run Kubernetes Image Builder to create a custom Linux machine image, the `goss` tests `python-netifaces`, `python-requests`, and `ebtables` fail. Command output reports the failures. The errors can be ignored; they do not prevent a successful image build.
You cannot upgrade Windows workload clusters to v1.6
You cannot upgrade Windows workload clusters from TKG v1.5 to v1.6.
Workaround: After upgrading your management cluster to TKG v1.6, create a new Windows workload cluster with a Windows Server 2019 ISO image and migrate all workloads from the TKG v1.5 cluster to the v1.6 cluster.
You cannot create a Windows machine image on a macOS machine
Due to an issue with the open-source `packer` utility used by Kubernetes Image Builder, you cannot build a Windows machine image on a macOS machine as described in Windows Custom Machine Images.
Workaround: Use a Linux machine to build your custom Windows machine images.
Backup and restore is not supported for Windows workload clusters
You cannot back up and restore Windows workload clusters.
Workaround: None
Note: For v4.0 and later, VMware NSX-T Data Center is renamed to "VMware NSX."
IPv6 networking is not supported on vSphere 8
TKG v1.6 does not support IPv6 networking on vSphere 8, although it supports single-stack IPv6 networking using Kube-Vip on vSphere 7 as described in IPv6 Networking.
Workaround: If you need or are currently using TKG in an IPv6 environment on vSphere, do not install or upgrade to vSphere 8.
NSX ALB `NodePortLocal` ingress mode is not supported for management cluster
In TKG v1.6, you cannot run NSX Advanced Load Balancer (ALB) as a service type with ingress mode `NodePortLocal` for traffic to the management cluster.
This issue does not affect support for `NodePortLocal` ingress to workload clusters, as described in NodePortLocal (For Antrea CNI).
Workaround: Configure management clusters with `AVI_INGRESS_SERVICE_TYPE` set to either `NodePort` or `ClusterIP`. The default is `NodePort`.
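A minimal management cluster configuration sketch for this workaround (a fragment, not a complete configuration file):

```yaml
# Management cluster configuration fragment
AVI_ENABLE: "true"
AVI_INGRESS_SERVICE_TYPE: "NodePort"   # or "ClusterIP"; NodePort is the default
```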
TKG does not support encrypted Antrea CNI traffic
TKG does not support encrypting Antrea pod and container networking traffic. The previously documented cluster configuration variables `ANTREA_TRAFFIC_ENCRYPTION_MODE` and `ANTREA_WIREGUARD_PORT` are not supported.
Management cluster create fails or performance slow with older NSX-T versions and Photon 3 or Ubuntu with Linux kernel 5.8 VMs
Deploying a management cluster with the following infrastructure and configuration may fail or result in restricted traffic between pods:
This combination exposes a checksum issue between older versions of NSX-T and Antrea CNI.
TMC: If the management cluster is registered with Tanzu Mission Control (TMC) there is no workaround to this issue. Otherwise, see the workarounds below.
Workarounds:
Create the management cluster with `ANTREA_DISABLE_UDP_TUNNEL_OFFLOAD` set to `"true"`. This setting deactivates Antrea's UDP checksum offloading, which avoids the known issues with some underlay network and physical NIC network drivers.
Setting `AVI_CONTROLLER_VERSION` may cause error `ako operator webhook validation fail`
In TKG v1.6, the `AVI_CONTROLLER_VERSION` cluster configuration variable is not needed because the AKO operator automatically detects the Avi Controller version that is in use. See the Product Snapshot for compatible Avi Controller versions.
If you set this variable, or if you include a `spec.controllerVersion` setting when customizing your AKO deployment, management cluster creation or AKO customization may fail with a `webhook validation failed` error.
Workaround: Do not set `AVI_CONTROLLER_VERSION` in a management cluster configuration file, and if you customize your AKO deployment by running `kubectl apply -f` with an `AKODeploymentConfig` object spec, do not include a `spec.controllerVersion` field in the spec.
vSphere CSI volume deletion may fail on AVS
On Azure VMware Solution (AVS), deletion of vSphere CSI Persistent Volumes (PVs) may fail. Deleting a PV requires the `cns.searchable` permission. The default admin account for AVS, `cloudadmin@vsphere.local`, is not created with this permission. For more information, see vSphere Roles and Privileges.
Workaround: To delete a vSphere CSI PV on AVS, contact Azure support.
Harbor proxy cache mode is not supported
You cannot use Harbor in proxy cache mode for running Tanzu Kubernetes Grid in an internet-restricted environment. Prior versions of Tanzu Kubernetes Grid supported the Harbor proxy cache feature.
Workaround: None