This topic contains release notes for Tanzu Kubernetes Grid Integrated Edition (TKGI) v1.18.
Release Date: July 23, 2024
Release Details | |
---|---|---|
Version | v1.18.5 | |
Release date | July 23, 2024 | |
Internal Component Versions | | |
Antrea | v1.8.0 | Release Notes |
cAdvisor | v0.47.2 | |
Cloud Providers | AWS: v1.27.1; vSphere: v1.27.0 | Release Notes: AWS, vSphere |
Containerd | Linux: v1.6.33*; Windows: v1.6.33* | |
CoreDNS | v1.10.1+vmware.23* | |
CSI Driver for vSphere | v3.1.2 | Release Notes |
etcd | v3.5.12 | |
Harbor | v2.10.2 | Release Notes |
Kubernetes | v1.27.15* | Release Notes |
Metrics Server | v0.6.4 | |
NCP | v4.1.2.2* | Release Notes |
Percona XtraDB Cluster (PXC) (in BOSH pxc-release) | PXC: v8.0.36-28; pxc-release: v1.0.29* | Release Notes: PXC, pxc-release |
UAA | v74.5.123* | |
Velero | v1.11.1 | Release Notes |
Wavefront | Wavefront Collector: v1.13.0; Wavefront Proxy: v12.4 | |
Stemcell Compatibility | | |
Ubuntu Jammy stemcells | See Retrieve Product Version Compatibilities from the Tanzu API in the Broadcom Support KB. | |
Windows stemcells | v2019.69 or later | |
Interoperability | | |
Ops Manager | See Retrieve Product Version Compatibilities from the Tanzu API in the Broadcom Support KB. | |
VMware Aria Operations Management Pack for Kubernetes | v2.0 | Release Notes |
VMware Cloud Foundation (VCF) | v5.0, v4.5.2 | Release Notes: v5.0, v4.5.2 |
VMware NSX** | See VMware Product Interoperability Matrices***. | |
vSphere | | |
* Components marked with an asterisk have been updated.
** As of May 7, 2024, NSX networking and firewall components are sold separately from TKGI.
*** Migration from NSX Management Plane API to NSX Policy API requires VMware NSX v4.0.1.1 or later. NSX v4.0.1.1 supports only 50% of NSX Management Plane API scale. To use Policy API at 100% of Management Plane API scale, you require NSX v4.1.1 or later.
The supported upgrade paths to Tanzu Kubernetes Grid Integrated Edition v1.18.5 are from TKGI v1.18.4 and earlier v1.18 patches, and from TKGI v1.17.6 and earlier v1.17 patches.
TKGI v1.18.5 does not include any new breaking changes.
TKGI v1.18.5 includes the following features and enhancements:

- Supports configuring `azs` and `persistent_disk_in_mb` in the `azs` block of a compute profile. A hedged compute-profile sketch follows below.
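For illustration only, the following sketch shows one way such settings might appear in a compute profile, extending the compute profile format shown later in these notes. The placement of `azs` and `persistent_disk_in_mb` inside a node pool, and all names and values, are assumptions; verify the exact schema in Using Compute Profiles before creating or applying a profile.

```
# Hypothetical compute profile -- node-pool field placement and all values are assumptions.
cat > cp-azs.json <<'EOF'
{
  "name": "cp-azs",
  "description": "illustrative profile only",
  "parameters": {
    "cluster_customization": {
      "node_pools": [
        {
          "name": "pool-1",
          "instances": 3,
          "azs": ["az-1", "az-2"],
          "persistent_disk_in_mb": 28240
        }
      ]
    }
  }
}
EOF
# Apply the profile to an existing cluster, following the usage shown later in these notes:
tkgi update-cluster my-cluster --compute-profile cp-azs.json
```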
TKGI v1.18.5 resolves the following issues:

- Fixes `NSXLoadBalancerMonitor` objects being deleted from clusters after TKGI upgrade.
- A `curl` bump to v8.8.0 in TKGI NSX packaging fixes the information disclosure vulnerability CVE-2023-46219.

Except where noted, the known issues in TKGI v1.18.4 are also in TKGI v1.18.5. For more information, see TKGI v1.18.4 Known Issues below.
TKGI v1.18.5 does not include any new known issues.
No TKGI features have been deprecated or removed from TKGI v1.18.
Release Date: July 23, 2024
Tanzu Kubernetes Grid Integrated Edition Management Console provides an opinionated installation of TKGI.
Note: The component versions supported by TKGI Management Console might differ from or be more limited than the versions supported by TKGI.
Element | Details | |
---|---|---|
Version | v1.18.5 | |
Release date | July 23, 2024 | |
Installed TKGI version | v1.18.5 | |
Installed Ops Manager version | v3.0.31* | Release Notes |
Component | Version | |
Installed Kubernetes version | v1.27.15* | Release Notes |
Installed Harbor Registry version | v2.10.2 | Release Notes |
Ubuntu Jammy stemcell | v1.486* | Release Notes |
* Components marked with an asterisk have been updated.
The supported upgrade paths to Tanzu Kubernetes Grid Integrated Edition Management Console v1.18.5 are from TKGI v1.18.4 and earlier v1.18 patches, and from TKGI v1.17.6 and earlier v1.17 patches.
TKGI v1.18.5 resolves the following issues:
Except where noted, the known issues in TKGI Management Console v1.18.4 are also in TKGI Management Console v1.18.5. For more information, see TKGI Management Console v1.18.4 Known Issues below.
TKGI v1.18.5 does not include any new known issues.
Release Date: June 11, 2024
Release Details | |
---|---|---|
Version | v1.18.4 | |
Release date | June 11, 2024 | |
Internal Component Versions | | |
Antrea | v1.8.0 | Release Notes |
cAdvisor | v0.47.2 | |
Cloud Providers | AWS: v1.27.1; vSphere: v1.27.0 | Release Notes: AWS, vSphere |
Containerd | Linux: v1.6.28; Windows: v1.6.28 | |
CoreDNS | v1.10.1+vmware.20* | |
CSI Driver for vSphere | v3.1.2 | Release Notes |
etcd | v3.5.12* | |
Harbor | v2.10.2* | Release Notes |
Kubernetes | v1.27.13* | Release Notes |
Metrics Server | v0.6.4 | |
NCP | v4.1.2.1 | Release Notes |
Percona XtraDB Cluster (PXC) (in BOSH pxc-release) | PXC: v8.0.36-28*; pxc-release: v1.0.28* | Release Notes: PXC, pxc-release |
UAA | v74.5.116* | |
Velero | v1.11.1 | Release Notes |
Wavefront | Wavefront Collector: v1.13.0; Wavefront Proxy: v12.4 | |
Stemcell Compatibility | | |
Ubuntu Jammy stemcells | See Retrieve Product Version Compatibilities from the Tanzu API in the Broadcom Support KB. | |
Windows stemcells | v2019.69 or later | |
Interoperability | | |
Ops Manager | See Retrieve Product Version Compatibilities from the Tanzu API in the Broadcom Support KB. | |
VMware Aria Operations Management Pack for Kubernetes | v2.0 | Release Notes |
VMware Cloud Foundation (VCF) | v5.0, v4.5.2 | Release Notes: v5.0, v4.5.2 |
VMware NSX** | See VMware Product Interoperability Matrices***. | |
vSphere | | |
* Components marked with an asterisk have been updated.
** As of May 7, 2024, NSX networking and firewall components are sold separately from TKGI.
*** Migration from NSX Management Plane API to NSX Policy API requires VMware NSX v4.0.1.1 or later. NSX v4.0.1.1 supports only 50% of NSX Management Plane API scale. To use Policy API at 100% of Management Plane API scale, you require NSX v4.1.1 or later.
The supported upgrade paths to Tanzu Kubernetes Grid Integrated Edition v1.18.4 are from TKGI v1.18.3 and earlier v1.18 patches, and from TKGI v1.17.6 and earlier v1.17 patches.
TKGI v1.18.4 does not include any new breaking changes.
TKGI v1.18.4 includes the following features and enhancements:

- If TKGI runs on multiple datacenters, use the `tkgi` CLI to upgrade TKGI clusters. Note: The TKGI Management Console v1.18.4 and prior do not support upgrading TKGI running on multiple datacenters.
- Retries the `tkgi update-cluster` process up to three times if it fails, to improve resilience.

TKGI v1.18.4 resolves the following issues:

- Fixes the `Error creating NSX-T cluster network... Invalid transport path` error in NSX Policy API mode.

Except where noted, the known issues in TKGI v1.18.3 are also in TKGI v1.18.4. For more information, see TKGI v1.18.3 Known Issues below.
TKGI v1.18.4 does not include any new known issues.
No TKGI features have been deprecated or removed from TKGI v1.18.
Release Date: June 11, 2024
Tanzu Kubernetes Grid Integrated Edition Management Console provides an opinionated installation of TKGI.
Note: The component versions supported by TKGI Management Console might differ from or be more limited than the versions supported by TKGI.
Element | Details | |
---|---|---|
Version | v1.18.4 | |
Release date | June 11, 2024 | |
Installed TKGI version | v1.18.4 | |
Installed Ops Manager version | v3.0.29* | Release Notes |
Component | Version | |
Installed Kubernetes version | v1.27.13* | Release Notes |
Installed Harbor Registry version | v2.10.2* | Release Notes |
Ubuntu Jammy stemcell | v1.445* | Release Notes |
* Components marked with an asterisk have been updated.
The supported upgrade paths to Tanzu Kubernetes Grid Integrated Edition Management Console v1.18.4 are from TKGI v1.18.3 and earlier v1.18 patches, and from TKGI v1.17.6 and earlier v1.17 patches.
TKGI v1.18.4 resolves the following issues:
Except where noted, the known issues in TKGI Management Console v1.18.3 are also in TKGI Management Console v1.18.4. For more information, see TKGI Management Console v1.18.3 Known Issues below.
TKGI v1.18.4 does not include any new known issues.
No TKGI features have been deprecated or removed from TKGI Management Console v1.18.
Release Date: April 16, 2024
Release Details | |
---|---|---|
Version | v1.18.3 | |
Release date | April 16, 2024 | |
Internal Component Versions | | |
Antrea | v1.8.0 | Release Notes |
cAdvisor | v0.47.2 | |
Cloud Providers | AWS: v1.27.1; vSphere: v1.27.0 | Release Notes: AWS, vSphere |
Containerd | Linux: v1.6.28; Windows: v1.6.28 | |
CoreDNS | v1.10.1+vmware.18* | |
CSI Driver for vSphere | v3.1.2 | Release Notes |
etcd | v3.5.12* | |
Harbor | v2.10.0* | Release Notes |
Kubernetes | v1.27.12* | Release Notes |
Metrics Server | v0.6.4 | |
NCP | v4.1.2.1 | Release Notes |
Percona XtraDB Cluster (PXC) (in BOSH pxc-release) | PXC: v8.0.36-28*; pxc-release: v1.0.26* | Release Notes: PXC, pxc-release |
UAA | v74.5.108* | |
Velero | v1.11.1 | Release Notes |
Wavefront | Wavefront Collector: v1.13.0; Wavefront Proxy: v12.4 | |
Stemcell Compatibility | | |
Ubuntu Jammy stemcells | See Retrieve Product Version Compatibilities from the Tanzu API in the Broadcom Support KB. | |
Windows stemcells | v2019.69 or later | |
Interoperability | | |
Ops Manager | See Retrieve Product Version Compatibilities from the Tanzu API in the Broadcom Support KB. | |
VMware Aria Operations Management Pack for Kubernetes | v2.0 | Release Notes |
VMware Cloud Foundation (VCF) | v5.0, v4.5.2 | Release Notes: v5.0, v4.5.2 |
VMware NSX | See VMware Product Interoperability Matrices**. | |
vSphere | | |
* Components marked with an asterisk have been updated.
** Migration from NSX Management Plane API to NSX Policy API requires VMware NSX v4.0.1.1 or later. NSX v4.0.1.1 supports only 50% of NSX Management Plane API scale. To use Policy API at 100% of Management Plane API scale, you require NSX v4.1.1 or later.
The supported upgrade paths to Tanzu Kubernetes Grid Integrated Edition v1.18.3 are from TKGI v1.18.2 and earlier v1.18 patches, and from TKGI v1.17.4 and earlier v1.17 patches.
TKGI v1.18.3 does not include any new breaking changes.
TKGI v1.18.3 does not include any new features.
TKGI v1.18.3 resolves the following issues:

- Adds the `Strict-Transport-Security` header to the nginx HTTPS server configuration on the Management Console VM to improve security scan results.
- Fixes an issue that occurs when `service-cluster-ip-range` in a custom Kubernetes profile is set to a smaller range than the Networking > Kubernetes Service Network CIDR Range setting in the TKGI Ops Manager tile. A hedged Kubernetes profile sketch appears at the end of this section.

Except where noted, the known issues in TKGI v1.18.2 are also in TKGI v1.18.3. For more information, see TKGI v1.18.2 Known Issues below.
TKGI v1.18.3 does not include any new known issues.
No TKGI features have been deprecated or removed from TKGI v1.18.
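For reference, a custom Kubernetes profile that sets `service-cluster-ip-range` might look like the following minimal sketch. The `customizations`/`component`/`arguments` layout is assumed from the TKGI Kubernetes profile format, and the profile name and CIDR are illustrative only; verify the schema in Using Kubernetes Profiles, and keep the range at least as large as the tile's Kubernetes Service Network CIDR Range setting.

```
# Hypothetical Kubernetes profile -- layout and values are assumptions; verify against
# the Using Kubernetes Profiles documentation before use.
cat > kp-service-range.json <<'EOF'
{
  "name": "kp-service-range",
  "description": "illustrative profile only",
  "customizations": [
    {
      "component": "kube-apiserver",
      "arguments": {
        "service-cluster-ip-range": "10.100.200.0/24"
      }
    }
  ]
}
EOF
# Create the profile with the TKGI CLI and assign it at cluster creation or update,
# as described in Using Kubernetes Profiles.
```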
Release Date: April 16, 2024
Tanzu Kubernetes Grid Integrated Edition Management Console provides an opinionated installation of TKGI.
Note: The component versions supported by TKGI Management Console might differ from or be more limited than the versions supported by TKGI.
Element | Details | |
---|---|---|
Version | v1.18.3 | |
Release date | April 16, 2024 | |
Installed TKGI version | v1.18.3 | |
Installed Ops Manager version | v3.0.25* | Release Notes |
Component | Version | |
Installed Kubernetes version | v1.27.12* | Release Notes |
Installed Harbor Registry version | v2.10.0 | Release Notes |
Ubuntu Jammy stemcell | v1.406* | Release Notes |
* Components marked with an asterisk have been updated.
The supported upgrade paths to Tanzu Kubernetes Grid Integrated Edition Management Console v1.18.3 are from TKGI v1.18.2 and earlier v1.18 patches, and from TKGI v1.17.4 and earlier v1.17 patches.
TKGI v1.18.3 resolves the following issues:
Except where noted, the known issues in TKGI Management Console v1.18.2 are also in TKGI Management Console v1.18.3. For more information, see TKGI Management Console v1.18.2 Known Issues below.
TKGI v1.18.3 does not include any new known issues.
No TKGI features have been deprecated or removed from TKGI Management Console v1.18.
Release Date: March 5, 2024
Release Details | |
---|---|---|
Version | v1.18.2 | |
Release date | March 5, 2024 | |
Internal Component Versions | | |
Antrea | v1.8.0 | Release Notes |
cAdvisor | v0.47.2 | |
Cloud Providers | AWS: v1.27.1; vSphere: v1.27.0 | Release Notes: AWS, vSphere |
Containerd | Linux: v1.6.28*; Windows: v1.6.28* | |
CoreDNS | v1.10.1+vmware.17* | |
CSI Driver for vSphere | v3.1.2* | Release Notes |
etcd | v3.5.9 | |
Harbor | v2.10.0* | Release Notes |
Kubernetes | v1.27.11* | Release Notes |
Metrics Server | v0.6.4 | |
NCP | v4.1.2.1* | Release Notes |
Percona XtraDB Cluster (PXC) (in BOSH pxc-release) | PXC: v8.0.35-27*; pxc-release: v1.0.24* | Release Notes: PXC, pxc-release |
UAA | v74.5.104* | |
Velero | v1.11.1* | Release Notes |
Wavefront | Wavefront Collector: v1.13.0; Wavefront Proxy: v12.4 | |
Stemcell Compatibility | | |
Ubuntu Jammy stemcells | See Retrieve Product Version Compatibilities from the Tanzu API in the Broadcom Support KB. | |
Windows stemcells | v2019.69* or later | |
Interoperability | | |
Ops Manager | See Retrieve Product Version Compatibilities from the Tanzu API in the Broadcom Support KB. | |
VMware Aria Operations Management Pack for Kubernetes | v2.0 | Release Notes |
VMware Cloud Foundation (VCF) | v5.0**, v4.5.2** | Release Notes: v5.0, v4.5.2 |
VMware NSX | See VMware Product Interoperability Matrices***. | |
vSphere | | |
* Components marked with an asterisk have been updated.
** VCF v5.0 and VCF v4.5.2 are supported but have not been tested with TKGI v1.18.
*** Migration from NSX Management Plane API to NSX Policy API requires VMware NSX v4.0.1.1 or later. NSX v4.0.1.1 supports only 50% of NSX Management Plane API scale. To use Policy API at 100% of Management Plane API scale, you require NSX v4.1.1.
The supported upgrade paths to Tanzu Kubernetes Grid Integrated Edition v1.18.2 are from TKGI v1.18.1 and earlier v1.18 patches, and from TKGI v1.17.3 and earlier v1.17 patches.
TKGI v1.18.2 does not include any new breaking changes.
TKGI v1.18.2 resolves the following issues:

- Fixes an issue with `ClusterMetricSink` configuration details preventing Prometheus from contacting Telegraf pods.

NCP update to v4.1.2.1 fixes the following issues:

- `cookie_name` is configurable in the network profile when ingress `persistence_type` is set as `cookie`. A hedged network profile sketch follows below.
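As an illustration only, a network profile that enables cookie-based ingress persistence might resemble the following sketch. The nesting of `ingress_persistence_settings` and the new `cookie_name` field under `cni_configurations`, as well as all names and values, are assumptions; confirm the schema in Creating and Managing Network Profiles (NSX Only) before using it.

```
# Hypothetical network profile -- nesting, field names, and values are assumptions.
cat > np-ingress-cookie.json <<'EOF'
{
  "name": "np-ingress-cookie",
  "description": "illustrative profile only",
  "parameters": {
    "cni_configurations": {
      "type": "nsxt",
      "parameters": {
        "ingress_persistence_settings": {
          "persistence_type": "cookie",
          "cookie_name": "JSESSIONID"
        }
      }
    }
  }
}
EOF
# Create and assign the profile with the TKGI CLI as described in
# Creating and Managing Network Profiles (NSX Only).
```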
Except where noted, the known issues in TKGI v1.18.1 are also in TKGI v1.18.2. For more information, see TKGI v1.18.1 Known Issues below.
TKGI v1.18.2 contains an additional known issue:
This issue is fixed in TKGI v1.18.3.
Symptom

When upgrading TKGI with vSphere Container Storage Plug-in (CSI) enabled, pods listed by `kubectl get pods` remain stuck with `STATUS` `Pending`.

Running `kubectl describe` on worker nodes lists `Taints: node.cloudprovider.kubernetes.io/uninitialized=true:NoSchedule`.

Log file `/var/vcap/sys/log/vsphere-cloud-controller-manager/vsphere-cloud-controller-manager.stderr.log` includes a `Credentials not found` error, for example:

E0326 02:55:12.479807 21110 node_controller.go:236] error syncing 'b9a897b1-bf26-4460-a002-1c16a84d40a0': failed to get provider ID for node b9a897b1-bf26-4460-a002-1c16a84d40a0 at cloudprovider: failed to get instance ID from cloud provider: Credentials not found, requeuing

This behavior occurs when your vSphere password starts with a special character or contains backslash (`\`) characters.

Explanation

With the internal vSphere CPI change from in-tree to out-of-tree, the CSI driver upgrade operation parses vCenter passwords incorrectly and cannot retrieve node information. This leads to the `uninitialized=true:NoSchedule` taint being attached to nodes.

Workaround

Change your vSphere password so that it does not start with a special character or contain backslash (`\`) characters.
No TKGI features have been deprecated or removed from TKGI v1.18.
Release Date: March 5, 2024
Tanzu Kubernetes Grid Integrated Edition Management Console provides an opinionated installation of TKGI.
Note: The component versions supported by TKGI Management Console might differ from or be more limited than the versions supported by TKGI.
Element | Details | |
---|---|---|
Version | v1.18.2 | |
Release date | March 5, 2024 | |
Installed TKGI version | v1.18.2 | |
Installed Ops Manager version | v3.0.24* | Release Notes |
Component | Version | |
Installed Kubernetes version | v1.27.11* | Release Notes |
Installed Harbor Registry version | v2.10.0* | Release Notes |
Ubuntu Jammy stemcell | v1.379* | Release Notes |
* Components marked with an asterisk have been updated.
The supported upgrade paths to Tanzu Kubernetes Grid Integrated Edition Management Console v1.18.2 are from TKGI v1.18.1 and earlier v1.18 patches, and from TKGI v1.17.3 and earlier v1.17 patches.
Except where noted, the known issues in TKGI Management Console v1.18.1 are also in TKGI Management Console v1.18.2. For more information, see TKGI Management Console v1.18.1 Known Issues below.
TKGI v1.18.2 does not include any new known issues.
No TKGI features have been deprecated or removed from TKGI Management Console v1.18.
Release Date: December 19, 2023
Release Details | |
---|---|---|
Version | v1.18.1 | |
Release date | December 19, 2023 | |
Internal Component Versions | | |
Antrea | v1.8.0 | Release Notes |
cAdvisor | v0.47.2 | |
Cloud Providers | AWS: v1.27.1; vSphere: v1.27.0 | Release Notes: AWS, vSphere |
Containerd | Linux: v1.6.24*; Windows: v1.6.24* | |
CoreDNS | v1.10.1+vmware.12* | |
CSI Driver for vSphere | v3.1.1* | Release Notes |
etcd | v3.5.9 | |
Harbor | v2.9.1* | Release Notes |
Kubernetes | v1.27.8* | Release Notes |
Metrics Server | v0.6.4 | |
NCP | v4.1.2.0 | Release Notes |
Percona XtraDB Cluster (PXC) (in BOSH pxc-release) | PXC: v8.0.34-26; pxc-release: v1.0.20* | Release Notes: PXC, pxc-release |
UAA | v74.5.95* | |
Velero | v1.11.1* | Release Notes |
Wavefront | Wavefront Collector: v1.13.0; Wavefront Proxy: v12.4 | |
Stemcell Compatibility | | |
Ubuntu Jammy stemcells | See Retrieve Product Version Compatibilities from the Tanzu API in the Broadcom Support KB. | |
Windows stemcells | v2019.65 or later | |
Interoperability | | |
Ops Manager | See Retrieve Product Version Compatibilities from the Tanzu API in the Broadcom Support KB. | |
VMware Aria Operations Management Pack for Kubernetes | v2.0 | Release Notes |
VMware Cloud Foundation (VCF) | v5.0**, v4.5.2** | Release Notes: v5.0, v4.5.2 |
VMware NSX | See VMware Product Interoperability Matrices***. | |
vSphere | | |
* Components marked with an asterisk have been updated.
** VCF v5.0 and VCF v4.5.2 are supported but have not been tested with TKGI v1.18.
*** Migration from NSX Management Plane API to NSX Policy API requires VMware NSX v4.0.1.1 or later. NSX v4.0.1.1 supports only 50% of NSX Management Plane API scale. To use Policy API at 100% of Management Plane API scale, you require NSX v4.1.1.
The supported upgrade paths to Tanzu Kubernetes Grid Integrated Edition v1.18.1 are from TKGI v1.18.0, and TKGI v1.17.2 and earlier TKGI v1.17 patches.
TKGI v1.18.1 does not include any new breaking changes.
TKGI v1.18.1 does not include any new features or enhancements.
TKGI v1.18.1 resolves the following issues:
Component bumps fix the following:
Except where noted, the known issues in TKGI v1.18.0 are also in TKGI v1.18.1. For more information, see TKGI v1.18.0 Known Issues below.
TKGI v1.18.1 does not include any new known issues.
Important: To address CVE-2024-21626 by patching TKGI with a runc upgrade, see High-Severity CVE-2024-21626 in runc 1.1.11 and earlier below.
No TKGI features have been deprecated or removed from TKGI v1.18.
Release Date: December 19, 2023
Tanzu Kubernetes Grid Integrated Edition Management Console provides an opinionated installation of TKGI.
Note: The component versions supported by TKGI Management Console might differ from or be more limited than the versions supported by TKGI.
Element | Details | |
---|---|---|
Version | v1.18.1 | |
Release date | December 19, 2023 | |
Installed TKGI version | v1.18.1 | |
Installed Ops Manager version | v3.0.19* | Release Notes |
Component | Version | |
Installed Kubernetes version | v1.27.8* | Release Notes |
Installed Harbor Registry version | v2.9.1* | Release Notes |
Ubuntu Jammy stemcell | v1.318* | Release Notes |
* Components marked with an asterisk have been updated.
The supported upgrade paths to Tanzu Kubernetes Grid Integrated Edition Management Console v1.18.1 are from TKGI v1.18.0, and TKGI v1.17.2 and earlier TKGI v1.17 patches.
TKGI Management Console v1.18.1 does not include any new features or enhancements.
Except where noted, the known issues in TKGI Management Console v1.18.0 are also in TKGI Management Console v1.18.1. For more information, see TKGI Management Console v1.18.0 Known Issues below.
This issue is fixed in TKGI v1.18.2.
Symptom
After upgrading to TKGI v1.18.1 you cannot use the key displayed in the Management Console > Deployment Metadata tab to `ssh` in to the Ops Manager VM as described in Connect to Operations Manager with SSH.

Explanation

The upgrade process generates a new SSH private key but does not update its value as shown in the Management Console > Deployment Metadata tab. After upgrade, the displayed key no longer matches any public key in the Ops Manager VM’s `/home/ubuntu/.ssh/authorized_keys`.

Workaround

Retrieve and use an up-to-date SSH private key from a configuration file on the TKGI Management Console VM:

1. `ssh` in to the Management Console VM as root:

   ssh root@MC-IP-ADDRESS

   Where MC-IP-ADDRESS is the IP address that you use to access the Management Console with a browser as described in Step 2: Log In to TKGI Management Console.

2. Retrieve the private key from the file `/etc/vmware/.pks/om_root_ca`.

3. `ssh` in to the Ops Manager VM by passing the private key to the `-i` flag of `ssh`. For example, to connect to the Ops Manager VM directly from the MC VM:

   ssh -i /etc/vmware/.pks/om_root_ca ubuntu@OPS-MANAGER-IP

You can also copy the private key to `ssh` from your local workstation.
No TKGI features have been deprecated or removed from TKGI Management Console v1.18.
Release Date: November 02, 2023
Release Details | |
---|---|---|
Version | v1.18.0 | |
Release date | November 02, 2023 | |
Internal Component Versions | | |
Antrea | v1.8.0* | Release Notes |
cAdvisor | v0.47.2* | |
Cloud Providers | AWS: v1.27.1*; vSphere: v1.27.0* | Release Notes: AWS, vSphere |
Containerd | Linux: v1.6.18; Windows: v1.6.18 | |
CoreDNS | v1.10.1+vmware.7* | |
CSI Driver for vSphere | v3.0.2 | Release Notes |
etcd | v3.5.9 | |
Harbor | v2.9.0* | Release Notes |
Kubernetes | v1.27.5* | Release Notes |
Metrics Server | v0.6.4* | |
NCP | v4.1.2.0* | Release Notes |
Percona XtraDB Cluster (PXC) (in BOSH pxc-release) | PXC: v8.0.33-25*; pxc-release: v1.0.18* | Release Notes: PXC, pxc-release |
UAA | v74.5.90* | |
Velero | v1.11.1* | Release Notes |
Wavefront | Wavefront Collector: v1.13.0; Wavefront Proxy: v12.4 | |
Stemcell Compatibility | | |
Ubuntu Jammy stemcells | See Retrieve Product Version Compatibilities from the Tanzu API in the Broadcom Support KB. | |
Windows stemcells | v2019.65 or later* | |
Interoperability | | |
Ops Manager | See Retrieve Product Version Compatibilities from the Tanzu API in the Broadcom Support KB. | |
VMware Aria Operations Management Pack for Kubernetes | v2.0 | Release Notes |
VMware Cloud Foundation (VCF) | v5.0**, v4.5.2** | Release Notes: v5.0, v4.5.2 |
VMware NSX | See VMware Product Interoperability Matrices***. | |
vSphere | | |
* Components marked with an asterisk have been updated.
** VCF v5.0 and VCF v4.5.2 are supported but have not been tested with TKGI v1.18.
*** Migration from NSX Management Plane API to NSX Policy API requires VMware NSX v4.0.1.1 or later. NSX v4.0.1.1 supports only 50% of NSX Management Plane API scale. To use Policy API at 100% of Management Plane API scale, you require NSX v4.1.1.
The supported upgrade paths to Tanzu Kubernetes Grid Integrated Edition v1.18.0 are from TKGI v1.17.1 and v1.17.0.
TKGI v1.18.0 includes the following breaking changes:
SecurityContextDeny Admission Controller Support Has Been Removed: Support for SecurityContextDeny admission controller has been removed in TKGI v1.18. You must deactivate SecurityContextDeny in all active Plans before upgrading to TKGI v1.18.
The SecurityContextDeny admission controller has been deprecated, and the Kubernetes community recommends the controller not be used. For more information, see Deactivate the SecurityContextDeny Admission Controller in Upgrade Preparation Checklist for TKGI.
Pod security admission (PSA) is the preferred method for providing a more secure Kubernetes environment. For more information about PSA, see Pod Security Admission in TKGI.
Telegraf has been upgraded for host monitoring: Telegraf has been upgraded to Telegraf v1.28.1 for host monitoring.
Previous TKGI versions have used Telegraf v1.20.3 for host monitoring. For information on the differences between the Telegraf v1.20.3 and v1.28.1 releases, see Telegraf v1.28.1 and Telegraf v1.24.0 Breaking Change in the Telegraf Release Notes documentation.
The Out-of-Tree Kubernetes AWS Cloud Provider Requires Additional Permissions: In AWS environments, TKGI v1.18 integrates the out-of-tree AWS cloud provider for Kubernetes. The Kubernetes AWS out-of-tree cloud provider requires a different AWS configuration than was required by the in-tree Kubernetes AWS cloud provider used by previous TKGI versions. Basic cloud provider functions will fail in TKGI v1.18 if the AWS out-of-tree cloud provider requirements are not met. For more information, see AWS Permissions Errors When Using the Out-of-Tree Kubernetes AWS Cloud Provider in General Troubleshooting.
Windows Stemcells Must Be Updated to Expose Ethernet Adapter Information: By default, Windows worker node VM Ethernet adapter information is not exposed on TKGI Windows clusters. When creating BOSH Windows stemcells for TKGI v1.18, you must configure your base Windows OS image to expose Ethernet adapter information. For more information, see Expose Ethernet Adapter Information on Worker Node VMs in the revised Creating a Windows Stemcell for vSphere Using Stembuild procedure.
TKGI v1.18.0 includes the following features:
Supports configuring the maximum number of persistent volumes attached to a cluster node on vSphere. For information on how to configure the maximum number of persistent volumes, see Customize the Maximum Number of Persistent Volumes in Deploying and Managing Cloud Native Storage (CNS) on vSphere.
Supports DNS-based service discovery for Linux worker nodes. TKGI lets you configure a `worker.cfcr.internal` DNS entry for TKGI-provisioned Linux worker nodes. For more information on integrating with Prometheus DNS-based service discovery, see Monitoring Components and Integrations in Monitoring Linux Workers and Workloads. A hedged Prometheus configuration sketch follows this list.
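As a minimal sketch of what Prometheus DNS-based service discovery against this entry could look like, the following writes a scrape job that resolves `worker.cfcr.internal` A records; the job name, file name, and metrics port are assumptions, and the snippet would need to be merged into your existing Prometheus configuration.

```
# Hypothetical Prometheus scrape job -- job name, file name, and port are assumptions.
cat > tkgi-linux-workers-scrape.yml <<'EOF'
scrape_configs:
  - job_name: "tkgi-linux-workers"
    dns_sd_configs:
      - names: ["worker.cfcr.internal"]   # resolves to the Linux worker node IPs
        type: "A"
        port: 9100                        # assumed exporter port on the workers
EOF
# Merge the scrape job into your Prometheus configuration and reload Prometheus.
```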
TKGI v1.18.0 includes the following enhancements:
Supports the following vSphere CSI driver features:
Integrates the vSphere and AWS out-of-tree cloud providers for Kubernetes. TKGI continues to use the Azure and GCP in-tree cloud providers.
TKGI v1.18.0 resolves the following issues:
- Fixes an issue where the `cluster_name` tag was not added to logging after cluster upgrade.

Note: You must grant the AWS Worker Instance Profile additional AWS Identity and Access Management (IAM) permissions before using the Antrea Egress feature with worker nodes on AWS. For more information, see Prepare AWS Worker Instance Profile Permissions in General Troubleshooting.
The following TKGI features have been deprecated or removed from TKGI v1.18:
Google Cloud Platform: Support for the Google Cloud Platform (GCP) is deprecated. Support for GCP will be entirely removed in TKGI v1.19.
Flannel Support: Support for the Flannel Container Networking Interface (CNI) is deprecated. Support for Flannel will be entirely removed in TKGI v1.19. VMware recommends that you switch your Flannel CNI-configured clusters to the Antrea CNI. For more information about Flannel CNI deprecation, see About Switching from the Flannel CNI to the Antrea CNI in About Tanzu Kubernetes Grid Integrated Edition Upgrades.
TKGI v1.18.0 has the following known issues:
This issue is fixed in TKGI v1.18.2.
To address CVE-2024-21626, which impacts runc v1.1.11 and earlier, follow Instructions to address CVE-2024-21626 for TKGI in the VMware Knowledge Base.
This issue is fixed in TKGI v1.18.4.
TKGI on vSphere does not support running workload clusters in multiple vCenter server inventories. All vSphere clusters must be managed by the same vCenter server, due to an internal vSphere CPI change from in-tree to out-of-tree.
The VMware vSphere CSI Driver supports a limited set of VMware vSphere features. Before enabling the vSphere CSI Driver on a TKGI cluster, confirm the cluster and storage configuration are supported by the driver. For more information, see Unsupported Features and Limitations in Deploying and Managing Cloud Native Storage (CNS) on vSphere.
TKGI supports using a public cloud CSI Driver on a TKGI-provisioned cluster.
Installing a Public Cloud CSI Driver on a TKGI Cluster
If you plan to use a public cloud CSI Driver on a TKGI-provisioned cluster, VMware recommends you take additional steps before installing the CSI Driver:
For most public clouds, VMware recommends you follow the CSI Driver installation procedure recommended by the public cloud provider.
For installing the Azure CSI Driver on a TKGI cluster, VMware recommends you follow the procedure in the How to install Azure file/disk CSI driver onto TKGI 1.14 cluster knowledge base article in the VMware Tanzu Support Hub.
Managing a TKGI Cluster That Uses a Public Cloud CSI Driver
If you have enabled a public cloud CSI Driver on a TKGI cluster, you must take additional steps when deleting, upgrading, or updating the cluster:
Updating a Cluster on a Public Cloud
When updating a cluster that uses a public cloud CSI Driver:
To prepare a single-worker node cluster for updating:
Upgrading a Cluster on a Public Cloud
When upgrading a cluster that uses a public cloud CSI Driver:
To prepare a single-worker node cluster for upgrading:
Deleting a Cluster on a Public Cloud
When deleting a cluster that uses a public cloud CSI Driver:
This issue is fixed in TKGI v1.18.5.
Symptom
Deploying new clusters with NSX Edge nodes fails due to a failure in the pks-nsx-t-prepare-master-vm
job. On TKGI control plane VMs, the job log file /var/vcap/data/sys/log/pks-nsx-t-prepare-master-vm/pre-start.stdout.log
reports an error like:
Creating Load Balancer
create loadbalancer: update lb service: [PUT /infra/lb-services/{lb-service-id}][400] updateLBServiceBadRequest &{RelatedAPIError:{Details: ErrorCode:502001 ErrorData:<nil> ErrorMessage:Errors validating path=[/infra/lb-services/lb-pks-b5ef6df4-cd11-4461-8861-893533940ecb]. ModuleName:policy} RelatedErrors:[0xc0001b25a0]}
Explanation
When creating a cluster, TKGI creates a NSX Tier-1 gateway and attaches a load balancer to it. This becomes the cluster’s default load balancer, hosting the virtual server for the cluster’s API endpoints and ingress rules. The error occurs when the Tier-1 gateway creates the LB in the NSX routing allocation pool instead of the NSX LB allocation pool. This can cause NSX Service Router components to deploy to edge nodes with no LB capacity, resulting in cluster creation failure.
Workarounds
This issue is fixed in TKGI v1.18.3.
Symptom
Fluentd component logfiles that contain the error text `invalid byte sequence in UTF-8` fill up and overrun ephemeral storage on the cluster, for example `/tmp/fluent/backup/worker0/object_c79c/` mapped to ephemeral storage disk `dev/sdb1`.
Workaround
Modify Fluentd configuration by changing its log handling code out_loginsight_buffered.rb
as follows:
Upload the latest BOSH release to the BOSH Director:
bosh upload-release --sha1 daf34e35f1ac678ba05db3496c4226064b99b3e4 "https://bosh.io/d/github.com/cloudfoundry/os-conf-release?v=22.2.1"
For a different release, download it with `wget "https://bosh.io/d/github.com/cloudfoundry/os-conf-release?v=RELEASE"` and calculate its SHA-1 with `shasum "os-conf-release?v=RELEASE"`.
Create a runtime config file runtime.yml
with the following code, to modify the out_loginsight_buffered.rb
code:
releases:
- name: "os-conf"
  version: "22.2.1"
addons:
- name: os-configuration
  include:
    # Change to the deployment IDs of the clusters you want to modify.
    # To apply the change to all clusters, delete the include section.
    deployments: [dep1,dep2...]
  jobs:
  - name: pre-start-script
    release: os-conf
    properties:
      script: |-
        #!/bin/bash
        if [ -d "/var/vcap/jobs/fluentd/packages/vrli-fluentd/plugins/" ]; then
          sed -i 's/force_encoding("utf-8")/encode!('\''UTF-8'\'', '\''binary'\'', invalid: :replace, undef: :replace, replace: '\'' '\'' )/g' /var/vcap/jobs/fluentd/packages/vrli-fluentd/plugins/out_loginsight_buffered.rb
          echo "done"
        fi
Apply the runtime config file to the BOSH Director, giving it the name `runtime-fluentd`:
bosh update-runtime-config --name runtime-fluentd runtime.yml
Check that the runtime config has been set successfully:
bosh runtime-config --name runtime-fluentd
The command output should list the contents of `runtime.yml`.
Upgrade the cluster with ephemeral storage overrun, to apply the new runtime config:
tkgi upgrade-cluster CLUSTER-NAME
Log in to a cluster worker or master node and check that the file /var/vcap/jobs/fluentd/packages/vrli-fluentd/plugins/out_loginsight_buffered.rb
includes the following lines:
key.encode!('UTF-8', 'binary', invalid: :replace, undef: :replace, replace: ' ' )
...
value.encode!('UTF-8', 'binary', invalid: :replace, undef: :replace, replace: ' ' )
Check that no more logfiles are being routed to the directory `/tmp/fluent/backup/worker0` and no more error logs containing `invalid byte sequence in UTF-8` are being written to `/var/vcap/sys/log/fluentd.stdout.log`.
(Optional) Delete the runtime config:
bosh delete-config --type runtime --name runtime-fluentd
Unless you delete it, the runtime config persists in the BOSH Director and is applied with TKGI updates and upgrades, and other operations that run `bosh deploy`. After you upgrade to a TKGI version that fixes this issue, leaving the runtime config in place is harmless, but it leaves an unneeded file on the BOSH Director filesystem and makes BOSH execute an unnecessary script.
This issue is fixed in TKGI v1.18.2.
Symptom
For clusters configured with Topology-aware volume provisioning, upgrading to TKGI 1.18.0 or 1.18.1 generates an error like:
"failed to get vCenterInstance for vCenter Host: \"[vcsa027.zd.datev.de|http://vcsa027.zd.datev.de/]\". Error: virtual center was already registered","
Explanation
TKGI does not automatically set the feature state switch to enable multi-vcenter-csi-topology
for the cluster.
Workaround
Follow the instructions in SSH into a Kubernetes Cluster VM to ssh
into the master node of the failed cluster.
Edit the file `/var/vcap/jobs/csi-controller/config/feature-switch.yml`.
Add the following line to set the feature gate:
"multi-vcenter-csi-topology": "true"
Stop the csi-controller
and confirm that it has stopped:
sudo monit stop csi-controller
sudo monit summary
Start the csi-controller
and confirm that it has restarted:
sudo monit start csi-controller
sudo monit summary
Symptom
On clusters configured to use a containerd registry and Istio CNI, upgrading the TKGI version without also upgrading the stemcell fails with errors `kubelet cannot find istio-cni binary` and `nsx fails to recieve message header`.
This error does not occur when you upgrade to a new stemcell along with the new TKGI version.
Explanation
When a TKGI cluster upgrade drains the nodes, it leaves the cluster nodes’ Istio CNI agent and CNI configuration in a corrupted state.
If the cluster nodes are not automatically re-created by a stemcell change, the corrupted Istio CNI state remains.
Workaround
For clusters that use both Containerd and Istio CNI:
If you have already encountered this issue, re-create all worker nodes using the bosh recreate
command:
Run the bosh vms
command to list the cluster VMs:
bosh -d service-instance-DEPLOYMENT-ID vms
Where DEPLOYMENT-ID
is the BOSH-generated ID of your Kubernetes cluster deployment.
For each VM instance listed as `worker/UUID` in the output, run `bosh recreate VM-NAME`:
bosh -d service-instance-DEPLOYMENT-ID recreate worker/UUID
In the future, you can avoid this issue by upgrading a cluster’s stemcell whenever you upgrade its TKGI version.
Symptom
For Windows worker clusters that authenticate users via a group Managed Service Account (gMSA) in Microsoft AD, upgrading the clusters to a new Windows stemcell may cause users to be unable to log in to the cluster. Valid credentials for containers on the cluster may no longer work.
Logfile join-domain/pre-start.stdout.log
contains:
WARNING: The changes will take effect after you restart the computer WIN-<ID-STRING>.
Already joined to domain
Explanation
When BOSH upgrades a VM’s Windows stemcell, it re-creates the VM and then triggers a join-domain
job to reconnect it with its gMSA group. Reconnecting with gMSA requires a second VM reboot, but TKGI does not currently trigger the reboot automatically because its timing would interfere with other upgrade operations.
Workaround
After upgrading a Windows cluster with GMSA to a new stemcell, manually reboot its worker nodes:
Run `bosh vms` to list the names of the worker nodes, and record their Deployment ID and Instance IDs.

For the Windows worker nodes, which have Instance IDs that begin with `worker/`, log in to them and restart them as follows:

1. Run `bosh -d DEPLOYMENT-ID ssh INSTANCE-ID`.

2. Run `powershell`.

3. Run the following script, which restarts the VM and returns you to your local shell:
Set-Service bosh-agent -StartupType Automatic
Set-Service bosh-dns-windows -StartupType Automatic
Set-Service bosh-dns-healthcheck-windows -StartupType Automatic
Set-Service bosh-dns-nameserverconfig-windows -StartupType Automatic
Set-Service kubelet -StartupType Automatic
Set-Service nsx-kube-proxy -StartupType Automatic
Set-Service nsx-node-agent -StartupType Automatic
Set-Service containerd -StartupType Automatic
Set-Service ovs-vswitchd -StartupType Automatic
Set-Service ovsdb-server -StartupType Automatic
Set-Service system-metrics-agent -StartupType Automatic
Get-Service bosh-agent | Select-Object -Property Name, StartType, Status
Stop-Service -Name bosh-agent -Force -NoWait
# Restart to apply changes
echo "Restarting vm"
Restart-Computer
Wait until the VM is restarted and check the pod status:
kubectl get pod POD-NAME
If the pod is stuck, `bosh restart` it. For example:
bosh -d service-instance_0d0f7798-e4e9-473f-8ddb-279bc61faef0 restart worker/2deca1a2-c6ed-4e37-8dce-91b141d98e8f
Where service-instance_0d0f7798-e4e9-473f-8ddb-279bc61faef0
is the example instance group DEPLOYMENT-ID and worker/2deca1a2-c6ed-4e37-8dce-91b141d98e8f
is the example VM INSTANCE-ID.
This issue is fixed in TKGI v1.18.2.
Symptom
When installing or upgrading TKGI with vSphere Container Storage Plug-in (CSI) enabled, CSI pods fail with the error `ErrImageNeverPull` and cluster logs show the error `unknown escape sequence`.
Explanation
The CSI driver cannot correctly parse the vCenter username configuration setting if it contains a backslash (`\`) character.
Workaround
When entering a vCenter user name in the TKGI Configuration Wizard or Ops Manager tile, use the format `user@domainname`. The user name cannot contain a backslash (`\`) character.
Symptom
When upgrading TKGI with vSphere Container Storage Plug-in (CSI) enabled, pods listed by `kubectl get pods` remain stuck with `STATUS` `Pending`.

Running `kubectl describe` on worker nodes lists `Taints: node.cloudprovider.kubernetes.io/uninitialized=true:NoSchedule`.

Log file `/var/vcap/sys/log/vsphere-cloud-controller-manager/vsphere-cloud-controller-manager.stderr.log` includes a `Credentials not found` error, for example:

E0326 02:55:12.479807 21110 node_controller.go:236] error syncing 'b9a897b1-bf26-4460-a002-1c16a84d40a0': failed to get provider ID for node b9a897b1-bf26-4460-a002-1c16a84d40a0 at cloudprovider: failed to get instance ID from cloud provider: Credentials not found, requeuing

This behavior occurs when your vSphere password starts with a special character or contains backslash (`\`) characters.

Explanation

With the internal vSphere CPI change from in-tree to out-of-tree, the CSI driver upgrade operation parses vCenter passwords incorrectly and cannot retrieve node information. This leads to the `uninitialized=true:NoSchedule` taint being attached to nodes.

Workaround

Change your vSphere password so that it does not start with a special character or contain backslash (`\`) characters.
Symptom
When you deploy a workload on a TKGI-provisioned cluster with NSX networking that is running Tanzu Application Platform (TAP), you see an error Failed to create pod sandbox
and no resources are created in the cluster’s nsx-system
namespace.
Explanation
The total number of Kubernetes object labels and other tags created by both TKGI and TAP can exceed the number that is allowed by NSX.
Workaround
Create or update your network profile as described in Creating and Managing Network Profiles (NSX Only), setting the `cni_configurations` parameter `extensions.ncp.k8s.label_filtering_regex_list` as described under label_filtering Settings. A hedged sketch follows below.
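For illustration only, a network profile that sets the label filtering regex list might look like the following sketch; the `extensions.ncp.k8s.label_filtering_regex_list` path comes from the workaround above, but the surrounding layout and the regex values are assumptions, so confirm them under label_filtering Settings before applying.

```
# Hypothetical network profile -- layout and regex values are assumptions.
cat > np-label-filtering.json <<'EOF'
{
  "name": "np-label-filtering",
  "description": "illustrative profile only",
  "parameters": {
    "cni_configurations": {
      "type": "nsxt",
      "extensions": {
        "ncp": {
          "k8s": {
            "label_filtering_regex_list": ["app.*", "tier.*"]
          }
        }
      }
    }
  }
}
EOF
# Create the profile and apply it to the cluster with the TKGI CLI, as described in
# Creating and Managing Network Profiles (NSX Only).
```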
This issue is fixed in TKGI v1.18.2.
Symptom
When you scale up a cluster by passing `--num-nodes` to `tkgi update-cluster`, you see the error `An error occurred in the PKS API`.

The pks-api.log includes: `Request processing failed; nested exception is java.lang.NullPointerException`, which triggers within the nested `ClusterService` methods `extractCustomizationNodePoolNames` < `validateComputeProfileUuidAndKubernetesWorkerInstances` < `updateCluster`.
Explanation
The cluster’s Compute Profile lacks a node-pool specification, and setting --num-nodes
for cluster updating does not work when the Compute Profile does not specify a node pool.
Workaround
Create and apply a new compute profile that specifies a node pool:
Create a new JSON compute profile definition that defines a node pool with a worker node instance count. For example, this code defines a compute profile `cp-1` with target instance count `16`:
cat cp-1.json
{
"name": "cp-1",
"description": "compute profile 1",
"parameters": {
"cluster_customization": {
"control_plane": {
"cpu": 2,
"memory_in_mb": 8192,
"persistent_disk_in_mb": 28240,
"ephemeral_disk_in_mb": 28240,
"instances": 3
},
"node_pools": [
{
"name": "pool-1",
"description": "node pool 1",
"instances": 16
}
]
}
}
}
Apply the new compute profile to the cluster:
tkgi update-cluster my-cluster --compute-profile cp-1.json
When you apply the new profile, the worker nodes migrate from instance group `worker` to the new instance group, `worker-pool-1`. If the instance count exceeds the number specified in the plan, TKGI adds instances to the new group before updating existing instances.
After the new node pool is created, you can scale the cluster by running:
tkgi update-cluster cluster-1 --node-pool-instances "POOL:COUNT"
Where POOL
is the node pool name and COUNT
is its instance count.
This issue is fixed in TKGI v1.18.3.
Symptom
For clusters created with or assigned a compute profile as described in Using Compute Profiles, scaling the cluster by updating its compute profile and then performing additional cluster update operations may leave the cluster with the wrong node count.
For example, after the following steps, the cluster may have a node count of 12 instead of 18 in its updated node pool:
- Create a cluster `my-cluster` with a compute profile `cp-1` that defines a node pool `pool-1` with 6 nodes.
- Scale the cluster by running `tkgi update-cluster my-cluster --node-pool-instances "pool-1:12"`.
- Create a new compute profile `cp-2` that sets the `pool-1` node pool to 18 nodes.
- Update the cluster with `cp-2` by running `tkgi update-cluster my-cluster --compute-profile cp-2`.
- Run additional `tkgi update-cluster` operations.

After the last step, the cluster pool’s node count may erroneously revert to 12.
Workaround
After you update a cluster with a new compute profile that changes node counts, run tkgi update-cluster CLUSTER-NAME --node-pool-instances "NODEPOOL-NAME:NODE-COUNT"
where NODE-COUNT
is the new node count.
With the example above, after you update the cluster with `cp-2`, run `tkgi update-cluster my-cluster --node-pool-instances "pool-1:18"`.
Description
Interoperability with VMware Aria Operations Management Pack for Kubernetes is temporarily unavailable.
VMware Aria Operations Management Pack for Kubernetes is currently not compatible with TKGI v1.18. Interoperability between VMware Aria Operations Management Pack for Kubernetes and TKGI v1.18 is expected at a later time.
This issue is fixed in TKGI v1.18.4.
Symptom
In the management console, when you update a Linux cluster that is created with a compute profile as described in Update Cluster Configuration, the Advanced Settings panel shows and applies incorrect defaults for node drain and pod shutdown grace period settings.
Workaround
Under Update Cluster, before you change the Compute Profile setting, click Show More to open the Advanced Settings. Record the current settings, and set them back if selecting a compute profile changes those settings.
Default settings are:
This issue is fixed in TKGI v1.18.4.
Symptom
After running tkgi upgrade-cluster
with TKGI, the cluster’s tags no longer appear. This issue exists on vSphere, AWS and Azure.
Workaround
To restore the cluster tags after upgrading a cluster with the TKGI CLI:
Run tkgi cluster CLUSTER-NAME
as described in Review Your Tags and copy the Tags:
value from the command output.
Run tkgi update-cluster CLUSTER-NAME --tags TAGS
and pass in the existing tags value.
Symptom
After you restore Ops Manager and the TKGI API VM from backup, TKGI functions normally, but your TKGI MC tabs include the following error: “…product ‘pivotal-container service’ is not deployed…”.
Explanation
TKGI MC is associated with an Ops Manager with a specific name. If you rename Ops Manager with a new name while restoring, your TKGI MC will not recognize the restored Ops Manager and cannot manage it.
Symptom
Cluster upgrade or BOSH manifest deploy fails with cloud controller errors, for example:
- `Process 'vsphere-cloud-controller-manager' Does not exist`
- A `failed jobs` error for `aws-cloud-controller-manager` and related components.
- `aws-cloud-controller-manager` logs list `failed to create listener` errors for multiple processes.

Explanation
During cluster upgrade, the cluster’s Cloud Controller Manager (CCM) initializes, and if the API server is not yet available, the initialization process waits. When the API server becomes available, the CCM may then launch redundant processes that confuse internal coordination.
Workaround
Restart the CCM:
Log in to the cluster’s control-plane node.
Find the process ID (PID) for the CCM:
ps aux | grep cloud-controller-manager
Kill the CCM process:
kill CCM-PROCESS
Where CCM-PROCESS
is the PID listed by the previous step.
The CCM and its sub-processes re-create automatically.
Symptom
After clicking Apply Changes on the TKGI tile in an Azure environment, you experience an error ‘…could not execute “apply-changes”…’ with either of the following descriptions:
For example:
INFO | 2020-09-21 03:46:49 +0000 | Vessel::Workflows::Installer#run | Install product (apply changes)
2020/09/21 03:47:02 could not execute "apply-changes": installation failed to trigger: request failed: unexpected response from /api/v0/installations:
HTTP/1.1 500 Internal Server Error
Transfer-Encoding: chunked
Cache-Control: no-cache, no-store
Connection: keep-alive
Content-Type: application/json; charset=utf-8
Date: Mon, 21 Sep 2020 17:51:50 GMT
Expires: Fri, 01 Jan 1990 00:00:00 GMT
Pragma: no-cache
Referrer-Policy: strict-origin-when-cross-origin
Server: Ops Manager
Strict-Transport-Security: max-age=31536000; includeSubDomains
X-Content-Type-Options: nosniff
X-Download-Options: noopen
X-Frame-Options: SAMEORIGIN
X-Permitted-Cross-Domain-Policies: none
X-Request-Id: f5fc99c1-21a7-45c3-7f39
X-Runtime: 9.905591
X-Xss-Protection: 1; mode=block
44
{"errors":{"base":["undefined method `location' for nil:NilClass"]}}
0
Explanation
The Azure CPI endpoint used by Ops Manager has been changed and your installed version of Ops Manager is not compatible with the new endpoint.
Workaround
Run the following Ops Manager CLI command:
om --skip-ssl-validation --username USERNAME --password PASSWORD --target https://OPSMAN-API curl --silent --path /api/v0/staged/director/verifiers/install_time/IaasConfigurationVerifier -x PUT -d '{ "enabled": false }'
Where:
- `USERNAME` is the account to use to run Ops Manager API commands.
- `PASSWORD` is the password for the account.
- `OPSMAN-API` is the IP address for the Ops Manager API.

For more information, see Error ‘undefined method location’ is received when running Apply Change on Azure in the VMware Tanzu Knowledge Base.
VMware vRealize Operations (vROPs) does not support Windows worker-based Kubernetes clusters and cannot be used to manage TKGI-provisioned Windows workers.
To monitor Windows-based worker node clusters with a Wavefront collector and proxy, you must first install Wavefront on the clusters manually, using Helm. For instructions, see the Wavefront section of the Monitoring Windows Worker Clusters and Nodes topic.
TKGI-provisioned Windows worker-based Kubernetes clusters inherit a Kubernetes limitation that prevents outbound ICMP communication from workers. As a result, pinging Windows workers does not work.
For information about this limitation, see Limitations > Networking in the Windows in Kubernetes documentation.
When restoring the TKGI management plane from backup as described in Restoring TKGI Management Plane Components, you may see an error like the following, along with errors for the bbr-uaadb
and pks-api
components:
```
ERROR 3780 (HY000) at line 25: Referencing column 'SESSION_PRIMARY_ID' and referenced column 'PRIMARY_ID' in foreign key constraint 'SPRING_SESSION_ATTRIBUTES_FK' are incompatible.
```
With these errors, the User Account and Authentication (UAA) database fails to restore.
You can use Velero to back up stateless TKGI-provisioned Windows workers only. You cannot use Velero to back up stateful Windows applications. For more information, see Velero on Windows in Basic Install in the Velero documentation.
TKGI on Google Cloud Platform (GCP) does not support Tanzu Mission Control (TMC) integration, which is configured in the Tanzu Kubernetes Grid Integrated Edition tile > the Tanzu Mission Control pane.
If you intend to run TKGI on GCP, skip this pane when configuring the Tanzu Kubernetes Grid Integrated Edition tile.
TMC Data Protection feature supports privileged TKGI containers only. For more information, see Plans in the Installing TKGI topic for your IaaS.
Windows worker-based Kubernetes clusters integrated with group Managed Service Account (gMSA) cannot be managed using compute profiles.
On vSphere with NSX networking you can use compute profiles with both Linux and Windows worker‑based Kubernetes clusters. On vSphere with Flannel networking, you can apply compute profiles only to Linux clusters.
TKGI CLI does not prevent accidentally reducing a cluster’s control plane node count using a compute profile.
Warning: Reducing a cluster’s control plane node count can destroy the cluster. Do not scale out or scale in existing control plane nodes by reconfiguring the TKGI tile or by using a compute profile. Reducing a cluster’s number of control plane nodes might remove a control plane node and cause the cluster to become inactive.
Symptom
After you delete a VM using the management console of your infrastructure provider, you notice a Windows worker node that had been on that VM is now in a notReady
state.
Solution
To identify the leftover node:

1. Run `kubectl get no -o wide`.
2. Look for nodes that are in a `notReady` state and have the same IP address as another node in the list.

To manually delete a `notReady` node:

kubectl delete node NODE-NAME

Where NODE-NAME is the name of the node in the `notReady` state.
Symptom
You experience a “502 Bad Gateway” error from the NSX load balancer after you log in to OIDC.
Explanation
A large response header has exceeded your NSX load balancer maximum response header size. The default maximum response header size is 10,240 characters and should be resized to 16,384.
Workaround
If you experience this issue, manually reconfigure your NSX `request_header_size` to `4096` characters and your `response_header_size` to `16384`. For information about configuring NSX default header sizes, see OIDC Response Header Overflow in the Knowledge Base.
You must configure a global proxy in the Tanzu Kubernetes Grid Integrated Edition tile > Networking pane before you create any Windows workers that use the proxy.
You cannot change the proxy configuration for Windows workers in an existing cluster.
For vSphere with NSX, the HTTP Proxy password field does not support the following special characters: `&` or `;`.
Symptom
You receive the following error after modifying your existing Harbor installation’s storage configuration:
Error response from daemon: manifest for ... not found: manifest unknown: manifest unknown
Explanation
Harbor does not support modifying an existing Harbor installation’s storage configuration.
Workaround
To modify your Harbor storage configuration, re-install Harbor. Before starting Harbor, configure the new Harbor installation with the desired configuration.
Symptom
Permissions are removed from your cluster’s files and processes after resizing the persistent disk during a cluster upgrade. The ingress controller statefulset fails to start.
Explanation
When resizing a persistent disk, Bosh migrates the data from the old disk to the new disk but does not copy the files’ extended attributes.
Workaround
To resolve the problem, complete the steps in [Ingress controller statefulset fails to start after resize of worker nodes with permission denied](https://knowledge.broadcom.com/external/article/298618/) in the Broadcom Support Knowledge Base.
Symptom
You experience issues when configuring a load balancer for a multi-control plane node Kubernetes cluster or creating a service of type `LoadBalancer`. Additionally, in the Azure portal, the VM > Networking page does not display any inbound and outbound traffic rules for your cluster VMs.
Explanation
As part of configuring the Tanzu Kubernetes Grid Integrated Edition tile for Azure, you enter Default Security Group in the Kubernetes Cloud Provider pane. When you create a Kubernetes cluster, Tanzu Kubernetes Grid Integrated Edition automatically assigns this security group to each VM in the cluster. However, on Azure the automatic assignment might not occur.
As a result, your inbound and outbound traffic rules defined in the security group are not applied to the cluster VMs.
Workaround
If you experience this issue, manually assign the default security group to each VM NIC in your cluster.
Symptom
One of your plan IDs is one character longer than your other plan IDs.
Explanation
In TKGI, each plan has a unique plan ID. A plan ID is normally a UUID consisting of 32 alphanumeric characters and 4 hyphens. However, the Plan 4 ID consists of 33 alphanumeric characters and 4 hyphens.
Solution
You can safely configure and use Plan 4. The length of the Plan 4 ID does not affect the functionality of Plan 4 clusters.
If you require all plan IDs to have identical length, do not activate or use Plan 4.
Symptom
After you stop one instance in a multiple-instance database cluster, the cluster stops, or communication between the remaining databases times out, and the entire cluster becomes unreachable.
The following might be in your UAA log:
WSREP has not yet prepared node for application use
Explanation
The database cluster is unable to recover automatically because a member is no longer available to reconcile quorum.
Symptom
Backing up vSphere persistent volumes using Velero fails and your Velero backup log includes the following error:
rpc error: code = Unknown desc = Failed during IsObjectBlocked check: Could not translate selfLink to CRD name
Explanation
This is a known issue when backing up clusters on Kubernetes v1.20 and later using the Velero Plugin for vSphere v1.1.0 or earlier.
Workaround
To resolve the problem, complete the steps in Velero backups of vSphere persistent volumes fail on Kubernetes clusters version 1.20 or higher (83314) in the VMware Tanzu Knowledge Base.
Symptom
The first time that you try to create two Windows clusters at the same time, the creation of one of the clusters fails. If you run pks cluster CLUSTER-NAME
to examine the last action taken on the cluster, you see the following:
Last Action: Create
Last Action State: failed
Last Action Description: Instance provisioning failed: There was a problem completing your request. … operation: create, error-message: Failed to acquire lock … locking task id is 111, description: ‘create deployment’
Explanation
This is a known issue that occurs the first time that you create two Windows clusters concurrently.
Workaround
Recreate the failed cluster. This issue only occurs the first time that you create two Windows clusters concurrently.
Symptom
After you run `tkgi delete-cluster` and cluster deletion completes, the deleted cluster continues to be listed when you run `tkgi clusters`.
Workaround
You must manually remove the deleted cluster using a customized version of the ncp_cleanup script. For more information, see Deleting a Tanzu Kubernetes Grid Integrated Edition cluster with “tkgi delete-cluster” stuck “in progress” status in the Broadcom Support Knowledge Base.
Symptom
After you uninstall TKGI, then reinstall TKGI in the same environment, BOSH Director logs errors similar to the following:
.../gems/bosh-director-0.0.0/lib/bosh/director/deployment_plan/cloud_manifest_parser.rb:120:in `parse_vm_extensions': Duplicate vm extension name 'disk_enable_uuid' (Bosh::Director::DeploymentDuplicateVmExtensionName)
Explanation
The `pivotal-container-service` cloud-config was not removed when you uninstalled the TKGI tile, and it remained active. When you reinstalled the TKGI tile, an additional `pivotal-container-service` cloud-config was created, causing the metrics_server to fall into a crash-loop state.
Workaround
You must manually remove the `pivotal-container-service` cloud-config after removing your TKGI deployment, including after removing the TKGI tile from Ops Manager.
For more information, see “Duplicate vm extension name” error when metrics_server runs on Director VM in Tanzu Kubernetes Grid Integrated Edition in the VMware Tanzu Community Knowledge Base.
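A minimal sketch of inspecting and removing the stale cloud-config with the BOSH CLI, assuming a BOSH environment alias `MY-ENV`; confirm the exact cloud-config name reported by `bosh configs` in your environment before deleting anything:

```
# List cloud-configs and look for a stale pivotal-container-service entry.
bosh -e MY-ENV configs --type=cloud

# Delete the stale cloud-config by name (use the name reported above).
bosh -e MY-ENV delete-config --type=cloud --name=pivotal-container-service
```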
Symptom
Your TKGI logs include the following error:
'uaa'. Errors are:- Error filling in template 'uaa.yml.erb' (line 59: Client redirect-uri is invalid: uaa.clients.pks_cli.redirect-uri Client redirect-uri is invalid: uaa.clients.pks_cluster_client.redirect-uri)
Explanation
The TKGI API fully-qualified domain name (FQDN) for your cluster contains leading or trailing whitespace.
Workaround
Do not include whitespace in the TKGI tile API Hostname (FQDN) field.
The TMC Cluster Data Protection Backup fails in TKGI environments upgraded from an earlier version.
Symptom
The TMC Cluster Data Protection Backup fails to back up your existing clusters and logs the following error:
error executing custom action (groupResource=customresourcedefinitions.apiextensions.k8s.io, namespace=, name=ncpconfigs.nsx.vmware.com): rpc error: code = Unknown desc = error fetching v1beta1 version of ncpconfigs.nsx.vmware.com: the server could not find the requested resource
Explanation
Kubernetes v1.22 disallows the `spec.preserveUnknownFields: true` configuration in your existing clusters, and the creation of a v1 CustomResourceDefinition configuration fails.
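A quick, hedged check for whether an existing CRD still carries the disallowed setting; the CRD name comes from the error above, and the jsonpath expression is generic kubectl usage:

```
# Returns "true" if the CRD still sets spec.preserveUnknownFields.
kubectl get crd ncpconfigs.nsx.vmware.com \
  -o jsonpath='{.spec.preserveUnknownFields}'
```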
The TMC Cluster Data Protection Restore operation can fail when restoring multiple Antrea resources.
Symptom
The TMC Cluster Data Protection Restore fails and logs errors that requests to restore the `admission webhook` have been denied.
Explanation
Velero has encountered a race condition while operating on a resource. For more information, see Allow customizing restore order for Kubernetes controllers and their managed resources in the Velero GitHub repository.
TKGI does not support environments where there are multiple matching networks, such as a mixed CVDS/NVDS environment.
Symptom
TKGI logs errors similar to the following in an environment with multiple matching networks:
LastOperationstatus='failed', description='Instance provisioning failed:
There was a problem completing your request. Please contact your operations team providing the following information:
service: p.pks, service-instance-guid: ..., broker-request-id: ..., task-id: ..., operation: create,
error-message: Unknown CPI error 'Unknown' with message 'undefined method `mob' for <VimSdk::Vim::OpaqueNetwork:' in create_vm' CPI method
Explanation
TKGI cannot identify which of the matching networks you intend to use and has selected the wrong network.
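If you suspect duplicate network names (for example, the same portgroup name existing on both CVDS and NVDS), one hedged way to check from a workstation with `govc` configured against your vCenter is to search for networks by name; `MY-NETWORK-NAME` is a placeholder:

```
# Lists every network object whose name matches; more than one result
# means TKGI/BOSH cannot unambiguously resolve the network.
govc find / -type n -name 'MY-NETWORK-NAME'
```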
Occasionally, `tkgi update-cluster` hangs while updating a Windows worker node instance, and the BOSH task cannot finish and exits.
Symptom
The `ovsdb-server` service has stopped, but other processes report that it is running.
Explanation
The `ovsdb-server.pid` file contains the PID of a process that is not ovsdb-server.
To confirm that this is the root cause for `tkgi update-cluster` to hang:

To confirm that the `ovsdb-server` service has actually stopped, run the PowerShell `Get-Service` command on the Windows worker node.

To verify that other processes report the `ovsdb-server` service is still running:
Review the ovsdb-server `job-service-wrapper.err.log` log file. The log file is located at:

C:\var\vcap\sys\log\openvswitch-windows\ovsdb-server\job-service-wrapper.err.log
Confirm that after the flushing processes, the log includes an error similar to the following:
Pid-Guard : ovsdb-server is already runing, please stop it first
At C:\var\vcap\jobs\openvswitch-windows\bin\ovsdb-server_ctl.ps1:30 char:5
+ Pid-Guard $PIDFILE "ovsdb-server"
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ CategoryInfo : NotSpecified: ( [Write-Error], WriteErrorException
+ FullyQualifiedErrorId : Microsoft.PowerShell.Commands.WriteErrorException,Pid-Guard
To verify the root cause:
Run the following PowerShell commands on the Windows worker node:
$RUN_DIR = "C:\var\vcap\sys\run\openvswitch-windows"
$PIDFILE = "$RUN_DIR\ovsdb-server.pid"
$pid1 = Get-Content $PidFile -First 1
echo $pid1
$rst = Get-Process -Id $pid1 -ErrorAction SilentlyContinue
echo $rst
Confirm that the returned `ProcessName` is not `ovsdb-server`.

Workaround
To resolve this issue for a single Windows worker:
Run the following:
rm C:\var\vcap\sys\run\openvswitch-windows\ovsdb-server.pid
Wait for the `ovsdb-server` process to start.

If LDAP is enabled, Harbor private projects are inaccessible after upgrading to TKGI v1.13.0. For more information, see Private projects become inaccessible after upgrading Harbor for TKGI to v2.4.x with LDAP feature enabled in the Broadcom Support Knowledge Base.
Microsoft changed Microsoft Windows’ support for tar file commands in the January 2022 Microsoft Windows security patch.
Packaging scripts that use tar commands for Windows worker-based Kubernetes Cluster deployments can fail after the Microsoft tar command patch update has been applied.
The BOSH agent used by vSphere stemcells built by stembuild v2019.43 and earlier uses tar commands that are no longer supported and will fail if the Microsoft Windows security patch has been applied.
Workaround
stembuild v2019.44 and later include a version of the BOSH agent that does not use unsupported tar commands.
If you use vSphere stemcells, use stembuild 2019.44 or later to avoid the BOSH agent tar error.
TKGI supports clusters that use NSGroup Policy API resources, but Policy API NSGroups created in one NSX version will be empty after upgrading NSX to a newer version.
Workaround
BOSH reconfigures a deployment's NSGroup members when the deployment is redeployed.

After upgrading NSX, redeploy affected deployments to reconfigure their NSGroup members, as in the sketch below.
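A minimal, hedged sketch of redeploying a cluster's BOSH deployment without changing its manifest; the deployment name `service-instance_UUID` is a placeholder for the name shown by `bosh deployments`:

```
# Identify the cluster's deployment name.
bosh deployments

# Re-download the current manifest and redeploy it unchanged so BOSH
# reconfigures the deployment's NSGroup members.
bosh -d service-instance_UUID manifest > /tmp/manifest.yml
bosh -d service-instance_UUID deploy /tmp/manifest.yml
```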
This issue is fixed in TKGI v1.20.0.
After migrating from NSX Management Plane API to NSX Policy API, rotating NSX certificates sometimes fails due to a mismatch between policy display name and ID.
Symptom
Running `tkgi rotate-certificates CLUSTER --non-interactive --only-nsx` results in the following error seen in the pks-api logs:
```
Failed to retrieve certificate of display name pks-f5703ad0-1af1-402a-8f77-8a0cb52fea58
2024-06-13 14:16:21.749 ERROR 278082 --- [nio-9021-exec-8] i.p.pks.cluster.CertificateService : Unknown error occurred rotating nsx certs
```
Explanation
When TKGI first creates a cluster, it names its NSX certificates following the pattern `pks-CLUSTER-ID`, as both a display name and an internal name.

TKGI v1.14 and prior had a known issue: rotating a cluster's NSX certificates saved the new certificates under an autogenerated internal name, a GUID without a `pks-` prefix, and did not retain the cert's display name.
When you migrate a cluster to the NSX Policy API, its NSX certificate is saved as a policy object with its name set to the certificate's internal name.
The certificate rotation process retrieves certificates by their display name, so it cannot find certificates rotated in TKGI v1.14 and prior.
Workaround
See How to rotate Tanzu Kubernetes Grid Integrated Edition tls-nsx-t cluster certificate in the Broadcom Support KB.
When TKGI is deployed on NSX v3.2.3 and there are large numbers of pods with liveness probes, the pods on TKGI-provisioned clusters can enter a `NotReady` state.
Symptom
In addition to your pods being `NotReady`, if you restart NSX Manager:

Your NSX Manager logs include numerous entries similar to: `"POST /nsxapi/api/v1/firewall/sections/.../rules?operation=insert_bottom HTTP/1.1" ...`.

Your NCP logs include errors similar to:
"nsx-container-ncp" subcomp="ncp" level="ERROR" security="True" errorCode="NCP00034"] nsx_ujo.ncp.nsx.manager.firewall_service Failed to create health check rule for port ...: Service cluster: 'https://nsx-manager.example.com' is unavailable. Please, check NSX setup and/or configuration.
Description
As pods are created or deleted, DFW firewall rules are replicated for the pod's liveness probe. In NSX v3.2.3, the firewall rules are unintentionally duplicated during this replication. After numerous pod creation/deletion events, the compounded duplication creates a DFW firewall section large enough to create noticeable delays during pod operations and, eventually, a pod `NotReady` state.
Workaround
Upgrade NSX to a version that includes the fix: NSX v3.2.4, or v4.1.1 and later.
Release Date: November 02, 2023
Tanzu Kubernetes Grid Integrated Edition Management Console provides an opinionated installation of TKGI.
Note: The component versions supported by TKGI Management Console might differ from or be more limited than the versions supported by TKGI.
Element | Details | |
---|---|---|
Version | v1.18.0 | |
Release date | November 02, 2023 | |
Installed TKGI version | v1.18.0 | |
Installed Ops Manager version | v3.0.18* | Release Notes |
Component | Version | |
Installed Kubernetes version | v1.27.5* | Release Notes |
Installed Harbor Registry version | v2.9.0* | Release Notes |
Ubuntu Jammy stemcell | v1.260* | Release Notes |
* Components marked with an asterisk have been updated.
The supported upgrade paths to Tanzu Kubernetes Grid Integrated Edition Management Console v1.18.0 are from TKGI MC v1.17.1 and v1.17.0.
TKGI Management Console v1.18.0 includes the following enhancements:
Improves `tkgi delete-cluster` reliability by increasing the default TKGI Operation Timeout value from 60 seconds to 120 seconds. TKGI MC also supports configuring the TKGI Operation Timeout. For more information about configuring the TKGI Operation Timeout from the TKGI MC, see Generate Configuration File and Deploy Tanzu Kubernetes Grid Integrated Edition in Deploy Tanzu Kubernetes Grid Integrated Edition by Using the Configuration Wizard.

No TKGI features have been deprecated or removed from TKGI Management Console v1.18.
Tanzu Kubernetes Grid Integrated Edition Management Console v1.18.0 has the following known issues:
This issue is fixed in TKGI v1.18.4.
Symptom
With TKGI deployed by the Management Console (MC), after you rotate the Ops Manager CA certificate, you cannot upgrade Tanzu Kubernetes Grid Integrated Edition. The upgrade fails with errors that the MC cannot access BOSH:
```
Error GetInstanceByID: cannot get BOSH client:
[...]
Get https://10.110.93.3:25555/info: x509: certificate signed by unknown authority
```
Workaround
Immediately after you rotate the Ops Manager CA, run the Tanzu Kubernetes Grid Integrated Edition MC Configuration Wizard, step through the configuration panes, run Generate Configuration, and then run Apply Configuration.
This issue is fixed in TKGI v1.18.5.
Symptom
On TKGI deployments on which users have updated cluster IP ranges using the NSX Manager instead of TKGI network profiles, after TKGI upgrade via the Management Console configured for Automated NAT Deployment, clusters fail with network connection errors. NCP logs list NSX configuration errors `Resource could not be found` for `IpPool`.
Explanation
During TKGI upgrade, the Management Console does not check whether cluster IP Pools have been updated at the underlying NSX layer, and instead re-applies the IP pool settings as configured in TKGI. This causes an IP pool mismatch between TKGI and NSX.
Workaround
Contact Support for scripts that reallocate IP addresses to the cluster’s current floating IP pool, release unused addresses, and delete stale IP pools.
To avoid this issue, update cluster IP pools via TKGI network profiles rather than in NSX Manager.
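A hedged sketch of updating a cluster's floating IP pools through a TKGI network profile rather than directly in NSX Manager; the profile name, description, and pool UUID below are placeholders:

```
# Define a network profile that lists the floating IP pools the cluster
# should use (the UUID is a placeholder from NSX Manager).
cat > update-fip-pool.json <<'EOF'
{
  "name": "update-fip-pool",
  "description": "Update the cluster floating IP pools",
  "parameters": {
    "fip_pool_ids": [
      "11111111-2222-3333-4444-555555555555"
    ]
  }
}
EOF

# Create the profile and apply it to the cluster.
tkgi create-network-profile update-fip-pool.json
tkgi update-cluster MY-CLUSTER --network-profile update-fip-pool
```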
This issue is fixed in TKGI v1.18.3.
Symptom
With TKGI deployed by the Management Console (MC), after you restart the Ops Manager v3 VM from vSphere, you cannot log in to the MC, and the MC can no longer communicate with Ops Manager.
Workaround
First, determine if the Ops Manager VM address changed when you restarted it. If so, the following steps will restore its original IP address.
The MC deploys Ops Manager to the first IP in TKGI's Deployment CIDR range, following the Gateway address, as configured under Network Resources in the MC. You can also retrieve the Deployment CIDR from the TKGI tile in Ops Manager, under Networks > Subnets > CIDR and Gateway. If the Gateway address ends in `1` and is the first in the CIDR, the Ops Manager address ends in `2`.
If the Ops Manager IP shown in vSphere is not the first IP in the Deployment CIDR range for TKGI, its address has changed.
If the Ops Manager VM address has changed, remove its `networkd` service and restart the VM so that its networking service picks up the correct, static IP from its OVF settings:
`ssh` in to the Ops Manager VM.

Run `sudo mv /usr/lib/systemd/system/*networkd* /root/`.
Update the Ops Manager VM’s network settings to specify a network that supports DHCP, and record its previous network settings.
You might not be able to `ssh` in to the Ops Manager VM again, so if it lacks a DHCP network adapter you need some other way to access it. For example: mount its disk to another VM, inject an Ubuntu user password, and then use the vSphere GUI to power on the Ops Manager VM and log in with the new password.

Power on the Ops Manager VM. It should have an IP address assigned by DHCP.
`ssh` to the Ops Manager VM.

Run `sudo mv /usr/lib/systemd/system/*networkd* /root/`.
Symptom
The Tanzu Kubernetes Grid Integrated Edition Management Console integration to vRealize Log Insight does not support connections to the HTTPS port on the vRealize Log Insight server.
Workaround
Open `/lib/systemd/system/pks-loginsight.service` in a text editor.

Set `-e LOG_SERVER_ENABLE_SSL_VERIFY=false`.

Set `-e LOG_SERVER_USE_SSL=true`.

The resulting file should look like the following example:
ExecStart=/bin/docker run --privileged --restart=always --network=pks
-v /var/log/journal:/var/log/journal
--name=pks-loginsight
-e TYPE=gear2-vm
-e LOG_SERVER_HOST=${LOGINSIGHT_HOST}
-e LOG_SERVER_PORT=${LOGINSIGHT_PORT}
-e LOG_SERVER_ENABLE_SSL_VERIFY=false
-e LOG_SERVER_USE_SSL=true
-e LOG_SERVER_AGENT_ID=${LOGINSIGHT_ID}
pksoctopus/vrli-journald:v07092019
Save the file and run `systemctl daemon-reload`.

Run `systemctl restart pks-loginsight.service`.

Tanzu Kubernetes Grid Integrated Edition Management Console can now send logs to the HTTPS port on the vRealize Log Insight server.
Symptom
If you enable vSphere HA on a cluster, if the TKGI Management Console appliance VM is running on a host in that cluster, and if the host reboots, vSphere HA recreates a new TKGI Management Console appliance VM on another host in the cluster. Due to an issue with vSphere HA, the `ovfenv` data for the newly created appliance VM is corrupted and the new appliance VM does not boot up with the correct network configuration.
Workaround
Wait until the `Reconfigure virtual machine` task has run on the appliance VM.

Symptom
Some file arguments in Kubernetes profiles are base64 encoded. When the management console displays the Kubernetes profile, some file arguments are not decoded.
Workaround
Run echo "$content" | base64 --decode
Symptom
If you create network profiles and then try to apply them in the Create Cluster page, the new profiles are not available for selection.
Workaround
Log out of the management console and log back in again.
Symptom
In the cluster summary page, only the default IP pool, pod IP block, and node IP block values are displayed, rather than the real-time values from the associated network profile.
Workaround
None
Symptom
You receive the following error after modifying your existing Harbor installation’s storage configuration:
Error response from daemon: manifest for ... not found: manifest unknown: manifest unknown
Explanation
Harbor does not support modifying an existing Harbor installation’s storage configuration.
Workaround
To modify your Harbor storage configuration, re-install Harbor. Before starting Harbor, configure the new Harbor installation with the desired configuration.
Symptom
After upgrading Ops Manager, your Management Console does not recognize a Windows stemcell imported when using the prior version of Ops Manager.
Workaround
If your Management Console does not recognize a Windows stemcell after upgrading Ops Manager, re-import the Windows stemcell.
Symptom
After you create a cluster, Tanzu Mission Control does not include the cluster in cluster lists. You have a “Resource not found” error similar to the following in your BOSH logs:
Cluster Name in TMC: cluster-1
Cluster Name Prefix: tkgi-my-prefix-
Group Name in TMC: my-prefix-clusters
Cluster Description in TMC: VMware Enterprise PKS Attaching cluster ''tkgi-my-prefix-cluster-1'' to TMC
Fetching token successful
request POST:/v1alpha1/clusters,
response 404 Not Found:{"error":"Resource not found - clustergroup(my-prefix-clusters)
org id(d859dc9f-g622-426d-8c91-939a9f13dea9)",
"code":5,"message":"Resource not found - clustergroup(my-prefix-clusters)
Explanation
The cluster group you assign a cluster to must be defined in Tanzu Mission Control before you assign your cluster to the cluster group in the TKGI Management Console.
Workaround
To resolve the problem, complete the steps in Attaching a Tanzu Kubernetes Grid Integrated (TKGI) cluster to Tanzu Mission Control (TMC) fails with “Resource not found - clustergroup(cluster-group-name)” in the VMware Tanzu Knowledge Base.
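If you have the Tanzu Mission Control CLI configured, one hedged way to ensure the cluster group exists before attaching the cluster is to create it up front; the group name below is taken from the example log above and must match the cluster group configured in the TKGI Management Console:

```
# Create the cluster group in TMC before assigning clusters to it.
tmc clustergroup create --name my-prefix-clusters

# Confirm the group now exists.
tmc clustergroup list
```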