Release Notes

This topic contains release notes for Tanzu Kubernetes Grid Integrated Edition (TKGI) v1.17.

TKGI v1.17.6

Release Date: July 16, 2024

Product Snapshot

Release Details
Version	v1.17.6
Release date	July 16, 2024
Internal Component Versions
Antrea	v1.7.1
cAdvisor	v0.39.1
Containerd	Linux: v1.6.28 Windows: v1.6.28
CoreDNS	v1.9.3+vmware.23
CSI Driver for vSphere	v3.0.3	Release Notes
etcd	v3.5.11
Harbor	v2.10.2	Release Notes
Kubernetes	v1.26.15	Release Notes
Metrics Server	v0.6.4
NCP	v4.1.1.4	Release Notes
Percona XtraDB Cluster (PXC) (in BOSH pxc-release)	v8.0.31-23 pxc-release: v1.0.8	Release Notes: PXC pxc-release
UAA	v74.5.122*
Velero	v1.10.3	Release Notes
Wavefront	Wavefront Collector: v1.13.0 Wavefront Proxy: v12.4
Stemcell Compatibility
Ubuntu Jammy stemcells	See Retrieve Product Version Compatibilities from the Tanzu API in the Broadcom Support KB.
Windows stemcells	v2019.74* or later
Interoperability
Ops Manager	See Retrieve Product Version Compatibilities from the Tanzu API in the Broadcom Support KB.
VMware Aria Operations Management Pack for Kubernetes	v2.0	Release Notes
VMware Cloud Foundation (VCF)	v5.0, v4.5.2	Release Notes: v5.0, v4.5.2
VMware NSX**	See VMware Product Interoperability Matrices***.
vSphere	See VMware Product Interoperability Matrices***.

* Components marked with an asterisk have been updated.

** As of May 7, 2024, NSX networking and firewall components are sold separately from TKGI.

*** Migration from NSX Management Plane API to NSX Policy API requires VMware NSX v4.0.1.1 or later. NSX v4.0.1.1 supports only 50% of NSX Management Plane API scale. To use Policy API at 100% of Management Plane API scale, you require NSX v4.1.1.

Upgrade Path

The supported upgrade paths to Tanzu Kubernetes Grid Integrated Edition v1.17.6 are from TKGI v1.17.5, and TKGI v1.16.8 and earlier TKGI v1.16 patches.
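
The upgrade-path rule above can be expressed as a small pre-flight check. The following is a minimal sketch, not a TKGI tool: can_upgrade_to_1_17_6 is a hypothetical helper name, and it assumes that only the versions named above (v1.17.5, and v1.16.8 or an earlier v1.16 patch) count as supported starting points.

```shell
# Sketch only: encodes the v1.17.6 upgrade-path rule stated in these notes.
# can_upgrade_to_1_17_6 is a hypothetical helper, not a TKGI command.
can_upgrade_to_1_17_6() {
  version="${1#v}"            # accept "v1.17.5" or "1.17.5"
  minor_line="${version%.*}"  # "1.17.5" -> "1.17"
  patch="${version##*.}"      # "1.17.5" -> "5"
  case "$patch" in
    ''|*[!0-9]*) return 1 ;;  # reject missing or non-numeric patch levels
  esac
  case "$minor_line" in
    1.17) [ "$patch" -eq 5 ] ;;  # only v1.17.5 is listed for the v1.17 line
    1.16) [ "$patch" -le 8 ] ;;  # v1.16.8 and earlier v1.16 patches
    *)    return 1 ;;            # any other release line is not listed
  esac
}
```

For example, can_upgrade_to_1_17_6 v1.16.8 succeeds, while can_upgrade_to_1_17_6 v1.17.4 returns a non-zero status.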

Breaking Changes

As of the TKGI v1.17.6 release date, Broadcom Support organizes TKGI downloads as follows:

Tanzu Kubernetes Grid Integrated Edition (TKGi) - CLI & Tile: TKGI Ops Manager Tile, TKGI CLI and compatible Kubectl versions, TKGI OSL.
Tanzu Kubernetes Grid Integrated Edition (TKGi) - Mgmt Console: TKGI Management Console and Velero backup tools.

Features and Enhancements

TKGI v1.17.6 does not include any new features.

Resolved Issues

TKGI v1.17.6 resolves the following issues:

Fixes issue: TKGI cluster creation with NSX fails with “no available capacity on edge node”.
Fixes issue: Upgrading a cluster with the CLI loses tags.

Known Issues

Except where noted, the known issues in Tanzu Kubernetes Grid Integrated Edition v1.17.5 are also in Tanzu Kubernetes Grid Integrated Edition v1.17.6. For more information, see TKGI v1.17.5 Known Issues below.

TKGI v1.17.6 does not include any additional known issues.

TKGI Management Console v1.17.6

Release Date: July 16, 2024

Tanzu Kubernetes Grid Integrated Edition Management Console provides an opinionated installation of TKGI.

Product Snapshot

Note: The component versions supported by TKGI Management Console might differ from or be more limited than the versions supported by TKGI.

Element	Details
Version	v1.17.6
Release date	July 16, 2024
Installed TKGI version	v1.17.6
Installed Ops Manager version	v3.0.30*	Release Notes
Component	Version
Installed Kubernetes version	v1.26.15	Release Notes
Installed Harbor Registry version	v2.10.2	Release Notes
Ubuntu Jammy stemcell	v1.390	See Retrieve Product Version Compatibilities from the Tanzu API in the Broadcom Support KB.

* Components marked with an asterisk have been updated.

Upgrade Path

The supported upgrade paths to Tanzu Kubernetes Grid Integrated Edition Management Console v1.17.6 are from TKGI MC v1.17.5, and TKGI MC v1.16.8 and earlier TKGI v1.16 patches.

Features and Resolved Issues

TKGI MC v1.17.6 resolves the following issues:

Fixes issue: Cannot upgrade after rotating Ops Manager CA.
Fixes issue: Wrong cluster Floating IP pools after TKGI upgrade with Management Console.
Updates Photon OS to v4.0 to address CVE-2024-6387.

TKGI Management Console v1.17.6 does not include any new features.

Known Issues

Except where noted, the known issues in Tanzu Kubernetes Grid Integrated Edition Management Console v1.17.5 are also in Tanzu Kubernetes Grid Integrated Edition Management Console v1.17.6. For more information, see TKGI v1.17.5 Known Issues below.

TKGI MC v1.17.6 does not include any additional known issues.

TKGI v1.17.5

Release Date: May 9, 2024

Product Snapshot

Release Details
Version	v1.17.5
Release date	May 9, 2024
Internal Component Versions
Antrea	v1.7.1
cAdvisor	v0.39.1
Containerd	Linux: v1.6.28 Windows: v1.6.28
CoreDNS	v1.9.3+vmware.23*
CSI Driver for vSphere	v3.0.3	Release Notes
etcd	v3.5.11*
Harbor	v2.10.2*	Release Notes
Kubernetes	v1.26.15*	Release Notes
Metrics Server	v0.6.4
NCP	v4.1.1.4	Release Notes
Percona XtraDB Cluster (PXC) (in BOSH pxc-release)	v8.0.31-23 pxc-release: v1.0.8	Release Notes: PXC pxc-release
UAA	v74.5.111*
Velero	v1.10.3	Release Notes
Wavefront	Wavefront Collector: v1.13.0 Wavefront Proxy: v12.4
Stemcell Compatibility
Ubuntu Jammy stemcells	See Retrieve Product Version Compatibilities from the Tanzu API in the Broadcom Support KB.
Windows stemcells	v2019.72* or later
Interoperability
Ops Manager	See Retrieve Product Version Compatibilities from the Tanzu API in the Broadcom Support KB.
VMware Aria Operations Management Pack for Kubernetes	v2.0	Release Notes
VMware Cloud Foundation (VCF)	v5.0, v4.5.2	Release Notes: v5.0, v4.5.2
VMware NSX**	See VMware Product Interoperability Matrices***.
vSphere	See VMware Product Interoperability Matrices***.

* Components marked with an asterisk have been updated.

** As of May 7, 2024, NSX networking and firewall components are sold separately from TKGI.

*** Migration from NSX Management Plane API to NSX Policy API requires VMware NSX v4.0.1.1 or later. NSX v4.0.1.1 supports only 50% of NSX Management Plane API scale. To use Policy API at 100% of Management Plane API scale, you require NSX v4.1.1.

Upgrade Path

The supported upgrade paths to Tanzu Kubernetes Grid Integrated Edition v1.17.5 are from TKGI v1.17.4, and TKGI v1.16.8 and earlier TKGI v1.16 patches.

Breaking Changes

As of the TKGI v1.17.5 release date, you can find Management Console downloads at Broadcom Support (login required) instead of VMware Customer Connect. The TKGI CLI continues to be distributed on VMware Tanzu Network until May 31, 2024.

The TKGI downloads page on Broadcom Support organizes TKGI downloads as follows:

Workload Backup and Recovery: Some of the Velero backup tools for TKGI.
VMware Tanzu Kubernetes Grid Integrated Edition: The TKGI OSL and Ops Manager tile.
TKG Integrated Edition: The TKGI Management Console and other Velero backup tools for TKGI.

Features and Enhancements

Cluster update retry setting: In the Ops Manager TKGI tile, the TKGI API > Automatic retry on cluster update operations failure option sets TKGI to retry a failed tkgi update-cluster process up to three times, which improves resilience.
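
These notes do not describe how TKGI implements the retry. As a rough illustration of retry-up-to-three semantics, the following sketch wraps an arbitrary command. retry_update and MAX_RETRIES are hypothetical names, and reading "up to three times" as three retries after the first failed attempt is an assumption.

```shell
# Sketch only: retry_update is a hypothetical helper, not part of the TKGI CLI.
# It runs the given command and, on failure, retries it up to MAX_RETRIES times.
MAX_RETRIES=3

retry_update() {
  attempt=0
  while :; do
    # Run the wrapped command; stop as soon as it succeeds.
    if "$@"; then
      return 0
    fi
    # Give up once the retry budget is spent.
    if [ "$attempt" -ge "$MAX_RETRIES" ]; then
      echo "giving up after $attempt retries" >&2
      return 1
    fi
    attempt=$((attempt + 1))
    echo "attempt failed; starting retry $attempt of $MAX_RETRIES" >&2
  done
}
```

The helper takes the command to run as its arguments, so a script could use it to wrap any update command that is safe to repeat.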

Resolved Issues

TKGI v1.17.5 resolves the following issues:

Fixes Wrong Node Scale after Updating Cluster with New Compute Profile.
Adds the Strict-Transport-Security header to the nginx HTTPS server configuration on the Management Console VM to improve security scan results.
Fixes Ephemeral storage overrun with temporary logfiles from Fluentd.
Fixes cluster creation error when service-cluster-ip-range in a custom Kubernetes profile is set to a smaller range than the Networking > Kubernetes Service Network CIDR Range setting in the TKGI Ops Manager tile.
Fixes Error creating NSX-T cluster network... Invalid transport path error in NSX Policy API mode.
SSH configuration improvements address CVE-2023-48795.

Known Issues

Except where noted, the known issues in Tanzu Kubernetes Grid Integrated Edition v1.17.4 are also in Tanzu Kubernetes Grid Integrated Edition v1.17.5. For more information, see TKGI v1.17.4 Known Issues below.

TKGI v1.17.5 does not include any additional known issues.

TKGI Management Console v1.17.5

Release Date: May 9, 2024

Tanzu Kubernetes Grid Integrated Edition Management Console provides an opinionated installation of TKGI.

Product Snapshot

Note: The component versions supported by TKGI Management Console might differ from or be more limited than the versions supported by TKGI.

Element	Details
Version	v1.17.5
Release date	May 9, 2024
Installed TKGI version	v1.17.5
Installed Ops Manager version	v3.0.25*	Release Notes
Component	Version
Installed Kubernetes version	v1.26.15*	Release Notes
Installed Harbor Registry version	v2.10.2	Release Notes
Ubuntu Jammy stemcell	v1.390	Release Notes

* Components marked with an asterisk have been updated.

Upgrade Path

The supported upgrade paths to Tanzu Kubernetes Grid Integrated Edition Management Console v1.17.5 are from TKGI MC v1.17.4, and TKGI MC v1.16.8 and earlier TKGI v1.16 patches.

Features and Resolved Issues

TKGI MC v1.17.5 resolves the following issues:

Fixes issue: Updating cluster compute profile loses node drain and shutdown settings.
Fixes issue where the LDAP password is not hidden in the MC-generated TKGI configuration file.
Fixes network configuration error “network configuration | Failed | failed to create overlay segment … Invalid transport zone path” during cluster deployment after migrating NSX to Policy mode, as described in NSX Management Plane API to NSX Policy API.

TKGI Management Console v1.17.5 does not include any new features.

Known Issues

Except where noted, the known issues in Tanzu Kubernetes Grid Integrated Edition Management Console v1.17.4 are also in Tanzu Kubernetes Grid Integrated Edition Management Console v1.17.5. For more information, see TKGI v1.17.4 Known Issues below.

TKGI MC v1.17.5 does not include any additional known issues.

TKGI v1.17.4

Release Date: March 14, 2024

Product Snapshot

Release Details
Version	v1.17.4
Release date	March 14, 2024
Internal Component Versions
Antrea	v1.7.1
cAdvisor	v0.39.1
Containerd	Linux: v1.6.28* Windows: v1.6.28*
CoreDNS	v1.9.3+vmware.22*
CSI Driver for vSphere	v3.0.3	Release Notes
etcd	v3.5.9
Harbor	v2.10.0	Release Notes
Kubernetes	v1.26.14*	Release Notes
Metrics Server	v0.6.4
NCP	v4.1.1.4*	Release Notes
Percona XtraDB Cluster (PXC) (in BOSH pxc-release)	v8.0.31-23 pxc-release: v1.0.8	Release Notes: PXC pxc-release
UAA	v74.5.105*
Velero	v1.10.3	Release Notes
Wavefront	Wavefront Collector: v1.13.0 Wavefront Proxy: v12.4
Stemcell Compatibility
Ubuntu Jammy stemcells	See Retrieve Product Version Compatibilities from the Tanzu API in the Broadcom Support KB.
Windows stemcells	v2019.69* or later
Interoperability
Ops Manager	See Retrieve Product Version Compatibilities from the Tanzu API in the Broadcom Support KB.
VMware Aria Operations Management Pack for Kubernetes	v2.0	Release Notes
VMware Cloud Foundation (VCF)**	v5.0, v4.5.2	Release Notes: v5.0, v4.5.2
VMware NSX	See VMware Product Interoperability Matrices***.
vSphere	See VMware Product Interoperability Matrices***.

* Components marked with an asterisk have been updated.

** VCF v5.0 and VCF v4.5.2 are supported but have not been tested with TKGI v1.17.

*** Migration from NSX Management Plane API to NSX Policy API requires VMware NSX v4.0.1.1 or later. NSX v4.0.1.1 supports only 50% of NSX Management Plane API scale. To use Policy API at 100% of Management Plane API scale, you require NSX v4.1.1.

Upgrade Path

The supported upgrade paths to Tanzu Kubernetes Grid Integrated Edition v1.17.4 are from TKGI v1.17.3, and TKGI v1.16.7 and earlier TKGI v1.16 patches.

Breaking Changes

TKGI v1.17.4 does not include any new breaking changes.

Features and Enhancements

TKGI v1.17.4 does not include any new features.

Resolved Issues

TKGI v1.17.4 resolves the following issues:

Containerd update to v1.6.28 fixes issue High-Severity CVE-2024-21626 in runc 1.1.11 and earlier.
Fixes security issue of Telegraf host-monitoring outputs exposing node OS release info.
Fixes issue of snapshot validation not getting correct CSI service account.
NCP update to v4.1.1.4 fixes issues:
- The parameter cookie_name is configurable in the network profile when ingress persistence_type is set as cookie.
- Issue 3332908: Failure of load balancer during an edit of VS instance in TKGI.

Known Issues

Except where noted, the known issues in Tanzu Kubernetes Grid Integrated Edition v1.17.3 are also in Tanzu Kubernetes Grid Integrated Edition v1.17.4. For more information, see TKGI v1.17.3 Known Issues below.

TKGI v1.17.4 does not include any additional known issues.

TKGI Management Console v1.17.4

Release Date: March 14, 2024

Tanzu Kubernetes Grid Integrated Edition Management Console provides an opinionated installation of TKGI.

Product Snapshot

Note: The component versions supported by TKGI Management Console might differ from or be more limited than the versions supported by TKGI.

Element	Details
Version	v1.17.4
Release date	March 14, 2024
Installed TKGI version	v1.17.4
Installed Ops Manager version	v3.0.24*	Release Notes
Component	Version
Installed Kubernetes version	v1.26.14*	Release Notes
Installed Harbor Registry version	v2.10.0	Release Notes
Ubuntu Jammy stemcell	v1.390*	Release Notes

* Components marked with an asterisk have been updated.

Upgrade Path

The supported upgrade paths to Tanzu Kubernetes Grid Integrated Edition Management Console v1.17.4 are from TKGI MC v1.17.3, and TKGI MC v1.16.7 and earlier TKGI v1.16 patches.

Features and Resolved Issues

TKGI MC v1.17.4 resolves the following issue:

Fixes issue Cannot log in to the TKGI MC after restarting Ops Manager v3 VM from vSphere.

TKGI Management Console v1.17.4 does not include any new features.

Known Issues

Except where noted, the known issues in Tanzu Kubernetes Grid Integrated Edition Management Console v1.17.3 are also in Tanzu Kubernetes Grid Integrated Edition Management Console v1.17.4. For more information, see TKGI v1.17.3 Known Issues below.

TKGI MC v1.17.4 does not include any additional known issues.

TKGI v1.17.3

Release Date: January 31, 2024

Product Snapshot

Release Details
Version	v1.17.3
Release date	January 31, 2024
Internal Component Versions
Antrea	v1.7.1
cAdvisor	v0.39.1
Containerd	Linux: v1.6.24 Windows: v1.6.24
CoreDNS	v1.9.3+vmware.20*
CSI Driver for vSphere	v3.0.3	Release Notes
etcd	v3.5.9
Harbor	v2.8.4	Release Notes
Kubernetes	v1.26.12*	Release Notes
Metrics Server	v0.6.4
NCP	v4.1.1.3*	Release Notes
Percona XtraDB Cluster (PXC) (in BOSH pxc-release)	v8.0.31-23 pxc-release: v1.0.8	Release Notes: PXC pxc-release
UAA	v74.5.99*
Velero	v1.10.3	Release Notes
Wavefront	Wavefront Collector: v1.13.0 Wavefront Proxy: v12.4
Stemcell Compatibility
Ubuntu Jammy stemcells	See Retrieve Product Version Compatibilities from the Tanzu API in the Broadcom Support KB.
Windows stemcells	v2019.65* or later
Interoperability
Ops Manager	See Retrieve Product Version Compatibilities from the Tanzu API in the Broadcom Support KB.
VMware Aria Operations Management Pack for Kubernetes	v2.0	Release Notes
VMware Cloud Foundation (VCF)	v5.0, v4.5.2	Release Notes: v5.0, v4.5.2
VMware NSX	See VMware Product Interoperability Matrices***.
vSphere	See VMware Product Interoperability Matrices***.

* Components marked with an asterisk have been updated.

*** Migration from NSX Management Plane API to NSX Policy API requires VMware NSX v4.0.1.1 or later. NSX v4.0.1.1 supports only 50% of NSX Management Plane API scale. To use Policy API at 100% of Management Plane API scale, you require NSX v4.1.1.

Upgrade Path

The supported upgrade paths to Tanzu Kubernetes Grid Integrated Edition v1.17.3 are from TKGI v1.17.2, and TKGI v1.16.6 and earlier TKGI v1.16 patches.

Breaking Changes

TKGI v1.17.3 does not include any new breaking changes.

Features and Enhancements

TKGI v1.17.3 includes the following enhancement:

Prevents broker APIs from including base64-encoded PNG images in broker service metadata, to save syslog space.

Resolved Issues

TKGI v1.17.3 resolves the following issues:

Fixes vSphere CSI Failure When Backslash in User Name.
Fixes Error Scaling Clusters when Compute Profile Lacks Node Pool.
Fixes issue: Updating Kubernetes clusters loses ClusterMetricSink configuration details, preventing Prometheus from contacting Telegraf pods.
NCP update to v4.1.1.3 fixes bugs:
- TKGI apps cannot access NSX load balancer.
- Issue 3310860: UDP connections with the same port may fail when endpoints of clusterIP Service restart. For more information, see the NCP 4.1.1.3 Release Notes.

Known Issues

Except where noted, the known issues in Tanzu Kubernetes Grid Integrated Edition v1.17.2 are also in Tanzu Kubernetes Grid Integrated Edition v1.17.3. For more information, see TKGI v1.17.2 Known Issues below.

TKGI v1.17.3 does not include any additional known issues.

Important: To address CVE-2024-21626 by patching TKGI with a runc upgrade, see High-Severity CVE-2024-21626 in runc 1.1.11 and earlier below.

TKGI Management Console v1.17.3

Release Date: January 31, 2024

Tanzu Kubernetes Grid Integrated Edition Management Console provides an opinionated installation of TKGI.

Product Snapshot

Note: The component versions supported by TKGI Management Console might differ from or be more limited than the versions supported by TKGI.

Element	Details
Version	v1.17.3
Release date	January 31, 2024
Installed TKGI version	v1.17.3
Installed Ops Manager version	v3.0.23*	Release Notes
Component	Version
Installed Kubernetes version	v1.26.12*	Release Notes
Installed Harbor Registry version	v2.8.4	Release Notes
Ubuntu Jammy stemcell	v1.340*	Release Notes

* Components marked with an asterisk have been updated.

Upgrade Path

The supported upgrade paths to Tanzu Kubernetes Grid Integrated Edition Management Console v1.17.3 are from TKGI MC v1.17.2, and TKGI MC v1.16.6 and earlier TKGI v1.16 patches.

Features and Resolved Issues

TKGI MC v1.17.3 resolves the following issue:

Fixes Cannot SSH in to Ops Manager Using Displayed Key.

TKGI Management Console v1.17.3 does not include any new features.

Known Issues

Except where noted, the known issues in Tanzu Kubernetes Grid Integrated Edition Management Console v1.17.2 are also in Tanzu Kubernetes Grid Integrated Edition Management Console v1.17.3. For more information, see TKGI v1.17.2 Known Issues below.

TKGI MC v1.17.3 does not include any additional known issues.

TKGI v1.17.2

Release Date: November 16, 2023

Product Snapshot

Release Details
Version	v1.17.2
Release date	November 16, 2023
Internal Component Versions
Antrea	v1.7.1*
cAdvisor	v0.39.1
Containerd	Linux: v1.6.24* Windows: v1.6.24*
CoreDNS	v1.9.3+vmware.18*
CSI Driver for vSphere	v3.0.3*	Release Notes
etcd	v3.5.9
Harbor	v2.8.4*	Release Notes
Kubernetes	v1.26.10*	Release Notes
Metrics Server	v0.6.4
NCP	v4.1.1.1	Release Notes
Percona XtraDB Cluster (PXC) (in BOSH pxc-release)	v8.0.31-23 pxc-release: v1.0.8	Release Notes: PXC pxc-release
UAA	v74.5.92*
Velero	v1.10.3	Release Notes
Wavefront	Wavefront Collector: v1.13.0 Wavefront Proxy: v12.4
Stemcell Compatibility
Ubuntu Jammy stemcells	See Retrieve Product Version Compatibilities from the Tanzu API in the Broadcom Support KB.
Windows stemcells	v2019.65* or later
Interoperability
Ops Manager	See Retrieve Product Version Compatibilities from the Tanzu API in the Broadcom Support KB.
VMware Aria Operations Management Pack for Kubernetes	v2.0	Release Notes
VMware Cloud Foundation (VCF)	v5.0, v4.5.2	Release Notes: v5.0, v4.5.2
VMware NSX	See VMware Product Interoperability Matrices***.
vSphere	See VMware Product Interoperability Matrices***.

* Components marked with an asterisk have been updated.

*** Migration from NSX Management Plane API to NSX Policy API requires VMware NSX v4.0.1.1 or later. NSX v4.0.1.1 supports only 50% of NSX Management Plane API scale. To use Policy API at 100% of Management Plane API scale, you require NSX v4.1.1.

Upgrade Path

The supported upgrade paths to Tanzu Kubernetes Grid Integrated Edition v1.17.2 are from TKGI v1.17.1, and TKGI v1.16.5 and earlier TKGI v1.16 patches.

Breaking Changes

TKGI v1.17.2 does not include any new breaking changes.

Features and Enhancements

TKGI v1.17.2 does not include any new features or enhancements.

Resolved Issues

TKGI v1.17.2 resolves the following issues:

Component bumps fix the following:
- Upgrades Antrea to include downstream v1.11.3+vmware.2 enhancements: Fixes Antrea agent pods restart issue when FQDN-based rules or network policy logging is used.
- Upgrades CSI Driver for vSphere to v3.0.3: Fixes Prevent node cache update during attach and detach.
Fixes TKGI Does Not Support the Antrea Egress Feature on AWS.

Note: You must grant the AWS Worker Instance Profile additional AWS Identity and Access Management (IAM) permissions before using the Antrea Egress feature with worker nodes on AWS. For more information, see Prepare AWS Worker Instance Profile Permissions in General Troubleshooting.

Fixes Cluster Update Operations Fail Due to Duplicate Tag Keys.
Fixes Node Drain Operation Ignores the TKGI Deployment Plan Settings.

Known Issues

Except where noted, the known issues in Tanzu Kubernetes Grid Integrated Edition v1.17.1 are also in Tanzu Kubernetes Grid Integrated Edition v1.17.2. For more information, see TKGI v1.17.1 Known Issues below.

TKGI v1.17.2 does not include any additional known issues.

TKGI Management Console v1.17.2

Release Date: November 16, 2023

Tanzu Kubernetes Grid Integrated Edition Management Console provides an opinionated installation of TKGI.

Product Snapshot

Note: The component versions supported by TKGI Management Console might differ from or be more limited than the versions supported by TKGI.

Element	Details
Version	v1.17.2
Release date	November 16, 2023
Installed TKGI version	v1.17.2
Installed Ops Manager version	v3.0.18*	Release Notes
Component	Version
Installed Kubernetes version	v1.26.10*	Release Notes
Installed Harbor Registry version	v2.8.4*	Release Notes
Ubuntu Jammy stemcell	v1.289*	Release Notes

* Components marked with an asterisk have been updated.

Upgrade Path

The supported upgrade paths to Tanzu Kubernetes Grid Integrated Edition Management Console v1.17.2 are from TKGI MC v1.17.1, and TKGI MC v1.16.5 and earlier TKGI v1.16 patches.

Features and Resolved Issues

TKGI Management Console v1.17.2 does not include any new features or resolved issues.

Known Issues

Except where noted, the known issues in Tanzu Kubernetes Grid Integrated Edition Management Console v1.17.1 are also in Tanzu Kubernetes Grid Integrated Edition Management Console v1.17.2. For more information, see TKGI v1.17.1 Known Issues below.

TKGI MC v1.17.2 has the following additional known issue.

Cannot SSH in to Ops Manager Using Displayed Key

This issue is fixed in TKGI v1.17.3.

Symptom

After upgrading to TKGI v1.17.2, you cannot use the key displayed in the Management Console > Deployment Metadata tab to SSH in to the Ops Manager VM as described in Connect to Operations Manager with SSH.

Explanation

The upgrade process generates a new SSH private key but does not update its value as shown in the Management Console > Deployment Metadata tab. After upgrade, the displayed key no longer matches any public key in the Ops Manager VM’s /home/ubuntu/.ssh/authorized_keys.

Workaround

Retrieve and use an up-to-date SSH private key from a configuration file on the TKGI Management Console VM:

SSH in to the Management Console VM as root:
```
ssh root@MC-IP-ADDRESS
```
Where MC-IP-ADDRESS is the IP address that you use to access the Management Console with a browser as described in Step 2: Log In to TKGI Management Console.
Retrieve the private key from the file /etc/vmware/.pks/om_root_ca.
SSH in to the Ops Manager VM by passing the private key to the -i flag of ssh. For example, to connect to the Ops Manager VM directly from the MC VM:
```
ssh -i /etc/vmware/.pks/om_root_ca ubuntu@OPS-MANAGER-IP
```
You can also copy the private key to your local workstation and SSH in to the Ops Manager VM from there.

TKGI v1.17.1

Release Date: September 14, 2023

Product Snapshot

Release Details
Version	v1.17.1
Release date	September 14, 2023
Internal Component Versions
Antrea	v1.7.0	Release Notes
cAdvisor	v0.39.1
Containerd	Linux: v1.6.18 Windows: v1.6.18
CoreDNS	v1.9.3+vmware.16*
CSI Driver for vSphere	v3.0.2	Release Notes
etcd	v3.5.9
Harbor	v2.8.2	Release Notes
Kubernetes	v1.26.8*	Release Notes
Metrics Server	v0.6.4*
NCP	v4.1.1.1*	Release Notes
Percona XtraDB Cluster (PXC) (in BOSH pxc-release)	v8.0.31-23 pxc-release: v1.0.8	Release Notes: PXC pxc-release
UAA	v74.5.85*
Velero	v1.10.3	Release Notes
Wavefront	Wavefront Collector: v1.13.0 Wavefront Proxy: v12.4
Stemcell Compatibility
Ubuntu Jammy stemcells	See Retrieve Product Version Compatibilities from the Tanzu API in the Broadcom Support KB.
Windows stemcells	v2019.61 or later
Interoperability
Ops Manager	See Retrieve Product Version Compatibilities from the Tanzu API in the Broadcom Support KB.
VMware Aria Operations Management Pack for Kubernetes	v2.0	Release Notes
VMware Cloud Foundation (VCF)	v5.0, v4.5.2	Release Notes: v5.0, v4.5.2
VMware NSX	See VMware Product Interoperability Matrices***.
vSphere	See VMware Product Interoperability Matrices***.

* Components marked with an asterisk have been updated.

*** Migration from NSX Management Plane API to NSX Policy API requires VMware NSX v4.0.1.1 or later. NSX v4.0.1.1 supports only 50% of NSX Management Plane API scale. To use Policy API at 100% of Management Plane API scale, you require NSX v4.1.1.

Upgrade Path

The supported upgrade paths to Tanzu Kubernetes Grid Integrated Edition v1.17.1 are from TKGI v1.17.0, and TKGI v1.16.3 and earlier TKGI v1.16 patches.

Breaking Changes

TKGI v1.17.1 does not include any new breaking changes.

Features and Enhancements

TKGI v1.17.1 does not include any new features or enhancements.

Resolved Issues

TKGI v1.17.1 resolves the following issues:

[Security Fix] Component bumps fix the following:
- Upgrades Kubernetes to v1.26.8:
  - Fixes CVE-2023-2728.
- Upgrades Metrics Server to v0.6.4:
  - Fixes CVE-2022-28948 and CVE-2022-41721.
- Fixes CVE-2023-3676 and CVE-2023-3955.
Upgrades NCP to v4.1.1.1:
- Fixes FailedCreatePodSandBox error during cluster creation.
Fixes Cluster Might Fail to Send the cluster_name Tag to Logging after Cluster Upgrade.

Known Issues

Except where noted, the known issues in Tanzu Kubernetes Grid Integrated Edition v1.17.0 are also in Tanzu Kubernetes Grid Integrated Edition v1.17.1. For more information, see TKGI v1.17.0 Known Issues below.

TKGI v1.17.1 does not include any additional known issues.

TKGI Management Console v1.17.1

Release Date: September 14, 2023

Tanzu Kubernetes Grid Integrated Edition Management Console provides an opinionated installation of TKGI.

Product Snapshot

Note: The component versions supported by TKGI Management Console might differ from or be more limited than the versions supported by TKGI.

Element	Details
Version	v1.17.1
Release date	September 14, 2023
Installed TKGI version	v1.17.1
Installed Ops Manager version	v3.0.14*	Release Notes
Component	Version
Installed Kubernetes version	v1.26.8*	Release Notes
Installed Harbor Registry version	v2.8.2	Release Notes
Ubuntu Jammy stemcell	v1.207*	Release Notes

* Components marked with an asterisk have been updated.

Upgrade Path

The supported upgrade paths to Tanzu Kubernetes Grid Integrated Edition Management Console v1.17.1 are from TKGI MC v1.17.0, and TKGI MC v1.16.3 and earlier TKGI v1.16 patches.

Features and Resolved Issues

TKGI Management Console v1.17.1 includes the following enhancement:

NSX Only: Enhances tkgi delete-cluster reliability by increasing the default TKGI Operation Timeout value from 60 seconds to 120 seconds. TKGI MC v1.17.1 also supports configuring the TKGI Operation Timeout. For more information about configuring the TKGI Operation Timeout from the TKGI MC, see Generate Configuration File and Deploy Tanzu Kubernetes Grid Integrated Edition in Deploy Tanzu Kubernetes Grid Integrated Edition by Using the Configuration Wizard.

Known Issues

Except where noted, the known issues in Tanzu Kubernetes Grid Integrated Edition Management Console v1.17.0 are also in Tanzu Kubernetes Grid Integrated Edition Management Console v1.17.1. For more information, see TKGI v1.17.0 Known Issues below.

TKGI MC v1.17.1 does not include any additional known issues.

TKGI v1.17.0

Release Date: August 3, 2023

Product Snapshot

Release Details
Version	v1.17.0
Release date	August 3, 2023
Internal Component Versions
Antrea	v1.7.0*	Release Notes
cAdvisor	v0.39.1
Containerd	Linux: v1.6.18* Windows: v1.6.18*
CoreDNS	v1.9.3+vmware.11*
CSI Driver for vSphere	v3.0.2*	Release Notes
etcd	v3.5.9*
Harbor	v2.8.2*	Release Notes
Kubernetes	v1.26.5*	Release Notes
Metrics Server	v0.6.1
NCP	v4.1.1.0*
Percona XtraDB Cluster (PXC) (in BOSH pxc-release)	v8.0.31-23* pxc-release: v1.0.8*	Release Notes: PXC pxc-release
UAA	v74.5.81*
Velero	v1.10.3*	Release Notes
Wavefront	Wavefront Collector: v1.13.0* Wavefront Proxy: v12.4*
Stemcell Compatibility
Ubuntu Jammy stemcells	See Retrieve Product Version Compatibilities from the Tanzu API in the Broadcom Support KB.
Windows stemcells	v2019.61* or later
Interoperability
Ops Manager	See Retrieve Product Version Compatibilities from the Tanzu API in the Broadcom Support KB.
VMware Aria Operations Management Pack for Kubernetes	v2.0*	Release Notes
VMware Cloud Foundation (VCF)	v5.0, v4.5.2	Release Notes: v5.0, v4.5.2
VMware NSX	See VMware Product Interoperability Matrices***.
vSphere	See VMware Product Interoperability Matrices***.

* Components marked with an asterisk have been updated.

*** Migration from NSX Management Plane API to NSX Policy API requires VMware NSX v4.0.1.1 or later. NSX v4.0.1.1 supports only 50% of NSX Management Plane API scale. To use Policy API at 100% of Management Plane API scale, you require NSX v4.1.1.

Upgrade Path

The supported upgrade paths to Tanzu Kubernetes Grid Integrated Edition v1.17.0 are from TKGI v1.16.2 and earlier TKGI v1.16 patches.

Breaking Changes

TKGI v1.17.0 has the following breaking changes:

Removals in Kubernetes v1.26: The following APIs are removed in Kubernetes v1.26:
- v1alpha2 CRI API
- v1beta1 flow control API group
- v2beta2 HorizontalPodAutoscaler API
For information on other removals in Kubernetes v1.26, see Kubernetes Removals, Deprecations, and Major Changes in 1.26 in Kubernetes Blog.
In-Tree vSphere Storage Volume Support: In-Tree vSphere Storage volume support has been entirely removed. The TKGI v1.17 upgrade automatically migrates TKGI clusters from in-tree vSphere storage to the CSI Driver for vSphere. VMware strongly recommends that you migrate your in-tree vSphere storage volumes to vSphere CSI volumes before upgrading to TKGI v1.17. For information on how to manually migrate In-Tree vSphere Storage volumes on existing TKGI clusters from In-Tree vSphere Storage to the automatically installed vSphere CSI Driver, see Migrate an In-Tree vSphere Storage Volume to the vSphere CSI Driver in Deploying and Managing Cloud Native Storage (CNS) on vSphere.

Warning: If you have TKGI-provisioned Windows worker clusters, do not activate the Upgrade all clusters errand before upgrading to the TKGI v1.17 tile. You cannot use the Upgrade all clusters errand because you must manually migrate each individual Windows worker cluster to the CSI Driver for vSphere. For more information, see Configure vSphere CSI for Windows in Deploying and Managing Cloud Native Storage (CNS) on vSphere.
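
Before upgrading, it can help to check workload manifests for the API versions removed in Kubernetes v1.26. The sketch below searches a directory of YAML manifests for the two removed versions that appear in apiVersion fields: autoscaling/v2beta2 (HorizontalPodAutoscaler) and flowcontrol.apiserver.k8s.io/v1beta1 (flow control). find_removed_apis is a hypothetical helper name, and the search is a plain text match that assumes unquoted apiVersion values. The v1alpha2 CRI API is not covered because it is a container runtime interface rather than a manifest apiVersion.

```shell
# Sketch only: find_removed_apis is a hypothetical helper, not a TKGI or kubectl command.
# It prints "file:line:text" for each apiVersion line that names an API version
# removed in Kubernetes v1.26. Like grep, it returns non-zero when nothing matches.
find_removed_apis() {
  dir="${1:-.}"   # directory to search; defaults to the current directory
  grep -rnE \
    'apiVersion:[[:space:]]*(autoscaling/v2beta2|flowcontrol\.apiserver\.k8s\.io/v1beta1)[[:space:]]*$' \
    "$dir"
}
```

A HorizontalPodAutoscaler manifest that matches can move to autoscaling/v2. A flow control manifest that matches needs a flow control API version that Kubernetes v1.26 still serves.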

Features and Enhancements

TKGI v1.17.0 includes the following features:

Cluster Upgrade Enhancements

The tkgi upgrade-cluster CLI command includes the following enhancements:

Supports upgrading a cluster’s worker nodes in parallel. For more information, see Upgrade Worker Nodes in Parallel in Upgrading Clusters.
Supports automated cluster validation before upgrading a cluster. For more information, see Upgrade Cluster Validation in Upgrading Clusters.

Note: These enhancements apply to tkgi upgrade-cluster only. For example, when using tkgi upgrade-clusters, the clusters can be upgraded in parallel, but the worker nodes within an upgrading cluster are always upgraded serially.

Compute Profile Enhancements

TKGI v1.17.0 includes the following compute profile enhancements:

Supports configuring compute profiles with a node pool description. For more information, see node_pools Block in Creating and Managing Compute Profiles with the CLI (vSphere).
Supports optionally skipping compute profile validation. For more information, see Assign a Compute Profile in Using Compute Profiles (vSphere).
Prevents updating an existing compute profile with unsupported configuration changes. The blocked changes are: changing the number of control plane nodes, changing the node pool name property, and adding a new node pool while deleting an existing node pool.

CSI Driver for vSphere Enhancements

TKGI v1.17.0 includes the following CSI Driver for vSphere enhancements:

Supports the XFS file system on Linux clusters. For more information, see Use XFS File System with vSphere Container Storage Plug-in in the VMware vSphere Container Storage Plug-in documentation.
Supports the CSI Driver for vSphere on Windows clusters. For more information, see Configure vSphere CSI for Windows in Deploying and Managing Cloud Native Storage (CNS) on vSphere.

Additional Features

TKGI v1.17.0 includes the following additional features:

Supports VMware vSphere v8.0 and VMware vSAN 8.0. For more information, see Scenario 2: Upgrading to TKGI v1.17 and vSphere v8.0 in Upgrade Order for TKGI Environments on vSphere and VMware Product Interoperability Matrices.
TKGI API and the UAA connectivity now support TLS v1.3, in addition to TLS v1.2.
Supports configuring cluster-level Pod Security Admission (PSA). For more information, see Pod Security Admission in a TKGI Cluster.
NSX Only: Enhances tkgi delete-cluster reliability by increasing the default TKGI Operation Timeout from 60 seconds to 120 seconds. For more information about configuring the TKGI Operation Timeout field on the TKGI Tile, see Networking in Installing Tanzu Kubernetes Grid Integrated Edition on vSphere with VMware NSX.
Upgrades the TKGI Database from MySQL v5.7 to MySQL v8. For information about the differences between MySQL v5.7 and MySQL v8, see MySQL Server Version Reference in the MySQL documentation.
Supports resizing clusters that have not been upgraded to the current TKGI control plane version. For more information, see Tasks Supported Following a TKGI Control Plane Upgrade in About TKGI Upgrades.
vSphere with VMware NSX only: Supports specifying a network profile for configuring a TKGI upgrade smoke test cluster. For more information, see Errands in Installing Tanzu Kubernetes Grid Integrated Edition on vSphere with VMware NSX.
Improves NSX resource clean-up when deleting a Kubernetes cluster in NSX Policy API mode.
Supports configuring additional NCP Network Profiles parameters: nsx_v3.cookie_name, nsx_v3.members_per_medium_lbs, nsx_v3.members_per_small_lbs, nsx_v3.natfirewallmatch, nsx_v3.ncp_enforced_pool_member_limit, nsx_v3.relax_scale_validation. For more information, see cni_configurations Extensions Parameters in Creating and Managing Network Profiles.
Supports configuring the Fluent Bit container memory limit. For more information, see Log Sink Resources in Installing TKGI or The Fluent Bit Pod restarts Due to Out-of-Memory Issue in Troubleshooting.
Increases the default expiration period of the monitoring-metric-cert certificate from one to four years.
Decreases the Fluentd default refresh interval to 30 seconds from 60 seconds. This ensures that all the logs are forwarded to VMware vRealize Log Insight consistently.
Supports NSX Policy API at 100% of Management Plane API scale with NSX v4.1.1.

Resolved Issues

TKGI v1.17.0 resolves the following issues:

Fixes ‘Input not an X.509 certificate’ When Applying Change on the TKGI Tile.
Fixes The ‘kube-state-metrics’ ClusterRole Is Deleted during Cluster Upgrade.
Fixes Rotated TKGI Certificates Remain Listed as Expiring on the Ops Manager Certificates List.
Fixes Kubernetes API Server and etcd Daemon Occasionally Fail to Start During BBR Restore.
Fixes Cluster Deletion Incomplete If an Error Occurs.
Fixes The Validator Secret Certificate Is Not Rotated.
Fixes HTTPS Ingress Outage During VMware NSX Certificate Rotation.
Fixes NullPointerException error when creating a Compute Profile configured with instances: 0 and the max_worker_instances parameter.
Fixes TKGI Sets the Maximum Persistent Volumes per Node to 59 Instead of 45.
Fixes Some Telegraf Metric-Sink Pods Crash After TKGI Upgrade.
Fixes Telemetry Does Not Report Large Metrics.
Fixes API Server Audit Logs Leak Tokens.
Component bumps fix the following:
- Upgrades CSI Driver for vSphere to v3.0.2:
  - Fixes 2.5 and older series has locking related issues: #2155.
  - Fixes vSphere CSI driver is unable to detect not authenticated sessions: #2157.
- Upgrades Antrea to v1.7.0: Fixes Antrea agent pods restart issue when FQDN-based rules or network policy logging is used.

Deprecations

The following TKGI features have been deprecated or removed from TKGI v1.17:

In-Tree vSphere Storage Volume Support: In-Tree vSphere Storage volume support has been entirely removed. For more information, see Breaking Changes above.
Google Cloud Platform: Support for the Google Cloud Platform (GCP) is deprecated. Support for GCP will be entirely removed in TKGI v1.19.
The log_dropped_traffic CNI Configuration parameter: In TKGI v1.17.0 and later, the log_dropped_traffic CNI Configuration parameter is ignored.

To configure logging in a Network Profile, modify the log_firewall_traffic parameter. For more information, see log_settings in the cni_configurations Parameters section in Creating and Managing Network Profiles.
Flannel Support: Support for the Flannel Container Networking Interface (CNI) is deprecated. Support for Flannel will be entirely removed in TKGI v1.19. VMware recommends that you switch your Flannel CNI-configured clusters to the Antrea CNI. For more information about Flannel CNI deprecation, see About Switching from the Flannel CNI to the Antrea CNI in About Tanzu Kubernetes Grid Integrated Edition Upgrades.
SecurityContextDeny Admission Controller Support: TKGI support for the SecurityContextDeny admission controller will be removed in TKGI v1.18. SecurityContextDeny has been deprecated, and the Kubernetes community recommends the controller not be used. Pod security admission (PSA) is the preferred method for providing a more secure Kubernetes environment. For more information about PSA, see Pod Security Admission in TKGI.

Known Issues

TKGI v1.17.0 has the following known issues.

High-Severity CVE-2024-21626 in runc 1.1.11 and earlier

This issue is fixed in TKGI v1.17.4.

To address CVE-2024-21626, which impacts runc v1.1.11 and earlier, follow Instructions to address CVE-2024-21626 for TKGI in the VMware Knowledge Base.

Limitations on Using the VMware vSphere CSI Driver

The VMware vSphere CSI Driver supports a limited set of VMware vSphere features. Before enabling the vSphere CSI Driver on a TKGI cluster, confirm the cluster and storage configuration are supported by the driver. For more information, see Unsupported Features and Limitations in Deploying and Managing Cloud Native Storage (CNS) on vSphere.

Limitations on Using a Public Cloud CSI Driver

TKGI supports using a public cloud CSI Driver on a TKGI-provisioned cluster.

Installing a Public Cloud CSI Driver on a TKGI Cluster

If you plan to use a public cloud CSI Driver on a TKGI-provisioned cluster, VMware recommends you take additional steps before installing the CSI Driver:

For most public clouds, VMware recommends you follow the CSI Driver installation procedure recommended by the public cloud provider.
For installing the Azure CSI Driver on a TKGI cluster, VMware recommends you follow the procedure in the How to install Azure file/disk CSI driver onto TKGI 1.14 cluster knowledge base article in the VMware Tanzu Support Hub.

Managing a TKGI Cluster That Uses a Public Cloud CSI Driver

If you have enabled a public cloud CSI Driver on a TKGI cluster, you must take additional steps when deleting, upgrading, or updating the cluster:

Updating a Cluster on a Public Cloud
Upgrading a Cluster on a Public Cloud
Deleting a Cluster on a Public Cloud

Updating a Cluster on a Public Cloud

When updating a cluster that uses a public cloud CSI Driver:

No preparation steps are needed when updating a multi-worker node cluster.
To prepare a single-worker node cluster for updating:
1. Resize the cluster to two or more worker nodes before updating the cluster. For more information, see Scaling Existing Clusters.
2. Update the cluster.

Upgrading a Cluster on a Public Cloud

When upgrading a cluster that uses a public cloud CSI Driver:

No preparation steps are needed when upgrading a multi-worker node cluster.
To prepare a single-worker node cluster for upgrading:
1. Resize the cluster to two or more worker nodes before upgrading the cluster. For more information, see Scaling Existing Clusters.
2. Upgrade the cluster. For more information on upgrading clusters, see Upgrading Clusters.

Deleting a Cluster on a Public Cloud

When deleting a cluster that uses a public cloud CSI Driver:

Manually delete the workload PVCs and PVs before deleting the cluster.
Delete the cluster. For more information on deleting clusters, see Deleting Clusters.

Upgrading cluster with CLI loses tags

This issue is fixed in TKGI v1.17.6.

Symptom

After you run tkgi upgrade-cluster, the cluster’s tags no longer appear in the AWS management console or other infrastructure portal. This issue exists on vSphere, AWS and Azure.

Workaround

To restore the cluster tags after upgrading a cluster with the TKGI CLI:

Run tkgi cluster CLUSTER-NAME as described in Review Your Tags and copy the Tags: value from the command output.
Run tkgi update-cluster CLUSTER-NAME --tags TAGS and pass in the existing tags value.
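
The copy step can also be scripted. The following is a minimal sketch, assuming the tkgi cluster output contains a single line that begins with Tags:. The extract_tags helper and the sample output are hypothetical, so compare them with the real command output before relying on this.

```shell
# extract_tags: print the value that follows "Tags:" in text read from stdin.
# Assumption: the value sits on one line, directly after the "Tags:" label.
extract_tags() {
  sed -n 's/^[[:space:]]*Tags:[[:space:]]*//p'
}

# Hypothetical sample of the relevant part of the command output.
sample_output='Name:  my-cluster
Tags:  client:example, status:billable'

printf '%s\n' "$sample_output" | extract_tags

# Possible use against a real cluster (not run here):
#   TAGS=$(tkgi cluster CLUSTER-NAME | extract_tags)
#   tkgi update-cluster CLUSTER-NAME --tags "$TAGS"
```

The extracted value may need adjusting to the format that --tags accepts, so review it before passing it in.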

TKGI Cluster creation with NSX Edge fails with “no available capacity on edge node”.

This issue is fixed in TKGI v1.17.6.

Symptom

Deploying new clusters with NSX Edge nodes fails due to a failure in the pks-nsx-t-prepare-master-vm job. On TKGI control plane VMs, the job log file /var/vcap/data/sys/log/pks-nsx-t-prepare-master-vm/pre-start.stdout.log reports an error like:

Creating Load Balancer
create loadbalancer: update lb service: [PUT /infra/lb-services/{lb-service-id}][400] updateLBServiceBadRequest &{RelatedAPIError:{Details: ErrorCode:502001 ErrorData:<nil> ErrorMessage:Errors validating path=[/infra/lb-services/lb-pks-b5ef6df4-cd11-4461-8861-893533940ecb]. ModuleName:policy} RelatedErrors:[0xc0001b25a0]}

Explanation

When creating a cluster, TKGI creates an NSX Tier-1 gateway and attaches a load balancer to it. This becomes the cluster’s default load balancer, hosting the virtual server for the cluster’s API endpoints and ingress rules. The error occurs when the Tier-1 gateway creates the LB in the NSX routing allocation pool instead of the NSX LB allocation pool. This can cause NSX Service Router components to deploy to edge nodes with no LB capacity, resulting in cluster creation failure.

Workarounds

Use a different edge cluster with load balancer capacity on most nodes.
Add nodes to the current edge clusters, in pairs to allow deployment of both active and standby service routers.
Reconfigure the allocation pools for the existing TKGI cluster’s Tier-1 router.
- This does not apply to routers created for namespaces in dedicated Tier-1 topology.

Ephemeral storage overrun with temporary logfiles from Fluentd

This issue is fixed in TKGI v1.17.5.

Symptom

Fluentd component logfiles that contain the error text invalid byte sequence in UTF-8 fill up and overrun ephemeral storage on the cluster, for example in /tmp/fluent/backup/worker0/object_c79c/ mapped to the ephemeral storage disk /dev/sdb1.

Workaround

Modify Fluentd configuration by changing its log handling code out_loginsight_buffered.rb as follows:

Upload the latest BOSH release to the BOSH Director:
```
bosh upload-release --sha1 daf34e35f1ac678ba05db3496c4226064b99b3e4 "https://bosh.io/d/github.com/cloudfoundry/os-conf-release?v=22.2.1" 
```
For a different release, download it with wget "https://bosh.io/d/github.com/cloudfoundry/os-conf-release?v=RELEASE" and calculate its SHA-1 with shasum "os-conf-release?v=RELEASE".

Create a runtime config file runtime.yml with the following code, to modify the out_loginsight_buffered.rb code:

releases:
- name: "os-conf"
  version: "22.2.1"
addons:
- name: os-configuration
  include:
    # Replace dep1, dep2, ... with the deployment IDs of the clusters to change.
    # To apply the change to all clusters, delete the include section.
    deployments: [dep1,dep2...]
  jobs:
  - name: pre-start-script
    release: os-conf
    properties:
      script: |-
        #!/bin/bash
        if [ -d "/var/vcap/jobs/fluentd/packages/vrli-fluentd/plugins/" ]; then
          sed -i 's/force_encoding("utf-8")/encode!('\''UTF-8'\'', '\''binary'\'', invalid: :replace, undef: :replace, replace: '\'' '\'' )/g' /var/vcap/jobs/fluentd/packages/vrli-fluentd/plugins/out_loginsight_buffered.rb
          echo "done"
        fi

Apply the runtime config file to the BOSH Director, giving it the name runtime-fluentd:
```
bosh update-runtime-config --name runtime-fluentd runtime.yml
```
Check that the runtime config has been set successfully:
```
bosh runtime-config  --name runtime-fluentd
```
The command output should list the contents of runtime.yml.
Upgrade the cluster with ephemeral storage overrun, to apply the new runtime config:
```
tkgi upgrade-cluster CLUSTER-NAME
```
Log in to a cluster worker or master node and check that the file /var/vcap/jobs/fluentd/packages/vrli-fluentd/plugins/out_loginsight_buffered.rb includes the following lines:
```
key.encode!('UTF-8', 'binary', invalid: :replace, undef: :replace, replace: ' ' )
...
value.encode!('UTF-8', 'binary', invalid: :replace, undef: :replace, replace: ' ' )
```
Check that no more logfiles are routing to the directory /tmp/fluent/backup/worker0 and no more error logs containing invalid byte sequence in UTF-8 are writing to /var/vcap/sys/log/fluentd.stdout.log.
(Optional) Delete the runtime config:
```
bosh delete-config --type runtime --name runtime-fluentd
```
Unless you delete it, the runtime config persists in the BOSH Director and is applied with TKGI updates and upgrades, and other operations that run bosh deploy. After you upgrade to a TKGI version that fixes this issue, leaving the runtime config in place is harmless, but it does leave an unneeded file on the BOSH Director filesystem and make BOSH execute an unnecessary script.
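
To preview what the sed substitution in the runtime config does before applying it, you can run the same expression against a sample line. This is a minimal sketch: the patch_encoding helper is hypothetical, and the sample input line is an assumption about how the plugin calls force_encoding, so the real file may differ.

```shell
# patch_encoding: apply the same substitution as the pre-start script,
# but to text on stdin rather than to the plugin file in place.
patch_encoding() {
  sed 's/force_encoding("utf-8")/encode!('\''UTF-8'\'', '\''binary'\'', invalid: :replace, undef: :replace, replace: '\'' '\'' )/g'
}

# Hypothetical sample line; the actual plugin source is not shown in this topic.
printf '%s\n' 'key.force_encoding("utf-8")' | patch_encoding
```

The printed line should match the key.encode! line that the verification step looks for in out_loginsight_buffered.rb.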

TKGI version upgrade without new stemcell fails for Containerd runtime clusters with Istio CNI

Symptom

On clusters configured to use a containerd registry and Istio CNI, upgrading the TKGI version without also upgrading the stemcell fails with errors kubelet cannot find istio-cni binary and nsx fails to recieve message header.

This error does not occur when you upgrade to a new stemcell along with the new TKGI version.

Explanation

When a TKGI cluster upgrade drains a node, it leaves the node’s Istio CNI agent and CNI configuration in a corrupted state.

If the cluster nodes are not automatically re-created by a stemcell change, the corrupted Istio CNI state remains.

Workaround

For clusters that use both Containerd and Istio CNI:

If you have already encountered this issue, re-create all worker nodes using the bosh recreate command:
1. Run the bosh vms command to list the cluster VMs:
```
bosh -d service-instance-DEPLOYMENT-ID vms
```
  Where DEPLOYMENT-ID is the BOSH-generated ID of your Kubernetes cluster deployment.
2. For each VM instance listed as worker/UUID in the output, run bosh recreate:
```
bosh -d service-instance-DEPLOYMENT-ID recreate worker/UUID
```
In the future, you can avoid this issue by upgrading a cluster’s stemcell whenever you upgrade its TKGI version.
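
If the cluster has many workers, the two steps above can be combined. The following is a minimal sketch, assuming the instance name is the first column of the bosh vms output; the list_workers helper and the sample output are hypothetical.

```shell
# list_workers: print the worker/UUID instance names from `bosh vms` output on stdin.
# Assumption: the instance name is the first whitespace-separated column.
list_workers() {
  awk '$1 ~ /^worker\// { print $1 }'
}

# Hypothetical sample output; instance IDs and columns are illustrative only.
sample_vms='Instance      Process State  AZ   IPs
master/1111   running        az1  10.0.0.2
worker/aaaa   running        az1  10.0.0.3
worker/bbbb   running        az2  10.0.0.4'

printf '%s\n' "$sample_vms" | list_workers

# Against a real deployment, the loop might look like this (not run here):
#   for vm in $(bosh -d service-instance-DEPLOYMENT-ID vms | list_workers); do
#     bosh -d service-instance-DEPLOYMENT-ID recreate "$vm"
#   done
```

Recreating workers one at a time, as the loop does, keeps the remaining workers available while each VM is rebuilt.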

NSX pod creation fails when using Tanzu Application Platform

Symptom

When you deploy a workload on a TKGI-provisioned cluster with NSX networking that is running Tanzu Application Platform (TAP), you see an error Failed to create pod sandbox and no resources are created in the cluster’s nsx-system namespace.

Explanation

The total number of Kubernetes object labels and other tags created by both TKGI and TAP can exceed the number that is allowed by NSX.

Workaround

Create or update your network profile as described in Creating and Managing Network Profiles (NSX Only), setting the cni_configurations parameter extensions.ncp.k8s.label_filtering_regex_list as described under label_filtering Settings.

Error Scaling Clusters when Compute Profile Lacks Node Pool

Symptom

When you scale up a cluster by passing --num-nodes to tkgi update-cluster, you see the error An error occurred in the PKS API.

The pks-api.log includes: Request processing failed; nested exception is java.lang.NullPointerException, which triggers within the nested ClusterService methods extractCustomizationNodePoolNames < validateComputeProfileUuidAndKubernetesWorkerInstances < updateCluster.

Explanation

The cluster’s Compute Profile lacks a node-pool specification, and setting --num-nodes for cluster updating does not work when the Compute Profile does not specify a node pool.

Workaround

Create and apply a new compute profile that specifies a node pool:

Create a new JSON compute profile definition that defines a node pool with a worker node instance count. For example, this code defines a compute profile cp-1 with target instance count 16:

cat cp-1.json
{
  "name": "cp-1",
  "description": "compute profile 1",
  "parameters": {
    "cluster_customization": {
      "control_plane": {
        "cpu": 2,
        "memory_in_mb": 8192,
        "persistent_disk_in_mb": 28240,
        "ephemeral_disk_in_mb": 28240,
        "instances": 3
      },
      "node_pools": [
        {
          "name": "pool-1",
          "description": "node pool 1",
          "instances": 16
        }
      ]
    }
  }
}

Apply the new compute profile to the cluster:
```
tkgi update-cluster my-cluster --compute-profile cp-1.json
```
When you apply the new profile, the worker nodes migrate from instance group worker to the new instance group, worker-pool-1. If the instance count exceeds the number specified in the plan, TKGI adds instances to the new group before updating existing instances.

After the new node pool is created, you can scale the cluster by running:

tkgi update-cluster cluster-1 --node-pool-instances "POOL:COUNT"

Where POOL is the node pool name and COUNT is its instance count.
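
When scripting the scale operation, it can help to build the POOL:COUNT value in one place and reject a malformed count early. This is a minimal sketch; the node_pool_arg helper is hypothetical and is not part of the TKGI CLI.

```shell
# node_pool_arg: build the "POOL:COUNT" value, rejecting a count that is
# empty or contains anything other than digits.
node_pool_arg() {
  pool=$1
  count=$2
  case $count in
    ''|*[!0-9]*) echo "COUNT must be a whole number" >&2; return 1 ;;
  esac
  printf '%s:%s\n' "$pool" "$count"
}

node_pool_arg pool-1 16

# Possible use (not run here):
#   tkgi update-cluster my-cluster --node-pool-instances "$(node_pool_arg pool-1 16)"
```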

Wrong Node Scale after Updating Cluster with New Compute Profile

This issue is fixed in TKGI v1.17.5.

Symptom

For clusters created with or assigned a compute profile as described in Using Compute Profiles, scaling the cluster by updating its compute profile and then performing additional cluster update operations may leave the cluster with the wrong node count.

For example, after the following steps, the cluster may have a node count of 12 instead of 18 in its updated node pool:

1. Create cluster my-cluster with a compute profile cp-1 that defines a node pool pool-1 with 6 nodes.
2. Scale the cluster to 12 nodes by running tkgi update-cluster my-cluster --node-pool-instances "pool-1:12".
3. Create a new compute profile cp-2 that sets the pool-1 node pool to 18 nodes.
4. Update the cluster with the profile cp-2 by running tkgi update-cluster my-cluster --compute-profile cp-2.
5. Rotate the cluster’s certificates or perform other tkgi update-cluster operations.

After the last step, the cluster pool’s node count may erroneously revert to 12.

Workaround

After you update a cluster with a new compute profile that changes node counts, run tkgi update-cluster CLUSTER-NAME --node-pool-instances "NODEPOOL-NAME:NODE-COUNT" where NODE-COUNT is the new node count.

With the example above, after updating the cluster with the profile cp-2 (the fourth step), run tkgi update-cluster my-cluster --node-pool-instances "pool-1:18".

Interoperability with Tanzu Mission Control is Unavailable

This issue is fixed by using the July 21, 2023 or later releases of Tanzu Mission Control.

Tanzu Mission Control (TMC) is not compatible with Kubernetes v1.26 at the time of the TKGI v1.17 release and temporarily cannot manage TKGI v1.17 Kubernetes clusters. Interoperability between TMC and TKGI v1.17 is expected at a later time.

Refer to the VMware Tanzu Mission Control Release Notes for an announcement of compatibility with Kubernetes v1.26.

Interoperability with VMware Aria Operations Management Pack for Kubernetes Is Unavailable

This issue has been resolved: VMware Aria Operations Management Pack for Kubernetes v2.0 provides interoperability with TKGI v1.17.

For more information, see the VMware Aria Operations for Integrations Release Notes.

Description

Interoperability with VMware Aria Operations Management Pack for Kubernetes is temporarily unavailable.

VMware Aria Operations Management Pack for Kubernetes is currently not compatible with TKGI v1.17. Interoperability between VMware Aria Operations Management Pack for Kubernetes and TKGI v1.17 is expected at a later time.

Updating cluster compute profile loses node drain and shutdown settings

This issue is fixed in TKGI v1.17.5.

Symptom

In the management console, when you update a Linux cluster that is created with a compute profile as described in Update Cluster Configuration, the Advanced Settings panel shows and applies incorrect defaults for node drain and pod shutdown grace period settings.

Workaround

Under Update Cluster, before you change the Compute Profile setting, click Show More to open the Advanced Settings. Record the current settings, and set them back if selecting a compute profile changes those settings.

Default settings are:

Node Drain Timeout: 0
Pod Shutdown Grace Period: 10
Enabled: Force drain if externally-managed pods
Enabled: Force drain if DaemonSet-managed pods
Enabled: Force drain if pods using emptyDir
Disabled: Force drain if pods running after timeout

TKGI MC Unable to Manage TKGI after Restoring the TKGI Control Plane from Backup

Symptom

After you restore Ops Manager and the TKGI API VM from backup, TKGI functions normally, but your TKGI MC tabs include the following error: “…product ‘pivotal-container service’ is not deployed…”.

Explanation

TKGI MC is associated with an Ops Manager that has a specific name. If you give Ops Manager a new name while restoring, TKGI MC does not recognize the restored Ops Manager and cannot manage it.

vSphere CSI Failure When Backslash in User Name

Symptom

When installing or upgrading TKGI with vSphere Container Storage Plug-in (CSI) enabled, CSI pods fail with the error ErrImageNeverPull and cluster logs show the error unknown escape sequence.

Explanation

The CSI driver cannot correctly parse the vCenter username configuration setting if it contains a backslash (\) character.

Workaround

When entering a vCenter user name in the TKGI Configuration Wizard or Ops Manager tile, use the format user@domainname. The user name cannot contain a backslash (\) character.
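
A quick pre-check of the user name can catch the problem before you apply changes. This is a minimal sketch; the check_vcenter_user helper and the sample user names are hypothetical.

```shell
# check_vcenter_user: accept user@domainname, reject any name with a backslash.
check_vcenter_user() {
  case $1 in
    *\\*) echo "invalid: contains a backslash"; return 1 ;;
    *@*)  echo "ok" ;;
    *)    echo "invalid: expected user@domainname"; return 1 ;;
  esac
}

# Hypothetical user names for illustration.
check_vcenter_user 'administrator@vsphere.local'
check_vcenter_user 'EXAMPLE\administrator' || true
```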

Error: Could Not Execute “Apply-Changes” in Azure Environment

Symptom

After clicking Apply Changes on the TKGI tile in an Azure environment, you experience an error ‘…could not execute “apply-changes”…’ with either of the following descriptions:

{“errors”:{“base”:[“undefined method ‘location’ for nil:NilClass”]}}
FailedError.new(“Resource Groups in region ‘#{location}’ do not support Availability Zones”))

For example:

INFO | 2020-09-21 03:46:49 +0000 | Vessel::Workflows::Installer#run | Install product (apply changes)
2020/09/21 03:47:02 could not execute "apply-changes": installation failed to trigger: request failed: unexpected response from /api/v0/installations:
HTTP/1.1 500 Internal Server Error
Transfer-Encoding: chunked
Cache-Control: no-cache, no-store
Connection: keep-alive
Content-Type: application/json; charset=utf-8
Date: Mon, 21 Sep 2020 17:51:50 GMT
Expires: Fri, 01 Jan 1990 00:00:00 GMT
Pragma: no-cache
Referrer-Policy: strict-origin-when-cross-origin
Server: Ops Manager
Strict-Transport-Security: max-age=31536000; includeSubDomains
X-Content-Type-Options: nosniff
X-Download-Options: noopen
X-Frame-Options: SAMEORIGIN
X-Permitted-Cross-Domain-Policies: none
X-Request-Id: f5fc99c1-21a7-45c3-7f39
X-Runtime: 9.905591
X-Xss-Protection: 1; mode=block

44
{"errors":{"base":["undefined method `location' for nil:NilClass"]}}
0

Explanation

The Azure CPI endpoint used by Ops Manager has been changed and your installed version of Ops Manager is not compatible with the new endpoint.

Workaround

Run the following Ops Manager CLI command:

om --skip-ssl-validation --username USERNAME --password PASSWORD --target https://OPSMAN-API curl --silent --path /api/v0/staged/director/verifiers/install_time/IaasConfigurationVerifier -x PUT -d '{ "enabled": false }'

Where:

USERNAME is the account to use to run Ops Manager API commands.
PASSWORD is the password for the account.
OPSMAN-API is the IP address for the Ops Manager API.

For more information, see Error ‘undefined method location’ is received when running Apply Change on Azure in the VMware Tanzu Knowledge Base.

VMware vRealize Operations Does Not Support Windows Worker-Based Kubernetes Clusters

VMware vRealize Operations (vROPs) does not support Windows worker-based Kubernetes clusters and cannot be used to manage TKGI-provisioned Windows workers.

TKGI Wavefront Requires Manual Installation for Windows Workers

To monitor Windows-based worker node clusters with a Wavefront collector and proxy, you must first install Wavefront on the clusters manually, using Helm. For instructions, see the Wavefront section of the Monitoring Windows Worker Clusters and Nodes topic.

Pinging Windows Worker Kubernetes Clusters Does Not Work

TKGI-provisioned Windows worker-based Kubernetes clusters inherit a Kubernetes limitation that prevents outbound ICMP communication from workers. As a result, pinging Windows workers does not work.

For information about this limitation, see Limitations > Networking in the Windows in Kubernetes documentation.

Velero Does Not Support Backing Up Stateful Windows Workloads

You can use Velero to back up stateless TKGI-provisioned Windows workers only. You cannot use Velero to back up stateful Windows applications. For more information, see Velero on Windows in Basic Install in the Velero documentation.

Tanzu Mission Control Integration Not Supported on GCP

TKGI on Google Cloud Platform (GCP) does not support Tanzu Mission Control (TMC) integration, which is configured in the Tanzu Kubernetes Grid Integrated Edition tile > the Tanzu Mission Control pane.

If you intend to run TKGI on GCP, skip this pane when configuring the Tanzu Kubernetes Grid Integrated Edition tile.

TMC Data Protection Feature Requires Privileged TKGI Containers

The TMC Data Protection feature supports privileged TKGI containers only. For more information, see Plans in the Installing TKGI topic for your IaaS.

Windows Worker Kubernetes Clusters with Group Managed Service Account Do Not Support Compute Profiles

Windows worker-based Kubernetes clusters integrated with group Managed Service Account (gMSA) cannot be managed using compute profiles.

Windows Worker Kubernetes Clusters on Flannel Do Not Support Compute Profiles

On vSphere with NSX-T networking you can use compute profiles with both Linux and Windows worker‑based Kubernetes clusters. On vSphere with Flannel networking, you can apply compute profiles only to Linux clusters.

TKGI CLI Does Not Prevent Reducing the Control Plane Node Count

TKGI CLI does not prevent accidentally reducing a cluster’s control plane node count using a compute profile.

Warning: Reducing a cluster’s control plane node count can destroy the cluster. Do not scale out or scale in existing control plane nodes by reconfiguring the TKGI tile or by using a compute profile. Reducing a cluster’s number of control plane nodes might remove a control plane node and cause the cluster to become inactive.

Windows Cluster Nodes Not Deleted After VM Deleted

Symptom

After you delete a VM using the management console of your infrastructure provider, you notice a Windows worker node that had been on that VM is now in a notReady state.

Solution

To identify the leftover node:
```
kubectl get no -o wide
```
Locate nodes on the returned list that are in a notReady state and have the same IP address as another node in the list.
To manually delete a notReady node:
```
kubectl delete node NODE-NAME
```
Where NODE-NAME is the name of the node in the notReady state.
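
On a large cluster, the duplicate-IP check can be done with a small filter. The following is a minimal sketch, assuming the status is in the second column and the internal IP address is in the sixth column of the kubectl get no -o wide output. The find_stale_nodes helper and the sample rows are hypothetical, and the sample omits the columns that follow the internal IP.

```shell
# find_stale_nodes: print NotReady nodes whose internal IP is shared with another node.
# Assumption: column 2 is STATUS and column 6 is INTERNAL-IP.
find_stale_nodes() {
  awk '
    NR > 1 { name[NR] = $1; status[NR] = $2; ip[NR] = $6; seen[$6]++ }
    END {
      for (i = 2; i <= NR; i++)
        if (status[i] ~ /^NotReady/ && seen[ip[i]] > 1) print name[i]
    }'
}

# Hypothetical sample rows for illustration.
find_stale_nodes <<'EOF'
NAME    STATUS     ROLES    AGE   VERSION    INTERNAL-IP
win-a   Ready      <none>   2d    v1.26.15   10.0.0.5
win-b   NotReady   <none>   30d   v1.26.15   10.0.0.5
win-c   NotReady   <none>   30d   v1.26.15   10.0.0.9
EOF

# Possible use against a real cluster (not run here):
#   kubectl get no -o wide | find_stale_nodes
```

Review the printed names before deleting anything, because a node can also be NotReady for unrelated reasons.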

502 Bad Gateway After OIDC Login

Symptom

You experience a “502 Bad Gateway” error from the NSX load balancer after you log in to OIDC.

Explanation

A large response header has exceeded your NSX-T load balancer maximum response header size. The default maximum response header size is 10,240 characters and should be resized to 16,384.

Workaround

If you experience this issue, manually reconfigure your NSX-T request_header_size to 4096 characters and your response_header_size to 16384. For information about configuring NSX default header sizes, see OIDC Response Header Overflow in the Knowledge Base.

Difficulty Changing Proxy for Windows Workers

You must configure a global proxy in the Tanzu Kubernetes Grid Integrated Edition tile > Networking pane before you create any Windows workers that use the proxy.

You cannot change the proxy configuration for Windows workers in an existing cluster.

Character Limitations in HTTP Proxy Password

For vSphere with NSX-T, the HTTP Proxy password field does not support the following special characters: & or ;.
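
A candidate password can be checked for the two unsupported characters before it is entered in the tile. This is a minimal sketch; the proxy_password_ok helper and the sample password are hypothetical.

```shell
# proxy_password_ok: succeed only if the password contains neither & nor ;.
proxy_password_ok() {
  case $1 in
    *'&'*|*';'*) return 1 ;;
    *) return 0 ;;
  esac
}

# Hypothetical password for illustration.
if proxy_password_ok 'example-pass;word'; then
  echo "password is acceptable"
else
  echo "password contains & or ;"
fi
```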

Error After Modifying Your Harbor Storage Configuration

Symptom

You receive the following error after modifying your existing Harbor installation’s storage configuration:

Error response from daemon: manifest for ... not found: manifest unknown: manifest unknown

Explanation

Harbor does not support modifying an existing Harbor installation’s storage configuration.

Workaround

To modify your Harbor storage configuration, re-install Harbor. Before starting Harbor, configure the new Harbor installation with the desired configuration.

Ingress Controller Statefulset Fails to Start After Resizing Worker Nodes

Symptom

Permissions are removed from your cluster’s files and processes after resizing the persistent disk during a cluster upgrade. The ingress controller statefulset fails to start.

Explanation

When resizing a persistent disk, BOSH migrates the data from the old disk to the new disk but does not copy the files’ extended attributes.

Workaround

To resolve the problem, complete the steps in Ingress controller statefulset fails to start after resize of worker nodes with permission denied (https://knowledge.broadcom.com/external/article/298618/) in the Broadcom Support Knowledge Base.

Azure Default Security Group Is Not Automatically Assigned to Cluster VMs

Symptom

You experience issues when configuring a load balancer for a multi-control plane node Kubernetes cluster or creating a service of type LoadBalancer. Additionally, in the Azure portal, the VM > Networking page does not display any inbound and outbound traffic rules for your cluster VMs.

Explanation

As part of configuring the Tanzu Kubernetes Grid Integrated Edition tile for Azure, you enter Default Security Group in the Kubernetes Cloud Provider pane. When you create a Kubernetes cluster, Tanzu Kubernetes Grid Integrated Edition automatically assigns this security group to each VM in the cluster. However, on Azure the automatic assignment might not occur.

As a result, your inbound and outbound traffic rules defined in the security group are not applied to the cluster VMs.

Workaround

If you experience this issue, manually assign the default security group to each VM NIC in your cluster.

One Plan ID Longer than Other Plan IDs

Symptom

One of your plan IDs is one character longer than your other plan IDs.

Explanation

In TKGI, each plan has a unique plan ID. A plan ID is normally a UUID consisting of 32 alphanumeric characters and 4 hyphens. However, the Plan 4 ID consists of 33 alphanumeric characters and 4 hyphens.
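
To see which plan ID is the longer one, you can count the characters in each ID. This is a minimal sketch; the plan_id_shape helper is hypothetical, and the two IDs are made-up values with the lengths described above, not real TKGI plan IDs.

```shell
# plan_id_shape: report how many non-hyphen characters and hyphens an ID contains.
# Assumption: every character other than a hyphen is alphanumeric.
plan_id_shape() {
  printf '%s\n' "$1" | awk '{
    id = $0
    hyphens = gsub(/-/, "", id)   # gsub returns the number of hyphens removed
    print length(id) " characters, " hyphens " hyphens"
  }'
}

# Made-up IDs: the first has the usual UUID shape, the second has one extra character.
plan_id_shape '11111111-2222-3333-4444-555555555555'
plan_id_shape '11111111-2222-3333-4444-5555555555555'
```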

Solution

You can safely configure and use Plan 4. The length of the Plan 4 ID does not affect the functionality of Plan 4 clusters.

If you require all plan IDs to have identical length, do not activate or use Plan 4.

Database Cluster Stops After a Database Instance is Stopped

Symptom

After you stop one instance in a multiple-instance database cluster, the cluster stops, or communication between the remaining databases times out, and the entire cluster becomes unreachable.

The following might be in your UAA log:

WSREP has not yet prepared node for application use

Explanation

The database cluster is unable to recover automatically because a member is no longer available to reconcile quorum.

Velero Back Up Fails for vSphere PVs Attached to Clusters on Kubernetes v1.20 and Later

Symptom

Backing up vSphere persistent volumes using Velero fails and your Velero backup log includes the following error:

rpc error: code = Unknown desc = Failed during IsObjectBlocked check: Could not translate selfLink to CRD name

Explanation

This is a known issue when backing up clusters on Kubernetes v1.20 and later using the Velero Plugin for vSphere v1.1.0 or earlier.

Workaround

To resolve the problem, complete the steps in Velero backups of vSphere persistent volumes fail on Kubernetes clusters version 1.20 or higher (83314) in the VMware Tanzu Knowledge Base.

Creating Two Windows Clusters at the Same Time Fails

Symptom

The first time that you try to create two Windows clusters at the same time, the creation of one of the clusters fails. If you run pks cluster CLUSTER-NAME to examine the last action taken on the cluster, you see the following:

Last Action: Create
Last Action State: failed
Last Action Description: Instance provisioning failed: There was a problem completing your request. … operation: create, error-message: Failed to acquire lock … locking task id is 111, description: 'create deployment'

Explanation

This is a known issue that occurs the first time that you create two Windows clusters concurrently.

Workaround

Recreate the failed cluster. This issue only occurs the first time that you create two Windows clusters concurrently.

Deleted Clusters are Listed in Cluster Lists

Symptom

After you run tkgi delete-cluster and cluster deletion completes, the deleted cluster continues to be listed when you run tkgi clusters.

Workaround

You must manually remove the deleted cluster using a customized version of the ncp_cleanup script. For more information, see Deleting a Tanzu Kubernetes Grid Integrated Edition cluster with “tkgi delete-cluster” stuck “in progress” status in the Broadcom Support Knowledge Base.

BOSH Director Logs the Error ‘Duplicate vm extension name’

Symptom

After you uninstall TKGI, then reinstall TKGI in the same environment, BOSH Director logs errors similar to the following:

.../gems/bosh-director-0.0.0/lib/bosh/director/deployment_plan/cloud_manifest_parser.rb:120:in `parse_vm_extensions': Duplicate vm extension name 'disk_enable_uuid' (Bosh::Director::DeploymentDuplicateVmExtensionName)

Explanation

The pivotal-container-service cloud-config was not removed when you uninstalled the TKGI tile, and it remained active. When you reinstalled the TKGI tile, an additional pivotal-container-service cloud-config was created, causing the metrics_server to fall into a crash-loop state.

Workaround

You must manually remove the pivotal-container-service cloud-config after removing your TKGI deployment, including after removing the TKGI tile from Ops Manager.

For more information, see “Duplicate vm extension name” error when metrics_server runs on Director VM in Tanzu Kubernetes Grid Integrated Edition in the VMware Tanzu Community Knowledge Base.

The TKGI API FQDN Must Not Include Trailing Whitespace

Symptom

Your TKGI logs include the following error:

'uaa'. Errors are:- Error filling in template 'uaa.yml.erb' (line 59: Client redirect-uri is invalid: uaa.clients.pks_cli.redirect-uri Client redirect-uri is invalid: uaa.clients.pks_cluster_client.redirect-uri)

Explanation

The TKGI API fully-qualified domain name (FQDN) for your cluster contains leading or trailing whitespace.

Workaround

Do not include whitespace in the TKGI tile API Hostname (FQDN) field.
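
Before you enter the value in the tile, you can check a candidate hostname for stray whitespace. The following Python sketch is illustrative only: the function name is hypothetical and api.tkgi.example.com is a placeholder hostname.

```python
def contains_whitespace(hostname: str) -> bool:
    """Return True if the value has any whitespace, including leading or trailing."""
    return any(ch.isspace() for ch in hostname)


# "api.tkgi.example.com" is a placeholder hostname used for illustration.
candidates = ["api.tkgi.example.com", "api.tkgi.example.com ", " api.tkgi.example.com"]
for candidate in candidates:
    status = "contains whitespace" if contains_whitespace(candidate) else "ok"
    print(repr(candidate), status)
```

Printing the value with repr() makes a trailing space or tab visible, which is otherwise easy to miss when the value is pasted into a form field.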

TMC Cluster Data Protection Backup Fails After Upgrading TKGI

The TMC Cluster Data Protection Backup fails in TKGI environments upgraded from an earlier version.

Symptom

The TMC Cluster Data Protection Backup fails to back up your existing clusters and logs the following error:

error executing custom action (groupResource=customresourcedefinitions.apiextensions.k8s.io, namespace=, name=ncpconfigs.nsx.vmware.com): rpc error: code = Unknown desc = error fetching v1beta1 version of ncpconfigs.nsx.vmware.com: the server could not find the requested resource

Explanation

Kubernetes v1.22 disallows the spec.preserveUnknownFields: true configuration in your existing clusters and the creation of a v1 CustomResourceDefinitions configuration fails.

TMC Cluster Data Protection Restore Fails When Using Antrea CNI

The TMC Cluster Data Protection Restore operation can fail when restoring multiple Antrea resources.

Symptom

The TMC Cluster Data Protection Restore fails and logs errors indicating that requests to restore the admission webhook have been denied.

Explanation

Velero has encountered a race condition while operating a resource. For more information, see Allow customizing restore order for Kubernetes controllers and their managed resources in the Velero GitHub repository.

TKGI Does Not Support CVDS / NVDS Mixed Environments

TKGI does not support environments where there are multiple matching networks, such as a mixed CVDS/NVDS environment.

Symptom

TKGI logs errors similar to the following in an environment with multiple matching networks:

LastOperationstatus='failed', description='Instance provisioning failed:
There was a problem completing your request. Please contact your operations team providing the following information:
service: p.pks, service-instance-guid: ..., broker-request-id: ..., task-id: ..., operation: create,
error-message: Unknown CPI error 'Unknown' with message 'undefined method `mob' for <VimSdk::Vim::OpaqueNetwork:' in create_vm' CPI method

Explanation

TKGI cannot identify which of the matching networks you intend to use and has selected the wrong network.

Occasionally update-cluster Does Not Complete for Windows Workers

Occasionally, tkgi update-cluster hangs while updating a Windows worker node instance and the BOSH task cannot finish and exits.

Symptom

The ovsdb-server service has stopped but other processes report that it is running.

Explanation

The ovsdb-server.pid file uses the pid for a process that is not the ovsdb-server.

To confirm that this is the root cause for tkgi update-cluster to hang:

To verify that the ovsdb-server service has actually stopped, run the PowerShell Get-Service command on the Windows worker node.

To verify that other processes report the ovsdb-server service is still running:

Review the ovsdb-server job-service-wrapper.err.log log file.
The job-service-wrapper.err.log log file is located at:
```
C:\var\vcap\sys\log\openvswitch-windows\ovsdb-server\job-service-wrapper.err.log
```

Confirm that after the flushing processes, the log includes an error similar to the following:

Pid-Guard : ovsdb-server is already runing, please stop it first
At C:\var\vcap\jobs\openvswitch-windows\bin\ovsdb-server_ctl.ps1:30 char:5
+     Pid-Guard $PIDFILE "ovsdb-server"
+     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : NotSpecified: ( [Write-Error], WriteErrorException
    + FullyQualifiedErrorId : Microsoft.PowerShell.Commands.WriteErrorException,Pid-Guard

To verify the root cause:

Run the following PowerShell commands on the Windows worker node:

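# Location of the ovsdb-server pid file on the Windows worker
$RUN_DIR = "C:\var\vcap\sys\run\openvswitch-windows"
$PIDFILE = "$RUN_DIR\ovsdb-server.pid"
# Read the PID recorded in the pid file
$pid1 = Get-Content $PIDFILE -First 1
echo $pid1
# Look up the process that currently owns that PID
$rst = Get-Process -Id $pid1 -ErrorAction SilentlyContinue
echo $rst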

Confirm the returned ProcessName is not ovsdb-server.

Workaround

To resolve this issue for a single Windows worker:

SSH to the affected worker node.

Run the following:

rm C:\var\vcap\sys\run\openvswitch-windows\ovsdb-server.pid

Wait for the ovsdb-server process to start.
Confirm the dependent services also start.

Harbor Private Projects Are Inaccessible after Upgrading to TKGI v1.13.0

If LDAP is enabled, Harbor private projects are inaccessible after upgrading to TKGI v1.13.0. For more information, see Private projects become inaccessible after upgrading Harbor for TKGI to v2.4.x with LDAP feature enabled in the Broadcom Support Knowledge Base.

Deployments Fail on TKGI Windows Worker-based Kubernetes Clusters after the January 2022 Microsoft Windows Security Patch

Microsoft changed Microsoft Windows’ support for tar file commands in the January 2022 Microsoft Windows security patch.

Packaging scripts that use tar commands for Windows worker-based Kubernetes Cluster deployments can fail after the Microsoft tar command patch update has been applied.

The BOSH agent used by vSphere stemcells built by stembuild v2019.43 and earlier uses tar commands that are no longer supported and will fail if the Microsoft Windows security patch has been applied.

Workaround

stembuild v2019.44 and later include a version of the BOSH agent that does not use unsupported tar commands.

If you use vSphere stemcells, use stembuild v2019.44 or later to avoid the BOSH agent tar error.
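
The version threshold above can be expressed as a small check. This Python sketch is an illustration under stated assumptions: it assumes stembuild version strings of the form v2019.NN (with optional further components), it handles only the 2019 line, and the function name is hypothetical.

```python
def stembuild_is_affected(version: str) -> bool:
    """Return True for stembuild v2019.43 and earlier, per the note above."""
    parts = version.strip().lstrip("vV").split(".")
    major, minor = int(parts[0]), int(parts[1])
    if major != 2019:
        # Assumption: this sketch only reasons about the 2019 stembuild line.
        raise ValueError("only 2019.x stembuild versions are handled")
    return minor < 44


print(stembuild_is_affected("v2019.43"))
print(stembuild_is_affected("v2019.44"))
```

The comparison is numeric rather than string-based, so a version such as v2019.100 would be treated as later than v2019.44.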

TKGI Clusters Fail after NSX Upgrade If They Use NSGroup Policy API Resources

TKGI supports clusters that use NSGroup Policy API resources, but Policy API NSGroups created in one NSX version will be empty after upgrading NSX to a newer version.

Workaround

BOSH reconfigures a deployment’s NSGroup members if the deployment is redeployed.

After upgrading NSX, redeploy affected deployments to reconfigure their NSGroup members:

Re-Apply Changes on the Ops Manager UI to redeploy TKGI tile deployments.
Re-deploy the affected cluster deployments.

Rotating NSX certificates fails after migrating to NSX Policy API

This issue is fixed in TKGI v1.20.0.

After migrating from NSX Management Plane API to NSX Policy API, rotating NSX certificates sometimes fails due to a mismatch between policy display name and ID.

Symptom

Running tkgi rotate-certificates CLUSTER --non-interactive --only-nsx results in the following error seen in the pks-api logs:

```
Failed to retrieve certificate of display name pks-f5703ad0-1af1-402a-8f77-8a0cb52fea58
2024-06-13 14:16:21.749 ERROR 278082 --- [nio-9021-exec-8] i.p.pks.cluster.CertificateService : Unknown error occurred rotating nsx certs
```

Explanation

When TKGI first creates a cluster, it names its NSX certificates following the pattern pks-CLUSTER-ID, as both a display name and an internal name.

TKGI v1.14 and prior had a known issue: Rotating a cluster’s NSX certificates saved the new certificates under an autogenerated internal name, a GUID without a pks- prefix, and did not retain the cert’s display name.

When you migrate a cluster to NSX Policy API, its NSX certificate is saved as a policy object with its name set to the certificate’s internal name.

The certificate rotation process retrieves certificates by their display name, so it cannot find certificates rotated in TKGI v1.14 and prior.
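
The name mismatch can be modeled in a few lines of Python. This is a simplified illustration of the lookup logic described above, not the TKGI or NSX implementation: the dictionary field display_name, the function names, and the rotated certificate's GUID are all hypothetical.

```python
from typing import Dict, List, Optional


def expected_display_name(cluster_id: str) -> str:
    """Name pattern described above: pks-CLUSTER-ID."""
    return f"pks-{cluster_id}"


def find_by_display_name(
    certs: List[Dict[str, str]], cluster_id: str
) -> Optional[Dict[str, str]]:
    """Mimic a lookup by display name; return None when nothing matches."""
    wanted = expected_display_name(cluster_id)
    for cert in certs:
        if cert["display_name"] == wanted:
            return cert
    return None


cluster_id = "f5703ad0-1af1-402a-8f77-8a0cb52fea58"  # cluster ID from the log above

# Certificate as first created: its name follows the pks-CLUSTER-ID pattern.
original = [{"display_name": f"pks-{cluster_id}"}]

# Certificate rotated in TKGI v1.14 or earlier, then migrated to Policy API:
# its name is an autogenerated GUID (made up here) with no pks- prefix.
rotated = [{"display_name": "0b1c2d3e-0000-4000-8000-123456789abc"}]

print(find_by_display_name(original, cluster_id) is not None)
print(find_by_display_name(rotated, cluster_id) is not None)
```

The lookup succeeds for the originally created certificate and finds nothing for the rotated one, which corresponds to the "Failed to retrieve certificate of display name" error.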

Workaround

See How to rotate Tanzu Kubernetes Grid Integrated Edition tls-nsx-t cluster certificate in the Broadcom Support KB.

API Server Audit Logs Leak Tokens

This issue is fixed in TKGI v1.17.0.

The API Server audit logs include clear-text tokens on the master nodes of clusters that use the default audit policy.

Description

JSON Web Tokens in the tokenrequests API body are being written to the API Server audit log /var/vcap/sys/log/kube-apiserver/audit/log on TKGI clusters.
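
To check whether an existing audit log contains token-shaped strings, you could scan it with a pattern match. The following Python sketch is a heuristic only: the function name and sample lines are made up for illustration, and the pattern can produce false positives or miss tokens in unusual encodings.

```python
import re
from typing import Iterable, List

# A JWT is three base64url segments separated by dots. An encoded JSON header
# typically starts with "eyJ". This pattern is a heuristic, not a validator.
JWT_PATTERN = re.compile(r"eyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+")


def lines_with_tokens(lines: Iterable[str]) -> List[int]:
    """Return the 1-based numbers of lines that contain a JWT-shaped string."""
    return [n for n, line in enumerate(lines, start=1) if JWT_PATTERN.search(line)]


# Made-up sample lines; the token is a dummy value, not a real credential.
sample = [
    '{"kind":"Event","verb":"get","requestURI":"/api/v1/pods"}',
    '{"kind":"Event","verb":"create","responseObject":{"status":{"token":'
    '"eyJhbGciOiJSUzI1NiJ9.eyJzdWIiOiJ0ZXN0In0.c2lnbmF0dXJl"}}}',
]
print(lines_with_tokens(sample))
```

To scan a real file, pass an open file handle to lines_with_tokens instead of the sample list. The function reports line numbers only, so it does not echo the token values themselves.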

TKGI Does Not Support the Antrea Egress Feature on AWS

This issue is fixed in TKGI v1.17.2.

In AWS environments, TKGI does not support the Antrea CNI Egress feature. For example, the Egress resource egressIP and externalIPPool fields in an antrea-config configuration are ignored for clusters on AWS, including both single-AZ and multiple-AZ clusters. For more information about the Antrea Egress feature, see What is Egress in the Antrea documentation.

Note: You must grant the AWS Worker Instance Profile additional AWS Identity and Access Management (IAM) permissions before using the Antrea Egress feature with worker nodes on AWS. For more information, see Prepare AWS Worker Instance Profile Permissions in General Troubleshooting.

Cluster Might Fail to Send the cluster_name Tag to Logging after Cluster Upgrade

This issue is fixed in TKGI v1.17.1.

Occasionally, a cluster might fail to send the cluster_name tag to logging after being upgraded.

Description

After upgrading a cluster, the Name record_modifier filter will occasionally be missing from the cluster’s fluent-bit ConfigMap, and the cluster_name is not included in log entries. This problem occurs if the sink-controller process configures the cluster before the observability-manager starts, which overwrites the desired configuration.

Cluster Update Operations Fail Due to Duplicate Tag Keys

This issue is fixed in TKGI v1.17.2.

The cluster update operations fail if you reuse the same key in different tags in a cluster.

Description

Tag keys must be unique within a cluster, for example, key1:value1, key2:value2. TKGI does not prevent you from reusing the same key for multiple tags in a cluster, for example, key1:value1, key1:value2. However, the cluster update operations fail.

Workaround

Use different keys for the tags within a cluster, for example, key1:value1, key2:value2. For more information, see Tagging Rules.
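
You can check a tag list for reused keys before you apply it. The following Python sketch assumes the tags are written as a comma-separated string of key:value pairs, as in the examples above. The function name is hypothetical.

```python
from collections import Counter
from typing import List


def duplicate_tag_keys(tags: str) -> List[str]:
    """Return the keys that appear more than once in a key:value tag string."""
    keys = []
    for pair in tags.split(","):
        pair = pair.strip()
        if not pair:
            continue
        # Everything before the first colon is treated as the key.
        keys.append(pair.split(":", 1)[0].strip())
    counts = Counter(keys)
    return sorted(key for key, count in counts.items() if count > 1)


print(duplicate_tag_keys("key1:value1, key2:value2"))
print(duplicate_tag_keys("key1:value1, key1:value2"))
```

An empty result means every key is unique. Any key in the returned list is used by more than one tag and should be renamed.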

Node Drain Operation Ignores the TKGI Deployment Plan Settings

This issue is fixed in TKGI v1.17.2.

When upgrading a cluster, the node drain operation ignores the pod shutdown grace period specified in the deployment plan on the TKGI Tile.

Pods on NSX v3.2.3 Can Enter a NotReady State

When TKGI is deployed on NSX v3.2.3 and there are large numbers of pods with liveness probes, the pods on TKGI-provisioned clusters can enter a NotReady state.

Symptom

In addition to your pods being NotReady, if you restart NSX Manager:

Your NSX API logs include numerous repetitions of "POST /nsxapi/api/v1/firewall/sections/.../rules?operation=insert_bottom HTTP/1.1" ....

Your NCP logs include errors similar to:

"nsx-container-ncp" subcomp="ncp" level="ERROR" security="True" errorCode="NCP00034"] nsx_ujo.ncp.nsx.manager.firewall_service Failed to create health check rule for port ...: Service cluster: 'https://nsx-manager.example.com' is unavailable. Please, check NSX setup and/or configuration.

Description

As pods are created or deleted, DFW firewall rules are replicated for the pod’s liveness probe. In NSX v3.2.3, the firewall rules are unintentionally duplicated during this replication. After numerous pod creation/deletion events, the compounded duplication creates a DFW firewall section large enough to create noticeable delays during pod operations and, eventually, a pod NotReady state.

Workaround

Upgrade NSX to a version that includes the fix, namely 3.2.4 or 4.1.1 or later.

TKGI Management Console v1.17.0

Release Date: August 3, 2023

Tanzu Kubernetes Grid Integrated Edition Management Console provides an opinionated installation of TKGI.

Product Snapshot

Note: The component versions supported by TKGI Management Console might differ from or be more limited than the versions supported by TKGI.

Element	Details
Version	v1.17.0
Release date	August 3, 2023
Installed TKGI version	v1.17.0
Installed Ops Manager version	v3.0.13*	Release Notes
Component	Version
Installed Kubernetes version	v1.26.5*	Release Notes
Installed Harbor Registry version	v2.8.2*	Release Notes
Ubuntu Jammy stemcell	v1.179*	Release Notes

* Components marked with an asterisk have been updated.

Upgrade Path

The supported upgrade paths to Tanzu Kubernetes Grid Integrated Edition Management Console v1.17.0 are from TKGI MC v1.16.2 and earlier TKGI MC v1.16 patches.

Breaking Changes

Features and Resolved Issues

TKGI Management Console v1.17.0 includes the following features:

Upgrades the TKGI MC database from MySQL v5.7 to MySQL v8. For information about the differences between MySQL v5.7 and MySQL v8, see MySQL Server Version Reference in the MySQL documentation.
Prevents deploying multiple TKGI instances on a vCenter Server. By using the TKGI MC, you can now deploy only one instance of TKGI on a vCenter Server.

TKGI Management Console v1.17.0 resolves the following issues:

Deprecations

The following TKGI features have been deprecated or removed from TKGI Management Console v1.17:

Known Issues

The Tanzu Kubernetes Grid Integrated Edition Management Console v1.17.0 has the following known issues:

Cannot upgrade after rotating Ops Manager CA

This issue is fixed in TKGI v1.17.6.

Symptom

With TKGI deployed by the Management Console (MC), after you rotate the Ops Manager CA certificate, you cannot upgrade Tanzu Kubernetes Grid Integrated Edition. The upgrade fails with errors that the MC cannot access BOSH:

Error GetInstanceByID: cannot get BOSH client: 
[...]
Get https://10.110.93.3:25555/info: x509: certificate signed by unknown authority

Workaround

Immediately after you rotate the Ops Manager CA, run the Tanzu Kubernetes Grid Integrated Edition MC Configuration Wizard, step through the configuration panes, run Generate Configuration, and then run Apply Configuration.

Wrong cluster Floating IP pools after TKGI upgrade with Management Console in PNAT mode

This issue is fixed in TKGI v1.17.6.

Symptom

On TKGI deployments where users have updated cluster IP ranges using NSX Manager instead of TKGI network profiles, clusters fail with network connection errors after you upgrade TKGI using the Management Console in PNAT mode. The NCP logs list the NSX configuration error Resource could not be found for IpPool.

Explanation

During TKGI upgrade, the Management Console does not check whether cluster IP Pools have been updated at the underlying NSX layer, and instead re-applies the IP pool settings as configured in TKGI. This causes an IP pool mismatch between TKGI and NSX.

Workaround

Contact Support for scripts that reallocate IP addresses to the cluster’s current floating IP pool, release unused addresses, and delete stale IP pools.

To avoid this issue, update cluster IP pools via TKGI network profiles rather than in NSX Manager.

Cannot log in to the TKGI MC after restarting Ops Manager v3 VM from vSphere

This issue is fixed in TKGI v1.17.4.

Symptom

With TKGI deployed by the Management Console (MC), after you restart the Ops Manager v3 VM from vSphere, you cannot log in to the MC, and the MC can no longer communicate with Ops Manager.

Workaround

First, determine if the Ops Manager VM address changed when you restarted it. If so, the following steps will restore its original IP address.

The MC deploys Ops Manager to the first IP in TKGI’s Deployment CIDR range, following the Gateway address, as configured under Network Resources in the MC. You can also retrieve the Deployment CIDR from the TKGI tile in Ops Manager, under Networks > Subnets > CIDR and Gateway. If the Gateway address ends in 1 and is the first in the CIDR, the Ops Manager address ends in 2.

If the Ops Manager IP shown in vSphere is not the first IP in the Deployment CIDR range for TKGI, its address has changed.
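
The expected address can be computed from the Deployment CIDR and Gateway values. This Python sketch reads the rule above as "the first usable host address in the CIDR that is not the Gateway", which is an interpretation and not a documented algorithm. The function name is hypothetical and the addresses are from a documentation-only range.

```python
import ipaddress


def expected_ops_manager_ip(deployment_cidr: str, gateway: str) -> str:
    """Return the first usable host address in the CIDR that is not the gateway."""
    network = ipaddress.ip_network(deployment_cidr, strict=False)
    gateway_ip = ipaddress.ip_address(gateway)
    if gateway_ip not in network:
        raise ValueError("gateway is not inside the deployment CIDR")
    for host in network.hosts():
        if host != gateway_ip:
            return str(host)
    raise ValueError("no usable address other than the gateway")


# Placeholder values: the gateway ends in 1, so the expected address ends in 2.
print(expected_ops_manager_ip("192.0.2.0/24", "192.0.2.1"))
```

Compare the result with the Ops Manager IP shown in vSphere. If they differ, the address has likely changed.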

If the Ops Manager VM address has changed, remove its networkd service and restart the VM so that its networking service picks up the correct, static IP from its OVF settings:

From vSphere, ssh in to the Ops Manager VM.
Run sudo mv /usr/lib/systemd/system/*networkd* /root/
Restart the Ops Manager VM from vSphere, and check its IP address to see if it has changed back to the correct address, the first IP in the TKGI Deployment CIDR.
If the Ops Manager VM still does not have the correct IP address, continue with the following steps.
From vSphere, power off the Ops Manager VM.
Update the Ops Manager VM’s network settings to specify a network that supports DHCP, and record its previous network settings.
- You will need to ssh in to the Ops Manager VM again, so if it lacks a DHCP network adapter you need some other way to access it. For example: mount its disk to another VM, inject an Ubuntu user password, and then use the vSphere GUI to power on the Ops Manager VM and log in with the new password.
Power on the Ops Manager VM. It should have an IP address assigned by DHCP.
ssh to the Ops Manager VM.
Run sudo mv /usr/lib/systemd/system/*networkd* /root/
Power off the Ops Manager VM.
Update the VM’s network settings again to revert to its previous network configuration.
Power on the Ops Manager VM again. It should now have the correct static IP.

vRealize Log Insight Integration Does Not Support HTTPS Connections

Symptom

The Tanzu Kubernetes Grid Integrated Edition Management Console integration to vRealize Log Insight does not support connections to the HTTPS port on the vRealize Log Insight server.

Workaround

Use SSH to log in to the Tanzu Kubernetes Grid Integrated Edition Management Console appliance VM.
Open the file /lib/systemd/system/pks-loginsight.service in a text editor.
Add -e LOG_SERVER_ENABLE_SSL_VERIFY=false.

Set -e LOG_SERVER_USE_SSL=true.

The resulting file should look like the following example:

ExecStart=/bin/docker run --privileged --restart=always --network=pks \
-v /var/log/journal:/var/log/journal \
--name=pks-loginsight \
-e TYPE=gear2-vm \
-e LOG_SERVER_HOST=${LOGINSIGHT_HOST} \
-e LOG_SERVER_PORT=${LOGINSIGHT_PORT} \
-e LOG_SERVER_ENABLE_SSL_VERIFY=false \
-e LOG_SERVER_USE_SSL=true \
-e LOG_SERVER_AGENT_ID=${LOGINSIGHT_ID} \
pksoctopus/vrli-journald:v07092019

Save the file and run systemctl daemon-reload.
To restart the vRealize Log Insight service, run systemctl restart pks-loginsight.service.

Tanzu Kubernetes Grid Integrated Edition Management Console can now send logs to the HTTPS port on the vRealize Log Insight server.

vSphere HA causes Management Console ovfenv Data Corruption

Symptom

If vSphere HA is enabled on a cluster and the host that runs the TKGI Management Console appliance VM reboots, vSphere HA creates a new TKGI Management Console appliance VM on another host in the cluster. Due to an issue with vSphere HA, the ovfenv data for the newly created appliance VM is corrupted and the new appliance VM does not boot up with the correct network configuration.

Workaround

In the vSphere Client, right-click the appliance VM and select Power > Shut Down Guest OS.
Right-click the appliance again and select Edit Settings.
Select VM Options and click OK.
Verify under Recent Tasks that a Reconfigure virtual machine task has run on the appliance VM.
Power on the appliance VM.

Base64 encoded file arguments are not decoded in Kubernetes profiles

Symptom

Some file arguments in Kubernetes profiles are base64 encoded. When the management console displays the Kubernetes profile, some file arguments are not decoded.

Workaround

To decode a file argument manually, run echo "$content" | base64 --decode, where $content is the encoded value.
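
If you prefer to decode the value in a script, the following Python sketch does the same job with the standard library. It assumes the decoded content is UTF-8 text. The function name and the sample value are for illustration only.

```python
import base64


def decode_file_argument(content: str) -> str:
    """Decode a base64-encoded file argument and return it as text."""
    return base64.b64decode(content).decode("utf-8")


# "aGVsbG8=" is a sample value that decodes to "hello".
print(decode_file_argument("aGVsbG8="))
```

If the decoded content is not text, for example a binary certificate bundle, use the bytes returned by base64.b64decode directly instead of calling decode("utf-8").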

Network profiles not immediately selectable

Symptom

If you create network profiles and then try to apply them in the Create Cluster page, the new profiles are not available for selection.

Workaround

Log out of the management console and log back in again.

Real-Time IP information not displayed for network profiles

Symptom

In the cluster summary page, only the default IP pool, pod IP block, and node IP block values are displayed, rather than the real-time values from the associated network profile.

Workaround

None

Error After Modifying Your Harbor Storage Configuration

Symptom

You receive the following error after modifying your existing Harbor installation’s storage configuration:

Error response from daemon: manifest for ... not found: manifest unknown: manifest unknown

Explanation

Harbor does not support modifying an existing Harbor installation’s storage configuration.

Workaround

To modify your Harbor storage configuration, re-install Harbor. Before starting Harbor, configure the new Harbor installation with the desired configuration.

Windows Stemcells Must be Re-Imported After Upgrading Ops Manager

Symptom

After upgrading Ops Manager, your Management Console does not recognize a Windows stemcell imported when using the prior version of Ops Manager.

Workaround

If your Management Console does not recognize a Windows stemcell after upgrading Ops Manager:

Re-import your previously imported Windows stemcell.
Apply Changes to TKGI MC.

Your New Clusters Are Not Shown In Tanzu Mission Control

Symptom

After you create a cluster, Tanzu Mission Control does not include the cluster in cluster lists. You have a “Resource not found” error similar to the following in your BOSH logs:

Cluster Name in TMC: cluster-1
Cluster Name Prefix: tkgi-my-prefix-
Group Name in TMC: my-prefix-clusters
Cluster Description in TMC: VMware Enterprise PKS Attaching cluster ''tkgi-my-prefix-cluster-1'' to TMC
Fetching token successful
request POST:/v1alpha1/clusters,
response 404 Not Found:{"error":"Resource not found - clustergroup(my-prefix-clusters)
org id(d859dc9f-g622-426d-8c91-939a9f13dea9)",
"code":5,"message":"Resource not found - clustergroup(my-prefix-clusters)

Explanation

The cluster group you assign a cluster to must be defined in Tanzu Mission Control before you assign your cluster to the cluster group in the TKGI Management Console.

Workaround

To resolve the problem, complete the steps in Attaching a Tanzu Kubernetes Grid Integrated (TKGI) cluster to Tanzu Mission Control (TMC) fails with “Resource not found - clustergroup(cluster-group-name)” in the VMware Tanzu Knowledge Base.
