Release Notes

This topic contains release notes for Tanzu Kubernetes Grid Integrated Edition (TKGI) v1.14.

TKGI v1.14.7

Release Date: June 20, 2023

Product Snapshot

Release	Details
Version	v1.14.7
Release date	June 20, 2023
Component	Version
Antrea	v1.4.0	Release Notes
cAdvisor	v0.39.1
Containerd	Linux: v1.6.18 Windows: v1.6.18
CoreDNS	v1.8.7+vmware.3
CSI Driver for vSphere	v2.5.4	Release Notes
Docker	Linux: v20.10.9 Windows: v20.10.9
etcd	v3.5.6
Harbor	v2.7.2*	Release Notes
Kubernetes	v1.23.17	Release Notes
Metrics Server	v0.5.0
NCP	v3.2.1.7*
Percona XtraDB Cluster (PXC) (in BOSH pxc-release)	v5.7.38-31.59 pxc-release: v0.44.0	Release Notes: PXC pxc-release
UAA	v74.5.76*
Velero	v1.8.1	Release Notes
VMware Cloud Foundation (VCF)	v4.4** and v4.3.1	Release Notes: v4.4, v4.3.1
Wavefront	Wavefront Collector: v1.9.0 Wavefront Proxy: v10.14
Compatibilities	Versions
Ops Manager	See Retrieve Product Version Compatibilities from the Tanzu API in the Broadcom Support KB.
NSX-T	See VMware Product Interoperability Matrices***.
vSphere	See VMware Product Interoperability Matrices***.
Windows stemcells	v2019.61* or later
Xenial stemcells	See Retrieve Product Version Compatibilities from the Tanzu API in the Broadcom Support KB.

* Components marked with an asterisk have been updated.
** VCF v4.4 is supported but has not been tested with TKGI v1.14.7.
*** To use Policy API features, you must use NSX-T v3.1.3 or later.

Upgrade Path

The supported upgrade paths to TKGI v1.14.7 are from TKGI v1.14.6 and earlier TKGI v1.14 patches, and from TKGI v1.13.10 and earlier TKGI v1.13 patches.

Features and Enhancements

TKGI v1.14.7 includes the following enhancements:

Supports resizing clusters that have not been upgraded to the current TKGI control plane version. For more information, see Tasks Supported Following a TKGI Control Plane Upgrade in About TKGI Upgrades.

Resolved Issues

TKGI v1.14.7 resolves the following issues:

Fixes HTTPS Ingress Outage During VMware NSX Certificate Rotation.
Fixes The Validator Secret Certificate Is Not Rotated.
Fixes NullPointerException error when creating a Compute Profile configured with instances: 0 and the max_worker_instances parameter.
Fixes TKGI Certificate Rotation Might Remove NSX Ingress Certificates from TKGI.
Fixes Rotated TKGI Certificates Remain Listed as Expiring on the Ops Manager Certificates List.

Known Issues

Except where noted, the known issues in Tanzu Kubernetes Grid Integrated Edition v1.14.6 are also in Tanzu Kubernetes Grid Integrated Edition v1.14.7. See the TKGI v1.14.6 Known Issues below.

TKGI Management Console v1.14.7

Release Date: June 20, 2023

Note: Tanzu Kubernetes Grid Integrated Edition Management Console provides an opinionated installation of TKGI. The supported versions might differ from or be more limited than what is generally supported by TKGI.

Product Snapshot

Element	Details
Version	v1.14.7
Release date	June 20, 2023
Installed Tanzu Kubernetes Grid Integrated Edition version	v1.14.7
Installed Ops Manager version	v2.10.58	Release Notes
Component	Version
Installed Kubernetes version	v1.23.17	Release Notes
Installed Harbor Registry version	v2.7.2*	Release Notes
Linux stemcell	v621.561*

* Components marked with an asterisk have been updated.

Upgrade Path

The supported upgrade paths to Tanzu Kubernetes Grid Integrated Edition Management Console v1.14.7 are from TKGI MC v1.14.6 and earlier TKGI MC v1.14 patches, and from TKGI MC v1.13.10 and earlier TKGI MC v1.13 patches.

Features and Resolved Issues

TKGI Management Console v1.14.7 includes the following enhancements:

Prevents deploying multiple TKGI instances on a vCenter Server. By using the TKGI MC, you can now deploy only one instance of TKGI on a vCenter Server.

TKGI Management Console v1.14.7 resolves the following issues:

Fixes Previous nsx-t-superuser-certificate Is Restored during TKGI MC Upgrade.
Fixes TKGI MC Unable to Create a Network Profile Configured with Source IP Ingress Persistence.

Deprecations

For information about upcoming deprecations, see Deprecations in the TKGI MC v1.14.0 Release Notes below.

Known Issues

Except where noted, the known issues in Tanzu Kubernetes Grid Integrated Edition Management Console v1.14.6 are also in Tanzu Kubernetes Grid Integrated Edition Management Console v1.14.7. See the TKGI MC v1.14.6 Known Issues below.

TKGI v1.14.6

Release Date: April 18, 2023

Product Snapshot

Release	Details
Version	v1.14.6
Release date	April 18, 2023
Component	Version
Antrea	v1.4.0	Release Notes
cAdvisor	v0.39.1
Containerd	Linux: v1.6.18* Windows: v1.6.18*
CoreDNS	v1.8.7+vmware.3
CSI Driver for vSphere	v2.5.4	Release Notes
Docker	Linux: v20.10.9 Windows: v20.10.9
etcd	v3.5.6
Harbor	v2.7.1*	Release Notes
Kubernetes	v1.23.17*	Release Notes
Metrics Server	v0.5.0
NCP	v3.2.1.6
Percona XtraDB Cluster (PXC) (in BOSH pxc-release)	v5.7.38-31.59 pxc-release: v0.44.0	Release Notes: PXC pxc-release
UAA	v74.5.66*
Velero	v1.8.1	Release Notes
VMware Cloud Foundation (VCF)	v4.4** and v4.3.1	Release Notes: v4.4, v4.3.1
Wavefront	Wavefront Collector: v1.9.0 Wavefront Proxy: v10.14
Compatibilities	Versions
Ops Manager	See Retrieve Product Version Compatibilities from the Tanzu API in the Broadcom Support KB.
NSX-T	See VMware Product Interoperability Matrices***.
vSphere	See VMware Product Interoperability Matrices***.
Windows stemcells	v2019.58 or later
Xenial stemcells	See Retrieve Product Version Compatibilities from the Tanzu API in the Broadcom Support KB.

* Components marked with an asterisk have been updated.
** VCF v4.4 is supported but has not been tested with TKGI v1.14.6.
*** To use Policy API features, you must use NSX-T v3.1.3 or later.

Upgrade Path

The supported upgrade paths to TKGI v1.14.6 are from TKGI v1.14.5 and earlier TKGI v1.14 patches, and from TKGI v1.13.10 and earlier TKGI v1.13 patches.

Features and Enhancements

TKGI v1.14.6 includes the following enhancements:

Increases the length of the insecure_registries column to 4K characters.
Decreases the Fluentd default refresh interval to 30 seconds from 60 seconds. This ensures that all the logs are forwarded to VMware vRealize Log Insight consistently.

Resolved Issues

TKGI v1.14.6 resolves the following issues:

Fixes The ‘kube-state-metrics’ ClusterRole Is Deleted during Cluster Upgrade.

Known Issues

Except where noted, the known issues in Tanzu Kubernetes Grid Integrated Edition v1.14.5 are also in Tanzu Kubernetes Grid Integrated Edition v1.14.6. See the TKGI v1.14.5 Known Issues below.

TKGI Management Console v1.14.6

Release Date: April 18, 2023

Product Snapshot

Element	Details
Version	v1.14.6
Release date	April 18, 2023
Installed Tanzu Kubernetes Grid Integrated Edition version	v1.14.6
Installed Ops Manager version	v2.10.55	Release Notes
Component	Version
Installed Kubernetes version	v1.23.17*	Release Notes
Installed Harbor Registry version	v2.7.1*	Release Notes
Linux stemcell	v621.463*

* Components marked with an asterisk have been updated.

Upgrade Path

The supported upgrade paths to Tanzu Kubernetes Grid Integrated Edition Management Console v1.14.6 are from TKGI MC v1.14.5 and earlier TKGI MC v1.14 patches, and from TKGI MC v1.13.10 and earlier TKGI MC v1.13 patches.

Features and Resolved Issues

This release of the Tanzu Kubernetes Grid Integrated Edition Management Console includes no new features or resolved issues.

Deprecations

For information about upcoming deprecations, see Deprecations in the TKGI MC v1.14.0 Release Notes below.

Known Issues

Except where noted, the known issues in Tanzu Kubernetes Grid Integrated Edition Management Console v1.14.5 are also in Tanzu Kubernetes Grid Integrated Edition Management Console v1.14.6. See the TKGI MC v1.14.5 Known Issues below.

TKGI v1.14.5

Release Date: February 21, 2023

Product Snapshot

Release	Details
Version	v1.14.5
Release date	February 21, 2023
Component	Version
Antrea	v1.4.0	Release Notes
cAdvisor	v0.39.1
Containerd	Linux: v1.6.6 Windows: v1.6.6
CoreDNS	v1.8.7+vmware.3
CSI Driver for vSphere	v2.5.4	Release Notes
Docker	Linux: v20.10.9 Windows: v20.10.9
etcd	v3.5.6*
Harbor	v2.7.0*	Release Notes
Kubernetes	v1.23.16*	Release Notes
Metrics Server	v0.5.0
NCP	v3.2.1.6*
Percona XtraDB Cluster (PXC) (in BOSH pxc-release)	v5.7.38-31.59 pxc-release: v0.44.0	Release Notes: PXC pxc-release
UAA	v74.5.63*
Velero	v1.8.1	Release Notes
VMware Cloud Foundation (VCF)	v4.4** and v4.3.1	Release Notes: v4.4, v4.3.1
Wavefront	Wavefront Collector: v1.9.0 Wavefront Proxy: v10.14
Compatibilities	Versions
Ops Manager	See Broadcom Support.
NSX-T	See VMware Product Interoperability Matrices***.
vSphere	See VMware Product Interoperability Matrices***.
Windows stemcells	v2019.55 or later
Xenial stemcells	See Broadcom Support.

* Components marked with an asterisk have been updated.
** VCF v4.4 is supported but has not been tested with TKGI v1.14.5.
*** To use Policy API features, you must use NSX-T v3.1.3 or later.

Upgrade Path

The supported upgrade paths to TKGI v1.14.5 are from TKGI v1.14.4 and earlier TKGI v1.14 patches, and from TKGI v1.13.10 and earlier TKGI v1.13 patches.

Features and Enhancements

This release of the Tanzu Kubernetes Grid Integrated Edition includes no new features or resolved issues.

Resolved Issues

TKGI v1.14.5 resolves the following issues:

Fixes CSI Driver Image Missing After High Disk Utilization.
Fixes ‘Input not an X.509 certificate’ When Applying Change on the TKGI Tile.
Component bumps fix the following Known Issues:
- NCP v3.2.1.6:
  - Fixes NCP Known Issue: Misconfigured Ingress Invalidates All Ingress.

Known Issues

Except where noted, the known issues in Tanzu Kubernetes Grid Integrated Edition v1.14.4 are also in Tanzu Kubernetes Grid Integrated Edition v1.14.5. See the TKGI v1.14.4 Known Issues below.

TKGI Management Console v1.14.5

Release Date: February 21, 2023

Product Snapshot

Element	Details
Version	v1.14.5
Release date	February 21, 2023
Installed Tanzu Kubernetes Grid Integrated Edition version	v1.14.5
Installed Ops Manager version	v2.10.53*	Release Notes
Component	Version
Installed Kubernetes version	v1.23.16*	Release Notes
Installed Harbor Registry version	v2.7.0*	Release Notes
Linux stemcell	v621.418*

Components marked with an asterisk have been updated.

Upgrade Path

The supported upgrade paths to Tanzu Kubernetes Grid Integrated Edition Management Console v1.14.5 are from TKGI MC v1.14.4 and earlier TKGI MC v1.14 patches, and from TKGI MC v1.13.10 and earlier TKGI MC v1.13 patches.

Features and Resolved Issues

This release of the Tanzu Kubernetes Grid Integrated Edition Management Console includes no new features or resolved issues.

Deprecations

For information about upcoming deprecations, see Deprecations in the TKGI MC v1.14.0 Release Notes below.

Known Issues

Except where noted, the known issues in Tanzu Kubernetes Grid Integrated Edition Management Console v1.14.4 are also in Tanzu Kubernetes Grid Integrated Edition Management Console v1.14.5. See the TKGI MC v1.14.4 Known Issues below.

TKGI v1.14.4

Release Date: December 7, 2022

Product Snapshot

Release	Details
Version	v1.14.4
Release date	December 7, 2022
Component	Version
Antrea	v1.4.0	Release Notes
cAdvisor	v0.39.1
Containerd	Linux: v1.6.6 Windows: v1.6.6
CoreDNS	v1.8.7+vmware.3
CSI Driver for vSphere	v2.5.4	Release Notes
Docker	Linux: v20.10.9 Windows: v20.10.9
etcd	v3.5.5*
Harbor	v2.6.2*	Release Notes
Kubernetes	v1.23.14*	Release Notes
Metrics Server	v0.5.0
NCP	v3.2.1.3	Release Notes: v3.2.1.3
Percona XtraDB Cluster (PXC) (in BOSH pxc-release)	v5.7.38-31.59 pxc-release: v0.44.0	Release Notes: PXC pxc-release
UAA	v74.5.60*
Velero	v1.8.1	Release Notes
VMware Cloud Foundation (VCF)	v4.4** and v4.3.1	Release Notes: v4.4, v4.3.1
Wavefront	Wavefront Collector: v1.9.0 Wavefront Proxy: v10.14
Compatibilities	Versions
Ops Manager	See Retrieve Product Version Compatibilities from the Tanzu API in the Broadcom Support KB.
NSX-T	See VMware Product Interoperability Matrices***.
vSphere	See VMware Product Interoperability Matrices***.
Windows stemcells	v2019.55 or later
Xenial stemcells	See Retrieve Product Version Compatibilities from the Tanzu API in the Broadcom Support KB.

* Components marked with an asterisk have been updated.
** VCF v4.4 is supported but has not been tested with TKGI v1.14.4.
*** To use Policy API features, you must use NSX-T v3.1.3 or later.

Upgrade Path

The supported upgrade paths to TKGI v1.14.4 are from TKGI v1.14.3 and earlier TKGI v1.14 patches, and from TKGI v1.13.9 and earlier TKGI v1.13 patches.

Features and Enhancements

TKGI v1.14.4 has the following features and enhancements:

Supports running the vROPs cAdvisor daemonset without Privileged permission. No longer requires that cAdvisor be run with Privileged permission.
Enables the VMware vSphere Container Storage Plug-in ability to suspend a specific datastore for volume provisioning using Cloud Native Storage Manager. For more information, see VMware vSphere Container Storage Plug-in 2.5 Release Notes.
Component bumps:
- Upgrades the fluent-plugin-vmware-loginsight Fluentd output plugin to v1.3.1. fluent-plugin-vmware-loginsight forwards logs to VMware Log Insight.

Resolved Issues

TKGI v1.14.4 resolves the following issues:

Fixes a delay when rebooting a VM caused by un-stopped BOSH jobs running on the VM’s containerd-runtime cluster worker nodes.

Known Issues

TKGI v1.14.4 has the following known issues:

Except where noted, the known issues in Tanzu Kubernetes Grid Integrated Edition v1.14.3 are also in Tanzu Kubernetes Grid Integrated Edition v1.14.4. See the TKGI v1.14.3 Known Issues below.

The ‘kube-state-metrics’ ClusterRole Is Deleted during Cluster Upgrade

This issue is fixed in TKGI v1.14.6.

The wavefront-proxy-errand deletes the kube-state-metrics ClusterRole during cluster upgrade. The deleted ClusterRole must be manually restored after upgrading a cluster.

TKGI Management Console v1.14.4

Release Date: December 7, 2022

Product Snapshot

Element	Details
Version	v1.14.4
Release date	December 7, 2022
Installed Tanzu Kubernetes Grid Integrated Edition version	v1.14.4
Installed Ops Manager version	v2.10.50	Release Notes
Component	Version
Installed Kubernetes version	v1.23.14*	Release Notes
Installed Harbor Registry version	v2.6.2*	Release Notes
Linux stemcell	v621.330*

* Components marked with an asterisk have been updated.

Upgrade Path

The supported upgrade paths to Tanzu Kubernetes Grid Integrated Edition Management Console v1.14.4 are from TKGI MC v1.14.3, MC v1.14.1, and v1.14.0, and from TKGI MC v1.13.9 and earlier TKGI MC v1.13 patches.

Features and Resolved Issues

TKGI Management Console v1.14.4 includes the following enhancements:

TKGI MC OS has been upgraded to Photon OS 4.0. For more information about Photon OS 4.0, see What is New in Photon OS 4 in the Project Photon OS documentation.

Deprecations

For information about upcoming deprecations, see Deprecations in the TKGI MC v1.14.0 Release Notes below.

Known Issues

Except where noted, the known issues in Tanzu Kubernetes Grid Integrated Edition Management Console v1.14.3 are also in Tanzu Kubernetes Grid Integrated Edition Management Console v1.14.4. See the TKGI MC v1.14.3 Known Issues below.

TKGI v1.14.3

Release Date: October 26, 2022

Product Snapshot

Release	Details
Version	v1.14.3
Release date	October 26, 2022
Component	Version
Antrea	v1.4.0	Release Notes
cAdvisor	v0.39.1
Containerd for Linux	Linux: v1.6.6* Windows: v1.6.6*
CoreDNS	v1.8.7+vmware.3
CSI Driver for vSphere	v2.5.4*	Release Notes
Docker	Linux: v20.10.9 Windows: v20.10.9
etcd	v3.5.4
Harbor	v2.6.0*	Release Notes
Kubernetes	v1.23.12*	Release Notes
Metrics Server	v0.5.0
NCP	v3.2.1.3*	Release Notes: v3.2.1.3
Percona XtraDB Cluster (PXC) (in BOSH pxc-release)	v5.7.38-31.59 pxc-release: v0.44.0	Release Notes: PXC pxc-release
UAA	v74.5.54*
Velero	v1.8.1	Release Notes
VMware Cloud Foundation (VCF)	v4.4** and v4.3.1	Release Notes: v4.4, v4.3.1
Wavefront	Wavefront Collector: v1.9.0 Wavefront Proxy: v10.14
Compatibilities	Versions
Ops Manager	See Retrieve Product Version Compatibilities from the Tanzu API in the Broadcom Support KB.
NSX-T	See VMware Product Interoperability Matrices***.
vSphere	See VMware Product Interoperability Matrices***.
Windows stemcells	v2019.51 or later
Xenial stemcells	See Retrieve Product Version Compatibilities from the Tanzu API in the Broadcom Support KB.

* Components marked with an asterisk have been updated.
** VCF v4.4 is supported but has not been tested with TKGI v1.14.3.
*** To use Policy API features, you must use NSX-T v3.1.3 or later.

Upgrade Path

The supported upgrade paths to TKGI v1.14.3 are from TKGI v1.14.2 and earlier TKGI v1.14 patches, and from TKGI v1.13.8 and earlier TKGI v1.13 patches.

Features and Enhancements

TKGI v1.14.3 has the following features and enhancements:

Supports using the vSphere Container Storage Interface (CSI) Driver on cluster worker nodes that are distributed across multiple data centers. For more information, see Configure CNS Data Centers in Deploying and Managing Cloud Native Storage (CNS) on vSphere.

Resolved Issues

TKGI v1.14.3 resolves the following issues:

[Security Fix] Component bumps fix the following:
- Upgrades Kubernetes to v1.23.12:
  - Fixes CVE-2021-25749: runAsNonRoot logic bypass for Windows containers.
Component bumps fix the following Known Issues:
- Upgrades Fluent Bit to v1.9.3:
  - Fixes Fluent Bit Does Not Merge Containerd Runtime Cluster Multi-Line Entries.
- Upgrades CSI Driver for vSphere to v2.5.4:
  - Fixes Persistent Volumes Fail to Detach from Nodes.
Fixes Switching Your Default CNI to Antrea is Not Supported.
Fixes Some Windows Pods Become Unreachable.
Fixes Windows Worker Nodes Are Unresponsive after Update-Cluster and Upgrade-Cluster.
Fixes Timeout While Switching Container Runtimes If the Docker Directory Is Too Large.
Fixes Pods on Clusters Using the containerd-Runtime Enter a CrashLoopBackOff State.
Fixes Slab Memory Leak.

Known Issues

TKGI v1.14.3 has the following known issues:

Except where noted, the known issues in Tanzu Kubernetes Grid Integrated Edition v1.14.2 are also in Tanzu Kubernetes Grid Integrated Edition v1.14.3. See the TKGI v1.14.2 Known Issues below.

For Known Issues in NCP v3.2.1, see VMware NSX Container Plugin 3.2.1.1 Release Notes.

TKGI Management Console v1.14.3

Release Date: October 26, 2022

Product Snapshot

Element	Details
Version	v1.14.3
Release date	October 26, 2022
Installed Tanzu Kubernetes Grid Integrated Edition version	v1.14.3
Installed Ops Manager version	v2.10.47*	Release Notes
Component	Version
Installed Kubernetes version	v1.23.12*	Release Notes
Installed Harbor Registry version	v2.6.0*	Release Notes
Linux stemcell	v621.296*

* Components marked with an asterisk have been updated.

Upgrade Path

The supported upgrade paths to Tanzu Kubernetes Grid Integrated Edition Management Console v1.14.3 are from TKGI MC v1.14.2, MC v1.14.1, and v1.14.0, and from TKGI MC v1.13.8 and earlier TKGI MC v1.13 patches.

Features and Resolved Issues

This release of the Tanzu Kubernetes Grid Integrated Edition Management Console includes no new features or resolved issues.

Deprecations

For information about upcoming deprecations, see Deprecations in the TKGI MC v1.14.0 Release Notes below.

Known Issues

Except where noted, the known issues in Tanzu Kubernetes Grid Integrated Edition Management Console v1.14.2 are also in Tanzu Kubernetes Grid Integrated Edition Management Console v1.14.3. See the TKGI MC v1.14.2 Known Issues below.

TKGI v1.14.2 - Withdrawn

Warning: VMware recommends that you upgrade to TKGI v1.14.3 or later as soon as possible to mitigate a memory leak when using Ubuntu Xenial stemcell versions 621.241 through 621.305. For more information, see Slab memory leak on Ubuntu Xenial stemcells in the VMware Tanzu Support Hub documentation.

Release Date: August 29, 2022

Product Snapshot

Release	Details
Version	v1.14.2
Release date	August 29, 2022
Component	Version
Antrea	v1.4.0	Release Notes
cAdvisor	v0.39.1
Containerd for Linux	v1.6.4
CoreDNS	v1.8.6+vmware.9*
CSI Driver for vSphere	v2.5.2	Release Notes
Docker	Linux: v20.10.9 Windows: v20.10.9
etcd	v3.5.4
Harbor	v2.5.3*	Release Notes
Kubernetes	v1.23.7	Release Notes
Metrics Server	v0.5.0
NCP	v3.2.1.2*	Release Notes
Percona XtraDB Cluster (PXC) (in BOSH pxc-release)	v5.7.38-31.59* pxc-release: v0.44.0*	Release Notes: PXC pxc-release
UAA	v74.5.47*
Velero	v1.8.1	Release Notes
VMware Cloud Foundation (VCF)	v4.4** and v4.3.1	Release Notes: v4.4, v4.3.1
Wavefront	Wavefront Collector: v1.9.0 Wavefront Proxy: v10.14
Compatibilities	Versions
Ops Manager	See Broadcom Support.
NSX-T	See VMware Product Interoperability Matrices***.
vSphere	See VMware Product Interoperability Matrices***.
Windows stemcells	v2019.51 or later
Xenial stemcells	See Broadcom Support.

* Components marked with an asterisk have been updated.
** VCF v4.4 is supported but has not been tested with TKGI v1.14.2.
*** To use Policy API features, you must use NSX-T v3.1.3 or later.

Upgrade Path

The supported upgrade paths to TKGI v1.14.2 are from TKGI v1.14.1 and earlier TKGI v1.14 patches, and from TKGI v1.13.6 and earlier TKGI v1.13 patches.

Features and Enhancements

TKGI v1.14.2 has the following features and enhancements:

Supports Reporting Additional Metrics

Supports the following telegraf metrics reporting enhancements:

Supports configuring telegraf to export output using the metric_version=1 format instead of the default metric_version=2 format.
Supports reporting Kubernetes Scheduler Metrics.
Supports reporting telegraf agent process metrics.

For more information, see Configure Telegraf in the Tile in Configuring Telegraf in TKGI. For more information on Telegraf output formats, see Example Output in the Telegraf GitHub documentation.

Resolved Issues

TKGI v1.14.2 resolves the following issues:

[Security Fix] Fixes the following CVEs:
- Open SSL vulnerability CVE-2022-1292.
- Open SSL vulnerability CVE-2022-1343.
- Open SSL vulnerability CVE-2022-1434.
- Open SSL vulnerability CVE-2022-1473.
- Open SSL vulnerability CVE-2022-2068.
- Open SSL vulnerability CVE-2022-2097.
Fixes The Docker Image Remains after Switching to the Containerd Container Runtime.
Fixes Upgrade of Windows cluster that has containerd as the container runtime can fail.
Fixes Kubernetes Services Cannot Be Accessed from a Windows Pod in Some Windows Clusters.
Fixes Healthwatch Unable to Scrape Kube Scheduler Metrics.
Fixes vROPs cAdvisor Daemonset Is Not Running after Upgrading TKGI.
Fixes The vsphere-csi-webhook Is Missing after Upgrading a Cluster.

Known Issues

TKGI v1.14.2 has the following known issues:

Except where noted, the known issues in Tanzu Kubernetes Grid Integrated Edition v1.14.1 are also in Tanzu Kubernetes Grid Integrated Edition v1.14.2. See the TKGI v1.14.1 Known Issues below.

Some Windows Pods Become Unreachable

This issue is fixed in TKGI v1.14.3.

On occasion, the IP Addresses for one or more running Pods in a Windows cluster become unreachable. Pinging from within an unreachable Pod also fails, returning Request timed out..

The unreachable Pods might also enter a CrashLoopBackOff state.

Explanation

CNI requests within the Pod have entered a race condition. Afterward, networking for the Pod is unreachable.

Windows Worker Nodes Are Unresponsive after Update-Cluster and Upgrade-Cluster

This issue is fixed in TKGI v1.14.3.

While updating or upgrading a Windows Worker cluster that uses the containerd container runtime, some of the cluster’s nodes can become unresponsive. During the cluster update or upgrade, the node drain step for the cluster’s nodes can time out and not finish.

Symptom

While updating or upgrading a Windows Worker cluster, the drain step for some nodes hangs for an extended period, and the nodes enter an unresponsive agent state. The update or upgrade process eventually logs the error Error: Timed out sending 'get_task' to instance for the executing drain step.

TKGI Management Console v1.14.2 - Withdrawn

Warning: VMware recommends that you upgrade to TKGI MC v1.14.3 or later as soon as possible to mitigate a memory leak when using Ubuntu Xenial stemcell versions 621.241 through 621.305. For more information, see Slab memory leak on Ubuntu Xenial stemcells in the VMware Tanzu Support Hub documentation.

Release Date: August 29, 2022

Product Snapshot

Element	Details
Version	v1.14.2
Release date	August 29, 2022
Installed Tanzu Kubernetes Grid Integrated Edition version	v1.14.2
Installed Ops Manager version	v2.10.45*	Release Notes
Component	Version
Installed Kubernetes version	v1.23.7	Release Notes
Installed Harbor Registry version	v2.5.3*	Release Notes
Linux stemcell	v621.265*

* Components marked with an asterisk have been updated.

Upgrade Path

The supported upgrade paths to Tanzu Kubernetes Grid Integrated Edition Management Console v1.14.2 are from TKGI MC v1.14.1, and v1.14.0, and from TKGI MC v1.13.7 and earlier TKGI MC v1.13 patches.

Features and Resolved Issues

This release of the Tanzu Kubernetes Grid Integrated Edition Management Console includes no new features or resolved issues.

Deprecations

For information about upcoming deprecations, see Deprecations in the TKGI MC v1.14.0 Release Notes below.

Known Issues

Except where noted, the known issues in Tanzu Kubernetes Grid Integrated Edition Management Console v1.14.1 are also in Tanzu Kubernetes Grid Integrated Edition Management Console v1.14.2. See the TKGI MC v1.14.1 Known Issues below.

TKGI v1.14.1 - Withdrawn

Release Date: July 6, 2022

Warning: If you have Windows clusters, do not upgrade to TKGI v1.14.1. If you must upgrade to 1.14.1, lock the container runtime to use Docker as described in Lock a Cluster to the Docker Container Runtime. For More information, see Upgrade of Windows cluster that has containerd as the container runtime can fail.

Product Snapshot

Release	Details
Version	v1.14.1
Release date	July 6, 2022
Component	Version
Antrea	v1.4.0	Release Notes
cAdvisor	v0.39.1
Containerd for Linux	v1.6.4*
CoreDNS	v1.8.7+vmware.3*
CSI Driver for vSphere	v2.5.2*	Release Notes
Docker	Linux: v20.10.9 Windows: v20.10.9*
etcd	v3.5.4
Harbor	v2.5.1*	Release Notes
Kubernetes	v1.23.7*	Release Notes
Metrics Server	v0.5.0
NCP	v3.2.1.1*
Percona XtraDB Cluster (PXC) (in BOSH pxc-release)	v5.7.36-31.55 pxc-release: v0.42.0	Release Notes: PXC pxc-release
UAA	v74.5.45*
Velero	v1.8.1	Release Notes
VMware Cloud Foundation (VCF)	v4.4** and v4.3.1	Release Notes: v4.4, v4.3.1
Wavefront	Wavefront Collector: v1.9.0 Wavefront Proxy: v10.14
Compatibilities	Versions
Ops Manager	See Broadcom Support.
NSX-T	See VMware Product Interoperability Matrices***.
vSphere	See VMware Product Interoperability Matrices***.
Windows stemcells	v2019.46 or later
Xenial stemcells	See Broadcom Support.

* Components marked with an asterisk have been updated.
** VCF v4.4 is supported but has not been tested with TKGI v1.14.1.
*** To use Policy API features, you must use NSX-T v3.1.3 or later.

Upgrade Path

The supported upgrade paths to TKGI v1.14.1 are from TKGI v1.14.0, and from TKGI v1.13.6 and earlier TKGI v1.13 patches.

Features

TKGI v1.14.1 has the following features and enhancements:

Supports VMware NSX-T v3.2.1.
NSX-T Policy API is now generally available with TKGI and NSX-T v3.2.1 at 50% of Management Plane API scale. For more information, see NSX-T Policy API Support.
Supports accessing images in a private Docker registry from Linux clusters with containerd container runtimes. For more information, see Configuring Cluster Access to Private Docker Registries (Beta).
In TKGI v1.14.1 and later, the nsx_ingress_controller CNI Configuration parameter is deprecated and ignored. Instead, the NSX Ingress Controller is automatically enabled when the NSX Load Balancer is enabled.

For more information, see Configure the HTTP/HTTPS Ingress Controller Network Profile in Creating and Managing Network Profiles.

Resolved Issues

TKGI v1.14.1 resolves the following issues:

[Security Fix] Fixes the following CVEs:
- TKGI API: Diffie-Hellman <= 1024 bits vulnerability CVE-2015-4000.
Fixes Custom BOSH vm_extensions Configuration Settings Applied Inconsistently to Cluster VMs.
Fixes Automatic vSphere CSI Driver Integration Ignores Proxy Configuration.
Fixes Windows Pause Image Not Pulled from Private Harbor Registry.
Fixes Docker Commands No Longer Work on Worker VMs.
Fixes VMDKs Are Deleted during Migration from In-Tree Storage to CSI.
Fixes Kubernetes Pods on NSX-T Become Stuck in a Creating State. This issue is fixed by using NSX-T v3.2 or later.

Known Issues

TKGI v1.14.1 has the following known issues:

Except where noted, the known issues in Tanzu Kubernetes Grid Integrated Edition v1.14.0 are also in Tanzu Kubernetes Grid Integrated Edition v1.14.1. See the TKGI v1.14.0 Known Issues below.

For Known Issues in NCP v3.2.1, see NSX Container Plugin 3.2.1.1 Release Notes.

Upgrade of Windows cluster that has containerd as the container runtime can fail

This issue is fixed in TKGI v1.14.2.

Symptom

Cluster upgrade for a Windows cluster can fail:

If the Windows cluster has containerd as the container runtime.
When you attempt to switch the container runtime from containerd to Docker during the cluster upgrade.

Explanation

The Windows cluster upgrade can fail with the following error message:

Error: Action Failed get_task: Task 472b18b9-484f-4323-77ed-fcabf7f51eae result: Applying: Keeping only needed packages: Uninstalling package bundle: remove /var/vcap/data/packages/containerd-windows/9beecbdcbcbc016def57e61f4914847998868a0d\containerd\containerd-shim-runhcs-v1.exe: Access is denied.

Workaround

Manually remove the VMs that have the issue. BOSH recreates the removed VMs automatically after a few minutes.

The vsphere-csi-webhook Is Missing after Upgrading a Cluster

This issue is fixed in TKGI v1.14.2.

If you have a TKGI cluster that uses the vSphere CSI Drive in an air-gapped environment, the vsphere-csi-webhook will be missing after upgrading the cluster to TKGI v1.14.

Symptom

After upgrading a TKGI cluster to TKGI v1.14, the Pods prefixed with vsphere-csi-webhook- have the status ErrImageNeverPull.

Slab Memory Leak

This issue is fixed in TKGI v1.14.3.

Ubuntu Xenial Stemcell versions 621.241 through 621.305 contain a Linux Kernel slab memory leak. VMware recommends that you upgrade to TKGI v1.14.3 or later as soon as possible to mitigate the memory leak.

For more information, see Slab memory leak on Ubuntu Xenial stemcells in the VMware Tanzu Support Hub documentation.

TKGI Management Console v1.14.1 - Withdrawn

Release Date: July 6, 2022

Product Snapshot

Element	Details
Version	v1.14.1
Release date	July 6, 2022
Installed Tanzu Kubernetes Grid Integrated Edition version	v1.14.1
Installed Ops Manager version	v2.10.43	Release Notes
Component	Version
Installed Kubernetes version	v1.23.7*	Release Notes
Installed Harbor Registry version	v2.5.1*	Release Notes
Linux stemcell	v621.252*

* Components marked with an asterisk have been updated.

Upgrade Path

The supported upgrade paths to Tanzu Kubernetes Grid Integrated Edition Management Console v1.14.1 are from TKGI MC v1.14.0, and from TKGI MC v1.13.6 and earlier TKGI MC v1.13 patches.

Features and Resolved Issues

This release of the Tanzu Kubernetes Grid Integrated Edition Management Console includes no new features or resolved issues.

Deprecations

For information about upcoming deprecations, see Deprecations in the TKGI MC v1.14.0 Release Notes below.

Known Issues

Except where noted, the known issues in Tanzu Kubernetes Grid Integrated Edition Management Console v1.14.0 are also in Tanzu Kubernetes Grid Integrated Edition Management Console v1.14.1. See the TKGI MC v1.14.0 Known Issues below.

TKGI v1.14.0

Release Date: May 12, 2022

Product Snapshot

Warning: If you have Windows clusters, do not upgrade to TKGI v1.14.0. If you must upgrade to 1.14.0, lock the container runtime to use Docker as described in Lock a Cluster to the Docker Container Runtime. For More information, see Upgrade of Windows cluster that has containerd as the container runtime can fail.

Release	Details
Version	v1.14.0
Release date	May 12, 2022
Component	Version
Antrea	v1.4.0	Release Notes
cAdvisor	v0.39.1
Containerd for Linux	v1.6.0*
CoreDNS	v1.8.6+vmware.4*
CSI Driver for vSphere	v2.5.1*	Release Notes
Docker	Linux: v20.10.9 Windows: v20.10.7
etcd	v3.5.4*
Harbor	v2.5.0*	Release Notes
Kubernetes	v1.23.4*	Release Notes
Metrics Server	v0.5.0
NCP	v3.2.1.0**
Percona XtraDB Cluster (PXC) (in BOSH pxc-release)	v5.7.36-31.55* pxc-release: v0.42.0*	Release Notes: PXC pxc-release
UAA	v74.5.39*
Velero	v1.8.1*	Release Notes
VMware Cloud Foundation (VCF)	v4.4*** and v4.3.1	Release Notes: v4.4, v4.3.1
Wavefront	Wavefront Collector: v1.9.0* Wavefront Proxy: v10.14*
Compatibilities	Versions
Ops Manager	See Broadcom Support.
NSX-T	See VMware Product Interoperability Matrices****.
vSphere	See VMware Product Interoperability Matrices****.
Windows stemcells	v2019.46 or later
Xenial stemcells	See Broadcom Support.

* Components marked with an asterisk have been updated.
** NCP v3.2.1.0 does not support NSX-T v3.2.1.
*** VCF v4.4 is supported but has not been tested with TKGI v1.14.0.
**** To use Policy API features, you must use NSX-T v3.1.3 or later.

Upgrade Path

The supported upgrade paths to TKGI v1.14.0 are from Tanzu Kubernetes Grid Integrated Edition v1.13.2, v1.13.1, and v1.13.0.

Breaking Changes

TKGI v1.14.0 has the following breaking changes:

Linux and Windows cluster container runtimes are automatically switched from Docker to containerd during the TKGI v1.14.0 upgrade.
Warning: During a TKGI upgrade, cluster workloads will experience downtime while the cluster switches from using the Docker container runtime to containerd. To avoid workload downtime, VMware recommends that you switch your clusters to the containerd container runtime before upgrading to TKGI v1.14. For more information, see Cluster Workloads Experience Downtime While Upgrading and Switching Container Runtimes below.

Warning: The upgrade to TKGI v1.14.0 will switch a “locked” cluster to using the containerd runtime if, between locking the container runtime and upgrading, you ran tkgi update-cluster without including the lock_container_runtime: true parameter in your configuration. For more information on locking the container runtime, see Customize Cluster Container Runtimes Before Upgrading in Upgrade Preparation Checklist for TKGI.
Docker commands on worker VMs’ no longer work: Clusters using the Docker container runtime no longer default to supporting Docker command line commands. For more information, see Docker Commands No Longer Work on Worker VMs below.
Kubernetes has been upgraded to Kubernetes v1.23.4:
- The CSI Migration feature is enabled by default in Kubernetes v1.23 but is not enabled by default in TKGI 1.14. This allows administrators to control the timing of their migration from an existing in-tree storage driver. The CSI Migration feature remains in Beta for Google Compute Engine (GCE) PD, Amazon Web Services (AWS) EBS, and Azure Disk.
- The PodSecurityPolicy admission controller has been deprecated and replaced with PodSecurity, which is now in Beta. The PodSecurity feature gate is now enabled by default.
For more information about Kubernetes v1.23, see Kubernetes 1.23: The Next Frontier in the Kubernetes documentation.
Support for the manually installed vSphere CSI Driver has been entirely removed.
Warning: Before upgrading to TKGI v1.14, you must prepare your clusters for removing the manually installed vSphere CSI Driver. For information on how to migrate to automatic vSphere CSI Driver installation, see Switch From the Manually Installed vSphere CSI Driver to the Automatic CSI Driver in Deploying and Managing Cloud Native Storage (CNS) on vSphere.
The kube-apiserver audit.log file is in a new location:
- Location in TKGI v1.13.4 and earlier: /var/vcap/sys/log/kube-apiserver/audit.log
- Location in TKGI v1.14.0 and later: /var/vcap/sys/log/kube-apiserver/audit/log/audit.log

Features

TKGI v1.14.0 has the following features and enhancements:

Security Features

Passes additional CIS Kubernetes Benchmarks:

2.1.7: Ensure that the --protect-kernel-defaults argument is set to true.
For more information, see CIS Kubernetes Benchmarks.

Manage Host and Cluster Logs

Supports managing host log customization:

Supports managing host log customization by configuring RSyslog. For more information, see Syslog in the Installing TKGI topic for your environment.

Supports managing cluster log filtering:

Supports filtering LogSink and ClusterLogSink logging. For more information, see Create a Fluent Bit ClusterLogSink or LogSink Filter in Creating and Managing Sink Resources.

Supports the vSphere CSI Driver Topology Features

Supports using the vSphere CSI Driver Topology feature. For more information, see Configure Topology in Deploying and Managing Cloud Native Storage (CNS) on vSphere.

Enhanced Container Runtime Support

Windows Workers Support Containerd-Runtime Containers

Supports Windows workers with containerd-runtime containers.

Migrates Container Runtimes for Linux and Windows Clusters from Docker to Containerd

Supports migrating existing Linux and Windows clusters from a Docker container runtime to the containerd-runtime:

Automatically migrates Docker-runtime containers to the containerd-runtime while upgrading clusters from TKGI v1.13 to v1.14.
Supports manually migrating a single cluster from a Docker container runtime to the containerd container runtime.

Note: Linux and Windows clusters that are not locked to the Docker-runtime will automatically migrate to the containerd container runtime during the TKGI v1.14 cluster upgrade. All Docker-runtime clusters must be migrated to containerd prior to upgrading to TKGI v1.15.

For more information, see Prepare Clusters for Automatic Container Runtime Migration in Upgrade Preparation Checklist for Tanzu Kubernetes Grid Integrated Edition.

Network Profiles Enhancements

Supports the following Network Profiles enhancements:

Supports Configuring Additional NCP Parameters

Network Profiles configuration now supports configuring almost all NCP parameters configurable within BOSH. For more information, see extensions in Creating and Managing Network Profiles.

Supports Updating Additional Network Profile Parameters

Network Profiles configuration now supports updating almost all Network Profile cni_configurations parameters on an existing cluster. For more information, see cni_configurations Parameters in Creating and Managing Network Profiles.

Additional Features

TKGI v1.14.0 includes the following additional features:

Supports configuring the TKGI API Operation Timeout length. For more information, see Networking in Installing TKGI on vSphere with NSX-T.
Supports modifying the floating IP pool of an existing cluster. For more information, see fip_pool_ids in Creating and Managing Network Profiles.
Reuses a TKGI API user’s existing service account and ClusterRoleBinding when executing tkgi get-credentials.
No longer starts Wavefront Pods while Wavefront is deactivated.
Fluent Bit has been upgraded from v1.5.7 to v1.9.0. For more information, see Upgrade Notes in the Fluent Bit documentation.
OpenJDK has been upgraded to v11.0.14. For more information, see OpenJDK v11.0.14 Released in the OpenJDK notification archives.
Antrea has been upgraded from v1.2.2+vmware.0 to v1.5.2+vmware.0. For more information, see Integration of Antrea Container Clusters.

Resolved Issues

TKGI v1.14.0 resolves the following issues:

[Security Fix] Fixes the following CVEs:
- Open SSL vulnerability CVE-2021-3711.
- Open SSL vulnerability CVE-2021-3712.
- Apache Log4j vulnerability CVE-2021-44228.
- Apache Log4j vulnerability CVE-2021-45046.
- Apache Log4j vulnerability CVE-2021-45105.
- Spring Application Remote Code Execution Vulnerability CVE-2022-22965.
[Security Fix] Component bumps fix the following:
- containerd:
  - Fixes containerd CVE-2021-43816: An unprivileged pod might bind mount any privileged regular file on disk for complete read/write access.
- OpenJDK:
  - For information on the resolved security issues, see OpenJDK v11.0.14 Released in the OpenJDK notification archives.
Fixes upgrade errand pre-start script failures in environments where an NSX-T logical port is tagged with a blank string.
Fixes Cluster Upgrading Fails After Upgrading NSX-T and TKGI.
Fixes Multiple CoreDNS Pods Might Run On the Same Worker Node after Upgrading a Cluster.
Fixes CoreDNS Pods Fail While Upgrading to TKGI v1.12.5.
Fixes Docker Service Fails While Upgrading Clusters to TKGI v1.13.3.
Fixes TKGI Does Not Support Updating Dedicated Tier 1 Clusters.
Fixes Cluster Migration from Flannel to Antrea Can Fail for Certain CIDR Addresses.
Fixes Cluster Migration to the vSphere CSI Driver Does Not Support Air-Gapped Environments.
Fixes Cluster Deletion Fails to Delete All NSX-T Objects in Certain Circumstances.
Fixes Using a New Compute Profile to Add and Remove Node Pools Deletes and Recreates the Node Pool VMs.
Fixes Rotate-Certificates Returns ’Could Not Fetch VMs Info’ When Rotating Certificates for More than 1000 Clusters.
Fixes Certificate Rotation Fails If the Cluster Has a Kubernetes Profile.
Fixes The Tier-1 Gateway and Static Routes for the LoadBalancer CRD Are Not Deleted While Deleting Policy API-Based Clusters.
Fixes The TKGI API FQDN Must Not Include Trailing Whitespace.
Fixes Switching Cluster Container Runtimes Two or More Times Fails on Clusters with vROps Enabled.
Fixes kube-apiserver Logs Occupy More Disk Space Than Allocated.
Component bumps fix the following Known Issues:
- etcd:
  - Fixes etcd v3.5.0 Data Corruption Under Heavy Loads.
- Fluent Bit:
  - Fixes Fluent Bit DNS Resolution Timeout Failure.
  - Fixes The logging prefix on containerd runtime cluster log entries is different than the logging prefix on Docker runtime cluster log entries.
- NCP:
  - Fixes Existing NSX-T Load Balancer Domain Ingress Rules Override Newer Subdomain Ingress Rules.

Deprecations

The following TKGI features have been deprecated or removed from TKGI v1.14:

The log_dropped_traffic CNI Configuration parameter: In TKGI v1.15.0 and later, the log_dropped_traffic CNI Configuration parameter will be deprecated and ignored.

To configure logging in a Network Profile, modify the log_firewall_traffic parameter. For more information, see log_settings in cni_configurations Parameters in Creating and Managing Network Profiles.
Manual vSphere CSI Driver Installation Support: Support for manually installing the vSphere CSI Driver has been entirely removed in TKGI v1.14. For information on automatic vSphere CSI Driver installation, see Deploying and Managing Cloud Native Storage (CNS) on vSphere.
In-Tree vSphere Storage Volume Support: In-Tree vSphere Storage volume support has been deprecated and will be entirely removed in a future Kubernetes version. For information on how to manually migrate In-Tree vSphere Storage volumes on existing TKGI clusters from In-Tree vSphere Storage to the automatically installed vSphere CSI Driver, see Migrate from In-Tree vSphere Storage to the vSphere CSI Driver in Deploying and Managing Cloud Native Storage (CNS) on vSphere.
Flannel Support: Support for the Flannel Container Networking Interface (CNI) is deprecated. VMware recommends that you switch your Flannel CNI-configured clusters to the Antrea CNI. For more information about Flannel CNI deprecation, see About Switching from the Flannel CNI to the Antrea CNI in About Tanzu Kubernetes Grid Integrated Edition Upgrades.
Docker Support: Kubernetes support for the Docker container runtime has been deprecated, and support for the Docker container runtime will be entirely removed in Kubernetes v1.24. TKGI v1.14 supports both the Docker and containerd container runtimes.
Pod Security Policy Support: Kubernetes Pod Security Policy (PSP) support has been deprecated and PSP support will be entirely removed in Kubernetes v1.25. Kubernetes v1.23 and v1.24 provide beta support for Pod Security Admission. For more information, see Pod Security Admission and Enforce Pod Security Standards with Namespace Labels in the Kubernetes documentation.

Known Issues

TKGI v1.14.0 has the following known issues.

For Known Issues in NCP v3.2.1, see NSX Container Plugin 3.2.1.1 Release Notes.

TKGI version upgrade without new stemcell fails for Containerd runtime clusters with Istio CNI

Symptom

On clusters configured to use a containerd registry and Istio CNI, upgrading the TKGI version without also upgrading the stemcell fails with errors kubelet cannot find istio-cni binary and nsx fails to recieve message header.

This error does not occur when you upgrade to a new stemcell along with the new TKGI version.

Explanation

When TKGI cluster upgrades and drains the node during upgrade, it leaves the cluster nodes’ Istio CNI agent and CNI configuration in a corrupted state.

If the cluster nodes are not automatically re-created by a stemcell change, the corrupted Istio CNI state remains.

Workaround

For clusters that use both Containerd and Istio CNI:

If you have already encountered this issue, re-create all worker nodes using the bosh recreate command:
1. Run the bosh vms command to list the cluster VMs:
```
bosh -d service-instance-DEPLOYMENT-ID vms
```
  Where DEPLOYMENT-ID is the BOSH-generated ID of your Kubernetes cluster deployment.
2. For each VM instance listed as worker/UUID in the output, run bosh recreate VM-NAME:
```
bosh -d service-instance-DEPLOYMENT-ID recreate worker/UUID
```
In the future, you can avoid this issue by upgrading a cluster’s stemcell whenever you upgrade its TKGI version.

Certificate Rotation Removes the SNI Certificate IDs on the HTTPS Load Balancer Virtual Server

When a security certificate is updated through the Tanzu Kubernetes Grid Integrated Edition CLI, the rotation workflow updates the default certificate IDs in the HTTPS Load Balancer (LB) virtual server. When this process runs, the workflow accidentally removes the server name indicator (SNI) certificate IDs on the LB Virtual Server.

Kubernetes Services Cannot Be Accessed from a Windows Pod in Some Windows Clusters

This issue is fixed in TKGI v1.14.2.

When accessing Kubernetes services from a Windows Pod, the following error is logged in some Windows clusters:

curl: (7) Failed to connect to 192.168.1.1 port 8080: Timed out

The Windows Pod is unable to communicate with the Kubernetes service.

TKGI MC Unable to Manage TKGI after Restoring the TKGI Control Plane from Backup

Symptom

After you restore Ops Manager and the TKGI API VM from backup, TKGI functions normally, but your TKGI MC tabs include the following error: “…product ‘pivotal-container service’ is not deployed…”.

Explanation

TKGI MC is associated with an Ops Manager with a specific name. If you rename Ops Manager with a new name while restoring, your TKGI MC will not recognize the restored Ops Manager and cannot manage it.

Kubernetes Pods on NSX-T Become Stuck in a Creating State

This issue is fixed by using NSX-T v3.2 or later.

Symptom

The pods in your TKGI Kubernetes clusters on NSX-T become stuck in a creating state. The connections between nsx-node-agent and hyperbus repeatedly close, log Couldn't connect to 'tcp://...' (error: 111-Connection refused), and have a status of COMMUNICATION_ERROR.

Explanation

For information and workaround steps for this Known Issue, see Issue 2795268: Connection between nsx-node-agent and hyperbus flips and Kubernetes pod is stuck at creating state in NSX Container Plugin 3.1.2 Release Notes in the VMware documentation.

Error: Could Not Execute “Apply-Changes” in Azure Environment

Symptom

After clicking Apply Changes on the TKGI tile in an Azure environment, you experience an error ‘…could not execute “apply-changes”…’ with either of the following descriptions:

{“errors”:{“base”:[“undefined method ‘location’ for nil:NilClass”]}}
FailedError.new(“Resource Groups in region ‘#{location}’ do not support Availability Zones”))

For example:

INFO | 2020-09-21 03:46:49 +0000 | Vessel::Workflows::Installer#run | Install product (apply changes)
2020/09/21 03:47:02 could not execute "apply-changes": installation failed to trigger: request failed: unexpected response from /api/v0/installations:
HTTP/1.1 500 Internal Server Error
Transfer-Encoding: chunked
Cache-Control: no-cache, no-store
Connection: keep-alive
Content-Type: application/json; charset=utf-8
Date: Mon, 21 Sep 2020 17:51:50 GMT
Expires: Fri, 01 Jan 1990 00:00:00 GMT
Pragma: no-cache
Referrer-Policy: strict-origin-when-cross-origin
Server: Ops Manager
Strict-Transport-Security: max-age=31536000; includeSubDomains
X-Content-Type-Options: nosniff
X-Download-Options: noopen
X-Frame-Options: SAMEORIGIN
X-Permitted-Cross-Domain-Policies: none
X-Request-Id: f5fc99c1-21a7-45c3-7f39
X-Runtime: 9.905591
X-Xss-Protection: 1; mode=block

44
{"errors":{"base":["undefined method `location' for nil:NilClass"]}}
0

Explanation

The Azure CPI endpoint used by Ops Manager has been changed and your installed version of Ops Manager is not compatible with the new endpoint.

Workaround

Run the following Ops Manager CLI command:

om --skip-ssl-validation --username USERNAME --password PASSWORD --target https://OPSMAN-API curl --silent --path /api/v0/staged/director/verifiers/install_time/IaasConfigurationVerifier -x PUT -d '{ "enabled": false }'

Where:

USERNAME is the account to use to run Ops Manager API commands.
PASSWORD is the password for the account.
OPSMAN-API is the IP address for the Ops Manager API

For more information, see Error ‘undefined method location’ is received when running Apply Change on Azure in the VMware Tanzu Knowledge Base.

VMware vRealize Operations Does Not Support Windows Worker-Based Kubernetes Clusters

VMware vRealize Operations (vROPs) does not support Windows worker-based Kubernetes clusters and cannot be used to manage TKGI-provisioned Windows workers.

TKGI Wavefront Requires Manual Installation for Windows Workers

To monitor Windows-based worker node clusters with a Wavefront collector and proxy, you must first install Wavefront on the clusters manually, using Helm. For instructions, see the Wavefront section of the Monitoring Windows Worker Clusters and Nodes topic.

Pinging Windows Worker Kubernetes Clusters Does Not Work

TKGI-provisioned Windows worker-based Kubernetes clusters inherit a Kubernetes limitation that prevents outbound ICMP communication from workers. As a result, pinging Windows workers does not work.

For information about this limitation, see Limitations > Networking in the Windows in Kubernetes documentation.

Velero Does Not Support Backing Up Stateful Windows Workloads

You can use Velero to back up stateless TKGI-provisioned Windows workers only. You cannot use Velero to back up stateful Windows applications. For more information, see Velero on Windows in Basic Install in the Velero documentation.

Tanzu Mission Control Integration Not Supported on GCP

TKGI on Google Cloud Platform (GCP) does not support Tanzu Mission Control (TMC) integration, which is configured in the Tanzu Kubernetes Grid Integrated Edition tile > the Tanzu Mission Control pane.

If you intend to run TKGI on GCP, skip this pane when configuring the Tanzu Kubernetes Grid Integrated Edition tile.

TMC Data Protection Feature Requires Privileged TKGI Containers

TMC Data Protection feature supports privileged TKGI containers only. For more information, see Plans in the Installing TKGI topic for your IaaS.

Windows Worker Kubernetes Clusters with Group Managed Service Account Do Not Support Compute Profiles

Windows worker-based Kubernetes clusters integrated with group Managed Service Account (gMSA) cannot be managed using compute profiles.

Windows Worker Kubernetes Clusters on Flannel Do Not Support Compute Profiles

On vSphere with NSX-T networking you can use compute profiles with both Linux and Windows worker‑based Kubernetes clusters. On vSphere with Flannel networking, you can apply compute profiles only to Linux clusters.

TKGI CLI Does Not Prevent Reducing the Control Plane Node Count

TKGI CLI does not prevent accidentally reducing a cluster’s control plane node count using a compute profile.

Warning: Reducing a cluster’s control plane node count can destroy the cluster. Do not scale out or scale in existing control plane nodes by reconfiguring the TKGI tile or by using a compute profile. Reducing a cluster’s number of control plane nodes might remove a control plane node and cause the cluster to become inactive.

Windows Cluster Nodes Not Deleted After VM Deleted

Symptom

After you delete a VM using the management console of your infrastructure provider, you notice a Windows worker node that had been on that VM is now in a notReady state.

Solution

To identify the leftover node:
```
kubectl get no -o wide
```
Locate nodes on the returned list that are in a notReady state and have the same IP address as another node in the list.
To manually delete a notReady node:
```
kubectl delete node NODE-NAME
```
Where NODE-NAME is the name of the node in the notReady state.

502 Bad Gateway After OIDC Login

Symptom

You experience a “502 Bad Gateway” error from the NSX load balancer after you log in to OIDC.

Explanation

A large response header has exceeded your NSX-T load balancer maximum response header size. The default maximum response header size is 10,240 characters and should be resized to 16,384.

Workaround

If you experience this issue, manually reconfigure your NSX-T request_header_size to 4096 characters and your response_header_size to 16384. For information about configuring NSX default header sizes, see OIDC Response Header Overflow in the Knowledge Base.

Difficulty Changing Proxy for Windows Workers

You must configure a global proxy in the Tanzu Kubernetes Grid Integrated Edition tile > Networking pane before you create any Windows workers that use the proxy.

You cannot change the proxy configuration for Windows workers in an existing cluster.

Character Limitations in HTTP Proxy Password

For vSphere with NSX-T, the HTTP Proxy password field does not support the following special characters: & or ;.

Error After Modifying Your Harbor Storage Configuration

Symptom

You receive the following error after modifying your existing Harbor installation’s storage configuration:

Error response from daemon: manifest for ... not found: manifest unknown: manifest unknown

Explanation

Harbor does not support modifying an existing Harbor installation’s storage configuration.

Workaround

To modify your Harbor storage configuration, re-install Harbor. Before starting Harbor, configure the new Harbor installation with the desired configuration.

Ingress Controller Statefulset Fails to Start After Resizing Worker Nodes

Symptom

Permissions are removed from your cluster’s files and processes after resizing the persistent disk during a cluster upgrade. The ingress controller statefulset fails to start.

Explanation

When resizing a persistent disk, Bosh migrates the data from the old disk to the new disk but does not copy the files’ extended attributes.

Workaround

To resolve the problem, complete the steps in [Ingress controller statefulset fails to start after resize of worker nodes with permission denied] (https://knowledge.broadcom.com/external/article/298618/ingress-controller-statefulset-fails-to.html?language=en_US) in the VMware Tanzu Knowledge Base.

Azure Default Security Group Is Not Automatically Assigned to Cluster VMs

Symptom

You experience issues when configuring a load balancer for a multi-control plane node Kubernetes cluster or creating a service of type LoadBalancer. Additionally, in the Azure portal, the VM > Networking page does not display any inbound and outbound traffic rules for your cluster VMs.

Explanation

As part of configuring the Tanzu Kubernetes Grid Integrated Edition tile for Azure, you enter Default Security Group in the Kubernetes Cloud Provider pane. When you create a Kubernetes cluster, Tanzu Kubernetes Grid Integrated Edition automatically assigns this security group to each VM in the cluster. However, on Azure the automatic assignment might not occur.

As a result, your inbound and outbound traffic rules defined in the security group are not applied to the cluster VMs.

Workaround

If you experience this issue, manually assign the default security group to each VM NIC in your cluster.

One Plan ID Longer than Other Plan IDs

Symptom

One of your plan IDs is one character longer than your other plan IDs.

Explanation

In TKGI, each plan has a unique plan ID. A plan ID is normally a UUID consisting of 32 alphanumeric characters and 4 hyphens. However, the Plan 4 ID consists of 33 alphanumeric characters and 4 hyphens.

Solution

You can safely configure and use Plan 4. The length of the Plan 4 ID does not affect the functionality of Plan 4 clusters.

If you require all plan IDs to have identical length, do not activate or use Plan 4.

Database Cluster Stops After a Database Instance is Stopped

Symptom

After you stop one instance in a multiple-instance database cluster, the cluster stops, or communication between the remaining databases times out, and the entire cluster becomes unreachable.

The following might be in your UAA log:

WSREP has not yet prepared node for application use

Explanation

The database cluster is unable to recover automatically because a member is no longer available to reconcile quorum.

Velero Back Up Fails for vSphere PVs Attached to Clusters on Kubernetes v1.20 and Later

Symptom

Backing up vSphere persistent volumes using Velero fails and your Velero backup log includes the following error:

rpc error: code = Unknown desc = Failed during IsObjectBlocked check: Could not translate selfLink to CRD name

Explanation

This is a known issue when backing up clusters on Kubernetes v1.20 and later using the Velero Plugin for vSphere v1.1.0 or earlier.

Workaround

To resolve the problem, complete the steps in Velero backups of vSphere persistent volumes fail on Kubernetes clusters version 1.20 or higher (83314) in the VMware Tanzu Knowledge Base.

Creating Two Windows Clusters at the Same Time Fails

Symptom

The first time that you try to create two Windows clusters at the same time, the creation of one of the clusters fails. If you run pks cluster CLUSTER-NAME to examine the last action taken on the cluster, you see the following:

 Last Action: Create Last Action State: failed Last Action Description: Instance provisioning failed: There was a problem completing your request. … operation: create, error-message: Failed to acquire lock … locking task id is 111, description: ‘create deployment’

Explanation

This is a known issue that occurs the first time that you create two Windows clusters concurrently.

Workaround

Recreate the failed cluster. This issue only occurs the first time that you create two Windows clusters concurrently.

Deleted Clusters are Listed in Cluster Lists

Symptom

After running tkgi delete-cluster and cluster deletion has completed, the deleted cluster continues to be listed when running tkgi clusters.

Workaround

You must manually remove the deleted cluster using a customized version of the ncp_cleanup script. For more information, see Deleting a Tanzu Kubernetes Grid Integrated Edition cluster with “tkgi delete-cluster” stuck “in progress” status in the VMware Tanzu Knowledge Base.

BOSH Director Logs the Error ‘Duplicate vm extension name’

Symptom

After you uninstall TKGI, then reinstall TKGI in the same environment, BOSH Director logs errors similar to the following:

.../gems/bosh-director-0.0.0/lib/bosh/director/deployment_plan/cloud_manifest_parser.rb:120:in `parse_vm_extensions': Duplicate vm extension name 'disk_enable_uuid' (Bosh::Director::DeploymentDuplicateVmExtensionName)

Explanation

The pivotal-container-service cloud-config was not removed when you uninstalled the TKGI tile, and it remained active. When you reinstalled the TKGI tile, an additional pivotal-container-service cloud-config was created, causing the metrics_server to fall into a crash-loop state.

Workaround

You must manually remove the pivotal-container-service cloud-config after removing your TKGI deployment, including after removing the TKGI tile from Ops Manager.

For more information, see “Duplicate vm extension name” error when metrics_server runs on Director VM in Tanzu Kubernetes Grid Integrated Edition in the VMware Tanzu Community Knowledge Base.

The TKGI API FQDN Must Not Include Trailing Whitespace

Symptom

Your TKGI logs include the following error:

'uaa'. Errors are:- Error filling in template 'uaa.yml.erb' (line 59: Client redirect-uri is invalid: uaa.clients.pks_cli.redirect-uri Client redirect-uri is invalid: uaa.clients.pks_cluster_client.redirect-uri)

Explanation

The TKGI API fully-qualified domain name (FQDN) for your cluster contains leading or trailing whitespace.

Workaround

Do not include whitespace in the TKGI tile API Hostname (FQDN) field.

TMC Cluster Data Protection Backup Fails After Upgrading TKGI

The TMC Cluster Data Protection Backup fails in TKGI environments upgraded from an earlier version.

Symptom

The TMC Cluster Data Protection Backup fails to back up your existing clusters and logs the following error:

error executing custom action (groupResource=customresourcedefinitions.apiextensions.k8s.io, namespace=, name=ncpconfigs.nsx.vmware.com): rpc error: code = Unknown desc = error fetching v1beta1 version of ncpconfigs.nsx.vmware.com: the server could not find the requested resource

Explanation

Kubernetes v1.22 disallows the spec.preserveUnknownFields: true configuration in your existing clusters and the creation of a v1 CustomResourceDefinitions configuration fails.

TMC Cluster Data Protection Restore Fails When Using Antrea CNI

The TMC Cluster Data Protection Restore operation can fail when restoring multiple Antea resources.

Symptom

The TMC Cluster Data Protection Restore fails and logs errors that requests to restore the admission webhook have been denied.

Explanation

Velero has encountered a race condition while operating a resource. For more information, see Allow customizing restore order for Kubernetes controllers and their managed resources in the Velero GitHub repository.

TKGI Does Not Support CVDS / NVDS Mixed Environments

TKGI does not support environments where there are multiple matching networks, such as a mixed CVDS/NVDS environment.

Symptom

TKGI logs errors similar to the following in an environment with multiple matching networks:

LastOperationstatus='failed', description='Instance provisioning failed:
There was a problem completing your request. Please contact your operations team providing the following information:
service: p.pks, service-instance-guid: ..., broker-request-id: ..., task-id: ..., operation: create,
error-message: Unknown CPI error 'Unknown' with message 'undefined method `mob' for <VimSdk::Vim::OpaqueNetwork:' in create_vm' CPI method

Explanation

TKGI cannot identify which of the matching networks you intend to use and has selected the wrong network.

Occasionally update-cluster Does Not Complete for Windows Workers

Occasionally, tkgi update-cluster hangs while updating a Windows worker node instance and the BOSH task cannot finish and exits.

Symptom

The ovsdb-server service has stopped but other processes report that it is running.

Explanation

The ovsdb-server.pid file uses the pid for a process that is not the ovsdb-server.

To confirm that this is the root cause for tkgi update-cluster to hang:

To verify the ovsdb-server service has actually stopped, run the PowerShell Get-services command on the Windows worker node.

To verify that other processes report the ovsdb-server service is still running:

Review the ovsdb-server job-service-wrapper.err.log log file.
The job-service-wrapper.err.log log file is located at:
```
C:\var\vcap\sys\log\openvswitch-windows\ovsdb-server\job-service-wrapper.err.log
```

Confirm that after the flushing processes, the log includes an error similar to the following:

Pid-Guard : ovsdb-server is already runing, please stop it first
At C:\var\vcap\jobs\openvswitch-windows\bin\ovsdb-server_ctl.ps1:30 char:5
+     Pid-Guard $PIDFILE "ovsdb-server"
+     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : NotSpecified: ( [Write-Error], WriteErrorException
    + FullyQualifiedErrorId : Microsoft.PowerShell.Commands.WriteErrorException,Pid-Guard

To verify the root cause:

Run the following PowerShell commands on the Windows worker node:

$RUN_DIR = "C:\var\vcap\sys\run\openvswitch-windows"
$PIDFILE = "$RUN_DIR\ovsdb-server.pid"
$pid1 = Get-Content $PidFile -First 1
echo $pid1
$rst = Get-Process -Id $pid1 -ErrorAction SilentlyContinue
echo $rst

Confirm the returned ProcessName is not ovsdb-server.

Workaround

To resolve this issue for a single Windows worker:

SSH to the affected worker node.

Run the following:

rm C:\var\vcap\sys\run\openvswitch-windows\ovsdb-server.pid

Wait for the ovsdb-server process to start.
Confirm the dependent services also start.

Harbor Private Projects Are Inaccessible after Upgrading to TKGI v1.13.0

If LDAP is enabled, Harbor private projects are inaccessible after upgrading to TKGI v1.13.0. For more information, see Private projects become inaccessible after upgrading Harbor for TKGI to v2.4.x with LDAP feature enabled in the VMware Tanzu Knowledge Base.

Deployments Fail on TKGI Windows Worker-based Kubernetes Clusters after the January 2022 Microsoft Windows Security Patch

Microsoft changed Microsoft Windows’ support for tar file commands in the January 2022 Microsoft Windows security patch.

Packaging scripts that use tar commands for Windows worker-based Kubernetes Cluster deployments can fail after the Microsoft tar command patch update has been applied.

The BOSH agent used by vSphere stemcells built by stembuild v2019.43 and earlier use tar commands that are no longer supported and will fail if the Microsoft Windows security patch has been applied.

Workaround

Stembuild v2019.44 and later include a version of the BOSH agent that does not use unsupported tar commands.

If you use vSphere stemcells, use Stembuild 2019.44 or later to avoid the BOSH agent tar error.

Automatic vSphere CSI Driver Integration Ignores Proxy Configuration

This issue is fixed in TKGI v1.14.1.

In an environment configured with a proxy, automatic vSphere CSI Driver Integration ignores the configured proxy and fails.

Symptom

In environments configured with a Proxy and automatic vSphere CSI Driver Integration enabled, the csi-controller and csi-syncer services log errors similar to the following and fail:

"caller":"vsphere/virtualcenter.go:154","msg":"failed to create new client with err: Post \"...\": dial tcp: lookup ...: no such host"

Workaround

To configure the vSphere CSI Driver to use your proxy:

SSH to TKGI Control Plane VM.
Open the /var/vcap/jobs/csi-controller/bin/csi_controller_ctl configuration file for editing.

Locate the start_csi_controller function in the configuration file:

start_csi_controller()
{
    ...
    export ...
    export ...
    export ...
    ...
}

Locate the export commands in the start_csi_controller function and add the following to the group of export commands:
```
export HTTP_PROXY=HTTP-PROXY
export HTTPS_PROXY=HTTPS-PROXY
export NO_PROXY=NO-PROXY
```
Where:
- HTTP-PROXY is the HTTP Proxy that the vSphere CSI Driver must use.
- HTTPS-PROXY is the HTTPS Proxy that the vSphere CSI Driver must use.
- NO-PROXY is a list of host names that the vSphere CSI Driver should not use a proxy for.
Restart the csi_controller:
```
sudo monit restart csi_controller
```

Fluent Bit Does Not Merge Containerd Runtime Cluster Multi-Line Entries

This issue is fixed in TKGI v1.14.3.

The Fluent Bit Docker, CRI, Go, Java, and Python multi-line parser does not merge containerd runtime cluster log entries belonging to the same context into a single log entry.

Windows Pause Image Not Pulled from Private Harbor Registry

This issue is fixed in TKGI v1.14.1.

Symptom

Your Windows clusters using the containerd container runtime in an air-gapped environment always attempt to pull the Windows pause image from projects-stg.registry.vmware.com/v2/tkg/pause/manifests/3.4.1-windows-amd64 instead of your designated Harbor registry.

Errors similar to the following are logged when the cluster fails to pull the Windows pause image:

Warning FailedCreatePodSandBox 4s (x3 over 75s) kubelet Failed to create pod ...:
rpc error: code = Unknown desc = failed to get ... image "projects-stg.registry.vmware.com/tkg/pause:3.4.1-windows-amd64":
failed to pull image "projects-stg.registry.vmware.com/tkg/pause:3.4.1-windows-amd64":
failed to pull and unpack image "projects-stg.registry.vmware.com/tkg/pause:3.4.1-windows-amd64":
failed to resolve reference "projects-stg.registry.vmware.com/tkg/pause:3.4.1-windows-amd64":
failed to do request: Head "https://projects-stg.registry.vmware.com/v2/tkg/pause/manifests/3.4.1-windows-amd64":
dial tcp ...: connectex: A connection attempt failed because the connected party did not properly respond after a period of time,
or established connection failed because connected host has failed to respond.

Workaround

You can work around this issue using your existing container runtime or swapping container runtimes.

For each Windows Worker:

To continue using the containerd container runtime in your cluster:
1. Manually pull the desired Windows image to the worker.
2. Tag the image: projects-stg.registry.vmware.com/v2/tkg/pause/manifests/3.4.1-windows-amd64.
To switch to the Docker container runtime to resolve this issue:
1. Complete the steps in Switch a Cluster to a Different Container Runtime in Upgrade Preparation Checklist for Tanzu Kubernetes Grid Integrated Edition.

Docker Commands No Longer Work on Worker VMs

This issue is fixed in TKGI v1.14.1.

Clusters using the Docker container runtime no longer default to supporting Docker command line commands.

Supporting containerd as the default container runtime requires Docker-specific features incompatible with containerd are deactivated by default.

Workaround

In an environment where you want to run Docker commands, complete one of the following:

Export the Docker environments variables before using Docker commands:

source /var/vcap/jobs/docker/bin/envrc

For example:

source /var/vcap/jobs/docker/bin/envrc

docker images
REPOSITORY                                                                   TAG                                        IMAGE ID       CREATED         SIZE
...                                                                          1a3337bb81890b6bb1848b5dd4565dfa5d124f38   ffb57751a939   3 months ago    182MB

Use absolute Docker paths when referencing Docker:

/var/vcap/packages/docker/bin/docker --host unix:///var/vcap/sys/run/docker/docker.sock

For example:

/var/vcap/packages/docker/bin/docker --host unix:///var/vcap/sys/run/docker/docker.sock images
REPOSITORY                                                                   TAG                                        IMAGE ID       CREATED         SIZE
...                                                                          1a3337bb81890b6bb1848b5dd4565dfa5d124f38   ffb57751a939   3 months ago    182MB

Custom BOSH vm_extensions Configuration Settings Applied Inconsistently to Cluster VMs

This issue is fixed in TKGI v1.14.1.

A vm_extensions configuration will not be applied consistently to the VMs in a cluster if the initial application of the configuration was interrupted by an error.

Explanation

A cluster will have an inconsistency between its cloud-config and BOSH manifest if the initial application of a vm_extensions configuration is interrupted and fails. The inconsistency results because BOSH rolls back manifest changes after an error occurs but there is not a corresponding function that rolls back the cluster cloud-config changes.

Attempting to re-apply the vm_extensions configuration to the cluster has no effect because the cluster cloud-config indicates the cluster has already been updated.

Workaround

To export the cloud-config of the cluster with vm_extensions configuration inconsistencies:

bosh configs --type=cloud --name=BOSH-DEPLOYMENT > YAML-FILENAME

Where:

BOSH-DEPLOYMENT is the service-instance_-prefixed BOSH deployment name.
YAML-FILENAME is the output YAML file name.

For example:

 bosh config –type=cloud –name=service-instance_cdd7f46b-3151-4d67-85a6-5531fa456b71 > service-instance_cdd7f46b-3151-4d67-85a6-5531fa456b71-cloud-config.yaml

(Optional) Create a backup of the cloud-config you just exported:

For example:

 cp service-instance_cdd7f46b-3151-4d67-85a6-5531fa456b71-cloud-config.yaml service-instance_cdd7f46b-3151-4d67-85a6-5531fa456b71-cloud-config-withextension.yaml

Open the exported cloud-config in an editor.
Remove the vm_extensions added by the interrupted vm_extensions configuration that are tagged with “master-vmext” and “worker-vmext”.

You do not need to delete the default configuration tagged with “master-nsgroup” that was also added by the interrupted vm_extensions configuration.
Save your modified exported cloud-config.
Locate the UAA credentials for the environment. See Credentials > uaa_client_credentials in the Tanzu Kubernetes Grid Integrated Edition Tile.

Export BOSH_CLIENT and BOSH_CLIENT_SECRET using the UAA credentials collected above.

For example:

export BOSH_CLIENT=pivotal-container-service-5531fa456b7185a646bc export BOSH_CLIENT_SECRET=‘4d6785a65531fa456c’

To update cluster specific cloud-config:

bosh update-config --type=cloud --name=BOSH-DEPLOYMENT YAML-FILENAME

Where:

BOSH-DEPLOYMENT is the service-instance_-prefixed BOSH deployment name.
YAML-FILENAME is your modified exported cloud-config YAML file name.

For example:

bosh update-config –type=cloud –name=service-instance_cdd7f46b-3151-4d67-85a6-5531fa456b71 service-instance_cdd7f46b-3151-4d67-85a6-5531fa456b71-cloud-config.yaml

To update the cluster recreate VMs with the correct manifest:
```
tkgi update-cluster CLUSTER-NAME --config-file CONFIG-FILENAME
```
Where:
- CLUSTER-NAME is the name of your cluster.
- CONFIG-FILENAME is the filename of your vm_extensions configuration file.
For example:
```
tkgi update-cluster pks-cluster-55 --config-file bosh-extension.json
```

Switching Your Default CNI to Antrea is Not Supported

This issue is fixed in TKGI v1.14.3.

You cannot switch your default CNI from Flannel to Antrea during TKGI upgrade if TKGI is running on Ops Manager v2.10.40 or later.

VMDKs Are Deleted during Migration from In-Tree Storage to CSI

This issue is fixed in TKGI v1.14.1.

While migrating a cluster from using the In-Tree vSphere Storage Driver to the automatically installed vSphere CSI Driver, VMDKs attached to worker VMs might be deleted.

For more information, see vsphere-csi-driver the vSphere CSI Driver Release Notes in GitHub.

Healthwatch Unable to Scrape Kube Scheduler Metrics

This issue is fixed in TKGI v1.14.2.

Symptom

After upgrading to TKGI v1.14, Healthwatch fails to scrape Kube Scheduler metrics and logs errors similar to the following:

caller=scrape.go:1294 level=debug component="scrape manager" scrape_pool=tkgi-cluster-scheduler target=... msg="Scrape failed" err="server returned HTTP status 403 Forbidden"

Explanation

Changes in Kubernetes 1.23 require the addition of new parameters in order to make the Kube Scheduler metrics endpoint available externally. The Kube Scheduler process also uses a different port than previous TKGI versions. Healthwatch is unable to scrape Kube Scheduler metrics using the existing default Healthwatch configuration.

Workaround

On each Kubernetes cluster Control Plane node that you want to monitor:

Make backup copies of the /var/vcap/jobs/kube-scheduler/config/bpm.yml and /var/vcap/jobs/metrics-sink/config/telegraf.conf configuration files.
Edit the /var/vcap/jobs/kube-scheduler/config/bpm.yml YAML file.

Add the following to the args: section in the configuration file:

"--authentication-kubeconfig=/var/vcap/jobs/kube-scheduler/config/kubeconfig"
"--authorization-kubeconfig=/var/vcap/jobs/kube-scheduler/config/kubeconfig"

Save your changes.
Edit the /var/vcap/jobs/metrics-sink/config/telegraf.conf configuration file.

Add a new Kube Scheduler metrics scraper under the existing inputs section in the configuration file:

## Kube Scheduler Metrics
[[inputs.prometheus]]
urls = ["
[https://localhost:10259/metrics]
"]
metric_version = 2
insecure_skip_verify = true
tls_cert = "/var/vcap/jobs/metric-sink/config/monitoring_metric_cert.pem"
tls_key = "/var/vcap/jobs/metric-sink/config/monitoring_metric_cert.key"

Save your changes.
In Healthwatch’s Grafana, import the modified “Kubernetes Scheduler” dashboard configuration file.
In Healthwatch’s Grafana, import the modified “Kubernetes Cluster Detail” dashboard configuration file.

If BOSH later updates the Kubernetes Cluster Control Plane, you must reconfigure the kube-scheduler bpm.yml and the metrics-sink telegraf.conf file for every master node in each cluster.

Cluster Workloads Experience Downtime While Upgrading and Switching Container Runtimes

The workloads on a cluster will experience a period of downtime if the cluster runtime is automatically switched from Docker to containerd during a TKGI upgrade.

Explanation

By default, clusters are switched from the Docker container runtime to containerd while upgrading TKGI to a newer TKGI version.

If a TKGI upgrade switches a cluster container runtime, the workloads on that cluster will experience a period of downtime.

Administrators have the option to switch container runtimes manually before upgrading TKGI. Administrators can also tag a cluster to not switch container runtimes automatically during a TKGI upgrade.

Workaround

To avoid workload downtime, manually switch the cluster’s container runtime to containerd before upgrading to TKGI v1.14 or configure the cluster so that the TKGI upgrade does not switch the cluster’s container runtime.

If you must allow the container runtime switch to occur during a TKGI upgrade, avoid workload downtime by completing the following steps before upgrading to TKGI v1.14:

Pull down the coreDNS image used in your TKGI v1.13 Kubernetes clusters.
- If you are upgrading TKGI from v1.13.2 through v1.13.6 to TKGI v1.14.0 and later:
```
docker pull projects.registry.vmware.com/tkg/coredns:v1.8.4_vmware.7
```
- If you are upgrading from TKGI v1.13.0 or v1.13.1 to TKGI v1.14.0 and later:
  1. SSH to one of the worker nodes.
  2. Use the following alias docker command:
```
alias docker="/var/vcap/packages/docker/bin/docker --host unix:///var/vcap/sys/run/docker/docker.sock"
```
  3. Export the coreDNS image using docker save.
Upload the coreDNS image to your custom image registry.
Modify the coreDNS deployment to use the uploaded image. Repeat this step for all of the TKGI Kubernetes clusters with workloads that should remain up during the upgrade.
Upgrade TKGI from v1.13 to TKGI v1.14.

During worker upgrades the coreDNS image in the custom image registry is available, avoiding coreDNS downtime. After upgrading a cluster, the workers use the coreDNS version installed by the TKGI upgrade.

The Docker Image Remains after Switching to the Containerd Container Runtime

This issue is fixed in TKGI v1.14.2.

The Docker image is not automatically removed after successfully switching a cluster from the Docker container runtime to containerd.

Workaround

To remove a Docker image after switching a cluster’s container runtime:

Manually remove the cluster’s /var/vcap/store/docker directory.

vROPs cAdvisor Daemonset Is Not Running after Upgrading TKGI

This issue is fixed in TKGI v1.14.2.

After upgrading TKGI to TKGI v1.14.0, the vROPs cAdvisor daemonset cannot be deployed on clusters with Pod Security Policy (PSP) enabled.

Symptom

The vROPs cAdvisor daemonset does not deploy on a cluster and errors similar to the following are logged:

Warning FailedCreate... daemonset-controller Error creating: pods "..." is forbidden: PodSecurityPolicy: unable to admit pod:

Workaround

To restart vROPs cAdvisor daemonset:

Open the cluster’s PSP configuration in an editor.

Configure privileged: true and allowPrivilegeEscalation: true in the spec settings.

For example:

---
apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
  name: a-vrops-psp
spec:
  privileged: true
  allowPrivilegeEscalation: true
....

Save your edits and update the cluster with the revised configuration file.

For more information on configuring PSP, see Enabling and Configuring Pod Security Policies.

Timeout While Switching Container Runtimes If the Docker Directory Is Too Large

This issue is fixed in TKGI v1.14.3.

Switching a cluster’s container runtime from Docker to containerd can timeout and fail if the Docker directory contains many files.

Description

A timeout occurs during the remove Docker step while switching a cluster’s container runtime from Docker to containerd if the Docker directory contains too many files to delete within the 180-second timeout interval.

Workaround

To work around this issue:

Manually remove the /var/vcap/store/docker directory.
Re-start the process that stopped due to the time out.

Persistent Volumes Fail to Detach from Nodes

This issue is fixed in TKGI v1.14.3.

If a Pod is recreated in a new instance node, the persistent volume might remain attached to the old node.

Symptom

A persistent volume remains attached to an old node, and attachment errors similar to the following are logged:

Warning FailedMount... kubelet Unable to attach or mount volumes: unmounted volumes=..., unattached volumes=...: timed out waiting for the condition
Warning FailedMount... kubelet Unable to attach or mount volumes: unmounted volumes=..., unattached volumes=...: timed out waiting for the condition
Warning FailedAttachVolume... attachdetach-controller AttachVolume.Attach failed for volume...: 
rpc error: code = Internal desc = failed to attach disk:... with node:... err failed to attach cns volume:... to node vm:.... 
fault: "(*types.LocalizedMethodFault)(0xc000c88d80)({\n DynamicData: (types.DynamicData)

For more information, see Persistent volume fails to be detached from a node in VMware vSphere Container Storage Plug-in 2.5 Release Notes.

Pods on Clusters Using the containerd-Runtime Enter a CrashLoopBackOff State

This issue is fixed in TKGI v1.14.3.

Pods in a cluster that has been switched from the Docker container runtime to containerd might enter a CrashLoopBackOff state. If the container runtime switch is part of a cluster upgrade, the upgrade halts.

Symptom

The Pods that have entered the CrashLoopBackOff state log the following:

Warning FailedCreatePodSandBox... Failed to create pod sandbox: rpc error: code = 
Unknown desc = failed to create containerd task: failed to start shim: 
write /var/vcap/sys/run/containerd/io.containerd.runtime.v2.task/.../config.json: 
no space left on device: unknown

The /var/vcap/data/sys/run directory on the instance node with Pods that have entered the CrashLoopBackOff state is full.

CSI Driver Image Missing After High Disk Utilization

This issue is fixed in TKGI v1.14.5.

The CSI driver might be missing on a worker node in an air-gapped environment.

Explanation

A garbage collection event is triggered when a worker node experiences low available disk capacity. To increase storage capacity, the garbage collector deletes unused image files, deleting the CSI driver image.

Usually, a worker node automatically pulls a replacement CSI driver image if the image is missing, but in an air-gapped environment, it cannot.

‘Input not an X.509 certificate’ When Applying Change on the TKGI Tile

This issue is fixed in TKGI v1.14.5.

The TKGI tile might report an error similar to the following when Applying Changes with a correctly formatted certificate.

Setting up key store, trust store and installing certs.
keytool error: java.lang.Exception: Input not an X.509 certificate
pre-start.stdout.log

Explanation

The certificate contains one or more certificate keywords, for example, BEGIN or END, and does not validate.

TKGI Clusters Fail after NSX Upgrade If They Use NSGroup Policy API Resources

TKGI supports clusters that use NSGroup Policy API resources, but Policy API NSGroups created in one NSX version will be empty after upgrading NSX to a newer version.

Workaround

BOSH reconfigures a deployment’s NSGroup members if the deployment is redeployed.

After upgrading NSX, redeploy affected deployments to reconfigure their NSGroup members:

Re-Apply Changes on the Ops Manager UI to redeploy TKGI tile deployments.
Re-deploy the affected cluster deployments.

HTTPS Ingress Outage During VMware NSX Certificate Rotation

This issue is fixed in TKGI v1.14.7.

When the TKGI CLI rotates VMware NSX certificates, HTTPS Ingress with customer-defined TLS experiences a brief outage.

Explanation

When the TKGI CLI rotates VMware NSX certificates, it updates the default certificate IDs in the HTTPS Load Balancer (LB) virtual server, and removes the server name indicator (SNI) certificate IDs on the LB Virtual Server. This causes a brief outage of HTTPS Ingress with customer-defined TLS. After the certificate rotation, NSX Container Plugin (NCP) restarts and resets the removed SNI.

Limitations on Using the VMware vSphere CSI Driver

The VMware vSphere CSI Driver supports a limited set of VMware vSphere features. Before enabling the vSphere CSI Driver on a TKGI cluster, confirm the cluster and storage configuration are supported by the driver. For more information, see Unsupported Features and Limitations in Deploying and Managing Cloud Native Storage (CNS) on vSphere.

Limitations on Using a Public Cloud CSI Driver

If you enable a public cloud CSI Driver on a TKGI cluster, you must take additional steps before deleting，upgrading, or updating the cluster.

Deleting a Cluster on a Public Cloud

When deleting a cluster that uses a public cloud CSI Driver:

Manually delete the workload PVCs and PVs before deleting the cluster.
Delete the cluster. For more information on deleting clusters, see Deleting Clusters.

Upgrading a Cluster on a Public Cloud

When upgrading a cluster that uses a public cloud CSI Driver:

No preparation steps are needed when upgrading a multi-worker node cluster.
To prepare a single-worker node cluster for upgrading:
1. Resize the cluster to two or more worker nodes before upgrading the cluster. For more information, see Scaling Existing Clusters.
2. Upgrade the cluster. For more information on upgrading clusters, see Upgrading Clusters.

Updating a Cluster on a Public Cloud

When updating a cluster that uses a public cloud CSI Driver:

No preparation step are needed when updating a multi-worker node cluster.
To prepare a single-worker node cluster for updating:
1. Resize the cluster to two or more worker nodes before updating the cluster. For more information, see Scaling Existing Clusters.
2. Update the cluster.

Kubernetes API Server and etcd Daemon Occasionally Fail to Start During BBR Restore

The Kubernetes API server or the etcd daemon on a cluster control plane node might not start during a BBR restore, stopping the restore.

Symptom

During a BBR restore, the post-restore-unlock script occasionally times out while starting the etcd daemon or Kubernetes API server.

For example, the post-restore-unlock script shows the following when the etcd daemon fails to start:

Error attempting to run post-restore-unlock for job bbr-etcd on master...
+ NAME=post-restore-unlock
+ LOG_DIR=/var/vcap/sys/log/bbr-etcd
+ exec
++ tee -a /var/vcap/sys/log/bbr-etcd/post-restore-unlock.stdout.log
...
monit has started etcd
+ timeout 1200 /bin/bash
waiting for etcd daemon to start
Process 'etcd'     not monitored - start pending
...
waiting for etcd daemon to start
Process 'etcd'     initializing
etcd daemon was unable to start after 1200 seconds
+ exit 1 - exit code 1

Workaround

Restart the BBR restore if the Kubernetes API server or the etcd daemon fails to start.

Rotated TKGI Certificates Remain Listed as Expiring on the Ops Manager Certificates List

This issue is fixed in TKGI v1.14.7.

After rotating certificates, the Ops Manager list of certificates shows the pks_api_internal_2018certificate on each cluster remains expiring on the original expiration date.

Explanation

The Ops Manager list of certificates is displaying stale data for pks_api_internal_2018 certificates.

TKGI Certificate Rotation Might Remove NSX Ingress Certificates from TKGI

This issue is fixed in TKGI v1.14.7.

When rotating certificates using the tkgi rotate-certificates --only-nsx command, TKGI certificate rotation might remove the TKGI certificates used for ingress from the NSX virtual server. For example, rotate-certificates --only-nsx might remove the NSX Load Balancer certificate or the NSX Manager Superuser Principal Identity certificate from TKGI when this issue occurs.

Explanation

When tkgi rotate-certificates --only-nsx rotates the NSX ingress certificates, rotate-certificates removes and then replaces the certificates. After removing the ingress certificates, rotate-certificates relies on NCP for a list of the NSX ingress certificates to restore, but NCP v3.2.1 provides an incomplete list of certificates.

This occurs because NCP v3.2.1 does not collate the paginated certificate list returned by the NSX API and might provide tkgi rotate-certificates with an incomplete list of certificates:

On NSX-T v3.2: NCP returns a maximum of 50 NSX-T certificates.
On NSX-T v3.1.3: NCP returns a maximum of 1000 NSX-T certificates.

Workaround

To manually rotate an admin-defined NSX-T certificate that is not rotated using the tkgi rotate-certificates --only-nsx command:

Delete the TLS secret.
Recreate the TLS secret.

The Validator Secret Certificate Is Not Rotated

This issue is fixed in TKGI v1.14.7.

The certificates signed by pks-ca, for example, the pks-system namespace validator, event-controller, and fluent-bit secret certificates, are not rotated by running tkgi rotate-certificates and are not automatically rotated during cluster upgrades.

Workaround

To rotate the certificates signed by pks-ca:

Delete the event-controller, fluent-bit, and pks-system namespace validator secrets.
If you also want to rotate the pks-ca certificate, delete the pks-ca secret.
To generate a new pks-ca certificate and or leaf certificates, apply the cert-generator job:
1. Backup the cert-generator job as yaml.
2. Delete the cert-generator job.
3. Apply the backup cert-generator yaml.
4. Restart event-controller, fluent-bit, and the pks-system namespace validator.

TKGI Management Console v1.14.0

Release Date: May 12, 2022

Product Snapshot

Element	Details
Version	v1.14.0
Release date	May 12, 2022
Installed Tanzu Kubernetes Grid Integrated Edition version	v1.14.0
Installed Ops Manager version	v2.10.39	Release Notes
Component	Version
Installed Kubernetes version	v1.23.4*	Release Notes
Installed Harbor Registry version	v2.5.0*	Release Notes
Linux stemcell	v621.236*

* Components marked with an asterisk have been updated.

Upgrade Path

The supported upgrade path to Tanzu Kubernetes Grid Integrated Edition Management Console v1.14.0 is from TKGI Management Console v1.13.2, v1.13.1, and v1.13.0.

Features and Resolved Issues

TKGI Management Console v1.14.0 includes the following features and enhancements:

Supports configuring the TKGI API Operation Timeout length.
Configure the TKGI API Operation Timeout by modifying the nsx_feign_client_read_timeout property in the TKGI MC Configuration File. For more information, see Generate Configuration File and Deploy Tanzu Kubernetes Grid Integrated Edition in Deploy TKGI by Using the Configuration Wizard.
Fixes TKGI MC Validation Error If the Deployment DNS IP Address Is Reconfigured after Installing TKGI in a BYOT Environment.
Fixes TKGI MC Does Not Support Multi-Data Center Environments.
Fixes TKGI MC Does Not Support Single-Edge Node Configurations in Automated NAT Mode.

Deprecations

The following TKGI features have been deprecated or removed from TKGI Management Console v1.14:

Known Issues

The Tanzu Kubernetes Grid Integrated Edition Management Console v1.14.0 has the following known issues:

vRealize Log Insight Integration Does Not Support HTTPS Connections

Symptom

The Tanzu Kubernetes Grid Integrated Edition Management Console integration to vRealize Log Insight does not support connections to the HTTPS port on the vRealize Log Insight server.

Workaround

Use SSH to log in to the Tanzu Kubernetes Grid Integrated Edition Management Console appliance VM.
Open the file /lib/systemd/system/pks-loginsight.service in a text editor.
Add -e LOG_SERVER_ENABLE_SSL_VERIFY=false.

Set -e LOG_SERVER_USE_SSL=true.

The resulting file should look like the following example:

ExecStart=/bin/docker run --privileged --restart=always --network=pks
-v /var/log/journal:/var/log/journal
--name=pks-loginsight
-e TYPE=gear2-vm
-e LOG_SERVER_HOST=${LOGINSIGHT_HOST}
-e LOG_SERVER_PORT=${LOGINSIGHT_PORT}
-e LOG_SERVER_ENABLE_SSL_VERIFY=false
-e LOG_SERVER_USE_SSL=true
-e LOG_SERVER_AGENT_ID=${LOGINSIGHT_ID}
pksoctopus/vrli-journald:v07092019

Save the file and run systemctl daemon-reload.
To restart the vRealize Log Insight service, run systemctl restart pks-loginsight.service.

Tanzu Kubernetes Grid Integrated Edition Management Console can now send logs to the HTTPS port on the vRealize Log Insight server.

vSphere HA causes Management Console ovfenv Data Corruption

Symptom

If you enable vSphere HA on a cluster, if the TKGI Management Console appliance VM is running on a host in that cluster, and if the host reboots, vSphere HA recreates a new TKGI Management Console appliance VM on another host in the cluster. Due to an issue with vSphere HA, the ovfenv data for the newly created appliance VM is corrupted and the new appliance VM does not boot up with the correct network configuration.

Workaround

In the vSphere Client, right-click the appliance VM and select Power > Shut Down Guest OS.
Right-click the appliance again and select Edit Settings.
Select VM Options and click OK.
Verify under Recent Tasks that a Reconfigure virtual machine task has run on the appliance VM.
Power on the appliance VM.

Base64 encoded file arguments are not decoded in Kubernetes profiles

Symptom

Some file arguments in Kubernetes profiles are base64 encoded. When the management console displays the Kubernetes profile, some file arguments are not decoded.

Workaround

Run echo "$content" | base64 --decode

Network profiles not immediately selectable

Symptom

If you create network profiles and then try to apply them in the Create Cluster page, the new profiles are not available for selection.

Workaround

Log out of the management console and log back in again.

Real-Time IP information not displayed for network profiles

Symptom

In the cluster summary page, only default IP pool, pod IP block, node IP block values are displayed, rather than the real-time values from the associated network profile.

Workaround

None

Error After Modifying Your Harbor Storage Configuration

Symptom

You receive the following error after modifying your existing Harbor installation’s storage configuration:

Error response from daemon: manifest for ... not found: manifest unknown: manifest unknown

Explanation

Harbor does not support modifying an existing Harbor installation’s storage configuration.

Workaround

To modify your Harbor storage configuration, re-install Harbor. Before starting Harbor, configure the new Harbor installation with the desired configuration.

Windows Stemcells Must be Re-Imported After Upgrading Ops Manager

Symptom

After upgrading Ops Manager, your Management Console does not recognize a Windows stemcell imported when using the prior version of Ops Manager.

Workaround

If your Management Console does not recognize a Windows stemcell after upgrading Ops Manager:

Re-import your previously imported Windows stemcell.
Apply Changes to TKGI MC.

Your New Clusters Are Not Shown In Tanzu Mission Control

Symptom

After you create a cluster, Tanzu Mission Control does not include the cluster in cluster lists. You have a “Resource not found” error similar to the following in your BOSH logs:

Cluster Name in TMC: cluster-1
Cluster Name Prefix: tkgi-my-prefix-
Group Name in TMC: my-prefix-clusters
Cluster Description in TMC: VMware Enterprise PKS Attaching cluster ''tkgi-my-prefix-cluster-1'' to TMC
Fetching token successful
request POST:/v1alpha1/clusters,
response 404 Not Found:{"error":"Resource not found - clustergroup(my-prefix-clusters)
org id(d859dc9f-g622-426d-8c91-939a9f13dea9)",
"code":5,"message":"Resource not found - clustergroup(my-prefix-clusters)

Explanation

The cluster group you assign a cluster to must be defined in Tanzu Mission Control before you assign your cluster to the cluster group in the TKGI Management Console.

Workaround

To resolve the problem, complete the steps in Attaching a Tanzu Kubernetes Grid Integrated (TKGI) cluster to Tanzu Mission Control (TMC) fails with “Resource not found - clustergroup(cluster-group-name)” in the VMware Tanzu Knowledge Base.

Previous nsx-t-superuser-certificate Is Restored during TKGI MC Upgrade

This issue is fixed in TKGI MC v1.14.7.

Upgrading the TKGI MC after rotating the nsx-t-superuser-certificate certificate restores the previous nsx-t-superuser-certificate certificate. For example, this issue occurs if you upgrade TKGI MC after following the steps in How to renew the nsx-t-superuser-certificate used by Principal Identity user (80355).

TKGI MC Unable to Create a Network Profile Configured with Source IP Ingress Persistence

This issue is fixed in TKGI MC v1.14.7.

The TKGI MC halts and returns the following error when creating a network profile which includes an ingress_persistence_settings.persistence_type configuration:

Failed to save network profile. ingress_persistence_settings.persistence_type in body should be one of [none cookie]