What's in the Release Notes
Updated on: April 06, 2021
The release notes cover the following topics:
What's New
VMware vSphere with Tanzu has monthly patches to introduce new features and capabilities, provide updates to Kubernetes and other services, keep up with upstream, and to resolve reported issues. Here we document what each monthly patch delivers.
- What's New March 09, 2021
- What's New February 02, 2021
- What's New December 17, 2020
- What's New October 6, 2020
- What's New August 25, 2020
- What's New July 30, 2020
- What's New June 23, 2020
- What's New May 19, 2020
What's New March 09, 2021
March 09, 2021 Build Information ESXi 7.0 | 09 MAR 2021 | ISO Build 17630552 vCenter Server 7.0 | 09 MAR 2021 | ISO Build 17694817 VMware NSX Advanced Load Balancer | 12 OCT 2020 | 20.1.X |
New Features
- Supervisor Cluster
- Support of NSX Advanced Load Balancer for a Supervisor Cluster configured with VDS networking - You can now enable a Supervisor Cluster with NSX Advanced Load Balancer (Avi Networks) for L4 load balancing, as well as load balancing for the control plane nodes of Supervisor and Tanzu Kubernetes clusters. Check out the documentation page for guidance on configuring the NSX Advanced Load Balancer.
- Upgrade of the Supervisor Cluster to Kubernetes 1.19 with auto-upgrade of a Supervisor Cluster running Kubernetes 1.16 - You can upgrade the Supervisor Cluster to Kubernetes 1.19. With this update, the following Supervisor Cluster versions are supported: 1.19, 1.18, and 1.17. Supervisor Clusters running Kubernetes 1.16 will be automatically upgraded to 1.17 once vCenter Server is updated. This will ensure all your Supervisor Clusters are running with the supported version of Kubernetes.
- Expansion of PersistentVolumeClaims (PVCs) - You can now expand existing volumes by modifying the PersistentVolumeClaim object, even when the volume is in active use. This applies to volumes in the Supervisor Cluster and Tanzu Kubernetes clusters.
- Management of Supervisor Cluster lifecycle using vSphere Lifecycle Manager – For Supervisor Clusters configured with NSX-T networking, you can use vSphere Lifecycle Manager for infrastructure configuration and lifecycle management.
- Tanzu Kubernetes Grid Service for vSphere
- Support for private container registries – vSphere administrators and Kubernetes platform operators can now define additional Certificate Authority certificates (CAs) to use in Tanzu Kubernetes clusters for trusting private container registries. This feature enables Tanzu Kubernetes clusters to pull container images from container registries that have enterprise or self-signed certificates. You can configure private CAs as a default for Tanzu Kubernetes clusters on a Supervisor Cluster-wide basis or per-Tanzu Kubernetes Cluster. Read more about how to configure support for private container registries to Tanzu Kubernetes clusters by visiting the documentation page.
- User-defined IPs for Service type: LoadBalancer with NSX-T and NSX Advanced Load Balancer – Kubernetes application operators can now provide a user-defined LoadBalancerIP when configuring a Service type: LoadBalancer allowing for a static IP endpoint for the service. This advanced feature requires either NSX-T load balancing or the NSX Advanced Load Balancer with the Supervisor Cluster. Learn how to configure this feature by visiting the documentation page.
- ExternalTrafficPolicy and LoadBalancerSourceRanges for Service type: LoadBalancer with NSX-T – Kubernetes application operators can now configure the ExternalTrafficPolicy of 'local' for Services to propagate the client IP address to the end pods. You also can define loadBalancerSourceRanges for Services to restrict which client IPs can access the load balanced service. These two advanced features require NSX-T load balancing with the Supervisor Cluster. A Service sketch combining these load balancer settings appears after this list.
- Kubernetes version management and indications – You can now use kubectl to inspect the compatibility of TanzuKubernetesReleases with the underlying Supervisor Cluster environment. Tanzu Kubernetes clusters now also indicate whether there is a Kubernetes upgrade available and recommend the next TanzuKubernetesRelease(s) to use. For more information on using this new feature, see the documentation page.
- Improved Cluster Status at a Glance – In a previous release, VMware expanded WCPCluster and WCPMachine CRDs by implementing conditional status reporting to surface common problems and errors. With the vSphere 7.0 Update 2 release, we enhanced TanzuKubernetesCluster CRDs to summarize conditional status reporting for subsystem components, supplying immediate answers and fine-grained guidance to help you investigate issues. Learn how to configure this feature by visiting the documentation page.
- Per-Tanzu Kubernetes cluster HTTP Proxy Configuration – You can now define the HTTP/HTTPS Proxy configuration on a per-Tanzu Kubernetes cluster basis or, alternately, define it on a Supervisor Cluster-wide basis through a default configuration. For information on configuring this feature, see the documentation page.
- Support for Tanzu Kubernetes Grid Extensions – In-cluster extensions are now fully supported on Tanzu Kubernetes Grid Service, including Fluent Bit, Contour, Prometheus, AlertManager, and Grafana.
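As referenced in the Service type: LoadBalancer items above, the following is a minimal Service sketch that combines a user-defined load balancer IP with externalTrafficPolicy and loadBalancerSourceRanges; the name, IP address, and CIDR range are placeholder values, and the fields shown are standard Kubernetes Service fields rather than anything specific to vSphere with Tanzu:
apiVersion: v1
kind: Service
metadata:
  name: example-lb-svc                 # placeholder name
spec:
  type: LoadBalancer
  loadBalancerIP: 10.10.10.50          # placeholder user-defined static IP
  externalTrafficPolicy: Local         # propagate the client IP to the end pods (NSX-T)
  loadBalancerSourceRanges:
  - 192.0.2.0/24                       # placeholder client CIDR allowed to reach the service
  selector:
    app: example
  ports:
  - port: 80
    targetPort: 8080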
Update Considerations for Tanzu Kubernetes Clusters
The vSphere 7.0 Update 2 release includes functionality that automatically upgrades the Supervisor Cluster when vCenter Server is updated. If you have Tanzu Kubernetes clusters provisioned in your environment, read Knowledge Base Article 82592 before upgrading to vCenter Server 7.0 Update 2. The article provides guidance on running a pre-check to determine whether any Tanzu Kubernetes cluster will become incompatible after the Supervisor Cluster is auto-upgraded.
Resolved Issues
- The embedded container registry SSL certificate is not copied to Tanzu Kubernetes cluster nodes
- When the embedded container registry is enabled for a Supervisor Cluster, the Harbor SSL certificate is not included in any Tanzu Kubernetes cluster nodes created on that SC, and you cannot connect to the registry from those nodes.
- Post upgrade from Tanzu Kubernetes Grid 1.16.8 to 1.17.4, the “guest-cluster-auth-svc” pod on one of the control plane nodes is stuck at “Container Creating” state
- After updating a Tanzu Kubernetes Cluster from Tanzu Kubernetes Grid Service 1.16.8 to 1.17.4, the "guest-cluster-auth-svc" pod on one of the cluster control plane nodes is stuck at "Container Creating" state
- User is unable to manage existing pods on a Tanzu Kubernetes cluster during or after performing a cluster update
- User is unable to manage existing pods on a Tanzu Kubernetes cluster during or after performing a cluster update.
- Tanzu Kubernetes cluster Upgrade Job fails with “timed out waiting for etcd health check to pass.”
- The upgrade job in the vmware-system-tkg namespace associated with the upgrade of a Tanzu Kubernetes cluster fails with the following error message "timed out waiting for etcd health check to pass." The issue is caused by the missing PodIP addresses for the etcd pods.
- Antrea CNI not supported in current TKC version
- While provisioning a Tanzu Kubernetes cluster, you receive the error "Antrea CNI not supported in current TKC version."
Option 1 (recommended): Update the Tanzu Kubernetes cluster to use the OVA version that supports Antrea (v1.17.8 or later).
Option 2: In the Tanzu Kubernetes cluster specification YAML, enter "calico" in the spec.settings.network.cni section (see the sketch below).
Option 3: Change the default CNI to Calico. Refer to the topic in the documentation on how to do this.
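As an illustration of Option 2, here is a minimal sketch of the relevant portion of a cluster manifest; the cluster name and namespace are placeholders, and it assumes the v1alpha1 TanzuKubernetesCluster API in which the CNI is selected by name under spec.settings.network.cni:
apiVersion: run.tanzu.vmware.com/v1alpha1
kind: TanzuKubernetesCluster
metadata:
  name: example-tkc              # placeholder cluster name
  namespace: example-namespace   # placeholder vSphere Namespace
spec:
  settings:
    network:
      cni:
        name: calico             # request Calico instead of the default CNI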
What's New February 02, 2021
February 02, 2021 Build Information ESXi 7.0 | 17 DEC 2020 | ISO Build 17325551 vCenter Server 7.0 | 02 FEB 2021 | ISO Build 17491101 |
New Features
- Supervisor Cluster
- Update of Supervisor Clusters with vSphere networking: You can now update Supervisor Clusters that use vSphere networking from an older version of Kubernetes to the newest version available. New Supervisor Cluster versions also make the latest Tanzu Kubernetes Grid Service features available.
Resolved Issues
- New Tanzu Kubernetes Grid Service features were unavailable in existing Supervisors with vSphere networking
- In the previous release, new Tanzu Kubernetes Grid Service capabilities and bug-fixes were only available in newly created Supervisor Clusters when vSphere networking was used. In this release, users can now update Supervisor Clusters with vSphere networking to take advantage of the latest Tanzu Kubernetes Grid Service features and bug-fixes.
What's New December 17, 2020
December 17, 2020 Build Information ESXi 7.0 | 17 DEC 2020 | ISO Build 17325551 vCenter Server 7.0 | 17 DEC 2020 | ISO Build 17327517 |
Note: To take advantage of new Tanzu Kubernetes Grid Service capabilities and bug-fixes in this release, you need to create a new Supervisor cluster if vSphere networking is used.
New Features
- Supervisor Cluster
- Supervisor Namespace Isolation with Dedicated T1 Router – Supervisor Clusters using NSX-T networking use a new topology where each namespace has its own dedicated T1 router.
- Newly created Supervisor Clusters use this new topology automatically.
- Existing Supervisor Clusters are migrated to this new topology during an upgrade.
- Supervisor Clusters Support NSX-T 3.1.0 – Supervisor Clusters are compatible with NSX-T 3.1.0.
- Supervisor Cluster Version 1.16.x Support Removed – Support for Supervisor Cluster version 1.16.x is now removed. Supervisor Clusters running 1.16.x should be upgraded to a newer version.
- Tanzu Kubernetes Grid Service for vSphere
- HTTP/HTTPS Proxy Support – Newly created Tanzu Kubernetes clusters can use a global HTTP/HTTPS Proxy for egress traffic as well as for pulling container images from internet registries.
- Integration with Registry Service – Newly created Tanzu Kubernetes clusters work out of the box with the vSphere Registry Service. Existing clusters, once updated to a new version, also work with the Registry Service.
- Configurable Node Storage – Tanzu Kubernetes clusters can now mount an additional storage volume to virtual machines thereby increasing available node storage capacity. This enables users to deploy larger container images that might exceed the default 16GB root volume size.
- Improved status information – WCPCluster and WCPMachine Custom Resource Definitions now implement conditional status reporting. Successful Tanzu Kubernetes cluster lifecycle management depends on a number of subsystems (for example, Supervisor, storage, networking) and understanding failures can be challenging. Now WCPCluster and WCPMachine CRDs surface common status and failure conditions to ease troubleshooting.
Resolved Issues
- Missing new default VM Classes introduced in vSphere 7.0 U1
- After upgrading to vSphere 7.0.1, and then performing a vSphere Namespaces update of the Supervisor Cluster, running the command "kubectl get virtualmachineclasses" did not list the new VM class sizes 2x-large, 4x-large, 8x-large. This has been resolved and all Supervisor Clusters will be configured with the correct set of default VM Classes.
What's New October 6, 2020
October 6, 2020 Build Information ESXi 7.0 | 06 OCT 2020 | ISO Build 16850804 vCenter Server 7.0 | 06 OCT 2020 | ISO Build 16860138 |
New Features
- Supervisor Cluster
- Configuration of Supervisor Clusters with vSphere networking – We introduced vSphere networking for Supervisor Clusters, enabling you to deliver a developer-ready platform using your existing network infrastructure.
- Support of HAProxy load balancer for setting up Supervisor Clusters with vSphere networking – If you configure Supervisor Clusters with vSphere networking, you need to add a load balancer to handle your modern workloads. You can deploy and set up your load balancer with an HAProxy OVA.
- Management of Supervisor Cluster lifecycle using vSphere Lifecycle Manager – For Supervisor Clusters configured with vSphere networking, you can use vSphere Lifecycle Manager for infrastructure configuration and lifecycle management.
- Opportunity to try vSphere with Tanzu on your hardware – We now offer you an in-product-trial if you want to enable a Supervisor Cluster on your hardware and test this modern application platform at no additional cost.
- Tanzu Kubernetes Grid Service for vSphere
- Exposure of Kubernetes versions to DevOps users — We introduced a new 'TanzuKubernetesRelease' custom resource definition in the Supervisor Cluster. This custom resource definition provides detailed information to the DevOps user about the Kubernetes versions they can use in their Tanzu Kubernetes clusters.
- Integration of VMware Container Networking with Antrea for Kubernetes – We integrated a commercially supported version of Antrea as the default Container Network Interface (CNI) for new Tanzu Kubernetes clusters. Antrea brings a comprehensive suite of enterprise network policy features to Tanzu Kubernetes Grid Service. For more details, read the release announcement. While Antrea is the default CNI, vSphere administrators and DevOps users can still choose Calico as the CNI for Tanzu Kubernetes clusters.
- Support of Supervisor cluster environments that use vSphere networking – We now support Supervisor Cluster environments that use vSphere networking so you can leverage your existing network infrastructure.
Resolved Issues
- No listing. This is a feature release.
What's New August 25, 2020
August 25, 2020 Build Information ESXi 7.0 | 23 JUN 2020 | ISO Build 16324942 vCenter Server 7.0 | 25 AUG 2020 | ISO Build 16749653 |
New Features
- None, this is simply a bug-fix release.
Resolved Issues
- High CPU utilization upon upgrading to the July 30 patch
- vCenter Server exhibits high CPU utilization after upgrading to the July 30 patch. This issue is now fixed.
- Supervisor cluster enablement failure due to certificate with Windows line endings
- Enabling a Supervisor Cluster can fail if the certificate contains Windows line endings. This issue is now fixed.
What's New July 30, 2020
July 30, 2020 Build Information ESXi 7.0 | 23 JUN 2020 | ISO Build 16324942 vCenter Server 7.0 | 30 JUL 2020 | ISO Build 16620007 |
New Features
- Supervisor cluster: new version of Kubernetes, support for custom certificates and PNID changes
- The Supervisor cluster now supports Kubernetes 1.18.2 (along with 1.16.7 and 1.17.4)
- Replacing machine SSL certificates with custom certificates is now supported
- vCenter PNID update is now supported when there are Supervisor clusters in vCenter Server
- Tanzu Kubernetes Grid Service for vSphere: new features added for cluster scale-in, networking and storage
- Cluster scale-in operation is now supported for Tanzu Kubernetes Grid service clusters
- Ingress firewall rules are now enforced by default for all Tanzu Kubernetes Grid service clusters
- New versions of Kubernetes now ship regularly, asynchronously to vSphere patches; current versions are 1.16.8, 1.16.12, 1.17.7, and 1.17.8
- Network service: new version of NCP
- SessionAffinity is now supported for ClusterIP services
- IngressClass, PathType, and Wildcard domain are supported for Ingress in Kubernetes 1.18
- Client Auth is now supported in Ingress Controller
- Registry service: new version of Harbor
- The Registry service is now upgraded to Harbor 1.10.3
For more information and instructions on how to upgrade, refer to the Updating vSphere with Tanzu Clusters documentation.
Resolved Issues
- Tanzu Kubernetes Grid Service cluster NTP sync issue
What's New June 23, 2020
June 23, 2020 Build Information ESXi 7.0 | 23 JUN 2020 | ISO Build 16324942 vCenter Server 7.0 | 23 JUN 2020 | ISO Build 16386292 Tanzu Kubernetes clusters OVA: v1.16.8+vmware.1-tkg.3.60d2ffd |
New Features
- None, this is simply a bug-fix release.
Resolved Issues
- Tanzu Kubernetes Grid Service cluster upgrade failure
- We have resolved an issue where upgrading a Tanzu Kubernetes Grid Service cluster could fail with the error "Error: unknown previous node"
- Supervisor cluster upgrade failure
- We have resolved an issue where a Supervisor cluster update may get stuck if the embedded Harbor is in a failed state
What's New May 19, 2020
May 19, 2020 Build Information ESXi 7.0 | 2 APR 2020 | ISO Build 15843807 vCenter Server 7.0 | 19 MAY 2020 | ISO Build 16189094 Tanzu Kubernetes clusters OVA: v1.16.8+vmware.1-tkg.3.60d2ffd |
New Features
- Tanzu Kubernetes Grid Service for vSphere: rolling upgrade and services upgrade
- Customers can now perform rolling upgrades over their worker nodes and control plane nodes for the Tanzu Kubernetes Grid Service for vSphere, and upgrade the pvCSI, Calico, and authsvc services. This includes pre-checks and upgrade compatibility for this matrix of services.
- Rolling upgrades can be used to vertically scale worker nodes, that is, change the VM class of your worker nodes to a smaller or larger size (see the sketch after this list).
- Supervisor cluster: new versions of Kubernetes, upgrade supported
- The Supervisor cluster now supports Kubernetes 1.17.4
- The Supervisor cluster now supports upgrading from Kubernetes 1.16.x to 1.17.x
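As a sketch of the vertical scaling mentioned in the Tanzu Kubernetes Grid Service item above, the worker VM class can be changed in the cluster specification, which triggers a rolling upgrade of the worker nodes; the cluster name, namespace, and VM class below are placeholders, and the field path assumes the v1alpha1 TanzuKubernetesCluster API:
kubectl patch tanzukubernetescluster example-tkc -n example-namespace --type merge -p '{"spec":{"topology":{"workers":{"class":"best-effort-large"}}}}'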
Resolved Issues
- Naming conflict for deleted namespaces
- We have resolved an issue where, if a user deleted a vSphere namespace and then created a new vSphere namespace with the same name, we had a naming collision that resulted in being unable to create Tanzu Kubernetes clusters.
- Improved distribution names
- We have made clearer which version of Kubernetes you are running by moving OVF versioning information to a separate column.
Build Information for the Initial vSphere with Kubernetes Release
April 2, 2020 Build Information ESXi 7.0 | 2 APR 2020 | ISO Build 15843807 vCenter Server 7.0 | 2 APR 2020 | ISO Build 15952498 Tanzu Kubernetes clusters OVA: v1.16.8+vmware.1-tkg.3.60d2ffd |
Learn About vSphere with Tanzu
VMware provides a variety of resources you can use to learn about vSphere with Tanzu.
- Learn how to configure, manage, and use vSphere with Tanzu by reading vSphere with Tanzu Configuration and Management. Designed for vSphere system administrators and DevOps teams, this guide provides details on vSphere with Tanzu architecture, services, licensing, system requirements, set up, and usage.
- Use the VMware Compatibility Guides to learn about hardware compatibility and product interoperability for vSphere with Tanzu. vSphere with Tanzu has the same hardware requirements as vSphere 7.0. For certain configurations, it also requires the use of NSX-T Edge virtual machines, and those VMs have their own smaller subset of CPU compatibility. See the NSX-T Data Center Installation Guide for more information.
- Find out what languages vSphere with Tanzu is available in by visiting the Internationalization section of the vSphere 7.0 Release Notes. These are the same languages VMware provides for vSphere.
- View the copyrights and licenses for vSphere with Tanzu open source components by visiting the Open Source section of the vSphere 7.0 Release Notes. The vSphere 7.0 Release Notes also tell you where to download vSphere open source components.
Known Issues
The known issues are grouped as follows.
Supervisor Cluster
- Pod creation sometimes fails on a Supervisor Cluster when DRS is set to Manual mode
Clusters where you enable workload management also must have HA and automated DRS enabled. Enabling workload management on clusters where HA and DRS are not enabled or where DRS is running in manual mode can lead to inconsistent behavior and Pod creation failures.
Workaround: Enable DRS on the cluster and set it to Fully Automated or Partially Automated. Also ensure that HA is enabled on the cluster.
- Storage class appears when you run kubectl get sc even after you remove the corresponding storage policy
If you run `kubectl get sc` after you create storage policy, add the policy to a namespace, and then remove the policy, the command response will still list the corresponding storage class.
Workaround: Run `kubectl describe namespace` to see the storage classes actually associated with the namespace.
- All storage classes returned when you run kubectl describe storage-class or kubectl get storage-class on a Supervisor Cluster instead of just the ones for the Supervisor namespace
When you run the `kubectl describe storage-class` or `kubectl get storage-class` command on a Supervisor Cluster, the command returns all storage classes instead of just the ones for the Supervisor namespace.
Workaround: Infer the storage class names associated with the namespace from the verbose name of the quota.
- Share Kubernetes API endpoint button ignores FQDN even if it is configured
Even if an FQDN is configured for the Kubernetes control plane IP of a Supervisor Cluster namespace, the Share Namespace button gives the IP address instead of the FQDN.
Workaround: Manually share Supervisor Cluster namespace with FQDN.
- During Supervisor cluster upgrade, extra vSphere Pods might be created and stuck at pending status if Daemon set is used
During Supervisor cluster upgrade, Daemon set controller creates extra vSphere Pods for each Supervisor control plane node. This is caused by an upstream Kubernetes issue.
Workaround: Add a NodeSelector/NodeAffinity to the vSphere Pod spec so that the Daemon set controller can skip the control plane nodes for pod creation (see the sketch below).
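A minimal sketch of that workaround, assuming the control plane nodes carry the standard node-role.kubernetes.io/master label; the DaemonSet name, labels, and image are placeholders:
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: example-daemonset            # placeholder name
spec:
  selector:
    matchLabels:
      app: example
  template:
    metadata:
      labels:
        app: example
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: node-role.kubernetes.io/master
                operator: DoesNotExist   # skip nodes labeled as control plane
      containers:
      - name: example
        image: example-image:1.0         # placeholder image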
- Unable to access the load balancer via kubectl vSphere login
You cannot access the API server via kubectl vSphere login when using a load balanced endpoint.
Workaround: This issue can manifest in two ways.
1. Check whether the API server is accessible through the control plane: curl -k https://vip:6443 (or 443).
If the API server is not accessible through the load balancer, the API server is not up yet.
Workaround: Wait a few minutes for the API server to become accessible.
2. Check whether the edge virtual machine node status is up.
- Log in to the NSX Manager.
- Go to System > Fabric > Nodes > Edge Transport Nodes. The node status should be up.
- Go to Networking > Load Balancers > Virtual Servers. Find the VIPs that end with kube-apiserver-lb-svc-6443 and kube-apiserver-lb-svc-443. If their status is not up, use the following workaround.
Workaround: Reboot the edge VM. The edge VM should reconfigure after the reboot.
- Cluster configuration of vSphere with Tanzu shows timeout errors during configuration
During the configuration of the cluster, you may see the following error messages:
Api request to param0 failed
or
Config operation for param0 node VM timed out
Workaround: None. Enabling vSphere with Tanzu can take from 30 to 60 minutes. If you see these or similar `param0` timeout messages, they are not errors and can be safely ignored.
- Enabling the container registry fails with error
When the user enables the container registry from the UI, the enable action fails after 10 minutes with a timeout error.
Workaround: Disable the container registry and retry enabling it. Note that the timeout error may occur again.
- Enabling a cluster after disabling it fails with error
Enabling a cluster shortly after disabling the cluster may create a conflict in the service account password reset process. The enable action fails with an error.
Workaround: Restart the wcp service with the command `vmon-cli --restart wcp`.
- Deleting a container image tag in an embedded container registry might delete all image tags that share the same physical container image
Multiple images with different tags can be pushed to a project in an embedded container registry from the same container image. If one of the images on the project is deleted, all other images with different tags that are pushed from the same image will be deleted.
Workaround: The operation cannot be undone. Push the image to the project again.
- Failed purge operation on a registry project results in project being in 'error' state
When you perform a purge operation on a registry project, the project temporarily displays as being in an error state. You will not be able to push or pull images from such a project. At regular intervals, the project will be checked and all projects which are in error state will be deleted and recreated. When this happens, all previous project members will be added back to the recreated project and all the repositories and images which previously existed in the project will be deleted, effectively completing the purge operation.
Workaround: None.
- Container registry enablement fails when the storage capacity is less than 2000 mebibytes
There is a minimum total storage capacity requirement for the container registry, addressed as the "limit" field in VMODL. This is because some Kubernetes pods need enough storage space to work properly. To achieve container registry functionality, there is a minimum capacity of 5 Gigabytes. Note that this limit offers no guarantee of improved performance or increased number or size of images that can be supported.
Workaround: This issue can be avoided by deploying the container registry with a larger total capacity. The recommended storage volume is no less than 5 gigabytes.
- If you replace the TLS certificate of the NSX load balancer for Kubernetes cluster you might fail to log in to the embedded Harbor registry from a docker client or the Harbor UI
To replace the TLS certificate of the NSX load balancer for Kubernetes cluster, from the vSphere UI navigate to Configure > Namespaces > Certificates > NSX Load Balancer > Actions and click Replace Certificate. When you replace the NSX certificate, the login operation to the embedded Harbor registry from a docker client or the Harbor UI might fail with the `unauthorized: authentication required` or `Invalid user name or password` error.
Workaround: Restart the registry agent pod in the `vmware-system-registry` namespace:
- Run the `kubectl get pod -n vmware-system-registry` command.
- Delete the pod output by running the `kubectl delete pod vmware-registry-controller-manager-xxxxxx -n vmware-system-registry` command.
- Wait until pod restarts.
- Pods deployed with DNSDefault will use the clusterDNS settings
Any vSphere Pod deployed in a Supervisor Cluster that makes use of DNSDefault falls back to using the clusterDNS configured for the cluster.
Workaround: None.
- All hosts in a cluster might be updated simultaneously when upgrading a Supervisor Cluster
In certain cases, all hosts in a cluster will be updated in parallel during the Supervisor Cluster upgrade process. This will cause downtime for all pods running on this cluster.
Workaround: During Supervisor Cluster upgrade, don't restart wcpsvc or remove/add hosts.
- Supervisor Cluster upgrade can be stuck indefinitely if VMCA is used as an intermediate CA
Supervisor Cluster upgrade can be stuck indefinitely in "configuring" if VMCA is being used as an intermediate CA.
Workaround: Switch to a non-intermediate CA for VMCA and delete any control plane VMs stuck in "configuring".
- vSphere Pod deployment fails if a Storage Policy with encryption enabled is assigned for Pod Ephemeral Disks
If a Storage Policy with encryption enabled is used for Pod Ephemeral Disks, vSphere Pod creation fails with an "AttachVolume.Attach failed for volume" error.
Workaround: Use a storage policy with no encryption for Pod Ephemeral Disks.
- Supervisor Cluster upgrade hangs at 50% during "Namespaces cluster upgrade is in upgrade host step"
The problem occurs when a vSphere Pod hangs in TERMINATING state during the upgrade of the Kubernetes control plane node. The control plane node controller tries to upgrade the Spherelet process, and during that phase vSphere Pods on that control plane node are evicted or killed to unregister the node from the Kubernetes control plane. Because of this, the Supervisor Cluster upgrade hangs at an older version until the vSphere Pods in TERMINATING state are removed from inventory.
Workaround:
1. Log in to the ESXi host on which the vSphere Pod is hanging in TERMINATING state.
2. Remove the TERMINATING vSphere Pods by using the following commands:
# vim-cmd vmsvc/getallvms
# vim-cmd vmsvc/destroy
After this step, the vSphere Pods display in orphaned state in the vSphere Client.
3. Delete the orphaned vSphere Pods by first adding a user to the ServiceProviderUsers group.
a.) Log in to the vSphere Client, select Administration -> Users and Groups -> Create User, and click Groups.
b.) Search for ServiceProviderUsers or the Administrators group and add a user to the group.
4. Log in to the vSphere Client by using the user you just created and delete the orphaned vSphere Pods.
5. In kubectl, use the following command:
kubectl patch pod <pod-name> -n <namespace> -p '{"metadata":{"finalizers":null}}'
- Workload Management UI throws the following license error: None of the hosts connected to this vCenter are licensed for Workload Management
After successfully enabling Workload Management on a vSphere Cluster, you might see the following licensing error after rebooting vCenter Server or upgrading ESXi hosts where Workload Management is enabled: None of the hosts connected to this vCenter are licensed for Workload Management. This is a cosmetic UI error. Your license should still be valid and your workloads should still be running.
Workaround: Users should clear their browser cache for the vSphere Client.
- Large vSphere environments might take long to sync on a cloud with the VMware NSX Advanced Load Balancer Controller
vSphere environments with inventories that contain more than 2,000 ESXi hosts and 45,000 virtual machines might take as much as 2 hours to sync on a cloud by using an NSX Advanced Load Balancer Controller.
Workaround: None.
- The private container registry of the Supervisor Cluster might become unhealthy after VMware Certificate Authority (VMCA) root certificate is changed on a vCenter Server 7.0 Update 2
After you change the VMware Certificate Authority (VMCA) root certificate on a vCenter Server system 7.0 Update 2, the private container registry of the Supervisor Cluster might become unhealthy and the registry operations might stop working as expected. The following health status message for the container registry is displayed on the cluster configuration UI:
Harbor registry harbor-1560339792 on cluster domain-c8 is unhealthy. Reason: failed to get harbor health: Get https://30.0.248.2/api/health: x509: certificate signed by unknown authority
Workaround:
Restart the registry agent pod manually in the vmware-system-registry namespace on the vSphere kubernetes cluster:
- Run the `kubectl get pod -n vmware-system-registry` command to get the registry agent pod.
- Delete the pod output by running the `kubectl delete pod vmware-registry-controller-manager-xxxxxx -n vmware-system-registry` command.
- Wait until the pod restarts.
- Refresh the image registry on the cluster configuration UI, and the health status should show as running shortly.
- Projects for newly-created namespaces on the Supervisor Cluster are not automatically created on the private container registry
Projects might not be automatically created on the private container registry for newly-created namespaces on a Supervisor Cluster. The status of the container registry still displays as healthy, but no projects are shown on the container registry of the cluster when a new namespace is created. You cannot push or pull images to the projects of the new namespaces on the container registry.
Workaround:
- Run the kubectl get pod -n vmware-system-registry command to get the registry agent pod.
- Delete the pod output by running the kubectl delete pod vmware-registry-controller-manager-xxxxxx -n vmware-system-registry command.
- Wait until pod restarts.
- Log in to the private container registry to verify that projects are created for namespaces on the cluster.
- You might observe ErrImgPull while creating pods with 10 replicas
You might get this issue when trying to use a deployment with 10 replica pods in a YAML. When you try to create the pods with this YAML by using the private container registry, out of 10 replicas, at least 7 might pass and 3 might fail with the "ErrImgPull" issue.
Workaround: Use fewer replica sets, maximum 5.
- The NSX Advanced Load Balancer Controller is not supported when vCenter Server is deployed with a custom port
You cannot register vCenter Server with the NSX Advanced Load Balancer Controller because no option exists for providing a custom vCenter Server port in the NSX Advanced Load Balancer Controller UI while registering.
NSX Advanced Load Balancer Controller works only when vCenter Server is deployed with default ports 80 and 443.
- When performing domain repointing on vCenter Server that already contains running Supervisor Clusters, the Supervisor Clusters will go in Configuring state
Domain repointing is not supported on a vCenter Server that has Supervisor Clusters. When you try to perform domain repointing, existing Supervisor Clusters go into Configuring state, and control plane VMs and Tanzu Kubernetes cluster VMs stop appearing in the inventory under the Hosts and Clusters view.
Workaround: None
- NSX Edge virtual machine deployment fails on slow networks
There is a combined 60 minute timeout for NSX Edge OVF deployment and NSX Edge VM registration. In slower networks or environments with slower storage, if the time elapsed for Edge deployment and registration exceeds this 60 minute timeout, the operation will fail.
Workaround: Clean up edges and restart the deployment.
- NSX Edges are not updated if vCenter Server DNS, NTP, or Syslog settings are changed after cluster configuration
DNS, NTP, and Syslog settings are copied from vCenter Server to NSX Edge virtual machines during cluster configuration. If any of these vCenter Server settings are changed after configuration, the NSX Edges are not updated.
Workaround: Use the NSX Manager APIs to update the DNS, NTP, and Syslog settings of your NSX Edges.
- NSX Edge Management Network Configuration only provides subnet and gateway configuration on select portgroups
The NSX Edge management network compatibility drop down list will show subnet and gateway information only if there are ESXi VMKnics configured on the host that are backed by a DVPG on the selected VDS. If you select a Distributed Portgroup without a VMKnic attached to it, you must provide a subnet and gateway for the network configuration.
Workaround: Use one of the following configurations:
Discrete Portgroup: This is where no VMKs currently reside. You must supply the appropriate subnet and gateway information for this portgroup.
Shared Management Portgroup: This is where the ESXi hosts' Management VMK resides. Subnet and gateway information will be pulled automatically.
- Unable to use VLAN 0 during cluster configuration
When attempting to use VLAN 0 for overlay Tunnel Endpoints or uplink configuration, the operation fails with the message:
Argument 'uplink_network vlan' is not a valid VLAN ID for an uplink network. Please use a VLAN ID between 1-4094
Workaround: Manually enable VLAN 0 support using one of the following processes:
1. SSH into your deployed VC (root/vmware).
2. Open /etc/vmware/wcp/nsxdsvc.yaml. It will have content similar to:
logging:
  level: debug
  maxsizemb: 10
a. To enable VLAN0 support for NSX Cluster Overlay Networks, append the following lines to /etc/vmware/wcp/nsxdsvc.yaml and save the file.
experimental:
  supportedvlan:
    hostoverlay:
      min: 0
      max: 4094
    edgeoverlay:
      min: 1
      max: 4094
    edgeuplink:
      min: 1
      max: 4094
b. To enable VLAN0 support for NSX Edge Overlay Networks, append the following lines to /etc/vmware/wcp/nsxdsvc.yaml and save the file.
experimental:
  supportedvlan:
    hostoverlay:
      min: 1
      max: 4094
    edgeoverlay:
      min: 0
      max: 4094
    edgeuplink:
      min: 1
      max: 4094
c. To enable VLAN0 support for NSX Edge Uplink Networks, append the following lines to /etc/vmware/wcp/nsxdsvc.yaml and save the file.
experimental:
  supportedvlan:
    hostoverlay:
      min: 1
      max: 4094
    edgeoverlay:
      min: 1
      max: 4094
    edgeuplink:
      min: 0
      max: 4094
3. Restart the workload management service with `vmon-cli --restart wcp`.
- vSphere with Tanzu and NSX-T cannot be enabled on a cluster where vSphere Lifecycle Manager Image is enabled
vSphere with Tanzu and NSX-T are not compatible with vSphere Lifecycle Manager Image. They are only compatible with vSphere Lifecycle Manager Baselines. When vSphere Lifecycle Manager Image is enabled on a cluster, you cannot enable vSphere with Tanzu or NSX-T on that cluster.
Workaround: Move hosts to a cluster where vSphere Lifecycle Manager Image is disabled. You must use a cluster with vSphere Lifecycle Manager Baselines. Once the hosts are moved, you can enable NSX-T and then vSphere with Tanzu on that new cluster.
- When vSphere with Tanzu networking is configured with NSX-T, "ExternalTrafficPolicy: local" not supported
For Kubernetes service of type LoadBalancer, the "ExternalTrafficPolicy: local" configuration is not supported.
Workaround: None.
- When vSphere with Tanzu networking is configured with NSX-T, the number of services of type LoadBalancer that a Tanzu Kubernetes cluster can support is limited by the NodePort range of the Supervisor Cluster
Each VirtualMachineService of type LoadBalancer is translated to one Kubernetes service of type LoadBalancer and one Kubernetes endpoint. The maximum number of Kubernetes services of type LoadBalancer that can be created in a Supervisor Cluster is 2767; this includes those created on the Supervisor Cluster itself and those created in Tanzu Kubernetes clusters.
Workaround: None.
- NSX Advanced Load Balancer Controller does not support changing the vCenter Server PNID
Once you configure the Supervisor Cluster with the NSX Advanced Load Balancer, you cannot change the vCenter Server PNID.
Workaround: If you must change the PNID of vCenter Server, remove the NSX Advanced Load Balancer Controller, change the vCenter Server PNID, and then redeploy and configure the NSX Advanced Load Balancer Controller with the new PNID of vCenter Server.
- In vSphere Distributed Switch (vDS) environments, it is possible to configure Tanzu Kubernetes clusters with network CIDR ranges that overlap or conflict with those of the Supervisor Cluster, and vice versa, resulting in components not being able to communicate.
In vDS environments, there is no design-time network validation done when you configure the CIDR ranges for the Supervisor Cluster, or when you configure the CIDR ranges for Tanzu Kubernetes clusters. As a result, two problems can arise:
1) You create a Supervisor Cluster with CIDR ranges that conflict with the default CIDR ranges reserved for Tanzu Kubernetes clusters.
2) You create a Tanzu Kubernetes cluster with a custom CIDR range that overlaps with the CIDR range used for the Supervisor Clusters.
Workaround: For vDS environments, when you configure a Supervisor Cluster, do not use either of the default CIDR ranges used for Tanzu Kubernetes clusters, including 10.96.0.0/12, which is reserved for services, and 192.168.0.0/16, which is reserved for pods. See also "Configuration Parameters for Tanzu Kubernetes Clusters" in the vSphere with Tanzu documentation.
For vDS environments, when you create a Tanzu Kubernetes cluster, do not use the same CIDR range that is used for the Supervisor Cluster.
- A Tanzu Kubernetes cluster hangs in "Updating" state after Supervisor Cluster upgrade
When a Supervisor Cluster is upgraded, it can trigger a rolling update of all the Tanzu Kubernetes clusters to propagate any new configuration settings. During this process, a previously "Running" Tanzu Kubernetes cluster might hang in the "Updating" phase. A "Running" Tanzu Kubernetes cluster only indicates the availability of the control plane; it is possible that the required control plane and worker nodes have not been successfully created. Such a Tanzu Kubernetes cluster might fail the health checks that are performed during the rolling update that initiates upon completion of the Supervisor Cluster upgrade. This results in the Tanzu Kubernetes cluster hanging in the "Updating" phase, which can be confirmed by looking at the events on the `KubeadmControlPlane` resource associated with the Tanzu Kubernetes cluster. The events emitted by the resource will be similar to the one below:
Warning ControlPlaneUnhealthy 2m15s (x1026 over 5h42m) kubeadm-control-plane-controller Waiting for control plane to pass control plane health check to continue reconciliation: machine's (gc-ns-1597045889305/tkg-cluster-3-control-plane-4bz9r) node (tkg-cluster-3-control-plane-4bz9r) was not checked
Workaround: None.
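Although there is no workaround, the events described above can be inspected from the Supervisor Cluster; the following is a hedged sketch in which the vSphere Namespace name is a placeholder and assumes you are logged in with kubectl:
# List the events emitted by the KubeadmControlPlane resource for the cluster
kubectl describe kubeadmcontrolplane -n example-namespace
# Or filter the namespace events directly
kubectl get events -n example-namespace --field-selector involvedObject.kind=KubeadmControlPlane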
- Tanzu Kubernetes cluster continues to access removed storage policy
When a VI Admin deletes a storage class from the vCenter Server namespace, access to that storage class is not removed for any Tanzu Kubernetes cluster that is already using it.
Workaround:
- As VI Admin, after deleting a storage class from the vCenter Server namespace, create a new storage policy with the same name.
- Re-add the existing storage policy or the one you just recreated to the supervisor namespace. TanzuKubernetesCluster instances using this storage class should now be fully-functional.
- For each TanzuKubernetesCluster resource using the storage class you wish to delete, create a new TanzuKubernetesCluster instance using a different storage class and use Velero to migrate workloads into the new cluster.
- Once no TanzuKubernetesCluster or PersistentVolume uses the storage class, it can be safely removed.
- The embedded container registry SSL certificate is not copied to Tanzu Kubernetes cluster nodes
When the embedded container registry is enabled for a Supervisor Cluster, the Harbor SSL certificate is not included in any Tanzu Kubernetes cluster nodes created on that SC, and you cannot connect to the registry from those nodes.
Workaround: Copy and paste the SSL certificate from the Supervisor Cluster control plane to the Tanzu Kubernetes cluster worker nodes.
- Virtual machine images are not available from the content library
When multiple vCenter Server instances are configured in an Embedded Linked Mode setup, the UI allows the user to select a content library created on a different vCenter Server instance. Selecting such a library results in virtual machine images not being available for DevOps users to provision a Tanzu Kubernetes cluster. In this case, `kubectl get virtualmachineimages` does not return any results.
Workaround: When you associate a content library with the Supervisor Cluster for Tanzu Kubernetes cluster VM images, choose a library that is created in the same vCenter Server instance where the Supervisor Cluster resides. Alternatively, create a local content library which also supports air-gapped provisioning of Tanzu Kubernetes clusters.
- You cannot provision new Tanzu Kubernetes clusters, or scale out existing clusters, because the Content Library subscriber cannot synchronize with the publisher.
When you set up a Subscribed Content Library for Tanzu Kubernetes cluster OVAs, an SSL certificate is generated, and you are prompted to manually trust the certificate by confirming the certificate thumbprint. If the SSL certificate is changed after the initial library setup, the new certificate must be trusted again by updating the thumbprint.
Workaround: Edit the settings of the Subscribed Content Library. This initiates a probe of the subscription URL even though no change is requested on the library. The probe discovers that the SSL certificate is not trusted and prompts you to trust it.
- Tanzu Kubernetes Release version 1.16.8 is incompatible with vSphere 7 U1.
The Tanzu Kubernetes Release version 1.16.8 is incompatible with vSphere 7 U1. You must update Tanzu Kubernetes clusters to a later version before performing a vSphere Namespaces update to U1.
Workaround: Before performing a vSphere Namespaces update to the vSphere 7 U1 release, update each Tanzu Kubernetes cluster running version 1.16.8 to a later version. Refer to the topic "Supported Update Path" in the vSphere with Tanzu documentation for more information.
- After upgrading the Workload Control Plane to vSphere 7 U1, new VM Class sizes are not available.
After upgrading to vSphere 7.0.1, and then performing a vSphere Namespaces update of the Supervisor Cluster, for Tanzu Kubernetes clusters, running the command "kubectl get virtualmachineclasses" does not list the new VM class sizes 2x-large, 4x-large, 8x-large.
Workaround: None. The new VM classes sizes can only be used with a new installation of the Workload Control Plane.
- The Tanzu Kubernetes Release version v1.17.11+vmware.1-tkg.1 times out connecting to the cluster DNS server when using the Calico CNI.
The Tanzu Kubernetes Release version v1.17.11+vmware.1-tkg.1 has a Photon OS kernel issue that prevents the image from working as expected with the Calico CNI.
Workaround: For Tanzu Kubernetes Release version 1.17.11, the image identified as "v1.17.11+vmware.1-tkg.2.ad3d374.516" fixes the issue with Calico. To run Kubernetes 1.17.11, use this version instead of "v1.17.11+vmware.1-tkg.1.15f1e18.489". Alternatively, use a different Tanzu Kubernetes Release, such as version 1.18.5 or 1.17.8 or 1.16.14.
- When vSphere with Tanzu networking is configured with NSX-T Data Center, updating an "ExternalTrafficPolicy: Local" Service to "ExternalTrafficPolicy: Cluster" will render this Service's LB IP inaccessible on SV Masters
When a LoadBalancer type Kubernetes Service is initially created in workload clusters with `ExternalTrafficPolicy: Local`, and later updated to `ExternalTrafficPolicy: Cluster`, access to this Service's LoadBalancer IP on the Supervisor Cluster VMs will be dropped.
Workaround: Delete the Service and recreate it with `ExternalTrafficPolicy: Cluster`.
- High CPU usage on Tanzu Kubernetes cluster control plane nodes
A known issue exists in the Kubernetes upstream project where occasionally kube-controller-manager goes into a loop, resulting in high CPU usage which might affect functionality of Tanzu Kubernetes clusters. You might notice that the kube-controller-manager process is consuming a larger than expected amount of CPU and is outputting repeated logs indicating `failed for updating Node.Spec.PodCIDRs`.
Workaround: Delete the kube-controller-manager pod that sits inside the control plane node with such an issue. The pod will be recreated and the issue should not reappear.
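A hedged sketch of that deletion, with placeholder names; identify the affected control plane node first, then delete the matching pod in the kube-system namespace so that it is recreated:
# Find the kube-controller-manager pod running on the affected control plane node
kubectl get pods -n kube-system -o wide | grep kube-controller-manager
# Delete it; the pod name below is a placeholder
kubectl delete pod kube-controller-manager-example-control-plane-node -n kube-system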
- You cannot update Tanzu Kubernetes clusters created with K8s 1.16 to 1.19
Kubelet's configuration file is generated at the time `kubeadm init` is run and then replicated during cluster upgrades. At the time of 1.16, `kubeadm init` generated a config file that set `resolvConf` to `/etc/resolv.conf`, which was then overridden by the command-line flag `--resolv-conf` pointing at `/run/systemd/resolve/resolv.conf`. During 1.17 and 1.18, `kubeadm` continued to configure Kubelet with the correct `--resolv-conf`. As of 1.19, `kubeadm` no longer configures the command-line flag and instead relies on the Kubelet configuration file. Due to the replication process during cluster upgrades, a 1.19 cluster upgraded from 1.16 will include a config file where `resolvConf` points at `/etc/resolv.conf` instead of `/run/systemd/resolve/resolv.conf`.
Workaround: Before upgrading a Tanzu Kubernetes cluster to 1.19, reconfigure the Kubelet configuration file to point to the correct `resolv.conf`. Manually duplicate the ConfigMap `kubelet-config-1.18` to `kubelet-config-1.19` in the `kube-system` namespace, then modify that new ConfigMap's data to point `resolvConf` at `/run/systemd/resolve/resolv.conf`.
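A hedged sketch of that duplication, run against the Tanzu Kubernetes cluster before the upgrade; the file name is a placeholder and the edits are made manually:
kubectl get configmap kubelet-config-1.18 -n kube-system -o yaml > kubelet-config-1.19.yaml
# In kubelet-config-1.19.yaml:
#   - change metadata.name to kubelet-config-1.19
#   - remove metadata.resourceVersion, metadata.uid, and metadata.creationTimestamp
#   - in the kubelet configuration data, change resolvConf to /run/systemd/resolve/resolv.conf
kubectl create -f kubelet-config-1.19.yaml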
- When the Supervisor Cluster networking is configured with NSX-T, after updating a service from "ExternalTrafficPolicy: Local" to "ExternalTrafficPolicy: Cluster", requests made on the Supervisor Cluster control plane nodes to this service's load balancer IP fail
When you create a service on a Tanzu Kubernetes cluster with `ExternalTrafficPolicy: Local` and later update the service to `ExternalTrafficPolicy: Cluster`, kube-proxy incorrectly creates an IP table rule on the Supervisor Cluster control plane nodes to block traffic destined to the service's LoadBalancer IP. For example, if this service has LoadBalancer IP 192.182.40.4, the following IP table rule is created on any one of the control plane nodes:
-A KUBE-SERVICES -d 192.182.40.4/32 -p tcp -m comment --comment "antrea-17-171/antrea-17-171-c1-2bfcfe5d9a0cdea4de6eb has no endpoints" -m tcp --dport 80 -j REJECT --reject-with icmp-port-unreachable
As a result, access to that IP is dropped.
Workaround: Delete the service and create it anew with `ExternalTrafficPolicy: Cluster`.
- After you enable HTTP Proxy and/or Trust settings in the TkgServiceConfiguration specification, all pre-existing clusters without Proxy/Trust settings will inherit the global Proxy/Trust settings when they are updated.
You can edit the TkgServiceConfiguration specification to configure the TKG Service, including specifying the default CNI, HTTP Proxy, and Trust certificates. Any configuration changes you make to the TkgServiceConfiguration specification apply globally to any Tanzu Kubernetes cluster provisioned or updated by that service. You cannot opt out of the global configuration using per-cluster settings.
For example, if you edit the TkgServiceConfiguration specification and enable an HTTP Proxy, all new clusters provisioned by that service inherit those proxy settings. In addition, all pre-existing clusters without a proxy server inherit the global proxy configuration when the cluster is modified or updated. In the case of HTTP/S proxy, which supports per-cluster configuration, you can update the cluster spec with a different proxy server, but you cannot remove the global proxy setting. If the HTTP Proxy is set globally, you must either use it or overwrite it with a different proxy server.
Workaround: Understand that the TkgServiceConfiguration specification applies globally. If you don't want all clusters to use an HTTP Proxy, don't enable it at the global level. Do so at the cluster level.
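A minimal sketch of such a per-cluster proxy override, assuming the v1alpha1 API in which proxy settings live under spec.settings.network.proxy; the cluster name, namespace, proxy URLs, and noProxy entries are placeholders:
apiVersion: run.tanzu.vmware.com/v1alpha1
kind: TanzuKubernetesCluster
metadata:
  name: example-tkc              # placeholder cluster name
  namespace: example-namespace   # placeholder vSphere Namespace
spec:
  settings:
    network:
      proxy:
        httpProxy: http://proxy.example.com:3128     # placeholder proxy URL
        httpsProxy: http://proxy.example.com:3128    # placeholder proxy URL
        noProxy: [10.0.0.0/8, 192.168.0.0/16, .example.com]   # placeholder exclusions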
- In very large Supervisor Cluster deployments with many Tanzu Kubernetes Clusters and VMs, vmop-controller-manager pods might fail due to OutOfMemory resulting in the inability to lifecycle manage Tanzu Kubernetes Clusters
Within the Supervisor Cluster, the vmop-controller-manager pod is responsible for managing the lifecycle of the VMs that make up Tanzu Kubernetes Clusters. At very large numbers of such VMs (more than 850 VMs per Supervisor Cluster), the vmop-controller-manager pod can go into an OutOfMemory CrashLoopBackoff. When this occurs, lifecycle management of Tanzu Kubernetes Clusters is disrupted until the vmop-controller-manager pod resumes operations.
Workaround: Reduce the total number of Tanzu Kubernetes Cluster worker nodes managed in a Supervisor Cluster, either by deleting clusters or scaling down clusters.
- NEW: An expansion of a Supervisor cluster PVC in offline or online mode does not result in an expansion of a corresponding Tanzu Kubernetes cluster PVC
A pod that uses the Tanzu Kubernetes cluster PVC cannot use the expanded capacity of the Supervisor cluster PVC because the filesystem has not been resized.
Workaround: Resize the Tanzu Kubernetes cluster PVC to a size equal to or greater than the size of the Supervisor cluster PVC (see the sketch below).
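A sketch of that resize using a standard strategic merge patch; the PVC name, namespace, and size are placeholders, and the requested size should be at least the size of the Supervisor cluster PVC:
kubectl patch pvc example-pvc -n example-namespace -p '{"spec":{"resources":{"requests":{"storage":"20Gi"}}}}'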
- NEW: Size mismatch in statically provisioned TKG PVC when compared to underlying volume
Static provisioning in Kubernetes does not verify if the PV and backing volume sizes are equal. If you statically create a PVC in a Tanzu Kubernetes cluster, and the PVC size is less than the size of the underlying corresponding Supervisor cluster PVC, you might be able to use more space than the space you request in the PV. If the size of the PVC you statically create in the Tanzu Kubernetes cluster is greater than the size of the underlying Supervisor cluster PVC, you might notice a `No space left on device` error even before you exhaust the requested size in the Tanzu Kubernetes cluster PV.
Workaround:
- In the Tanzu Kubernetes cluster PV, change the `persistentVolumeReclaimPolicy` to `Retain`.
- Note the `volumeHandle` of the Tanzu Kubernetes cluster PV and then delete the PVC and PV in the Tanzu Kubernetes cluster.
- Re-create the Tanzu Kubernetes cluster PVC and PV statically using the volumeHandle and set the storage to the same size as the size of the corresponding Supervisor cluster PVC.
- Attempts to create a PVC from a supervisor namespace or a TKG cluster fail if the external csi.vsphere.vmware.com provisioner loses its lease for leader election
When you try to create a PVC from a supervisor namespace or a TKG cluster using the `kubectl` command, your attempts might not succeed. The PVC remains in the Pending state. If you describe the PVC, the Events field displays the following:
Type     Reason                Age                  From                         Message
----     ------                ----                 ----                         -------
Normal   ExternalProvisioning  56s (x121 over 30m)  persistentvolume-controller  waiting for a volume to be created, either by external provisioner "csi.vsphere.vmware.com" or manually created by system administrator
Workaround:
- Verify that all containers in the `vsphere-csi-controller` pod inside the `vmware-system-csi` namespace are running.
kubectl describe pod vsphere-csi-controller-pod-name -n vmware-system-csi
- Check the external provisioner logs by using the following command.
kubectl logs vsphere-csi-controller-pod-name -n vmware-system-csi -c csi-provisioner
The following entry indicates that the external-provisioner sidecar container lost its leader election:
I0817 14:02:59.582663 1 leaderelection.go:263] failed to renew lease vmware-system-csi/csi-vsphere-vmware-com: failed to tryAcquireOrRenew context deadline exceeded
F0817 14:02:59.685847 1 leader_election.go:169] stopped leading
- Delete this instance of vsphere-csi-controller.
kubectl delete pod vsphere-csi-controller-pod-name -n vmware-system-csi
Kubernetes will create a new instance of the CSI controller and all sidecars will be reinitialized.