What's in the Release Notes

Updated on: April 06, 2021

What's New

VMware vSphere with Tanzu has monthly patches to introduce new features and capabilities, provide updates to Kubernetes and other services, keep up with upstream, and resolve reported issues. Here we document what each monthly patch delivers.

 

What's New March 09, 2021

March 09, 2021 Build Information

ESXi 7.0 | 09 MAR 2021 | ISO Build 17630552

vCenter Server 7.0 | 09 MAR 2021 | ISO Build 17694817

VMware NSX Advanced Load Balancer | 12 OCT 2020 | 20.1.X

New Features

  • Supervisor Cluster
    • Support of NSX Advanced Load Balancer for a Supervisor Cluster configured with VDS networking - You can now enable a Supervisor Cluster with NSX Advanced Load Balancer (Avi Networks) for L4 load balancing, as well as load balancing for the control plane nodes of Supervisor and Tanzu Kubernetes clusters. Check out the documentation page for guidance on configuring the NSX Advanced Load Balancer.
    • Upgrade of the Supervisor Cluster to Kubernetes 1.19 with auto-upgrade of a Supervisor Cluster running Kubernetes 1.16 - You can upgrade the Supervisor Cluster to Kubernetes 1.19. With this update, the following Supervisor Cluster versions are supported: 1.19, 1.18, and 1.17. Supervisor Clusters running Kubernetes 1.16 will be automatically upgraded to 1.17 once vCenter Server is updated. This will ensure all your Supervisor Clusters are running with the supported version of Kubernetes.
    • Expansion of PersistentVolumeClaims (PVCs) - You can now expand existing volumes by modifying the PersistentVolumeClaim object, even when the volume is in active use. This applies to volumes in the Supervisor Cluster and Tanzu Kubernetes clusters.
    • Management of Supervisor Cluster lifecycle using vSphere Lifecycle Manager – For Supervisor Clusters configured with NSX-T networking, you can use vSphere Lifecycle Manager for infrastructure configuration and lifecycle management.
  • Tanzu Kubernetes Grid Service for vSphere
    • Support for private container registries – vSphere administrators and Kubernetes platform operators can now define additional Certificate Authority certificates (CAs) to use in Tanzu Kubernetes clusters for trusting private container registries. This feature enables Tanzu Kubernetes clusters to pull container images from container registries that have enterprise or self-signed certificates. You can configure private CAs as a default for Tanzu Kubernetes clusters on a Supervisor Cluster-wide basis or per-Tanzu Kubernetes Cluster. Read more about how to configure support for private container registries to Tanzu Kubernetes clusters by visiting the documentation page. 
    • User-defined IPs for Service type: LoadBalancer with NSX-T and NSX Advanced Load Balancer – Kubernetes application operators can now provide a user-defined LoadBalancerIP when configuring a Service type: LoadBalancer allowing for a static IP endpoint for the service.  This advanced feature requires either NSX-T load balancing or the NSX Advanced Load Balancer with the Supervisor Cluster. Learn how to configure this feature by visiting the documentation page. 
    • ExternalTrafficPolicy and LoadBalancerSourceRanges for Service type: LoadBalancer with NSX-T – Kubernetes application operators can now configure the ExternalTrafficPolicy of 'local' for Services to propagate client IP address to the end pods. You also can define loadBalancerSourceRanges for Services to restrict which client IPs can access the load balanced service. These two advanced features require NSX-T load balancing with the Supervisor Cluster.
    • Kubernetes version management and indications – You can now use kubectl to inspect the compatibility of TanzuKubernetesReleases with the underlying Supervisor Cluster environment. Tanzu Kubernetes clusters now also indicate whether there is a Kubernetes upgrade available and recommend the next TanzuKubernetesRelease(s) to use. For more information on using this new feature, see the documentation page. 
    • Improved Cluster Status at a Glance – In a previous release, VMware expanded WCPCluster and WCPMachine CRDs by implementing conditional status reporting to surface common problems and errors. With the vSphere 7.0 Update 2 release, we enhanced TanzuKubernetesCluster CRDs to summarize conditional status reporting for subsystem components, supplying immediate answers and fine-grained guidance to help you investigate issues. Learn how to configure this feature by visiting the documentation page.
    • Per-Tanzu Kubernetes cluster HTTP Proxy Configuration – You can now define the HTTP/HTTPS Proxy configuration on a per-Tanzu Kubernetes cluster basis or, alternatively, define it Supervisor Cluster-wide through a default configuration. For information on configuring this feature, see the documentation page.
    • Support for Tanzu Kubernetes Grid Extensions –  In-cluster extensions are now fully supported on Tanzu Kubernetes Grid Service, including Fluent Bit, Contour, Prometheus, AlertManager, and Grafana.
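The Service type: LoadBalancer settings described above map to standard fields of the Kubernetes Service object. The following is a minimal sketch, not an official VMware example: the function name, Service name, addresses, and ports are hypothetical placeholders. Note that the field value is spelled Local, and that externalTrafficPolicy and loadBalancerSourceRanges require NSX-T load balancing as stated above.

```shell
#!/bin/sh
# Sketch: print a Service manifest that combines a user-defined load balancer
# IP, client IP preservation, and a source range restriction.
# All names, addresses, and ports are hypothetical placeholders.
render_lb_service() {
  name="$1"       # Service name, also used as the pod selector label
  lb_ip="$2"      # user-defined static IP for the load balancer
  src_cidr="$3"   # client CIDR range allowed to reach the Service
  cat <<EOF
apiVersion: v1
kind: Service
metadata:
  name: ${name}
spec:
  type: LoadBalancer
  loadBalancerIP: ${lb_ip}
  externalTrafficPolicy: Local
  loadBalancerSourceRanges:
    - ${src_cidr}
  selector:
    app: ${name}
  ports:
    - port: 80
      targetPort: 8080
EOF
}

# Example use against a Tanzu Kubernetes cluster context:
#   render_lb_service web 192.0.2.10 198.51.100.0/24 | kubectl apply -f -
```

The function only prints the manifest, so you can review the output before piping it to kubectl.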

Update Considerations for Tanzu Kubernetes Clusters

The vSphere 7.0 Update 2 release includes functionality that automatically upgrades the Supervisor Cluster when vCenter Server is updated. If you have Tanzu Kubernetes clusters provisioned in your environment, read Knowledge Base Article 82592 before upgrading to vCenter Server 7.0 Update 2. The article provides guidance on running a pre-check to determine whether any Tanzu Kubernetes cluster will become incompatible after the Supervisor Cluster is auto-upgraded.

Resolved Issues

  • The embedded container registry SSL certificate is not copied to Tanzu Kubernetes cluster nodes
    • When the embedded container registry is enabled for a Supervisor Cluster, the Harbor SSL certificate is not included in any Tanzu Kubernetes cluster nodes created on that Supervisor Cluster, and you cannot connect to the registry from those nodes.
  • Post upgrade from Tanzu Kubernetes Grid 1.16.8 to 1.17.4, the “guest-cluster-auth-svc” pod on one of the control plane nodes is stuck at “Container Creating” state
    • After updating a Tanzu Kubernetes cluster from Tanzu Kubernetes Grid Service 1.16.8 to 1.17.4, the "guest-cluster-auth-svc" pod on one of the cluster control plane nodes is stuck at "Container Creating" state.
  • User is unable to manage existing pods on a Tanzu Kubernetes cluster during or after performing a cluster update
    • User is unable to manage existing pods on a Tanzu Kubernetes cluster during or after performing a cluster update.
  • Tanzu Kubernetes cluster Upgrade Job fails with “timed out waiting for etcd health check to pass.”
    • The upgrade job in the vmware-system-tkg namespace associated with the upgrade of a Tanzu Kubernetes cluster fails with the following error message "timed out waiting for etcd health check to pass." The issue is caused by the missing PodIP addresses for the etcd pods.
  • Antrea CNI not supported in current TKC version
    • While provisioning a Tanzu Kubernetes cluster, you receive the error "Antrea CNI not supported in current TKC version."

      Option 1 (recommended): Update the Tanzu Kubernetes cluster to use the OVA version that supports Antrea (v1.17.8 or later).

      Option 2: In the Tanzu Kubernetes cluster specification YAML, enter "calico" in the spec.settings.network.cni section.

      Option 3: Change the default CNI to Calico. Refer to the topic in the documentation on how to do this.
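For Option 2, the relevant part of the cluster specification might look like the fragment printed by the sketch below. It assumes the v1alpha1 TanzuKubernetesCluster layout, in which the CNI is selected by a name key under spec.settings.network.cni; the function name is hypothetical, and the rest of the cluster specification is omitted.

```shell
#!/bin/sh
# Sketch: print the network settings fragment of a TanzuKubernetesCluster
# specification. Assumes the v1alpha1 layout (spec.settings.network.cni.name).
# The function name is a hypothetical helper, not part of any VMware tool.
render_cni_settings() {
  cni="${1:-calico}"   # CNI to request; defaults to calico
  cat <<EOF
spec:
  settings:
    network:
      cni:
        name: ${cni}
EOF
}

# Example: merge this fragment into your cluster YAML before applying it.
#   render_cni_settings calico
```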

What's New February 02, 2021

February 02, 2021 Build Information

ESXi 7.0 | 17 DEC 2020 | ISO Build 17325551

vCenter Server 7.0 | 02 FEB 2021 | ISO Build 17491101

New Features

  • Supervisor Cluster
    • Update of Supervisor Clusters with vSphere networking: You can now update Supervisor Clusters that use vSphere networking from an older version of Kubernetes to the newest version available. New Supervisor Cluster versions also make the latest Tanzu Kubernetes Grid Service features available.

Resolved Issues

  • New Tanzu Kubernetes Grid Service features were unavailable in existing Supervisors with vSphere networking

    • In the previous release, new Tanzu Kubernetes Grid Service capabilities and bug-fixes were only available in newly created Supervisor Clusters when vSphere networking was used. In this release, users can now update Supervisor Clusters with vSphere networking to take advantage of the latest Tanzu Kubernetes Grid Service features and bug-fixes.

What's New December 17, 2020 

December 17, 2020 Build Information

ESXi 7.0 | 17 DEC 2020 | ISO Build 17325551

vCenter Server 7.0 | 17 DEC 2020 | ISO Build 17327517

Note: To take advantage of new Tanzu Kubernetes Grid Service capabilities and bug-fixes in this release, you need to create a new Supervisor cluster if vSphere networking is used.

New Features

  • Supervisor Cluster
    • Supervisor Namespace Isolation with Dedicated T1 Router – Supervisor Clusters that use NSX-T networking now use a new topology in which each namespace has its own dedicated T1 router.
      • Newly created Supervisor Clusters use this new topology automatically.
      • Existing Supervisor Clusters are migrated to this new topology during an upgrade.
    • Supervisor Clusters Support NSX-T 3.1.0 – Supervisor Clusters are compatible with NSX-T 3.1.0.
    • Supervisor Cluster Version 1.16.x Support Removed – Support for Supervisor Cluster version 1.16.x is now removed. Supervisor Clusters running 1.16.x should be upgraded to a newer version.
  • Tanzu Kubernetes Grid Service for vSphere
    • HTTP/HTTPS Proxy Support  – Newly created Tanzu Kubernetes clusters can use a global HTTP/HTTPS Proxy for egress traffic as well as for pulling container images from internet registries.
    • Integration with Registry Service – Newly created Tanzu Kubernetes clusters work out of the box with the vSphere Registry Service. Existing clusters, once updated to a new version, also work with the Registry Service.
    • Configurable Node Storage  – Tanzu Kubernetes clusters can now mount an additional storage volume to virtual machines thereby increasing available node storage capacity. This enables users to deploy larger container images that might exceed the default 16GB root volume size.
    • Improved status information – WCPCluster and WCPMachine Custom Resource Definitions now implement conditional status reporting. Successful Tanzu Kubernetes cluster lifecycle management depends on a number of subsystems (for example, Supervisor, storage, networking) and understanding failures can be challenging. Now WCPCluster and WCPMachine CRDs surface common status and failure conditions to ease troubleshooting.

Resolved Issues

  • Missing new default VM Classes introduced in vSphere 7.0 U1

    • After upgrading to vSphere 7.0.1, and then performing a vSphere Namespaces update of the Supervisor Cluster, running the command "kubectl get virtualmachineclasses" did not list the new VM class sizes 2x-large, 4x-large, 8x-large. This has been resolved and all Supervisor Clusters will be configured with the correct set of default VM Classes.

What's New October 6, 2020 

October 6, 2020 Build Information

ESXi 7.0 | 06 OCT 2020 | ISO Build 16850804

vCenter Server 7.0 | 06 OCT 2020 | ISO Build 16860138

New Features

  • Supervisor Cluster
    • Configuration of Supervisor Clusters with vSphere networking – We introduced vSphere networking for Supervisor Clusters, enabling you to deliver a developer-ready platform using your existing network infrastructure.
    • Support of HAProxy load balancer for setting up Supervisor Clusters with vSphere networking – If you configure Supervisor Clusters with vSphere networking, you need to add a load balancer to handle your modern workloads. You can deploy and set up your load balancer with an HAProxy OVA.
    • Management of Supervisor Cluster lifecycle using vSphere Lifecycle Manager – For Supervisor Clusters configured with vSphere networking, you can use vSphere Lifecycle Manager for infrastructure configuration and lifecycle management.
    • Opportunity to try vSphere with Tanzu on your hardware – We now offer you an in-product-trial if you want to enable a Supervisor Cluster on your hardware and test this modern application platform at no additional cost.
       
  • Tanzu Kubernetes Grid Service for vSphere
    • Exposure of Kubernetes versions to DevOps users – We introduced a new 'TanzuKubernetesRelease' custom resource definition in the Supervisor Cluster. This custom resource definition provides detailed information to the DevOps user about the Kubernetes versions they can use in their Tanzu Kubernetes clusters.
    • Integration of VMware Container Networking with Antrea for Kubernetes – We integrated a commercially supported version of Antrea as the default Container Network Interface (CNI) for new Tanzu Kubernetes clusters. Antrea brings a comprehensive suite of enterprise network policy features to Tanzu Kubernetes Grid Service. For more details, read the release announcement. While Antrea is the default CNI, vSphere administrators and DevOps users can still choose Calico as the CNI for Tanzu Kubernetes clusters.
    • Support of Supervisor cluster environments that use vSphere networking – We now support Supervisor Cluster environments that use vSphere networking so you can leverage your existing network infrastructure.

Resolved Issues

  • No listing. This is a feature release.

What's New August 25, 2020 

August 25, 2020 Build Information

ESXi 7.0 | 23 JUN 2020 | ISO Build 16324942

vCenter Server 7.0 | 25 AUG 2020 | ISO Build 16749653

New Features

  • None, this is simply a bug-fix release.

Resolved Issues

  • High CPU utilization upon upgrading to the July 30 patch
    • vCenter Server shows high CPU utilization after upgrading to the July 30 patch. This issue is now fixed.
  • Supervisor cluster enablement failure due to certificate with Windows line endings
    • Enabling a Supervisor cluster can fail if there are Windows line endings in the certificate. This issue is now fixed.

What's New July 30, 2020 

July 30, 2020 Build Information

ESXi 7.0 | 23 JUN 2020 | ISO Build 16324942

vCenter Server 7.0 | 30 JUL 2020 | ISO Build 16620007

New Features

  • Supervisor cluster: new version of Kubernetes, support for custom certificates and PNID changes
    • The Supervisor cluster now supports Kubernetes 1.18.2 (along with 1.16.7 and 1.17.4)
    • Replacing machine SSL certificates with custom certificates is now supported
    • vCenter PNID update is now supported when there are Supervisor clusters in vCenter Server
  • Tanzu Kubernetes Grid Service for vSphere: new features added for cluster scale-in, networking and storage
    • Cluster scale-in operation is now supported for Tanzu Kubernetes Grid service clusters
    • Ingress firewall rules are now enforced by default for all Tanzu Kubernetes Grid service clusters
    • New versions of Kubernetes now ship regularly, asynchronously to vSphere patches. The current versions are 1.16.8, 1.16.12, 1.17.7, and 1.17.8
  • Network service: new version of NCP
    • SessionAffinity is now supported for ClusterIP services
    • IngressClass, PathType, and Wildcard domain are supported for Ingress in Kubernetes 1.18
    • Client Auth is now supported in Ingress Controller
  • Registry service: new version of Harbor
    • The Registry service is now upgraded to Harbor 1.10.3

For more information and instructions on how to upgrade, refer to the Updating vSphere with Tanzu Clusters documentation.

Resolved Issues

  • Tanzu Kubernetes Grid Service cluster NTP sync issue

What's New June 23, 2020 

June 23, 2020 Build Information

ESXi 7.0 | 23 JUN 2020 | ISO Build 16324942

vCenter Server 7.0 | 23 JUN 2020 | ISO Build 16386292

Tanzu Kubernetes clusters OVA: v1.16.8+vmware.1-tkg.3.60d2ffd

New Features

  • None, this is simply a bug-fix release.

Resolved Issues

  • Tanzu Kubernetes Grid Service cluster upgrade failure
    • We have resolved an issue where upgrading a Tanzu Kubernetes Grid service cluster could fail due to "Error: unknown previous node"
  • Supervisor cluster upgrade failure
    • We have resolved an issue where a Supervisor cluster update may get stuck if the embedded Harbor is in a failed state

What's New May 19, 2020 

May 19, 2020 Build Information

ESXi 7.0 | 2 APR 2020 | ISO Build 15843807

vCenter Server 7.0 | 19 MAY 2020 | ISO Build 16189094

Tanzu Kubernetes clusters OVA: v1.16.8+vmware.1-tkg.3.60d2ffd

New Features

  • Tanzu Kubernetes Grid Service for vSphere: rolling upgrade and services upgrade
    • Customers can now perform rolling upgrades over their worker nodes and control plane nodes for the Tanzu Kubernetes Grid Service for vSphere, and upgrade the pvCSI, Calico, and authsvc services. This includes pre-checks and upgrade compatibility for this matrix of services.
    • Rolling upgrades can be used to vertically scale worker nodes, i.e. change the VM class of your worker nodes to a smaller or larger size.
  • Supervisor cluster: new versions of Kubernetes, upgrade supported
    • The Supervisor cluster now supports Kubernetes 1.17.4
    • The Supervisor cluster now supports upgrading from Kubernetes 1.16.x to 1.17.x

Resolved Issues

  • Naming conflict for deleted namespaces
    • We have resolved an issue where, if a user deleted a vSphere namespace and then created a new vSphere namespace with the same name, we had a naming collision that resulted in being unable to create Tanzu Kubernetes clusters.
  • Improved distribution names
    • We have made clearer which version of Kubernetes you are running by moving OVF versioning information to a separate column.

Build Information for the Initial vSphere with Kubernetes Release

April 2, 2020 Build Information

ESXi 7.0 | 2 APR 2020 | ISO Build 15843807

vCenter Server 7.0 | 2 APR 2020 | ISO Build 15952498

Tanzu Kubernetes clusters OVA: v1.16.8+vmware.1-tkg.3.60d2ffd

Learn About vSphere with Tanzu

VMware provides a variety of resources you can use to learn about vSphere with Tanzu.

  • Learn how to configure, manage, and use vSphere with Tanzu by reading vSphere with Tanzu Configuration and Management. Designed for vSphere system administrators and DevOps teams, this guide provides details on vSphere with Tanzu architecture, services, licensing, system requirements, setup, and usage.

  • Use the VMware Compatibility Guides to learn about hardware compatibility and product interoperability for vSphere with Tanzu. vSphere with Tanzu has the same hardware requirements as vSphere 7.0. For certain configurations, it also requires the use of NSX-T Edge virtual machines, and those VMs have their own smaller subset of CPU compatibility. See the NSX-T Data Center Installation Guide for more information.

  • Find out what languages vSphere with Tanzu is available in by visiting the Internationalization section of the vSphere 7.0 Release Notes. These are the same languages VMware provides for vSphere.

  • View the copyrights and licenses for vSphere with Tanzu open source components by visiting the Open Source section of the vSphere 7.0 Release Notes. The vSphere 7.0 Release Notes also tell you where to download vSphere open source components.

Known Issues

The known issues are grouped as follows.

Supervisor Cluster
  • Pod creation sometimes fails on a Supervisor Cluster when DRS is set to Manual mode

    Clusters where you enable workload management also must have HA and automated DRS enabled. Enabling workload management on clusters where HA and DRS are not enabled or where DRS is running in manual mode can lead to inconsistent behavior and Pod creation failures.

    Workaround: Enable DRS on the cluster and set it to Fully Automated or Partially Automated. Also ensure that HA is enabled on the cluster.

  • Storage class appears when you run kubectl get sc even after you remove the corresponding storage policy

    If you run kubectl get sc after you create a storage policy, add the policy to a namespace, and then remove the policy, the command response still lists the corresponding storage class.

    Workaround: Run kubectl describe namespace to see the storage classes actually associated with the namespace.

  • All storage classes returned when you run kubectl describe storage-class or kubectl get storage-class on a Supervisor Cluster instead of just the ones for the Supervisor namespace

    When you run the kubectl describe storage-class or kubectl get storage-class command on a Supervisor Cluster, the command returns all storage classes instead of just the ones for the Supervisor namespace.

    Workaround: Infer the storage class names associated with the namespace from the verbose name of the quota.

  • Share Kubernetes API endpoint button ignores FQDN even if it is configured

    Even if FQDN is configured for the Kubernetes control plane IP for Supervisor Cluster namespace, the share namespace button gives the IP address instead of the FQDN.

    Workaround: Manually share Supervisor Cluster namespace with FQDN.

  • During Supervisor cluster upgrade, extra vSphere Pods might be created and stuck at pending status if Daemon set is used

    During Supervisor cluster upgrade, Daemon set controller creates extra vSphere Pods for each Supervisor control plane node. This is caused by an upstream Kubernetes issue.

    Workaround: Add a NodeSelector or NodeAffinity to the vSphere Pod spec, so that the DaemonSet controller skips the control plane nodes when creating pods.
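One way to express this workaround is a node affinity rule in the DaemonSet pod template that excludes nodes carrying the control plane role label. The sketch below is an assumption-laden illustration: the function name is hypothetical, and the label key node-role.kubernetes.io/master is assumed to be the one set on your Supervisor control plane nodes, so verify it with kubectl get nodes --show-labels before relying on it.

```shell
#!/bin/sh
# Sketch: print an affinity fragment for a DaemonSet pod template
# (spec.template.spec) that keeps pods off nodes with a given role label.
# The default label key is an assumption; check the labels in your cluster.
render_skip_control_plane_affinity() {
  label_key="${1:-node-role.kubernetes.io/master}"
  cat <<EOF
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
        - matchExpressions:
            - key: ${label_key}
              operator: DoesNotExist
EOF
}

# Example: print the fragment and paste it under spec.template.spec.
#   render_skip_control_plane_affinity
```

The DoesNotExist operator matches only nodes that lack the label, which is why no values list is needed.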

  • Unable to access the load balancer via kubectl vsphere login

    You cannot access the API server via kubectl vsphere login when using a load balanced endpoint.

    Workaround: This issue can manifest in two ways.

    1. Check whether the API server is accessible through the control plane by running curl -k https://vip:6443 (or 443).

      1. If you cannot reach the API server through the load balancer, the API server is not up yet.

      2. Workaround: Wait a few minutes for the API server to become accessible.

    2. Check if the edge virtual machine node status is up.

      1. Log in to the NSX Manager.

      2. Go to System > Fabric > Nodes > Edge Transport Nodes. The node status should be up.

      3. Go to Networking > Load Balancers > Virtual Servers. Find the vips that end with kube-apiserver-lb-svc-6443 and kube-apiserver-lb-svc-443. If their status is not up, use the following workaround.

      4. Workaround: Reboot the edge VM. The edge VM should reconfigure after the reboot.
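The first check above can be scripted so that both ports are probed the same way. This is a minimal sketch: the function names and the example address are hypothetical, and vip stands for the load balanced control plane address of your Supervisor Cluster.

```shell
#!/bin/sh
# Build the URL to probe. $1 is the load balanced control plane address,
# $2 is the port (defaults to 6443).
api_url() {
  printf 'https://%s:%s\n' "$1" "${2:-6443}"
}

# Probe one endpoint and return curl's exit status. -k skips certificate
# verification as in the step above; --max-time bounds the wait in seconds.
probe_api() {
  curl -k -s -o /dev/null --max-time 10 "$(api_url "$1" "$2")"
}

# Example (the address is a placeholder):
#   for port in 6443 443; do
#     probe_api 192.0.2.20 "$port" && echo "port $port reachable"
#   done
```

If neither port answers after a few minutes, continue with the edge virtual machine checks in the second step.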

  • Cluster configuration of vSphere with Tanzu shows timeout errors during configuration

    During the configuration of the cluster, you may see the following error messages:

    Api request to param0 failed

    or

    Config operation for param0 node VM timed out

    Workaround: None. Enabling vSphere with Tanzu can take from 30 to 60 minutes. If you see these or similar param0 timeout messages, they are not errors and can be safely ignored.

  • Enabling the container registry fails with error

    When the user enables the container registry from the UI, the enable action fails after 10 minutes with a timeout error.

    Workaround: Disable the container registry and try enabling it again. Note that the timeout error may occur again.

  • Enabling a cluster after disabling it fails with error

    Enabling a cluster shortly after disabling the cluster may create a conflict in the service account password reset process. The enable action fails with an error.

    Workaround: Restart with the command vmon-cli --restart wcp.

  • Deleting a container image tag in an embedded container registry might delete all image tags that share the same physical container image

    Multiple images with different tags can be pushed to a project in an embedded container registry from the same container image. If one of the images on the project is deleted, all other images with different tags that are pushed from the same image will be deleted.

    Workaround: The operation cannot be undone. Push the image to the project again.

  • Failed purge operation on a registry project results in project being in 'error' state

    When you perform a purge operation on a registry project, the project temporarily displays as being in an error state. You will not be able to push or pull images from such project. At regular intervals, the project will be checked and all projects which are in error state will be deleted and recreated. When this happens, all previous project members will be added back to the recreated project and all the repositories and images which previously existed in the project will be deleted, effectively completing the purge operation.

    Workaround: None.

  • Container registry enablement fails when the storage capacity is less than 2000 mebibytes

    There is a minimum total storage capacity requirement for the container registry, addressed as the "limit" field in VMODL. This is because some Kubernetes pods need enough storage space to work properly. To achieve container registry functionality, there is a minimum capacity of 5 Gigabytes. Note that this limit offers no guarantee of improved performance or increased number or size of images that can be supported.

    Workaround: This issue can be avoided by deploying the container registry with a larger total capacity. The recommended storage volume is no less than 5 gigabytes.

  • If you replace the TLS certificate of the NSX load balancer for Kubernetes cluster you might fail to log in to the embedded Harbor registry from a docker client or the Harbor UI

    To replace the TLS certificate of the NSX load balancer for Kubernetes cluster, from the vSphere UI navigate to Configure > Namespaces > Certificates > NSX Load Balancer > Actions and click Replace Certificate. When you replace the NSX certificate, the login operation to the embedded Harbor registry from a docker client or the Harbor UI might fail with the unauthorized: authentication required or Invalid user name or password error.

    Workaround: Restart the registry agent pod in the vmware-system-registry namespace:

    1. Run the kubectl get pod -n vmware-system-registry command.
    2. Delete the pod listed in the output by running the kubectl delete pod vmware-registry-controller-manager-xxxxxx -n vmware-system-registry command.
    3. Wait until the pod restarts.
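The restart steps above can be combined into a small script so that you do not have to copy the generated pod name by hand. This is a sketch under the assumption that kubectl is already logged in to the Supervisor Cluster; the function and variable names are hypothetical helpers.

```shell
#!/bin/sh
REGISTRY_NS=vmware-system-registry

# Filter "kubectl get pod -o name" output on stdin down to the registry
# agent pods, which are named vmware-registry-controller-manager-<suffix>.
select_registry_pods() {
  grep 'vmware-registry-controller-manager-'
}

# Delete every registry agent pod so that it is recreated, then list the
# pods in the namespace so you can watch for the replacement.
restart_registry_agent() {
  kubectl get pod -n "$REGISTRY_NS" -o name | select_registry_pods |
    while read -r pod; do
      kubectl delete "$pod" -n "$REGISTRY_NS"
    done
  kubectl get pod -n "$REGISTRY_NS"
}

# Example: restart_registry_agent
```

Re-run kubectl get pod -n vmware-system-registry until the new pod reports Running.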
  • Pods deployed with DNSDefault will use the clusterDNS settings

    Any vSphere Pod deployed in a Supervisor Cluster that uses DNSDefault falls back to the clusterDNS configured for the cluster.

    Workaround: None.

  • All hosts in a cluster might be updated simultaneously when upgrading a Supervisor Cluster

    In certain cases, all hosts in a cluster will be updated in parallel during the Supervisor Cluster upgrade process. This will cause downtime for all pods running on this cluster.

    Workaround: During Supervisor Cluster upgrade, don't restart wcpsvc or remove/add hosts.

  • Supervisor Cluster upgrade can be stuck indefinitely if VMCA is used as an intermediate CA

    Supervisor Cluster upgrade can be stuck indefinitely in "configuring" if VMCA is being used as an intermediate CA.

    Workaround: Switch to a non-intermediate CA for VMCA and delete any control plane VMs stuck in "configuring".

  • vSphere Pod deployment fails if a Storage Policy with encryption enabled is assigned for Pod Ephemeral Disks

    If a Storage Policy with encryption enabled is used for Pod Ephemeral Disks, vSphere Pod creation fails with an “AttachVolume.Attach failed for volume” error.

    Workaround: Use a storage policy with no encryption for Pod Ephemeral Disks.

  • Supervisor Cluster upgrade hangs at 50% during "Namespaces cluster upgrade is in upgrade host step"

    The problem occurs when a vSphere Pod hangs in TERMINATING state during the upgrade of the Kubernetes control plane node. The controller of the control plane node tries to upgrade the Spherelet process, and during that phase vSphere Pods are evicted or killed on that control plane node to unregister the node from the Kubernetes control plane. As a result, the Supervisor Cluster upgrade hangs at an older version until the vSphere Pods in TERMINATING state are removed from inventory.

    Workaround:

    1. Log in to the ESXi host on which the vSphere Pod is hanging in TERMINATING state.

    2. Remove the TERMINATING vSphere Pods by using the following commands, where <vmid> is the ID that getallvms reports for the Pod VM:

      # vim-cmd vmsvc/getallvms

      # vim-cmd vmsvc/destroy <vmid>

        After this step, the vSphere Pods display in orphaned state in the vSphere Client.

    3. Delete the orphaned vSphere Pods by first adding a user to the ServiceProviderUsers group.

        a.) Log in to the vSphere Client, select Administration -> Users and Groups -> Create User, and click Groups.

        b.) Search for ServiceProviderUsers or the Administrators group and add a user to the group.

    4. Log in to the vSphere Client as the newly created user and delete the orphaned vSphere Pods.

    5. In kubectl, use the following command, replacing <pod-name> and <namespace> with the values for the stuck vSphere Pod:

      kubectl patch pod <pod-name> -n <namespace> -p '{"metadata":{"finalizers":null}}'

  • Workload Management UI throws the following license error: None of the hosts connected to this vCenter are licensed for Workload Management

    After successfully enabling Workload Management on a vSphere Cluster, you might see the following licensing error after rebooting vCenter Server or upgrading ESXi hosts where Workload Management is enabled: None of the hosts connected to this vCenter are licensed for Workload Management. This is a cosmetic UI error. Your license should still be valid and your workloads should still be running.

    Workaround: Users should clear their browser cache for the vSphere Client.

  • Large vSphere environments might take a long time to sync on a cloud with the VMware NSX Advanced Load Balancer Controller

    vSphere environments with inventories that contain more than 2,000 ESXi hosts and 45,000 virtual machines might take up to 2 hours to sync on a cloud when using an NSX Advanced Load Balancer Controller.

    Workaround: None.

  • The private container registry of the Supervisor Cluster might become unhealthy after VMware Certificate Authority (VMCA) root certificate is changed on a vCenter Server 7.0 Update 2

    After you change the VMware Certificate Authority (VMCA) root certificate on a vCenter Server system 7.0 Update 2, the private container registry of the Supervisor Cluster might become unhealthy and the registry operations might stop working as expected. The following health status message for the container registry is displayed on the cluster configuration UI:

    Harbor registry harbor-1560339792 on cluster domain-c8 is unhealthy. Reason: failed to get harbor health: Get https://30.0.248.2/api/health: x509: certificate signed by unknown authority

    Workaround:

    Restart the registry agent pod manually in the vmware-system-registry namespace on the vSphere Kubernetes cluster:

    1. Run the kubectl get pod -n vmware-system-registry command to get the registry agent pod.
    2. Delete the pod listed in the output by running the kubectl delete pod vmware-registry-controller-manager-xxxxxx -n vmware-system-registry command.
    3. Wait until the pod restarts.
    4. Refresh the image registry on the cluster configuration UI, and the health status should show as running shortly.
  • Projects for newly-created namespaces on the Supervisor Cluster are not automatically created on the private container registry 

    Projects might not be automatically created on the private container registry for newly-created namespaces on a Supervisor Cluster. The status of the container registry still displays as healthy, but no projects are shown on the container registry of the cluster when a new namespace is created. You cannot push or pull images to the projects of the new namespaces on the container registry.

    Workaround:

    1. Run the kubectl get pod -n vmware-system-registry command to get the registry agent pod.
    2. Delete the pod listed in the output by running the kubectl delete pod vmware-registry-controller-manager-xxxxxx -n vmware-system-registry command.
    3. Wait until the pod restarts.
    4. Log in to the private container registry to verify that projects are created for namespaces on the cluster.
  • You might observe ErrImagePull while creating pods with 10 replicas

    This issue might occur when you create a deployment whose YAML specifies 10 replica pods and the images are pulled from the private container registry. Out of the 10 replicas, at least 7 might succeed and 3 might fail with the "ErrImagePull" error.

    Workaround: Use fewer replicas, with a maximum of 5.

  • The NSX Advanced Load Balancer Controller is not supported when vCenter Server is deployed with a custom port

    You cannot register vCenter Server with the NSX Advanced Load Balancer Controller because the NSX Advanced Load Balancer Controller UI provides no option for specifying a custom vCenter Server port during registration.

    NSX Advanced Load Balancer Controller works only when vCenter Server is deployed with default ports 80 and 443.

  • When performing domain repointing on a vCenter Server that already contains running Supervisor Clusters, the Supervisor Clusters go into the Configuring state

    Domain repointing is not supported on a vCenter Server that has Supervisor Clusters. If you attempt domain repointing, existing Supervisor Clusters go into the Configuring state, and control plane VMs and Tanzu Kubernetes cluster VMs stop appearing in the inventory under the Hosts and Clusters view.
     

    Workaround: None

Networking
  • NSX Edge virtual machine deployment fails on slow networks

    There is a combined 60-minute timeout for NSX Edge OVF deployment and NSX Edge VM registration. On slower networks or in environments with slower storage, the operation fails if the time elapsed for Edge deployment and registration exceeds this 60-minute timeout.

    Workaround: Clean up edges and restart the deployment.

  • NSX Edges are not updated if vCenter Server DNS, NTP, or Syslog settings are changed after cluster configuration

    DNS, NTP, and Syslog settings are copied from vCenter Server to NSX Edge virtual machines during cluster configuration. If any of these vCenter Server settings are changed after configuration, the NSX Edges are not updated.

    Workaround: Use the NSX Manager APIs to update the DNS, NTP, and Syslog settings of your NSX Edges.

  • NSX Edge Management Network Configuration only provides subnet and gateway configuration on select portgroups

    The NSX Edge management network compatibility drop down list will show subnet and gateway information only if there are ESXi VMKnics configured on the host that are backed by a DVPG on the selected VDS. If you select a Distributed Portgroup without a VMKnic attached to it, you must provide a subnet and gateway for the network configuration.

    Workaround: Use one of the following configurations:

    • Discrete Portgroup: This is where no VMKs currently reside. You must supply the appropriate subnet and gateway information for this portgroup.

    • Shared Management Portgroup: This is where the ESXi hosts' Management VMK resides. Subnet and gateway information will be pulled automatically.

  • Unable to use VLAN 0 during cluster configuration

    When attempting to use VLAN 0 for overlay Tunnel Endpoints or uplink configuration, the operation fails with the message:

    Argument 'uplink_network vlan' is not a valid VLAN ID for an uplink network. Please use a VLAN ID between 1-4094

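    Workaround: Manually enable VLAN 0 support by using the following procedure. In step 2, apply the option (a, b, or c) that matches the network for which you need VLAN 0:

    1. SSH into your deployed vCenter Server (root/vmware).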

    2. Open /etc/vmware/wcp/nsxdsvc.yaml. It will have content similar to:

    logging: 
      level: debug
      maxsizemb: 10 

    a. To enable VLAN 0 support for NSX Cluster Overlay Networks, append the following lines to /etc/vmware/wcp/nsxdsvc.yaml and save the file.

    experimental:
     supportedvlan: 
      hostoverlay: 
        min: 0 
        max: 4094 
      edgeoverlay: 
        min: 1 
        max: 4094 
      edgeuplink: 
        min: 1 
        max: 4094 

    b. To enable VLAN 0 support for NSX Edge Overlay Networks, append the following lines to /etc/vmware/wcp/nsxdsvc.yaml and save the file.

    experimental: 
     supportedvlan: 
      hostoverlay: 
        min: 1 
        max: 4094 
      edgeoverlay: 
        min: 0 
        max: 4094 
      edgeuplink: 
        min: 1 
        max: 4094 

    c. To enable VLAN 0 support for NSX Edge Uplink Networks, append the following lines to /etc/vmware/wcp/nsxdsvc.yaml and save the file.

    experimental: 
     supportedvlan: 
      hostoverlay: 
        min: 1 
        max: 4094 
      edgeoverlay: 
        min: 1 
        max: 4094 
      edgeuplink: 
        min: 0 
        max: 4094 

    3. Restart the workload management service with vmon-cli --restart wcp.
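
The three options in step 2 differ only in which network type gets min: 0. As a rough sketch, the following Python helper generates the block to append for one chosen network type. The key names and VLAN ranges are taken from the listings above; the function name and the uniform 2-space indentation are choices made for this sketch, not something the product requires.

```python
# Network types that appear under "supportedvlan" in the listings above.
NETWORKS = ("hostoverlay", "edgeoverlay", "edgeuplink")


def vlan0_block(allow_vlan0_for: str) -> str:
    """Return YAML text that permits VLAN 0 for exactly one network type."""
    if allow_vlan0_for not in NETWORKS:
        raise ValueError(f"expected one of {NETWORKS}, got {allow_vlan0_for!r}")
    lines = ["experimental:", "  supportedvlan:"]
    for name in NETWORKS:
        # Only the selected network type is allowed to start at VLAN 0.
        minimum = 0 if name == allow_vlan0_for else 1
        lines += [f"    {name}:", f"      min: {minimum}", "      max: 4094"]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Option c: VLAN 0 for NSX Edge Uplink Networks.
    print(vlan0_block("edgeuplink"), end="")
```

Passing "hostoverlay" corresponds to option a, "edgeoverlay" to option b, and "edgeuplink" to option c.
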

  • vSphere with Tanzu and NSX-T cannot be enabled on a cluster where vSphere Lifecycle Manager Image is enabled

    vSphere with Tanzu and NSX-T are not compatible with vSphere Lifecycle Manager Image. They are only compatible with vSphere Lifecycle Manager Baselines. When vSphere Lifecycle Manager Image is enabled on a cluster, you cannot enable vSphere with Tanzu or NSX-T on that cluster.

    Workaround: Move hosts to a cluster where vSphere Lifecycle Manager Image is disabled. You must use a cluster with vSphere Lifecycle Manager Baselines. Once the hosts are moved, you can enable NSX-T and then vSphere with Tanzu on that new cluster.

  • When vSphere with Tanzu networking is configured with NSX-T, "ExternalTrafficPolicy: Local" is not supported

    For a Kubernetes service of type LoadBalancer, the "ExternalTrafficPolicy: Local" configuration is not supported.

    Workaround: None.

  • When vSphere with Tanzu networking is configured with NSX-T, the number of services of type LoadBalancer that a Tanzu Kubernetes cluster can support is limited by the NodePort range of the Supervisor Cluster

    Each VirtualMachineService of type LoadBalancer is translated to one Kubernetes service of type LoadBalancer and one Kubernetes endpoint. The maximum number of Kubernetes services of type LoadBalancer that can be created in a Supervisor Cluster is 2767. This limit includes services created on the Supervisor Cluster itself and services created in Tanzu Kubernetes clusters.

    Workaround: None.

  • NSX Advanced Load Balancer Controller does not support changing the vCenter Server PNID

    Once you configure the Supervisor Cluster with the NSX Advanced Load Balancer, you cannot change the vCenter Server PNID.

    Workaround: If you must change the PNID of vCenter Server, remove the NSX Advanced Load Balancer Controller, change the vCenter Server PNID, and then redeploy and configure the NSX Advanced Load Balancer Controller with the new PNID of vCenter Server.

  • In vSphere Distributed Switch (vDS) environments, it is possible to configure Tanzu Kubernetes clusters with network CIDR ranges that overlap or conflict with those of the Supervisor Cluster, and vice versa, resulting in components not being able to communicate.

    In vDS environments, there is no design-time network validation done when you configure the CIDR ranges for the Supervisor Cluster, or when you configure the CIDR ranges for Tanzu Kubernetes clusters. As a result, two problems can arise:

    1) You create a Supervisor Cluster with CIDR ranges that conflict with the default CIDR ranges reserved for Tanzu Kubernetes clusters.

    2) You create a Tanzu Kubernetes cluster with a custom CIDR range that overlaps with the CIDR range used for the Supervisor Cluster.

    Workaround: For vDS environments, when you configure a Supervisor Cluster, do not use either of the default CIDR ranges used for Tanzu Kubernetes clusters: 10.96.0.0/12, which is reserved for services, and 192.168.0.0/16, which is reserved for pods. See also "Configuration Parameters for Tanzu Kubernetes Clusters" in the vSphere with Tanzu documentation.

    For vDS environments, when you create a Tanzu Kubernetes cluster, do not use the same CIDR range that is used for the Supervisor Cluster.
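
Because no design-time validation exists, you can check planned ranges yourself before configuring either side. The following is a minimal sketch that uses the Python standard-library ipaddress module. The two default ranges come from the workaround above; the function name and the sample Supervisor Cluster ranges are hypothetical.

```python
import ipaddress

# Default ranges named in the workaround above for Tanzu Kubernetes clusters.
TKC_DEFAULT_CIDRS = ("192.168.0.0/16", "10.96.0.0/12")


def find_overlaps(candidate_cidrs, reserved_cidrs=TKC_DEFAULT_CIDRS):
    """Return (candidate, reserved) pairs whose address ranges overlap."""
    conflicts = []
    for candidate in candidate_cidrs:
        cand_net = ipaddress.ip_network(candidate, strict=False)
        for reserved in reserved_cidrs:
            if cand_net.overlaps(ipaddress.ip_network(reserved, strict=False)):
                conflicts.append((candidate, reserved))
    return conflicts


if __name__ == "__main__":
    # Hypothetical Supervisor Cluster ranges, used only for illustration.
    planned = ["10.100.0.0/16", "172.16.0.0/20"]
    for candidate, reserved in find_overlaps(planned):
        print(f"{candidate} overlaps {reserved}")
```

To check a custom Tanzu Kubernetes cluster range against a Supervisor Cluster, pass the Supervisor Cluster ranges as reserved_cidrs instead of the defaults.
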

VMware Tanzu Kubernetes Grid Service for vSphere
  • A Tanzu Kubernetes cluster hangs in "Updating" state after Supervisor Cluster upgrade

    When a Supervisor Cluster is upgraded, it can trigger a rolling update of all Tanzu Kubernetes clusters to propagate any new configuration settings. During this process, a Tanzu Kubernetes cluster that was previously in the "Running" phase might hang in the "Updating" phase. The "Running" phase only indicates that the control plane is available, so it is possible that the required control plane and worker nodes have not all been created successfully. Such a Tanzu Kubernetes cluster might fail the health checks performed during the rolling update that starts when the Supervisor Cluster upgrade completes. As a result, the Tanzu Kubernetes cluster hangs in the "Updating" phase. You can confirm this by looking at the events on the KubeadmControlPlane resources associated with the Tanzu Kubernetes cluster. The events emitted by the resource are similar to the following:

    Warning ControlPlaneUnhealthy 2m15s (x1026 over 5h42m) kubeadm-control-plane-controller Waiting for control plane to pass control plane health check to continue reconciliation: machine's (gc-ns-1597045889305/tkg-cluster-3-control-plane-4bz9r) node (tkg-cluster-3-control-plane-4bz9r) was not checked

    Workaround: None.

  • Tanzu Kubernetes cluster continues to access removed storage policy

    When a VI Admin deletes a storage class from the vCenter Server namespace, access to that storage class is not removed for any Tanzu Kubernetes cluster that is already using it.

    Workaround:

    1. As VI Admin, after deleting a storage class from the vCenter Server namespace, create a new storage policy with the same name.

    2. Re-add the existing storage policy or the one you just recreated to the supervisor namespace. TanzuKubernetesCluster instances using this storage class should now be fully functional.

    3. For each TanzuKubernetesCluster resource using the storage class you wish to delete, create a new TanzuKubernetesCluster instance using a different storage class and use Velero to migrate workloads into the new cluster.

    4. Once no TanzuKubernetesCluster or PersistentVolume uses the storage class, it can be safely removed.

  • The embedded container registry SSL certificate is not copied to Tanzu Kubernetes cluster nodes

    When the embedded container registry is enabled for a Supervisor Cluster, the Harbor SSL certificate is not included in any Tanzu Kubernetes cluster nodes created on that Supervisor Cluster, and you cannot connect to the registry from those nodes.

    Workaround: Copy and paste the SSL certificate from the Supervisor Cluster control plane to the Tanzu Kubernetes cluster worker nodes.

  • Virtual machine images are not available from the content library

    When multiple vCenter Server instances are configured in an Embedded Linked Mode setup, the UI allows the user to select a content library created on a different vCenter Server instance. Selecting such a library results in virtual machine images not being available for DevOps users to provision a Tanzu Kubernetes cluster. In this case, `kubectl get virtualmachineimages` does not return any results.

    Workaround: When you associate a content library with the Supervisor Cluster for Tanzu Kubernetes cluster VM images, choose a library that is created in the same vCenter Server instance where the Supervisor Cluster resides. Alternatively, create a local content library which also supports air-gapped provisioning of Tanzu Kubernetes clusters.

  • You cannot provision new Tanzu Kubernetes clusters, or scale out existing clusters, because the Content Library subscriber cannot synchronize with the publisher.

    When you set up a Subscribed Content Library for Tanzu Kubernetes cluster OVAs, an SSL certificate is generated, and you are prompted to manually trust the certificate by confirming the certificate thumbprint. If the SSL certificate is changed after the initial library setup, the new certificate must be trusted again by updating the thumbprint.

    Workaround: Edit the settings of the Subscribed Content Library. This initiates a probe of the subscription URL even though no change is requested on the library. The probe discovers that the SSL certificate is not trusted and prompts you to trust it.

  • Tanzu Kubernetes Release version 1.16.8 is incompatible with vSphere 7 U1.

    The Tanzu Kubernetes Release version 1.16.8 is incompatible with vSphere 7 U1. You must update Tanzu Kubernetes clusters to a later version before performing a vSphere Namespaces update to U1.

    Workaround: Before performing a vSphere Namespaces update to the vSphere 7 U1 release, update each Tanzu Kubernetes cluster running version 1.16.8 to a later version. Refer to the topic "Supported Update Path" in the vSphere with Tanzu documentation for more information.

  • After upgrading the Workload Control Plane to vSphere 7 U1, new VM Class sizes are not available.

    After upgrading to vSphere 7.0.1 and then performing a vSphere Namespaces update of the Supervisor Cluster, running the command "kubectl get virtualmachineclasses" for Tanzu Kubernetes clusters does not list the new VM class sizes 2x-large, 4x-large, and 8x-large.

    Workaround: None. The new VM class sizes can only be used with a new installation of the Workload Control Plane.

  • The Tanzu Kubernetes Release version v1.17.11+vmware.1-tkg.1 times out connecting to the cluster DNS server when using the Calico CNI.

    The Tanzu Kubernetes Release version v1.17.11+vmware.1-tkg.1 has a Photon OS kernel issue that prevents the image from working as expected with the Calico CNI.

    Workaround: For Tanzu Kubernetes Release version 1.17.11, the image identified as "v1.17.11+vmware.1-tkg.2.ad3d374.516" fixes the issue with Calico. To run Kubernetes 1.17.11, use this version instead of "v1.17.11+vmware.1-tkg.1.15f1e18.489". Alternatively, use a different Tanzu Kubernetes Release, such as version 1.18.5 or 1.17.8 or 1.16.14.

  • When vSphere with Tanzu networking is configured with NSX-T Data Center, updating an "ExternalTrafficPolicy: Local" Service to "ExternalTrafficPolicy: Cluster" will render this Service's LoadBalancer IP inaccessible on the Supervisor Cluster control plane nodes

    When a LoadBalancer type Kubernetes Service is initially created in workload clusters with ExternalTrafficPolicy: Local, and later updated to ExternalTrafficPolicy: Cluster, access to this Service's LoadBalancer IP on the Supervisor Cluster VMs will be dropped.

    Workaround: Delete the Service and recreate it with ExternalTrafficPolicy: Cluster.

  • High CPU usage on Tanzu Kubernetes cluster control plane nodes

    A known issue exists in the Kubernetes upstream project where kube-controller-manager occasionally goes into a loop, resulting in high CPU usage that might affect the functionality of Tanzu Kubernetes clusters. You might notice that the kube-controller-manager process is consuming a larger than expected amount of CPU and is repeatedly logging messages that indicate a failure to update Node.Spec.PodCIDRs.

    Workaround: Delete the kube-controller-manager pod that sits inside the control plane node with such an issue. The pod will be recreated and the issue should not reappear.

  • You cannot update Tanzu Kubernetes clusters created with K8s 1.16 to 1.19

    The kubelet configuration file is generated when kubeadm init runs and is then replicated during cluster upgrades. In 1.16, kubeadm init generates a configuration file that sets resolvConf to /etc/resolv.conf, which is then overridden by the command-line flag --resolv-conf pointing at /run/systemd/resolve/resolv.conf. In 1.17 and 1.18, kubeadm continues to configure the kubelet with the correct --resolv-conf flag. As of 1.19, kubeadm no longer configures the command-line flag and instead relies on the kubelet configuration file. Because of the replication process during cluster upgrades, a 1.19 cluster upgraded from 1.16 includes a configuration file where resolvConf points at /etc/resolv.conf instead of /run/systemd/resolve/resolv.conf.

    Workaround: Before upgrading a Tanzu Kubernetes cluster to 1.19, reconfigure the kubelet configuration file to point to the correct resolv.conf. Manually duplicate the ConfigMap kubelet-config-1.18 to kubelet-config-1.19 in the kube-system namespace, then modify the data of the new ConfigMap so that resolvConf points at /run/systemd/resolve/resolv.conf.
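
The edit to the duplicated ConfigMap comes down to rewriting one line of the kubelet configuration text. As a sketch only, the following Python function performs that rewrite on configuration text you have already extracted from the ConfigMap data. The function name and the sample input are hypothetical, and how you read and write the ConfigMap is left out because the workaround does not prescribe it.

```python
import re

CORRECT_RESOLV_CONF = "/run/systemd/resolve/resolv.conf"


def fix_resolv_conf(kubelet_config: str) -> str:
    """Point the top-level resolvConf entry at the systemd-resolved file.

    Appends the entry if the configuration text has none.
    """
    pattern = re.compile(r"^resolvConf:.*$", flags=re.MULTILINE)
    replacement = f"resolvConf: {CORRECT_RESOLV_CONF}"
    if pattern.search(kubelet_config):
        return pattern.sub(replacement, kubelet_config)
    return kubelet_config.rstrip("\n") + "\n" + replacement + "\n"


if __name__ == "__main__":
    # Hypothetical excerpt of a kubelet configuration, for illustration only.
    sample = (
        "kind: KubeletConfiguration\n"
        "resolvConf: /etc/resolv.conf\n"
        "rotateCertificates: true\n"
    )
    print(fix_resolv_conf(sample), end="")
```

Review the resulting text before writing it back to kubelet-config-1.19.
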

  • When the Supervisor Cluster networking is configured with NSX-T, after updating a service from "ExternalTrafficPolicy: Local" to "ExternalTrafficPolicy: Cluster", requests made on the Supervisor Cluster control plane nodes to this service's load balancer IP fail

    When you create a service on a Tanzu Kubernetes cluster with ExternalTrafficPolicy: Local and later update the service to ExternalTrafficPolicy: Cluster, kube-proxy incorrectly creates an iptables rule on the Supervisor Cluster control plane nodes that blocks traffic destined for the service's LoadBalancer IP. For example, if this service has LoadBalancer IP 192.182.40.4, the following iptables rule is created on any one of the control plane nodes:

    -A KUBE-SERVICES -d 192.182.40.4/32 -p tcp -m comment --comment "antrea-17-171/antrea-17-171-c1-2bfcfe5d9a0cdea4de6eb has no endpoints" -m tcp --dport 80 -j REJECT --reject-with icmp-port-unreachable

    As a result, access to that IP is dropped.

    Workaround: Delete the service and create it anew with ExternalTrafficPolicy: Cluster.
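
To confirm that a control plane node is affected, you can search its rules for the pattern shown above. The following Python sketch assumes you have already captured the rule listing as text (for example, from iptables-save on the node); the function name is hypothetical, and the sample rule is the one quoted in this issue.

```python
def find_reject_rules(iptables_save_output: str, lb_ip: str):
    """Return KUBE-SERVICES rules that reject traffic to lb_ip for lack of endpoints."""
    matches = []
    for line in iptables_save_output.splitlines():
        fields = line.split()
        if (
            fields[:2] == ["-A", "KUBE-SERVICES"]
            and f"{lb_ip}/32" in fields
            and "REJECT" in fields
            and "has no endpoints" in line
        ):
            matches.append(line.strip())
    return matches


if __name__ == "__main__":
    rules = (
        '-A KUBE-SERVICES -d 192.182.40.4/32 -p tcp -m comment --comment '
        '"antrea-17-171/antrea-17-171-c1-2bfcfe5d9a0cdea4de6eb has no endpoints" '
        '-m tcp --dport 80 -j REJECT --reject-with icmp-port-unreachable\n'
    )
    for rule in find_reject_rules(rules, "192.182.40.4"):
        print(rule)
```

An empty result for the service's LoadBalancer IP suggests that the node does not carry the blocking rule.
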

  • After you enable HTTP Proxy and/or Trust settings in the TkgServiceConfiguration specification, all pre-existing clusters without Proxy/Trust settings will inherit the global Proxy/Trust settings when they are updated.

    You can edit the TkgServiceConfiguration specification to configure the TKG Service, including specifying the default CNI, HTTP Proxy, and Trust certificates. Any configuration changes you make to the TkgServiceConfiguration specification apply globally to any Tanzu Kubernetes cluster provisioned or updated by that service. You cannot opt out of the global configuration using per-cluster settings.

    For example, if you edit the TkgServiceConfiguration specification and enable an HTTP Proxy, all new clusters provisioned by that service inherit those proxy settings. In addition, all pre-existing clusters without a proxy server inherit the global proxy configuration when the cluster is modified or updated. In the case of HTTP/S proxy, which supports per-cluster configuration, you can update the cluster spec with a different proxy server, but you cannot remove the global proxy setting. If the HTTP Proxy is set globally, you must either use it or overwrite it with a different proxy server.

    Workaround: Understand that the TkgServiceConfiguration specification applies globally. If you don't want all clusters to use an HTTP Proxy, don't enable it at the global level. Do so at the cluster level.

  • In very large Supervisor Cluster deployments with many Tanzu Kubernetes Clusters and VMs, vmop-controller-manager pods might fail due to OutOfMemory resulting in the inability to lifecycle manage Tanzu Kubernetes Clusters

    Within the Supervisor Cluster, the vmop-controller-manager pod is responsible for managing the lifecycle of the VMs that make up Tanzu Kubernetes Clusters. At very large numbers of such VMs (>850 VMs per Supervisor Cluster), the vmop-controller-manager pod can go into an OutOfMemory CrashLoopBackoff. When this occurs, lifecycle management of Tanzu Kubernetes Clusters is disrupted until the vmop-controller-manager pod resumes operations.

    Workaround: Reduce the total number of Tanzu Kubernetes Cluster worker nodes managed in a Supervisor Cluster, either by deleting clusters or by scaling down clusters.

Storage
  • NEW: An expansion of a Supervisor cluster PVC in offline or online mode does not result in an expansion of a corresponding Tanzu Kubernetes cluster PVC

    A pod that uses the Tanzu Kubernetes cluster PVC cannot use the expanded capacity of the Supervisor cluster PVC because the filesystem has not been resized.

    Workaround: Resize the Tanzu Kubernetes cluster PVC to a size equal to or greater than the size of the Supervisor cluster PVC.

  • NEW: Size mismatch in statically provisioned TKG PVC when compared to underlying volume

    Static provisioning in Kubernetes does not verify that the PV and backing volume sizes are equal. If you statically create a PVC in a Tanzu Kubernetes cluster, and the PVC size is less than the size of the corresponding underlying Supervisor cluster PVC, you might be able to use more space than you requested in the PV. If the size of the PVC you statically create in the Tanzu Kubernetes cluster is greater than the size of the underlying Supervisor cluster PVC, you might see a "No space left on device" error even before you exhaust the requested size in the Tanzu Kubernetes cluster PV.

    Workaround:

    1. In the Tanzu Kubernetes cluster PV, change the persistentVolumeReclaimPolicy to Retain.
    2. Note the volumeHandle of the Tanzu Kubernetes cluster PV and then delete the PVC and PV in the Tanzu Kubernetes cluster.
    3. Re-create the Tanzu Kubernetes cluster PVC and PV statically using the volumeHandle and set the storage to the same size as the size of the corresponding Supervisor cluster PVC.
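
Before re-creating the PVC, it can help to see which of the two symptoms a given size pair would produce. The following Python sketch compares two storage sizes. It is an illustration only: the function names are hypothetical, and the parser handles only plain integers and the binary suffixes Ki, Mi, Gi, and Ti.

```python
_BINARY_SUFFIXES = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}


def to_bytes(quantity: str) -> int:
    """Convert a size such as '5Gi' to bytes (binary suffixes or plain integers only)."""
    for suffix, factor in _BINARY_SUFFIXES.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)


def describe_mismatch(tkc_pvc_size: str, supervisor_pvc_size: str) -> str:
    """Classify the Tanzu Kubernetes cluster PVC size against the Supervisor cluster PVC size."""
    tkc, sup = to_bytes(tkc_pvc_size), to_bytes(supervisor_pvc_size)
    if tkc < sup:
        return "smaller: the pod might use more space than requested"
    if tkc > sup:
        return "larger: 'No space left on device' might appear early"
    return "equal: sizes match"


if __name__ == "__main__":
    # Hypothetical sizes, for illustration only.
    print(describe_mismatch("5Gi", "10Gi"))
    print(describe_mismatch("2048Mi", "2Gi"))
```

Step 3 of the workaround aims for the "equal" case.
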
  • Attempts to create a PVC from a supervisor namespace or a TKG cluster fail if the external csi.vsphere.vmware.com provisioner loses its lease for leader election

    When you try to create a PVC from a supervisor namespace or a TKG cluster using the kubectl command, your attempts might not succeed. The PVC remains in the Pending state. If you describe the PVC, the Events field displays the following:

    
    Type       Reason                  Age                    From                            Message
    ----       ------                  ---                    ----                            -------
    Normal     ExternalProvisioning    56s (x121 over 30m)    persistentvolume-controller     waiting for a volume to be created, either by external provisioner "csi.vsphere.vmware.com" or manually created by system administrator
    

    Workaround:

    1. Verify that all containers in the vsphere-csi-controller pod inside the vmware-system-csi namespace are running.
      kubectl describe pod vsphere-csi-controller-pod-name -n vmware-system-csi
    2. Check the external provisioner logs by using the following command.
      kubectl logs vsphere-csi-controller-pod-name -n vmware-system-csi -c csi-provisioner
      The following entry indicates that the external-provisioner sidecar container lost its leader election:
      I0817 14:02:59.582663       1 leaderelection.go:263] failed to renew lease vmware-system-csi/csi-vsphere-vmware-com: failed to tryAcquireOrRenew context deadline exceeded
      F0817 14:02:59.685847       1 leader_election.go:169] stopped leading
      
    3. Delete this instance of vsphere-csi-controller.
      kubectl delete pod vsphere-csi-controller-pod-name -n vmware-system-csi

    Kubernetes will create a new instance of the CSI controller and all sidecars will be reinitialized.
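
Step 2 relies on spotting two specific messages in the csi-provisioner log. As a minimal sketch, the following Python function filters saved log text for those messages. The function name is hypothetical; the marker strings and the sample lines are taken from the log entries quoted above.

```python
# Substrings taken from the log entries quoted in step 2.
LEASE_LOSS_MARKERS = ("failed to renew lease", "stopped leading")


def lease_loss_lines(provisioner_log: str):
    """Return the log lines that indicate the provisioner lost its leader election."""
    return [
        line.strip()
        for line in provisioner_log.splitlines()
        if any(marker in line for marker in LEASE_LOSS_MARKERS)
    ]


if __name__ == "__main__":
    log = (
        "I0817 14:02:59.582663       1 leaderelection.go:263] failed to renew lease "
        "vmware-system-csi/csi-vsphere-vmware-com: failed to tryAcquireOrRenew "
        "context deadline exceeded\n"
        "F0817 14:02:59.685847       1 leader_election.go:169] stopped leading\n"
    )
    for line in lease_loss_lines(log):
        print(line)
```

If the function returns any lines, continue with step 3 and delete the vsphere-csi-controller pod.
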
