
What's in the Release Notes

Updated on: October 14, 2020

The release notes cover the following topics:

  • What's New
  • Build Information for the Initial vSphere with Kubernetes Release
  • Learn About vSphere with Tanzu
  • Known Issues

What's New

VMware vSphere with Tanzu receives monthly patches that introduce new features and capabilities, provide updates to Kubernetes and other services, keep pace with upstream releases, and resolve reported issues. This section documents what each monthly patch introduces.

 

What's New October 6, 2020 

October 6, 2020 Build Information

ESXi 7.0 | 06 OCT 2020 | ISO Build 16850804

vCenter Server 7.0 | 06 OCT 2020 | ISO Build 16860138

New Features

  • Supervisor Cluster
    • Configuration of Supervisor Clusters with vSphere networking – We introduced vSphere networking for Supervisor Clusters, enabling you to deliver a developer-ready platform using your existing network infrastructure.
    • Support of HAProxy load balancer for setting up Supervisor Clusters with vSphere networking – If you configure Supervisor Clusters with vSphere networking, you need to add a load balancer to handle your modern workloads. You can deploy and set up your load balancer with an HAProxy OVA.
    • Management of Supervisor Cluster lifecycle using vSphere Lifecycle Manager – For Supervisor Clusters configured with vSphere networking, you can use vSphere Lifecycle Manager for infrastructure configuration and lifecycle management.
    • Opportunity to try vSphere with Tanzu on your hardware – We now offer you an in-product-trial if you want to enable a Supervisor Cluster on your hardware and test this modern application platform at no additional cost.
       
  • Tanzu Kubernetes Grid Service for vSphere
    • Exposure of Kubernetes versions to DevOps users — We introduced a new 'TanzuKubernetesRelease' custom resource definition in the Supervisor Cluster. This custom resource definition gives DevOps users detailed information about the Kubernetes versions they can use in their Tanzu Kubernetes clusters (see the example after this list).
    • Integration of VMware Container Networking with Antrea for Kubernetes – We integrated a commercially supported version of Antrea as the default Container Network Interface (CNI) for new Tanzu Kubernetes clusters. Antrea brings a comprehensive suite of enterprise network policy features to Tanzu Kubernetes Grid Service. For more details, read the release announcement. While Antrea is the default CNI, vSphere administrators and DevOps users can still choose Calico as the CNI for Tanzu Kubernetes clusters.
    • Support of Supervisor cluster environments that use vSphere networking – We now support Supervisor Cluster environments that use vSphere networking so you can leverage your existing network infrastructure.
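
    As a quick illustration of the TanzuKubernetesRelease resource mentioned above, a DevOps user logged in to a Supervisor Cluster namespace context can list the available releases with kubectl (a minimal sketch; the output columns may vary by version):

      kubectl get tanzukubernetesreleases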

Resolved Issues

  • No listing. This is a feature release.

What's New August 25, 2020 

August 25, 2020 Build Information

ESXi 7.0 | 23 JUN 2020 | ISO Build 16324942

vCenter Server 7.0 | 25 AUG 2020 | ISO Build 16749653

New Features

  • None, this is simply a bug-fix release.

Resolved Issues

  • High CPU utilization upon upgrading to the July 30 patch
    • vCenter Server generates a high CPU utilization after upgrade to the July 30 patch. This issue is now fixed.
  • Supervisor cluster enablement failure due to certificate with Windows line endings
    • Enabling supervisor cluster can fail if there are Windows line endings in the certificate. This issue is now fixed.

What's New July 30, 2020 

July 30, 2020 Build Information

ESXi 7.0 | 23 JUN 2020 | ISO Build 16324942

vCenter Server 7.0 | 30 JUL 2020 | ISO Build 16620007

New Features

  • Supervisor cluster: new version of Kubernetes, support for custom certificates and PNID changes
    • The Supervisor cluster now supports Kubernetes 1.18.2 (along with 1.16.7 and 1.17.4)
    • Replacing machine SSL certificates with custom certificates is now supported
    • vCenter PNID update is now supported when there are Supervisor clusters in vCenter Server
  • Tanzu Kubernetes Grid Service for vSphere: new features added for cluster scale-in, networking and storage
    • Cluster scale-in operation is now supported for Tanzu Kubernetes Grid service clusters
    • Ingress firewall rules are now enforced by default for all Tanzu Kubernetes Grid service clusters
    • New versions of Kubernetes now ship regularly, asynchronously from vSphere patches; the current versions are 1.16.8, 1.16.12, 1.17.7, and 1.17.8
  • Network service: new version of NCP
    • SessionAffinity is now supported for ClusterIP services (see the sketch after this list)
    • IngressClass, PathType, and Wildcard domain are supported for Ingress in Kubernetes 1.18
    • Client Auth is now supported in Ingress Controller
  • Registry service: new version of Harbor
    • The Registry service is now upgraded to Harbor 1.10.3
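
    As an illustration of the SessionAffinity support noted above, a minimal sketch of a ClusterIP Service manifest; the names, label, and ports are placeholders, not values from this release:

      apiVersion: v1
      kind: Service
      metadata:
        name: web-svc              # placeholder name
      spec:
        type: ClusterIP
        selector:
          app: web                 # placeholder label
        sessionAffinity: ClientIP
        sessionAffinityConfig:
          clientIP:
            timeoutSeconds: 600
        ports:
          - port: 80
            targetPort: 8080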

For more information and instructions on how to upgrade, refer to the Updating vSphere with Tanzu Clusters documentation.

Resolved Issues

  • Tanzu Kubernetes Grid Service cluster NTP sync issue

What's New June 23, 2020 

June 23, 2020 Build Information

ESXi 7.0 | 23 JUN 2020 | ISO Build 16324942

vCenter Server 7.0 | 23 JUN 2020 | ISO Build 16386292

Tanzu Kubernetes clusters OVA: v1.16.8+vmware.1-tkg.3.60d2ffd

New Features

  • None, this is simply a bug-fix release.

Resolved Issues

  • Tanzu Kubernetes Grid Service cluster upgrade failure
    • We have resolved an issue where upgrading a Tanzu Kubernetes Grid Service cluster could fail with the error "Error: unknown previous node"
  • Supervisor cluster upgrade failure
    • We have resolved an issue where a Supervisor Cluster update could get stuck if the embedded Harbor registry was in a failed state

What's New May 19, 2020 

May 19, 2020 Build Information

ESXi 7.0 | 2 APR 2020 | ISO Build 15843807

vCenter Server 7.0 | 19 MAY 2020 | ISO Build 16189094

Tanzu Kubernetes clusters OVA: v1.16.8+vmware.1-tkg.3.60d2ffd

New Features

  • Tanzu Kubernetes Grid Service for vSphere: rolling upgrade and services upgrade
    • Customers can now perform rolling upgrades across their worker nodes and control plane nodes for the Tanzu Kubernetes Grid Service for vSphere, and upgrade the pvCSI, Calico, and authsvc services. This includes pre-checks and upgrade compatibility for this matrix of services.
    • Rolling upgrades can be used to vertically scale worker nodes, that is, to change the VM class of your worker nodes to a smaller or larger size (see the sketch after this list).
  • Supervisor cluster: new versions of Kubernetes, upgrade supported
    • The Supervisor cluster now supports Kubernetes 1.17.4
    • The Supervisor cluster now supports upgrading from Kubernetes 1.16.x to 1.17.x
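
    As an illustration of vertically scaling worker nodes, a minimal sketch of the topology section of a TanzuKubernetesCluster manifest; the cluster name, VM class names, counts, and storage class below are placeholders, not values from this release:

      apiVersion: run.tanzu.vmware.com/v1alpha1
      kind: TanzuKubernetesCluster
      metadata:
        name: example-cluster                  # placeholder
      spec:
        topology:
          controlPlane:
            count: 3
            class: guaranteed-small
            storageClass: example-storage-class
          workers:
            count: 3
            class: guaranteed-large            # changed from a smaller class to scale the worker nodes up
            storageClass: example-storage-class

    Applying a change to the worker class (for example with kubectl edit) triggers a rolling update that replaces the worker nodes at the new size.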

Resolved Issues

  • Naming conflict for deleted namespaces
    • We have resolved an issue where, if a user deleted a vSphere namespace and then created a new vSphere namespace with the same name, we had a naming collision that resulted in being unable to create Tanzu Kubernetes clusters.
  • Improved distribution names
    • We have made it clearer which version of Kubernetes you are running by moving the OVF versioning information to a separate column.

Build Information for the Initial vSphere with Kubernetes Release

April 2, 2020 Build Information

ESXi 7.0 | 2 APR 2020 | ISO Build 15843807

vCenter Server 7.0 | 2 APR 2020 | ISO Build 15952498

Tanzu Kubernetes clusters OVA: v1.16.8+vmware.1-tkg.3.60d2ffd

Learn About vSphere with Tanzu

VMware provides a variety of resources you can use to learn about vSphere with Tanzu.

  • Learn how to configure, manage, and use vSphere with Tanzu by reading vSphere with Tanzu Configuration and Management. Designed for vSphere system administrators and DevOps teams, this guide provides details on vSphere with Tanzu architecture, services, licensing, system requirements, setup, and usage.

  • Use the VMware Compatibility Guides to learn about hardware compatibility and product interoperability for vSphere with Tanzu. vSphere with Tanzu has the same hardware requirements as vSphere 7.0. For certain configurations, it also requires the use of NSX-T Edge virtual machines, and those VMs have a narrower set of compatible CPUs. See the NSX-T Data Center Installation Guide for more information.

  • Find out what languages vSphere with Tanzu is available in by visiting the Internationalization section of the vSphere 7.0 Release Notes. These are the same languages VMware provides for vSphere.

  • View the copyrights and licenses for vSphere with Tanzu open source components by visiting the Open Source section of the vSphere 7.0 Release Notes. The vSphere 7.0 Release Notes also tell you where to download vSphere open source components.

Known Issues

The known issues are grouped as follows.

Supervisor Cluster
  • Pod creation sometimes fails on a Supervisor Cluster when DRS is set to Manual mode

    Clusters where you enable workload management must also have HA enabled and DRS enabled in fully or partially automated mode. Enabling workload management on clusters where HA and DRS are not enabled, or where DRS is running in manual mode, can lead to inconsistent behavior and Pod creation failures.

    Workaround: Enable DRS on the cluster and set it to Fully Automated or Partially Automated. Also ensure that HA is enabled on the cluster.

  • Storage class appears when you run kubectl get sc even after you remove the corresponding storage policy

    If you run kubectl get sc after you create a storage policy, add the policy to a namespace, and then remove the policy, the command response still lists the corresponding storage class.

    Workaround: Run kubectl describe namespace to see the storage classes actually associated with the namespace.
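
    For example, assuming a namespace named example-ns (a placeholder), the storage classes actually assigned to the namespace appear in its resource quota:

      kubectl describe namespace example-ns
      # Assigned storage classes show up in the namespace resource quota,
      # for example as <storage-class-name>.storageclass.storage.k8s.io/requests.storage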

  • All storage classes returned when you run kubectl describe storage-class or kubectl get storage-class on a Supervisor Cluster instead of just the ones for the Supervisor namespace

    When you run the kubectl describe storage-class or kubectl get storage-class command on a Supervisor Cluster, the command returns all storage classes instead of just the ones for the Supervisor namespace.

    Workaround: Infer the storage class names associated with the namespace from the verbose name of the quota.

  • Share Kubernetes API endpoint button ignores FQDN even if it is configured

    Even if an FQDN is configured for the Kubernetes control plane IP of a Supervisor Cluster namespace, the share namespace button provides the IP address instead of the FQDN.

    Workaround: Manually share the Supervisor Cluster namespace using the FQDN.

  • During Supervisor Cluster upgrade, extra vSphere Pods might be created and stuck in Pending status if a DaemonSet is used

    During the Supervisor Cluster upgrade, the DaemonSet controller creates extra vSphere Pods for each Supervisor control plane node. This is caused by an upstream Kubernetes issue.

    Workaround: Add a NodeSelector or NodeAffinity rule to the vSphere Pod spec so that the DaemonSet controller skips the control plane nodes during pod creation.
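
    A minimal sketch of this workaround, assuming the control plane nodes carry the standard node-role.kubernetes.io/master label; add a rule like the following to the DaemonSet pod template:

      spec:
        template:
          spec:
            affinity:
              nodeAffinity:
                requiredDuringSchedulingIgnoredDuringExecution:
                  nodeSelectorTerms:
                    - matchExpressions:
                        - key: node-role.kubernetes.io/master
                          operator: DoesNotExist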

  • Unable to access the load balancer via kubectl vsphere login

    You cannot access the API server via kubectl vsphere login when using a load-balanced endpoint.

    Workaround: This issue can manifest in two ways.

    1. Check whether the API server is accessible at the control plane VIP, for example: curl -k https://vip:6443 (or 443).

      1. If you are unable to reach the API server, the API server is not up yet.

      2. Workaround: Wait a few minutes for the API server to become accessible.

    2. Check if the edge virtual machine node status is up.

      1. Log in to the NSX Manager.

      2. Go to System > Fabric > Nodes > Edge Transport Nodes. The node status should be up.

      3. Go to Networking > Load Balancers > Virtual Servers. Find the VIPs that end with kube-apiserver-lb-svc-6443 and kube-apiserver-lb-svc-443. If their status is not up, use the following workaround.

      4. Workaround: Reboot the edge VM. The edge VM should reconfigure after the reboot.

  • Cluster configuration of vSphere with Tanzu shows timeout errors during configuration

    During the configuration of the cluster, you may see the following error messages:

    Api request to param0 failed

    or

    Config operation for param0 node VM timed out

    Workaround: None. Enabling vSphere with Tanzu can take from 30 to 60 minutes. If you see these or similar param0 timeout messages, they are not errors and can be safely ignored.

  • Enabling the container registry fails with error

    When the user enables the container registry from the UI, the enable action fails after 10 minutes with a timeout error.

    Workaround: Disable the container registry and retry enabling it. Note that the timeout error might occur again.

  • Enabling a cluster after disabling it fails with error

    Enabling a cluster shortly after disabling the cluster may create a conflict in the service account password reset process. The enable action fails with an error.

    Workaround: Restart the workload management service with the command vmon-cli --restart wcp.

  • Deleting a container image tag in an embedded container registry might delete all image tags that share the same physical container image

    Multiple images with different tags can be pushed to a project in an embedded container registry from the same physical container image. If one of these images is deleted from the project, all other images with different tags that were pushed from the same image are also deleted.

    Workaround: The operation cannot be undone. Push the image to the project again.
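
    For example, re-pushing the image with each of its tags restores the deleted entries; the registry address, project, image name, and tag below are placeholders:

      docker tag my-image:1.0 <registry-address>/<project>/my-image:1.0
      docker push <registry-address>/<project>/my-image:1.0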

  • The embedded container registry of a Kubernetes cluster on vSphere might fail to enable with error

    Some pods in the embedded container registry namespace might attach PVC (persistent volume claim) volumes during pod startup. When such a pod fails, new pods might not be able to start because they cannot attach the PVC volume. In these cases, you will see an error message in the pod events, such as Failed to attach cns volume or The resource 'volume' is in use. This can happen when pods fail during the embedded container registry enablement or after the enablement.

    Workaround: Delete all failed pods in the container registry namespace.

  • Failed purge operation on a registry project results in project being in 'error' state

    When you perform a purge operation on a registry project, the project temporarily displays as being in an error state, and you cannot push or pull images from such a project. At regular intervals, the project is checked, and all projects in an error state are deleted and recreated. When this happens, all previous project members are added back to the recreated project, and all repositories and images that previously existed in the project are deleted, effectively completing the purge operation.

    Workaround: None.

  • Container registry enablement fails when the storage capacity is less than 2000 mebibytes

    There is a minimum total storage capacity requirement for the container registry, expressed as the "limit" field in VMODL, because some Kubernetes pods need enough storage space to work properly. To achieve container registry functionality, a minimum capacity of 5 gigabytes is required. Note that this limit offers no guarantee of improved performance or an increased number or size of images that can be supported.

    Workaround: This issue can be avoided by deploying the container registry with a larger total capacity. The recommended storage volume is no less than 5 gigabytes.

  • If you replace the TLS certificate of the NSX load balancer for the Kubernetes cluster, you might fail to log in to the embedded Harbor registry from a docker client or the Harbor UI

    To replace the TLS certificate of the NSX load balancer for Kubernetes cluster, from the vSphere UI navigate to Configure > Namespaces > Certificates > NSX Load Balancer > Actions and click Replace Certificate. When you replace the NSX certificate, the login operation to the embedded Harbor registry from a docker client or the Harbor UI might fail with the unauthorized: authentication required or Invalid user name or password error.

    Workaround: Restart the registry agent pod in the vmware-system-registry namespace:

    1. Run the kubectl get pod -n vmware-system-registry command.
    2. Delete the pod listed in the output by running the kubectl delete pod vmware-registry-controller-manager-xxxxxx -n vmware-system-registry command.
    3. Wait until the pod restarts.
  • Pods deployed with DNSDefault will use the clusterDNS settings

    Any vSphere Pod deployed in a Supervisor Cluster that uses the DNSDefault policy falls back to using the clusterDNS settings configured for the cluster.

    Workaround: None.

  • All hosts in a cluster might be updated simultaneously when upgrading a Supervisor Cluster

    In certain cases, all hosts in a cluster will be updated in parallel during the Supervisor Cluster upgrade process. This will cause downtime for all pods running on this cluster.

    Workaround: During Supervisor Cluster upgrade, don't restart wcpsvc or remove/add hosts.

  • Supervisor Cluster upgrade can be stuck indefinitely if VMCA is used as an intermediate CA

    Supervisor Cluster upgrade can be stuck indefinitely in "configuring" if VMCA is being used as an intermediate CA.

    Workaround: Switch to a non-intermediate CA for VMCA and delete any control plane VMs stuck in "configuring".

  • vSphere Pod deployment fails if a storage policy with encryption enabled is assigned for Pod ephemeral disks

    If a storage policy with encryption enabled is used for Pod ephemeral disks, vSphere Pod creation fails with an "AttachVolume.Attach failed for volume" error.

    Workaround: Use a storage policy with no encryption for Pod Ephemeral Disks.

  • Supervisor Cluster upgrade hangs at 50% during "Namespaces cluster upgrade is in upgrade host step"

    The problem occurs when a vSphere Pod hangs in the TERMINATING state during the upgrade of a Kubernetes control plane node. The controller of the control plane node tries to upgrade the Spherelet process and, during that phase, vSphere Pods on that control plane node are evicted or killed so the node can be unregistered from the Kubernetes control plane. As a result, the Supervisor Cluster upgrade hangs at the older version until the vSphere Pods in the TERMINATING state are removed from the inventory.

    Workaround:

    1. Log in to the ESXi host on which the vSphere Pod is hanging in the TERMINATING state.

    2. Remove the TERMINATING vSphere Pods by using the following commands, where <vmid> is the ID returned by the first command for the affected Pod:

      # vim-cmd vmsvc/getallvms

      # vim-cmd vmsvc/destroy <vmid>

        After this step, the vSphere Pods display in an orphaned state in the vSphere Client.

    3. Delete the orphaned vSphere Pods by first adding a user to the ServiceProviderUsers group.

        a.) Log in to the vSphere Client, select Administration -> Users and Groups -> Create User, and click Groups.

        b.) Search for ServiceProviderUsers or the Administrators group and add a user to the group.

     4. Log in to the vSphere Client using the newly added user and delete the orphaned vSphere Pods.

     5. In kubectl, clear the finalizers on each remaining Pod, where <pod-name> and <namespace> identify the Pod:

       kubectl patch pod <pod-name> -n <namespace> -p '{"metadata":{"finalizers":null}}'

Networking
  • NSX Edge virtual machine deployment fails on slow networks

    There is a combined 60 minute timeout for NSX Edge OVF deployment and NSX Edge VM registration. In slower networks or environments with slower storage, if the time elapsed for Edge deployment and registration exceeds this 60 minute timeout, the operation will fail.

    Workaround: Clean up edges and restart the deployment.

  • NSX Edges are not updated if vCenter Server DNS, NTP, or Syslog settings are changed after cluster configuration

    DNS, NTP, and Syslog settings are copied from vCenter Server to NSX Edge virtual machines during cluster configuration. If any of these vCenter Server settings are changed after configuration, the NSX Edges are not updated.

    Workaround: Use the NSX Manager APIs to update the DNS, NTP, and Syslog settings of your NSX Edges.

  • NSX Edge Management Network Configuration only provides subnet and gateway configuration on select portgroups

    The NSX Edge management network compatibility drop-down list shows subnet and gateway information only if there are ESXi VMkernel NICs (vmknics) configured on the host that are backed by a DVPG on the selected VDS. If you select a Distributed Portgroup without a vmknic attached to it, you must provide a subnet and gateway for the network configuration.

    Workaround: Use one of the following configurations:

    • Discrete Portgroup: This is where no VMkernel adapters currently reside. You must supply the appropriate subnet and gateway information for this portgroup.

    • Shared Management Portgroup: This is where the ESXi hosts' management VMkernel adapter resides. Subnet and gateway information is pulled automatically.

  • Unable to use VLAN 0 during cluster configuration

    When attempting to use VLAN 0 for overlay Tunnel Endpoints or uplink configuration, the operation fails with the message:

    Argument 'uplink_network vlan' is not a valid VLAN ID for an uplink network. Please use a VLAN ID between 1-4094

    Workaround: Manually enable VLAN 0 support using one of the following processes:

    1. SSH into your deployed VC (root/vmware).

    2. Open /etc/vmware/wcp/nsxdsvc.yaml. It will have content similar to:

    logging: 
      level: debug
      maxsizemb: 10 

    a. To enable VLAN0 support for NSX Cluster Overlay Networks, append the following lines to /etc/vmware/wcp/nsxdsvc.yaml and save the file.

    experimental:
      supportedvlan:
        hostoverlay:
          min: 0
          max: 4094
        edgeoverlay:
          min: 1
          max: 4094
        edgeuplink:
          min: 1
          max: 4094

    b. To enable VLAN0 support for NSX Edge Overlay Networks, append the following lines to /etc/vmware/wcp/nsxdsvc.yaml and save the file.

    experimental:
      supportedvlan:
        hostoverlay:
          min: 1
          max: 4094
        edgeoverlay:
          min: 0
          max: 4094
        edgeuplink:
          min: 1
          max: 4094

    c. To enable VLAN0 support for NSX Edge Uplink Networks, append the following lines to /etc/vmware/wcp/nsxdsvc.yaml and save the file.

    experimental:
      supportedvlan:
        hostoverlay:
          min: 1
          max: 4094
        edgeoverlay:
          min: 1
          max: 4094
        edgeuplink:
          min: 0
          max: 4094

    3. Restart the workload management service with vmon-cli --restart wcp.

  • vSphere with Tanzu and NSX-T cannot be enabled on a cluster where vSphere Lifecycle Manager Image is enabled

    vSphere with Tanzu and NSX-T are not compatible with vSphere Lifecycle Manager Image. They are only compatible with vSphere Lifecycle Manager Baselines. When vSphere Lifecycle Manager Image is enabled on a cluster, you cannot enable vSphere with Tanzu or NSX-T on that cluster.

    Workaround: Move hosts to a cluster where vSphere Lifecycle Manager Image is disabled. You must use a cluster with vSphere Lifecycle Manager Baselines. Once the hosts are moved, you can enable NSX-T and then vSphere with Tanzu on that new cluster.

  • When vSphere with Tanzu networking is configured with NSX-T, "ExternalTrafficPolicy: local" is not supported

    For Kubernetes service of type LoadBalancer, the "ExternalTrafficPolicy: local" configuration is not supported.

    Workaround: None.

  • When vSphere with Tanzu networking is configured with NSX-T, the number of services of type LoadBalancer that a Tanzu Kubernetes cluster can support is limited by the NodePort range of the Supervisor Cluster

    Each VirtualMachineService of type LoadBalancer is translated into one Kubernetes service of type LoadBalancer and one Kubernetes endpoint. The maximum number of Kubernetes services of type LoadBalancer that can be created in a Supervisor Cluster is 2767; this includes those created on the Supervisor Cluster itself and those created in Tanzu Kubernetes clusters.

    Workaround: None.

  • In vSphere Distributed Switch (vDS) environments, it is possible to configure Tanzu Kubernetes clusters with network CIDR ranges that overlap or conflict with those of the Supervisor Cluster, and vice versa, resulting in components not being able to communicate.

    In vDS environments, there is no design-time network validation done when you configure the CIDR ranges for the Supervisor Cluster, or when you configure the CIDR ranges for Tanzu Kubernetes clusters. As a result, two problems can arise:

    1) You create a Supervisor Cluster with CIDR ranges that conflict with the default CIDR ranges reserved for Tanzu Kubernetes clusters.

    2) You create a Tanzu Kubernetes cluster with a custom CIDR range that overlaps with the CIDR range used for the Supervisor Clusters.

    Workaround:

    For vDS environments, when you configure a Supervisor Cluster, do not use the default CIDR ranges reserved for Tanzu Kubernetes clusters: 192.168.0.0/16, which is reserved for services, and 10.96.0.0/12, which is reserved for pods. See also "Configuration Parameters for Tanzu Kubernetes Clusters" in the vSphere with Tanzu documentation.

    For vDS environments, when you create a Tanzu Kubernetes cluster, do not use the same CIDR range that is used for the Supervisor Cluster.
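
    As an illustration, a sketch of a Tanzu Kubernetes cluster specification with custom pod and services CIDR blocks chosen to avoid the Supervisor Cluster ranges; the cluster name and CIDR values are placeholders to adapt to your environment:

      apiVersion: run.tanzu.vmware.com/v1alpha1
      kind: TanzuKubernetesCluster
      metadata:
        name: example-cluster
      spec:
        settings:
          network:
            pods:
              cidrBlocks:
                - 172.20.0.0/16      # must not overlap the Supervisor Cluster CIDR ranges
            services:
              cidrBlocks:
                - 172.30.0.0/16      # must not overlap the Supervisor Cluster CIDR ranges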

VMware Tanzu Kubernetes Grid Service for vSphere
  • A Tanzu Kubernetes cluster hangs in "Updating" state after Supervisor Cluster upgrade

    When a Supervisor Cluster is upgraded, it can trigger a rolling update of all the Tanzu Kubernetes clusters to propagate any new configuration settings. During this process, a previously "Running" Tanzu Kubernetes cluster might hang in the "Updating" phase. A "Running" status only indicates the availability of the control plane; it is possible that the required control plane and worker nodes have not been successfully created. Such a Tanzu Kubernetes cluster might fail the health checks performed during the rolling update that starts upon completion of the Supervisor Cluster upgrade. This results in the cluster hanging in the "Updating" phase, which you can confirm by looking at the events on the KubeadmControlPlane resources associated with the Tanzu Kubernetes cluster. The events emitted by the resource are similar to the one below:

    Warning ControlPlaneUnhealthy 2m15s (x1026 over 5h42m) kubeadm-control-plane-controller Waiting for control plane to pass control plane health check to continue reconciliation: machine's (gc-ns-1597045889305/tkg-cluster-3-control-plane-4bz9r) node (tkg-cluster-3-control-plane-4bz9r) was not checked

    Workaround: None.

  • Tanzu Kubernetes cluster continues to access removed storage policy

    When a VI Admin deletes a storage class from the vCenter Server namespace, access to that storage class is not removed for any Tanzu Kubernetes cluster that is already using it.

    Workaround:

    1. As VI Admin, after deleting a storage class from the vCenter Server namespace, create a new storage policy with the same name.

    2. Re-add the existing storage policy, or the one you just recreated, to the Supervisor namespace. TanzuKubernetesCluster instances using this storage class should now be fully functional.

    3. For each TanzuKubernetesCluster resource using the storage class you wish to delete, create a new TanzuKubernetesCluster instance using a different storage class and use Velero to migrate workloads into the new cluster.

    4. Once no TanzuKubernetesCluster or PersistentVolume uses the storage class, it can be safely removed.

  • The embedded container registry SSL certificate is not copied to Tanzu Kubernetes cluster nodes

    When the embedded container registry is enabled for a Supervisor Cluster, the Harbor SSL certificate is not included in any Tanzu Kubernetes cluster nodes created on that Supervisor Cluster, and you cannot connect to the registry from those nodes.

    Workaround: Copy and paste the SSL certificate from the Supervisor Cluster control plane to the Tanzu Kubernetes cluster worker nodes.

  • After upgrading from Tanzu Kubernetes Grid 1.16.8 to 1.17.4, the "guest-cluster-auth-svc" pod on one of the control plane nodes is stuck in the "Container Creating" state.

    After updating a Tanzu Kubernetes cluster from Tanzu Kubernetes Grid Service 1.16.8 to 1.17.4, the "guest-cluster-auth-svc" pod on one of the cluster control plane nodes is stuck in the "Container Creating" state.

    Workaround:

    1. SSH to one of the Tanzu Kubernetes cluster control plane nodes by following the instructions in the documentation topic titled "SSH to Tanzu Kubernetes Cluster Nodes as the System User."

    2. Once you are logged in as the `vmware-system-user` user, run the command "sudo su -" to switch to the root user.

    3. Run the following command: "KUBECONFIG=/etc/kubernetes/admin.conf /usr/lib/vmware-wcpgc-manifests/generate_key_and_csr.sh"

    4. After a few minutes, all authsvc pods should be running.

  • User is unable to manage existing pods on a Tanzu Kubernetes cluster during or after performing a cluster update.

    User is unable to manage existing pods on a Tanzu Kubernetes cluster during or after performing a cluster update.

    Workaround:

    1. SSH to one of the Tanzu Kubernetes cluster control plane nodes by following the instructions in the documentation topic titled "SSH to Tanzu Kubernetes Cluster Nodes as the System User."

    2. Once you are logged in as the `vmware-system-user` user, run the command "sudo su -" to switch to the root user.

    3. Run the following command: "KUBECONFIG=/etc/kubernetes/admin.conf /usr/lib/vmware-wcpgc-manifests/generate_key_and_csr.sh"

    4. After a few minutes, all authsvc pods should be running.

  • Virtual machine images are not available from the content library

    When multiple vCenter Server instances are configured in an Embedded Linked Mode setup, the UI allows the user to select a content library created on a different vCenter Server instance. Selecting such a library results in virtual machine images not being available for DevOps users to provision a Tanzu Kubernetes cluster. In this case, `kubectl get virtualmachineimages` does not return any results.

    Workaround: When you associate a content library with the Supervisor Cluster for Tanzu Kubernetes cluster VM images, choose a library that is created in the same vCenter Server instance where the Supervisor Cluster resides. Alternatively, create a local content library which also supports air-gapped provisioning of Tanzu Kubernetes clusters.

  • Tanzu Kubernetes cluster Upgrade Job fails with "timed out waiting for etcd health check to pass."

    The upgrade job in the vmware-system-tkg namespace associated with the upgrade of a Tanzu Kubernetes cluster fails with the following error message: "timed out waiting for etcd health check to pass." The issue is caused by missing PodIP addresses on the etcd pods.

    Workaround:

    Restart kubelet on the affected nodes, causing the etcd pods to restart and receive a PodIP. Then, run the following recovery steps to recover from a failed upgrade. Before attempting these steps, contact VMware support for guidance.

    1) For any Machine that was upgraded successfully but whose original Machine was not removed:

    • Remove the original Machine's node reference from etcd's member list (see the etcdctl sketch after these steps)
    • Delete the original Machine (leaving the newly upgraded one)

    2) For any Machine that is unhealthy:

    • Retrieve the TanzuKubernetesCluster's resource version (.metadata.resourceVersion).
    • Retrieve the list of Machines with the annotation: "upgrade.cluster-api.vmware.com/id". These are the upgraded nodes from the previous upgrade attempt.
    • Update the annotation to match the resource version (not required if there's no difference).
    • Delete the upgrade Job belonging to the cluster.
    • Verify that the upgrade resumes.
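
    For step 1 above, a hedged sketch of removing the stale member from etcd, run from a control plane node; the endpoint, certificate paths (typical kubeadm defaults), and member ID are assumptions to verify in your environment, ideally with VMware support:

      # List etcd members and identify the ID of the original, replaced node
      ETCDCTL_API=3 etcdctl member list \
        --endpoints https://127.0.0.1:2379 \
        --cacert /etc/kubernetes/pki/etcd/ca.crt \
        --cert /etc/kubernetes/pki/etcd/server.crt \
        --key /etc/kubernetes/pki/etcd/server.key
      # Remove that member so only the upgraded node remains in the member list
      ETCDCTL_API=3 etcdctl member remove <member-id> \
        --endpoints https://127.0.0.1:2379 \
        --cacert /etc/kubernetes/pki/etcd/ca.crt \
        --cert /etc/kubernetes/pki/etcd/server.crt \
        --key /etc/kubernetes/pki/etcd/server.key
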
  • Antrea CNI not supported in current TKC version

    While provisioning a Tanzu Kubernetes cluster, you receive the error "Antrea CNI not supported in current TKC version."

    Workaround: Choose one of the following options.

    Option 1 (recommended): Update the Tanzu Kubernetes cluster to use an OVA version that supports Antrea (v1.17.8 or later).

    Option 2: In the Tanzu Kubernetes cluster specification YAML, set "calico" in the spec.settings.network.cni section (see the sketch after these options).

    Option 3: Change the default CNI to Calico. Refer to the topic in the documentation on how to do this.
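
    For Option 2, a minimal sketch of the relevant part of the cluster specification; the cluster name is a placeholder:

      apiVersion: run.tanzu.vmware.com/v1alpha1
      kind: TanzuKubernetesCluster
      metadata:
        name: example-cluster
      spec:
        settings:
          network:
            cni:
              name: calico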

  • You cannot provision new Tanzu Kubernetes clusters, or scale out existing clusters, because the Content Library subscriber cannot synchronize with the publisher.

    When you set up a Subscribed Content Library for Tanzu Kubernetes cluster OVAs, an SSL certificate is generated, and you are prompted to manually trust the certificate by confirming the certificate thumbprint. If the SSL certificate is changed after the initial library setup, the new certificate must be trusted again by updating the thumbprint.

    Workaround: Edit the settings of the Subscribed Content Library. This initiates a probe of the subscription URL even though no change is requested on the library. The probe discovers that the SSL certificate is not trusted and prompts you to trust it.

  • Tanzu Kubernetes Release version 1.16.8 is incompatible with vSphere 7 U1.

    The Tanzu Kubernetes Release version 1.16.8 is incompatible with vSphere 7 U1. You must update Tanzu Kubernetes clusters to a later version before performing a vSphere Namespaces update to U1.

    Workaround: Before performing a vSphere Namespaces update to the vSphere 7 U1 release, update each Tanzu Kubernetes cluster running version 1.16.8 to a later version. Refer to the topic "Supported Update Path" in the vSphere with Tanzu documentation for more information.
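
    A hedged sketch of triggering such an update by patching the cluster's distribution version; the cluster name, namespace, and target version are placeholders, and you should confirm the supported path in the documentation first:

      kubectl patch tanzukubernetescluster example-cluster -n example-ns --type merge \
        -p '{"spec":{"distribution":{"fullVersion":null,"version":"v1.17.8"}}}'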

  • After upgrading the Workload Control Plane to vSphere 7 U1, new VM Class sizes are not available.

    After upgrading to vSphere 7.0.1 and then performing a vSphere Namespaces update of the Supervisor Cluster, running the command "kubectl get virtualmachineclasses" for Tanzu Kubernetes clusters does not list the new VM class sizes 2x-large, 4x-large, and 8x-large.

    Workaround: None. The new VM class sizes can only be used with a new installation of the Workload Control Plane.

  • The Tanzu Kubernetes Release version v1.17.11+vmware.1-tkg.1 times out connecting to the cluster DNS server when using the Calico CNI.

    The Tanzu Kubernetes Release version v1.17.11+vmware.1-tkg.1 has a Photon OS kernel issue that prevents the image from working as expected with the Calico CNI.

    Workaround: For Tanzu Kubernetes Release version 1.17.11, the image identified as "v1.17.11+vmware.1-tkg.2.ad3d374.516" fixes the issue with Calico. To run Kubernetes 1.17.11, use this version instead of "v1.17.11+vmware.1-tkg.1.15f1e18.489". Alternatively, use a different Tanzu Kubernetes Release, such as version 1.18.5, 1.17.8, or 1.16.14.

Storage
  • Attempts to create a PVC from a supervisor namespace or a TKG cluster fail if the external csi.vsphere.vmware.com provisioner loses its lease for leader election

    When you try to create a PVC from a supervisor namespace or a TKG cluster using the kubectl command, your attempts might not succeed. The PVC remains in the Pending state. If you describe the PVC, the Events field displays the following:

    
    Type       Reason                  Age                    From                            Message
    ----       ------                  ---                    ----                            -------
    Normal     ExternalProvisioning    56s (x121 over 30m)    persistentvolume-controller     waiting for a volume to be created, either by external provisioner "csi.vsphere.vmware.com" or manually created by system administrator
    

    Workaround:

    1. Verify that all containers in the vsphere-csi-controller pod inside the vmware-system-csi namespace are running.
      kubectl describe pod vsphere-csi-controller-pod-name -n vmware-system-csi
    2. Check the external provisioner logs by using the following command.
      kubectl logs vsphere-csi-controller-pod-name -n vmware-system-csi -c csi-provisioner
      The following entry indicates that the external-provisioner sidecar container lost its leader election:
      I0817 14:02:59.582663       1 leaderelection.go:263] failed to renew lease vmware-system-csi/csi-vsphere-vmware-com: failed to tryAcquireOrRenew context deadline exceeded
      F0817 14:02:59.685847       1 leader_election.go:169] stopped leading
      
    3. Delete this instance of vsphere-csi-controller.
      kubectl delete pod vsphere-csi-controller-pod-name -n vmware-system-csi

    Kubernetes will create a new instance of the CSI controller and all sidecars will be reinitialized.