The release notes cover the following topics:
VMware vSphere 8.0.2 | 21 SEP 2023
ESXi 8.0.2 | 21 SEP 2023 | Build 22380479
vCenter Server 8.0.2 | 21 SEP 2023 | Build 22385739
VMware NSX Advanced Load Balancer avi-22.1.3 | 31 JAN 2023
HAProxy Community Edition 2.2.2, Data Plane API 2.1.0
Tanzu Kubernetes Grid 2.2.0 | 18 MAY 2023
vSphere with Tanzu 8.0 Update 2 introduces the following new features and enhancements:
Supervisor
VM Service now supports VMs with Windows OS, GPUs, and all other options available for traditional vSphere VMs - VM Service now supports deploying VMs with any configuration currently supported for vSphere VMs, achieving complete parity with traditional vSphere VMs for VMs deployed as part of Infrastructure as a Service on the Supervisor. This includes support for provisioning Windows VMs alongside Linux VMs, as well as any hardware, security, device, custom or multi-NIC configuration, and passthrough devices that are supported on vSphere. You can now provision workload VMs with GPUs to support AI/ML workloads.
VM Image Service - DevOps teams, developers, and other VM consumers can publish and manage VM images in a self-service manner by using the VM Image Service. The service allows consumers to publish, modify, and delete images by using K8s APIs within a Supervisor Namespace scoped image registry. The VM Image Service is created automatically in each CCS region and CCS project, and population of images to the image registry is scoped by persona and consumption level, either on global or project level. Images can be used for the deployment of VMs through the VM Service.
Note that this functionality introduces a new format for the VM image name. For information on how to resolve potential issues caused by the name change, see Changes in the VM image name format in vSphere 8.0 U2.
Import and export the Supervisor configuration - In previous versions, activating the Supervisor was a manual, step-wise process without the ability to save any configuration. In the current release, you can export and share the Supervisor configuration with peers in a human-readable format or within a source control system, import configurations to a new Supervisor, and replicate a standard configuration across multiple Supervisors. Check out the documentation for details on how to export and import the Supervisor configuration.
Improved GPU utilization by reducing fragmentation - Workload placement is now GPU aware, and DRS will try to place workloads with similar profile requirements on the same host. This improves resource utilization, which reduces cost as fewer GPU hardware resources must be acquired to achieve a desired level of performance.
Supervisor supports Kubernetes 1.26 - This release adds support for Kubernetes 1.26 and drops the support for Kubernetes 1.23. The supported versions of Kubernetes in this release are 1.26, 1.25, and 1.24. Supervisors running on Kubernetes version 1.23 will be auto-upgraded to version 1.24 to ensure that all your Supervisors are running on the supported versions of Kubernetes.
Support of NSX Advanced Load Balancer for a Supervisor configured with NSX networking - You can now enable a Supervisor with NSX Advanced Load Balancer (Avi Networks) for L4 load balancing, as well as load balancing for the control plane nodes of Supervisor and Tanzu Kubernetes Grid clusters with NSX networking. Check out the documentation page for guidance on configuring the NSX Advanced Load Balancer with NSX.
Telegraf Support for Metric and Event Streaming - You can now configure Telegraf via Kubernetes APIs to push Supervisor metrics to any metrics services that are compatible with the embedded Telegraf version. See the documentation for configuring Telegraf.
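As an illustration of the Telegraf feature above, an output sink for Supervisor metrics could be declared through a ConfigMap carrying a Telegraf TOML fragment. This is a hedged sketch only: the ConfigMap name, namespace, and data key below are assumptions, not documented resource names, so verify them against the Telegraf configuration documentation.

```yaml
# Illustrative only: ConfigMap name, namespace, and key are assumptions.
# The value is a standard Telegraf TOML fragment defining an output sink.
apiVersion: v1
kind: ConfigMap
metadata:
  name: telegraf-config                 # hypothetical name
  namespace: vmware-system-monitoring   # hypothetical namespace
data:
  telegraf.conf: |
    # Expose collected Supervisor metrics on a Prometheus-compatible endpoint
    [[outputs.prometheus_client]]
      listen = ":9273"
```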
Tanzu Kubernetes Grid on Supervisor
STIG compliance for TKRs on vSphere 8.0 - With vSphere 8.0 U2, all Tanzu Kubernetes Grid clusters above 1.23.x are STIG (Security Technical Implementation Guide) compliant, and documentation is included for the exceptions. These improvements represent a significant step towards compliance process simplification, and make it much easier for you to satisfy compliance requirements so that you can quickly and confidently use Tanzu Kubernetes Grid in the US Federal market and in other regulated industries.
Turn on control plane rollout for expiring certificates – The v1beta1 API for provisioning TKG clusters based on a ClusterClass is updated to enable clusters to automatically renew their control plane node VM certificates before they expire. This configuration can be added as a variable to the cluster specification. Refer to the Cluster v1beta1 API documentation for more information.
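A minimal sketch of how such a variable might appear in a v1beta1 Cluster specification follows. The variable name and its fields are an assumption based on the feature description, and the cluster name, namespace, and TKR version are hypothetical; confirm the exact schema against the Cluster v1beta1 API documentation.

```yaml
# Sketch only: variable name and fields are assumptions, not confirmed API.
apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  name: my-tkg-cluster        # hypothetical
  namespace: my-namespace     # hypothetical
spec:
  topology:
    class: tanzukubernetescluster
    version: v1.26.5---vmware.2-tkg.1   # example TKR version string
    variables:
    - name: controlPlaneCertificateRotation   # assumed variable name
      value:
        activate: true
        daysBefore: 90   # renew control plane certificates 90 days before expiry
```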
CSI snapshot support for TKRs - TKG clusters provisioned with Tanzu Kubernetes release 1.26.5 and above support CSI volume snapshots, helping you achieve your data protection requirements. Volume snapshots provide you with a standardized way to copy a volume's contents at a particular point in time without creating an entirely new volume.
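The snapshot capability above uses the standard Kubernetes CSI snapshot API. As a sketch, a VolumeSnapshot referencing an existing PVC could look like the following; the snapshot, class, PVC, and namespace names are hypothetical, and a VolumeSnapshotClass backed by the vSphere CSI driver must exist in the cluster.

```yaml
# Standard Kubernetes CSI snapshot resource; all names are illustrative.
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: my-snapshot          # hypothetical
  namespace: my-namespace    # hypothetical
spec:
  volumeSnapshotClassName: my-snapshot-class   # hypothetical class name
  source:
    persistentVolumeClaimName: my-pvc          # PVC whose contents are copied
```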
Installing and Managing Tanzu Packages – A new consolidated repository and publication for installing and managing Tanzu Packages on TKG clusters. Refer to the publication "Installing and Using VMware Tanzu Packages" for all your package needs.
Custom ClusterClass improvements – The workflow for implementing custom ClusterClass clusters is simplified for vSphere 8 U2.
Rolling Updates for TKG clusters – When upgrading to vSphere 8 U2, you can expect rolling updates for provisioned TKG clusters under the following scenarios:
When upgrading from any previously released vSphere 8 version to vSphere 8 U2, because:
vSphere 8 U2 contains Kubernetes-level STIG changes for TKRs as part of the ClusterClass
TKG clusters from 1.23 and above will undergo a rolling update to be made compatible with v8.0 U2
When upgrading from any vSphere 7 version to vSphere 8 U2, because:
Underlying CAPI providers need to be moved from CAPW to CAPV
Existing clusters need to be migrated from classless CAPI clusters to class-based CAPI clusters
Resolved Issues
Audit log files under /var/log/audit/ on the Supervisor control plane VMs may grow very large and fill up the root disk. You might see "no space left on device" errors in journald logs reflecting this state. This can cause various aspects of Supervisor control plane functionality, such as the Kubernetes APIs, to fail.
VMware vSphere 8.0.1c | 27 JUL 2023
ESXi 8.0.1c | 27 JUL 2023 | Build 22088125
vCenter Server 8.0.1c | 27 JUL 2023 | Build 22088981
VMware NSX Advanced Load Balancer avi-22.1.3 | 31 JAN 2023
HAProxy Community Edition 2.2.2, Data Plane API 2.1.0
Tanzu Kubernetes Grid 2.2.0 | 18 MAY 2023
New Features
Supervisors support Kubernetes 1.25 - This release adds support for Kubernetes 1.25 and drops the support for Kubernetes 1.22.
Tanzu Kubernetes Grid 2.2.0 on Supervisor - Manage Tanzu Kubernetes Grid 2.2.0 clusters on Supervisor.
Supported Kubernetes Versions
The supported versions of Kubernetes in this release are 1.25, 1.24, and 1.23. Supervisors running on Kubernetes version 1.22 will be auto-upgraded to version 1.23 to ensure that all your Supervisors are running on the supported versions of Kubernetes.
Support
vRegistry Creation Deprecation - The creation of the embedded Harbor instance, vRegistry, is deprecated. Existing vRegistry instances continue to function, but the vRegistry creation APIs have been marked as deprecated, and the ability to deploy new vRegistry instances will be removed in an upcoming release. Instead, we recommend using Harbor as a Supervisor Service to manage your container images and repositories for enhanced performance and functionality. To migrate an existing vRegistry to Harbor as a Supervisor Service, see "Migrate Images from the Embedded Registry to Harbor".
Resolved Issues
A new alert message is displayed in the vSphere Client to warn about expiring certificates on the Supervisor or TKG clusters. The alert provides detailed information, including the name of the Supervisor and the certificate expiration date. Additionally, it contains a link to KB 90627 that explains step-by-step how to replace the impacted certificates.
Command kubectl get clustervirtualmachineimages returns an error or No resources found. In previous versions, running the command kubectl get clustervirtualmachineimages resulted in an error. After upgrading to 8.0 Update 1c, the command returns the message No resources found. To retrieve information about virtual machine images, use the following command instead: kubectl get virtualmachineimages
The antrea-nsx-routed CNI does not work with v1alpha3 Tanzu Kubernetes clusters on vSphere with Tanzu 8.x releases.
Node drain timeout is not propagated correctly for v1beta1 Clusters.
VMware vSphere 8.0.1 | 18 APR 2023
ESXi 8.0.1 | 18 APR 2023 | Build 21495797
vCenter Server 8.0.1 | 18 APR 2023 | Build 21560480
VMware NSX Advanced Load Balancer avi-22.1.3 | 31 JAN 2023
HAProxy Community Edition 2.2.2, Data Plane API 2.1.0
Tanzu Kubernetes Grid 2.0 | 11 OCT 2022
New Features
Supervisor
Supervisor Services are now available on VDS based Supervisors - Previously, the availability of Supervisor Services was restricted to NSX based Supervisors only. With the current release, you can deploy Harbor, Contour, S3 object storage, and Velero Supervisor Services on VDS based Supervisors. Note: Supervisor Service capabilities on VDS based Supervisors require an ESXi update to 8.0 U1.
VM Service Support for all Linux Images - You can now use CloudInit to customize any Linux image in OVF format that conforms to the VM Service image specification, as well as use OVF templating through vAppConfig to enable the deployment of legacy Linux images.
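As a sketch of the CloudInit flow above, a VM Service VirtualMachine could reference cloud-init user data supplied through a ConfigMap. Treat the field values here as assumptions: the VM name, namespace, image, class, storage class, and ConfigMap names are all hypothetical, so check them against the VM Service API reference.

```yaml
# Sketch of a VM Service VM customized with CloudInit; all names hypothetical.
apiVersion: vmoperator.vmware.com/v1alpha1
kind: VirtualMachine
metadata:
  name: my-linux-vm           # hypothetical
  namespace: my-namespace     # hypothetical Supervisor namespace
spec:
  imageName: ubuntu-22.04     # any conformant Linux OVF image (hypothetical name)
  className: best-effort-small
  storageClass: my-storage-class
  vmMetadata:
    configMapName: my-vm-cloud-init   # ConfigMap holding cloud-init user-data
    transport: CloudInit
```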
Web-Console Support for VM Service VMs - After deploying a VM Service VM, as a DevOps engineer you can now launch a web-console session for that VM by using the kubectl CLI, enabling troubleshooting and debugging within the guest OS without involving the vSphere administrator for access to the guest VM.
Supervisor Compliance - Security Technical Implementation Guides (STIG) for vSphere with Tanzu 8 Supervisor releases. See Tanzu STIG Hardening for details.
Tanzu Kubernetes Grid 2.0 on Supervisor
Custom cluster class - Bring your own cluster class for TKG clusters on Supervisor. For information, see https://kb.vmware.com/s/article/91826.
Custom image for TKG node - Build your own custom node images using vSphere TKG Image Builder (Ubuntu and Photon).
Note: To use a specific TKR with the v1alpha1 API use the fullVersion.
New TKR Images: Refer to the Tanzu Kubernetes releases Release Notes for details.
CRITICAL REQUIREMENT for vSphere with Tanzu 8.0.0 GA Customers
Note: This requirement does not apply to content libraries you use for VMs provisioned through VM Service. It only applies to the TKR content library.
If you have deployed vSphere with Tanzu 8.0.0 GA, before upgrading to vSphere with Tanzu 8 U1 you must create a temporary TKR content library to avoid a known issue that causes TKG Controller pods to go into CrashLoopBackOff when TKG 2.0 TKrs are pushed to the existing content library. To avoid this issue, complete the following steps.
Create a new subscribed content library with a temporary subscription URL pointing to https://wp-content.vmware.com/v2/8.0.0/lib.json.
Synchronize all the items in the temporary content library.
Associate the temporary content library with each vSphere Namespace where you have deployed a TKG 2 cluster.
Run the command kubectl get tkr and verify that all the TKrs are created.
At this point the TKG Controller should be in a running state, which you can verify by listing the pods in the Supervisor namespace.
If the TKG Controller is in CrashLoopBackOff (CLBO) state, restart the TKG Controller deployment using the following command:
kubectl rollout restart deployment -n vmware-system-tkg vmware-system-tkg-controller-manager
Upgrade to vSphere with Tanzu 8 Update 1.
Update each vSphere Namespace to use the original subscribed content library at https://wp-content.vmware.com/v2/latest/lib.json.
Resolved Issues
Tanzu Kubernetes Grid 2.0 clusters provisioned with the v1beta1 API must be based on the default ClusterClass
If you are creating a Tanzu Kubernetes Grid 2.0 cluster on Supervisor by using the v1beta1 API, the Cluster must be based on the default tanzukubernetescluster ClusterClass. The system does not reconcile a cluster based on a different ClusterClass.
ESXi 8.0.0c | 30 MAR 2023 | Build 21493926
vCenter Server 8.0.0c | 30 MAR 2023 | Build 21457384
VMware NSX Advanced Load Balancer avi-22.1.3 | 31 JAN 2023
HAProxy 2.2.2 Community Edition, Data Plane API 2.1.0
Tanzu Kubernetes Grid 2.0 | 11 OCT 2022
Resolved Issues:
Harbor insecure default configuration issue
This release resolves the Harbor insecure default configuration issue which is present if you have enabled the embedded Harbor registry on Supervisor 7.0 or 8.0.
Resolved in this vCenter version, 8.0.0c. See VMware Knowledge Base article 91452 for details of this issue and how to address it by either installing this release or applying a temporary workaround.
After an upgrade to vCenter Server 8.0b, login attempts to a Supervisor and TKG clusters fail
Components running in vCenter Server 8.0b are not backward compatible with Supervisors deployed using vCenter Server in earlier releases.
Workaround: Upgrade vCenter Server to a newer version or upgrade all deployed Supervisors.
ESXi 8.0.0b | 14 FEB 2023 | Build 21203435
vCenter Server 8.0.0b | 14 FEB 2023 | Build 21216066
VMware NSX Advanced Load Balancer avi-22.1.1-9052 | 15 JULY 2022
HAProxy 2.2.2 Community Edition, Data Plane API 2.1.0
Tanzu Kubernetes Grid 2.0 | 11 OCT 2022
New Features:
Added Cert-Manager CA cluster issuer support – Gives administrators the ability to define and deploy a cluster-scoped CA issuer on a Supervisor as a Supervisor Service. Deploying a cluster-scoped CA issuer enables consumers of a Supervisor namespace to request and manage certificates signed by the CA issuer.
In addition to this new feature, the release delivers bug fixes.
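As a sketch of the cluster-scoped CA issuer capability described above: in cert-manager, a CA ClusterIssuer signs certificates from a CA key pair stored in a Secret. The issuer and Secret names below are hypothetical, and the Secret must hold the CA's tls.crt and tls.key.

```yaml
# Minimal cert-manager CA ClusterIssuer; names are illustrative.
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: my-ca-issuer            # hypothetical
spec:
  ca:
    secretName: my-ca-keypair   # Secret containing the CA certificate and key
```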
Resolved Issues:
Supervisor Control plane VMs root disk fills up
Audit log files under /var/log/audit/ on the Supervisor control plane VMs may grow very large and fill up the root disk. You might see "no space left on device" errors in journald logs reflecting this state. This can cause various aspects of Supervisor control plane functionality, such as the Kubernetes APIs, to fail.
Resolved in this version, vSphere 8.0.0b
ESXi 8.0 | 11 OCT 2022 | Build 20513097
vCenter Server 8.0.0a | 15 DEC 2022 | Build 20920323
VMware NSX Advanced Load Balancer avi-22.1.1-9052 | 15 JULY 2022
HAProxy 2.2.2 Community Edition, Data Plane API 2.1.0
Tanzu Kubernetes Grid 2.0 | 11 OCT 2022
New Features
Added Harbor as a Supervisor Service – Provides a full featured Harbor (OCI Image Registry) instance running on the Supervisor. The Harbor instance gives Harbor administrators the ability to create and manage projects and users as well as set up image scanning.
Deprecation of vRegistry – The embedded Harbor instance known as vRegistry will be removed in a future release. In its place, you can use Harbor as a Supervisor Service. See "Migrate Images from the Embedded Registry to Harbor" for migration details.
Supervisors support Kubernetes 1.24 – This release adds Kubernetes 1.24 support and drops Kubernetes 1.21 support. The Kubernetes versions supported in this release are 1.24, 1.23, and 1.22. Supervisor Clusters running on Kubernetes version 1.21 are auto-upgraded to version 1.22 to ensure that all your Supervisor Clusters run on supported versions.
vSphere Zone APIs – A vSphere Zone is a construct that lets you assign availability zones to vSphere clusters to make highly available Supervisor and Tanzu Kubernetes clusters. In vSphere 8.0, the functionality for creating and managing vSphere Zones was available only from the vSphere Client. With the vSphere 8.0.0a release, users can create and manage vSphere Zones by using DCLI or the vSphere Client API Explorer. Full SDK binding support (for example, Java, Python, and so forth) will be available in a future release.
Upgrade Considerations:
A rolling update of Tanzu Kubernetes Clusters will occur in the following Supervisor upgrade scenarios:
When upgrading from 8.0 to 8.0.0a, followed by upgrading a Supervisor from any Kubernetes release to Supervisor Kubernetes 1.24, and when one of these conditions is met:
If you are using proxy settings with a nonempty noProxy list on a Tanzu Kubernetes Cluster
If vRegistry is enabled on the Supervisor. This setup is only available on NSX-based Supervisors.
When upgrading from any vSphere 7.0 release to vSphere 8.0.0a.
Resolved Issues:
Tanzu 7 license keys are supported in vSphere 8 instead of Tanzu 8 license keys
vCenter 8.0 supports the Tanzu 7 license keys instead of Tanzu 8 license keys. This issue does not impact your ability to fully use Tanzu features on vCenter 8.0. For more details, see VMware Knowledge Base article 89839 before modifying Tanzu licenses in your vCenter 8.0 deployment.
LoadBalancers and Guest Clusters are not created when two SE Groups exist on NSX-ALB.
If a second SE Group is added to NSX-ALB with or without SEs or virtual services assigned to it, the creation of new supervisor or guest clusters fails and existing supervisor clusters cannot be upgraded. The virtual service creation on NSX-ALB controller fails with the following error:
get() returned more than one ServiceEngineGroup – it returned 2
As a result, new load balancers are unusable and you cannot create new workload clusters successfully. For more information, see VMware Knowledge Base article 90386.
VMware vSphere 8.0 | 11 OCT 2022
ESXi 8.0 | 11 OCT 2022 | Build 20513097
vCenter 8.0 | 11 OCT 2022 | Build 20519528
VMware NSX Advanced Load Balancer 21.1.4 | 07 APR 2022
HAProxy 2.2.2 Community Edition, Data Plane API 2.1.0
Tanzu Kubernetes Grid 2.0 | 11 OCT 2022
vSphere with Tanzu 8.0 introduces the following new features and enhancements:
Workload Management Configuration Monitoring - You can now track the status of activation, deactivation, and upgrade of the Supervisor in greater detail. Once you initiate a Supervisor activation, deactivation, or upgrade, the Supervisor tries to reach the desired state by reaching various conditions associated with different components of the Supervisor, such as Control Plane VMs. You can track the status of each condition, view the associated warnings, retry the status, view which conditions are reached, and their time stamps.
vSphere Zones - A vSphere Zone is a new construct that lets you assign availability zones to vSphere clusters. Deploying a Supervisor across vSphere Zones lets you provision Tanzu Kubernetes Grid clusters in specific availability zones and failure domains. This allows for workloads to span across multiple clusters, which increases the resiliency to hardware and software failures.
Multi-Cluster Supervisor - By using vSphere Zones, you can deploy a Supervisor across multiple vSphere clusters to provide high availability and failure domains for Tanzu Kubernetes Grid clusters. You can add a vSphere cluster to a separate vSphere Zone and activate a Supervisor that spans across multiple vSphere Zones. This provides failover and resiliency to localized hardware and software failures. When a zone or vSphere cluster goes offline, the Supervisor detects the failure and restarts workloads on another vSphere Zone. You can use vSphere Zones in environments that span distances so long as latency maximums are not exceeded. For more information on latency requirements, see Requirements for Zonal Supervisor Deployment.
Supervisor Supports Kubernetes 1.23 - This release adds support for Kubernetes 1.23 and drops the support for Kubernetes 1.20. The supported versions of Kubernetes in this release are 1.23, 1.22, and 1.21. Supervisors running on Kubernetes version 1.20 will be auto-upgraded to version 1.21 to ensure that all your Supervisors are running on the supported versions of Kubernetes.
Provide consistent network policies with SecurityPolicy CRDs - SecurityPolicy custom resource definition provides the ability to configure network security controls for VMs and vSphere Pods in a Supervisor namespace.
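A sketch of a SecurityPolicy custom resource for the capability above follows. The API group follows the NSX container-networking CRD convention, but the group, selector, and rule fields here should be treated as illustrative assumptions rather than the confirmed schema.

```yaml
# Illustrative SecurityPolicy sketch; API group and field names are assumptions.
apiVersion: crd.nsx.vmware.com/v1alpha1
kind: SecurityPolicy
metadata:
  name: allow-web             # hypothetical
  namespace: my-namespace     # hypothetical Supervisor namespace
spec:
  appliedTo:
  - vmSelector:               # select VMs (or pods) by label
      matchLabels:
        app: web
  rules:
  - action: allow             # permit inbound HTTPS to the selected workloads
    direction: in
    ports:
    - protocol: TCP
      port: 443
```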
Tanzu Kubernetes Grid 2.0 on Supervisor - Tanzu Kubernetes Grid has now moved to version 2.0. Tanzu Kubernetes Grid 2.0 is the culmination of tremendous innovation in the Tanzu and Kubernetes community, and provides the foundation for a common set of interfaces across private and public clouds with Tanzu Kubernetes Grid. New in this release are the following two major features:
ClusterClass - Tanzu Kubernetes Grid 2.0 now supports ClusterClass. ClusterClass provides an upstream-aligned interface, superseding the Tanzu Kubernetes Cluster, which brings customization capabilities to our Tanzu Kubernetes Grid platform. ClusterClass enables administrators to define templates that will work with their enterprise environment requirements while reducing boilerplate and enabling delegated customization.
Carvel - Carvel provides a set of reliable, single-purpose, composable tools that aid in application building, configuration, and deployment to Kubernetes. In particular, kapp and kapp-controller provide package management, compatibility, and lifecycle through a set of declarative, upstream-aligned tooling. Coupled with ytt for templating, this results in a flexible yet manageable package management capability.
New documentation structure and landing page in vSphere 8 - The vSphere with Tanzu documentation now has improved structure that better reflects the workflows with the product and allows you to have more focused experience with the content. You can also access all the available technical documentation for vSphere with Tanzu from the new documentation landing page.
Read the Installing and Configuring vSphere with Tanzu documentation for guidance about installing and configuring vSphere with Tanzu. For information about updating vSphere with Tanzu, see the Maintaining vSphere with Tanzu documentation.
When you perform your upgrades, keep in mind the following:
Before you update to vCenter Server 8.0, make sure that the Kubernetes version of all Supervisors is at least 1.21, preferably the latest supported version. The Tanzu Kubernetes release version of the Tanzu Kubernetes Grid clusters must be 1.21, preferably the latest supported version.
Upgrades from legacy TKGS TKr to TKG 2 TKr are allowed starting with the vSphere with Tanzu 8.0 MP1 release. Refer to the Tanzu Kubernetes releases Release Notes for the supported versions matrix. Refer to the Using Tanzu Kubernetes Grid 2 with vSphere with Tanzu documentation for upgrade information.
Changes in the VM image name format in vSphere 8.0 U2 might cause problems when old VM image names are used
Prior to vSphere 8.0 Update 2, the name of a VM image resource was derived from the name of a Content Library item. For example, if a Content Library item was named photonos-5-x64, then its corresponding VirtualMachineImage resource would also be named photonos-5-x64. This caused problems when library items from different libraries had the same names.
In the vSphere 8.0 Update 2 release, the VM Image Service has been introduced to manage VM images in a self-service manner. See Creating and Managing Content Libraries for Stand-Alone VMs in vSphere with Tanzu.
This new functionality allows content libraries to be associated with a namespace or the entire Supervisor cluster, and requires that the names of VM image resources in Kubernetes clusters are unique and deterministic. As a result, to avoid potential conflicts with library item names, the VM images now follow the new naming format vmi-xxx. However, this change might cause issues in the vSphere 8.0 Update 2 release if you use previous VM YAMLs that reference old image names, such as photonos-5-x64 or centos-stream-8.
Workaround:
Use the following commands to retrieve information about VM images:
To display images associated with their namespaces, run kubectl get vmi -n <namespace>.
To fetch images available in the cluster, run kubectl get cvmi.
After obtaining the image resource name listed under the NAME column, update the name in your VM deployment spec.
During a Supervisor auto-upgrade, the WCP Service process on vSphere might trigger a panic and stop unexpectedly
You can notice a core dump file generated for the WCP Service process.
Workaround: None. The VMON process automatically restarts the WCP Service, and the WCP Service resumes the upgrade with no further problems.
Supervisor upgrade suspends with ErrImagePull and Provider Failed status in vSphere Pods. Persistent volumes attached to vSphere Pods (including Supervisor Services) may not be detached on ErrImagePull failures.
Persistent volumes might not be detached for vSphere Pods failing with ErrImagePull. This can cause a cascade of failed vSphere Pods, because the required persistent volumes are attached to the failed instance. During the Supervisor upgrade, vSphere Pods within the Supervisor might transition to a provider failed state, becoming unresponsive.
Workaround: Delete the instances of failed vSphere Pods that have persistent volumes attached to them. Note that Persistent Volume Claims (PVCs) and persistent volumes associated with vSphere Pods can be retained. After completing the upgrade, recreate the vSphere Pods by using the same PodSpec and PVCs to maintain data integrity and functionality. To mitigate this issue, create vSphere Pods by using ReplicaSets (DaemonSet, Deployment) to maintain a stable set of replica vSphere Pods running at any given time.
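As a sketch of the mitigation above, wrapping the workload in a Deployment keeps a stable set of replica vSphere Pods so any instance you delete is recreated automatically. All names, the image, and the PVC reference here are hypothetical.

```yaml
# Hypothetical example: a Deployment that recreates deleted vSphere Pod
# instances and reattaches the retained PVC.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-workload          # hypothetical
  namespace: my-namespace    # hypothetical Supervisor namespace
spec:
  replicas: 2
  selector:
    matchLabels:
      app: my-workload
  template:
    metadata:
      labels:
        app: my-workload
    spec:
      containers:
      - name: app
        image: registry.example.com/app:1.0   # hypothetical image
        volumeMounts:
        - name: data
          mountPath: /data
      volumes:
      - name: data
        persistentVolumeClaim:
          claimName: my-pvc                   # the retained PVC
```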
Supervisor upgrade stuck at 50% and Pinniped upgrade fails due to leader election
Pinniped pods are stuck in pending state during rollout, and the Supervisor upgrade fails during the Pinniped component upgrade.
Steps to workaround are:
Run kubectl get pods -n vmware-system-pinniped to check the status of the Pinniped pods.
Run kubectl get leases -n vmware-system-pinniped pinniped-supervisor -o yaml to verify that holderIdentity is not one of the Pinniped pods in pending state.
Run kubectl delete leases -n vmware-system-pinniped pinniped-supervisor to remove the lease object for pinniped-supervisor that has a dead pod as holderIdentity.
Run kubectl get pods -n vmware-system-pinniped again to ensure that all pods under vmware-system-pinniped are up and running.
An ESXi host node cannot enter maintenance mode with Supervisor control plane VM in powered-on state
In a Supervisor setup with NSX Advanced Load Balancer, an ESXi host fails to enter maintenance mode if an Avi Service Engine VM is in powered-on state, which impacts ESXi host upgrades and NSX upgrades that require maintenance mode.
Workaround: Power off Avi Service Engine VM so ESXi can enter maintenance mode.
Cannot receive looped back traffic using ClusterIP with vSphere Pods on VDS
If an application within a vSphere Pod tries to reach a ClusterIP that is served within the same vSphere Pod (in a different container), the DLB within VSIP is unable to route the traffic back to the vSphere Pod.
Workaround: None.
Network Policies are not enforced in a VDS based Supervisor
Existing service YAML that uses NetworkPolicy does not require any changes. The NetworkPolicy will not be enforced if present in the file.
Workaround: You must set policy-based rules on the networking for VDS. For NetworkPolicy support, NSX networking support is required.
Namespace of a Carvel service might continue to show up in the vSphere Client after you uninstall the service from the Supervisor
If the Carvel service takes over 20 minutes to uninstall from the Supervisor, its namespace might still show up in the vSphere Client after the service is uninstalled.
Attempts to reinstall the service on the same Supervisor fail until the namespace is deleted, and the ns_already_exist_error message shows up during the reinstallation.
You see the following entry in the log file:
/var/log/vmware/wcp/wcpsvc.log should have the message - "Time out for executing post-delete operation for Supervisor Service with serviceID '<service-id>' from cluster '<cluster-id>'. Last error: <error>"
Workaround: Manually delete the namespace from the vSphere Client.
From the vSphere Client home menu, select Workload Management.
Click the Namespaces tab.
From the list of namespaces, select the uncleared namespace, and click the REMOVE button to delete the namespace.
Upgrading ESXi hosts from 7.0 Update 3 to 8.0 without Supervisor upgrade results in ESXi hosts showing as Not Ready and workloads going offline
When you upgrade ESXi hosts that are part of a Supervisor from vSphere 7.0 Update 3 to vSphere 8.0 without upgrading the Kubernetes version of the Supervisor, the ESXi hosts show as Not Ready and workloads running on the hosts go offline.
Workaround: Upgrade the Kubernetes version of the Supervisor to at least 1.21, 1.22, or 1.23.
Upon one-click upgrade of vCenter Server, the Supervisor will not be auto-upgraded
If the Kubernetes version of the Supervisor is earlier than 1.22, upon one-click upgrade of vCenter Server to 8.0, the Supervisor cannot auto-upgrade to the minimum supported Kubernetes version for 8.0, which is 1.23.
Workaround: Before upgrading vCenter Server to 8.0, upgrade the Kubernetes version of the Supervisor to 1.22. If you have already upgraded vCenter Server to 8.0, manually upgrade the Kubernetes version of the Supervisor to 1.23.
If you change rules in a security policy custom resource, stale rules might not be deleted
This problem might occur when you update the security policy. For example, you create a security policy custom resource that contains rules A and B and then update the policy changing the rules to B and C. As a result, rule A is not deleted. The NSX Management Plane continues to display rule A in addition to B and C.
Workaround: Delete the security policy and then create the same one.
After an upgrade of vCenter Server and vSphere with Tanzu, a Tanzu Kubernetes Grid cluster cannot complete its upgrade because of volumes that appear as attached to the cluster’s nodes
When the Tanzu Kubernetes Grid cluster fails to upgrade, you can notice a volume that shows up as attached to the cluster’s nodes and does not get cleared. This problem might be caused by an issue in the upstream Kubernetes.
Workaround:
Obtain information about the TKG cluster node that has scheduling disabled by using the following command:
kubectl get node tkc_node_name -o yaml
Example:
# kubectl get node gcm-cluster-antrea-1-workers-pvx5v-5b9c4dbcc7-28p95 -o yaml
apiVersion: v1
kind: Node
metadata:
annotations:
cluster.x-k8s.io/cluster-name: gcm-cluster-antrea-1
cluster.x-k8s.io/cluster-namespace: c1-gcm-ns
cluster.x-k8s.io/machine: gcm-cluster-antrea-1-workers-pvx5v-5b9c4dbcc7-28p95
cluster.x-k8s.io/owner-kind: MachineSet
cluster.x-k8s.io/owner-name: gcm-cluster-antrea-1-workers-pvx5v-5b9c4dbcc7
csi.volume.kubernetes.io/nodeid: '{"csi.vsphere.vmware.com":"gcm-cluster-antrea-1-workers-pvx5v-5b9c4dbcc7-28p95"}'
kubeadm.alpha.kubernetes.io/cri-socket: /run/containerd/containerd.sock
node.alpha.kubernetes.io/ttl: "0"
volumes.kubernetes.io/controller-managed-attach-detach: "true"
….
….
volumesAttached:
- devicePath: ""
name: kubernetes.io/csi/csi.vsphere.vmware.com^943bd457-a6cb-4f3d-b1a5-e301e6fa6095-59c73cea-4b75-407d-8207-21a9a25f72fd
Check if this node has any vSphere CSI volumes in the status.volumesAttached property. If there are any volumes, proceed to the next step.
Verify that no pods are running on the node identified in Step 1. Use this command:
kubectl get pod -o wide | grep tkc_node_name
Example:
kubectl get pod -o wide | grep gcm-cluster-antrea-1-workers-pvx5v-5b9c4dbcc7-28p95
The empty output of this command indicates that there are no pods. Proceed to the next step because the output in Step 1 shows a volume attached to the node that does not have any pods.
Obtain information about all node objects to make sure that the same volume is attached to another node:
kubectl get node -o yaml
Example:
The same volume name shows up in two TKG cluster node objects.
On old node - "gcm-cluster-antrea-1-workers-pvx5v-5b9c4dbcc7-28p95"
volumesAttached:
- devicePath: ""
name: kubernetes.io/csi/csi.vsphere.vmware.com^943bd457-a6cb-4f3d-b1a5-e301e6fa6095-59c73cea-4b75-407d-8207-21a9a25f72fd
On new node "gcm-cluster-antrea-1-workers-pvx5v-84486fc97-ndh88"
volumesAttached:
- devicePath: ""
name: kubernetes.io/csi/csi.vsphere.vmware.com^943bd457-a6cb-4f3d-b1a5-e301e6fa6095-59c73cea-4b75-407d-8207-21a9a25f72fd
...
volumesInUse:
- kubernetes.io/csi/csi.vsphere.vmware.com^943bd457-a6cb-4f3d-b1a5-e301e6fa6095-59c73cea-4b75-407d-8207-21a9a25f72fd
...
Search for the PV based on the volume handle in the volume name. In the above example, the volume name is kubernetes.io/csi/csi.vsphere.vmware.com^943bd457-a6cb-4f3d-b1a5-e301e6fa6095-59c73cea-4b75-407d-8207-21a9a25f72fd and the volume handle is 943bd457-a6cb-4f3d-b1a5-e301e6fa6095-59c73cea-4b75-407d-8207-21a9a25f72fd.
Get all the PVs and search for the above volume handle by using this command:
kubectl get pv -o yaml
In the above example, the PV with that volume handle is pvc-59c73cea-4b75-407d-8207-21a9a25f72fd.
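Instead of scanning the full kubectl get pv -o yaml output by eye, the handle-to-PV lookup can be sketched as a small helper (a sketch; adjust for your environment):

```shell
# Print the name of the PV whose CSI volumeHandle matches the given handle.
find_pv_by_handle() {
  kubectl get pv -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.spec.csi.volumeHandle}{"\n"}{end}' \
    | awk -v handle="$1" '$2 == handle { print $1 }'
}

# Usage:
# find_pv_by_handle 943bd457-a6cb-4f3d-b1a5-e301e6fa6095-59c73cea-4b75-407d-8207-21a9a25f72fd
```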
Use the volumeattachment command to search for the PV found in the previous step:
kubectl get volumeattachment | grep pv_name
Example:
# kubectl get volumeattachment | grep pvc-59c73cea-4b75-407d-8207-21a9a25f72fd
NAME ATTACHER PV NODE ATTACHED AGE
csi-2ae4c02588d9112d551713e31dba4aab4885c124663ae4fcbb68c632f0f46d3e csi.vsphere.vmware.com pvc-59c73cea-4b75-407d-8207-21a9a25f72fd gcm-cluster-antrea-1-workers-pvx5v-84486fc97-ndh88 true 3d20h
You can observe a volume attachment attached to node gcm-cluster-antrea-1-workers-pvx5v-84486fc97-ndh88 instead of node gcm-cluster-antrea-1-workers-pvx5v-5b9c4dbcc7-28p95. This indicates that the status.volumesAttached in gcm-cluster-antrea-1-workers-pvx5v-5b9c4dbcc7-28p95 is incorrect.
Delete the stale volumeAttached entry from the node object:
kubectl edit node tkc_node_name
Example:
kubectl edit node gcm-cluster-antrea-1-workers-pvx5v-5b9c4dbcc7-28p95
Remove the stale volume entry from status.volumesAttached.
Repeat the above steps for all stale volumes in the status.volumesAttached property.
If WCPMachines still exist, manually delete them and wait a few minutes for the cluster to reconcile.
# kubectl get wcpmachines -n c1-gcm-ns
NAMESPACE NAME ZONE PROVIDERID IPADDR
c1-gcm-ns gcm-cluster-antrea-1-workers-jrc58-zn6wl vsphere://423c2281-d1bd-9f91-0e87-b155a9d291a1 192.168.128.17
# kubectl delete wcpmachine gcm-cluster-antrea-1-workers-jrc58-zn6wl -n c1-gcm-ns
wcpmachine.infrastructure.cluster.vmware.com "gcm-cluster-antrea-1-workers-jrc58-zn6wl" deleted
If a vSphere administrator configures a self-service namespace template with resource limits that exceed the cluster capacity, an alert is not triggered.
When vSphere administrators configure resource limits that exceed the cluster capacity, DevOps engineers might use the template to deploy vSphere Pods with high resource requests. As a result, the workloads might fail.
Workaround: None
When you delete a Supervisor namespace that contains a Tanzu Kubernetes Grid cluster, persistent volume claims present in the Supervisor might remain in terminating state
You can observe this issue when a resource conflict occurs while the system deletes the namespace and detaches volumes from the pods in the Tanzu Kubernetes Grid cluster.
The deletion of the Supervisor namespace remains incomplete, and the vSphere Client shows the namespace state as terminating. Persistent volume claims that were attached to pods in the Tanzu Kubernetes Grid cluster also remain in terminating state.
If you run the following commands, you can see the Operation cannot be fulfilled on persistentvolumeclaims error:
kubectl vsphere login --server=IP-ADDRESS --vsphere-username USERNAME
kubectl logs vsphere-csi-controller-pod-name -n vmware-system-csi -c vsphere-syncer
failed to update PersistentVolumeClaim: \\\"<pvc-name>\\\" on namespace: \\\"<supervisor-namespace>\\\". Error: Operation cannot be fulfilled on persistentvolumeclaims \\\"<pvc-name>\\\": the object has been modified; please apply your changes to the latest version and try again\
Workaround:
Use the following commands to fix the issue:
kubectl vsphere login --server=IP-ADDRESS --vsphere-username USERNAME
kubectl patch pvc <pvc-name> -n <supervisor-namespace> -p '{"metadata":{"finalizers":[]}}' --type=merge
When deleting multiple FCDs and volumes from shared datastores such as vSAN, you might notice changes in performance
The performance changes are a side effect of a fix for an issue that previously caused stale FCDs and volumes to remain in the datastore after an unsuccessful FCD delete operation.
Workaround: None. The delete operation works as usual despite the change in the performance.
If a DevOps user starts volume operations or stateful application deployments while vCenter Server reboots, the operations might fail
The problem occurs because the workload storage management user account gets locked, and the vSphere CSI plug-in that runs on the Supervisor fails to authenticate. As a result, the volume operations fail with InvalidLogin errors.
The log file /var/log/vmware/vpxd/vpxd.log displays the following message:
Authentication failed: The account of the user trying to authenticate is locked. :: The account of the user trying to authenticate is locked. :: User account locked: {Name: workload_storage_management-<id>, Domain: <domain>})
Workaround:
Obtain the account unlock time.
In the vSphere Client, navigate to Administration, and click Configuration under Single Sign On.
Click the Accounts tab.
Under Lockout Policy, get the Unlock time in seconds.
Authenticate with the Supervisor using the vSphere Plugin for kubectl.
kubectl vsphere login --server IP-ADDRESS --vsphere-username USERNAME
Note down the original replica count for the vsphere-csi-controller deployment in the vmware-system-csi namespace.
kubectl get deployment vsphere-csi-controller -n vmware-system-csi -o=jsonpath='{.spec.replicas}'
original-replica-count
Scale down the vsphere-csi-controller deployment replica count to zero.
kubectl scale --replicas=0 deployment vsphere-csi-controller -n vmware-system-csi
Wait for the number of seconds indicated under Unlock time.
Scale up the vsphere-csi-controller deployment replica count to the original value.
kubectl scale --replicas=original-replica-count deployment vsphere-csi-controller -n vmware-system-csi
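Steps 3 through 5 above can be combined into a small helper. This is a sketch; the unlock time argument must come from your Lockout Policy:

```shell
# Scale the CSI controller to zero, wait out the account lockout window,
# then restore the original replica count.
cycle_csi_controller() {
  unlock_seconds="$1"
  ns=vmware-system-csi
  replicas=$(kubectl get deployment vsphere-csi-controller -n "$ns" -o=jsonpath='{.spec.replicas}')
  kubectl scale --replicas=0 deployment vsphere-csi-controller -n "$ns"
  sleep "$unlock_seconds"
  kubectl scale --replicas="$replicas" deployment vsphere-csi-controller -n "$ns"
}

# Usage (unlock time of 900 seconds is only an example):
# cycle_csi_controller 900
```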
TKG cluster worker nodes fail to power on with the error log VIF restore activity already completed for attachment ID from the nsx-ncp pod
TKG cluster worker nodes fail to power on with the following error:
nsx-container-ncp Generic error occurred during realizing network for VirtualNetworkInterface
NCP logs an error:
VIF restore activity already completed for attachment ID
When a VM of a TKG cluster node created after an NSX backup is migrated with vMotion immediately after an NSX restore, NCP cannot restore the port for the VM, because the attachment ID is reused during the vMotion and blocks NCP's restore request.
Workaround:
Go to NSX Manager to get the segment ports that should be deleted. They have a display name in the format <vm name>.vmx@<attachment id>.
Before deleting the newly created port, find the host where the VM is running and turn off the ops-agent by running /etc/init.d/nsx-opsagent stop on the host.
Delete the port by using the NSX API: https://<nsx-mgr>/api/v1/logical-ports/<logical switch port id>?detach=true
Turn on the ops-agent by running /etc/init.d/nsx-opsagent start on the host.
Wait until NCP restores the port.
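The port deletion step above can be sketched as a curl call. The manager address, credentials, and port ID are placeholders for your setup:

```shell
# Delete a stale logical port via the NSX API; detach=true forces the detach.
# NSX_MGR, NSX_USER, and NSX_PASS are placeholders you must set first.
delete_nsx_port() {
  curl -k -u "$NSX_USER:$NSX_PASS" -X DELETE \
    "https://$NSX_MGR/api/v1/logical-ports/$1?detach=true"
}

# Usage: delete_nsx_port <logical-switch-port-id>
```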
Pods, PVs, and PVCs in a TKG cluster may be stuck in a Terminating state during TKG cluster clean-up or during recovery from ESXi host downtime
As part of the normal TKG cluster delete and cleanup process, all of the deployments, statefulsets, PVCs, PVs, and similar objects are deleted. However, for TKG clusters based on TKR v1.24 and lower, some of the PVs may be stuck in a Terminating state due to DetachVolume errors. The issue occurs when DetachVolume errors on a VolumeAttachment object cause the finalizers on the VolumeAttachment not to be removed, resulting in failure to delete related objects. This scenario can also occur if there is downtime on the ESXi hosts.
Workaround: Run the following command in the TKG cluster to remove the finalizers from volumeattachments with a detachError:
kubectl get volumeattachments \
-o=custom-columns='NAME:.metadata.name,UUID:.metadata.uid,NODE:.spec.nodeName,ERROR:.status.detachError' \
--no-headers | grep -vE '<none>$' | awk '{print $1}' | \
xargs -n1 kubectl patch -p '{"metadata":{"finalizers":[]}}' --type=merge volumeattachments
TKG cluster unreachable after backup and restore
If a vSphere Namespace is created after an NSX backup and configured with a customized ingress/egress CIDR, once NSX is restored from a backup, NCP fails to complete the restore process, and TKG clusters on the vSphere Namespace are not available. NCP fails during the restore process with an error such as the following:
NSX IP Pool ipp_<supervisor_namespace_id>_ingress not found in store
Workaround: If a backup of the Supervisor was taken around the same time as the NSX backup, but before the affected vSphere Namespace was created, also restore the Supervisor from the backup. Alternatively, delete the vSphere Namespace and associated TKG clusters, wait for NCP to resync, and then re-create the deleted resources.
TKG cluster unreachable after NSX backup and restore
When a Supervisor is configured with a customized Ingress CIDR, after an NSX backup restore, TKG clusters may become unavailable because the NCP component is unable to properly validate the TKG clusters' ingress VIP.
Workaround: By using the NSX API, manually configure VIPs for TKG clusters in NSX to restore access.
Existing Tanzu Kubernetes Grid clusters configured with a proxy server cannot be upgraded to a vSphere 8 Supervisor
FIXED: This known issue is fixed in the vSphere 8 with Tanzu MP1 release.
If you have configured an existing Tanzu Kubernetes Grid cluster with a proxy server, you cannot upgrade that cluster from a vSphere 7 Supervisor to Tanzu Kubernetes Grid 2.0 on vSphere 8 Supervisor.
Workaround: The contents of the noProxy field conflict with upgrade checks. Because this field is required if the proxy stanza is added to the cluster spec, you must remove the proxy configuration in its entirety before upgrading to vSphere 8.
The antrea-resource-init pod hangs in Pending state
After upgrading the Tanzu Kubernetes release version of a Tanzu Kubernetes Grid cluster, the antrea-resource-init pod might be in Pending state.
Workaround: Restart the pod on the Supervisor.
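One way to restart the pod is to delete it so that its controller re-creates it. This is a sketch; the namespace argument and pod-name prefix are assumptions, so locate the pod in your cluster first:

```shell
# Delete the pending antrea-resource-init pod so its controller re-creates it.
# The pod-name prefix and the namespace argument are assumptions; verify them
# with 'kubectl get pods -A' before running.
restart_antrea_resource_init() {
  ns="$1"
  pod=$(kubectl get pods -n "$ns" --no-headers | awk '/antrea-resource-init/ { print $1; exit }')
  [ -n "$pod" ] && kubectl delete pod "$pod" -n "$ns"
}
```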
Tanzu Kubernetes Grid clusters v1.21.6 might enter into a FALSE state and some machines might not migrate
After upgrading to vCenter Server 8 and updating the Supervisor, v1.21.6 Tanzu Kubernetes Grid clusters might enter into a FALSE state and some wcpmachines might not migrate to vspheremachines.
Workaround: none
By default, the password for the vmware-system-user account expires in 60 days for TKG clusters running TKR version v1.23.8+vmware.2-tkg.2-zshippable
As part of STIG hardening, for TKG clusters running TKR version v1.23.8+vmware.2-tkg.2-zshippable, the vmware-system-user account is configured to expire in 60 days. This can impact users who use the vmware-system-user account to SSH to cluster nodes.
Refer to the following knowledge base article to update the vmware-system-user password expiry, allowing SSH sessions to TKG cluster nodes if required: https://kb.vmware.com/s/article/90469
tanzu-capabilities-controller-manager pod continually restarts and enters CLBO on a TKC in vSphere with Tanzu 8.0.0a
As a result of service account permission issues, the tanzu-capabilities-controller-manager pod on the Tanzu Kubernetes cluster can be stuck in CLBO (CrashLoopBackOff) when using TKR version v1.23.8+vmware.2-tkg.2-zshippable.
Workaround: Add the required permissions to the capabilities service account tanzu-capabilities-manager-sa on the TKC.
Pause the reconciliation of the capabilities package on the TKC:
kubectl patch -n vmware-system-tkg pkgi tkc-capabilities -p '{"spec":{"paused": true}}' --type=merge
Create a new file capabilities-rbac-patch.yaml:
apiVersion: v1
kind: Secret
metadata:
name: tanzu-capabilities-update-rbac
namespace: vmware-system-tkg
stringData:
patch-capabilities-rbac.yaml: |
#@ load("@ytt:overlay", "overlay")
#@overlay/match by=overlay.subset({"kind":"ClusterRole", "metadata": {"name": "tanzu-capabilities-manager-clusterrole"}}),expects="1+"
---
rules:
- apiGroups:
- core.tanzu.vmware.com
resources:
- capabilities
verbs:
- create
- delete
- get
- list
- patch
- update
- watch
- apiGroups:
- core.tanzu.vmware.com
resources:
- capabilities/status
verbs:
- get
- patch
- update
Patch the capabilities cluster role on TKC:
# Replace tkc with the name of the Tanzu Kubernetes Grid cluster:
kubectl patch -n vmware-system-tkg pkgi tkc-capabilities -p '{"metadata":{"annotations":{"ext.packaging.carvel.dev/ytt-paths-from-secret-name.0":"tanzu-capabilities-update-rbac"}}}' --type=merge
Delete the tanzu-capabilities-controller-manager on the TKC.
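Deleting the controller pod can be sketched as follows, assuming it runs in the vmware-system-tkg namespace used in the earlier steps:

```shell
# Delete the tanzu-capabilities-controller-manager pod so it restarts with the
# patched cluster role. The namespace is an assumption based on the steps above.
delete_capabilities_pod() {
  for pod in $(kubectl get pods -n vmware-system-tkg --no-headers \
      | awk '/tanzu-capabilities-controller-manager/ { print $1 }'); do
    kubectl delete pod "$pod" -n vmware-system-tkg
  done
}
```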
Tanzu Kubernetes Grid 2.0 clusters on a Supervisor deployed on three vSphere Zones do not support VMs with vGPU and instance storage.
Tanzu Kubernetes Grid 2.0 clusters provisioned on a Supervisor instance deployed across three vSphere Zones do not support VMs with vGPU and instance storage.
Workaround: none
TKR version v1.22.9 is listed in the content library image but not in kubectl command
The content library for TKR images lists TKR v1.22.9. However, the command kubectl get tkr does not list this image as available, because TKR v1.22.9 is not available for use and should not be used. This image appears in the content library in error.
Workaround: Use a TKR other than TKR v1.22.9. Refer to the TKR Release Notes for a list of the available TKRs.
Unable to provision a TKC using the v1alpha1 API and a v1.23.8 TKR in vSphere with Tanzu 8.0.0a
When you use the TKC v1alpha1 API to provision a TKC with version v1.23.8, the request fails with the error "unable to find a compatible full version matching version hint "1.23.8" and default OS labels: "os-arch=amd64,os-name=photon,os-type=linux,os-version=3.0"".
Workaround: Switch to the TKC v1alpha2 or v1alpha3 API when provisioning TKCs.
Tanzu Kubernetes Grid 2.0 clusters provisioned with the v1beta1 API must be based on the default ClusterClass
If you are creating a Tanzu Kubernetes Grid 2.0 cluster on Supervisor by using the v1beta1 API, the cluster must be based on the default tanzukubernetescluster ClusterClass. The system does not reconcile a cluster based on a different ClusterClass.
Workaround: Starting with the vSphere 8 U1 release, you can provision a v1beta1 cluster based on a custom ClusterClass. Refer to the KB article https://kb.vmware.com/s/article/91826.
In an NSX Advanced Load Balancer setup, there is no usage.ingressCIDRUsage section in the clusternetworkinfo or namespacenetworkinfo output
In an NSX Advanced Load Balancer setup, the ingress IP is allocated by the Avi Controller, so the usage for ingressCIDR is not displayed in the clusternetworkinfo or namespacenetworkinfo output.
Workaround: Get the ingressCIDR usage from the Avi Controller UI at Applications > VS VIPs.
Pod CIDR on tier-0 prefix list is not removed after namespace deletion for a routed Supervisor
In a routed Supervisor, the pod CIDR in a tier-0 prefix list does not get deleted after namespace deletion.
Workaround: Delete the prefix-lists object:
curl -k -u 'admin:U2.HzP7QZ9Aw' -X PATCH -d '{"prefixes":[{"network" : "10.246.0.0/16","le" : 28,"action" : "PERMIT"}]}' https://<IP ADDRESS>/policy/api/v1/infra/tier-0s/ContainerT0/prefix-lists/pl_domain-c9:45c1ce8d-d2c1-43cb-904f-c526abd8fffe_deny_t1_subnets -H 'X-Allow-Overwrite: true' -H 'Content-type: application/json'
Kubernetes resources clusternetworkinfo and namespacenetworkinfo do not contain usage.ingressCIDRUsage when using NSX Advanced Load Balancer
When using NSX Advanced Load Balancer in an NSX-based Supervisor, the clusternetworkinfo and namespacenetworkinfo Kubernetes resources no longer contain the usage.ingressCIDRUsage fields. This means that running kubectl get clusternetworkinfo <supervisor-cluster-name> -o json or kubectl get namespacenetworkinfo <namespace-name> -n <namespace-name> -o json will not show the ingressCIDR usage object in the output.
Workaround: Use the Avi Controller UI page to access the ingressCIDR usage.
Stale tier-1 segments exist for some namespaces after NSX backup and restore
After an NSX backup and restore procedure, stale tier-1 segments that have Service Engine NICs do not get cleaned up.
When a namespace is deleted after an NSX backup, the restore operation restores stale tier-1 segments that are associated with the NSX Advanced Load Balancer Controller Service Engine NICs.
Workaround: Manually delete the tier-1 segments.
Log in to the NSX Manager.
Select Networking > Segments.
Find the stale segments that are associated with the deleted namespace.
Delete the stale Service Engine NICs from the Ports/Interfaces section.
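If you prefer the API over the UI, deleting a stale segment can be sketched as follows. The manager address, credentials, and segment ID are placeholders, and the exact policy path may differ for your segments, so confirm it in NSX Manager first:

```shell
# Delete a stale segment through the NSX Policy API (sketch only).
# NSX_MGR, NSX_USER, NSX_PASS, and the segment ID are placeholders.
delete_stale_segment() {
  curl -k -u "$NSX_USER:$NSX_PASS" -X DELETE \
    "https://$NSX_MGR/policy/api/v1/infra/segments/$1"
}

# Usage: delete_stale_segment <segment-id>
```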
Load Balancer monitor might stop working, the Supervisor might get stuck in "configuring" state in the vSphere Client
If NSX Advanced Load Balancer is enabled, due to the presence of multiple enforcement points in NSX, NCP might fail to pull the load balancer status. This affects existing Supervisors configured with NSX Advanced Load Balancer once it is enabled in NSX. This issue does not affect new Supervisors leveraging NSX Advanced Load Balancer. Supervisors impacted by this issue remain functional, with the exception of the LB monitor capability.
Workaround: Disable NSX Advanced Load Balancer in NSX. This might limit the ability to deploy Supervisors with NSX Advanced Load Balancer in WCP environments with existing Supervisors running NSX Advanced Load Balancer.
You cannot use NSX Advanced Load Balancer with a vCenter Server using an Embedded Linked Mode topology.
When you configure the NSX Advanced Load Balancer controller, you can configure it on multiple clouds. However, you do not have an option to select multiple clouds while enabling vSphere with Tanzu as it only supports the Default-Cloud option. As a result, you cannot use the NSX Advanced Load Balancer with a vCenter Server version using an Embedded Linked Mode topology.
Workaround: Configure the NSX load balancer for each vCenter Server.
The volume allocation type cannot be changed for the disks in a vSAN Direct datastore
Once you decide on the volume allocation type for the disks in the vSAN Direct datastore, you cannot change it, because the underlying layers do not support the type conversion. However, you can change the volume allocation type for a new disk created through operations such as clone and relocate.
Workaround: none
Deleted VM causes CNS tasks to get stuck in queued state.
Operations sent to CNS return a task ID, but the task state never changes from queued. The tasks are for volumes attached to a VM that has just been deleted.
Workaround: If the application layer can fix the calling order, nothing needs to be done on the CNS side. If not, disable the new CNS serialization by following these steps:
Open /usr/lib/vmware-vsan/VsanVcMgmtConfig.xml on vCenter Server.
Change the following configuration to false: <newSerializationEnabled>true</newSerializationEnabled>
Restart vsan-health with vmon-cli -r vsan-health
See KB 93903 for more details.
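The edit in Step 2 can be scripted. This is a sketch that keeps a backup copy of the file, to be run on the vCenter Server appliance:

```shell
# Flip newSerializationEnabled to false in the vSAN management config.
# Keeps a .bak copy of the original file next to it.
disable_cns_serialization() {
  cfg="$1"   # e.g. /usr/lib/vmware-vsan/VsanVcMgmtConfig.xml
  sed -i.bak 's|<newSerializationEnabled>true</newSerializationEnabled>|<newSerializationEnabled>false</newSerializationEnabled>|' "$cfg"
}

# Usage: disable_cns_serialization /usr/lib/vmware-vsan/VsanVcMgmtConfig.xml
# Then restart vsan-health: vmon-cli -r vsan-health
```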
PVs remain in terminated state after successfully deleting PVCs
After you delete a PersistentVolumeClaim (PVC), the corresponding PersistentVolume (PV) might remain in a terminated state in Supervisor. Additionally, the vSphere Client might display multiple failed deleteVolume tasks.
Workaround:
Authenticate with the Supervisor:
kubectl vsphere login --server=IP-ADDRESS --vsphere-username USERNAME
Get the name of the persistent volume in terminating state:
kubectl get pv
Note down the volume handle from the persistent volume:
kubectl describe pv <pv-name>
Using the volume handle from previous step, delete the CnsVolumeOperationRequest Custom resource in the Supervisor:
kubectl delete cnsvolumeoperationrequest delete-<volume-handle>
Before deleting a PV, ensure that it is not being used by any other resources in the cluster.