The release notes cover the following topics:
VMware vSphere 8.0 Update 3b | 17 SEP 2024 vCenter Server 8.0 Update 3b | 17 SEP 2024 | ISO Build 24262322 VMware ESXi 8.0 Update 3b | 17 SEP 2024 | ISO Build 24280767 VMware NSX Advanced Load Balancer avi-22.1.5 | 11 OCT 2023 |
Supported Kubernetes Versions for Supervisors
This release bundles Supervisor Kubernetes versions 1.29, 1.28, and 1.27.
The Supervisor Kubernetes version is distinct from the TKG Service Kubernetes cluster versions. For more information, see What Is vSphere IaaS Control Plane?
If your Supervisor Kubernetes version is 1.26, it will be auto-upgraded to version 1.27. Please note that auto-upgrade is not allowed in a VCF environment.
TKG Service for vSphere
Support for TKG Service 3.1.1. This version includes the same features as version 3.1 with additional bug fixes. It is available alongside the vCenter 8.0 Update 3b release. For more information, see VMware TKG Service Release Notes.
VMware vSphere 8.0 Update 3 | 25 JUN 2024 vCenter Server 8.0 Update 3 | 25 JUN 2024 | ISO Build 24022515 VMware ESXi 8.0 Update 3 | 25 JUN 2024 | ISO Build 24022510 VMware NSX Advanced Load Balancer avi-22.1.5 | 11 OCT 2023 HAProxy 2.2.2 Community Edition, Data Plane API 2.1.0 Tanzu Kubernetes Grid 3.0 | 25 JUN 2024 |
vSphere IaaS control plane 8.0 Update 3 introduces the following new features and enhancements:
Automatic Supervisor and TKG Certificate Rotation - This release brings a new certificate rotation mechanism. Supervisor and TKG cluster certificates are automatically rotated before they expire. If the automatic rotation fails, you are notified in vCenter and must renew the certificates manually.
Support for Supervisor, vSphere Pods, and TKG clusters on a stretched vSAN cluster - Starting from this release, you can configure the Supervisor to run on a vSAN stretched cluster. This configuration is supported only in a greenfield environment, that is, your vSAN cluster is already stretched and Workload Management and the content library have not yet been configured. These two components need to be configured with vSAN stretched cluster storage policies. See the documentation for more information.
Support for Supervisor Services and vSphere Pods with HA Supervisor - Use Supervisor Services and vSphere Pods on a 3-Zone Supervisor deployment, allowing highly available workloads to leverage Supervisor Services like Harbor and Contour.
VM class for VM Service VMs is now namespace-scoped - In this release, when you use kubectl to list VM classes, you can view just the VM classes that are scoped to the specific vSphere namespace. Previously, VM classes were a cluster-scoped resource and it was difficult to determine which VM classes were assigned and available to a specific namespace.
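For example, a DevOps user can list only the VM classes associated with a given vSphere Namespace by running a command similar to the following; the namespace name is a placeholder:
kubectl get virtualmachineclasses -n my-namespace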
Supervisor backup and restore - Backup and restore of the Supervisor is now supported on vSphere. You can now configure backups for a Supervisor as part of a vCenter backup, and restore a Supervisor from a backup in the case of disaster recovery, upgrade failure, an outage, and other recovery scenarios. See Backup and Restore the Supervisor Control Plane.
VM Service VM Backup and Restore - Backup and restore of VM Service VMs is now supported on vSphere. You can now use any VADP-based backup vendor to protect your VM Service VMs. In the case of an outage or other scenarios involving data loss, you can restore the VMs in the vSphere namespace from a backup.
VM Operator API v1alpha2 is now available - The next version of the VM Operator API is here with the release of VM Operator v1alpha2. This release unlocks enhanced bootstrap provider support including inline Cloud-Init and Windows support, enhanced guest networking configuration, augmented status capabilities, support for user-defined readiness gates, new VirtualMachineWebConsoleRequest API, improved API documentation, and other capabilities.
Leverage Fluent Bit for log forwarding on the Supervisor - Forwarding of Supervisor control plane logs and system logs to an external log monitoring platform is now supported. See the documentation for more information.
Private Registry support for Supervisor Services - You can now define private registry details that will allow workloads deployed as a Supervisor Service to pull images and packages from a private registry. This is a common requirement for supporting an air-gapped environment. For more information, see the documentation.
Supervisor upgrade workflow improvements - You can now run Supervisor upgrade pre-checks when initiating a Supervisor Kubernetes version upgrade. The actual update is performed only after all pre-checks succeed. This release also adds the ability to resume the component upgrade process from the point where it previously failed. For more information, see Update the Supervisor.
Supervisor support for Kubernetes 1.28 - This release adds Supervisor support for Kubernetes 1.28 and drops the support for Kubernetes 1.25.
Supported Kubernetes Versions for Supervisors:
The supported versions of Kubernetes in this release are 1.28, 1.27, and 1.26. Supervisors running on Kubernetes version 1.25 will be auto-upgraded to version 1.26 to ensure that all your Supervisors are running on the supported versions of Kubernetes.
Tanzu Kubernetes Grid Service for vSphere
TKG Service as a Supervisor Service – This release decouples the TKG Service components from vCenter and packages them as a Supervisor Service that you can update and manage independent of vCenter and Supervisor releases. You can find the release notes of TKG Service here.
Support for deploying TKG Service clusters on stretched vSAN cluster – This release supports deploying TKG Service clusters on vSAN stretched cluster topology. For more information on how to configure HA for TKG Service clusters on vSAN stretched cluster, see the Running vSphere IaaS Control Plane on vSAN Stretched Cluster documentation.
Autoscaling TKG Service clusters – This release supports installing the cluster autoscaler package on a TKG Service cluster to enable the automated scaling out and in of worker nodes.
Backup and restore TKG clusters – This release supports the backup and restore of the Supervisor database, which includes vSphere Namespace objects and TKG cluster VMs. Note that TKG cluster workloads need to be backed up and restored separately.
Support for Antrea-NSX integration and the NSX Management Proxy service – This release supports integrating TKG clusters using the Antrea CNI with NSX Manager for observability and control of cluster networking.
Configurable MachineHealthCheck – This release supports configuring MachineHealthCheck for v1beta1 clusters.
Cluster-wide PSA configuration – This release supports configuring PSA on a cluster-wide basis during cluster creation or update.
Standard package installation updates – This release includes documentation updates for installing standard packages on TKG Service clusters.
Updates to the v1beta1 API for provisioning TKG clusters – This release includes the following v1beta1 API changes, illustrated in the sketch after this list:
podSecurityStandard is added for cluster-wide PSA implementation
controlPlaneCertificateRotation is updated
Support for scaling control plane node volumes for a TKG cluster.
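The following fragment is a minimal sketch of where these variables might appear in a v1beta1 Cluster specification. The placeholder values and the sub-fields shown under each variable are illustrative assumptions, not the definitive schema; refer to the Cluster v1beta1 API documentation for the exact fields.
apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  name: my-tkg-cluster
  namespace: my-namespace
spec:
  topology:
    class: tanzukubernetescluster
    version: <tkr-version>
    variables:
    - name: podSecurityStandard
      value:
        enforce: restricted
    - name: controlPlaneCertificateRotation
      value:
        activate: true
        daysBefore: 30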
Internationalization Updates
Beginning with the next major release, we will be reducing the number of supported localization languages. The three supported languages will be:
Japanese
Spanish
French
The following languages will no longer be supported:
Italian, German, Brazilian Portuguese, Traditional Chinese, Korean, Simplified Chinese
Impact:
Users who have been using the deprecated languages will no longer receive updates or support in these languages.
All user interfaces, help documentation, and customer support will be available only in English or in the three supported languages mentioned above.
VMware vSphere 8.0 Update 2c | 26 MAR 2024 ESXi 8.0 Update 2b | 29 FEB 2024 | ISO build 23305546 vCenter Server 8.0 Update 2c | 26 MAR 2024 | ISO Build 23504390 VMware NSX Advanced Load Balancer avi-22.1.5 | 11 OCT 2023 HAProxy 2.2.2 Community Edition, Data Plane API 2.1.0 Tanzu Kubernetes Grid 2.2.0 | 18 MAY 2023 |
vSphere IaaS control plane 8.0 Update 2c introduces the following new features and enhancements:
New Features
Support for TKr 1.27 - This release adds changes required to support TKr 1.27 on vSphere 8.x, when it is released in the future. For more information, see the VMware Tanzu Kubernetes releases Release Notes.
Support for Upgrade from vCenter 7.x to vCenter 8.x - For users who are running TKr 1.27.6 on vSphere 7.x, this release provides a path to upgrade to vCenter 8.x. Previously, TKr 1.27.6 released for vSphere 7.x was not compatible with vCenter 8.x. See the KB Article 96501.
Resolved Issues
After you upgrade to vCenter 8.0 Update 2b, the vmon-managed service wcp might be in STOPPED state and vCenter patching might fail.
Unlinking a library or deleting a namespace with a shared library deletes associated items from the content library.
VMware vSphere 8.0 Update 2b | 29 FEB 2024 ESXi 8.0 Update 2b | 29 FEB 2024 | ISO build 23305546 vCenter Server 8.0 Update 2b | 29 FEB 2024 | ISO Build 23319993 VMware NSX Advanced Load Balancer avi-22.1.5 | 11 OCT 2023 HAProxy 2.2.2 Community Edition, Data Plane API 2.1.0 Tanzu Kubernetes Grid 2.2.0 | 18 MAY 2023 |
vSphere IaaS control plane 8.0 Update 2b introduces the following new features and enhancements:
New Features
Support for configuring VM Service VM classes through the vSphere Client - VM Service on the Supervisor supports deploying VMs with any configuration currently supported with vSphere VMs. To configure CPU, memory, hardware, security devices, and passthrough devices for VM classes, you can now use the VM class wizard in the vSphere Client. Using the vSphere Client simplifies the process of defining and managing VM Service VMs used for running AI/ML workloads.
Supervisors support Avi cloud setting - If you are using the NSX Advanced Load Balancer, also known as Avi Load Balancer, you can now configure an Avi cloud for each Supervisor. This allows multiple vCenter Server instances and data centers to use a single Avi instance, eliminating the need to set up an Avi instance for each vCenter Server or data center deploying a Supervisor.
FQDN login support - For Supervisor and TKG clusters, you can now replace the IP address in generated kubeconfigs with a valid FQDN. The FQDN and IP address must be added to the DNS to ensure proper hostname resolution.
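For example, once the FQDN is registered in DNS, you can log in by pointing the vSphere Plugin for kubectl at the FQDN instead of the IP address; the server name and user below are placeholders:
kubectl vsphere login --server=supervisor.example.com --vsphere-username administrator@vsphere.local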
Supervisors support Kubernetes 1.27 - Supervisor now supports Kubernetes 1.27 and discontinues support for Kubernetes 1.24.
Supported Kubernetes Versions
Supported Kubernetes versions in this release are 1.27, 1.26, and 1.25. Supervisors running on Kubernetes 1.24 will auto-upgrade to version 1.25 to ensure compatibility.
Upgrade to 8.0 Update 2b
Upgrading from any vSphere 8.x version earlier than 8.0 Update 2 to the 8.0 Update 2b release initiates a rolling update of all provisioned TKGS clusters to propagate the following changes:
8.0 Update 2 contains changes for both vSphere 7 and vSphere 8 TKRs in the TKGS Controller as a part of the ClusterClass.
Because TKGS clusters from 1.23 and above are compatible with 8.0 Update 2b, all TKGS clusters will undergo a rolling upgrade.
Resolved Issues
Supervisor upgrade process stuck at 20%.
VMware vSphere 8.0.2 | 21 SEP 2023 ESXi 8.0.2 | 21 SEP 2023 | Build 22380479 vCenter Server 8.0.2 | 21 SEP 2023 | Build 22385739 VMware NSX Advanced Load Balancer avi-22.1.3 | 31 JAN 2023 HAProxy Community Edition 2.2.2, Data Plane API 2.1.0 Tanzu Kubernetes Grid 2.2.0 | 18 MAY 2023 |
vSphere IaaS control plane 8.0 Update 2 introduces the following new features and enhancements:
Supervisor
VM Service now supports VMs with Windows OS, GPUs, and all other options available for traditional vSphere VMs - The VM Service now supports deploying VMs with any configuration currently supported with vSphere VMs, achieving complete parity with traditional VMs on vSphere but for VMs deployed as part of Infrastructure as a Service on the Supervisor. This includes support for provisioning Windows VMs alongside Linux VMs, as well as any hardware, security, device, custom or multi-NIC support, and passthrough devices that are supported on vSphere. You can now provision workload VMs by using GPUs to support AI/ML workloads.
VM Image Service - DevOps teams, developers, and other VM consumers can publish and manage VM images in a self-service manner by using the VM Image Service. The service allows consumers to publish, modify, and delete images by using K8s APIs within a Supervisor Namespace scoped image registry. The VM Image Service is created automatically in each CCS region and CCS project, and population of images to the image registry is scoped by persona and consumption level, either on global or project level. Images can be used for the deployment of VMs through the VM Service.
Note that this functionality introduces a new format for the VM image name. For information on how to resolve potential issues caused by the name change, see Changes in the VM image name format in vSphere 8.0 U2.
Import and export the Supervisor configuration - In previous versions, activating the Supervisor was a manual step-wise process without the ability to save any configurations. In the current release, you can now export and share the Supervisor configuration with peers in a human-readable format or within a source control system, import configurations to a new Supervisor, and replicate a standard configuration across multiple Supervisors. See the documentation for details on how to export and import the Supervisor configuration.
Improved GPU utilization by reducing fragmentation - Workload placement is now GPU aware, and DRS will try to place workloads with similar profile requirements on the same host. This improves resource utilization, which reduces cost as fewer GPU hardware resources must be acquired to achieve a desired level of performance.
Supervisor supports Kubernetes 1.26 - This release adds support for Kubernetes 1.26 and drops the support for Kubernetes 1.23. The supported versions of Kubernetes in this release are 1.26, 1.25, and 1.24. Supervisors running on Kubernetes version 1.23 will be auto-upgraded to version 1.24 to ensure that all your Supervisors are running on the supported versions of Kubernetes.
Support of NSX Advanced Load Balancer for a Supervisor configured with NSX networking - You can now enable a Supervisor with NSX Advanced Load Balancer (Avi Networks) for L4 load balancing, as well as load balancing for the control plane nodes of Supervisor and Tanzu Kubernetes Grid clusters with NSX networking. See the documentation page for guidance on configuring the NSX Advanced Load Balancer with NSX.
Telegraf Support for Metric and Event Streaming - You can now configure Telegraf via Kubernetes APIs to push Supervisor metrics to any metrics services that are compatible with the embedded Telegraf version. See the documentation for configuring Telegraf.
Tanzu Kubernetes Grid on Supervisor
STIG compliance for TKRs on vSphere 8.0 - With vSphere 8.0 U2, all Tanzu Kubernetes Grid clusters at version 1.23.x and above are STIG (Security Technical Implementation Guide) compliant, and documentation for the exceptions is included, which you can find here. These improvements represent a significant step towards compliance process simplification, and make it much easier for you to satisfy compliance requirements so that you can quickly and confidently use Tanzu Kubernetes Grid in the US Federal market and in other regulated industries.
Turn on control plane rollout for expiring certificates – The v1beta1 API for provisioning TKG clusters based on a ClusterClass is updated to enable clusters to automatically renew their control plane node VM certificates before they expire. This configuration can be added as a variable to the cluster specification. Refer to the Cluster v1beta1 API documentation for more information.
CSI snapshot support for TKRs - TKG clusters provisioned with Tanzu Kubernetes release 1.26.5 and above support CSI volume snapshots, helping you achieve your data protection requirements. Volume snapshots provide you with a standardized way to copy a volume's contents at a particular point in time without creating an entirely new volume.
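As an illustration, a CSI volume snapshot of an existing PVC is requested through the standard Kubernetes VolumeSnapshot API, similar to the following sketch, in which the object names and the snapshot class are placeholders:
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: my-snapshot
spec:
  volumeSnapshotClassName: my-snapshot-class
  source:
    persistentVolumeClaimName: my-pvc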
Installing and Managing Tanzu Packages – A new consolidated repository and publication for installing and managing Tanzu Packages on TKG clusters. Refer to the publication "Installing and Using VMware Tanzu Packages" for all your package needs.
Custom ClusterClass improvements – The workflow for implementing custom ClusterClass clusters is simplified for vSphere 8 U2.
Rolling Updates for TKG clusters – When upgrading to vSphere 8 U2, you can expect rolling updates for provisioned TKG clusters under the following scenarios:
When upgrading from any previously released vSphere 8 version to vSphere 8 U2, because:
vSphere 8 U2 contains Kubernetes-level STIG changes for TKRs as a part of the clusterclass
TKG clusters from 1.23 and above will undergo a rolling update to be made compatible with v8.0 U2
When upgrading from any vSphere 7 version to vSphere 8 U2, because:
Underlying CAPI providers need to be moved from CAPW to CAPV
Existing clusters need to be migrated from classless CAPI clusters to class-based CAPI clusters
Resolved Issues
Audit log files under /var/log/audit/ on the Supervisor control plane VMs may grow to a very large size and fill up the root disk. You should see "no space left on device" errors in journald logs reflecting this state. This can cause various aspects of Supervisor control plane functionality (like kubernetes APIs) to fail.
VMware vSphere 8.0.1c | 27 JUL 2023 ESXi 8.0.1c | 27 JUL 2023 | Build 22088125 vCenter Server 8.0.1c | 27 JUL 2023 | Build 22088981 VMware NSX Advanced Load Balancer avi-22.1.3 | 31 JAN 2023 HAProxy Community Edition 2.2.2, Data Plane API 2.1.0 Tanzu Kubernetes Grid 2.2.0 | 18 MAY 2023 |
New Features
Supervisors support Kubernetes 1.25 - This release adds support for Kubernetes 1.25 and drops the support for Kubernetes 1.22.
Tanzu Kubernetes Grid 2.2.0 on Supervisor - Manage Tanzu Kubernetes Grid 2.2.0 clusters on Supervisor.
Supported Kubernetes Versions
The supported versions of Kubernetes in this release are 1.25, 1.24, and 1.23. Supervisors running on Kubernetes version 1.22 will be auto-upgraded to version 1.23 to ensure that all your Supervisors are running on the supported versions of Kubernetes.
Support
vRegistry Creation Deprecation - The creation of the embedded Harbor instance, vRegistry, is deprecated. Existing vRegistry instances will continue to function, but the vRegistry creation APIs have been marked as deprecated, and the ability to deploy new vRegistry instances will be removed in an upcoming release. Instead, we recommend using Harbor as a Supervisor Service to manage your container images and repositories for enhanced performance and functionality. To migrate an existing vRegistry to Harbor as a Supervisor Service, see “Migrate Images from the Embedded Registry to Harbor”.
Resolved Issues
A new alert message will be displayed in the vSphere Client to warn about expiring certificates on the Supervisor or TKG clusters. The alert will provide detailed information, including the name of the Supervisor and the certificate expiration date. Additionally, it will contain a link to KB 90627 that explains step-by-step how to replace impacted certificates.
Command kubectl get clustervirtualmachineimages returns an error or No resources found
In previous versions, when using the command kubectl get clustervirtualmachineimages, an error was encountered. However, after upgrading to 8.0 Update 1c, the command now returns the message No resources found. To retrieve information about virtual machine images, use the following command instead: kubectl get virtualmachineimages
The antrea-nsx-routed CNI does not work with v1alpha3 Tanzu Kubernetes clusters on vSphere IaaS control plane 8.x releases.
Node drain timeout is not propagated correctly for v1beta1 Clusters.
VMware vSphere 8.0.1 | 18 APR 2023 ESXi 8.0.1 | 18 APR 2023 | Build 21495797 vCenter Server 8.0.1 | 18 APR 2023 | Build 21560480 VMware NSX Advanced Load Balancer avi-22.1.3 | 31 JAN 2023 HAProxy Community Edition 2.2.2, Data Plane API 2.1.0 Tanzu Kubernetes Grid 2.0 | 11 OCT 2022 |
New Features
Supervisor
Supervisor Services are now available on VDS based Supervisors - Previously the availability of Supervisor Services was restricted to NSX based Supervisors only. With the current release, you can deploy Harbor, Contour, S3 object storage, and Velero Supervisor Services on VDS based Supervisors. Note: Supervisor Service capabilities on VDS based Supervisors require an ESXi update to 8.0 U1.
VM Service Support for all Linux Images - You can now use CloudInit to customize any Linux image in OVF format conformant to the VM Service image specification, as well as utilize OVF templating through vAppConfig to enable the deployment of legacy Linux images.
Web-Console Support for VM Service VMs - After deploying a VM Service VM, as a DevOps engineer you can now launch a web-console session for that VM by using the kubectl CLI to troubleshoot and debug within the guest OS without involving the vSphere administrator for access to the guest VM.
Supervisor Compliance - Security Technical Implementation Guides (STIG) for vSphere IaaS control plane 8 Supervisor releases. See Tanzu STIG Hardening for details.
Tanzu Kubernetes Grid 2.0 on Supervisor
Custom cluster class - Bring your own cluster class for TKG clusters on Supervisor. For information, see https://kb.vmware.com/s/article/91826.
Custom image for TKG node - Build your own custom node images using vSphere TKG Image Builder (Ubuntu and Photon).
Note: To use a specific TKR with the v1alpha1 API, use the fullVersion.
New TKR Images: Refer to the Tanzu Kubernetes releases Release Notes for details.
CRITICAL REQUIREMENT for vSphere IaaS Control Plane 8.0.0 GA Customers
Note: This requirement does not apply to content libraries you use for VMs provisioned through VM Service. It only applies to the TKR content library.
If you have deployed vSphere IaaS control plane 8.0.0 GA, before upgrading to vSphere IaaS control plane 8 U1 you must create a temporary TKR content library to avoid a known issue that causes TKG Controller pods to go into CrashLoopBackoff when TKG 2.0 TKrs are pushed to the existing content library. To avoid this issue, complete the following steps.
Create a new subscribed content library with a temporary subscription URL pointing to https://wp-content.vmware.com/v2/8.0.0/lib.json.
Synchronize all the items in the temporary content library.
Associate the temporary content library with each vSphere Namespace where you have deployed a TKG 2 cluster.
Run the command kubectl get tkr and verify that all the TKrs are created.
At this point the TKG Controller should be in a running state, which you can verify by listing the pods in the Supervisor namespace.
If the TKG Controller is in CrashLoopBackOff (CLBO) state, restart the TKG Controller deployment using the following command:
kubectl rollout restart deployment -n vmware-system-tkg vmware-system-tkg-controller-manager
Upgrade to vSphere IaaS control plane 8 Update 1.
Update each vSphere Namespace to use the original subscribed content library at https://wp-content.vmware.com/v2/latest/lib.json.
Resolved Issues
Tanzu Kubernetes Grid 2.0 clusters provisioned with the v1beta1 API must be based on the default ClusterClass
If you are creating a Tanzu Kubernetes Grid 2.0 cluster on Supervisor by using the v1beta1 API, the Cluster must be based on the default tanzukubernetescluster ClusterClass. The system does not reconcile a cluster based on a different ClusterClass.
ESXi 8.0.0c | 30 MAR 2023 | Build 21493926 vCenter Server 8.0.0c | 30 MAR 2023 | Build 21457384 VMware NSX Advanced Load Balancer avi-22.1.3 | 31 JAN 2023 HAProxy 2.2.2 Community Edition, Data Plane API 2.1.0 Tanzu Kubernetes Grid 2.0 | 11 OCT 2022 |
Resolved Issues:
Harbor insecure default configuration issue
This release resolves the Harbor insecure default configuration issue which is present if you have enabled the embedded Harbor registry on Supervisor 7.0 or 8.0.
Resolved in vCenter version 8.0.0c. See VMware Knowledge Base article 91452 for details of this issue and how to address it by either installing this release or applying a temporary workaround.
After an upgrade to vCenter Server 8.0b, login attempts to a Supervisor and TKG clusters fail
Components running in vCenter Server 8.0b are not backward compatible with Supervisors deployed using vCenter Server in earlier releases.
Workaround: Upgrade vCenter Server to a newer version or upgrade all deployed Supervisors.
ESXi 8.0.0b | 14 FEB 2023 | Build 21203435 vCenter Server 8.0.0b | 14 FEB 2023 | Build 21216066 VMware NSX Advanced Load Balancer avi-22.1.1-9052 | 15 JULY 2022 HAProxy 2.2.2 Community Edition, Data Plane API 2.1.0 Tanzu Kubernetes Grid 2.0 | 11 OCT 2022 |
New Features:
Added Cert-Manager CA cluster issuer support – Gives administrators the ability to define and deploy a Cluster Scoped CA Issuer on a Supervisor as a Supervisor Service. Deploying a Cluster Scoped CA Issuer enables consumers of a Supervisor namespace to request and manage certificates signed by the CA issuer.
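As an illustration, once the service is deployed, a CA-backed cluster-scoped issuer follows the standard cert-manager ClusterIssuer form; in the sketch below, the issuer name and the secret holding the CA key pair are placeholders:
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: my-ca-cluster-issuer
spec:
  ca:
    secretName: my-ca-key-pair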
In addition to this new feature, the release delivers bug fixes.
Resolved Issues:
Supervisor Control plane VMs root disk fills up
Audit log files under /var/log/audit/ on the Supervisor control plane VMs may grow to a very large size and fill up the root disk. You should see "no space left on device" errors in journald logs reflecting this state. This can cause various aspects of Supervisor control plane functionality (like kubernetes APIs) to fail.
Resolved in this version, vSphere 8.0.0b
ESXi 8.0 | 11 OCT 2022 | Build 20513097 vCenter Server 8.0.0a | 15 DEC 2022 | Build 20920323 VMware NSX Advanced Load Balancer avi-22.1.1-9052 | 15 JULY 2022 HAProxy 2.2.2 Community Edition, Data Plane API 2.1.0 Tanzu Kubernetes Grid 2.0 | 11 OCT 2022 |
New Features
Added Harbor as a Supervisor Service – Provides a full featured Harbor (OCI Image Registry) instance running on the Supervisor. The Harbor instance gives Harbor administrators the ability to create and manage projects and users as well as set up image scanning.
Deprecation of vRegistry – The embedded Harbor instance known as vRegistry will be removed in a future release. In its place, you can use Harbor as a Supervisor Service. See "Migrate Images from the Embedded Registry to Harbor" for migration details.
Supervisors support Kubernetes 1.24 – This release adds Kubernetes 1.24 support and drops Kubernetes 1.21 support. The Kubernetes versions supported in this release are 1.24, 1.23, and 1.22. Supervisor Clusters running on Kubernetes version 1.21 are auto-upgraded to version 1.22 to ensure that all your Supervisor Clusters run on supported versions.
vSphere Zone APIs – A vSphere Zone is a construct that lets you assign availability zones to vSphere clusters to make highly available Supervisor and Tanzu Kubernetes Clusters. In vSphere 8.0, creating and managing vSphere Zones was available from the vSphere Client. With the vSphere 8.0.0a release, users can create and manage vSphere Zones by using DCLI or the vSphere Client API Explorer. Full SDK binding support (for example, Java, Python, and so forth) will be available in a future release.
Upgrade Considerations:
A rolling update of Tanzu Kubernetes Clusters will occur in the following Supervisor upgrade scenarios:
When upgrading from 8.0 to 8.0.0a followed by upgrading a Supervisor from any Kubernetes release to Supervisor Kubernetes 1.24, and when one of these conditions is met.
If you are using proxy settings with a nonempty noProxy list on a Tanzu Kubernetes Cluster
If vRegistry is enabled on the Supervisor. This setup is only available on NSX-based Supervisors.
When upgrading from any vSphere 7.0 release to vSphere 8.0.0a.
Resolved Issues:
Tanzu 7 license keys are supported in vSphere 8 instead of Tanzu 8 license keys
vCenter 8.0 supports the Tanzu 7 license keys instead of Tanzu 8 license keys. This issue does not impact your ability to fully use Tanzu features on vCenter 8.0. For more details, see VMware Knowledge Base article 89839 before modifying Tanzu licenses in your vCenter 8.0 deployment.
LoadBalancers and Guest Clusters are not created when two SE Groups exist on NSX-ALB.
If a second SE Group is added to NSX-ALB with or without SEs or virtual services assigned to it, the creation of new supervisor or guest clusters fails and existing supervisor clusters cannot be upgraded. The virtual service creation on NSX-ALB controller fails with the following error:
get() returned more than one ServiceEngineGroup – it returned 2
As a result, new load balancers are unusable and you cannot create new workload clusters successfully. For more information, see VMware Knowledge Base article 90386.
VMware vSphere 8.0 | 11 OCT 2022 ESXi 8.0 | 11 OCT 2022 | Build 20513097 vCenter 8.0 | 11 OCT 2022 | Build 20519528 VMware NSX Advanced Load Balancer 21.1.4 | 07 APR 2022 HAProxy 2.2.2 Community Edition, Data Plane API 2.1.0 Tanzu Kubernetes Grid 2.0 | 11 OCT 2022 |
vSphere IaaS control plane 8.0 introduces the following new features and enhancements:
Workload Management Configuration Monitoring - You can now track the status of activation, deactivation, and upgrade of the Supervisor in greater detail. Once you initiate a Supervisor activation, deactivation, or upgrade, the Supervisor tries to reach the desired state by meeting various conditions associated with different components of the Supervisor, such as Control Plane VMs. You can track the status of each condition, view associated warnings, retry failed conditions, and view which conditions are reached and their time stamps.
vSphere Zones - A vSphere Zone is a new construct that lets you assign availability zones to vSphere clusters. Deploying a Supervisor across vSphere Zones lets you provision Tanzu Kubernetes Grid clusters in specific availability zones and failure domains. This allows for workloads to span across multiple clusters, which increases the resiliency to hardware and software failures.
Multi-Cluster Supervisor - By using vSphere Zones, you can deploy a Supervisor across multiple vSphere clusters to provide high availability and failure domains for Tanzu Kubernetes Grid clusters. You can add a vSphere cluster to a separate vSphere zone and activate a Supervisor that spans across multiple vSphere Zones. This provides failover and resiliency to localized hardware and software failures. When a zone or vSphere cluster goes offline, the Supervisor detects the failure and restarts workloads on another vSphere zone. You can use vSphere Zones in environments that span distances so long as latency maximums are not exceeded. For more information on latency requirements, see Requirements for Zonal Supervisor Deployment.
Supervisor Supports Kubernetes 1.23 - This release adds support for Kubernetes 1.23 and drops the support for Kubernetes 1.20. The supported versions of Kubernetes in this release are 1.23, 1.22, and 1.21. Supervisors running on Kubernetes version 1.20 will be auto-upgraded to version 1.21 to ensure that all your Supervisors are running on the supported versions of Kubernetes.
Provide consistent network policies with SecurityPolicy CRDs - The SecurityPolicy custom resource definition provides the ability to configure network security controls for VMs and vSphere Pods in a Supervisor namespace.
Tanzu Kubernetes Grid 2.0 on Supervisor - Tanzu Kubernetes Grid now has moved to version 2.0. Tanzu Kubernetes Grid 2.0 is the culmination of tremendous innovation in the Tanzu and Kubernetes community, and provides the foundation for a common set of interfaces across private and public clouds with Tanzu Kubernetes Grid. New in this release are the following two major features:
ClusterClass - Tanzu Kubernetes Grid 2.0 now supports ClusterClass. ClusterClass provides an upstream-aligned interface, superseding the Tanzu Kubernetes Cluster, which brings customization capabilities to our Tanzu Kubernetes Grid platform. ClusterClass enables administrators to define templates that will work with their enterprise environment requirements while reducing boilerplate and enabling delegated customization.
Carvel - Carvel provides a set of reliable, single-purpose, composable tools that aid in application building, configuration, and deployment to Kubernetes. In particular, kapp and kapp-controller provide package management, compatibility, and lifecycle through a set of declarative, upstream-aligned tooling. Coupled with ytt for templating, this results in a flexible yet manageable package management capability.
New documentation structure and landing page in vSphere 8 - The vSphere IaaS control plane documentation now has an improved structure that better reflects the workflows with the product and allows you to have a more focused experience with the content. You can also access all the available technical documentation for vSphere IaaS control plane from the new documentation landing page.
Read the Installing and Configuring vSphere IaaS Control Plane documentation for guidance about installing and configuring vSphere IaaS control plane. For information about updating vSphere IaaS control plane, see the Updating vSphere IaaS Control Plane documentation.
When you perform your upgrades, keep in mind the following:
Before you update to vCenter Server 8.0, make sure that the Kubernetes version of all Supervisors is at least 1.21, preferably the latest supported version. The Tanzu Kubernetes release version of the Tanzu Kubernetes Grid clusters must be at least 1.21, preferably the latest supported version.
Upgrades from legacy TKGS TKr to TKG 2 TKr are allowed starting with the vSphere IaaS control plane 8.0 MP1 release. Refer to the Tanzu Kubernetes releases Release Notes for the supported versions matrix. Refer to the Using TKG on Supervisor with vSphere IaaS Control Plane documentation for upgrade information.
After powering on, the HAProxy appliance used with VDS networking in the vSphere IaaS control plane environment does not receive an IP address
This issue occurs after you deploy the appliance built from https://github.com/haproxytech/vmware-haproxy. The appliance powers on, but never receives an IP address. You can see the following in vmware.log for the appliance VM:
YYYY-MM-DDTHH:MM:SS.NNNZ In(##) vmx - GuestRpc: Permission denied for setting key guestinfo.XXX.
For example,
2024-09-13T19:10:24.512Z In(05) vmx - GuestRpc: Permission denied for setting key guestinfo.metadata
Workaround:
Use one of the following options:
Update the deployed appliance using the steps in the Broadcom KB article 377393.
Rebuild the appliance from the main branch of https://github.com/haproxytech/vmware-haproxy, which includes a workaround to the issue.
During Supervisor upgrade, the kubectl exec command might time out for running pods
During a Supervisor upgrade, the kubectl exec command might fail for running pods with the message error: Timeout occurred. This might occur if the kubectl plugin is updated to the latest version before the upgrade is completed.
For example:
# kubectl exec -it test-0 -n ns-1 -- bash
error: Timeout occurred
Workaround:
The issue will resolve automatically once the Supervisor upgrade is completed.
During an upgrade of vSphere IaaS control plane to vSphere 8.0 Update 3 or later, StoragePolicyQuota component might fail to upgrade
You can see errors similar to the following, and you might not be able to complete the upgrade.
Error while upgrading the components: Component StoragePolicyQuotaUpgrade failed: Failed to run command: ['kubectl', 'rollout', 'status', 'deployment', 'storage-quota-controller-manager', '-n', 'kube-system', '--timeout=3m', '--watch=true'] ret=1 out=Waiting for deployment "storage-quota-controller-manager" rollout to finish: 1 out of 3 new replicas have been updated...Waiting for deployment "storage-quota-controller-manager" rollout to finish: 1 out of 3 new replicas have been updated...Waiting for deployment "storage-quota-controller-manager" rollout to finish: 1 out of 3 new replicas have been updated...Waiting for deployment "storage-quota-controller-manager" rollout to finish: 1 out of 3 new replicas have been updated...err=error: timed out waiting for the condition
or
Error while upgrading the components: Component StoragePolicyQuotaUpgrade failed: Failed to run command: ['kubectl', 'rollout', 'status', 'deployment', 'storage-quota-webhook', '-n', 'kube-system', '--timeout=3m', '--watch=true'] ret=1 out=Waiting for deployment "storage-quota-webhook" rollout to finish: 0 out of 1 new replicas have been updated...Waiting for deployment "storage-quota-webhook" rollout to finish: 0 out of 1 new replicas have been updated...Waiting for deployment "storage-quota-webhook" rollout to finish: 0 out of 1 new replicas have been updated...Waiting for deployment "storage-quota-webhook" rollout to finish: 0 out of 1 new replicas have been updated...err=error: timed out waiting for the condition
Workaround:
Use the following steps to edit the rollingUpdate strategy parameters for both storage-quota-controller-manager and storage-quota-webhook deployments.
SSH into the vCenter appliance:
ssh root@<VCSA_IP>
Print the credentials used to login to the Supervisor control plane:
/usr/lib/vmware-wcp/decryptK8Pwd.py
SSH into the Supervisor control plane using the IP and credentials from the previous step:
ssh root@<SUPERVISOR_IP>
From the Supervisor cluster, edit the storage-quota-controller-manager deployment using the kubectl -n kube-system edit deploy storage-quota-controller-manager command and set the following parameters:
strategy:
rollingUpdate:
maxSurge: 0
maxUnavailable: 1
type: RollingUpdate
From the Supervisor cluster, edit the storage-quota-webhook deployment using the kubectl -n kube-system edit deploy storage-quota-webhook command and set the following parameters:
strategy:
rollingUpdate:
maxSurge: 0
maxUnavailable: 1
type: RollingUpdate
Supervisor deployment might be unsuccessful and remain in Configuring state if you don’t open port 5000 between vCenter Server and Supervisor
During Supervisor deployment, vCenter Server attempts to reach the internal Docker registry of the Supervisor Control Plane to check for available container images. This check verifies compatibility of Supervisor Services that are bundled with vSphere and are typically installed during the deployment. If the vCenter Server appliance cannot reach that port on the Supervisor Control Plane nodes, the Supervisor Service compatibility checks fail and the Supervisor deployment remains unsuccessful in Configuring state.
Workaround:
Allow network traffic from vCenter Server Appliance to the management network IP address of the Supervisor Control Plane nodes to pass through TCP/5000. The Supervisor should automatically retry the failed compatibility checks and resolve the issue. For information, see VMware Ports and Protocols.
Supervisor login sessions result in an error "Unable to connect to the server: tls: failed to verify certificate: x509: certificate is valid for ..."
When a Supervisor is upgraded the first time after upgrading vCenter to 8.0.3, existing kubectl login sessions with that Supervisor will result in the following error:
Unable to connect to the server: tls: failed to verify certificate: x509: certificate is valid for 172.*.*.*, 10.*.*.*, 10.*.*.*, 127.*.*.*, not 192.*.*.*
Upgrading to vCenter 8.0.3 and then upgrading a Supervisor to any supported version available in the vCenter release results in a port change for authentication. This change is required to support vSphere Pod deployment in DHCP Supervisors.
Workaround: After the upgrade, you should re-login by using kubectl with either vCenter Single Sign-On or Pinniped authentication.
After you remove and then reassign a storage policy to a namespace in a Supervisor, attempts to create a VM with this storage policy fail with an error
You can observe the following error message:
Invalid value: "<storagepolicyname>": Storage policy is not associated with the namespace <namespace_name>): error when creating "<vm_creation.yaml>": admission webhook "default.validating.virtualmachine.v1alpha2.vmoperator.vmware.com" denied the request: spec.storageClass: Invalid value: "<storagepolicyname>": Storage policy is not associated with the namespace <namespace_name>', 'reason': None, 'response_data': '', 'status_code': 0}>'
This issue might occur in the following circumstances.
Typically, after you assign a storage policy to a namespace in the Supervisor cluster, a StoragePolicyQuota CR for that policy gets created. A storage quota controller running in the Supervisor cluster populates and maintains the quota usage for that policy per storageclass in the status field of the StoragePolicyQuota CR.
While creating VMs, the VM operator admission webhook checks presence of the storage policy assigned to this namespace from the status field of StoragePolicyQuota CR for the assigned policy.
However, after you remove the storage policy and then assign it back to the namespace, the storage quota controller does not populate the status field of the recreated StoragePolicyQuota CR due to no change in quota usage. As the StoragePolicyQuota status is not correctly populated, VM creation might fail with the error Storage policy is not associated with the namespace even if the storage policy has been correctly associated with the namespace.
Workaround:
When this problem occurs, you can restart all storage quota controller pods to re-populate the StoragePolicyQuota CR correctly to allow VM creation. After the storage quota controller pods come up again, retry VM creation to make sure it succeeds.
The following steps describe how to restart storage-quota-controller-manager pods on a Supervisor cluster:
SSH into the vCenter appliance:
ssh root@<VCSA_IP>
Print the credentials used to login to the Supervisor control plane:
/usr/lib/vmware-wcp/decryptK8Pwd.py
SSH into the Supervisor control plane using the IP and credentials from the previous step:
ssh root@<SUPERVISOR_IP>
Scale down running storage-quota-controller-manager pods from the Supervisor cluster using the following command:
kubectl -n kube-system scale deploy storage-quota-controller-manager --replicas=0
Scale up the storage-quota-controller-manager pods from the Supervisor cluster back to the original replica count, for example, 3:
kubectl -n kube-system scale deploy storage-quota-controller-manager --replicas=3
Make sure that the storage-quota-controller-manager pods are functioning after the restart and that the status field for all StoragePolicyQuota CRs is populated correctly. VM creation with the reassigned storage policy in that namespace should then succeed.
In vSphere 8.0 Update 3, the network override options are no longer present in the vSphere Namespace creation page
Prior to the 8.0 Update 3 release, a vSphere administrator could provide custom network configuration instead of the default network configuration used when a vSphere Namespace is created. In the 8.0 Update 3 release, the UI option to override the network configuration is not available.
Workaround: You can use DCLI to create the vSphere Namespace with custom network configuration.
Occasionally, ESXi hosts might fail to join a Supervisor cluster because the vds-vsip module can't be loaded
When the ESXi hosts fail to join the Supervisor cluster, you can see the following error in the ESXi host log file /var/run/log/spherelet.log:
cannot load module vds-vsip: cmd '/bin/vmkload_mod /usr/lib/vmware/vmkmod/vds-vsip' failed with status 1: stderr: '', stdout:'vmkmod: VMKModLoad: VMKernel_LoadKernelModule(vds-vsip): Failure\nCannot load module /usr/lib/vmware/vmkmod/vds-vsip: Failure\n'
This problem might occur during a Supervisor cluster upgrade, certificate upgrade, or any other Supervisor configuration change that restarts spherelet.
Workaround: Reboot the ESXi hosts that cannot load the vds-vsip module.
Attempts to upgrade your vSphere IaaS control plane environment from 7.0 Update 3o to 8.0 Update 3 with a Supervisor using Tiny control plane VMs fail with an error
After you upgrade vCenter from 7.0 Update 3o to 8.0 Update 3, a Supervisor configured with Tiny control plane VMs cannot upgrade from Kubernetes v1.26.8 to v1.27.5.
You can observe errors similar to the following: Waiting for deployment \"snapshot-validation-deployment\" rollout to finish: 2 out of 3 new replicas have been updated... In addition, 'kubectl get pods' in namespaces starting with the name 'vmware-system-' shows pods in OutOfCpu state.
Workaround: Before the upgrade, scale up the size of control plane VMs from Tiny to Small or above. See Change the Control Plane Size of a Supervisor.
After you upgrade vCenter and your vSphere IaaS control plane environment to vSphere 8.0 U3, attempts to create vSphere pods fail if the Supervisor uses Kubernetes version 1.26
After you upgrade your environment to vSphere 8.0 U3 and upgrade your Supervisor to Kubernetes version 1.26, vSphere Pod creation operations fail while the system attempts to pull the image. You can observe the failed to configure device eth0 with dhcp: getDHCPNetConf failed for interface error even though the cluster is enabled with a static network.
Workaround: Upgrade the vSphere IaaS control plane and Supervisor to Kubernetes version 1.27 or higher.
Occasionally, when you simultaneously attempt to delete a volume snapshot and perform a PVC restore operation, the operations do not complete due to internal dependencies
The problems might occur under the following circumstances. You start a restore operation for a volume snapshot and the restore operation takes longer to complete or gets retried due to different internal reasons. In the meantime, you trigger the deletion of the source snapshot. As a result, both operations collide and remain incomplete. The snapshot deletion keeps failing due to ongoing restore operation for this snapshot, while the restore operation starts failing because the snapshot deletion has been triggered.
Workaround: To resolve this problem, you must delete the restored PVC to stop the restore operation and let the snapshot deletion proceed. In this case, the snapshot data will be lost and cannot be restored because the snapshot gets deleted after you delete the restored PVC.
VM Service VMs cannot use storage classes with WaitForFirstConsumer volume binding mode ending with -latebinding
When you use this type of storage class for VM Service VMs, unpredictable behavior and errors occur.
Workaround: Do not use this type of storage class for VM Service VMs. It can only be used for vSphere Pod workloads.
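To check which storage classes use late binding before assigning them to VM Service VMs, you can inspect the volume binding mode, for example:
kubectl get storageclass
Storage classes whose names end with -latebinding report WaitForFirstConsumer in the VOLUMEBINDINGMODE column; the exact columns shown may vary with your kubectl version.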
A new license model changes the behavior of NSX Container Plugin (NCP) in the VMware vSphere IaaS control plane environment
In the 8.0 Update 3 release, the NSX Distributed Firewall (DFW) license is offered as a separate add-on license. Without this license, NCP in the VMware vSphere IaaS control plane environment cannot adjust DFW rules on NSX, causing unpredictable behavior.
For example, without the DFW license, new security and network policies created for the VMware vSphere IaaS control plane do not function because NCP can't add rules on NSX Manager. Also, resources like pods in a newly created namespace can be reachable from another namespace. In contrast, with the DFW license, a newly created namespace should not be accessible from another namespace by default.
Workaround: The NSX Distributed Firewall (DFW) license is offered as a separate add-on license on NSX Manager. To avoid problems, add the license to NSX Manager.
Velero vSphere Operator installs successfully, but attempts to instantiate an instance of Velero might fail
Velero vSphere Operator deploys its operator pods on the Control Plane VMs, while instances of Velero deploy as vSphere Pods.
The deployment problem might occur when vCenter is upgraded to 8.x and the Supervisor uses VDS networking. However, the ESXi hosts for that Supervisor have not been upgraded and are using an asynchronous version earlier than 8.x. In this scenario, the ESXi hosts cannot deploy vSphere Pods. As a result, while Velero vSphere Operator installs successfully, it fails to instantiate an instance of Velero when the administrator attempts to use it.
Workaround: To make sure that the Velero supervisor service works properly, you must first upgrade ESXi to version 8.x and then upgrade vCenter and the Supervisor to the same 8.x version.
The VM Operator pod is crashing due to insufficient resources at scale
If VirtualMachine resources take a long time to be realized at scale, for example with thousands of VMs, the cause might be the VM Operator pod crashing due to insufficient memory.
Workaround: For the workaround and more information, see the Knowledge Base article 88442.
After you upgrade to vCenter 8.0 Update 2b, the vmon-managed service wcp might be in STOPPED state and vCenter patching might fail
This issue occurs only when you upgrade from vCenter 8.0 Update 1 or newer to vCenter 8.0 Update 2b, and you have at least one Supervisor that is using VDS networking topology and is on Kubernetes 1.24.
Workaround: To avoid this issue, upgrade the Supervisor to Kubernetes 1.25 or newer before upgrading vCenter to 8.0 Update 2b. For more information, contact customer support.
Supervisor operations fail when the size of custom vCenter certificates is greater than 8K
In vSphere 8.0 Update 2b, the maximum key size of a CSR in a vCenter system is reduced from 16384 bits to 8192 bits. When the key size of your certificate is greater than 8192 bits, you might observe unpredictable behavior of Supervisor operations. For example, operations such as Supervisor enablement or upgrade might fail.
Workaround:
Regenerate any vCenter certificate that has a key size greater than 8192 bits.
Identify any certificates with key size greater than 8192 bits.
for store in TRUSTED_ROOTS MACHINE_SSL_CERT vpxd-extension wcp ; do echo $store; /usr/lib/vmware-vmafd/bin/vecs-cli entry list --store "$store" --text | grep Public-Key; done
Replace any certificates whose key size is greater than 8192 bits.
Re-register vCenter in NSX if using NSX.
Restart the WCP service (an example command is shown below).
For more information, see the Knowledge Base article 96562.
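The last step, restarting the WCP service, can typically be performed from the vCenter Server appliance shell with a command along the following lines; if it differs in your environment, refer to the Knowledge Base article above:
vmon-cli --restart wcp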
Templates might get deleted from the Content Library in vCenter when the library is linked to multiple vSphere namespaces
You can observe this issue when a library is linked to multiple namespaces in vSphere IaaS control plane, and the library is either unlinked from a namespace or a namespace is deleted.
Workaround:
Avoid using a Local Library if you want to link the library to multiple namespaces. Instead, create a published library on vCenter and upload all the templates to the published library. Then create a subscribed library that will sync the templates from the published library on the same vCenter and link this subscribed library to multiple namespaces. In this case, even when a library is either unlinked from any namespace or a namespace is deleted, the library items won't be deleted from vCenter.
Errors occur when you use VMware vSphere Automation REST APIs to create or update a VM class and include integer fields in the configSpec
You can observe the following errors when the configSpec includes integer fields, such as numCPU or memoryMB:
Unknown error - vcenter.wcp.vmclass.configspec.invalid: json: unsupported discriminator kind: struct.
Unknown error - vcenter.wcp.vmclass.configspec.invalid: json: cannot unmarshal string into Go struct field VirtualMachineConfigSpec.<fieldName> of type int32.
This problem happens because of a known issue in the VAPI endpoint that parses Raw REST configSpec JSON with integers and passes integers as strings causing failures. The problem occurs only when you use the REST APIs through the VMware Developer Center.
Workaround: Instead of the REST APIs, use DCLI or the vSphere Client to create or update the VM classes.
Alternatively, you can use vSphere Automation SDK for python/java. The following example demonstrates how to use the public protocol transcoder service to convert a VirtualMachineConfigSpec (vim.vm.ConfigSpec) object obtained from vCenter using pyvmomi. The last entry of the example shows how to parse a manually crafted JSON. The object can then be sent to the VirtualMachineClasses API.
After you apply a valid vSphere VMware Foundation license to a Supervisor, the Workload Management page continues to show the old evaluation license with an expiration warning
You can encounter this issue if you have the Supervisor enabled with an evaluation license and apply the vSphere VMware Foundation license to the Supervisor. However, the new license does not appear on the Workload Management page in the vSphere Client under vCenter > Workload Management > Supervisors > Supervisor. The vSphere Client continues to show the warning with the old evaluation license expiration date even though the new license key has been successfully applied.
Workaround: None. The new license is displayed correctly in the vSphere Client under Administration > Licenses > Assets > Supervisors.
Updated security policy rule in the security policy CRD does not take effect if the added or deleted rule is not the last one
As a DevOps engineer, you can configure the security policy CRD to apply an NSX based security policy to a Supervisor Cluster namespace. When you update the CRD and add or delete a security policy rule, the updated rule does not take effect unless it is the last rule in the rule list.
Workaround: Add the rule to the end of the rule list, or use a separate security policy for rule addition or deletion.
Changes in the VM image name format in vSphere 8.0 U2 might cause problems when old VM image names are used
Prior to vSphere 8.0 Update 2, the name of a VM image resource was derived from the name of a Content Library item. For example, if a Content Library item was named photonos-5-x64, then its corresponding VirtualMachineImage resource would also be named photonos-5-x64. This caused problems when library items from different libraries had the same names.
In the vSphere 8.0 Update 2 release, the VM Image Service has been introduced to manage VM images in a self-service manner. See Creating and Managing Content Libraries for Stand-Alone VMs in vSphere IaaS Control Plane.
This new functionality allows content libraries to be associated with a namespace or the entire supervisor cluster, and requires that the names of VM image resources in Kubernetes clusters are unique and deterministic. As a result, to avoid potential conflicts with library item names, the VM images now follow the new naming format vmi-xxx. However, this change might cause issues in the vSphere 8.0 Update 2 release if you use previous VM YAMLs that reference old image names, such as photonos-5-x64 or centos-stream-8.
Workaround:
Use the following commands to retrieve information about VM images:
To display images associated with their namespaces, run kubectl get vmi -n <namespace>.
To fetch images available in the cluster, run kubectl get cvmi.
After obtaining the image resource name listed under the NAME column, update the name in your VM deployment spec.
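For example, a VirtualMachine spec that previously referenced an image by its library item name would be updated to the vmi- style resource name returned by the commands above. The following minimal sketch uses placeholder values for the image, class, and storage class names:
apiVersion: vmoperator.vmware.com/v1alpha1
kind: VirtualMachine
metadata:
  name: my-vm
  namespace: my-namespace
spec:
  imageName: vmi-0123456789abcdef   # previously a name such as photonos-5-x64
  className: best-effort-small
  storageClass: my-storage-class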
During a Supervisor auto-upgrade, the WCP Service process on vSphere might trigger a panic and stop unexpectedly
You can notice a core dump file generated for the WCP Service process.
Workaround: None. The VMON process automatically restarts the WCP Service, and the WCP Service resumes the upgrade with no further problems.
Supervisor upgrade suspends with ErrImagePull and Provider Failed status in vSphere Pods. Persistent volumes attached to vSphere Pods (including Supervisor Services) may not be detached on ErrImagePull failures.
Persistent volumes might not be detached for vSphere Pods failing with ErrImagePull. This can cause a cascade of failed vSphere Pods, because the required persistent volumes are attached to the failed instance. During the Supervisor upgrade, vSphere Pods within the Supervisor might transition to a provider failed state, becoming unresponsive.
Workaround: Delete the instances of failed vSphere Pods that have persistent volumes attached to them. It is important to note that Persistent Volume Claims (PVCs) and persistent volumes that are associated with vSphere Pods can be retained. After completing the upgrade, recreate the vSphere Pods by using the same PodSpec and PVC to maintain data integrity and functionality. To mitigate this issue, create vSphere Pods by using ReplicaSets (DaemonSet, Deployment) to maintain a stable set of replica vSphere Pods running at any given time.
Supervisor upgrade stuck at 50% and Pinniped upgrade fails due to leader election
Pinniped pods are stuck in a pending state during rollout, and the Supervisor upgrade fails during the Pinniped component upgrade.
Workaround:
Run kubectl get pods -n vmware-system-pinniped to check the status of the Pinniped pods.
Run kubectl get leases -n vmware-system-pinniped pinniped-supervisor -o yaml to verify that holderIdentity is not one of the Pinniped pods in pending state.
Run kubectl delete leases -n vmware-system-pinniped pinniped-supervisor to remove the lease object for pinniped-supervisor that has a dead pod as holderIdentity.
Run kubectl get pods -n vmware-system-pinniped again to ensure that all pods under vmware-system-pinniped are up and running.
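As an optional quick check, the following sketch prints only the current lease holder, using the standard Lease field holderIdentity:
kubectl get lease pinniped-supervisor -n vmware-system-pinniped -o jsonpath='{.spec.holderIdentity}'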
An ESXi host node cannot enter maintenance mode with Supervisor control plane VM in powered-on state
In a Supervisor setup with NSX Advanced Load Balancer, an ESXi host fails to enter maintenance mode if an Avi Service Engine VM on the host is in the powered-on state, which impacts ESXi host upgrades and NSX upgrades that require maintenance mode.
Workaround: Power off the Avi Service Engine VM so that the ESXi host can enter maintenance mode.
Cannot receive looped-back traffic using ClusterIP with vSphere Pods on VDS
If an application within a vSphere Pod tries to reach a ClusterIP that is served within the same vSphere Pod (in a different container), the DLB within VSIP is unable to route the traffic back to the vSphere Pod.
Workaround: None.
Network Policies are not enforced in a VDS based Supervisor
An existing service YAML that uses NetworkPolicy does not require any changes, but the NetworkPolicy is not enforced if present in the file.
Workaround: You must set policy-based rules on the networking for VDS. For NetworkPolicy support, NSX networking support is required.
Namespace of a Carvel service might continue to show up in the vSphere Client after you uninstall the service from the Supervisor
If the Carvel service takes over 20 minutes to uninstall from the Supervisor, its namespace might still show up in the vSphere Client after the service is uninstalled.
Attempts to reinstall the service on the same Supervisor fail until the namespace is deleted, and the ns_already_exist_error message shows up during the reinstallation.
You see the following entry in the log file:
/var/log/vmware/wcp/wcpsvc.log should have the message - "Time out for executing post-delete operation for Supervisor Service with serviceID '<service-id>' from cluster '<cluster-id>'. Last error: <error>"
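To confirm the timeout on the vCenter Server appliance, a simple search of the log is usually enough, for example:
grep "post-delete operation for Supervisor Service" /var/log/vmware/wcp/wcpsvc.log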
Workaround: Manually delete the namespace from the vSphere Client.
From the vSphere Client home menu, select Workload Management.
Click the Namespaces tab.
From the list of namespaces, select the uncleared namespace, and click the REMOVE button to delete the namespace.
Upgrading ESXi hosts from 7.0 Update 3 to 8.0 without Supervisor upgrade results in ESXi hosts showing as Not Ready and workloads going offline
When you upgrade the ESXi hosts that are part of a Supervisor from vSphere 7.0 Update 3 to vSphere 8.0 and you do not upgrade the Kubernetes version of the Supervisor, the ESXi hosts show as Not Ready and workloads running on the hosts go offline.
Workaround: Upgrade the Kubernetes version of the Supervisor to at least 1.21, 1.22, or 1.23.
Upon one-click upgrade of vCenter Server, the Supervisor will not be auto-upgraded
If the Kubernetes version of the Supervisor is earlier than 1.22, upon one-click upgrade of vCenter Server to 8.0, the Supervisor cannot auto-upgrade to the minimum supported Kubernetes version for 8.0, which is 1.23.
Workaround: Before upgrading vCenter Server to 8.0, upgrade the Kubernetes version of the Supervisor to 1.22. If you have already upgraded vCenter Server to 8.0, manually upgrade the Kubernetes version of the Supervisor to 1.23.
If you change rules in a security policy custom resource, stale rules might not be deleted
This problem might occur when you update the security policy. For example, you create a security policy custom resource that contains rules A and B and then update the policy changing the rules to B and C. As a result, rule A is not deleted. The NSX Management Plane continues to display rule A in addition to B and C.
Workaround: Delete the security policy and then create the same one.
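A sketch of this workaround with kubectl, assuming the policy is exposed as a securitypolicy resource in the namespace and that my-policy.yaml contains the desired rules (both names are placeholders):
kubectl delete securitypolicy my-policy -n my-namespace
kubectl apply -f my-policy.yaml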
After an upgrade of vCenter Server and vSphere IaaS control plane, a Tanzu Kubernetes Grid cluster cannot complete its upgrade because of volumes that appear as attached to the cluster’s nodes
When the Tanzu Kubernetes Grid cluster fails to upgrade, you can notice a volume that shows up as attached to the cluster’s nodes and does not get cleared. This problem might be caused by an issue in the upstream Kubernetes.
Workaround:
Obtain information about the TKG cluster node that has scheduling disabled by using the following command:
kubectl get node tkc_node_name -o yaml
Example:
# kubectl get node gcm-cluster-antrea-1-workers-pvx5v-5b9c4dbcc7-28p95 -o yaml
apiVersion: v1
kind: Node
metadata:
  annotations:
    cluster.x-k8s.io/cluster-name: gcm-cluster-antrea-1
    cluster.x-k8s.io/cluster-namespace: c1-gcm-ns
    cluster.x-k8s.io/machine: gcm-cluster-antrea-1-workers-pvx5v-5b9c4dbcc7-28p95
    cluster.x-k8s.io/owner-kind: MachineSet
    cluster.x-k8s.io/owner-name: gcm-cluster-antrea-1-workers-pvx5v-5b9c4dbcc7
    csi.volume.kubernetes.io/nodeid: '{"csi.vsphere.vmware.com":"gcm-cluster-antrea-1-workers-pvx5v-5b9c4dbcc7-28p95"}'
    kubeadm.alpha.kubernetes.io/cri-socket: /run/containerd/containerd.sock
    node.alpha.kubernetes.io/ttl: "0"
    volumes.kubernetes.io/controller-managed-attach-detach: "true"
  ...
status:
  ...
  volumesAttached:
  - devicePath: ""
    name: kubernetes.io/csi/csi.vsphere.vmware.com^943bd457-a6cb-4f3d-b1a5-e301e6fa6095-59c73cea-4b75-407d-8207-21a9a25f72fd
Check whether this node has any vSphere CSI volumes listed in the status.volumesAttached property. If there are any volumes, proceed to the next step.
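For example, a quick way to print only that property for the node shown above:
kubectl get node gcm-cluster-antrea-1-workers-pvx5v-5b9c4dbcc7-28p95 -o jsonpath='{.status.volumesAttached}'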
Verify that no pods are running on the node identified in Step 1. Use this command:
kubectl get pod -o wide | grep tkc_node_name
Example:
kubectl get pod -o wide | grep gcm-cluster-antrea-1-workers-pvx5v-5b9c4dbcc7-28p95
Empty output of this command indicates that no pods are running. Proceed to the next step, because the output in Step 1 shows a volume attached to a node that has no pods.
Obtain information about all node objects to make sure that the same volume is attached to another node:
kubectl get node -o yaml
Example:
The same volume name shows up in two TKG cluster node objects.
On old node - "gcm-cluster-antrea-1-workers-pvx5v-5b9c4dbcc7-28p95"
volumesAttached:
- devicePath: ""
name: kubernetes.io/csi/csi.vsphere.vmware.com^943bd457-a6cb-4f3d-b1a5-e301e6fa6095-59c73cea-4b75-407d-8207-21a9a25f72fd
On new node "gcm-cluster-antrea-1-workers-pvx5v-84486fc97-ndh88"
volumesAttached:
- devicePath: ""
name: kubernetes.io/csi/csi.vsphere.vmware.com^943bd457-a6cb-4f3d-b1a5-e301e6fa6095-59c73cea-4b75-407d-8207-21a9a25f72fd
...
volumesInUse:
- kubernetes.io/csi/csi.vsphere.vmware.com^943bd457-a6cb-4f3d-b1a5-e301e6fa6095-59c73cea-4b75-407d-8207-21a9a25f72fd
...
Search for the PV based on the volume handle in the volume name.
In the above example, the volume name is kubernetes.io/csi/csi.vsphere.vmware.com^943bd457-a6cb-4f3d-b1a5-e301e6fa6095-59c73cea-4b75-407d-8207-21a9a25f72fd and the volume handle is 943bd457-a6cb-4f3d-b1a5-e301e6fa6095-59c73cea-4b75-407d-8207-21a9a25f72fd.
Get all the PVs and search for the above volume handle by using this command:
kubectl get pv -o yaml
In the above example, the PV with that volume handle is pvc-59c73cea-4b75-407d-8207-21a9a25f72fd.
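As a shortcut, a jsonpath sketch that prints the PV name for a given CSI volume handle (using the handle from the example above):
kubectl get pv -o jsonpath='{range .items[?(@.spec.csi.volumeHandle=="943bd457-a6cb-4f3d-b1a5-e301e6fa6095-59c73cea-4b75-407d-8207-21a9a25f72fd")]}{.metadata.name}{"\n"}{end}'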
List the VolumeAttachment objects and search for the PV found in the previous step:
kubectl get volumeattachment | grep pv_name
Example:
# kubectl get volumeattachment | grep pvc-59c73cea-4b75-407d-8207-21a9a25f72fd
NAME ATTACHER PV NODE ATTACHED AGE
csi-2ae4c02588d9112d551713e31dba4aab4885c124663ae4fcbb68c632f0f46d3e csi.vsphere.vmware.com pvc-59c73cea-4b75-407d-8207-21a9a25f72fd gcm-cluster-antrea-1-workers-pvx5v-84486fc97-ndh88 true 3d20h
You can observe a volume attachment attached to node gcm-cluster-antrea-1-workers-pvx5v-84486fc97-ndh88 instead of node gcm-cluster-antrea-1-workers-pvx5v-5b9c4dbcc7-28p95. This indicates that the status.volumesAttached in gcm-cluster-antrea-1-workers-pvx5v-5b9c4dbcc7-28p95 is incorrect.
Delete the stale volumesAttached entry from the node object:
kubectl edit node tkc_node_name
Example:
kubectl edit node gcm-cluster-antrea-1-workers-pvx5v-5b9c4dbcc7-28p95
Remove the stale volume entry from status.volumesAttached.
Repeat the above steps for all stale volumes in the status.volumesAttached property.
If the WCPMachine object still exists, manually delete it and wait a few minutes for the cluster to reconcile.
# kubectl get wcpmachines -n c1-gcm-ns
NAMESPACE NAME ZONE PROVIDERID IPADDR
c1-gcm-ns gcm-cluster-antrea-1-workers-jrc58-zn6wl vsphere://423c2281-d1bd-9f91-0e87-b155a9d291a1 192.168.128.17
# kubectl delete wcpmachine gcm-cluster-antrea-1-workers-jrc58-zn6wl -n c1-gcm-ns
wcpmachine.infrastructure.cluster.vmware.com "gcm-cluster-antrea-1-workers-jrc58-zn6wl" deleted
If a vSphere administrator configures a self-service namespace template with resource limits that exceed the cluster capacity, an alert is not triggered.
When vSphere administrators configure resource limits that exceed the cluster capacity, DevOps engineers might use the template to deploy vSphere Pods with high resource requests. As a result, the workloads might fail.
Workaround: None
When you delete a Supervisor namespace that contains a Tanzu Kubernetes Grid cluster, persistent volume claims present in the Supervisor might remain in terminating state
You can observe this issue when a resource conflict occurs while the system deletes the namespace and detaches volumes from the pods in the Tanzu Kubernetes Grid cluster.
The deletion of the Supervisor namespace remains incomplete, and the vSphere Client shows the namespace state as terminating. Persistent volume claims that were attached to pods in the Tanzu Kubernetes Grid cluster also remain in terminating state.
If you run the following commands, you can see the Operation cannot be fulfilled on persistentvolumeclaims error:
kubectl vsphere login --server=IP-ADDRESS --vsphere-username USERNAME
kubectl logs vsphere-csi-controller-pod-name -n vmware-system-csi -c vsphere-syncer
failed to update PersistentVolumeClaim: \\\"<pvc-name>\\\" on namespace: \\\"<supervisor-namespace>\\\". Error: Operation cannot be fulfilled on persistentvolumeclaims \\\"<pvc-name>\\\": the object has been modified; please apply your changes to the latest version and try again\
Workaround:
Use the following commands to fix the issue:
kubectl vsphere login --server=IP-ADDRESS --vsphere-username USERNAME
kubectl patch pvc <pvc-name> -n <supervisor-namespace> -p '{"metadata":{"finalizers":[]}}' --type=merge
When deleting multiple FCDs and volumes from shared datastores such as vSAN, you might notice changes in performance
The performance changes can be caused by a fixed issue. While unfixed, the issue caused stale FCDs and volumes to remain in the datastore after an unsuccessful FCD delete operation.
Workaround: None. The delete operation works as usual despite the change in the performance.
If a DevOps user starts volume operations or stateful application deployments while vCenter Server reboots, the operations might fail
The problem occurs because the workload storage management user account gets locked, and the vSphere CSI plug-in that runs on the Supervisor fails to authenticate. As a result, the volume operations fail with InvalidLogin errors.
The log file /var/log/vmware/vpxd/vpxd.log displays the following message:
Authentication failed: The account of the user trying to authenticate is locked. :: The account of the user trying to authenticate is locked. :: User account locked: {Name: workload_storage_management-<id>, Domain: <domain>})
Workaround:
Obtain the account unlock time.
In the vSphere Client, navigate to Administration, and click Configuration under Single Sign On.
Click the Local Accounts tab.
Under Lockout Policy, get the Unlock time in seconds.
Authenticate with the Supervisor using the vSphere Plugin for kubectl.
kubectl vsphere login --server IP-ADDRESS --vsphere-username USERNAME
Note down the original replica count for the vsphere-csi-controller deployment in the vmware-system-csi namespace.
kubectl get deployment vsphere-csi-controller -n vmware-system-csi -o=jsonpath='{.spec.replicas}'
original-replica-count
Scale down the vsphere-csi-controller deployment replica count to zero.
kubectl scale --replicas=0 deployment vsphere-csi-controller -n vmware-system-csi
Wait for the number of seconds indicated under Unlock time.
Scale up the vsphere-csi-controller deployment replica count to the original value.
kubectl scale --replicas=original-replica-count deployment vsphere-csi-controller -n vmware-system-csi
When you upgrade your vSphere IaaS control plane 7.0.x environment to vSphere 8.0.x, any TKG clusters of v1.27.6 become incompatible
vSphere 8.0.x doesn't support TKR 1.27.6.
As a result, the TKG clusters of v1.27.6 become incompatible after you upgrade vSphere IaaS control plane 7.0.x to vSphere 8.0.x. You will not receive any pre-check warnings during the Supervisor upgrade.
Workaround:
If you have any running TKGS clusters of v1.27.6 on vSphere 7.0.x, do not upgrade to vCenter 8.0.x, especially, vCenter 8.0 Update 2b. For more information, see VMware Tanzu Kubernetes releases Release Notes and KB article 96501.
If you plan to upgrade your vSphere 7.x environment to vCenter 8.x, do not upgrade to TKR 1.27.6.
TKG cluster worker nodes fail to power on with the error log VIF restore activity already completed for attachment ID from the nsx-ncp pod
TKG cluster worker nodes fail to power on with the following error:
nsx-container-ncp Generic error occurred during realizing network for VirtualNetworkInterface
NCP logs an error:
VIF restore activity already completed for attachment ID
When a VM of a TKG cluster node that is created after an NSX backup migrates with vMotion immediately after the NSX restore, NCP cannot restore the port for the VM, because the attachment ID is reused during the vMotion and blocks NCP's restore request.
Workaround:
Go to NSX Manager and find the segment ports that need to be deleted; their display names are in the format <vm name>.vmx@<attachment id>.
Before deleting the newly created port, find the host where the VM is running and turn off the ops-agent by running /etc/init.d/nsx-opsagent stop on the host.
Delete the port by using the NSX API https://<nsx-mgr>/api/v1/logical-ports/<logical switch port id>?detach=true (see the curl sketch after these steps).
Turn on the ops-agent by running /etc/init.d/nsx-opsagent start on the host.
Wait until NCP restores the port.
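A sketch of the port deletion call in step 3 with curl; the credentials and the port ID are placeholders that you collect from NSX Manager:
curl -k -u 'admin:<password>' -X DELETE 'https://<nsx-mgr>/api/v1/logical-ports/<logical-switch-port-id>?detach=true'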
Pods, PVs, and PVCs in a TKG cluster may be stuck in a Terminating state during TKG cluster clean-up or during recovery from ESXi host downtime
As part of the normal TKG cluster delete and cleanup process, all of the deployments, statefulsets, PVCs, PVs, and similar objects are deleted. However, for TKG clusters based on TKR v1.24 and lower, some of the PVs may be stuck in a Terminating state due to DetachVolume errors. The issue occurs when DetachVolume errors on a VolumeAttachment object cause the finalizers on the VolumeAttachment to not be removed, resulting in failure to delete related objects. This scenario can also occur if there is downtime in the ESXi hosts.
Workaround: Run the following command in the TKG cluster to remove the finalizers from volumeattachments with a detachError:
kubectl get volumeattachments \
-o=custom-columns='NAME:.metadata.name,UUID:.metadata.uid,NODE:.spec.nodeName,ERROR:.status.detachError' \
--no-headers | grep -vE '<none>$' | awk '{print $1}' | \
xargs -n1 kubectl patch -p '{"metadata":{"finalizers":[]}}' --type=merge volumeattachments
TKG cluster unreachable after backup and restore
If a vSphere Namespace is created after an NSX backup and configured with a customized ingress/egress CIDR, once NSX is restored from a backup, NCP fails to complete the restore process, and TKG clusters on the vSphere Namespace are not available. NCP fails during the restore process with an error such as the following:
NSX IP Pool ipp_<supervisor_namespace_id>_ingress not found in store
Workaround: If a backup of the Supervisor was taken around the same time as the NSX backup, but before the affected vSphere Namespace was created, restore the Supervisor from that backup as well. Alternatively, delete the vSphere Namespace and associated TKG clusters, wait for NCP to resync, and then re-create the deleted resources.
TKG cluster unreachable after NSX backup and restore
When a Supervisor is configured with a customized ingress CIDR, after an NSX backup and restore, TKG clusters may become unavailable because the NCP component is unable to properly validate the TKG clusters' ingress VIP.
Workaround: By using the NSX API, manually configure VIPs for TKG clusters in NSX to restore access.
Existing Tanzu Kubernetes Grid clusters configured with a proxy server cannot be upgraded to a vSphere 8 Supervisor
FIXED: This known issue is fixed in the vSphere 8 with Tanzu MP1 release.
If you have configured an existing Tanzu Kubernetes Grid cluster with a proxy server, you cannot upgrade that cluster from a vSphere 7 Supervisor to Tanzu Kubernetes Grid 2.0 on vSphere 8 Supervisor.
Workaround: The contents of the noProxy field conflict with upgrade checks. Because this field is required when the proxy stanza is added to the cluster spec, you must remove the proxy configuration in its entirety before upgrading to vSphere 8.
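As an illustration only, a merge-patch sketch that removes the proxy stanza from a v1alpha2 TanzuKubernetesCluster; the cluster and namespace names are placeholders, and you should verify that your cluster spec uses the spec.settings.network.proxy path before applying it:
kubectl patch tkc my-cluster -n my-namespace --type=merge -p '{"spec":{"settings":{"network":{"proxy":null}}}}'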
The antrea-resource-init pod hangs in Pending state
After upgrading the Tanzu Kubernetes release version of a Tanzu Kubernetes Grid cluster, the antrea-resource-init pod might be in Pending state.
Workaround: Restart the pod on the Supervisor.
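For example, assuming the pod runs in the vSphere Namespace that contains the cluster, deleting the pod lets its controller re-create it:
kubectl get pods -n my-vsphere-namespace | grep antrea-resource-init
kubectl delete pod <antrea-resource-init-pod-name> -n my-vsphere-namespace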
Tanzu Kubernetes Grid clusters v1.21.6 might enter a FALSE state and some machines might not migrate
After upgrading to vCenter Server 8 and updating the Supervisor, v1.21.6 Tanzu Kubernetes Grid clusters might enter into a FALSE state and some wcpmachines might not migrate to vspheremachines.
Workaround: none
By default the password for the vmware-system-user account expires in 60 days for TKG clusters running TKR version v1.23.8+vmware.2-tkg.2-zshippable
As part of STIG hardening, for TKG clusters running TKR version v1.23.8+vmware.2-tkg.2-zshippable, the vmware-system-user account is configured to expire in 60 days. This can impact users who are using the vmware-system-user account to SSH to cluster nodes.
Refer to the following knowledgebase article to update the vmware-system-user password expiry, allowing SSH sessions to TKG cluster nodes if required: https://kb.vmware.com/s/article/90469
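For example, you can inspect the current expiry settings on a cluster node with the standard chage utility before applying the steps in the article:
sudo chage -l vmware-system-user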
tanzu-capabilities-controller-manager pod is continually restarting and going to CLBO on TKC in vSphere IaaS control plane 8.0.0a
As a result of service account permission issues, the tanzu-capabilities-controller-manager pod on the TKG cluster is stuck in CLBO (crash loop back off) when using TKR version v1.23.8+vmware.2-tkg.2-zshippable.
Workaround: Add the required permissions to the capabilities service account tanzu-capabilities-manager-sa on the TKC.
Pause the reconciliation of the capabilities package on the TKC:
kubectl patch -n vmware-system-tkg pkgi tkc-capabilities -p '{"spec":{"paused": true}}' --type=merge
Create a new file capabilities-rbac-patch.yaml:
apiVersion: v1
kind: Secret
metadata:
  name: tanzu-capabilities-update-rbac
  namespace: vmware-system-tkg
stringData:
  patch-capabilities-rbac.yaml: |
    #@ load("@ytt:overlay", "overlay")
    #@overlay/match by=overlay.subset({"kind":"ClusterRole", "metadata": {"name": "tanzu-capabilities-manager-clusterrole"}}),expects="1+"
    ---
    rules:
      - apiGroups:
          - core.tanzu.vmware.com
        resources:
          - capabilities
        verbs:
          - create
          - delete
          - get
          - list
          - patch
          - update
          - watch
      - apiGroups:
          - core.tanzu.vmware.com
        resources:
          - capabilities/status
        verbs:
          - get
          - patch
          - update
Patch the capabilities cluster role on the TKC. Replace tkc in tkc-capabilities with the name of the Tanzu Kubernetes Grid cluster:
kubectl patch -n vmware-system-tkg pkgi tkc-capabilities -p '{"metadata":{"annotations":{"ext.packaging.carvel.dev/ytt-paths-from-secret-name.0":"tanzu-capabilities-update-rbac"}}}' --type=merge
Delete the tanzu-capabilities-controller-manager pod on the TKC.
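A sketch of this last step; the pod name is a placeholder, and the vmware-system-tkg namespace is an assumption, so verify it first:
kubectl get pods -A | grep tanzu-capabilities-controller-manager
kubectl delete pod <tanzu-capabilities-controller-manager-pod-name> -n vmware-system-tkg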
Tanzu Kubernetes Grid 2.0 clusters on a Supervisor deployed on three vSphere Zones do not support VMs with vGPU and instance storage.
Tanzu Kubernetes Grid 2.0 clusters provisioned on a Supervisor instance deployed across three vSphere Zones do not support VMs with vGPU and instance storage.
Workaround: none
TKR version v1.22.9 is listed in the content library image but not in kubectl command
The content library for TKR images lists TKR v1.22.9. However, the command kubectl get tkr does not list this image as available, because TKR v1.22.9 is not available for use and should not be used. This image appears in the content library in error.
Workaround: Use a TKR other than TKR v1.22.9. Refer to the TKR Release Notes for a list of the available TKRs.
Unable to provision a TKC using the v1alpha1 API and a v1.23.8 TKR in vSphere IaaS control plane 8.0.0a
When you use the TKC v1alpha1 API to provision a TKC with version v1.23.8, the request fails with the error "unable to find a compatible full version matching version hint "1.23.8" and default OS labels: "os-arch=amd64,os-name=photon,os-type=linux,os-version=3.0"".
Workaround: Switch to the TKC v1alpha2 or v1alpha3 API when provisioning TKCs.
Tanzu Kubernetes Grid 2.0 clusters provisioned with the v1beta1 API must be based on the default ClusterClass
If you are creating a Tanzu Kubernetes Grid 2.0 cluster on Supervisor by using the v1beta1 API, the cluster must be based on the default tanzukubernetescluster ClusterClass. The system does not reconcile a cluster based on a different ClusterClass.
Workaround: Starting with the vSphere 8 U1 release, you can provision a v1beta1 cluster based on a custom ClusterClass. Refer to the KB article https://kb.vmware.com/s/article/91826.
In an NSX Advanced Load Balancer setup, there is no usage.ingressCIDRUsage section in the clusternetworkinfo or namespacenetworkinfo output
In an NSX Advanced Load Balancer setup, the ingress IP is allocated by the Avi Controller, so the ingressCIDR usage is not displayed in the clusternetworkinfo or namespacenetworkinfo output.
Workaround: Get the ingressCIDR usage from the Avi Controller UI at Applications > VS VIPs.
Pod CIDR on tier-0 prefix list is not removed after namespace deletion for a routed Supervisor
In a routed Supervisor, the pod CIDR in a tier-0 prefix list does not get deleted after namespace deletion.
Workaround: Delete the prefix-lists object:
curl -k -u 'admin:U2.HzP7QZ9Aw' -X PATCH -d '{"prefixes":[{"network" : "10.246.0.0/16","le" : 28,"action" : "PERMIT"}]}' https://<IP ADDRESS>/policy/api/v1/infra/tier-0s/ContainerT0/prefix-lists/pl_domain-c9:45c1ce8d-d2c1-43cb-904f-c526abd8fffe_deny_t1_subnets -H 'X-Allow-Overwrite: true' -H 'Content-type: application/json'
Kubernetes resources clusternetworkinfo and namespacenetworkinfo do not contain usage.ingressCIDRUsage when using NSX Advanced Load Balancer
When you use NSX Advanced Load Balancer in an NSX-based Supervisor, the clusternetworkinfo and namespacenetworkinfo Kubernetes resources no longer contain the usage.ingressCIDRUsage fields. This means that the output of kubectl get clusternetworkinfo <supervisor-cluster-name> -o json or kubectl get namespacenetworkinfo <namespace-name> -n <namespace-name> -o json does not contain the ingressCIDR usage object.
Workaround: Use the Avi Controller UI page to access the ingressCIDR usage.
Stale tier-1 segments exist for some namespaces after NSX backup and restore
After an NSX backup and restore procedure, stale tier-1 segments that have Service Engine NICs do not get cleaned up.
When a namespace is deleted after an NSX backup, the restore operation restores stale tier-1 segments that are associated with the NSX Advanced Load Balancer Controller Service Engine NICs.
Workaround: Manually delete the tier-1 segments.
Log in to the NSX Manager.
Select Networking > Segments.
Find the stale segments that are associated with the deleted namespace.
Delete the stale Service Engine NICs from the Ports/Interfaces section.
Load Balancer monitor might stop working, the Supervisor might get stuck in "configuring" state in the vSphere Client
If NSX Advanced Load Balancer is enabled, NCP might fail to pull the load balancer status due to the presence of multiple enforcement points in NSX. This affects existing Supervisors configured with NSX Advanced Load Balancer once it is enabled on NSX. The issue does not affect new Supervisors leveraging NSX Advanced Load Balancer. Supervisors impacted by this issue remain functional, with the exception of the LB monitor capability.
Workaround: Disable NSX Advanced Load Balancer in NSX. This may limit the ability to deploy Supervisors with NSX Advanced Load Balancer in WCP environments that have existing Supervisors running NSX Advanced Load Balancer.
You cannot use NSX Advanced Load Balancer with a vCenter Server using an Embedded Linked Mode topology.
When you configure the NSX Advanced Load Balancer controller, you can configure it on multiple clouds. However, you do not have an option to select multiple clouds while enabling vSphere IaaS control plane as it only supports the Default-Cloud option. As a result, you cannot use the NSX Advanced Load Balancer with a vCenter Server version using an Embedded Linked Mode topology.
Workaround: Configure the NSX Load Balancer for each vCenter Server.
Attempts to expand a persistent volume fail when the file system on the volume is corrupted
You might observe the following symptoms:
The file system on the volume has not been resized after the volume size has been expanded.
Pod remains in the pending state.
Error messages appear when you describe the pod.
Workaround:
The issue might be due to potential problems on the volume file system. For additional details and a workaround, see the Broadcom Knowledge Base article 374828.
Multiple CNS volume sync errors are observed in an environment where a datastore is shared across vCenter systems
Cross-vCenter migration is not supported by CNS. However, CNS periodic synchronization is automatically performed and creates locking contention for volumes on the shared datastore.
Workaround: To avoid this issue, set up a large time interval for the CNS periodic synchronization.
Locate the CNS config file in vCenter:
/usr/lib/vmware-vsan/VsanVcMgmtConfig.xml
Navigate to the following line:
<newSyncInterval>60</newSyncInterval>
By default, the periodic sync is set to 60 seconds.
Change the time to a longer period, for example, 31536000 for 1 year.
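For example, a sketch of making that change from the vCenter Server appliance shell; back up the file first, and if the change does not take effect, restart vsan-health with the same command these notes use elsewhere for this file:
cp /usr/lib/vmware-vsan/VsanVcMgmtConfig.xml /usr/lib/vmware-vsan/VsanVcMgmtConfig.xml.bak
sed -i 's|<newSyncInterval>60</newSyncInterval>|<newSyncInterval>31536000</newSyncInterval>|' /usr/lib/vmware-vsan/VsanVcMgmtConfig.xml
vmon-cli -r vsan-health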
The volume allocation type cannot be changed for the disks in a vSAN Direct datastore
Once you decide the volume allocation type for the disks in the vSAN Direct datastore, you cannot change it. This is because the underlying layers do not support the type conversion. However, the volume allocation type change for the new disk is allowed for operations such as clone and relocate.
Workaround: none
Deleted VM causes CNS tasks to get stuck in queued state.
Operations sent to CNS return a task ID, but the task state never changes from queued. The tasks are for volumes attached to a VM that has just been deleted.
Workaround: If the application layer can fix the calling order, nothing needs to be done on the CNS side. If not, disable the new CNS serialization by following these steps:
Open /usr/lib/vmware-vsan/VsanVcMgmtConfig.xml on vCenter Server.
Change the value of newSerializationEnabled from true to false, so that the line reads: <newSerializationEnabled>false</newSerializationEnabled>
Restart vsan-health with vmon-cli -r vsan-health
See KB 93903 for more details.
PVs remain in terminated state after successfully deleting PVCs
After you delete a PersistentVolumeClaim (PVC), the corresponding PersistentVolume (PV) might remain in a terminated state in Supervisor. Additionally, the vSphere Client might display multiple failed deleteVolume tasks.
Workaround:
Authenticate with the Supervisor:
kubectl vsphere login --server=IP-ADDRESS --vsphere-username USERNAME
Get the name of the persistent volume in terminating state:
kubectl get pv
Note down the volume handle from the persistent volume:
kubectl describe pv <pv-name>
Using the volume handle from the previous step, delete the CnsVolumeOperationRequest custom resource in the Supervisor:
kubectl delete cnsvolumeoperationrequest delete-<volume-handle>
Before deleting a PV, ensure that it is not being used by any other resources in the cluster.