VMware Cloud Director Container Service Extension 4.2.1 | 21 MAR 2024 | Build 23512589

Check for additions and updates to these release notes.
The release notes cover the following topics.
New version of the getting-started-airgapped container: You can use version v0.1.3 of the getting-started-airgapped container to configure the private registry for air-gapped functionality. For more information, see Create an Airgapped Environment.
New version of the cluster-upgrade-script-airgapped container: You can use version v0.1.3 of the cluster-upgrade-script-airgapped container to update the Kubernetes components in a pre-existing cluster to the recommended versions. For more information, see Upgrade Kubernetes Components in VMware Cloud Director Container Service Extension Clusters.
Additional support for Tanzu Kubernetes Grid and Kubernetes: You can use Tanzu Kubernetes Grid 2.5 with Kubernetes versions 1.26, 1.27, and 1.28.
New VMware Cloud Director IP spaces feature: As a service provider you must add the rights to use the feature to the Kubernetes Cluster Author role and the Kubernetes Clusters Rights Bundle. For more information, see Kubernetes Cluster Author Role and Kubernetes Clusters Rights Bundle. Additionally, you must use Kubernetes Cloud Provider for VMware Cloud Director 1.6 and Kubernetes Cluster API Provider for VMware Cloud Director v1.3.0.
Support for Online and Offline Volume Expansion: With Kubernetes Container Storage Interface Driver for VMware Cloud Director 1.6.0, you can resize disks by performing online and offline volume expansion. To activate this feature, by using the Kubernetes Cluster Author role, set the allowVolumeExpansion field in the StorageClass to true. For more information, see the Kubernetes documentation.
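For example, the field can be enabled on an existing StorageClass with kubectl. This is a minimal sketch; the StorageClass name vcd-storage is a placeholder, so substitute the name used in your cluster.

```shell
# Enable volume expansion on an existing StorageClass.
# "vcd-storage" is a placeholder; substitute your StorageClass name.
kubectl patch storageclass vcd-storage \
  --type merge \
  -p '{"allowVolumeExpansion": true}'
```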
VMware Cloud Director 10.4 will not be supported in VMware Cloud Director Container Service Extension 4.3.
The Auto Repair on Errors toggle is deprecated and will not be supported in VMware Cloud Director Container Service Extension 4.3.
Updated - Tanzu Kubernetes Grid versions 1.6.1, 1.5.4, and 1.4.3 are not supported
Tanzu Kubernetes Grid versions 1.6.1, 1.5.4, and 1.4.3 are no longer supported by VMware in VMware Cloud Director Container Service Extension 4.2. For more information on the end of this support, see the Product Lifecycle Matrix.
New cluster deployments by using unsupported versions fail.
Existing Tanzu Kubernetes Grid clusters must be upgraded by service providers or tenant users to version 2.1.1, 2.2, 2.3.1, or 2.4 and supported Kubernetes versions.
Kubernetes Container Clusters UI plug-in 4.2.1 is available to use with VMware Cloud Director
Before upgrading your VMware Cloud Director Container Service Extension server, you must first upgrade the Kubernetes Container Clusters UI plug-in. To upgrade the plug-in from version 4.2 to 4.2.1, perform the following tasks. For more information, see Managing Plug-Ins.
From the Support Portal, download version 4.2.1 of the Kubernetes Container Clusters UI plug-in.
In the VMware Cloud Director Portal, from the top navigation bar, select More > Customize Portal.
Select the check box next to Kubernetes Container Clusters UI plug-in 4.2 and click Disable.
Click Upload and in the Upload Plugin wizard, upload the Kubernetes Container Clusters UI plug-in 4.2.1 file.
To start using the new plug-in, refresh your browser.
VMware Cloud Director Container Service Extension Server 4.2.1 is available
As a service provider, you can upgrade the VMware Cloud Director Container Service Extension Server to version 4.2.1.
From the Support Portal, download VMware Cloud Director Container Service Extension Server 4.2.1.
In the Kubernetes Container Clusters UI plug-in of VMware Cloud Director, select CSE Management > Server Details > Update Server.
Update your VMware Cloud Director Container Service Extension Server.
To upgrade from 4.1.1a to 4.2.1, see Minor Version Upgrade.
To upgrade from 4.2 to 4.2.1, see Patch Version Upgrade.
New - VMware Cloud Director Container Service Extension 4.2.1 Interoperability Updates with Kubernetes Resources
To view the interoperability of VMware Cloud Director Container Service Extension 4.2.1 and previous versions with VMware Cloud Director, and additional product interoperability, see the Product Interoperability Matrix.
The following table displays the interoperability between VMware Cloud Director Container Service Extension 4.2.1 and Kubernetes resources.
| Kubernetes Resources | Supported Versions | Documentation |
| --- | --- | --- |
| Kubernetes Cloud Provider for VMware Cloud Director™ | 1.6.0 | Kubernetes Cloud Provider for VMware Cloud Director Documentation |
| Kubernetes Container Storage Interface Driver for VMware Cloud Director™ | 1.6.0 | |
| Kubernetes Cluster API Provider for VMware Cloud Director™ | v1.3.0 | https://github.com/vmware/cluster-api-provider-cloud-director |
| RDE Projector | 0.7.0 | Not applicable |
As a service provider, you can manually update Kubernetes resources by performing the following tasks.
In VMware Cloud Director UI, from the top navigation bar, select More > Kubernetes Container Clusters.
In Kubernetes Container Clusters UI plug-in 4.2.1, select CSE Management > Server Details > Update Server > Update Configuration > Next.
In the Current CSE Server Components section, update the Kubernetes resources configuration.
Click Submit Changes.
For more information, see Update the VMware Cloud Director Container Service Extension Server documentation.
Updated - After you install or upgrade to VMware Cloud Director Container Service Extension 4.2.1 by using the Kubernetes Container Clusters UI plug-in, components of the VMware Cloud Director Container Service Extension server configuration are updated automatically
The following component versions are used in VMware Cloud Director Container Service Extension 4.2.1.
kind: v0.20.0
clusterctl: v1.5.4
core capi: v1.5.4
bootstrap provider: v1.5.4
control plane provider: v1.5.4
kindest: v1.27.3
cert manager: v1.13.2
Updated - In VMware Cloud Director Container Service Extension, you must confirm that the Kubernetes components in a cluster have the required versions
When tenant users attempt certain workflows in the Kubernetes Container Clusters UI plug-in, the following message might be displayed: Confirm that the components in this cluster have the required versions.
You must verify that the relevant Kubernetes component versions listed on the Cluster Information page of the Kubernetes Container Clusters UI plug-in match the supported versions in the table above.
If the component versions match, ignore the message.
If the component versions do not match, follow the instructions in Upgrade Kubernetes Components in VMware Cloud Director Container Service Extension Clusters.
For clusters that were created by using older versions of VMware Cloud Director Container Service Extension, perform a one-time script upgrade action. This allows the clusters to be compatible with the latest VMware Cloud Director Container Service Extension.
To access the full set of product documentation, see VMware Cloud Director Container Service Extension.
VMware Cloud Director Container Service Extension 4.2 Release Notes
VMware Cloud Director Container Service Extension 4.1.1a Release Notes
VMware Cloud Director Container Service Extension 4.1 Release Notes
VMware Cloud Director Container Service Extension 4.0.4 Release Notes
VMware Cloud Director Container Service Extension 4.0.3 Release Notes
VMware Cloud Director Container Service Extension 4.0.2 Release Notes
VMware Cloud Director Container Service Extension 4.0.1 Release Notes
VMware Cloud Director Container Service Extension 4.0 Release Notes
New - After creating a cluster with Node Health Check activated or activating Node Health Check for an existing cluster, the cluster cannot be managed
The Kubernetes Container Clusters UI plug-in 4.2 creates the MachineHealthCheck CAPI YAML object with an invalid apiVersion value. Clusters that contain an invalid MachineHealthCheck section cannot be managed. This issue is fixed in the VMware Cloud Director Container Service Extension 4.2.1 release. You can create new clusters with Node Health Check activated. For existing clusters affected by this issue, toggle Node Health Check on or off to fix the value of the MachineHealthCheck CAPI YAML object.
New - Error value is displayed on the cluster list page and on the Cluster Information page
When you select a cluster in the cluster list datagrid, you might see an Error value in the Upgrade column. Additionally, on the Cluster Information page, the Upgrade Availability property displays Error. This issue is fixed in the VMware Cloud Director Container Service Extension 4.2.1 release. vApp templates that are invalid or corrupted are not acknowledged, and the upgrade availability value does not error when encountering such vApp templates.
New - Kubernetes templates datagrid fails to load with an error message
If the Kubernetes templates datagrid fails to load, the following error message is displayed: Error: Failed to fetch Kubernetes Templates. This issue is fixed in the VMware Cloud Director Container Service Extension 4.2.1 release. vApp templates that are invalid or corrupted are not acknowledged, and the Kubernetes templates datagrid does not error when encountering such vApp templates.
For API users, the cluster list page displays an error message if the node pool names in the cluster's capi yaml are changed after cluster creation
The cluster list datagrid fails to load, and displays the following error message: Error: Failed to fetch Kubernetes clusters. This issue is fixed in the VMware Cloud Director Container Service Extension 4.2 release.
For API users, the cluster list page displays an error message if the control plane node pool name does not end in -control-plane-node-pool
The cluster list datagrid fails to load and displays the following error message: Error: Failed to fetch Kubernetes clusters. This error means that API users must name their control plane node pool in a format that ends with -control-plane-node-pool. This issue is fixed in the VMware Cloud Director Container Service Extension 4.2 release.
In the Kubernetes Container Clusters UI plug-in, new worker node pools are created using parameters that were not specified in the Create New Worker Node Pools menu
Unspecified values in the Create New Worker Node Pools menu might result in the node pool using the parameters of other node pools. This issue affects the following parameters:
vGPU Activated
Sizing Policy
Placement Policy or vGPU Policy (if GPU toggle is activated)
Storage Profile
Disk Size
This issue is fixed in the VMware Cloud Director Container Service Extension 4.2 release.
In the Kubernetes Container Clusters UI plug-in, the Kubernetes Version drop-down menu in the cluster creation wizard for a TKGs cluster displays a spinner indefinitely
On the Kubernetes Policy page in the cluster creation wizard, the Kubernetes Version drop-down selection displays a spinner indefinitely. When the supported Kubernetes version API sends an invalid response to the Kubernetes Container Clusters UI plug-in, the plug-in fails to parse the response. This issue is fixed in the VMware Cloud Director Container Service Extension 4.2 release.
The CSE Management workflow in a multi-site VMware Cloud Director setup only allows for a single server configuration entity
In Kubernetes Container Clusters UI plug-in, the CSE Management workflow in a multi-site VMware Cloud Director setup may display a server configuration entity that belongs to a different site. This results in the CSE Management workflows failing in multi-site environments. This issue is fixed in the VMware Cloud Director Container Service Extension 4.1.1a release. The CSE Management workflow in Kubernetes Container Clusters UI plug-in now only fetches the server configuration entity that belongs to the site where the user is currently logged-in. This fix allows each site in a multi-site environment to create, and manage its own server configuration entity.
In VMware Cloud Director Service Provider Portal, if a service provider navigates into a specific cluster and returns to the landing page, the cluster list does not display all clusters
Service providers can view a full list of clusters that are in an environment in VMware Cloud Director Service Provider portal. From this view, if a service provider clicks in to a cluster to view details of that cluster, navigates to the Persistent Volumes tab, and clicks back to return to the listing of all clusters, then the original list of all the clusters is not visible. Some clusters do not display on the list. This issue is fixed in the VMware Cloud Director Container Service Extension 4.1.1a release.
The Kubernetes Container Clusters UI plug-in does not display the Container Registry setting in Server Details page
In Kubernetes Container Clusters UI plug-in, in the CSE Management tab, the Server Details tab does not display the Container Registry setting. This issue is fixed in the VMware Cloud Director Container Service Extension 4.1.1a release.
The Projector version is not updated in RDE after Projector deployment is upgraded in the cluster
The version of the Projector component is not updated in the cluster RDE's projector status section, even though the version in the Projector deployment changes. This issue is fixed in the VMware Cloud Director Container Service Extension 4.1.1a release.
vApp creation failure events do not display an error message
During cluster creation, if the create vApp operation fails, the Event Details tab for that cluster in the Kubernetes Container Clusters UI plug-in does not show the full error message. This issue is fixed in the VMware Cloud Director Container Service Extension 4.1.1a release. The error message is now included in the Detailed Error section, which aids in troubleshooting.
Tanzu Standard Repository is not installed with clusters that are created in VMware Cloud Director Container Service Extension 4.1 with Tanzu Kubernetes Grid 2.1.1 and 2.2
For clusters that were created using VMware Cloud Director Container Service Extension 4.1, it is necessary to install the repository and packages. This issue is fixed in the VMware Cloud Director Container Service Extension 4.1.1a release.
When a cluster is created by using Kubernetes Container Clusters UI plug-in 4.0, upgrading the cluster by using the Kubernetes Container Clusters UI plug-in 4.1 might fail
When you use Kubernetes Container Clusters UI plug-in 4.1 to upgrade a cluster that was created in Kubernetes Container Clusters UI plug-in 4.0, even though the cluster-upgrade-script executes successfully without any errors, the following error message is displayed in the Events tab.
Error: PatchObjectError error message: [VCDCluster.infrastructure.cluster.x-k8s.io "<cluster-name>" is invalid: spec.loadBalancerConfigSpec: Invalid value: "null": spec.loadBalancerConfigSpec in body must be of type object: "null"] during patching objects
This issue is fixed in the VMware Cloud Director Container Service Extension 4.1.1a release.
If the Auto Repair on Errors feature is activated on a provisioned cluster, the cluster might enter an Error state
When Auto Repair on Errors is activated on a cluster that enters an Error state, the cluster might be deleted and recreated, which causes disruption of workloads on that cluster. When you create clusters, it is recommended to deactivate the Auto Repair on Errors toggle. For more information on Auto Repair on Errors in the cluster creation workflow, see Create a VMware Tanzu Kubernetes Grid Cluster. This issue is fixed in the VMware Cloud Director Container Service Extension 4.1.1a release. The VMware Cloud Director Container Service Extension server deactivates the Auto Repair on Errors toggle after a cluster is created successfully.
Updated - When new organizations are added to VMware Cloud Director, the VMware Cloud Director Container Service Extension server may fail to provide access to VMware Cloud Director Container Service Extension configuration to the new organizations
When this issue occurs, the following error message appears in the VMware Cloud Director Container Service Extension log file.
"msg":"error occurred while onboarding new tenants with ReadOnly ACLs for VCDKEConfig: [unable to get all orgs: [error occurred retrieving list of organizations: [error getting list of organizations: 401 Unauthorized]]]"
Tenant users may also see the following warning message and be blocked when they try to create a cluster using VMware Cloud Director Container Service Extension UI.
Cannot fetch provider configuration. Please contact your administrator
This issue is fixed in the VMware Cloud Director Container Service Extension 4.1.1a release.
For each cluster, repeated messages of Invoked getFullEntity (urn:vcloud:entity:vmware:capvcdCluster:{ID}) display in the Recent Tasks pane of the VMware Cloud Director UI
This issue occurs because VMware Cloud Director Container Service Extension retrieves the RDE for all clusters instead of only for unprocessed clusters. This issue is fixed in the VMware Cloud Director Container Service Extension 4.1.1a release.
GPU operator installation fails on the vGPU node in the cluster
Kubernetes Cluster API Provider for VMware Cloud Director previously installed the nvidia-container-runtime package and edited the containerd runtime to point to nvidia-container-runtime during the provisioning of a vGPU node in the VMware Cloud Director Container Service Extension Kubernetes cluster. This customization interfered with NVIDIA GPU Operator installation. This issue is fixed in the Kubernetes Cluster API Provider for VMware Cloud Director v1.1.1 release, which is compatible with VMware Cloud Director Container Service Extension 4.1.1a and later. With Kubernetes Cluster API Provider for VMware Cloud Director v1.1.1, the solution removes the customization in favor of GPU Operator installation. For more information, see the Kubernetes Cluster API Provider for VMware Cloud Director v1.1.1 release notes.
Alternatively, if you manually install drivers on the vGPU node, perform the following customization tasks.
Install the nvidia-container-runtime package.
Update the containerd configuration to point to the NVIDIA runtime.
Clusters created by using a Kubernetes Cluster API Provider for VMware Cloud Director (CAPVCD) management cluster, without involvement of the VMware Cloud Director Container Service Extension server, display a Pending status in the Kubernetes Container Clusters UI
This issue is fixed in the VMware Cloud Director Container Service Extension 4.1 release. The status of these clusters is now Non-CSE. The only permitted operation for these non-VMware Cloud Director Container Service Extension clusters is to download the kube config of the cluster.
The Kubernetes Container Clusters UI plug-in storage profile selection form fields do not filter storage policies by entityType
The storage profile selection form fields display all storage profiles visible to the logged-in user, such as VMs, vApps, Catalog items, or named disks. This issue is fixed in the VMware Cloud Director Container Service Extension 4.1 release. In the Kubernetes Container Clusters UI plug-in in VMware Cloud Director Container Service Extension 4.1, the storage policy selection only shows storage policies that support any of the following entityTypes:
vApp and VM templates
Virtual machines
Kubernetes cluster resize operation fails in VMware Cloud Director Container Service Extension 4.0.x
If users attempt to change organization VDC names in VMware Cloud Director after clusters are created, further cluster operations such as cluster resize can fail. This issue is fixed in the VMware Cloud Director Container Service Extension 4.1 release.
When a node of the cluster is deleted due to failure in vSphere or other underlying infrastructure, VMware Cloud Director Container Service Extension does not inform the user and it does not auto-heal the cluster
When the node of a cluster is deleted, basic cluster operations, such as cluster resize and cluster upgrade, continue to work. The deleted node remains in deleted state, and is included in computations regarding size of the cluster. The Node Health Check Configuration feature addresses this occurrence and this issue is fixed in the VMware Cloud Director Container Service Extension 4.1 release.
The cluster creation for a multi-control plane or multi-worker node cluster goes into an error state. The Events tab in the cluster details page shows an EphemeralVMError event due to the failure to delete the ephemeral VM in VMware Cloud Director.
The same error events can appear repeatedly if the Auto Repair on Errors setting is activated on the cluster. If the Auto Repair on Errors setting is off, the cluster can sometimes show an error state due to the failure to delete the ephemeral VM in VMware Cloud Director, even though the control plane and worker nodes are created successfully. This issue is visible in any release and patch release after, but not including, VMware Cloud Director 10.3.3.3, and any release and patch release starting with VMware Cloud Director 10.4.1. This issue is fixed in the VMware Cloud Director Container Service Extension 4.1 release.
When a force delete attempt of a cluster fails, the ForceDeleteError that displays in the Events tab of the cluster info page does not provide sufficient information regarding the failure to delete the cluster
This issue is fixed in the VMware Cloud Director Container Service Extension 4.1 release.
New - When using VMware Cloud Director Container Service Extension to create a cluster, the operation might fail
If you are using 172.17.0.0/16 and 172.18.0.0/16 CIDR ranges or IP addresses from these ranges in your external network pool, the creation of a cluster might fail after the first control plane VM is created, and you might observe guestinfo.cloudinit.target.cluster.get.kubeconfig.status phase failures in the Events tab of the Kubernetes Container Clusters UI plug-in.
The ephemeral VMs that leverage Docker use the same CIDR ranges during the creation of the bootstrap cluster. As a result of the IP conflict, communication between the components of the bootstrap cluster and the control plane VM is affected, which causes the cluster creation to fail.
Workaround: Ensure that you are not using 172.17.0.0/16 and 172.18.0.0/16 CIDR ranges or IP addresses from these ranges in the following network assets.
Organization VDC network ranges where your TKG clusters are deployed.
External IP allocations and ranges that are used by the Organization Edge Gateway and the associated Load Balancer.
Infrastructure networks where your DNS servers are connected.
The IP address, which the VMware Cloud Director public API endpoint URL resolves to.
New - When you resize a disk volume by using online expansion in the Kubernetes Container Storage Interface Driver for VMware Cloud Director solution, the operation might fail
When attempting an online expansion of a volume on a named disk that is fast and thin provisioned and attached to a VM, if the name of the storage profile differs between the StorageClass and the VM, the csi-resizer container in the csi-vcd-controllerplugin pod might display the following error message. This is a known issue in VMware Cloud Director version 10.5.1.1 and earlier.
API Error: 400: [ ddedf59a-8efe-418f-9417-b4ce6aad2883 ] Cannot use multiple storage profiles in a fast-provisioned VDC "tenant_org_name" for VM "cluster-worker-node-pool-name".]
Workaround: Before performing online volume expansion, verify that the storage profile name for the VM is the same as the one specified in the StorageClass.
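After the storage profile names match, an online expansion is requested by raising the storage request on the PersistentVolumeClaim. The sketch below is illustrative only; the PVC name data-pvc and the namespace are placeholders.

```shell
# Request an online expansion to 20Gi on an attached PVC.
# "data-pvc" and "default" are placeholders; substitute your PVC name and namespace.
kubectl patch pvc data-pvc -n default \
  --type merge \
  -p '{"spec":{"resources":{"requests":{"storage":"20Gi"}}}}'

# Watch the PVC until the new capacity is reported.
kubectl get pvc data-pvc -n default -w
```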
New - After a cluster upgrade, the Kubernetes Container Storage Interface Driver for VMware Cloud Director solution does not run as expected
After running version v0.1.3 of the cluster-upgrade-script-airgapped container to upgrade a cluster, the images of the Kubernetes Container Storage Interface Driver for VMware Cloud Director solution are updated, but one of the two nodeplugin pods is in an error state, such as CrashLoopBackoff or Error.
Workaround: To recreate the nodeplugin pods, replace and update the DaemonSet by running the following command.
kubectl replace --force -f "https://raw.githubusercontent.com/vmware/cloud-director-named-disk-csi-driver/1.6.0/manifests/csi-node.yaml"
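After the DaemonSet is replaced, you can confirm that the recreated node plug-in pods reach the Running state. The namespace and label selector below are assumptions and might differ in your deployment.

```shell
# Watch the recreated nodeplugin pods until they are Running.
# The namespace and label are assumptions; adjust them to your deployment.
kubectl get pods -n kube-system -l app=csi-vcd-nodeplugin -w
```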
New - In VMware Cloud Director Container Service Extension 4.2.1, if you force delete clusters that are configured to use IP Spaces, the IP addresses allocated to the cluster or to Kubernetes services running on the cluster are not released automatically, and manual intervention is necessary.
Workaround: None
The Create a Tanzu Kubernetes Grid Cluster and Create New Worker Node Pools workflows might fail
When both sizing policy and vGPU policy are specified and the vGPU policy already contains sizing information, the workflows cannot be completed successfully.
Workaround: If you select a vGPU policy that already contains sizing information during Create a Tanzu Kubernetes Grid Cluster workflow, or Create New Worker Node Pools workflow, do not also select a sizing policy.
VMware Cloud Director services fail continuously after startup
When a resolve operation is invoked on an RDE that has many tasks associated with it, VMware Cloud Director crashes with the java.lang.OutOfMemoryError: Java heap space error message. The issue is present on VMware Cloud Director 10.4 and later. For more information, see VMware Knowledge Base article 95464.
Workaround: None
Registry URL changes in VMware Cloud Director Container Service Extension configuration are not supported
Workaround: Use load balancers to front registry virtual machines to swap the virtual machines out if necessary.
If you use VMware Cloud Director 10.4.2.2, the cluster deletion workflow in Kubernetes Container Clusters UI plug-in might fail
The cluster deletion operation fails with the following error message.
"error": "failed to delete VCD Resource [clusterName] of type [VApp] from VCDResourceSet of RDE [urn:vcloud:entity:vmware:capvcdCluster:<uuid>]: [failed to update capvcd status for RDE [urn:vcloud:entity:vmware:capvcdCluster:<uuid>]; expected http response [200], obtained [400]: resp: [\"{\\\"minorErrorCode\\\":\\\"BAD_REQUEST\\\",\\\"message\\\":\\\"[ a8e89bd2-195d-458b-808d-3ff81e074fa0 ] RDE_CANNOT_VALIDATE_AGAINST_SCHEMA [ #/status/capvcd/vcdResourceSet/2: expected type: JSONObject, found: Null\\\\n ]\\\",\\\"stackTrace\\\":null}\"]: [400 Bad Request]]"
This is a known issue in VMware Cloud Director 10.4.2.2. For more information, see VMware Cloud Director 10.4.2.2 Known Issues.
Workaround: Delete the cluster by using the Force Delete workflow.
In Kubernetes Container Clusters UI plug-in, the CSE Management upgrade workflows might add or remove rights from the CSE Admin Role or the Kubernetes Cluster Author role
If required rights are missing, users might face errors during cluster workflows.
Workaround: Manually update the Custom roles that are cloned from CSE Admin Role or Kubernetes Cluster Author role.
VMware Cloud Director Container Service Extension does not automatically install a Tanzu-standard repository in the Tanzu Kubernetes Grid 2.1.1 and 2.2 clusters
Workaround: Perform one of the following tasks.
When using VMware Cloud Director Container Service Extension 4.1, manually install the repository and packages. For more information, see the VMware Tanzu Kubernetes Grid 2.1 and VMware Tanzu Kubernetes Grid 2.2 documentation.
Upgrade to VMware Cloud Director Container Service Extension 4.1.1a. In this version, clusters created with Tanzu Kubernetes Grid 2.1.1 and 2.2 automatically have the Tanzu-standard repository installed.
Updated - When tenants attempt certain actions with VMware Cloud Director Container Service Extension, the following error messages might be displayed
Warnings:
Cannot fetch provider configuration. Please contact your administrator.
Tenant users may see this warning, and be blocked when they try to create a cluster.
Node Health Check settings have not been configured by your provider.
Tenant users may see this warning when they try to activate Node Health Check during cluster creation or in the cluster settings.
These warnings can occur for the following reasons:
The VMware Cloud Director Container Service Extension server has not finished starting up.
The VMware Cloud Director Container Service Extension server has not yet published the server configuration to tenant organizations. The server configuration is published automatically every hour from server startup while the server is running. Therefore, publishing to new tenant organizations that are created during the hourly window occurs at the end of the hour.
The tenant user's role does not have the following right: View: VMWARE:VCDKECONFIG. This right was added to the Kubernetes Cluster Author global role in VMware Cloud Director Container Service Extension 4.1.
There was an unexpected error while fetching the server configuration.
Workaround: Perform the following tasks.
Ensure that the VMware Cloud Director Container Service Extension server is operating successfully.
Ensure the tenant user's role has the View: VMWARE:VCDKECONFIG right. Tenant users must log out of VMware Cloud Director and log back in to activate any changes made to their role.
Wait for hourly publishing to new organizations.
In some instances, nodes cannot join clusters even when the cluster is in an available state
This issue can occur intermittently and the following error message appears in the Events tab of the cluster info page in Kubernetes Container Clusters UI.
VcdMachineScriptExecutionError with the following details: script failed with status [x] and reason [Date Time 1 /root/node.sh: exit [x]]
Workaround: VMware Cloud Director Container Service Extension 4.1 adds a retry mechanism that uses a retry feature from Cluster API, which reduces the occurrence of this issue.
VMware Cloud Director Container Service Extension 4.1 does not support Dynamic Host Configuration Protocol (DHCP)
The cluster creation workflow in VMware Cloud Director Container Service Extension 4.1 fails if the cluster is connected to a routed organization VDC network that uses DHCP instead of static IP pool to distribute IPs to virtual machines.
Workaround: VMware Cloud Director Container Service Extension 4.1 only supports organization VDC networks that meet the following requirements.
The network is routed.
The network uses a static IP pool to distribute IPs to the virtual machines that are connected to it.
It is not possible to activate GPU support in an air-gapped cluster
As VMware cannot redistribute NVIDIA packages, it is not possible to activate GPU support in an air-gapped cluster out of the box. The failure occurs when the cluster attempts to download the NVIDIA binary from nvidia.github.io in the cloud initialization script.
Workaround: As a service provider, you can consider allowing the cluster access to nvidia.github.io by using a proxy server.
The audit_trail table grows rapidly in the VMware Cloud Director database due to RDE modify events being too large
RDE modify events log the whole body of the RDE that has changed. These large events cause the audit_trail table to grow larger than necessary.
Workaround: Upgrade to VMware Cloud Director 10.3.3.4 or later and perform one of the following tasks.
If you are using VMware Cloud Director 10.3.3.4, set the audit.rde.diffOnly configuration property to True.
If you are using VMware Cloud Director 10.4.0 or later, no changes in the configuration properties are required.
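On VMware Cloud Director 10.3.3.4, the property can be set on a cell with the cell-management-tool. The sketch below assumes a default appliance installation path; adjust it to your environment.

```shell
# Set audit.rde.diffOnly so RDE modify events log only the changed fields.
# The path assumes a default VMware Cloud Director cell installation.
/opt/vmware/vcloud-director/bin/cell-management-tool manage-config \
  -n audit.rde.diffOnly -v true
```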
VMware Cloud Director Container Service Extension 4.1 uses Kubernetes Cluster API Provider for VMware Cloud Director 1.1 and Kubernetes Cloud Provider for VMware Cloud Director 1.4 by default
Kubernetes Cluster API Provider for VMware Cloud Director 1.1 and Kubernetes Cloud Provider for VMware Cloud Director 1.4 do not support IP spaces.
Workaround: None
Tanzu Addons-Manager does not appear after upgrading to Tanzu Kubernetes Grid 2.2.0 with Kubernetes v1.24+
After you upgrade a VMware Cloud Director Container Service Extension 4.0.3 cluster from Tanzu Kubernetes Grid 1.6.1 with Kubernetes v1.23.x to Tanzu Kubernetes Grid 2.2 with Kubernetes v1.24.x, the tanzu-addons-controller-manager pod is stuck in the PENDING or CrashLoopBackOff state for the following reason:
Error from server (NotFound): packageinstalls.packaging.carvel.dev "addons-manager.tanzu.vmware.com" not found
Workaround: Manually delete the tanzu-addons-controller-manager deployment and PackageInstall object.
Delete the deployment by running the following commands.
kubectl get deployments -A
kubectl delete deployment -n tkg-system tanzu-addons-controller-manager
Delete the PackageInstall object by running the following commands.
kubectl get packageinstall -A
kubectl delete packageinstall -n tkg-system tanzu-addons-manager
When a cluster creation process finishes, the API request to delete the ephemeral VM might fail
An ephemeral VM is created during the cluster creation process and is deleted by VMware Cloud Director Container Service Extension when the process is complete. VMware Cloud Director Container Service Extension re-attempts to delete the ephemeral VM for up to 15 minutes. If VMware Cloud Director Container Service Extension fails to delete the ephemeral VM after reattempting, the ephemeral VM remains in the cluster's vApp. In the Events tab of the cluster info page in the Kubernetes Container Clusters UI plug-in, the EphemeralVMError error message appears with the following details.
error deleting Ephemeral VM [EPHEMERAL-TEMP-VM] in vApp [cluster-vapp-name]: [reason for failure]. The Epemeral VM needs to be cleaned up manually.
The reason for failure depends on the stage at which the ephemeral VM deletion failed.
Workaround: In the VMware Cloud Director UI, delete the ephemeral VM from the cluster's vApp.
Log in to the VMware Cloud Director Tenant Portal, and from VMware Cloud Director navigation menu, select Data Centers.
In the Virtual Data Center page, select the organization tile, and from the left navigation menu, select vApps.
In the vApps page, select the vApp of the cluster.
In the cluster information page, click the ellipsis to the left of the Ephemeral VM, and click Delete.
If the ephemeral VM is not manually cleaned up when a delete request is issued, the cluster delete operation fails. It is then necessary to force delete the cluster.
Log in to VMware Cloud Director, and from the top navigation bar, select More > Kubernetes Container Clusters.
Select a cluster, and in the cluster information page, click Delete.
In the Delete Cluster page, select the Force Delete checkbox, and click Delete.
When using a direct organization VDC network with NSX in VMware Cloud Director, creating clusters in VMware Cloud Director Container Service Extension 4.1 is not possible
VMware Cloud Director Container Service Extension 4.1 clusters do not support this configuration.
Workaround: None
In VMware Cloud Director Container Service Extension, the creation of Tanzu Kubernetes Grid clusters can fail due to a script execution error
In the Events tab of the cluster info page in the Kubernetes Container Clusters UI plug-in, the ScriptExecutionTimeout error message appears with the following details.
error while bootstrapping the machine [cluster-name/EPHEMERAL_TEMP_VM]; timeout for post customization phase [phase name of script execution]
Workaround: To re-attempt cluster creation, activate Auto Repair on Errors from cluster settings.
Log in to VMware Cloud Director, and from the top navigation bar, select More > Kubernetes Container Clusters.
Select a cluster, and in the cluster information page, click Settings, and activate the Auto Repair on Errors toggle.
Click Save.
If you are troubleshooting issues related to cluster creation, deactivate the Auto Repair on Errors toggle.
In Kubernetes Container Clusters UI plug-in, when the cluster status is Error, the cluster delete operation might fail
Workaround: To delete a cluster in Error status, you must force delete the cluster.
Log in to VMware Cloud Director and from the top navigation bar, select More > Kubernetes Container Clusters.
Select a cluster and in the cluster information page, click Delete.
In the Delete Cluster page, select the Force Delete checkbox and click Delete.
Cluster creation fails with an error message
The ERROR: failed to create cluster: failed to pull image error message is displayed in the following scenarios.
When a user attempts to create a Tanzu Kubernetes Grid cluster by using VMware Cloud Director Container Service Extension 4.1, and the creation fails intermittently.
An image pull error due to an HTTP 408 response is reported.
This issue can occur if there is difficulty reaching the Internet from the EPHEMERAL_TEMP_VM to pull the required images.
Potential causes:
Slow or intermittent Internet connectivity.
The network IP Pool cannot resolve DNS (docker pull error).
The network MTU behind a firewall must be set lower.
Workaround: Ensure that there are no networking connectivity issues stopping the EPHEMERAL_TEMP_VM from reaching the Internet. For more information, refer to https://kb.vmware.com/s/article/90326.
Users may encounter authorization errors when executing cluster operations in Kubernetes Container Clusters UI plug-in if a Legacy Rights Bundle exists for their organization.
Workaround: Perform the following tasks.
After you upgrade VMware Cloud Director from version 9.1 or earlier, the system may create a Legacy Rights Bundle for each organization. This Legacy Rights Bundle includes the rights that are available in the associated organization at the time of the upgrade and is published only to this organization. To begin using the rights bundles model for an existing organization, you must delete the corresponding Legacy Rights Bundle. For more information, see Managing Rights and Roles.
In the Administration tab in the service provider portal, you can delete Legacy Rights Bundles. For more information, see Delete a Rights Bundle. The Kubernetes Container Clusters UI plug-in CSE Management has a server setup process that automatically creates and publishes the Kubernetes Clusters Rights Bundle to all tenants. The rights bundle contains all rights that are involved in Kubernetes cluster management in VMware Cloud Director Container Service Extension 4.0.
After selecting the purpose of policy modification, the policies selection in VMware Cloud Director Container Service Extension 4 plug-in does not populate the full list
When a user selects a sizing policy in the Kubernetes Container Clusters UI plug-in and wants to change it, the drop-down menu displays only the selected sizing policy and does not automatically load alternative sizing policies. The user has to delete the text manually for the alternative sizing policies to appear. This also occurs in the drop-down menus for placement policies and storage policies.
Workaround: None. This is intentional and typical behavior of the combobox HTML web component in Clarity, the web framework that the VMware Cloud Director UI is built on. The drop-down box uses the input text as a filter. When the input field is empty, all selections are visible, and the selections are filtered as you type.
When you create a VMware Cloud Director Container Service Extension cluster, a character capitalization error appears
In the Kubernetes Container Clusters UI, if you use capital letters, the following error message appears.
Name must start with a letter, end with an alphanumeric, and only contain alphanumeric or hyphen (-) characters. (Max 63 characters)
Workaround: None. This is a restriction set by Kubernetes, where object names are validated under RFC 1035 labels. For more information, see the Kubernetes documentation.
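The naming restriction above follows the RFC 1035 label rules that Kubernetes applies to object names: start with a lowercase letter, end with a lowercase alphanumeric, contain only lowercase alphanumerics or hyphens, and stay within 63 characters. A minimal sketch of that validation, as a hypothetical helper that is not part of the product:

```python
import re

# RFC 1035 label: starts with a lowercase letter, ends with a lowercase
# alphanumeric, and contains only lowercase alphanumerics or hyphens.
LABEL_RE = re.compile(r"^[a-z]([-a-z0-9]*[a-z0-9])?$")

def is_valid_cluster_name(name: str) -> bool:
    """Check a cluster name against the RFC 1035 label rules (max 63 chars)."""
    return len(name) <= 63 and bool(LABEL_RE.match(name))
```

For example, `my-cluster-01` passes, while a name with capital letters such as `MyCluster` is rejected, which matches the error message shown in the UI.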
Kubernetes Container Clusters UI plug-in 4.1 does not interoperate with other versions of the Kubernetes Container Clusters UI plug-in, such as 4.0 or 3.5.0
The ability to operate these two plug-ins simultaneously without conflict is a known limitation of the VMware Cloud Director UI. You can only have one plug-in activated at any given time.
Workaround: None.
VMware Cloud Director Container Service Extension fails to deploy clusters with TKG templates that have an unmodifiable placement policy set on them
Workaround: Perform the following tasks.
Log in to the VMware Cloud Director Tenant Portal as an administrator.
Click Libraries > vApp Templates.
In the vApp Templates window, select the radio button to the left of the template.
In the top ribbon, click Tag with Compute Policies.
Select the Modifiable check boxes, and click Tag.
In VMware Cloud Director 10.4, service providers cannot log in to the virtual machine of VMware Cloud Director Container Service Extension
In VMware Cloud Director 10.4, after deploying the VMware Cloud Director Container Service Extension virtual machine from OVA file, the following two check boxes in the VM settings page are not selected by default.
Allow local administrator password
Auto-generate password
Workaround: To allow service providers to log in to the virtual machine of VMware Cloud Director Container Service Extension and perform troubleshooting tasks, select the Allow local administrator password and Auto-generate password check boxes.
Log in to VMware Cloud Director UI as a service provider and create a vApp from the VMware Cloud Director Container Service Extension OVA file.
For more information, see Create a vApp from VMware Cloud Director Container Service Extension server OVA file.
After you deploy the vApp, and before you power it on, browse to VM details > Guest OS Customization and select Allow local administrator password and Auto-generate password.
After the update task finishes, power on the vApp.
Fast provisioned disks in an organization VDC cannot be resized
Workaround: To resize disks, deactivate fast provisioning in the organization VDC.
Log in to VMware Cloud Director UI as a provider, and select Resources.
In the Cloud Resources tab, select Organization VDCs, and select an organization VDC.
In the organization VDC window, under Policies, select Storage.
Click Edit, and deactivate the Fast provisioning toggle.
Click Save.
After you log in as a service provider and upload the latest Kubernetes Container Clusters UI plug-in, the CSE Management tab is not displayed
If there are multiple activated Kubernetes Container Clusters UI plug-ins with the same name or ID but different versions, the lowest version of the plug-in is used. Only the highest version of the Kubernetes Container Clusters UI plug-in must be active. For more information on managing plug-ins, see Managing Plug-Ins.
Workaround: Deactivate the previous Kubernetes Container Clusters UI plug-ins.
Log in to VMware Cloud Director UI as a provider, and select More > Customize Portal.
Select the check box next to the names of the target plug-ins, and click Enable or Disable.
To start using the newly activated plug-in, refresh the Internet browser page.
Resize or upgrade a Tanzu Kubernetes Grid cluster by using kubectl
After a cluster is created in the Kubernetes Container Clusters UI plug-in, you can resize, upgrade, and lifecycle manage the cluster, or manage workloads, by using kubectl instead of the Kubernetes Container Clusters UI plug-in.
To delete the RDE-Projector operator from the cluster, run kubectl delete deployment -n rdeprojector-system rdeprojector-controller-manager.
Detach the Tanzu Kubernetes Grid cluster from Kubernetes Container Clusters UI plug-in.
In the VMware Cloud Director UI, in the Cluster Overview page, retrieve the cluster ID of the cluster.
Update the RDE and set the entity.spec.vcdKe.isVCDKECluster value to false.
To get the payload of the cluster, run GET https://<vcd>/cloudapi/1.0.0/entities/<Cluster ID>.
Copy and update the JSON path in the payload.
Set the entity.spec.vcdKe.isVCDKECluster value to false.
Run PUT https://<vcd>/cloudapi/1.0.0/entities/<Cluster ID> with the modified payload.
It is necessary to include the entire payload as the body of the PUT operation.
After you perform the preceding tasks, the cluster is detached from VMware Cloud Director Container Service Extension 4.1, and you can no longer manage the cluster through VMware Cloud Director Container Service Extension 4.1. You must use kubectl to manage, resize, or upgrade the cluster by directly applying the Cluster API specification (CAPI YAML).
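The RDE update in the detach procedure can be sketched as follows. The payload edit itself is a pure function; the commented HTTP calls are assumptions based on the GET and PUT endpoints shown in the procedure (depending on your VMware Cloud Director version, the PUT may also require an If-Match header carrying the ETag returned by the GET).

```python
import copy

def detach_payload(rde: dict) -> dict:
    """Return a copy of the full RDE payload with
    entity.spec.vcdKe.isVCDKECluster set to False, as required to detach
    the cluster from Container Service Extension. The entire payload must
    be sent back as the body of the PUT operation."""
    updated = copy.deepcopy(rde)
    updated["entity"]["spec"]["vcdKe"]["isVCDKECluster"] = False
    return updated

# Hypothetical usage with the endpoints from the procedure above,
# where `vcd`, `cluster_id`, and `auth` are supplied by you:
#   rde = requests.get(
#       f"https://{vcd}/cloudapi/1.0.0/entities/{cluster_id}",
#       headers=auth).json()
#   requests.put(
#       f"https://{vcd}/cloudapi/1.0.0/entities/{cluster_id}",
#       json=detach_payload(rde), headers=auth)
```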