VMware Cloud Director Container Service Extension 4.2 | January 18, 2024 | Build 23133984

Check for additions and updates to these release notes.
A new version, v0.1.2, of the getting-started-airgapped container is available to configure the private registry for airgapped functionality in VMware Cloud Director Container Service Extension 4.2. For more information, see Set Up a Local Container Registry in an Airgapped Environment.
A new version, v0.1.2, of the cluster-upgrade-script-airgapped container is available. You can use it to update the Kubernetes components in pre-existing clusters to the recommended versions. For more information, see Upgrade Kubernetes Components in VMware Cloud Director Container Service Extension Clusters.
Compatibility with VMware Cloud Director Extension for VMware Tanzu Mission Control 1.0. For more information, see VMware Cloud Director Extension for VMware Tanzu Mission Control.
Additional Tanzu Kubernetes Grid and Kubernetes support: It is now possible to use the following Tanzu Kubernetes Grid and Kubernetes versions with VMware Cloud Director Container Service Extension 4.2:
Tanzu Kubernetes Grid 2.3.1 with Kubernetes 1.24, 1.25, and 1.26.
Tanzu Kubernetes Grid 2.4 with Kubernetes 1.25, 1.26, and 1.27.
The Auto Repair on Errors toggle is deprecated, and will be unsupported starting with the next major VMware Cloud Director Container Service Extension release.
The VMware Cloud Director IP Spaces feature is available for use with VMware Cloud Director Container Service Extension 4.2. It is the responsibility of the service provider to add the rights for this new feature to the Kubernetes Cluster Author role and the Kubernetes Clusters Rights Bundle. For more information, see Kubernetes Cluster Author Role and Kubernetes Clusters Rights Bundle. To use this feature successfully, you must use Kubernetes Cloud Provider for VMware Cloud Director 1.6 and Kubernetes Cluster API Provider for VMware Cloud Director v1.3.0.
Online and offline volume expansion is now supported on disks provisioned by Kubernetes Container Storage Interface Driver for VMware Cloud Director 1.6.0. The feature must be activated manually by the Kubernetes Cluster Author, who updates the storageClass with the allowVolumeExpansion flag activated, as described here.
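As a hedged illustration of this setting, a StorageClass that permits volume expansion carries allowVolumeExpansion: true. The class name and storage profile below are placeholders, not values from this release:

```yaml
# Illustrative sketch only: the name and storageProfile are placeholders.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: vcd-disk-expandable            # hypothetical class name
provisioner: named-disk.csi.cloud-director.vmware.com
allowVolumeExpansion: true             # enables online/offline volume expansion
parameters:
  storageProfile: "*"                  # hypothetical storage profile
reclaimPolicy: Delete
```

A PersistentVolumeClaim bound to such a class can then be expanded by increasing its spec.resources.requests.storage value.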
To access the full set of product documentation, go to VMware Cloud Director Container Service Extension.
Kubernetes Container Clusters UI Plug-in 4.2 for VMware Cloud Director
A new version of Kubernetes Container Clusters UI plug-in is now available to use with VMware Cloud Director.
It is necessary to upgrade the Kubernetes Container Clusters UI plug-in before you upgrade the VMware Cloud Director Container Service Extension server.
The following steps outline how to upgrade the Kubernetes Container Clusters UI plug-in from 4.1.0 or 4.1.1 to 4.2:
Download the Kubernetes Container Clusters UI plug-in 4.2 from the VMware Cloud Director Container Service Extension Downloads page.
In the VMware Cloud Director Portal, from the top navigation bar, select More > Customize Portal.
Select the check box next to Kubernetes Container Clusters UI plug-in 4.1.0 or 4.1.1, and click Disable.
Click Upload > Select plugin file, and upload the Kubernetes Container Clusters UI plug-in 4.2 file.
Refresh the browser to start using the new plug-in.
For more information, refer to Managing Plug-Ins.
VMware Cloud Director Container Service Extension Server 4.2
Service providers can now upgrade the VMware Cloud Director Container Service Extension Server to 4.2 through CSE Management > Server Details > Update Server in Kubernetes Container Clusters UI plug-in of VMware Cloud Director using the Minor Version Upgrade workflow.
For instructions on how to upgrade the VMware Cloud Director Container Service Extension Server from 4.1.0 or 4.1.1a to 4.2, see Minor Version Upgrade.
You can download VMware Cloud Director Container Service Extension Server 4.2 from the VMware Cloud Director Container Service Extension Downloads page.
VMware Cloud Director Container Service Extension 4.2 Interoperability Updates with Kubernetes Resources
To view the interoperability of VMware Cloud Director Container Service Extension 4.2 and previous versions with VMware Cloud Director, and additional product interoperability, refer to the Product Interoperability Matrix.
The following table displays the interoperability between VMware Cloud Director Container Service Extension 4.2, and Kubernetes resources.
| Kubernetes Resources | Supported Versions | Documentation |
| --- | --- | --- |
| Kubernetes Cloud Provider for VMware Cloud Director™ | 1.5.0 | Kubernetes Cloud Provider for VMware Cloud Director Documentation |
| Kubernetes Container Storage Interface Driver for VMware Cloud Director™ | 1.5.0 | |
| Kubernetes Cluster API Provider for VMware Cloud Director™ | v1.2.0 | https://github.com/vmware/cluster-api-provider-cloud-director |
| RDE Projector | 0.7.0 | Not applicable |
Service providers can manually update Kubernetes resources through the following workflow:
In VMware Cloud Director UI, from the top navigation bar, select More > Kubernetes Container Clusters.
In Kubernetes Container Clusters UI plug-in 4.2, select CSE Management > Server Details > Update Server > Update Configuration > Next.
In the Current CSE Server Components section, update the Kubernetes resources configuration.
Click Submit Changes.
For more information, see Update the VMware Cloud Director Container Service Extension Server.
Updated - In VMware Cloud Director Container Service Extension, it is necessary to confirm that the Kubernetes components in a cluster have the required versions.
For clusters that were created using older versions of VMware Cloud Director Container Service Extension, it is necessary to perform a one-time script upgrade. This makes the clusters compatible with the latest VMware Cloud Director Container Service Extension.
The Kubernetes Container Clusters UI warns users about this requirement with the following warning when tenant users attempt certain workflows in the UI:
Confirm that the components in this cluster have the required versions.
You can ignore this UI warning if the relevant Kubernetes component versions on the Cluster Information page in Kubernetes Container Clusters UI match the supported versions displayed in the table above. If this warning appears and the current versions of Kubernetes components in the cluster do not match the available versions, follow the instructions in Upgrade Kubernetes Components in VMware Cloud Director Container Service Extension Clusters. Do not continue with the workflow you are currently in.
Updated - VMware Cloud Director Container Service Extension 4.2 does not support Tanzu Kubernetes Grid versions 1.6.1, 1.5.4 and 1.4.3.
Tanzu Kubernetes Grid versions 1.6.1, 1.5.4 and 1.4.3 are no longer supported by VMware. For more information on the end of this support, see https://lifecycle.vmware.com/#/.
It is necessary for service providers and tenant users to upgrade pre-existing Tanzu Kubernetes Grid 1.6.1, 1.5.4 and 1.4.3 clusters to Tanzu Kubernetes Grid versions 2.1.1, 2.2, 2.3.1 or 2.4 and supported Kubernetes versions. New cluster deployment attempts from VMware Cloud Director Container Service Extension 4.2 using unsupported Tanzu Kubernetes Grid versions 1.6.1, 1.5.4 and 1.4.3 will fail.
New - For API users, the cluster list page displays an error message if the node pool names in the cluster's capi yaml are changed after cluster creation.
The cluster list datagrid fails to load, and displays the following error message:
Error: Failed to fetch Kubernetes clusters
This issue is fixed for VMware Cloud Director Container Service Extension 4.2.
New - For API users, the cluster list page displays an error message if the control plane node pool name does not end in -control-plane-node-pool.
The cluster list datagrid fails to load, and displays the following error message:
Error: Failed to fetch Kubernetes clusters
This error means that API users must name their control plane node pool in a format that ends with -control-plane-node-pool.
This issue is fixed for VMware Cloud Director Container Service Extension 4.2.
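As a hedged sketch of the required naming (the cluster name, namespace, and surrounding fields are placeholders, not taken from this release), the control plane object in the cluster's capi yaml carries a name with the mandated suffix:

```yaml
# Illustrative sketch only: "demo-cluster" is a placeholder cluster name.
apiVersion: controlplane.cluster.x-k8s.io/v1beta1
kind: KubeadmControlPlane
metadata:
  name: demo-cluster-control-plane-node-pool   # must end in -control-plane-node-pool
  namespace: demo-cluster-ns
```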
New - In the Kubernetes Container Clusters UI plugin, new worker node pools are created using parameters that were not specified in the Create New Worker Node Pools menu.
Unspecified values in the Create New Worker Node Pools menu may result in the node pool using the parameters of other node pools. This issue affects the following parameters:
vGPU Activated
Sizing Policy
Placement Policy or vGPU Policy (if GPU toggle is activated)
Storage Profile
Disk Size
This issue is fixed for VMware Cloud Director Container Service Extension 4.2.
New - In the Kubernetes Container Clusters UI plugin, the Kubernetes Version dropdown menu in the cluster creation wizard for a TKGs cluster displays a spinner indefinitely.
On the Kubernetes Policy page in the cluster creation wizard, the Kubernetes Version dropdown selection displays a spinner indefinitely.
This occurs because the supported Kubernetes version API sends an invalid response to the Kubernetes Container Clusters UI plug-in, which the plug-in fails to parse.
This issue is fixed for VMware Cloud Director Container Service Extension 4.2.
GPU operator installation fails on the vGPU node in the cluster.
The Kubernetes Cluster API Provider for VMware Cloud Director component previously installed the nvidia-container-runtime package, and edited the containerd runtime configuration to point to nvidia-container-runtime during the provisioning of the vGPU node in the VMware Cloud Director Container Service Extension Kubernetes cluster. This customization interfered with the NVIDIA GPU Operator installation.
This issue is fixed in Kubernetes Cluster API Provider for VMware Cloud Director v1.1.1.
Kubernetes Cluster API Provider for VMware Cloud Director v1.1.1 removes the customization in favor of GPU Operator installation. Users who want to manually install drivers on the vGPU node must perform the following customization workflow:
Install the nvidia-container-runtime package.
Update the containerd configuration to point to the NVIDIA runtime.
For more information, see Kubernetes Cluster API Provider for VMware Cloud Director v1.1.1 Release Notes.
The CSE Management workflow in a multi-site VMware Cloud Director setup only allows for a single server configuration entity.
In Kubernetes Container Clusters UI plug-in, the CSE Management workflow in a multi-site VMware Cloud Director setup may display a server configuration entity that belongs to a different site. This results in the CSE Management workflows failing in multi-site environments.
This issue is fixed in VMware Cloud Director Container Service Extension 4.1.1a. The CSE Management workflow in the Kubernetes Container Clusters UI plug-in now only fetches the server configuration entity that belongs to the site where the user is currently logged in. This fix allows each site in a multi-site environment to create and manage its own server configuration entity.
In VMware Cloud Director Service Provider Portal, if a service provider navigates into a specific cluster, and returns to the landing page, the cluster list does not display all clusters.
Service providers can view a full list of the clusters in an environment in the VMware Cloud Director Service Provider Portal. From this view, if a service provider opens a cluster to view its details, navigates to the Persistent Volumes tab, and clicks back to return to the listing of all clusters, the original list of all the clusters is not visible. Some clusters do not display on the list.
This issue is fixed for VMware Cloud Director Container Service Extension 4.1.1a.
The Kubernetes Container Clusters UI plug-in does not display the Container Registry setting in Server Details page.
In Kubernetes Container Clusters UI plug-in, in the CSE Management tab, the Server Details tab does not display the Container Registry setting.
This issue is fixed in VMware Cloud Director Container Service Extension 4.1.1a.
The Projector version is not updated in RDE after Projector deployment is upgraded in the cluster.
The version of the Projector component is not updated in the cluster RDE's projector status section, even though the version in the Projector deployment changes.
This issue is fixed in VMware Cloud Director Container Service Extension 4.1.1a.
vApp creation failure events do not display the error message.
During cluster creation, if the create vApp operation fails, the Event Details tab for that cluster in the Kubernetes Container Clusters UI plug-in does not show the full error message.
This issue is fixed in VMware Cloud Director Container Service Extension 4.1.1a. The error message is now included in the Detailed Error section that aids in troubleshooting.
Tanzu Standard Repository is not installed with clusters that are created in VMware Cloud Director Container Service Extension 4.1 with Tanzu Kubernetes Grid 2.1.1 and 2.2.
In VMware Cloud Director Container Service Extension 4.1.1a, clusters created with Tanzu Kubernetes Grid 2.1.1 and 2.2 automatically have the Tanzu-standard repository installed.
For clusters that were created using VMware Cloud Director Container Service Extension 4.1, it is necessary to install the repository and packages. For more information, see Known Issues.
Upgrading a cluster using Kubernetes Container Clusters UI plugin 4.1 fails when the cluster was initially created using Kubernetes Container Clusters UI plugin 4.0.
When you use Kubernetes Container Clusters UI plugin 4.1 to upgrade a cluster that was created in Kubernetes Container Clusters UI plugin 4.0, the following error is seen in the Events tab, even though the cluster-upgrade-script was executed successfully without any errors:
Error: PatchObjectError
error message: [VCDCluster.infrastructure.cluster.x-k8s.io "<cluster-name>" is invalid: spec.loadBalancerConfigSpec: Invalid value: "null": spec.loadBalancerConfigSpec in body must be of type object: "null"] during patching objects
This issue is fixed in VMware Cloud Director Container Service Extension 4.1.1a.
Auto Repair on Errors toggle must be deactivated immediately after a cluster is created.
It is possible that a provisioned cluster can go into an error state due to a known issue. If the Auto Repair on Errors feature is activated on the cluster, that cluster can get deleted and recreated, which causes disruption of workloads on that cluster.
When you create clusters, it is recommended to deactivate the Auto Repair on Errors toggle to avoid clusters from getting deleted, and recreated if they go into error state.
For more information on the Auto Repair on Errors in the cluster creation workflow, see Create a VMware Tanzu Kubernetes Grid Cluster.
The Auto Repair on Errors setting is deactivated by default in the Kubernetes Container Clusters UI. If you activate it for any reason, you must turn it off immediately after cluster is provisioned.
This issue has been fixed for VMware Cloud Director Container Service Extension 4.1.1a.
VMware Cloud Director Container Service Extension Server additionally deactivates the Auto Repair on Error toggle after a cluster is created successfully.
Updated - When new organizations are added to VMware Cloud Director, the VMware Cloud Director Container Service Extension server may fail to provide access to VMware Cloud Director Container Service Extension configuration to the new organizations.
When this issue occurs, the following error message appears in the VMware Cloud Director Container Service Extension log file:
"msg":"error occurred while onboarding new tenants with ReadOnly ACLs for VCDKEConfig: [unable to get all orgs: [error occurred retrieving list of organizations: [error getting list of organizations: 401 Unauthorized]]]"
Also, tenant users may see the following warning message and be blocked when they try to create a cluster using VMware Cloud Director Container Service Extension UI.
Cannot fetch provider configuration. Please contact your administrator
This issue is fixed for VMware Cloud Director Container Service Extension 4.1.1a.
For each cluster, repeated messages of Invoked getFullEntity (urn:vcloud:entity:vmware:capvcdCluster:{ID}) display in the Recent Tasks pane of the VMware Cloud Director UI.
This issue occurs because VMware Cloud Director Container Service Extension retrieves the RDEs for all clusters instead of only for unprocessed clusters.
This issue has been fixed for VMware Cloud Director Container Service Extension 4.1.1a.
Clusters created using a Kubernetes Cluster API Provider for VMware Cloud Director (CAPVCD) management cluster, without involvement of the VMware Cloud Director Container Service Extension server, display a Pending status in Kubernetes Container Clusters UI.
This issue has been fixed for VMware Cloud Director Container Service Extension 4.1. The status of these clusters is now Non-CSE. The only permitted operation for these non-VMware Cloud Director Container Service Extension clusters is to download the kube config of the cluster.
The Kubernetes Container Clusters UI plug-in storage profile selection form fields do not filter storage policies by entitytype.
The storage profile selection form fields display all storage profiles visible to the logged-in user, such as VMs, vApps, Catalog items, or named disks.
This issue has been fixed for VMware Cloud Director Container Service Extension 4.1. In Kubernetes Container Clusters UI in VMware Cloud Director Container Service Extension 4.1, the storage policy selection only shows storage policies that support any of these entitytypes:
vApp and VM templates
Virtual machines
Kubernetes cluster resize operation fails in VMware Cloud Director Container Service Extension 4.0.x.
If users attempt to change organization VDC names in VMware Cloud Director after clusters are created, further cluster operations such as cluster resize can fail.
This issue has been fixed for VMware Cloud Director Container Service Extension 4.1.
When a node of the cluster is deleted due to failure in vSphere or other underlying infrastructure, VMware Cloud Director Container Service Extension does not inform the user, and it does not auto-heal the cluster.
When the node of a cluster is deleted, basic cluster operations, such as cluster resize and cluster upgrade, continue to work. The deleted node remains in deleted state, and is included in computations regarding size of the cluster.
This issue has been fixed for VMware Cloud Director Container Service Extension 4.1, as the Node Health Check Configuration feature addresses this occurrence.
The cluster creation for multi-control plane or multi-worker node clusters goes into an error state. The Events tab on the cluster details page shows an EphemeralVMError event due to the failure to delete the ephemeral VM in VMware Cloud Director.
The same error events can appear repeatedly if the Auto Repair on Errors setting is activated on the cluster. If the Auto Repair on Errors setting is off, sometimes the cluster can show an error state due to the failure to delete the ephemeralVM in VMware Cloud Director even though the control plane and worker nodes are created successfully.
This issue is visible in any release and patch release after but not including VMware Cloud Director 10.3.3.3, and any release and patch release starting with VMware Cloud Director 10.4.1.
This issue is fixed for VMware Cloud Director Container Service Extension 4.1 release.
When a force delete attempt of a cluster fails, the ForceDeleteError that displays in the Events tab of the cluster info page does not provide sufficient information regarding the failure to delete the cluster.
This issue is fixed for VMware Cloud Director Container Service Extension 4.1 release.
New - In VMware Cloud Director Container Service Extension 4.2, if you force delete clusters that are configured to use IP Spaces, the IP allocated to the cluster and/or Kubernetes services running on the cluster are not released automatically, and manual intervention is necessary.
New - Creating clusters with Node Health Check activated, or activating Node Health Check for existing clusters causes the cluster to become unmanageable.
This error is caused by Kubernetes Container Clusters UI plugin 4.2 creating the MachineHealthCheck capi yaml object with an invalid apiVersion value. Clusters that contain this invalid MachineHealthCheck section become unmanageable.
In Kubernetes Container Clusters UI plug-in 4.2, a cluster can enter this unmanageable state in two ways:
If a user creates a cluster with Node Health Check activated, then the resulting cluster will be unmanageable. Users should ensure that Node Health Check is deactivated when creating clusters with UI plugin 4.2.
If a user activates Node Health Check for a cluster that has never previously activated Node Health Check, the cluster will become unmanageable.
If a cluster was created using a lower UI plugin version and Node Health Check had already been activated before, then that cluster can activate or deactivate Node Health Check without encountering this issue.
Workaround:
The only workaround in Kubernetes Container Clusters UI Plug-in 4.2 for a cluster that has this issue is to manually update the cluster's MachineHealthCheck apiVersion value to cluster.x-k8s.io/v1beta1 through the API.
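As a hedged sketch of the corrected object (the names and selector below are placeholders, not values from this release), the MachineHealthCheck section of the cluster's capi yaml must declare the v1beta1 apiVersion:

```yaml
# Illustrative sketch only: "demo-cluster" is a placeholder name.
apiVersion: cluster.x-k8s.io/v1beta1   # replace the invalid value with this
kind: MachineHealthCheck
metadata:
  name: demo-cluster
spec:
  clusterName: demo-cluster
  selector:
    matchLabels:
      cluster.x-k8s.io/cluster-name: demo-cluster
```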
New - The Create a Tanzu Kubernetes Grid Cluster workflow, and Create New Worker Node Pools workflow fails when sizing policy, and vGPU policy are both specified if the vGPU policy already contains sizing information.
Workaround:
If you select a vGPU policy that already contains sizing information during Create a Tanzu Kubernetes Grid Cluster workflow, or Create New Worker Node Pools workflow, do not also select a sizing policy.
New - The Upgrade Availability value on the cluster list page and on the Cluster Information page displays Error.
When you select a cluster in the cluster list datagrid, Error displays in the Upgrade column. On the Cluster Information page, the Upgrade Availability property displays Error.
Workaround:
This occurs if there is a corrupted or invalid vApp template in a catalog that is visible to the user. Remove the problematic vApp template from the catalog, and then the user must refresh their browser.
New - Kubernetes templates datagrid displays error message, and fails to load.
When the Kubernetes templates datagrid fails to load, the following error message displays:
Error: Failed to fetch Kubernetes Templates
Workaround:
This occurs if there is a corrupted or invalid vApp template in a catalog that is visible to the user. Remove the problematic vApp template from the catalog, and then the user must refresh their browser.
A java.lang.OutOfMemoryError: Java heap space error causes VMware Cloud Director services to fail continuously after startup.
VMware Cloud Director crashes due to OutOfMemoryError. The issue occurs when the resolve operation is invoked on an RDE that has a lot of tasks associated with it. The issue is present on VMware Cloud Director 10.4 and above. For more information, see VMware Knowledge Base Article 95464.
Do not change the Registry URL in VMware Cloud Director Container Service Extension configuration as changes are not supported.
Use load balancers to front registry virtual machines to swap the virtual machines out if necessary.
If you use VMware Cloud Director 10.4.2.2, the cluster deletion workflow can fail in Kubernetes Container Clusters UI.
The cluster deletion operation fails with the following error:
"error": "failed to delete VCD Resource [clusterName] of type [VApp] from VCDResourceSet of RDE [urn:vcloud:entity:vmware:capvcdCluster:<uuid>]: [failed to update capvcd status for RDE [urn:vcloud:entity:vmware:capvcdCluster:<uuid>]; expected http response [200], obtained [400]: resp: [\"{\\\"minorErrorCode\\\":\\\"BAD_REQUEST\\\",\\\"message\\\":\\\"[ a8e89bd2-195d-458b-808d-3ff81e074fa0 ] RDE_CANNOT_VALIDATE_AGAINST_SCHEMA [ #/status/capvcd/vcdResourceSet/2: expected type: JSONObject, found: Null\\\\n ]\\\",\\\"stackTrace\\\":null}\"]: [400 Bad Request]]"
This is a bug of VMware Cloud Director 10.4.2.2. For more information, see VMware Cloud Director 10.4.2.2 Known Issues.
Workaround:
Use Force Delete workflow to delete the cluster.
In Kubernetes Container Clusters UI plug-in, the CSE Management upgrade workflows may add or remove rights from the CSE Admin Role or Kubernetes Cluster Author.
It is necessary to manually update the Custom roles that are cloned from CSE Admin Role or Kubernetes Cluster Author. If you do not do this, users can face errors during cluster workflows.
VMware Cloud Director Container Service Extension does not automatically install the Tanzu-standard repository in Tanzu Kubernetes Grid 2.1.1 and 2.2 clusters.
Workaround:
Install the repository and packages using documentation for Tanzu Kubernetes Grid 2.2 and 2.1.1 clusters. For more information, see VMware Tanzu Kubernetes Grid 2.1, and VMware Tanzu Kubernetes Grid 2.2.
Updated - Tenant users may see the following warnings when they attempt certain actions using VMware Cloud Director Container Service Extension.
Warnings:
Cannot fetch provider configuration. Please contact your administrator.
Tenant users may see this warning, and be blocked when they try to create a cluster.
Node Health Check settings have not been configured by your provider.
Tenant users may see this warning when they try to activate Node Health Check during cluster creation or in the cluster settings.
These warnings can occur for the following reasons:
The VMware Cloud Director Container Service Extension server has not finished starting up.
The VMware Cloud Director Container Service Extension server has not yet published the server configuration to tenant organizations. The server configuration is published automatically every hour from server startup while the server is running. Therefore, publishing to new tenant organizations that are created during an hourly window occurs at the end of that hour.
The tenant user's role does not have the following right: View: VMWARE:VCDKECONFIG. This right was added to the Kubernetes Cluster Author global role in VMware Cloud Director Container Service Extension 4.1.
There was an unexpected error while fetching the server configuration.
Workaround:
Service providers must ensure the VMware Cloud Director Container Service Extension server is set up, and operating successfully.
Ensure the tenant user's role has the right View: VMWARE:VCDKECONFIG. Tenant users must log out of VMware Cloud Director, and log back in to activate any changes made to their role.
Wait for hourly publishing to new organizations.
In some instances, nodes cannot join clusters even when the cluster is in an available state. This issue can occur intermittently.
The following error appears in the Events tab of the cluster info page in Kubernetes Container Clusters UI:
VcdMachineScriptExecutionError with the following details:
script failed with status [x] and reason [Date Time 1 /root/node.sh: exit [x]]
Workaround:
VMware Cloud Director Container Service Extension 4.1 adds a retry mechanism, using a retry feature from Cluster API, which reduces the occurrence of this issue.
VMware Cloud Director Container Service Extension 4.1 does not support Dynamic Host Configuration Protocol (DHCP)
The cluster creation workflow in VMware Cloud Director Container Service Extension 4.1 fails if the cluster is connected to a routed organization VDC network that uses DHCP instead of static IP pool to distribute IPs to virtual machines.
VMware Cloud Director Container Service Extension 4.1 only supports organization VDC networks in the following circumstances:
The network is routed.
The network uses a static IP pool to distribute IPs to the virtual machines that are connected to it.
It is not possible to activate GPU support in an airgapped cluster.
As VMware cannot redistribute NVIDIA packages, it is not possible to activate GPU support in an airgapped cluster out of the box. The failure occurs when the cluster attempts to download the NVIDIA binary from nvidia.github.io in the cloud initialization script.
Workaround:
Service providers can potentially allow the cluster access to nvidia.github.io by using a proxy server.
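As a hedged sketch only (the proxy host and port are placeholders, and the exact mechanism depends on the provider's environment), a system-wide proxy setting on the node could route the cloud initialization script's download from nvidia.github.io through a permitted proxy:

```
# Illustrative sketch only: proxy.example.com:3128 is a placeholder endpoint.
# Appended to /etc/environment on the cluster node.
http_proxy=http://proxy.example.com:3128
https_proxy=http://proxy.example.com:3128
no_proxy=localhost,127.0.0.1
```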
The audit_trail table grows rapidly in the VMware Cloud Director database due to RDE modify events being too large.
RDE modify events log the whole body of the RDE that has changed. These large events cause the audit_trail table to grow larger than necessary.
Workaround:
Upgrade to VMware Cloud Director 10.3.3.4 or above. If you are using VMware Cloud Director 10.3.3.4, set the audit.rde.diffOnly config property to True.
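As a hedged sketch (this assumes the default VMware Cloud Director installation path, which may differ in your environment), the property can typically be set on a cell with the cell-management-tool manage-config command:

```
# Sketch only: assumes the default installation path /opt/vmware/vcloud-director.
# Sets audit.rde.diffOnly so RDE modify events log diffs rather than full bodies.
/opt/vmware/vcloud-director/bin/cell-management-tool manage-config \
    -n audit.rde.diffOnly -v true
```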
If you are using VMware Cloud Director 10.4.0 and above, there is no requirement to set any configuration property.
VMware Cloud Director Container Service Extension 4.1 uses Kubernetes Cluster API Provider for VMware Cloud Director 1.1 and Kubernetes Cloud Provider for VMware Cloud Director 1.4 as default. These two component versions do not support IP spaces.
Tanzu Addons-Manager does not appear after upgrading to Tanzu Kubernetes Grid 2.2.0 with Kubernetes v1.24+.
After you upgrade a VMware Cloud Director Container Service Extension 4.0.3 cluster from Tanzu Kubernetes Grid 1.6.1 with Kubernetes v1.23.x to Tanzu Kubernetes Grid 2.2 with Kubernetes v1.24.x, the tanzu-addon-controller-manager pod is stuck in the PENDING or CrashLoopBackOff state for the following reason:
Error from server (NotFound): packageinstalls.packaging.carvel.dev "addons-manager.tanzu.vmware.com" not found
Use the following workaround to manually delete the tanzu-addons-controller-manager deployment and PackageInstall object.
To delete the deployment, perform the following commands:
kubectl get deployments -A
kubectl delete deployment -n tkg-system tanzu-addons-controller-manager
To delete the PackageInstall object, perform the following commands:
kubectl get packageinstall -A
kubectl delete packageinstall -n tkg-system tanzu-addons-manager
An ephemeral VM is created during the cluster creation process, and is deleted by VMware Cloud Director Container Service Extension when the cluster creation process is complete. It is possible that the API request to delete the ephemeral VM can fail.
VMware Cloud Director Container Service Extension reattempts to delete the ephemeral VM for up to 15 minutes. In the event that VMware Cloud Director Container Service Extension fails to delete the ephemeral VM after reattempting, it leaves the ephemeral VM in the cluster's vApp without deleting it.
The following error appears in the Events tab of the cluster info page in Kubernetes Container Clusters UI:
EphemeralVMError with the following details:
error deleting Ephemeral VM [EPHEMERAL-TEMP-VM] in vApp [cluster-vapp-name]: [reason for failure]. The Epemeral VM needs to be cleaned up manually.
The reason for failure depends on the stage at which the ephemeral VM deletion failed. Once you observe this notification, it is safe to delete the ephemeral VM from the cluster's VApp in the VMware Cloud Director UI.
Workaround:
Log in to the VMware Cloud Director Tenant Portal, and from VMware Cloud Director navigation menu, select Data Centers.
In the Virtual Data Center page, select the organization tile, and from the left navigation menu, select vApps.
In the vApps page, select the vApp of the cluster.
In the cluster information page, click the ellipsis to the left of the Ephemeral VM, and click Delete.
However, if the ephemeral VM is not manually cleaned up and a cluster delete request is issued, the cluster delete operation fails. It is then necessary to force delete the cluster.
Log in to VMware Cloud Director, and from the top navigation bar, select More > Kubernetes Container Clusters.
Select a cluster, and in the cluster information page, click Delete.
In the Delete Cluster page, select the Force Delete checkbox, and click Delete.
It is not possible to create clusters in VMware Cloud Director Container Service Extension 4.1 when using a direct organization VDC network with NSX in VMware Cloud Director.
VMware Cloud Director Container Service Extension 4.1 clusters do not support this configuration.
In VMware Cloud Director Container Service Extension, the creation of Tanzu Kubernetes Grid clusters can fail due to a script execution error.
The following error appears in the Events tab of the cluster info page in Kubernetes Container Clusters UI:
ScriptExecutionTimeout
with the following details:
error while bootstrapping the machine [cluster-name/EPHEMERAL_TEMP_VM]; timeout for post customization phase [phase name of script execution]
Workaround:
When this error occurs, it is recommended to activate Auto Repair on Errors from cluster settings. This instructs VMware Cloud Director Container Service Extension to reattempt cluster creation.
Log in to VMware Cloud Director, and from the top navigation bar, select More > Kubernetes Container Clusters.
Select a cluster, and in the cluster information page, click Settings, and activate the Auto Repair on Errors toggle.
Click Save.
It is recommended to deactivate the Auto Repair on Errors toggle when troubleshooting cluster creation issues.
In Kubernetes Container Clusters UI plug-in, the cluster delete operation can fail when the cluster status is Error.
To delete a cluster that is in Error status, it is necessary to force delete the cluster.
Log in to VMware Cloud Director, and from the top navigation bar, select More > Kubernetes Container Clusters.
Select a cluster, and in the cluster information page, click Delete.
In the Delete Cluster page, select the Force Delete checkbox, and click Delete.
ERROR: failed to create cluster: failed to pull image
This failure occurs in the following circumstances:
A user attempts to create a Tanzu Kubernetes Grid cluster using VMware Cloud Director Container Service Extension 4.1, and cluster creation fails intermittently.
An image pull error due to an HTTP 408 response is reported.
This issue can occur if there is difficulty reaching the Internet from the EPHEMERAL_TEMP_VM to pull the required images.
Potential causes:
Slow or intermittent Internet connectivity.
The network IP Pool cannot resolve DNS (docker pull error).
The network MTU behind a firewall must be set lower.
To resolve the issue, ensure that there are no networking connectivity issues stopping the EPHEMERAL_TEMP_VM from reaching the Internet.
For more information, refer to https://kb.vmware.com/s/article/90326.
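As a quick check for the DNS cause listed above, you can verify that the container registry host name resolves from the network that the EPHEMERAL_TEMP_VM uses. The following is a minimal sketch, not an official diagnostic tool; the registry host shown in the usage comment is an assumption and may differ in your environment.

```python
import socket

def can_resolve(host: str, port: int = 443) -> bool:
    """Return True if the host name resolves through the network's DNS."""
    try:
        socket.getaddrinfo(host, port)
        return True
    except socket.gaierror:
        return False

# Usage (run from the affected network): check the registry host that the
# cluster images are pulled from. The host below is an assumption; replace
# it with the registry your templates actually reference.
# can_resolve("projects.registry.vmware.com")
```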
Users may encounter authorization errors when executing cluster operations in Kubernetes Container Clusters UI plug-in if a Legacy Rights Bundle exists for their organization.
After you upgrade VMware Cloud Director from version 9.1 or earlier, the system may create a Legacy Rights Bundle for each organization. This Legacy Rights Bundle includes the rights that are available in the associated organization at the time of the upgrade and is published only to this organization. To begin using the rights bundles model for an existing organization, you must delete the corresponding Legacy Rights Bundle. For more information, see Managing Rights and Roles.
In the Administration tab in the service provider portal, you can delete Legacy Rights Bundles. For more information, see Delete a Rights Bundle. The Kubernetes Container Clusters UI plug-in CSE Management tab has a server setup process that automatically creates and publishes the Kubernetes Clusters Rights Bundle to all tenants. The rights bundle contains all rights that are involved in Kubernetes cluster management in VMware Cloud Director Container Service Extension 4.0.
Resizing or upgrading a Tanzu Kubernetes Grid cluster using kubectl.
After a cluster has been created in the Kubernetes Container Clusters UI plug-in, you can use kubectl to manage workloads on Tanzu Kubernetes Grid clusters.
If you also want to manage the cluster lifecycle, that is, resize and upgrade the cluster, through kubectl instead of the Kubernetes Container Clusters UI plug-in, complete the following steps:
Delete the RDE-Projector operator from the cluster:
kubectl delete deployment -n rdeprojector-system rdeprojector-controller-manager
Detach the Tanzu Kubernetes Grid cluster from Kubernetes Container Clusters UI plug-in.
In the VMware Cloud Director UI, in the Cluster Overview page, retrieve the cluster ID of the cluster.
Update the RDE, setting entity.spec.vcdKe.isVCDKECluster to false.
Get the payload of the cluster - GET https://<vcd>/cloudapi/1.0.0/entities/<Cluster ID>
In the payload, update the JSON path entity.spec.vcdKe.isVCDKECluster to false.
PUT https://<vcd>/cloudapi/1.0.0/entities/<Cluster ID>
with the modified payload. It is necessary to include the entire payload as the body of the PUT operation.
At this point, the cluster is detached from VMware Cloud Director Container Service Extension 4.1, and it is no longer possible to manage the cluster through VMware Cloud Director Container Service Extension 4.1. You can now use kubectl to manage, resize, or upgrade the cluster by applying the CAPI YAML (the Cluster API specification) directly.
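The GET/PUT sequence above can be sketched as follows. This is an illustrative sketch only: the host, cluster ID, and bearer token are placeholders you must supply, and error handling is omitted. The helper function names are hypothetical, not part of any official tool.

```python
import json
import urllib.request

VCD_HOST = "vcd.example.com"           # placeholder: your VCD endpoint
CLUSTER_ID = "urn:vcloud:entity:xxxx"  # placeholder: the cluster RDE ID
TOKEN = "..."                          # placeholder: VCD API bearer token

def detach_payload(entity: dict) -> dict:
    """Return a copy of the RDE payload with isVCDKECluster set to false."""
    payload = json.loads(json.dumps(entity))  # deep copy via round-trip
    payload["entity"]["spec"]["vcdKe"]["isVCDKECluster"] = False
    return payload

def detach_cluster() -> None:
    url = f"https://{VCD_HOST}/cloudapi/1.0.0/entities/{CLUSTER_ID}"
    headers = {"Authorization": f"Bearer {TOKEN}",
               "Content-Type": "application/json"}
    # GET the full payload of the cluster RDE.
    with urllib.request.urlopen(
            urllib.request.Request(url, headers=headers)) as resp:
        entity = json.load(resp)
    # PUT the entire modified payload back as the request body.
    body = json.dumps(detach_payload(entity)).encode()
    req = urllib.request.Request(url, data=body, headers=headers,
                                 method="PUT")
    urllib.request.urlopen(req)
```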
Policy selection in the VMware Cloud Director Container Service Extension 4 plug-in does not repopulate the full list after a selection is made for the purpose of policy modification.
When a user selects a sizing policy in the Kubernetes Container Clusters plug-in and then wants to change it, the dropdown menu displays only the selected sizing policy and does not automatically load the alternative sizing policies.
The user has to delete the text manually for the alternative sizing policies to appear. The same occurs in the dropdown menus for placement policies and storage policies.
This is intentional; it is how the Clarity combobox HTML web component works.
Note: Clarity is the web framework that the VMware Cloud Director UI is built on.
The dropdown box uses the input text as a filter. When the input field is empty, all selections are visible, and the selections filter as you type.
When you create a VMware Cloud Director Container Service Extension cluster, a character capitalization error appears.
In the Kubernetes Container Clusters UI, if you use capital letters, the following error appears:
Name must start with a letter, end with an alphanumeric, and only contain alphanumeric or hyphen (-) characters. (Max 63 characters)
This is a restriction set by Kubernetes. Object names are validated under RFC 1035 labels. For more information, refer to Kubernetes website.
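The restriction stated in the error message corresponds to the RFC 1035 label pattern that Kubernetes applies to object names. The following is a small sketch of an equivalent client-side check; the function name is illustrative, not part of any VMware or Kubernetes API.

```python
import re

# RFC 1035 label: starts with a letter, ends with an alphanumeric,
# contains only lowercase alphanumerics or hyphens, max 63 characters.
RFC1035_LABEL = re.compile(r"^[a-z]([-a-z0-9]*[a-z0-9])?$")

def is_valid_cluster_name(name: str) -> bool:
    """Check a proposed cluster name against the RFC 1035 label rules."""
    return len(name) <= 63 and bool(RFC1035_LABEL.match(name))

print(is_valid_cluster_name("my-cluster"))  # True
print(is_valid_cluster_name("MyCluster"))   # False: capital letters rejected
```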
Kubernetes Container Clusters UI-Plugin 4.1 does not interoperate with other Kubernetes Container Clusters UI plug-ins, such as 4.0 and 3.5.0.
The inability to operate two versions of the plug-in simultaneously without conflict is a known VMware Cloud Director UI limitation. You can only have one plug-in activated at any given time.
VMware Cloud Director Container Service Extension fails to deploy clusters with TKG templates that have an unmodifiable placement policy set on them.
Workaround:
Log in to the VMware Cloud Director Tenant Portal as an administrator.
Click Libraries > vApp Templates.
In the vApp Templates window, select the radio button to the left of the template.
In the top ribbon, click Tag with Compute Policies.
Select the Modifiable checkboxes, and click Tag.
In VMware Cloud Director 10.4, service providers are unable to log in to the VMware Cloud Director Container Service Extension virtual machine by default.
In VMware Cloud Director 10.4, after deploying the VMware Cloud Director Container Service Extension virtual machine from OVA file, the following two checkboxes in the VM settings page are not selected by default:
Allow local administrator password
Auto-generate password
It is necessary to select these checkboxes to allow providers to log in to the VMware Cloud Director Container Service Extension virtual machine in the future to perform troubleshooting tasks.
Log in to VMware Cloud Director UI as a service provider, and create a vApp from the VMware Cloud Director Container Service Extension OVA file. For more information, see Create a vApp from VMware Cloud Director Container Service Extension server OVA file.
After you deploy the vApp, and before you power it on, go to VM details > Guest OS Customization, and select Allow local administrator password and Auto-generate password.
After the vApp update task finishes, power on the vApp.
Fast provisioning must be deactivated in the organization VDC in order to resize disks.
Log in to VMware Cloud Director UI as a provider, and select Resources.
In the Cloud Resources tab, select Organization VDCs, and select an organization VDC.
In the organization VDC window, under Policies, select Storage.
Click Edit, and deactivate the Fast provisioning toggle.
Click Save.
When you log in as a service provider, after you upload the latest UI plug-in, the CSE Management tab does not display.
Deactivate the previous UI plug-in that is built into VMware Cloud Director.
Log in to VMware Cloud Director UI as a provider, and select More > Customize Portal.
Select the check box next to the names of the target plug-ins, and click Enable or Disable.
To start using the newly activated plug-in, refresh the Internet browser page.
If there are multiple activated plug-ins with the same name or ID but different versions, the lowest version plug-in is used. Therefore, activate only the highest version plug-in, and deactivate all plug-ins of other versions.
For more information on managing plug-ins, see Managing Plug-Ins.