VMware Cloud Director Container Service Extension 3.1.3 Release Notes

VMware Cloud Director Container Service Extension 3.1.3 | 14 APR 2022

Check for additions and updates to these release notes.

What's New in April 2022

New RDE 2.1 for TKG and native clusters. For more information, refer to Sample input specification file.
New Kubernetes Container Clusters plugin version 3.3.0. You can download the plugin directly from the VMware Cloud Director 10.3.3 download page and install it on an existing VMware Cloud Director 10.3.1 or later instance.
Support for default storage class for TKG clusters through UI Plugin 3.3.0 and the CLI on VMware Cloud Director 10.3.1 or later. You can download UI Plugin 3.3.0 directly from the VMware Cloud Director 10.3.3 download page.
Support for Ubuntu 20.04 Kubernetes OVAs from VMware Tanzu Kubernetes Grid version 1.5.1 and Kubernetes version 1.22.
Support for TKG Core Package installation for kapp-controller and metrics-server for TKG clusters. For more information, refer to CSE 3.1.3 Core package installation.
Support for TKG compatible Antrea version installation for TKG clusters, by default. Users can override the Antrea version. For more information, refer to CSE 3.1.3 for TKG - CPI, CSI, and CNI fields in RDE 2.1.
Support for Kubernetes External Cloud Provider for VCD (CPI) version 1.1.1, as default. For more information, refer to Kubernetes External Cloud Provider for VMware Cloud Director.
Support for Kubernetes Container Storage Interface for VCD (CSI) version 1.2.0, as default. For more information, refer to Container Storage Interface (CSI) driver for VMware Cloud Director Named Independent Disks.
Support for Python version 3.10 for CSE installation. For more information, refer to Installing CSE software.
Support for Antrea, CPI, and CSI version override through the cluster spec and the CSE server configuration. For more information, refer to CSE 3.1.3 for TKG - CPI, CSI, and CNI fields in RDE 2.1 and the extra_options section.
A new Ubuntu 20.04 Native template for K8s 1.23 Kubernetes Clusters. For more information, refer to Template Announcements.
Revision updates to existing Ubuntu 16.04 Native templates. For more information, refer to Template Announcements.

Known Issues

Resizing pre-existing TKG clusters after upgrade to CSE 3.1.3 fails with “kubeconfig not found in control plane extra configuration” in server logs

In VMware Cloud Director Container Service Extension 3.1.3, the control plane node writes the kubeconfig to the extra config so that worker nodes can install core packages. During cluster resize, when the pre-existing cluster’s worker nodes look for the kubeconfig, the control plane’s extra config does not have it because the cluster was created prior to 3.1.3.

Resolution

This issue is fixed in VMware Cloud Director Container Service Extension 3.1.4, in which pre-existing clusters no longer retrieve the kubeconfig during cluster resize.

Workaround for VMware Cloud Director Container Service Extension 3.1.3
1. Log in to the control plane VM.
2. Add a placeholder VM extra config element by running vmtoolsd --cmd "info-set guestinfo.kubeconfig $(echo VMware | base64)". This step lets the worker nodes retrieve a placeholder kubeconfig, even though this kubeconfig is not used.
3. Verify that the extra config element is set by running vmtoolsd --cmd "info-get guestinfo.kubeconfig".
4. Reattempt the resize operation.
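
Steps 2 and 3 above can be scripted on the control plane VM. The following is a minimal sketch, assuming vmtoolsd is on the PATH of the control plane VM. The function names are illustrative and not part of the product, and the script only prints the commands when vmtoolsd is not installed.

```shell
#!/bin/sh
set -eu

# Placeholder value from step 2: base64 of the string "VMware" plus a newline.
placeholder_kubeconfig() {
  echo VMware | base64
}

# Writes the placeholder (step 2) and reads it back (step 3).
# Runs vmtoolsd only if it is installed; otherwise prints the commands.
set_placeholder_kubeconfig() {
  value="$(placeholder_kubeconfig)"
  if command -v vmtoolsd >/dev/null 2>&1; then
    vmtoolsd --cmd "info-set guestinfo.kubeconfig $value"
    # The value read back should match the value that was written.
    [ "$(vmtoolsd --cmd 'info-get guestinfo.kubeconfig')" = "$value" ]
  else
    echo "vmtoolsd --cmd \"info-set guestinfo.kubeconfig $value\""
    echo "vmtoolsd --cmd \"info-get guestinfo.kubeconfig\""
  fi
}

set_placeholder_kubeconfig
```

A non-zero exit status on the control plane VM indicates that the value read back did not match, in which case the resize should not be reattempted yet.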

Cluster creation fails when the VMware Cloud Director external network has a DNS suffix and the DNS server resolves localhost.my.suffix to a valid IP.

This is due to a bug in etcd.

etcd prioritizes the DNS server over the /etc/hosts file when resolving hostnames, whereas the conventional behavior is to check hosts files before querying the DNS server. This becomes a problem when kubeadm attempts to initialize the control plane node using localhost. etcd queries the DNS server for an entry such as localhost.suffix and, if that entry resolves to an IP, performs its operations against that incorrect IP instead of localhost.

Workaround
1. Create a kubeadm config file, and modify the kubeadm init command in the VMware Cloud Director Container Service Extension control plane script for the template of the cluster you are attempting to deploy.
2. Change the command to the following: kubeadm init --config /path/to/kubeadm.yaml > /root/kubeadm-init.out

Note: The VMware Cloud Director Container Service Extension control plane script is located at ~/.cse-scripts/<template name>_rev<template_revision>/scripts/mstr.sh.

Note: Specify the Kubernetes version within the configuration file, because --kubernetes-version and --config are incompatible.
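
As an illustration of the second note, the Kubernetes version can be carried in the config file through the kubernetesVersion field of a ClusterConfiguration document. The following is a minimal sketch, not the full configuration needed for a given template: the output path, the version v1.21.2, and the kubeadm.k8s.io/v1beta2 API version are assumptions that must be matched to the kubeadm release in the template. The script writes the file and prints the kubeadm init command instead of running it.

```shell
#!/bin/sh
set -eu

# Hypothetical defaults; override them to match the template being deployed.
KUBEADM_CONFIG="${KUBEADM_CONFIG:-${TMPDIR:-/tmp}/kubeadm.yaml}"
K8S_VERSION="${K8S_VERSION:-v1.21.2}"

# Writes a minimal kubeadm config ($1 = path, $2 = Kubernetes version).
# The version goes in the file because --kubernetes-version cannot be
# combined with --config.
write_kubeadm_config() {
  cat > "$1" <<EOF
apiVersion: kubeadm.k8s.io/v1beta2
kind: ClusterConfiguration
kubernetesVersion: $2
EOF
}

write_kubeadm_config "$KUBEADM_CONFIG" "$K8S_VERSION"

# Command to place in the control plane script (printed, not executed).
echo "kubeadm init --config $KUBEADM_CONFIG > /root/kubeadm-init.out"
```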

VMware Cloud Director Container Service Extension service fails to start.

Workaround
Restart the VM to start the service.

Failures during template creation or installation
- One of the template creation scripts may have exited with an error.
- One of the scripts may be hung waiting for a response.
- If the VM has no Internet access, scripts can fail.
- Check the VMware Cloud Director Container Service Extension logs for script outputs to determine the cause of the failure.

VMware Cloud Director Container Service Extension upgrade from 3.1.3 to 3.1.3 fails.

Upgrading from VMware Cloud Director Container Service Extension 3.1.3 to 3.1.3 is needed when a previous cse upgrade or cse install did not complete. In that case, cse upgrade must succeed before VMware Cloud Director Container Service Extension can run, but the upgrade fails.

Workaround
1. Delete the VMware Cloud Director Container Service Extension connection.
2. Run cse install again.

No kapp-controller or metrics-server version is installed or listed in the UI/CLI on TKG clusters using TKG OVA 1.3.X

The compatible kapp-controller and metrics-server versions are listed in an OVA's TKR BOM file. For TKG OVA 1.3.X, these versions are not found in the same sections of the TKR BOM file as they are for TKG OVA 1.4.0 and later.

Output of vcd cse cluster info for TKG clusters has the kubeconfig of the cluster embedded in it, while the output for native clusters does not.

Although both native and TKG clusters use RDE 2.0.0 for representation in VMware Cloud Director, they differ in their structure. By design, the kubeconfig content is part of the output of vcd cse cluster info for TKG clusters but not for native clusters.

In VMware Cloud Director Container Service Extension 3.1.1, vcd-cli prints the error Error: 'NoneType' object is not subscriptable to the console when invoking VMware Cloud Director Container Service Extension commands.

This error is observed when VMware Cloud Director Container Service Extension tries to restore a previously expired session, or when the VMware Cloud Director Container Service Extension server is down or unreachable.

Workaround
Log out of vcd-cli and log back in before running further VMware Cloud Director Container Service Extension commands.

In VMware Cloud Director Container Service Extension 3.1, pre-existing templates do not work after upgrading to VMware Cloud Director Container Service Extension 3.1 with legacy_mode=true.

After upgrading to VMware Cloud Director Container Service Extension 3.1 running in legacy_mode, existing templates do not work unless their corresponding script files are moved to the right location. VMware Cloud Director Container Service Extension 3.0.x stores the template script files under the folder ~/.cse_scripts, whereas VMware Cloud Director Container Service Extension 3.1.0 stores them under ~/.cse_scripts/<template cookbook version>.

Workaround
- Create a folder named ~/.cse_scripts/<template cookbook version> and move all contents of ~/.cse_scripts into it.
- Alternatively, you can recreate the templates.
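
The first option above, moving the existing script files into a versioned subfolder, can be sketched as follows. This is an illustrative helper, not a tool shipped with the product: relocate_scripts is a hypothetical name, and the cookbook version must be supplied by the operator. The loop skips the target folder itself, so running it a second time is harmless.

```shell
#!/bin/sh
set -eu

# Moves every entry of the scripts folder ($1) into a subfolder named
# after the template cookbook version ($2).
relocate_scripts() {
  scripts_dir="$1"
  target="$scripts_dir/$2"
  mkdir -p "$target"
  for entry in "$scripts_dir"/*; do
    # Skip an unmatched glob and the target folder itself.
    if [ ! -e "$entry" ] || [ "$entry" = "$target" ]; then
      continue
    fi
    mv "$entry" "$target"/
  done
}

# Runs only when both arguments are given on the command line.
if [ "$#" -eq 2 ]; then
  relocate_scripts "$1" "$2"
fi
```

Saved under a name of your choice, for example relocate.sh, it would be invoked as sh relocate.sh ~/.cse_scripts <template cookbook version>. Hidden files are not matched by the glob and stay where they are.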

In VMware Cloud Director Container Service Extension 3.1, deletion of a cluster in an error state fails in CLI and UI

The deletion of a cluster that is in an error state (RDE.state = RESOLUTION_ERROR or status.phase = <Operation>:FAILED) can fail with Bad request (400).

Workaround
Depending on your VMware Cloud Director version, use one of the following solutions:

VMware Cloud Director 10.3:
1. Log in to VMware Cloud Director as the user who installed VMware Cloud Director Container Service Extension.
2. Send the following API request for RDE resolution: POST https://<vcd-fqdn>/cloudapi/1.0.0/entities/{cluster-id}/resolve
3. Send the following API request for RDE deletion: DELETE https://<vcd-fqdn>/cloudapi/1.0.0/entities/{cluster-id}?invokeHooks=false
4. vApp deletion: Delete the corresponding vApp from the UI or by API call:
  1. API call: Perform GET https://<vcd-fqdn>/cloudapi/1.0.0/entities/{cluster-id} to retrieve the vApp ID, which is the same as the externalID property in the corresponding RDE. Then invoke the Delete vApp API.
  2. UI: Identify the vApp with the same name as the cluster in the same Organization virtual datacenter and delete it.

For VMware Cloud Director 10.3, use vcd cse cluster delete --force to delete clusters that cannot be deleted. For more information, refer to Force Deleting Clusters.
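
The API requests in steps 2 to 4 above can be sketched with curl as follows. This is a sketch under stated assumptions, not a supported procedure: the environment variable names VCD_HOST, CLUSTER_ID, and VCD_TOKEN are hypothetical, bearer-token authentication is assumed, and the API version 36.0 in the Accept header is an assumption for VMware Cloud Director 10.3. The entity is fetched before it is deleted so that the externalID needed for the vApp deletion is still available.

```shell
#!/bin/sh
set -eu

# Builds the entity URL used in steps 2 to 4 ($1 = VCD FQDN, $2 = cluster ID).
entity_url() {
  echo "https://$1/cloudapi/1.0.0/entities/$2"
}

# Sends one request ($1 = HTTP method, $2 = URL) with a bearer token.
# The API version in the Accept header is an assumption for VCD 10.3.
vcd_request() {
  curl -sS -X "$1" "$2" \
    -H "Accept: application/json;version=36.0" \
    -H "Authorization: Bearer $VCD_TOKEN"
}

if [ -n "${VCD_HOST:-}" ] && [ -n "${CLUSTER_ID:-}" ] && [ -n "${VCD_TOKEN:-}" ]; then
  base="$(entity_url "$VCD_HOST" "$CLUSTER_ID")"
  vcd_request GET "$base"                       # step 4.1: note externalID (vApp ID)
  vcd_request POST "$base/resolve"              # step 2: RDE resolution
  vcd_request DELETE "$base?invokeHooks=false"  # step 3: RDE deletion
else
  echo "Set VCD_HOST, CLUSTER_ID, and VCD_TOKEN to send the requests."
fi
```

The vApp itself still has to be deleted afterward, through the UI or the Delete vApp API, as described in step 4.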

In VMware Cloud Director Container Service Extension 3.1, pending tasks are visible in the VMware Cloud Director UI immediately after using the cse upgrade command.

After you upgrade to VMware Cloud Director Container Service Extension 3.1 using the cse upgrade command, you may notice pending tasks on RDE-based Kubernetes clusters. This is a cosmetic issue and does not affect functionality. The pending tasks disappear after 24 hours of inactivity.

VMware Cloud Director Container Service Extension 3.1 ignores the api_version property in the config.yaml.

You do not need to start VMware Cloud Director Container Service Extension 3.1 with a particular VMware Cloud Director API version. It can now accept incoming requests at any supported VMware Cloud Director API version. For more information, refer to changes in the Configuration File.

VMware Cloud Director Container Service Extension 3.1 upgrade fails to update the clusters owned by System users correctly.

During the cse upgrade, the RDE representation of the existing clusters is transformed to become forward compatible. The newly created RDEs are supposed to be owned by the corresponding original cluster owners in the process. However, the ownership assignment can fail if the original owners are from the System org. This is a bug in VMware Cloud Director.

Workaround
Edit the RDE by updating the owner.name and owner.id in the payload: PUT https://<vcd-fqdn>/cloudapi/1.0.0/entities/id?invokeHooks=false

Unable to change the default storage profile for Native cluster deployments

The default storage profile for native cluster deployments cannot be changed in VMware Cloud Director Container Service Extension, unless specified in CLI.

VMware Cloud Director follows a particular order of precedence to pick the storage-profile for any VM instantiation:
1. User-specified storage-profile
2. Storage-profile with which the template is created, if the VM is being instantiated from a template.
3. Organization virtual datacenter default storage-profile

Workaround
1. Disable the storage-profile with which the template is created on the ovdc.
2. Set the desired storage-profile as default on the ovdc.
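
The order of precedence described above explains why the workaround has two steps. The following sketch models that order as a small function. It is an illustration only, not VMware Cloud Director code: pick_storage_profile and the profile names gold and silver are hypothetical, and an empty string stands for a profile that is not specified or is disabled.

```shell
#!/bin/sh
set -eu

# Returns the first non-empty value in precedence order:
# $1 = user-specified, $2 = template profile, $3 = ovdc default.
pick_storage_profile() {
  if [ -n "$1" ]; then
    echo "$1"
  elif [ -n "$2" ]; then
    echo "$2"
  else
    echo "$3"
  fi
}

# Template profile still enabled: it takes precedence over the ovdc default.
pick_storage_profile "" "gold" "silver"   # prints gold

# Template profile disabled (workaround step 1): the ovdc default is used.
pick_storage_profile "" "" "silver"       # prints silver
```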