VMware Telco Cloud Automation 2.3 | 10 Apr 2023 | Build - VM-based: 21563123, HA-based: 21563125 | Release Code: R151

Check for additions and updates to these release notes.

What's New

VMware Telco Cloud Automation (TCA) version 2.3 delivers critical fixes for version 2.2 and includes the following new capabilities:

Infrastructure Automation

  • Scale requirement to support 15000 cell site hosts for VM-based deployments

    From VMware Telco Cloud Automation 2.3 onwards, the infrastructure automation supports a maximum of 15000 hosts per TCA.

    This enhanced scalability is supported only for VM-based environments and not for cloud-native environments.

  • Rate limiting API requests

    From VMware Telco Cloud Automation release 2.3 onwards, a rate limiter limits the number of API requests. If the limit is exceeded, you must retry the request. You can change the API limit and interval configuration in the config/api_rate_limit_config.json file. After making the changes, restart the container.
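
    For example, a configuration of the following form could adjust the limit (the key names shown are illustrative; check the shipped config/api_rate_limit_config.json file for the exact schema):

      {
        "api_rate_limit": 100,
        "interval_in_seconds": 60
      }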

  • ​Backup implementation for cloud_config and cloud_spec

    From the VMware Telco Cloud Automation release 2.3 onwards, the cloud_config and cloud_spec files are automatically backed up. 

  • Host deletion workflow executes in async mode

    Improved resiliency in managing the deletion of hosts.

  • ZTP support for CN-based predeployed environment

    Infrastructure automation is now supported on cloud-native Telco Cloud Automation.

  • Scalability extended to support fifteen thousand cell site hosts

    Scalability extended to support an inventory size of fifteen thousand hosts per VMware Telco Cloud Automation instance.

  • Datastore creation with a custom name

    VMware Telco Cloud Automation 2.3 supports datastore name customization on cell site hosts.

Enhanced Interoperability for VMware Telco Cloud Platform

VMware Telco Cloud Automation 2.3 provides interoperability support for the following.

  • VMware vCenter Server: 8.0b, 8.0u1

  • VMware vSphere: 8.0b, 8.0u1

  • VMware NSX-T: 4.0.1.1, 4.1.0.2 (Note: Enables IPv6 for NSX Control-Plane)

  • VMware Tanzu Kubernetes Grid: Kubernetes 1.24.10, 1.23.16, 1.22.17; long-term support for Kubernetes 1.24

  • VMware Cloud Director: 10.4, 10.4.1

  • VMware Managed Cloud on AWS: M22

  • VMware vRealize Orchestrator: 8.10, 8.10.1, 8.10.2, 8.11, 8.11.1

CaaS Enhancements

  • CaaS v2 Templates

    You can create CaaS templates that contain most of the placement and sizing-related information required for v2-based CaaS Clusters. Using these templates, you can easily deploy new CaaS Clusters through the TCA portal.

  • Improved CaaS scale and performance

    Node pool creation is faster on a large scale, and it can handle high concurrency.

  • ​CaaS Addons

    • Default deployment configuration for Prometheus and FluentBit (for vROps) - Provides reference configuration for integrating with vROps and vRLI.

    • TKG standard extension support - ​​Allows faster installation of TKG standard extensions through the Telco Cloud Automation UI.

    • Multiple storage classes support.

    • Restic support for the Velero add-on to back up NFS Persistent Volumes in the workload cluster.

  • CaaS security

    • PSA (Pod Security Admission) control for Kubernetes 1.24.x (see the example after this list).

    • vSphere CSI can use separate credentials.

    • Airgap server STIG hardening - From the VMware Telco Cloud Automation 2.3 release, Airgap Server deployments are more secure by following the STIG security guidelines.

  • VMware Tanzu Kubernetes Grid 2.1.1 uptake

    • New Kubernetes versions: 1.24.10, 1.23.16, 1.22.17

    • TKG 2.1.1 uptake with 1.24-based Kubernetes Clusters:

      • Support for lifecycle management of TKG 2.1.1 clusters.

      • TKG workload clusters with Kubernetes versions 1.22.17, 1.23.16, and 1.24.10.

      • TKG management clusters with Kubernetes version 1.24.10.

    • TKG 2.1 LTS / ES support for Kubernetes 1.24.x - From the VMware Telco Cloud Automation 2.3 release, Kubernetes 1.24.x workload clusters are compliant with the TKG Extended Support policy.

    • Cluster certificate expiry renewal - Kubernetes cluster certificates on the control plane node VMs are renewed automatically before they expire.
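
The PSA control noted under CaaS security above follows the upstream Kubernetes namespace-label model. A minimal sketch of enforcing the restricted profile on a namespace is shown below (the namespace name is illustrative; how TCA exposes this setting may differ):

  apiVersion: v1
  kind: Namespace
  metadata:
    name: example-cnf-namespace
    labels:
      pod-security.kubernetes.io/enforce: restricted
      pod-security.kubernetes.io/enforce-version: v1.24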

Cloud Native Network Function

  • Allows CNF repository URL update - You can update the repository URL as part of the CNF Reconfigure and Upgrade operation.

  • You can perform the Rollback and Retry operations when a CNF upgrade fails.

  • The CNF update and upgrade operations are merged.

  • Better handling of multiple values.yaml files that contain conflicting entries.

Key Features in Technical Preview

  • TCA Multi-tenancy

  • Workflow Hub (formerly known as Pipeline Builder)

  • ETSI-SOL Release 4 API support

Note:

Contact the VMware Telco Cloud Automation Product Management team for more details.

Diagnosis and RAN Checklist Validation

A general-purpose diagnosis tool based on a plug-in architecture. For VMware Telco Cloud Automation 2.3, this tool uses plugins to run a set of validations on cell sites against the RAN BOM checklist. This helps vendors validate their sites before and after DU instantiation and ensures that the validations follow VMware recommendations.

OCI-Based Helm Charts

VMware Telco Cloud Automation release 2.3 supports OCI-based helm charts hosted on Harbor repositories for CNF LCM operations.
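
For illustration, an OCI-based Helm chart can be pushed to a Harbor project by using the standard Helm OCI commands and then referenced by its oci:// URL (the registry host, project, and chart names below are placeholders):

  helm registry login harbor.example.com
  helm push my-cnf-1.0.0.tgz oci://harbor.example.com/cnf-charts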

MultiCloud Deployment on EKS

You can deploy a cloud-native Telco Cloud Automation appliance on the Bring Your Own (BYO) cluster using the AWS Elastic Kubernetes Service (EKS).

Important notes

Management cluster creation

  • No support to reflect the control plane name and worker node name into the management cluster node name separately.

  •  The old name format of the control plane node is {$CLUSTER_NAME}-{$CONTROL_PLANE_NAME}-control-plane-{$RANDOM_CHARACTER} and the new name format of control plane node is {$CLUSTER_NAME}-{$RANDOM_CHARACTER}.

  •  The old name format of the worker node is {$CLUSTER_NAME}-{$NODE_POOL_NAME}-{$RANDOM_CHARACTER}-{$RANDOM_CHARACTER} and the new name format of the worker node is {$CLUSTER_NAME}-md-0-{$RANDOM_CHARACTER}.

  • No support to set different values of Resource Pool, VM Folder, Network Name, and Name-servers for control plane nodes and worker nodes.

MachineHealthCheck config of the management cluster

  • MachineHealthCheck is enabled by default when creating a management cluster.

  • No support to activate, deactivate, or update the parameters of MachineHealthCheck after the management cluster creation is completed.

  • No support to customize MemoryPressure, DiskPressure, PIDPressure, and NetworkUnavailable conditions.

  • No support to specify different Node start-up timeouts for control plane nodes and worker nodes.

  • No support to specify a different timeout of unhealthy conditions for control plane nodes and worker nodes.
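
For reference, these parameters correspond to the upstream Cluster API MachineHealthCheck resource created for the management cluster. A minimal sketch of such an object is shown below (names and timeout values are illustrative; the object is managed by TCA and is not meant to be edited manually):

  apiVersion: cluster.x-k8s.io/v1beta1
  kind: MachineHealthCheck
  metadata:
    name: example-mgmt-cluster-mhc
    namespace: example-mgmt-cluster
  spec:
    clusterName: example-mgmt-cluster
    nodeStartupTimeout: 20m
    selector:
      matchLabels:
        cluster.x-k8s.io/cluster-name: example-mgmt-cluster
    unhealthyConditions:
      - type: Ready
        status: Unknown
        timeout: 5m
      - type: Ready
        status: "False"
        timeout: 5m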

Migration of V1 workload cluster upgraded from TKG 1.6 to TKG 2.1 to V2 workload cluster

Behavioral changes:

  • The control plane nodes are re-created as part of the migration

  • Node pool worker nodes are re-created as part of the migration

Explanation:

  • In TKG 2.1, the control plane and the node pools are each rolling-updated, one after the other, during the V1 to V2 transformation.

Estimation of the impact of recreation:

  • The transformation completes within a maintenance window, similar to an upgrade

  • No impact on the Kubernetes state or the application state, which persist through the rolling update

MongoDB to Postgres Migration

VMware Telco Cloud Automation 2.3 replaces the internal MongoDB database with Postgres.

The change is automatically managed when you upgrade from a TCA 2.2-based appliance. It is highly recommended to take snapshots and backups of all the TCA appliances before upgrading to TCA 2.3.

Important:

Upgrades to VMware Telco Cloud Automation 2.3 are only possible from VMware Telco Cloud Automation 2.2. TCA 2.1.x or prior versions must be upgraded to version 2.2 before upgrading to TCA 2.3.

Merging of CNF Update and Upgrade operations

VMware Telco Cloud Automation 2.3 merges the Update and Upgrade operations on CNFs into a single operation through the UI. This simplifies the flow from a user's perspective and removes the ambiguity caused by having multiple operations.

Harbor: Chartmuseum deprecation and removal

Harbor 2.6 deprecates ChartMuseum-based Helm charts. OCI-based Helm charts are the new standard. Customers should ensure the compatibility of their Helm charts and the availability of ChartMuseum support on Harbor before upgrading to or deploying the latest Harbor versions. See Updating CNF Repository from Chartmuseum to OCI.
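
As a rough sketch, an existing chart hosted in a ChartMuseum-backed Harbor project can be pulled and re-pushed in OCI form before the repository switch (the repository URL, project, and chart names below are placeholders; follow the linked procedure for the supported steps):

  helm repo add old-charts https://harbor.example.com/chartrepo/cnf-charts
  helm pull old-charts/my-cnf --version 1.0.0
  helm push my-cnf-1.0.0.tgz oci://harbor.example.com/cnf-charts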

Deprecations

  • Helm v2.x is no longer supported.

  • CaaS v1 workload cluster will not be supported from the next release onwards.

Resolved Issues

Kubectl and SSH

  • Fixed issue 3096058: Open terminal fails if logged in to TCA-M with a different domain

    Open terminal fails if the user logged in to the Telco Cloud Automation-Manager (TCA-M) has a domain different from the one that is configured within the TCA-M appliance manager.

Cluster Automation

  • Fixed issue 15130: Second cluster creation is stuck when the network of one workload endpoint IP is unreachable in the large-scale environment

Airgap

  • Fixed issue 3068146: Harbor credentials are not cleared when airgap operations such as setup, sync, and deployment fail due to an incorrect Harbor password

  • Fixed issue 3067906: Airgap technical support collection fails with insufficient space on /tmp

    If the server runs for a longer period, the Airgap technical support collection fails with insufficient space on /tmp.

Workflows

  • Fixed issue 3087794: RBAC filters for Workflow instances do not function correctly

  • Fixed issue 3055138: Global tags are not applicable for Workflows

Dual Stack

  • Fixed issue 3086288: Upgrading the Dualstack environment from TCA 2.2 to future releases results in issues when performing CaaS operations

    The issue occurs because the /opt/third-party/environment-vars-config.sh file is not backed up during the upgrade process, which results in the exportIPv6_system=true flag missing from the file.

Known Issues

Cluster Automation

  • Issue 3266678: Node Pool scale-out causes the entire node pool to be redeployed including existing nodes

    Scaling out a node pool within a workload cluster deployed as part of VMware Telco Cloud Automation release 2.1 redeploys the entire node pool and replaces the existing nodes. However, this occurs only for the first scale-out operation of a node pool. Subsequent scale-out operations do not have this issue.

  • Issue 3109704: When creating or editing a cluster, an error message is displayed based on the cluster status condition

    When creating or editing a cluster, the following error message is displayed when vCenter is configured on TCA-CP using FQDN and is later re-configured using IP or vice versa:

    "admission webhook \“vvcenterprime.kb.io\” denied the request: VCenterPrime.telco.vmware.com \“p-ee60411cdd00338a0f67125da7720543\” is invalid: spec.server: Invalid value: v1alpha1.ServerInfo{Address:\“10.10.10.10\“"

    Workaround:

    Restore vCenter to the previously configured address, whether FQDN or IP.

  • Issue 3152043: Importing the BYOI template does not automatically enable "Synchronize time periodically"

    When you import the BYOI template from vCenter, you must enable “Synchronize time periodically” manually.

    Workaround:

    After importing the BYOI template from vCenter, you must edit the VM.

    1. Click VM Options > VMware Tools. 

    2. Enable "Synchronize time periodically" and save the configuration.

    3. Convert the VM into a template.

  • Issue 3151224: When you install the management cluster, the control plane label is not tagged to the control plane nodes

    After you add a label for the management cluster control plane, the label is not shown on the control plane nodes.

    Workaround:

    1. Login to the TCA-CP console as an admin user.

    2. Switch to the management cluster, and add a label for the control plane nodes.

      kubectl label node {node_name} {label}
  • Issue 3150997: When the management cluster enables a machine health check and the Management cluster control plane scales out, the new master node is tagged with the node pool label: telco.vmware.com/nodepool={namepool name}

    When the Management cluster control plane scales out or the management cluster enables a machine health check and creates a new node as a remediation process, the new node is listed on the Node Pool Details table.

    Workaround:

    1. Login to the TCA-CP console as an admin user.

    2. Switch to the management cluster, list the nodes, and node labels.

      kubectl get nodes --show-labels
    3. Remove the node pool label from the control plane nodes.

      kubectl label node {node_name} {label_key}-
  • Issue 3152373: The “Copy Specification and Deploy New Cluster” function fails to copy tkg-contour to the new cluster

    When using the "Copy Specification and Deploy New Cluster" function to create a new cluster, if TKG standard extensions are installed on the existing cluster, one of the TKG standard extensions tkg-contour is not copied to the new cluster.

    Workaround:

    Deploy the missing TKG standard extension manually, either during or after the cluster creation.

  • Issue 3150766: Management Cluster upgrade in TCA 2.2 corrupted the underlying workload clusters’ node pool machineset version

    The Kubernetes version in the TcaNodePool status is incorrect and is not in sync with the actual Kubernetes version of the v2 workload cluster. This impacts the v2 workload cluster upgrade.

    Workaround:

    Update the Kubernetes MachineDeployment version of the node pool on the management cluster.

    1. SSH to the management cluster control plane node.

    2. Query MachineDeployment of the node pool by using the node pool name.

      kubectl get md <$NODE_POOL_NAME> -n <$CLUSTER_NAME>
    3. Edit the MachineDeployment of the node pool and update spec.template.spec.version to the actual Kubernetes version.

      kubectl edit md <$NODE_POOL_NAME> -n <$CLUSTER_NAME>

    After step 3 is completed, the nodes of the node pool are rolling-updated, and the Kubernetes version in the TcaNodePool status is corrected.

  • Issue 3151324: Cannot deploy Management cluster on cell site hosts

    You cannot deploy the Management cluster on cell site hosts in VMware Telco Cloud Automation release 2.3.

  • Issue 3119914: ​sshAuthorizedKeys is missing when the v1 workload cluster is transformed to v2

    After the transformation of the workload cluster from v1 to v2, logging in to the cluster node without a password through TCA-CP is not allowed.

    Note:

    Scale out or upgrade the control plane nodes by using the edit cluster operation to enable sshAuthorizedKeys.

  • Issue 3151034: Storage Class is not deleted from the backend even after deleting from the VMware Telco Cloud Automation UI

    When you remove the storage class by using the vsphere-csi addon configuration, it is removed from the VMware Telco Cloud Automation UI but still exists in the backend.

    Workaround:

    1. Log in to the VMware Telco Cloud Automation Control Plane console as an admin user

    2. List the storage class by using the following command:

      kubectl get sc -A
    3. Delete the specified storage class by using the following command:

      kubectl delete sc -n <namespace> <name-of-storage-class>
  • Issue 3117105: A failed management cluster with the status "Deleting" cannot be removed

    You cannot delete a failed management cluster that is stuck in the Deleting status. The management cluster namespace status on the minikube cluster displays as Terminating.

    Workaround:

    1. Verify the status of the management cluster namespace on the minikube cluster and use your management cluster name instead of test-mc.

      [admin@tcacp ~]$ kubectl get ns test-mc
      NAME      STATUS        AGE
      test-mc   Terminating   19h
    2. Remove the finalizers of the related CR manually if the status of the namespace reports the error 'Some content in the namespace has finalizers remaining: telco.vmware.com/virtualmachine in 1 resource instances'.

      [admin@tcacp ~]$ kubectl get vme -n test-mc
      NAME                            AGE
      bootstrap-test-mc-master-5e3b   20h
      [admin@tcacp ~]$ kubectl edit vme -n test-mc bootstrap-test-mc-master-5e3b

    The Namespace of the management cluster is deleted after step 2 and the management cluster status on Telco Cloud Automation is Deleted.

  • Issue 12641: Multi-zone labels do not appear on v1 and v2 workload cluster nodes

    When multi-zone is enabled, the vSphere tags on the datacenter, cluster, and hosts do not appear on the Kubernetes nodes as labels, and the csinodes objects do not include the topologyKeys parameter.

    Workaround:

    On the workload cluster, do the following:     

    1. Clear all the csinodetopology custom resources.

      kubectl delete csinodetopology --all
    2. Restart the vsphere-csi-node daemonset.

      kubectl rollout restart ds -n kube-system vsphere-csi-node
  • Issue 3111959: When you delete Persistent Volume, the data is not completely removed from the system

    When you delete Persistent Volume (PV), the data is not completely removed. The data file is stored on the NFS server with the prefix "archived-".

    Workaround:

    To delete the PV data completely, remove the archived-* files from the NFS server manually.

  • Issue 3109712: MTU value set for the management cluster does not work as expected

    After setting the Maximum Transmission Unit (MTU) value for the management cluster (Kubernetes Version: v1.24.10) control plane and worker nodes, the MTU value switches back to 1500, which is the default value.

  • Issue 3085524: Installation of load-balancer-and-ingress-service addon fails due to the length of the workload cluster name

    When the name of a v2 workload cluster is longer than 29 characters, the installation of the load-balancer-and-ingress-service addon fails for that cluster.

    Workaround:

    Make sure that the v2 workload cluster name is less than 29 characters long if you want to install the load-balancer-and-ingress-service addon.

  • Issue 3064429: fluent-bit add-on in CrashLoopBackOff state

    When you set the cpu-manager-policy field to static for cluster nodes, the fluent-bit pod on those nodes crashes.

    Workaround:

    Remove the cpu-manager-policy configuration for the node pools.

  • Issue 17598: Different resource pools for TKG management cluster control plane nodes and worker nodes are not supported

    You cannot have different resource pools, VM folders, network names, and DNS for TKG management cluster control plane nodes and worker nodes.

Partner Systems

  • Issue 3112830: Management cluster creation fails if the CA Certificate content is not provided

    Management cluster creation fails if the CA Certificate content is not provided within Airgap Partner System registration.

    Workaround:

    For greenfield deployments, you must provide a CA certificate when registering the airgap server as a partner system in TCA, even if the airgap server uses a publicly signed certificate.

    For brownfield upgrades, where the airgap server is not registered with a CA certificate, you can continue to use the airgap server for management cluster upgrades, workload cluster upgrades, and workload cluster creations.

    For management cluster creations, do not select the airgap server from the list when creating the cluster. Instead, provide the airgap server details by selecting the repository details in the Airgap Repository field.

Network Service LCM

  • Issue 3150854: The Reset state is always enabled for Network Service instances

    The Reset state is enabled while the Retry operation is in progress.

  • Issue 3074296: Terminate and delete operations are allowed on the underlying Network Service instances

    You can delete or terminate underlying (nested) Network Service Instances.

EKS/VMC Node Customization

  • Issue 3152364: Node customization and workflows are not supported for EKS/AWS

    The following node customizations are not supported:

    • SRIOV Interface addition and configuration

    • NUMA alignment of vCPUs and VF/PFs

    • DPDK binding for SRIOV interfaces

    • Passthrough devices for PTP

CaaS Backup and Restore

  • Issue 3154144: Backup of Persistent Volume using the Restic plugin fails

    If you use the Restic plugin to back up Persistent Volume to remote Object Storage through the HTTP/HTTPS proxy, the backup process fails.

    Workaround:

    1. Pause the velero package install on the workload cluster.

      kubectl patch pkgi velero -n tca-system -p '{"spec":{"paused":true}}' --type=merge
    2. Edit the Restic Daemonset to add the same proxy settings as the velero deployment.

      - name: HTTP_PROXY
        value: ***
      - name: HTTPS_PROXY
        value: ***
      - name: NO_PROXY
        value: ***
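
    Alternatively, the same variables can be applied with kubectl set env (the daemonset name and namespace below are assumptions; confirm them first with kubectl get ds -A, and use the same proxy values as the velero deployment):

      kubectl -n tca-system set env daemonset/restic HTTP_PROXY=http://proxy.example.com:3128 HTTPS_PROXY=http://proxy.example.com:3128 NO_PROXY=10.0.0.0/8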

Elastic Kubernetes Service

  • Issue 3156354: Open Terminal/Download Kubeconfig does not work for EKS-based CNFs and Clusters

    Login to CNFs and Clusters through Kube Config, Login Credentials, or Terminal on an EKS-based TCA does not work.

VNF

  • Issue 3154546: Syncing remote tasks to TCA-M fails

    The format for the field lastSyncTime after the upgrade is not supported by the 2.3 codebase.

    • Lower-level task updates are no longer shown on the TCA-M UI. For example, during VNF instantiation, as part of the Create Servers step, you no longer see the intermediate message "cloning server <vdu-name>".

    • TCA Control Plane (TCA-CP) app logs are filled with error messages.

Host Configuration

  • Issue 3164391: CSAR instantiation fails due to an empty PCI ID in ESX info CR

    If you modify an existing host profile to enable SR-IOV on a vNIC and create a new PCI group for the vNIC, the PCI group entry in the esxinfo Custom Resource (CR) has an empty PCI ID causing the CSAR instantiation to fail.

    Workaround:

    Use one of the following workarounds:

    • Instead of modifying the existing host profile, create a new host profile and apply it to the host or domain.

    • Log in to the management cluster, delete the existing esxinfo CR, and then resync the host profile for the host or domain, as shown in the sketch below.
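
    For reference, the CR can be located and removed with standard kubectl commands from the management cluster (the resource and namespace names below are placeholders; list the esxinfo objects first to find the one for the affected host):

      kubectl get esxinfo -A
      kubectl delete esxinfo <esxinfo-cr-name> -n <namespace>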

Kubectl and SSH

  • Issue 3263189: Open Terminal does not work for CaaS clusters when pod security policies are applied

  • Issue 1226141: Open Terminal does not work for users with permissions that are backed by vCenter-AD groups

    Workaround:

    1. Configure the username directly within TCA permissions.

    2. Configure TCA to use AD as the authentication provider directly instead of using AD through vCenter.

Network Function LCM

  • Network Function listing does not work if a Workflow name within CSAR contains a comma

    If you upload a CSAR that contains a comma in a built-in workflow name and then instantiate a Network Function instance from this CSAR, the Network Function listing fails with the following error:

    [{"code":"E_INTERNAL_ERROR","message":"Error while getting VNF instances. Reason:JSONObject[\"href\"] not found."}]

    Workaround:

    1. Edit the Network Function CSAR and remove the comma from the workflow name to ensure that the subsequent instantiations are not affected.

    2. Perform one of the following workarounds for the instantiated NF:

    • a) Terminate and delete the offending Network Function through an API.

      Note: You can fetch the Network Function ID from the Postgres database, Vnf table.

      i) SSH to the TCA-M system as the admin user.

      ii) Run connect-to-postgres.

      iii) Execute the following query, replacing the REPLACE_NF_INSTANCE_NAME_HERE string with the NF instance name:

        select val ->> 'id' as id from "Vnf" where val ->> 'vnfInstanceName' = 'REPLACE_NF_INSTANCE_NAME_HERE';

      iv) Through an API client, execute the Terminate NF API:

        POST /telco/api/vnflcm/v2/vnf_instances/NF_INSTANCE_ID/terminate

        Request Body:
        {"terminationType":"GRACEFUL","gracefulTerminationTimeout":120,"additionalParams":{"lcmInterfaces":[]}}

      Note: If the NF has Termination Workflows, add appropriate sections in the lcmInterfaces section of the Request Body.

      v) Execute the Delete NF API:

        DELETE /telco/api/vnflcm/v2/vnf_instances/NF_INSTANCE_ID

    • b) Edit the Postgres database > Vnf table to rectify the NF instance with the correct _link.

      Note: You can contact VMware support for further assistance.

Workflows

  • Issue 3263361: Copying multiple files at various steps of a v3 schema workflow reports an error

    Copying multiple files at various steps of a v3 schema workflow reports the error “Unable to get attachment fileName to file location destinationFilePath”.

    Workaround:

    Use v2 schema workflows to copy multiple files at various steps of a workflow.

  • Issue 3145966: Clicking the Workflow Name under Workflow Executions redirects the user to the Workflows tab in the VNF catalog

    Clicking the Workflow Name under Workflow Executions redirects the user to the Workflows tab in the VNF/NS catalog even though the workflow definition is available under the Workflows catalog.

  • Issue 3152700: RBAC Workflows do not have the privilege of Tags

    A user with Workflow Design privileges cannot design a workflow due to the unavailability of the Tags privilege.

    Workaround:

    Associate the workflow design privilege with the Network Function/Network Service catalog privileges, the Network Function/Network Service instance privileges, or any other privilege that includes the Tags privilege.

  • Issue 3146011: No error is displayed when a standalone workflow is uploaded to the standalone workflow catalog without an attachment

    When a standalone workflow is uploaded to the standalone workflow catalog and the attachment referenced in the workflow definition is missing, no error is displayed.

    Workaround:

    Ensure that you upload the attachments that are referenced in the workflow definition.

Security Fixes

  • STIG hardening for nodes: sshd on the Photon operating system is configured to use FIPS 140-2 ciphers (see the example below).

  • STIG hardening is enabled for airgap server operating systems.
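
For reference, a FIPS 140-2 cipher restriction of this kind typically appears in /etc/ssh/sshd_config as a Ciphers directive similar to the following (the exact cipher list depends on the applied STIG baseline):

  Ciphers aes256-ctr,aes192-ctr,aes128-ctr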
