VMware Telco Cloud Automation 2.0 | 14 DEC 2021 | Build - VM-based: 19033650, HA-based: 19033599 | R148

Check for additions and updates to these release notes.

What's New

New Platform Interoperability

  • New Platform Interoperability

    VMware Telco Cloud Automation 2.0 brings interoperability support for the following:

    Product                          Supported Versions
    VMware vCenter Server            7.0u2d, 7.0u3
    VMware NSX-T                     3.1.3
    VMware Integrated OpenStack      7.1
    VMware Tanzu Kubernetes Grid     1.4 with Kubernetes versions 1.19.12, 1.20.8, and 1.21.2
    VMware Cloud Director            10.3
    VMware vRealize Orchestrator     8.4, 8.4.1, 8.4.2

    Note: You can download Photon BYOI templates for VMware Tanzu Kubernetes Grid 1.4 from the VMware Customer Connect site at https://customerconnect.vmware.com/. For steps to download Photon BYOI templates, see the Important Notes section.

Platform Usability and Deployability

  • VMware Telco Cloud Automation Control Plane Deployment on VMC (Phase One of VMware Telco Cloud Automation as a Service)

    VMware Telco Cloud Automation adds support for managing public clouds hosted on VMware Cloud on AWS as VIM endpoints. This enables users to deploy CaaS clusters and NFV workloads modeled as Network Functions on such public clouds.

  • Context Based UI Navigation

    VMware Telco Cloud Automation improves user experience by retaining the context of the search or filter within Network Functions and Network Services.

  • Cloud Native Re-Architecture

    VMware Telco Cloud Automation now supports a highly available, clustered mode install on Day 0 using VMware Tanzu Kubernetes Grid. This cloud-native platform provides VMware Telco Cloud Automation with improved active-active availability and sets the base for future scalability.

  • Airgapped Model Support

    VMware Telco Cloud Automation now provides an Airgap Server Setup reference guide and Ansible scripts for bootstrap and configuration automation. The guide describes a two-stage procedure: build the airgap server in an Internet-accessible lab, then import it into a fully isolated production environment.

  • License Enhancements

    Users can now see the CPU license usage along with the entitled licenses. When the usage crosses 90% of the entitlement, an alert is raised. License calculation is based on the managed vCPU counts, using a transformation factor to convert them to CPU licenses.

  • 7.0U3 Uptake for TCP RAN 1.5

    The ESXi 7.0U3 release, coupled with the new real-time Photon version 198-5 (or later), contains several optimizations for meeting the latency requirements of RAN workloads.

  • Platform Core Count Optimization

    With earlier releases of VMware Telco Cloud Automation, the core overhead was two physical cores for ESXi and two physical cores for Photon and Kubernetes. VMware Telco Cloud Automation 2.0 includes enhancements to the stack that bring down the minimum overhead to one physical core for ESXi and one physical core for Photon on a worker node VM.

Network Function and Network Service Automation

  • Updated SOL001 2.7.1 Compliance

    All new Network Functions designed within VMware Telco Cloud Automation 2.0 are compliant with SOL001 2.7.1. Existing and older Network Functions continue to work without any disruption. VMware Telco Cloud Automation automatically updates the SOL001 compliance version for older catalog items upon edit.

  • Workflow Designer and Editor

    Network Function and Network Service catalogs now include a rich UI-based Workflow editor. Users can create, edit, and view Workflows starting with VMware Telco Cloud Automation 2.0.

  • Editing Network Functions and Network Services

    Users have a new option of editing any Network Function or Network Service through the UI. Catalog items edited in this manner are automatically updated to the latest SOL compliance version (2.7.1). Starting with VMware Telco Cloud Automation 2.0, Network Functions and Network Services open by default in a read-only view.

  • Multiple Harbor repository configurations

    VMware Telco Cloud Automation 2.0 brings the ability to associate multiple Harbor repositories with any given CaaS cluster. Users can now select the corresponding repository from any of the Harbor systems associated with the CaaS cluster.

  • Cloud Director – Using vApp Templates as a VNF

    VNFs packaged as VMware Cloud Director vApp Templates can now be directly cloned to create new VNF instances.

  • Cloud Director – Select Storage Profile

    VMware Telco Cloud Automation enhances the instantiation flow of VNFs on VMware Cloud Director systems by allowing the user to select the specific Storage Profile on which they want to deploy their VNF.

  • Cloud Director – Leaner Inventory

    VMware Telco Cloud Automation improves its integration with VMware Cloud Director by reducing the amount of inventory being synced.

CaaS Automation

  • CaaS Clusters with Custom MTU and enhanced support for Secondary Networks

    Custom MTU values can be provided during CaaS cluster creation for both primary and secondary networks. Additionally, secondary interfaces can be modified for overriding the MTU or the underlying Network.

  • Enhanced Data Path support using NSX-T ENS

    CaaS Clusters and NFV workloads can now leverage superior network performance by utilizing underlying NSX-T ENS enabled networks.

  • Addition of secondary Multus networks via Late Binding

    VMware Telco Cloud Automation 2.0 enables users to add secondary network interfaces to their CaaS Workload Clusters using the Late Binding process as part of CNF instantiation.

  • Better Observability for CaaS Infrastructure

    VMware Telco Cloud Automation 2.0 enables users to track the health of various Kubernetes components by providing a deep-dive view for the CaaS Infrastructure directly within the UI.

  • Simplified upgrades for CaaS Infrastructure

    CaaS Management Cluster upgrades are completely decoupled and do not incur any downtime to NFV workloads running on CaaS Infrastructure.

  • Upgrade Add-on options for CaaS Clusters

    Users have the option to Upgrade Add-ons for their CaaS Clusters on demand. Cluster and CNF operations are now supported for CaaS Clusters deployed in previous releases of VMware Telco Cloud Automation without the need of upgrading the Clusters or the Add-ons within them.

Infrastructure Automation

  • Infrastructure Automation for vSphere 7.0U2 GA

    VMware Telco Cloud Automation supports the vSphere 7.0U2 release as part of Infrastructure Automation. Central sites (CDC) and regional sites (RDC) are based on this updated version of vSphere.

  • Tool for Patching ESXi-BIOS/Firmware Updates - Host Profiles

    Infrastructure Automation supports the Host Profiles feature to automate BIOS or Firmware configuration and the PCI functionality of ESXi hosts. Host Profiles are associated with Infrastructure Automation domains such as CDC, RDC, or cell site groups, and applied to hosts as part of the overall provisioning.

Technical Preview

  • Network Slicing in VMware Telco Cloud Automation (Tech Preview)

    Introducing a new layer of automation for the Telco Cloud – Project Lattice, 5G Network Slicing. 5G Network Slicing is a technology and an architecture that enables the creation of on-demand, isolated, and logical networks, running on a shared and common infrastructure, across clouds, cloud technologies, and network domains (stretching all the way from the core through transport and RAN terminating at the user equipment).

  • VMware Telco Cloud Automation Supports Amazon EKS as VIM (Tech Preview)

    VMware Telco Cloud Automation Infrastructure Automation now adds Amazon Elastic Kubernetes Service (Amazon EKS) as a public cloud VIM option. This addition extends the current VIM support for CNF onboarding, CNF instantiation, and CNF LCM operations on AWS.

Important Notes

  • Static IP Address Requirement for Kubernetes Control Plane

    A set of static virtual IP addresses must be available for all the clusters that you create, including both Management and Tanzu Kubernetes Grid clusters.

    • Every cluster that you deploy to vSphere requires one static IP address for Kube-Vip to use for the API server endpoint. You specify this static IP address when you deploy a management cluster. Make sure that these IP addresses are not in the DHCP range but are in the same subnet as the DHCP range. Before you deploy management clusters to vSphere, make a DHCP reservation for Kube-Vip on your DHCP server. Use an auto-generated MAC Address when you make the DHCP reservation for Kube-Vip so that the DHCP server does not assign this IP to other machines.
    • Each control plane node of every cluster that you deploy requires a static IP address. This includes both Management clusters and Tanzu Kubernetes Grid clusters. These static IP addresses are required in addition to the static IP address that you assign to Kube-Vip when you deploy a management cluster. To make the IP addresses that your DHCP server assigned to the control plane nodes static, you can configure a DHCP reservation for each control plane node in the cluster, after you deploy it. For instructions on how to configure DHCP reservations, see your DHCP server documentation.

    For more information, see the VMware Tanzu Kubernetes Grid 1.4.0 documentation at: https://docs.vmware.com/en/VMware-Tanzu-Kubernetes-Grid/1.4/vmware-tanzu-kubernetes-grid-14/GUID-mgmt-clusters-vsphere.html. A minimal configuration sketch follows.
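
    The sketch below shows how the reserved Kube-Vip address described above is typically supplied, assuming the standard Tanzu Kubernetes Grid 1.4 cluster configuration file format; the address is a placeholder:

     # Hypothetical excerpt from a TKG cluster configuration file.
     # VSPHERE_CONTROL_PLANE_ENDPOINT is the static Kube-Vip address described above;
     # it must be outside the DHCP range but in the same subnet.
     VSPHERE_CONTROL_PLANE_ENDPOINT: 10.0.0.50    # placeholder, use your reserved IP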

  • Operation Limitation

    After upgrading VMware Telco Cloud Automation to version 2.0, if you do not upgrade the clusters and cluster add-ons, changing the Harbor user name to include special characters does not work. A valid Harbor user name must match the regex '^[-._a-zA-Z0-9]+$' (see the check below). The change works only after you upgrade the clusters and cluster add-ons to 2.0.
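
    For example, a quick shell check of a candidate user name against this pattern (a sketch using standard grep; the names are placeholders):

     echo "harbor-user_01" | grep -E '^[-._a-zA-Z0-9]+$'   # matches: allowed
     echo "harbor@user"    | grep -E '^[-._a-zA-Z0-9]+$'   # no match: contains a special character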

  • Download Photon BYOI Templates for VMware Tanzu Kubernetes Grid

    To download Photon BYOI templates, perform the following steps:

    1. Go to the VMware Customer Connect site at https://customerconnect.vmware.com/.
    2. From the top menu, select Products and Accounts > All Products.
    3. In the All Downloads page, scroll down to VMware Telco Cloud Automation and click Download Product.
    4. In the Download VMware Telco Cloud Automation page, ensure that the version selected is 2.0.
    5. Click the Drivers & Tools tab.
    6. Expand the category VMware Telco Cloud Automation Photon BYOI Templates for TKG.
    7. Against Photon BYOI Templates for VMware Tanzu Kubernetes Grid 1.4.0, click Go To Downloads.
    8. In the Download Product page, download the appropriate Photon BYOI template.
  • Upgrading VMware Telco Cloud Automation

    Upgrade VMware Telco Cloud Automation Manager and all its registered VMware Telco Cloud Automation Control Plane (TCA-CP) nodes to version 2.0 together. After upgrading VMware Telco Cloud Automation Manager to version 2.0, and before performing any operations on the user interface, ensure that you upgrade all the registered TCA-CP nodes to version 2.0. Force-refresh the VMware Telco Cloud Automation Manager user interface after upgrading to version 2.0.

  • CaaS Pre-Upgrade Checklist

    Analyze the following points in the existing deployment:

    • Note down the deployment specifications for all the Kubernetes clusters that were deployed as part of VMware Telco Cloud Automation version 1.9.5 or earlier.
    • Placement - Compare the deployment and ensure that the virtual machine paths of the cluster such as resource pools and virtual machine folders exist. Also, ensure that the clusters are placed within the folders specified in the deployment specification.
    • Networks - Compare the deployment and ensure that the cluster virtual machines are attached to the correct networks that were selected during cluster deployment.
    • Storage - Verify that the minimum storage size for each Control-Plane and Worker node within the cluster is at least 30 GB (a spot-check sketch follows this list).
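
    One way to spot-check the disk size is from inside a node VM, assuming SSH access and standard Linux tools (a sketch, not a prescribed procedure):

     lsblk -d -o NAME,SIZE    # the node's root disk should be at least 30 GB
     df -h /                  # filesystem view of the same disk
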
  • CaaS Upgrades

    • Upgrade all the Management Clusters.
    • To ensure zero down time for any NFV workloads deployed on clusters, VMware Telco Cloud Automation decouples the upgrades between management clusters and workload clusters. For more information, see the VMware Telco Cloud Automation 2.0 User Guide.
    • Upgrade any workload cluster running on Kubernetes version 1.18 to the latest Kubernetes version.
    • VMware Telco Cloud Automation 2.0 supports all the operations on supported legacy workload clusters (deployed in version 1.9.5 and earlier). You can also upgrade the Add-ons on the clusters individually.
    • VMware Telco Cloud Automation 2.0 recommends upgrading workload clusters to their latest corresponding version.
  • Discontinuation of 20 GB BYOI Templates

    VMware Telco Cloud Automation has discontinued support for Kubernetes management and workload clusters that have Control Plane or worker nodes with a storage size of less than 30 GB. Subsequently, all the clusters that you deploy or upgrade would utilize only a single set of BYOI templates, which have a minimum disk size of 30 GB. If the storage size for the existing Control Plane and worker nodes is less than 30 GB, you must redeploy these clusters using the recommended set of BYOI templates.

  • Discontinuation of Kubernetes Version 1.18

    VMware Telco Cloud Automation, along with VMware Tanzu Kubernetes Grid, has discontinued support for Kubernetes management and workload clusters with version 1.18. It is mandatory to upgrade any 1.18 Kubernetes clusters to a newer version after upgrading to VMware Telco Cloud Automation 2.0.

  • Upgrade Path for Kubernetes Clusters with Version 1.17.

    Before upgrading to VMware Telco Cloud Automation 2.0, ensure that you upgrade Kubernetes from version 1.17.x to 1.18.x or later. Otherwise, there is no upgrade path for Kubernetes clusters with version 1.17.x.

Known Issues

Cluster Upgrade

  • Kubernetes upgrade fails or times out.

    The workload cluster upgrade times out because kube-vip floats to an unready control plane node. In such a scenario, the upgrade is blocked at this step.

    1. Reboot the unready control plane node.
    2. Check whether the pod status of the cluster shows Running.
    3. If the pod status of the cluster shows Running, retry the upgrade using VMware Telco Cloud Automation.

  • VMware vSphere CSI daemonset does not restart after a change in the CSI configuration, due to which nodes are not labeled with zone and region information and the topologyKeys fail to populate.

    The vSphere CSI daemonset fails to restart after a change in the CSI configuration. As a result, nodes are not labeled with zone and region information and the topologyKeys fail to populate.

    1. Log in to the Workload Cluster control plane.
    2. Apply the CSI parameters from VMware Telco Cloud Automation and ensure that the operation is complete in VMware Telco Cloud Automation.
    3. Restart the vSphere CSI node daemonset manually: kubectl rollout restart ds vsphere-csi-node -n kube-system.
    4. Check the node labels after around 2 minutes (see the check below).
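
    A hedged check for the labels, assuming standard kubectl access to the workload cluster (the exact label keys depend on the CSI/CPI version):

     kubectl get nodes --show-labels | grep -i -E 'zone|region'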

  • Cannot upgrade the workload cluster, and VMware Tanzu Kubernetes Grid displays a "no available upgrades for cluster" error.

    The upgrade path in the VMware Tanzu Kubernetes Grid release (TKR) custom resource is empty.

    1. Switch to the management cluster context.
    2. Get the name of the tkr-controller-manager pod.
    3. Restart the tkr-controller-manager pod of the management cluster.
    4. Get the status of the target TKR using the kubectl get command.
    5. Wait until the status of the target TKR becomes True.
    6. Check whether UpdatesAvailable on the target TKR shows an upgrade path.
    7. If it shows an upgrade path, retry the upgrade in VMware Telco Cloud Automation (a kubectl sketch follows).
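
    A sketch of these steps with kubectl, assuming the TKG 1.4 naming for the TKR resources and the tkr-controller-manager pod (names and namespaces may differ in your environment):

     kubectl config use-context <management-cluster-context>
     kubectl get pods -A | grep tkr-controller-manager                 # find the pod and its namespace
     kubectl delete pod <tkr-controller-manager-pod> -n <namespace>    # the pod is recreated automatically
     kubectl get tkr                                                   # inspect status and UpdatesAvailable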

Infrastructure Automation

  • No support for pre-deployed CDC or RDC in cloud native deployment.

    VMware Telco Cloud Automation 2.0 does not support pre-deployed CDC or RDC in a cloud native deployment.

    No workaround.

  • NTP server validation fails when adding a DNS server for a domain backed by several NTP servers (for example, through load balancing).

    If a specified DNS server address for a domain is backed by several NTP servers (for example, through load balancing), the NTP validation performed by CloudBuilder may fail.

    Do not add a DNS server address for a domain that is backed by several NTP servers.

  • Service install failure in cloud native deployment

    The Infrastructure Automation task Deploy TCA services on Kubernetes cluster or Deploy TCA-CP services on Kubernetes cluster can fail for several reasons.

    A few known errors, as seen in the Infrastructure Automation logs and UI, are:

    • Helm API failed: release: <service_name> not found
    • Helm API failed: release name <service_name> in namespace <namespace_name> is already in use
    • Release "service_name" failed: etcdserver: leader changed
    • Failed to deploy <service_name>-readiness

    Uninstall the failed service manually (see the sketch below) and perform Resynchronisation from Infrastructure Automation. For details on uninstalling a failed service, see the Troubleshooting section of the VMware Telco Cloud Deployment Guide.
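
    A hedged sketch of the manual cleanup, assuming Helm 3 is available against the affected cluster and using placeholder names from the errors above:

     helm list -n <namespace_name> --all                  # confirm the failed release and its status
     helm uninstall <service_name> -n <namespace_name>    # remove it before resynchronising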

  • Site pairing of VMware Telco Cloud Automation Control Plane with VMware Telco Cloud Automation on an RDC or CDC workload fails if VMware Telco Cloud Automation activation is not done on the CDC.

    The Infrastructure Automation task Configure TCA-CP services on Kubernetes cluster fails. The site-pairing issue can happen for several reasons. Some of the known errors are:

    • etcdserver leader change
    • etcdserver timeout
    • Socket-timeout issue

    Perform Resynchronisation from Infrastructure Automation.

  • Allows Get API calls on the tca-bootstrapper VM after migration.

    VMware Telco Cloud Automation allows Get API calls on the tca-bootstrapper VM after Infrastructure Automation successfully completes migration.

    No Workaround.

  • Allows the get/post/put/patch/delete API calls to VMware Telco Cloud Automation or VMware Telco Cloud Automation Control Plane in clusters after migration.

    VMware Telco Cloud Automation or VMware Telco Cloud Automation Control Plane in clusters allows get/post/put/patch/delete API calls after Infrastructure Automation successfully completes the migration.

    No Workaround.

  • RDC deployment fails with error TCA-CP Config Failed. 'port'

    At times, RDC deployment fails with error TCA-CP Config Failed. 'port'.

    vRealize Orchestrator (vRO) does not respond correctly. Due to this, the API does not receive the right version details.

    Reboot the guest operating system of the vRO VM and re-synchronize the RDC provisioning.

  • When you add a pre-deployed domain without using appliance overrides, the state of the domain shows Provisioned.

    When adding a pre-deployed CDC/RDC domain, if you do not specify the appliance override for the vCenter appliance, Infrastructure Automation does not validate the datacenter in the vCenter identified in the global settings, and the domain goes to the Provisioned state.

    Provisioning a child cell site group domain then fails because the parent domain does not contain the datacenter name.

    Delete the CSG domain and the bringup spec file of the parent pre-deployed domain. Update the parent domain with the correct vCenter appliance overrides and resync. Re-create the CSG domain.

  • Applying the host config profile task for a firmware upgrade fails with the error "503 Service Unavailable".

    In a Host Config Profile, providing a firmware location URL that does not have an exe or xml extension fails with a cryptic error.

    The operator webhook validation fails for such input.

    Edit the host config profile with the correct URL format and re-sync.

  • Not providing passwords for appliance overrides during domain modification succeeds, but provisioning fails.

    When making appliance overrides in CDC/RDC management or workload domains, you can choose not to provide the passwords for the appliances (Root, Admin, Audit) and save the changes. Infrastructure Automation accepts and saves the changes, but the provisioning of the domain fails with an unhelpful message.

    When you override the value of the password for any appliance, ensure that you provide a password.

  • A domain can be provisioned without any global settings, but provisioning does not proceed and the status is not conveyed in the Infrastructure Automation user interface.

    You can provision a domain without global settings. However, the provisioning does not happen and the Infrastructure Automation interface does not reflect the status.

    Do not provision a domain without configuring the global settings.

  • Cell site group provisioning with DNS server overrides does not work.

    The provisioning of a cell site group with DNS override enabled fails.

    Do not use DNS override in the cell site group.

  • Manual upload of certificate for CDC deployment in cloud native based deployments.

    In a cloud native environment, CDC deployment requires the user to manually upload the certificate for provisioning to start.

    Upload the certificate manually before you begin the CDC domain deployment.

  • You cannot update a provisioned domain due to a vSAN NFS node count mismatch.

    If you have more vSAN NFS nodes than hosts in a domain, the DNS lookup for the extra vSAN NFS node fails.

    If you have enabled vSAN NFS and you add or remove hosts in a CDC/RDC, update the vSAN NFS node count in the appliance overrides section.

Common

  • VMware Tanzu Kubernetes Grid cluster installation fails with error "message etcdserver: request timed out".

    This is an ETCD performance issue that occurs when disk or network latencies are too high. Error messages such as etcdserver: request timed out or etcdserver: leader changed are displayed in the internal Tanzu Kubernetes Grid services, and the cluster becomes unstable.

    Run a disk I/O benchmark with a tool such as fio to check the disk I/O performance. Check the network latency and tune the following two parameters according to the official ETCD tuning documentation: https://etcd.io/docs/v3.5/tuning/.

    heartbeat-interval    # default 100ms
    election-timeout      # default 1000ms

    There are two ways to tune these parameters:

    Method 1: For an existing deployment with ETCD pods, SSH to one of the control plane nodes of the cluster, edit the file /etc/kubernetes/manifests/etcd.yaml, and update the two parameters in the - command section to:

    - --heartbeat-interval=500
    - --election-timeout=5000
    

    The ETCD pods will restart with the new values.

    Method 2: On a new setup, before creating the cluster, add the following values to the file /root/.config/tanzu/tkg/providers/infrastructure-vsphere/v0.7.10/ytt/base-template.yaml on the TCA-CP VM or the Bootstrapper VM (a YAML sketch follows the values):

    Section of the file:

    .spec.kubeadmConfigSpec.clusterconfiguration.etcd.local.extraArgs

    Parameters and values:

    heartbeat-interval: '500'
    election-timeout: '5000'
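
    As a hedged illustration of where these arguments land within the KubeadmControlPlane section of base-template.yaml (surrounding fields are omitted and the exact structure depends on the template version):

     spec:
       kubeadmConfigSpec:
         clusterConfiguration:
           etcd:
             local:
               extraArgs:
                 heartbeat-interval: '500'
                 election-timeout: '5000'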

Cluster Automation

  • Cluster creation fails because kubelet crashes with "error: open /var/lib/kubelet/config.yaml: no such file or directory".

    When creating, upgrading, or scaling out a cluster, VMware Tanzu Kubernetes Grid creates new nodes. Sometimes a new node is stuck in the NotReady status, and error messages are captured in the /var/log/cloud-init-output.log file.

    1. Switch to the Management cluster context.
    2. Restart the following pods through kubectl delete pod (see the sketch below):
      • capi-kubeadm-bootstrapper-controller
      • capi-kubeadm-control-plane-controller
      • capi-controller-manager
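
    A sketch of the restart, assuming the controllers run in their default Cluster API namespaces (pod names and namespaces can differ per release, so locate them first):

     kubectl get pods -A | grep -E 'capi-kubeadm|capi-controller-manager'   # find pod names and namespaces
     kubectl delete pod <pod-name> -n <namespace>                           # the owning deployment recreates the pod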
  • VMware Tanzu Kubernetes Grid fails to delete the management cluster.

    If you force-delete the management cluster and create a new management cluster with same cluster name, the following error is displayed: "configuration validation failed: cluster name mgmt1 matches another management cluster".

    To remove the orphaned cluster:

    1. Delete the cluster node VMs from vCenter.
    2. SSH to the TCA-CP VM as the root user.
    3. Use the 'docker ps' command to find the VMware Tanzu Kubernetes Grid (TKG) kind cluster container.
    4. Use the 'docker rm -f <container id>' command to delete the TKG kind cluster container.
    5. Edit the /root/.config/tanzu/config.yaml file and delete the managementClusterOpts section for the orphaned management cluster:

    For example:

    [root@10 ~/.config/tanzu]# cat config.yaml
    apiVersion: config.tanzu.vmware.com/v1alpha1
    clientOptions:
      cli:
        repositories:
        - gcpPluginRepository:
            bucketName: tanzu-cli
            name: core
        - gcpPluginRepository:
            bucketName: tanzu-cli-tkg-plugins
            name: tkg
    current: mgmt-cluster-1-9-1
    kind: ClientConfig
    metadata:
      creationTimestamp: null
    servers:
    - managementClusterOpts:
        context: mgmt-cluster-1-9-1-admin@mgmt-cluster-1-9-1
        path: /root/.kube-tkg/config
      name: mgmt-cluster-1-9-1
      type: managementcluster
  • VMware Tanzu Kubernetes Grid cluster deletion hangs.

    VMware Tanzu Kubernetes Grid cluster deletion hangs and causes frequent vCenter deletion tasks because some nodes are on disconnected hosts.

    Remove the resources that are in the namespace to remove the cluster:

    1. Run the command: kubectl api-resources -o name --verbs=list --namespaced | xargs -n 1 kubectl get --show-kind --ignore-not-found -n {cluster name}

      This command fetches all the current resources that you want to delete.

    2. To edit each resource in the cluster namespace, run: kubectl edit {resource type} {resource name} -n {cluster name}
    3. To force-delete the resource, remove the finalizer section of the resource (see the sketch after this list).
    4. Remove the cluster namespace.
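
    A hedged alternative to interactive editing, using kubectl patch to clear the finalizers (resource type, name, and namespace are placeholders):

     # Clear the finalizers on a stuck resource so that it can be deleted:
     kubectl patch {resource type} {resource name} -n {cluster name} --type=merge -p '{"metadata":{"finalizers":[]}}'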
  • VMware Tanzu Kubernetes Grid cluster is in Configuring state and does not support the Put Addon operation.

    For an existing VMware Tanzu Kubernetes Grid cluster, if one of the nodes is in the NotReady state, the status of the cluster is Configuring and the cluster does not support the Put Addon operation. That is, it does not support the configuration of add-on plug-ins.

    1. Log in to the target cluster and run kubectl get node -o wide.
    2. Verify whether any node is in the NotReady state.
    3. If the node IP address cannot be accessed, find the reason from vCenter.
    4. If the node IP address can be accessed, log in to the node and check the status of the kubelet and containerd services (see the sketch below).
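
    A sketch of the service checks on the node, assuming SSH access and systemd (standard on Photon-based node images):

     systemctl status kubelet
     systemctl status containerd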

  • Node Pool customizations status is displayed as Unknown for workload clusters.

    After upgrading to VMware Telco Cloud Automation 2.0 and upgrading the management cluster, the Node Pool customizations status is displayed as Unknown for workload clusters on which the Upgrade Add-On is not run.

    Run the Upgrade Add-On option on the workload cluster.

  • VMware Telco Cloud Automation sends CPU pinning to ENS-prepared hosts, which may conflict with lcores reserved by ENS for packet processing.

    If isNumaConfigNeeded is present in the CSAR and you select a Non-ENS network, and the underlying ESXi host has any other pnic that is ENS-enabled, VMware Telco Cloud Automation tries to perform CPU pinning which may conflict with the lcores reserved for the ENS.

    Do not use isNumaConfigNeeded when hosts are connected to multiple DVS of which some DVS are ENS prepared.

  • Management cluster deletion fails.

    1. Create a Management cluster.
    2. Delete the Management cluster. The operation fails.
    3. Switch to the kind cluster context using kubectl and check the resources. The cluster delete operation hangs and the vspheremachine CR is not deleted.
    # kubectl --kubeconfig ~/.kubetkg/tmp config_<random chars> get cluster -n tkg-system
    NAME                PHASE
    nodepool-mc-smoke   Deleting
    # kubectl --kubeconfig ~/.kubetkg/tmp config_<random chars> get machine -n tkg-system
    No resources found in tkg-system namespace.
    # kubectl --kubeconfig ~/.kubetkg/tmp config_<random chars> get vspheremachine -n tkg-system
    NAME                         AGE
    nodepool-mc-worker-0-lgtgq   16h

    Delete the corresponding vspheremachine CRs manually.

    # kubectl --kubeconfig ~/.kubetkg/tmp config_<random chars> delete vspheremachine nodepool-mc-worker-0-lgtgq -n tkg-system
  • Management cluster upgrade from VMware Telco Cloud Automation 1.9.5.1 to 2.0 fails.

    Management cluster upgrade from VMware Telco Cloud Automation 1.9.5.1 to 2.0 fails sometimes due to "capi-webhook-service" connection refused.

    Retry the Management cluster upgrade. The upgrade works on subsequent retries.

  • CaaS cluster templates show older versions when no TCA-CP is connected.

    For a freshly deployed VMware Telco Cloud Automation environment that has no TCA-CP connected, the Kubernetes versions displayed are 1.18.17, 1.19.9, and 1.20.5.

    Add a TCA-CP (vSphere-based) as a VIM to VMware Telco Cloud Automation. The versions shown then align correctly with version 2.0: 1.19.12, 1.20.8, and 1.21.2.

  • No IP when a worker node is restarted by CNF customization or manually.

    The worker node IP is lost when the node VM is rebooted.

    Restart the CPI pod by running the kubectl command: kubectl rollout restart daemonset/vsphere-cloud-controller-manager -n kube-system.

Cloud Native

  • Cloud native TCA-CP only supports vCenter Server and vCloud Director infrastructure.

    Cloud native VMware Telco Cloud Automation Control Plane only supports vCenter Server and vCloud Director infrastructure.

    Platforms such as VMware Integrated OpenStack and Kubernetes are not supported even if you upgrade or migrate from older builds.

    No workaround.

  • Build version discrepancy between VM based and Cloud Native VMware Telco Cloud Automation.

    The build number in Cloud Native Platform Manager UI shows the Management Platform build instead of VMware Telco Cloud Automation build.

    No workaround.

  • Need to reserve the IP for TCA-Bootstrapper cluster for Day-X operations.

    As of version 2.0, any TCA-CP appliance that needs to support Kubernetes clusters (CaaS clusters) needs a bootstrapper cluster for any operation on the Management cluster, for example, creating a new Management cluster. This TCA-Bootstrapper cluster is created on Day-0 with a static IP, its manifest and details are stored in VMware Telco Cloud Automation, and it is then deleted. Later, on Day-X, for any operation on a CaaS Management cluster, the same details of the TCA-Bootstrapper cluster are used to recreate the TCA-Bootstrapper cluster and then perform the operation on the CaaS Management cluster.

    Consider a scenario where, on Day-X, the static IP of the TCA-Bootstrapper cluster's master node is being used by another VM. The recreation of the TCA-Bootstrapper cluster then fails and subsequently blocks any operations on any Management cluster.

    Reserve the static IP that is used to create the TCA-Bootstrapper cluster.

  • WLD domain deployment failure.

    WLD domain deployment fails during VMware Telco Cloud Automation Control Plane deployment with error Failed to deploy TCA CLUSTER 69247203-c518-4bdf-95e4-1e77cf3d078d. Reason: compatibility file (/root/.config/tanzu/tkg/compatibility/tkg-compatibility.yaml) already exists.

    The WLD domain deployment failure happens across all domains (CDC/RDC of management/workload).

    Run the following commands:

     # List management clusters to find the failed cluster ID:
     curl -k -XGET --user "admin:tca-password" "https://tca-ip-fqdn:9443/api/admin/clusters?clusterType=MANAGEMENT"
     # Delete the failed cluster by ID:
     curl -k -XDELETE --user "admin:tca-password" "https://tca-ip-fqdn:9443/api/admin/clusters/85f5b151-5a52-46e8-b1c0-c24ea1cfc956?clusterType=MANAGEMENT"
     # Check the deletion status (repeat as needed):
     curl -k -XGET --user "admin:tca-password" "https://tca-ip-fqdn:9443/api/admin/clusters/85f5b151-5a52-46e8-b1c0-c24ea1cfc956/status?clusterType=MANAGEMENT"
     curl -k -XGET --user "admin:tca-password" "https://tca-ip-fqdn:9443/api/admin/clusters/85f5b151-5a52-46e8-b1c0-c24ea1cfc956/status?clusterType=MANAGEMENT"
     # Force-delete if the normal delete does not complete:
     curl -k -XDELETE --user "admin:tca-password" "https://tca-ip-fqdn:9443/api/admin/clusters/85f5b151-5a52-46e8-b1c0-c24ea1cfc956?clusterType=MANAGEMENT&forcedDelete=true"
  • Tech support bundle does not contain CaaS cluster logs.

    CaaS logs are missing from the tech support bundle.

    Get the pod logs from your workload cluster using the following script. The script requires:

    • kubectl
    • the default kubeconfig set to the correct cluster
    • tar and gzip
    • write access to /tmp

    #!/bin/bash
     
    set -f
    IFS='
    '
    PODS=$(kubectl -n tca-system get po --all-namespaces -o custom-columns=NAMESPACE:.metadata.namespace,NAME:.metadata.name --no-headers)
    SUFFIX=$(echo $RANDOM)
    OPDIR=/tmp/cluster_logs_${SUFFIX}
    for POD in ${PODS}
    do
      IFS=' ' read -r -a array <<< "$POD"
      echo "Getting logs for pod ${array[1]} in namespace ${array[0]}"
      log=$(kubectl logs -n ${array[0]} ${array[1]} --all-containers)
      mkdir -p ${OPDIR}/${array[0]}
      echo "${log}" > ${OPDIR}/${array[0]}/${array[1]}
    done
     
    tar -C /tmp -cvzf /tmp/cluster_logs_${SUFFIX}.tar.gz cluster_logs_${SUFFIX}
    rm -rf ${OPDIR}
    set +f
    unset IFS

    This script gathers logs from all pods in all namespaces, then archives and compresses them into a single file.

  • Performance Monitoring for VDU does not list the data.

    The graphical view of the performance metrics for CPU, Network, Memory, and Virtual Disk is not present. The UI displays: No data available.

    No workaround.

  • No Mongo dump in Tech support bundle for TCA-CP only deployment.

    Based on the deployment type, change the input.yaml spec file.

  • Cloud native backup and restore is not fully operational in version 2.0.

    No workaround.

AirGap

  • The error "/photon-reps/updates/photon-updates/.repodata/ already exists!" is displayed.

    This issue may occur while building metadata for the Photon OS repository after all packages are synced locally. The createrepo command checks whether a temporary folder named .repodata exists under the repo folder. If the folder is found, createrepo assumes that another createrepo session is running (even when it is not) and exits.

    Remove the .repodata folder and all its contents, and retry the repo sync operation.

vCloud Director

  • VNF Heal fails for vApp having a VM in suspended state.

    In vCloud Director 10.3, the VNF Heal operation fails for vApp having a VM in suspended state.

    No workaround.

Appliance

  • Login to VMware Telco Cloud Automation fails after upgrading from version 1.9.1 to version 2.0.

    This is not reproducible and may occur in rare scenarios of upgrade.

    1. Log in to the VMware Telco Cloud Automation appliance.
    2. Switch to root and open the file /usr/local/apache2/conf/httpd.
    3. Look for duplicate entries of "Listen *:80" (see the sketch below).
    4. Delete the duplicate entries and keep only one entry for "Listen *:80".
    5. Start the httpd service using the command "systemctl start apache-httpd".
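
    A shell sketch of the check and restart, assuming SSH access to the appliance and the configuration file path given above:

     grep -n 'Listen' /usr/local/apache2/conf/httpd*   # look for duplicate Listen *:80 directives
     systemctl start apache-httpd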

License Computation

  • CNFs with no resource details in the Helm chart are treated as zero usage.

    The license computation engine relies on the presence of resource details in the Helm chart for calculating consumption.

    For example:

    spec:
      containers:
      - env:
        ...
        image: mysql:5.7.28
        imagePullPolicy: IfNotPresent
        livenessProbe:
          ...
        name: mysql-162-34480-ylks3
        ports:
        ...
        readinessProbe:
          ...
        # Resource definition for container
        resources:
          requests:
            cpu: 100m
            memory: 256Mi

    CNFs with no resource details in the Helm chart are treated as zero usage.

    As a best practice, specify the resources in the Helm charts.

CSI: multi-zone

  • CSI multi-zone configuration - multi-zone labels are not set on nodes.

    When multi-zone is enabled on a workload cluster, the cluster's multi-zone configuration is synchronized but the zone/region labels are not applied on the nodes. This can cause PV creation to fail. The root cause is that the CSI daemonset is not restarted after the configuration change.

    Disable multi-zone from the VMware Telco Cloud Automation UI or API, and enable it again after a while (see the note below).
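
    As a hedged alternative, the same daemonset restart used for the CSI issue in the Cluster Upgrade section applies the changed configuration:

     kubectl rollout restart ds vsphere-csi-node -n kube-system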

Workflow-Schema

  • Workflow Schema limitations.

    Limitations:

    1. Users must not add or delete fields in the steps (both inbindings and outbindings), except for vRO workflows. For vRO workflows, inbindings cannot be deleted but can be added.
    2. Every workflow must have at least one step at any point of time.
    3. Users must be careful while modifying the workflow JSON file and ensure that the changes they make are valid.
    4. Users must not make arbitrary modifications because the errors thrown are not descriptive.

    No workaround.

User Interface

  • VMware Telco Cloud Automation UI issues.

    At times, the UI does not load correctly.

    Perform a browser refresh.

Catalog Management

  • Onboarded NS virtual link issue.

    Editing an onboarded NS (that has VNFs within it) and changing the VNF's virtual link configuration does not work. The changes are lost and the connections to virtual links revert to their earlier state.

    No workaround.

  • Network Service upload limitation.

    While uploading an NS CSAR, ensure that the name given during onboarding is the same as the NS package name.

    Substitution Mappings issue:

    This issue is caused by uploading an NS package and assigning a new name that is different from the NS template node name used in NSD.yaml.

    Upload NS package with new name > Edit Catalog > Save & Update Package throws the error "Http failure response for https://tca-ip-fqdn/telco/api/nsd/v2/ns_descriptors/e77588f9-190d-4f72-b43b-573315616b7b/nsd_content/view/nsd_92464899-f3f3-4d95-8f55-12513c92ebbf/update: 400 OK"

    Clicking Save as New throws error “There is an error in NSD.yaml. unknown tag !<tag:yaml.org,2002:js/undefined> at line 16, column 62: ... g:yaml.org,2002:js/undefined> '' ^”.

    No Workaround.

  • Edit Catalog operation on an uploaded old or new NS CSAR shows errors.

    When you perform an Edit Catalog operation on an uploaded old or new NS CSAR, the following errors are displayed:

    • "Unable to find VNF Catalog with descriptor id 830307a9-fe29-4da7-adaa-26d812f15bc4" for VNFs inside the NS.
    • "JSONObject["nsId"] not found." for an NS inside an NS.

    Ignore the errors and proceed with editing the NS CSAR. The Update Package and Save As New operations work as expected.

CNF Lifecycle Management

  • CNF instances should not have Heal and Scale to Level in the list of available operations.

    In VMware Telco Cloud Automation 2.0, CNF instances have Heal and Scale to Level in the list of available operations. These two operations must be ignored because they are invalid for a CNF.

    Ignore the operations.

Kubernetes Bootstrapper

  • Pods crash after a node pool reboot.

    Following a node pool VM restart with VMware Telco Cloud Automation 2.0, some pods are observed to stay in the crash/unknown/pending state.

    Delete the pod to let it restart. If the issue is not resolved, recreate the worker node cluster.

  • The vmconfig-operator pod runs out of memory.

    Creating a worker node cluster fails with the error: failed calling webhook "defaulter.vmconfig.acm.vmware.com".

    1. On the Management cluster, verify whether the vmconfig-operator pod is running.
    2. If the vmconfig-operator pod was terminated because it ran out of memory, manually increase the pod memory limit to 460 MB on the Management cluster ("kubectl edit deploy vmconfig-operator -n tca-system"). See the sketch below.
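
    A hedged sketch of the relevant portion of the deployment after the edit (the surrounding fields and unit form are assumptions; only the 460 MB value comes from the guidance above):

     resources:
       limits:
         memory: 460Mi    # increased from the previous limit
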
  • The hostconfig service is in "Stopped" state after VMware Telco Cloud Automation Control Plane (TCA-CP) is installed. It fails to start from the TCA-CP web GUI.

    When you add a new host profile, the configuration may not get applied to the host. A re-sync operation may also not help apply the host profile.

    Root Cause:

    The hostconfig service is in the Stopped state and the host profile configuration is not synced, though the task shows as successful in the VMware Telco Cloud Automation GUI.

    Log in to TCA-CP and verify/start the hostconfig service.

  • When using the systemd cgroup driver in VMware Telco Cloud Automation, an issue in Kubernetes v1.21.2 freezes the control groups.

    When creating clusters using Kubernetes v1.21.2 with VMware Telco Cloud Automation, the kubelet freezes the control groups affecting the performance of real-time RAN workloads running in those clusters.

    Use Kubernetes v1.20.8 in VMware Telco Cloud Automation 2.0.

Resolved Issues

Upgrade

  • Slow download speed for the VMware Telco Cloud Automation upgrade package.

    Users experience slow download speed when trying to download any service update through Administration > System Upgrades.

    Download the upgrade bundle from the VMware Customer Connect portal and use the Appliance Management interface (port 9443) to upgrade the VMware Telco Cloud Automation appliance.

Security Fixes

The following security issues were fixed in VMware Telco Cloud Automation version 2.0.

Catalog Management

  • Upgraded Spring boot from 1.4.6 to 2.5.4

Base Image Package

  • Upgraded curl to the latest version in the baseimage.

  • Updated tcpdump from version 4.9.2 to the latest version.

  • OpenSSL upgraded to version 1.1.1l from 1.1.1k.

Vulnerability Testing

  • Upgraded Python from 3.6 to 3.9.2.

  • Enforced correct user permissions when accessing registered Partner Systems.

Security

  • Added support for TLSv1.3.

General

  • Upgraded JDK to the latest version, 11.0.13.

  • Updated apache httpd in VM-based and cloud native-based deployment.
