The release notes cover the following topics:

About VMware Telco Cloud Operations
What's New in this Release
Known Issues
Resolved Issues

About VMware Telco Cloud Operations

VMware Telco Cloud Operations is a real-time automated service assurance solution designed to bridge the gap between the virtual and physical worlds. It provides holistic monitoring and network management across all layers for rapid insights, lower costs, and improved customer experience. Powered by machine learning (ML) capabilities, VMware Telco Cloud Operations automatically establishes dynamic performance baselines, identifies anomalies, and alerts operators when abnormal behavior is detected.

VMware Telco Cloud Operations simplifies the extraction, enrichment, and analysis of network data across multi-vendor environments, turning that data into actionable notifications and alerts to manage the growing business needs of Telco in an SDN environment.

For information about setting up and using VMware Telco Cloud Operations, see the VMware Telco Cloud Operations Documentation.

What's New in this Release

VMware Telco Cloud Operations v1.4.0.1 introduces the following enhancement:

Improved protection against the Log4j vulnerabilities:
- Updated Apache Log4j to version 2.16 to resolve CVE-2021-44228 and CVE-2021-45046.
- Updated Apache Log4j to version 2.17.0 in the Elasticsearch component of VMware Smart Assurance Service Assurance Manager (SAM) to resolve CVE-2021-44228, CVE-2021-45105, and CVE-2021-45046.

For information about system requirements, hardware requirements, patch installation, and sizing guidelines, see the VMware Telco Cloud Operations Deployment Guide.

Resolved Issues

Remote code execution vulnerabilities CVE-2021-44228 and CVE-2021-45046 in Apache Log4j.
The security vulnerabilities https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2021-44228 and https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2021-45046 have been identified in affected components of VMware Telco Cloud Operations.

After upgrading to the 1.4.0.1 patch, complete the following procedure:

In the upgraded system, the administrator must start or redeploy any stream processing pipelines that were running before the upgrade. These include any Enrichment, KPI, Alerting, and Analytics stream definitions. From the Administration UI, navigate to the "Administration" tab and select the menu for the respective definitions.

Uninstalling the 1.4.0.1 patch is not recommended because it includes the fix for a critical security vulnerability in Apache Log4j.

Remote code execution vulnerabilities CVE-2021-44228, CVE-2021-45105, and CVE-2021-45046 in Apache Log4j on SAM (Elasticsearch module).
The security vulnerabilities https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2021-44228, https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2021-45105, and https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2021-45046 have been identified in affected components of VMware Smart Assurance Service Assurance Manager (SAM).

It is highly recommended to upgrade VMware Smart Assurance Service Assurance Manager (SAM) 10.1.5 to patch 10.1.5.5.
Uninstalling the VMware Smart Assurance Service Assurance Manager (SAM) 10.1.5.5 patch is not recommended because it includes the fix for a critical security vulnerability in Apache Log4j.

Note: For more information about installing and uninstalling the patch, see the VMware Smart Assurance 10.1.5.5 GA Patch Release Notes.

Known Issues

The known issues are grouped as follows.

Known issues from previous release
Known issues in 1.4.0

Known issues from previous release

Deployment may fail when you use the automated deployment tool.
When you deploy VMware Telco Cloud Operations using the automated deployment tool, the deployment of the worker node may fail with the error: Failed to send data.

Workaround: Modify the VCENTER_IP configuration parameter in the deploy.settings file to use the fully qualified domain name (FQDN). For more information about modifying the deploy.settings file, see the VMware Telco Cloud Operations Deployment Guide.
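As a rough illustration of this workaround, the following shell sketch checks whether VCENTER_IP still holds a literal IPv4 address. It assumes that deploy.settings stores the parameter as a VCENTER_IP=<value> line, which is an assumption about the file format. The helper names is_ipv4 and check_vcenter_setting are hypothetical and are not part of the deployment tool.

```shell
# Hypothetical helper: succeeds when the argument looks like a dotted IPv4 literal.
is_ipv4() {
  printf '%s\n' "$1" | grep -Eq '^[0-9]{1,3}(\.[0-9]{1,3}){3}$'
}

# Assumption: the settings file contains a line of the form VCENTER_IP=<value>.
check_vcenter_setting() {
  value=$(sed -n 's/^VCENTER_IP=//p' "$1" | tr -d '"' | head -n 1)
  if is_ipv4 "$value"; then
    echo "VCENTER_IP is an IP address ($value); replace it with the vCenter FQDN"
    return 1
  fi
  echo "VCENTER_IP is set to $value"
}

# Usage (run from the directory that contains the file):
#   check_vcenter_setting deploy.settings
```

The check only reports the current value. Editing deploy.settings itself is left to the procedure in the Deployment Guide.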

VMware Telco Cloud Operations currently does not support connections to the SAM server with broker authentication, EDAA authentication, and Edge Kafka authentication.

For a workaround, see the Security Recommendation section in the VMware Telco Cloud Operations Deployment Guide.

Note: EDAA-related operations, including Acknowledge, Ownership, Server Tools, Browse Details > Containment, and Browse Details > Domain Manager, are not supported when the Smarts Broker is configured in secure mode.

When the number of hops of connectivity is increased, you may experience performance issues in the topology maps.
There might be performance issues in the rendering of Redundancy Group, MPLS, Metro-E, and SDN connectivity map types in the Map Explorer view. This issue is observed on deployments with a complex topology, where the topology maps may stop working when the number of hops of connectivity is increased.

Broker failover is not supported in VMware Telco Cloud Operations.
This issue occurs when the primary Broker fails in a Smart Assurance failover environment.

Workaround: When a Broker (multi-broker) failover occurs in Smart Assurance, log in to VMware Telco Cloud Operations and manually change the Broker IP address to point to the new Broker.
Procedure:
1. Go to https://<IP address of the control plane node>.
2. Navigate to Administration > Configuration > Smarts Integration.
3. Delete the existing Smarts Integration Details.
4. Re-add the Smarts Integration Details, pointing them to the secondary Broker.

The Statistics - Tunnel report for SDWAN displays an unknown Elastic error if a specific device is not selected in the Edge filter.

Workaround: To avoid the error, remove the ALL option for Edge.

Procedure to disable the ALL option: Statistics Tunnel > Dashboard Settings > Variables > Edge > disable the Include All option.

When Smarts is restarted without repos multiple times, the Viptela ControlNode controller status changes to the OTHER/UNKNOWN state.

Workaround: Run the following command on the control plane node to delete the stale Viptela collectors:

kubectl delete deployments.apps <viptela deployment app instance>
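To find the deployment names to pass to that command, a listing can be filtered first. The sketch below assumes that the stale collector deployments contain "viptela" in their names, which is an assumption. The helper name list_viptela_deployments is hypothetical. Depending on the cluster setup, a namespace flag such as -n vmware-smarts may also be needed.

```shell
# Hypothetical helper: keep only deployment names that mention "viptela".
# Reads the output of `kubectl get deployments.apps -o name` on stdin.
list_viptela_deployments() {
  grep -i 'viptela'
}

# Usage on the control plane node:
#   kubectl get deployments.apps -o name | list_viptela_deployments
# Each printed name (deployment.apps/<name>) can then be reviewed and deleted:
#   kubectl delete deployment.apps/<name>
```

Reviewing the list before deleting keeps unrelated deployments from being removed by mistake.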

The VMware Telco Cloud Operations Health Status Pod report displays empty values for some pods. Empty values indicate that the pods ran for some time and consumed CPU and memory resources, but no longer exist.

Workaround: To select a smaller time range, click the Gear icon at the top right of the reports, clear the Hide time picker option, and return to the reports.

Weekly indexes are not displayed while creating custom reports. Only daily and hourly indexes are shown in reports.

Workaround:
1. Select Configurations > Data Sources from the left side menu bar.
2. Click Add Data Source.
3. Select Elasticsearch.
4. Enter a relevant name based on the metric type for which the weekly index needs to be created (for example, Week-Network-Interface) and set the Elasticsearch HTTP URL to http://elasticsearch:9200. Refer to any other VMware Telco Cloud Operations data source as an example.
5. Enter the Index Name based on the metric type for which the weekly index needs to be created (for example, [vsametrics-week-networkinterface-]YYYY.MM) and select the Pattern "Monthly".
6. Enter the Time Field Name timestamp and Version 7+.
7. Keep the rest of the fields at their default values.
8. Click Save & Test.
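To see what the index name from step 5 resolves to, the sketch below expands a prefix and a date into a concrete index name. With the Monthly pattern, Grafana treats the bracketed part as a literal prefix and substitutes the year and month. The function name monthly_index_name is hypothetical, and the commented curl check assumes it runs somewhere the elasticsearch host name resolves, for example inside the cluster.

```shell
# Hypothetical helper: expand "<prefix>YYYY.MM" for a date given as YYYY-MM-DD.
monthly_index_name() {
  prefix="$1"
  year_month=$(printf '%s\n' "$2" | sed 's/^\([0-9]\{4\}\)-\([0-9]\{2\}\).*$/\1.\2/')
  printf '%s%s\n' "$prefix" "$year_month"
}

# Prints: vsametrics-week-networkinterface-2021.12
monthly_index_name "vsametrics-week-networkinterface-" "2021-12-20"

# Optional check that matching indexes exist (assumes the host name resolves):
# curl -s "http://elasticsearch:9200/_cat/indices/vsametrics-week-networkinterface-*?v"
```

If the listing returns no rows, the data source test in step 8 is likely to find no matching index either.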

The notification count does not match between SAM and the VMware Telco Cloud Operations UI because notifications with the Owner field set to SYSTEM are not filtered out. By default, no filters are set in VMware Telco Cloud Operations.

Workaround: In the VMware Telco Cloud Operations Notification Console window, manually apply a filter that removes notifications whose Owner field contains SYSTEM:
1. Go to Default Notification Console.
2. Click Customize View.
3. Go to Filters and provide a Filter Set Name, for example Filterout SYSTEM Notifications.
4. In the Filter section, add an attribute with the following condition:
  Property = Owner
  Expression = regex
  Value = ~(SYSTEM+)
5. Click Update.
Verify that the Default Notification Console shows only notifications whose owner is not set to SYSTEM. The default notification count must match between SAM and the VMware Telco Cloud Operations UI.

The Netflow-9 Statistics, Netflow-9 Trends, Netflow-5 Statistics, and Netflow-5 Trends reports display the error message "Failed to parse query" with the default time interval of 3 hours.

Workaround: Select a smaller time interval, for example 15 minutes, 30 minutes, or 1 hour.

When the Kafka server is configured with a wrong IP address or the Kafka node goes down during discovery, the Velocloud discovery hangs for 20 minutes before exiting. This is the case even when the messagePollTimeout of the VCO Access setting is set to a lower value.

Workaround: In the esm-param.conf file, add the following line, replacing <kafka ip address> and <time in seconds>, and restart the server.

MessagePollTimeoutPeriodInSeconds-<kafka ip address> <time in seconds>
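For a concrete instance of that line, the sketch below fills in the two placeholders. The address 192.0.2.10 and the 60-second value are illustrative only, and the helper name poll_timeout_line is hypothetical. The location of esm-param.conf is not given here, so the path in the usage comment stays a placeholder.

```shell
# Hypothetical helper: format the esm-param.conf entry for one Kafka address.
# $1 = Kafka IP address, $2 = timeout in seconds
poll_timeout_line() {
  printf 'MessagePollTimeoutPeriodInSeconds-%s %s\n' "$1" "$2"
}

# Prints: MessagePollTimeoutPeriodInSeconds-192.0.2.10 60
poll_timeout_line 192.0.2.10 60

# Usage: append the entry to the file, then restart the server.
#   poll_timeout_line 192.0.2.10 60 >> <path to>/esm-param.conf
```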

Containment, Browse Detail, and Notification Acknowledge/Unacknowledge do not work when the primary Tomcat server fails in a Smart Assurance HA environment.
In a Smart Assurance failover deployment, when the primary Tomcat fails, UI operations including Notification Acknowledgement, Containment, Browse Detail, and Domain Managers fail.

Workaround: When the primary Tomcat instance fails in a Smart Assurance failover environment, manually point VMware Telco Cloud Operations to a secondary Tomcat instance.
Procedure:
1. Go to https://<IP address of the control plane node>.
2. Navigate to Administration > Configuration > Smarts Integration.
3. Delete the existing Smarts Integration Details.
4. Re-add the Smarts Integration Details, editing the EDAA URL to point to the secondary Tomcat instance.

An error message appears in the Grafana report.
When a user logs out from the Operational UI and tries to launch a report from the Grafana user interface, an error message appears.

Workaround: Refresh or relaunch the Grafana UI to log out.

The SDWAN Flow Top N Summary report displays an error message.
For the SDWAN Flow Top N Summary report, the Grafana Bar Gauge widget does not support substantial time intervals.

Workaround: Set a smaller time interval (24 hours) for the flow reports. To use a substantial time interval, follow this procedure:
1. Click Edit from the report.
2. Expand Interval in the last row (Date Histogram) of the query and set it to a higher interval, such as 7d.
3. Save the report.

The SAM server is listed in the Domain Manager section instead of the Presentation SAM section.
During Smarts integration and configuration, INCHARGE SA (SAM server) is listed in the Domain Manager section. This problem occurs only when the SAM server is started in Non-EDAA Mode.

Workaround: To list the SAM server under the Presentation SAM section, start the SAM server in EDAA Mode.

Disk usage is not attributed to a node in the VMware Telco Cloud Operations Health Status Node report.
In the Health Status Node report, the disk usage does not indicate which Kubernetes cluster node (Controlplane, Arango, Elasticsearch, Domain Manager, Kafka, and so on) it belongs to.

Workaround:
1. Click Edit in the Disk usage panel.
2. Click the Field tab.
3. Click the Display Name and No value fields (no need to enter any value).
  Node names appear.
4. Click Save and Apply.

Some of the DataCenter Summary reports take a long time to display.
On a 100k footprint with 10 million records sent per polling cycle, the DataCenter reports take longer than usual to display.

Workaround: Perform the following procedure on the report side:
1. Reduce the default time interval from 24hr to 12hr or 6hr.
2. If the issue persists, point the data source to the hourly index for the panel that shows the error.

The Topology pod is down due to a Redis service failure.
In one of the 100k deployments, the Topology pod is down due to a Redis service failure, and notification sync in VMware Telco Cloud Operations is very slow.

Workaround: Use the following procedure to restart the Redis cluster and its dependent services. On the control plane node, perform these steps:
1. Scale down the events pods using the command: kubectl scale deployment <events_POD> --replicas=0
2. Scale down the topology pods using the command: kubectl scale deployment <topology_POD> --replicas=0
3. Delete the Redis deployment using the command: kubectl delete deployment redis
4. Change to the /home/clusteradmin/kubernetes directory and apply the Redis manifest:
  - cd /home/clusteradmin/kubernetes
  - kubectl apply -f redis.yaml
5. Once Redis comes up, scale up the Topology and Events pods:
  - kubectl scale deployment <events_POD> --replicas=1
  - kubectl scale deployment <topology_POD> --replicas=1
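The steps above can be sketched as one shell function. This is a minimal sketch and not a supported script. The function names run and restart_redis and the DRY_RUN switch are hypothetical. The kubectl rollout status call is an added wait for the "once Redis comes up" condition and assumes Redis is recreated as a deployment named redis, as the delete command in step 3 suggests.

```shell
# Hypothetical wrapper: print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# $1 = events deployment name, $2 = topology deployment name
restart_redis() {
  events="$1"
  topology="$2"
  run kubectl scale deployment "$events" --replicas=0 &&
  run kubectl scale deployment "$topology" --replicas=0 &&
  run kubectl delete deployment redis &&
  run kubectl apply -f /home/clusteradmin/kubernetes/redis.yaml &&
  # Added step (assumption): wait until the new Redis deployment is ready.
  run kubectl rollout status deployment/redis --timeout=300s &&
  run kubectl scale deployment "$events" --replicas=1 &&
  run kubectl scale deployment "$topology" --replicas=1
}

# Preview the commands without touching the cluster:
#   DRY_RUN=1 restart_redis <events_POD> <topology_POD>
```

Chaining the commands with && stops the sequence at the first failure, so the pods are not scaled back up if Redis could not be redeployed.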

Security Vulnerability
CVE-2021-3449 -- An OpenSSL TLS server may crash if sent a maliciously crafted renegotiation ClientHello message from a client. If a TLSv1.2 renegotiation ClientHello omits the signature_algorithms extension (where it was present in the initial ClientHello), but includes a signature_algorithms_cert extension then a NULL pointer dereference will result, leading to a crash.

When the Arango worker node hosting the Flink services (Job Manager/Task Manager) goes down, ingestion of Topology, Metrics, and Events might not work correctly.
Flink services are not deployed in HA mode. If an Arango worker node goes down, the enrichment service might not be fully operational, which causes ingestion services to stop processing until the node is restored.

Workaround: Bring up the Arango worker node and restart the enrichment streams after the node is up. For more information, see the VMware Telco Cloud Operations Troubleshooting Guide.

An authorization error message appears as HTML code when a user does not have Grafana edit permission.
When a role is created for a user with only the "Dashboard & Reporting" view permission, and the user attempts to edit any Dashboard or Reporting settings in Grafana, the authorization error appears in HTML format.

Known issues in 1.4.0

User limitation for adding classes and relationships through the dynamic model
There is an existing limitation in VMware Telco Cloud Operations 1.4.0 that disallows the creation of more than 2048 collections. Based on the existing collection count, the behavior of the Topology Explorer becomes unpredictable when more than 35 collections are introduced.

Workaround: While adding classes and relationships through Dynamic Modelling, the cumulative count must not exceed 35.

After a patch upgrade, the Grafana reports landing page is not displayed.
Grafana reports are not accessible after a patch upgrade. This issue occurs intermittently.

Workaround: To view Grafana reports, re-run the deploy-reports.sh script in the Grafana pod using the following procedure:
1. Log in to the control plane node.
2. Find one Grafana pod using the command: kubectl -n vmware-smarts get pods | grep grafana
3. Log in to the Grafana pod using the command: kubectl -n vmware-smarts exec -it <POD_NAME> -- bash
4. Deploy reports using command: sh tco-reports/deploy-reports.sh
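Steps 2 to 4 can also be run without an interactive shell. The sketch below is one possible way to do that under stated assumptions. The helper name first_grafana_pod is hypothetical, and running the script directly through kubectl exec assumes the pod's default working directory is the same one an interactive bash session starts in.

```shell
# Hypothetical helper: pick the first pod whose line mentions "grafana".
# Reads the output of `kubectl -n vmware-smarts get pods` on stdin.
first_grafana_pod() {
  awk '/grafana/ { print $1; exit }'
}

# Usage on the control plane node:
#   pod=$(kubectl -n vmware-smarts get pods | first_grafana_pod)
#   kubectl -n vmware-smarts exec "$pod" -- sh tco-reports/deploy-reports.sh
```

If the relative script path is not found this way, fall back to the interactive procedure above.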