VMware Telco Cloud Service Assurance 2.3.1 Release Notes

VMware Telco Cloud Service Assurance 2.3.1 | 29 FEB 2024

Check for additions and updates to these release notes.

About VMware Telco Cloud Service Assurance

VMware Telco Cloud Service Assurance is a real-time automated service assurance solution designed to holistically monitor and manage complex virtual and physical infrastructure and services end to end, from mobile core to the RAN to the edge. From a single pane of glass, VMware Telco Cloud Service Assurance provides cross‑domain, multi‑layer, automated assurance in a multi‑vendor and multi‑cloud environment. It provides operational intelligence to reduce complexity, perform rapid root cause analysis, and see how problems impact services and customers across all layers, lowering costs and improving customer experience.

For information about setting up and using VMware Telco Cloud Service Assurance, see the VMware Telco Cloud Service Assurance Documentation.

What's New

VMware Telco Cloud Service Assurance release 2.3.1 brings together various features and enhancements across platforms, networking, and virtual infrastructure management. This release introduces the following features:

The Notification Console is enhanced to let users reorder the Notification Details tabs from the notification panel based on their needs.
The Alerting feature now supports multiple threshold conditions, each mapped to a different alarm severity level.

Users can now create alarms with different severity conditions based on different threshold conditions or different time intervals for particular threshold levels.
Alarm creation now includes a Clear condition option, where a user can define the condition that clears an alarm in the same definition that triggers it.
User Defined Fields (UDF) in alarms now accept a combination of multiple properties, tags, and static data such as a description, so users can provide all the required details in one field. For example, a UDF in an Alarm Definition can read: Property (Router/Host) is located in Tag (tag.Location) with value Tag (tag.Details).
Alarm definitions now provide an Edit option for filters as well.
vCenter Collector is now available out of the box to collect performance data from configured vCenter endpoints.
EDAA is enabled by default in all the service definitions of the domain managers (IP, SAM, ESM, NPM, and MPLS).
SAM Console (Linux) is enhanced to launch in GUI mode (using a third-party GUI launcher).
New device certifications are added in the IP domain manager.
Block Storage is now supported in VM-based Deployment.
VM-based Deployment is now supported on RHEL 8.0.

Fixed Issues

Alarms and Anomaly does not allow static and user-defined metrics when defining alarms and anomalies.
An alarm is not generated with the range regex filter "<0-9>".

Known Issues

Unable to delete the Domain Setting credentials of INCHARGE-OI.

After the user deletes and adds a Smarts Integration, or edits the Smarts Integration name, the user is unable to delete the Domain Setting credentials of INCHARGE-OI.

Workaround:

Restart the required INCHARGE-OI with a different name, add the Domain Manager, and then add another set of Domain Setting credentials.
Grafana tables and graphs are not updated to the new tables; the edit widget still shows TABLE_OLD.

This has no functional impact. The tables show a warning message when the edit option is selected, and the user can still use the tables.
When users log in to the VMware Telco Cloud Service Assurance user interface, an OpenSearch exception appears when there is no data ingestion. This error disappears automatically after some time.
Kubernetes upgrade fails if the order of the Control Plane and Worker Nodes is changed in the vars.yaml file.

Workaround:

During the upgrade, retain the exact node order that was used in the base Kubernetes cluster deployment.
For collectors deployed on the "core" datacenter, the JVM Memory and Time Spent in GC reports do not appear.

Workaround:

There is no functional impact. A few health report widgets related to JVM Memory and Time Spent in GC are not shown only when the collector runs in Core. If the collector runs in RDC, these reports are shown.
After a Domain Manager upgrade, EDAA is not enabled by default.

After the Domain Managers are upgraded to 2.3.1.0, EDAA must be enabled in all DMs by default.

This is standard behavior: service definitions remain unchanged during upgrades to avoid modifying custom service definitions. In a fresh deployment, EDAA is enabled by default.
The vRAN discovery and monitoring pod names still contain a vRealize Operations (vROps) reference.
Service logs UI (Kibana) does not enforce dark theme.
The Kafka subscription and the monitoring are suspended post restart.

In a Failover environment, when only one Domain Manager is up and active and the user restarts that Domain Manager server, the Kafka subscription and the monitoring are suspended.

This applies to ESM and OI Servers where the user collects data through a Kafka subscription for features such as VMware Aria Operations discovery and VMware Telco Cloud Automation.

Workaround:

Set the flag suspendTopologyManager = FALSE in the bootend.conf file in the ESM and OI Server and then restart the server. Once the server is up, the monitoring and the Kafka subscription will be turned on.

Location of bootend.conf file in ESM Install: <ESM-Basedir>/smarts/local/conf/esm/bootend.conf

Location of bootend.conf file in OI Server: <SAM-Basedir>/smarts/local/conf/icoi/bootend.conf
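
The flag change in this workaround can be scripted. The following is a minimal sketch rather than a VMware-supplied procedure. It assumes that the flag appears in bootend.conf on a line of the form `suspendTopologyManager = TRUE`, and the function name `set_suspend_flag` is hypothetical. A `.bak` copy of the file is kept before the edit.

```shell
#!/bin/sh
# Hypothetical helper: set suspendTopologyManager to FALSE in a bootend.conf file.
# Assumption: the flag sits on a line such as "suspendTopologyManager = TRUE".
set_suspend_flag() {
    conf_file="$1"
    # -i.bak edits the file in place and keeps the original as <file>.bak.
    # Leading indentation and any text after the value are preserved.
    sed -i.bak -E \
        's/^([[:space:]]*suspendTopologyManager[[:space:]]*=[[:space:]]*)[A-Za-z]+/\1FALSE/' \
        "$conf_file"
}
```

Call the function with the bootend.conf path for the ESM or OI Server listed above, then restart the server as described in the workaround.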
Capability Discovery stops when the user moves to another tab while Capability Discovery is running.

Workaround:

The user must stay on the Capability Discovery tab until Capability Discovery completes.

If the user moves to another tab, Capability Discovery is aborted. To restart it, return to the Capability Discovery tab and click Discover Capabilities.
Anomaly definitions created in VMware Telco Cloud Service Assurance 2.2 are not available after an upgrade to 2.3.0.
New alert definitions created by the user in VMware Telco Cloud Service Assurance 2.2.0 are deleted after an upgrade to 2.3.0.
BGP alarm failures are seen in the correlation of alarms.
Binutils component check on RHEL 9.x is failing.

GDB is a debugging tool. It was upgraded to the latest version to resolve vulnerabilities in the older versions. The drawback is that the latest version is incompatible with the other OS versions. This applies to all domain manager products.

Workaround:

When required, GDB can be copied from /usr/bin/gdb on one of the other OS versions to the /bin location.
RabbitMQ has an error in the setup_rabbitmq.log file.
The following error appears in setup_rabbitmq.log:
```
Waiting for RabbitMQ to be available: ............................ OK
Merging default configuration: Failed!
204 No Content
Died at /opt/InCharge/SAM/smarts/perl/lib/setup_rabbitmq.pl line 207
```
Workaround:

This error is seen only after installation. To avoid it, restart the smarts-rabbitmq service.
Users cannot launch the Web Console with Java 1.8 update 351 and above.

Java updates 8.0.351 and above deprecate the use of SHA1-signed JARs and treat them as unsigned JARs.

Workaround:

Because the Web Console uses the system's Java, over which the product has limited control, use a lower version of Java on the client side or apply the workaround mentioned in the KB article. This workaround does not impact the product.
Airflow application is not getting reconciled.

During deployment, the Airflow application is not reconciled.
Workaround:

If application reconciliation fails with the error Failed with reason BackoffLimitExceeded, follow these steps to recover from the failure.

Delete the Airflow jobs by running the following commands, after which the application reconciliation should succeed.
```
kubectl delete job airflow-run-airflow-migrations
kubectl delete job airflow-create-user
```
All the classes and instances are not visible in Topology Explorer/Topology Maps.

When a large number of SAMs are added to VMware Telco Cloud Service Assurance, the classes and instances are not visible in Topology Explorer/Topology Maps.
Workaround:

Execute the following script from the ArangoDB coordinator pod:
```
kubectl get pods | grep arangodb-crdn
kubectl exec -it <arangodb crdn pod> -- bash
cd /opt/vmware/vsa/infra/install/Scripts
./viewUpdate.sh
```

Unable to recognize the base install during MPLS upgrade to 2.3.1.0.

This applies only when the upgrade is performed from version 10.1.7.

Workaround:

Open the /var/.com.zerog.registry.xml file on a machine where Smarts is installed and perform the following steps:

MPLS independent upgrade: Under the <product> tag, search for the old MPLS installation and modify the id from 28322b95-1f3b-11b2-bb9a-eb6ec3979369 to 28322b95-1f3b-11b2-bb9a-eb6ec3979370.

When NPM and MPLS installations are present in the same directory and the user wants to upgrade NPM and then MPLS: Add the following <product> tag content under the <products> tag, just before the MPLS upgrade:

<product name="MPLS" id="28322b95-1f3b-11b2-bb9a-eb6ec3979370" upgrade_id="db14898f-1f3a-11b2-a90c-eb6ec3979369" version="11.1.0.0" copyright="2019" info_url="VMware Inc" support_url="www.http://vmware.com" location="/opt/InCharge" last_modified="2022-07-29 05:02:08"><![CDATA[]]><vendor name="InstallAnywhere" id="2832bde7-1f3b-11b2-bb9a-eb6ec3979369" home_page="http://www.installanywhere.com" email="[email protected]"/><feature short_name="Application" name="Application" last_modified="2022-07-29 05:02:08"><![CDATA[This installs the application feature.]]><component ref_id="db1488e4-1f3a-11b2-a8c7-eb6ec3979369" version="1.0.0.0" location="/tmp/install.dir.4793/./devstat_err-javadoc.jar"/><component ref_id="db1488e3-1f3a-11b2-a8c8-eb6ec3979369" version="1.0.0.0" location="/opt/InCharge/MPLS/jre"/></feature></product>

When MPLS and NPM installations are present in the same directory and the user wants to upgrade MPLS and then NPM: Add the following <product> tag content under the <products> tag, just before the MPLS upgrade:

<product name="MPLS" id="28322b95-1f3b-11b2-bb9a-eb6ec3979370" upgrade_id="db14898f-1f3a-11b2-a90c-eb6ec3979369" version="11.1.0.0" copyright="2019" info_url="VMware Inc" support_url="www.http://vmware.com" location="/opt/InCharge" last_modified="2022-07-29 05:02:08"><![CDATA[]]><vendor name="InstallAnywhere" id="2832bde7-1f3b-11b2-bb9a-eb6ec3979369" home_page="http://www.installanywhere.com" email="[email protected]"/><feature short_name="Application" name="Application" last_modified="2022-07-29 05:02:08"><![CDATA[This installs the application feature.]]><component ref_id="db1488e4-1f3a-11b2-a8c7-eb6ec3979369" version="1.0.0.0" location="/tmp/install.dir.4793/./devstat_err-javadoc.jar"/><component ref_id="db1488e3-1f3a-11b2-a8c8-eb6ec3979369" version="1.0.0.0" location="/opt/InCharge/MPLS/jre"/></feature></product>

Add the following tag before the NPM upgrade, if it is not present:

<product name="NPM" id="28322b95-1f3b-11b2-bb9a-eb6ec3979369" upgrade_id="db14898f-1f3a-11b2-a90c-eb6ec3979369" version="11.1.0.0" copyright="2019" info_url="VMware Inc" support_url="www.http://vmware.com" location="/opt/InCharge" last_modified="2022-08-02 02:28:34"><![CDATA[]]><vendor name="InstallAnywhere" id="2832bde7-1f3b-11b2-bb9a-eb6ec3979369" home_page="http://www.installanywhere.com" email="[email protected]"/><feature short_name="Application" name="Application" last_modified="2022-08-02 02:28:34"><![CDATA[This installs the application feature.]]><component ref_id="db1488e4-1f3a-11b2-a8c7-eb6ec3979369" version="1.0.0.0" location="/tmp/install.dir.7209/./devstat_err-javadoc.jar"/><component ref_id="db1488e4-1f3a-11b2-a8c7-eb6ec3979369" version="1.0.0.0" location="/opt/InCharge/NPM/_uninst/uninstaller"/></feature></product>

Note:

Ensure that the 'location' and 'version' attributes are updated to match the existing Smarts deployment location and version.
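
For the MPLS independent upgrade case, the id change can be made with a targeted substitution instead of a manual edit. This is a minimal sketch under stated assumptions: the old MPLS entry is assumed to begin with `<product name="MPLS" id="…"`, matching the layout of the <product> tags shown above, and the function name `update_mpls_product_id` is hypothetical. The original file is kept as a `.bak` copy.

```shell
#!/bin/sh
# Hypothetical helper for the MPLS independent upgrade case only.
# Assumption: the old MPLS entry starts with <product name="MPLS" id="...",
# as in the <product> tags shown in this workaround.
OLD_ID='28322b95-1f3b-11b2-bb9a-eb6ec3979369'
NEW_ID='28322b95-1f3b-11b2-bb9a-eb6ec3979370'

update_mpls_product_id() {
    registry_file="$1"
    # Only the id that directly follows name="MPLS" is rewritten, so entries
    # for other products that carry the same id are left untouched.
    sed -i.bak -E \
        "s/(<product name=\"MPLS\" id=\")${OLD_ID}\"/\1${NEW_ID}\"/" \
        "$registry_file"
}
```

Run it against /var/.com.zerog.registry.xml and compare the result with the `.bak` copy before starting the upgrade. If the old entry uses a different name or attribute order, the substitution changes nothing and the manual edit still applies.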

Metric file names in filters are not segregating tags and properties.

While creating an alarm, when properties and tags of the incoming metric are the same, duplicate entries appear in the drop-down.

Workaround: Ensure that there are no duplicates in the properties and tags section of incoming metrics.
Adding Enrichment other than default causes the duplication of records in VMware Telco Cloud Service Assurance topics.

When a user adds new enrichers, VMware Telco Cloud Service Assurance duplicates the records, because all records go through the default enricher as well as the new one.
The Notification resulting from an alarm does not have the Ticket ID specified in the Notification definition.
Cloudify discovery does not work when complex DCF passwords are passed.
In the out-of-the-box cisco-aci collector, the devicetype/deviceName values are empty.

This is a legacy issue. No reports or functionality are impacted.
Warning message "java.io.FileNotFoundException" appears in snmp config collector logs.

No functional impact. Warning messages can be ignored.
Grafana reports are not populated with daily or weekly VMware Smart Assurance metrics indexes.
Workaround:
1. Select Configurations > Data Sources from the left side menu bar.
2. Click Add Data Source.
3. Select OpenSearch.
4. Enter a relevant name based on the metric type for which the weekly index is created, for example, Week-Network-Interface. Use http://nginx:8099/esdb-proxy/ as the HTTP URL. You can also refer to any other VMware Telco Cloud Service Assurance data sources.
5. Under Auth, check Skip TLS Verify and Forward OAuth Identity.
6. Enter the Index Name based on the metric type for which the weekly index is created. Check the OpenSearch DB for the availability of the indexes. For example, enter [vsametrics-week-networkinterface-]YYYY.MM and select Pattern "Monthly", or enter [vsametrics-week-networkinterface-]YYYY and select Pattern "Yearly".
7. Enter the Time Field Name timestamp and Version OpenSearch 1.0.x.
8. Retain the default values for the rest of the fields.
9. Click Save & Test.
10. In the corresponding report settings, change the default data source to the newly created data source. For example, open the report Home > Top 10 Bandwidth Utilization.
11. Navigate to Settings, and change the data source "Network-Interface" to "Week-Network-Interface".
12. Change metric "CurrentUtlization" to "CurrentUtlization.avg".
13. Save.
Exception appears in the user interface while adding more than 30 SAMs or Domain Managers at a time.

Sometimes the user interface times out with an exception while adding a large number of SAMs or Domain Managers in a single Smarts Integration Create Wizard flow.

Although the timeout happens, the collectors are created successfully. You can validate this by going to the details of the Smarts Integration.

Workaround: To avoid the timeout exception, add a limited number of SAMs or Domain Managers (around 30) in the initial Smarts Integration Create Wizard flow. Subsequently, add the additional SAMs or Domain Managers to the existing Smarts Integration.
The SDK REST Custom collector pod is spun up and stays in a still state when the REST simulator is down.

The SDK REST Custom collector is up and running after the simulator is up.

However, the spun-up REST pods that are in an error state remain in a still state. These pods do not consume any memory or CPU resources, and there is no functional impact.
After upgrade and migration, the ESM Server hangs.

The ESM Server hangs the first time it is started after an upgrade or migration from earlier versions. The server does not respond to any dmctl commands and the SAM Console freezes. You are unable to navigate in the SAM Console.

After the ESM Server is started for the first time following the upgrade or migration, stop it using dmquit, or kill the ESM Server.

Once the server is completely stopped, restart the ESM Server. After the server is restarted, the hang is no longer observed, and the dmctl commands start responding.
VMware Telco Cloud Service Assurance job Instantiation and Terminate status is not showing correct status in VMware Telco Cloud Automation user interface.

VMware Telco Cloud Service Assurance job Instantiation and Terminate status is shown as success, even though the VMware Telco Cloud Service Assurance deployment is in progress.
To check the VMware Telco Cloud Service Assurance job Instantiation and Terminate status, use the following kubectl command:
```
root [ ~/tcx-deployer/scripts ]# kubectl get tcxproduct
```
An error message appears during migration of IP, SAM, and ESM domain managers.
The following error message appears during migration of IP, SAM, and ESM, and .conflict files are created for sm_merge and version.pm:
```
----
Merge Process aborted
/opt/InCharge11/IP/smarts/local/bin/system/sm_merge: line 1: $'\177ELF\002\001\001': command not found
/opt/InCharge11/IP/smarts/local/bin/system/sm_merge: line 2: $'w\267P\343\316\301\024W\026': command not found
------
```
There is no functional impact, and the errors can be ignored. Delete the sm_merge.conflict and version.pm.conflict files before starting the server.
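
To locate the two .conflict files before removing them, a search like the one below may help. It is a sketch only: the function name `list_merge_conflict_files` is hypothetical, and the directory to search is an assumption that depends on where the domain manager is installed (the log above suggests a path under /opt/InCharge11/IP/smarts/local).

```shell
#!/bin/sh
# Hypothetical helper: list the .conflict files named in this issue.
# Assumption: they sit somewhere under the directory passed as the first argument.
list_merge_conflict_files() {
    find "$1" -type f \
        \( -name 'sm_merge.conflict' -o -name 'version.pm.conflict' \) -print
}
```

Review the listed paths and delete those files before starting the server.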
The log_level messages are displaying 'unknown' in Service logs (Kibana logs).

When the user navigates to Administration > Service Logs and clicks the application service logs, the log level filter displays 'unknown' fields for log_level messages for any service, for example, Apiservice, elasticsearch, and so on.
VMware Telco Cloud Service Assurance currently does not support connections to SAM server when Broker is configured in secure mode.

Currently there is no workaround. The Broker must be configured in non-authenticated mode.

Note: EDAA related operations including the Acknowledge, Ownership, Server Tools, Browse Details > Containment and Browse Details > Domain Manager are not supported when Broker is configured in secure mode.
When the number of hops of connectivity is increased, you may experience performance issues in the topology maps.

There might be performance issues in the rendering of Redundancy Group and SDN connectivity map types in the Map Explorer view. This issue is observed on deployments with a complex topology where the topology maps may stop working when the number of hops of connectivity is increased.
Broker failover is not supported in VMware Telco Cloud Service Assurance.

Primary Broker fails in the Domain manager failover environment.
Currently, when a Broker (multi-broker) failover happens, manual intervention is required: log in to VMware Telco Cloud Service Assurance and change the Broker IP address to point to the new Broker IP.

Procedure:
1. Go to https://<IP address of the Control Plane Node>.
2. Navigate to Administration > Configuration > Smarts Integration.
3. Delete the existing Smarts Integration Details.
4. Re-add the Smarts Integration Details by pointing it to secondary Broker.
Weekly indexes are not displayed while creating custom reports; only daily and hourly indexes are shown as part of reports.
Procedure for workaround:
1. Select Configurations > Data Sources from the left side menu bar.
2. Click Add Data Source.
3. Select Elasticsearch.
4. Enter a relevant name based on the metric type for which the weekly index needs to be created (for example, Week-Network-Interface) and the Elasticsearch HTTP URL http://elasticsearch:9200. You can also refer to any other VMware Telco Cloud Service Assurance data sources.
5. Enter the Index Name based on the metric type for which the weekly index needs to be created (for example, [vsametrics-week-networkinterface-]YYYY.MM) and select Pattern "Monthly".
6. Enter the Time Field Name timestamp and Version 7+.
7. Retain the default values for the rest of the fields.
8. Click Save & Test.
The notification count differs between SAM and the VMware Telco Cloud Service Assurance UI because notifications with the Owner field set to SYSTEM are not filtered out. By default, no filters are set in VMware Telco Cloud Service Assurance.
Manually apply a filter that excludes notifications whose Owner field is set to SYSTEM in the VMware Telco Cloud Service Assurance Notification Console window by following these steps:
1. Go to Default Notification Console.
2. Click Customize View.
3. Go to Filters and provide a Filter Set Name, for example, Filterout SYSTEM Notifications.
4. In the Filter section, add an attribute with the following condition:
  
  Property = Owner
  
  Expression = regex
  
  Value = ~(SYSTEM+)
5. Click Update.
Verify that the Default Notification Console has only those notifications whose owner is not set to SYSTEM. The default notification count must match between SAM and the VMware Telco Cloud Service Assurance UI.
The Containment, Browse Detail, and Notification Acknowledge/Unacknowledge operations do not work when the primary Tomcat server fails in an HA environment.

In a Failover deployment, when the primary Tomcat fails, the UI operations including the Notification Acknowledgement, Containment, Browse Detail, and Domain Managers fail.
When the primary Tomcat instance fails in a failover environment, then you can manually point the VMware Telco Cloud Service Assurance to a secondary Tomcat instance.

Procedure:
1. Go to https://<IP address of the Control Plane Node>.
2. Navigate to Administration > Configuration > Smarts Integration.
3. Delete the existing Smarts Integration Details.
4. Re-add the Smarts Integration Details by editing the EDAA URL and pointing it to the secondary Tomcat Instance.
The SAM server is getting listed in the Domain Manager section instead of Presentation SAM section.

During Smarts integration and configuration, INCHARGE SA (SAM server) is listed in the Domain Manager section. This problem occurs only when the SAM server is started in Non-EDAA Mode.

To have it listed under the Presentation SAM section, start the SAM server in EDAA Mode.
Users must discover ESX Servers to get the Virtual Machine Down event. Currently, the Virtual Machine Down event is not generated if the corresponding ESX Servers are not discovered in the IP Server. Therefore, it is recommended to discover Virtual Machines to get proper root cause events.
Topology synchronization is taking more than 10 minutes for 25K devices, when latency between SAM and VMware Telco Cloud Service Assurance is more than 5 milliseconds.

When the latency increases, the topology synchronization time increases.

Ensure that the latency between VMware Telco Cloud Service Assurance (Topology Collector) and SAM Presentation server is less than 5 milliseconds.
Notification processing rate is slower, when the latency between SAM and VMware Telco Cloud Service Assurance is greater than 5 milliseconds.

When the latency between VMware Telco Cloud Service Assurance and SAM Presentation server increases, notification processing rate goes down.

Ensure that the latency between VMware Telco Cloud Service Assurance (Notification Collector) and SAM Presentation server is less than 5 milliseconds.
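
One way to check the 5 millisecond guidance in the two latency issues above is to measure the round-trip time with ping from the node that runs the collector. The sketch below is illustrative: it treats ping's average round-trip time as the latency measure, which is an assumption, it expects a summary line of the form `rtt min/avg/max/mdev = …` (the variant with `round-trip` and `stddev` is parsed the same way), and the function names `avg_rtt_ms` and `latency_ok` are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers: read the average round-trip time from ping's summary
# line and compare it with the 5 ms guidance.
# Assumption: ping prints a summary such as
#   rtt min/avg/max/mdev = 0.045/0.052/0.061/0.006 ms

# Reads ping output on stdin and prints the average round-trip time in ms.
avg_rtt_ms() {
    awk -F'/' '/min\/avg\/max/ { print $5 }'
}

# Exits with status 0 when the given value is non-empty and below the limit
# (second argument, default 5).
latency_ok() {
    awk -v rtt="$1" -v limit="${2:-5}" \
        'BEGIN { exit !(rtt != "" && rtt + 0 < limit + 0) }'
}
```

For example, pipe the output of `ping -c 10` against the SAM Presentation server host into `avg_rtt_ms`, then pass the printed value to `latency_ok`.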