The release notes cover the following topics:

About VMware Telco Cloud Operations

VMware Telco Cloud Operations is a real-time automated service assurance solution designed to bridge the gap between the virtual and physical worlds. It provides holistic monitoring and network management across all layers for rapid insights, lower costs, and improved customer experience. Powered by machine learning (ML) capabilities, VMware Telco Cloud Operations automatically establishes dynamic performance baselines, identifies anomalies, and alerts operators when abnormal behavior is detected.

VMware Telco Cloud Operations simplifies the approach to data extraction, enrichment, and analysis of network data across multi-vendor environments into actionable notifications and alerts to manage the growing business needs of Telco in an SDN environment.

For information about setting up and using VMware Telco Cloud Operations, see the VMware Telco Cloud Operations Documentation.

What's New in this Release

VMware Telco Cloud Operations 1.3 introduces the following enhancements:

  • Refined Enrichment User Experience:

    • Refined wizard-type enrichment user interface to tag metrics, events, and topology data in VMware Telco Cloud Operations based on external data.

    • Simplified external data upload to allow the user to upload a CSV file containing external data through the enrichment user interface.

  • Kafka Collector and Mapper:

    • Provides the ability to consume infrastructure metric data into VMware Telco Cloud Operations through the Kafka open messaging interface.

    • Provides the ability to map consumed metric data to the VMware Telco Cloud Operations metric format for KPI computation, anomaly detection, reporting, and dashboarding.

  • VMware Telco Cloud Operations Services High Availability Support:

    • HA support has been introduced for the Event, Catalog, DM adapter, Esdb-proxy, and Persistence services.

  • SDWAN - VeloCloud support:

    • VeloCloud versions up to 4.2.0 are supported in this release.

For information about system requirements, hardware requirements, patch installation, and sizing guidelines, see the VMware Telco Cloud Operations Deployment Guide.

Resolved Issues

  • Enrichment stream name field is not editable.

    After you create an enrichment stream, the option to edit the stream name is not available.

Known Issues

  • Deployment might fail when you use the automated deployment tool.

    When you deploy VMware Telco Cloud Operations using the automated deployment tool, the deployment of the worker node may fail with the error: Failed to send data.

    Workaround: Modify the VCENTER_IP configuration parameter in the deploy.settings file to use the fully qualified domain name (FQDN). For more information about modifying the deploy.settings file, see the VMware Telco Cloud Operations Deployment Guide.
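
Before rerunning the deployment tool, it may help to confirm that the parameter no longer holds an IPv4 address. The following is a minimal sketch only: it assumes that deploy.settings stores the parameter as a VCENTER_IP=value line, which these release notes do not confirm, and the helper names is_ipv4, vcenter_value, and check_vcenter_setting are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the argument looks like a dotted-quad IPv4 address.
is_ipv4() {
    printf '%s\n' "$1" | grep -Eq '^([0-9]{1,3}\.){3}[0-9]{1,3}$'
}

# Hypothetical helper: print the value of VCENTER_IP from the settings file in $1.
# Assumes a KEY=VALUE line format (an assumption, not documented in these notes).
vcenter_value() {
    sed -n 's/^VCENTER_IP=//p' "$1" | tr -d '"' | head -n 1
}

# Report whether the settings file in $1 still uses an IP address.
check_vcenter_setting() {
    value=$(vcenter_value "$1")
    if is_ipv4 "$value"; then
        echo "VCENTER_IP is an IP address ($value); replace it with the vCenter FQDN"
    else
        echo "VCENTER_IP is set to: $value"
    fi
}
```

For example, after loading the functions, run check_vcenter_setting ./deploy.settings from the directory that contains the file.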

  • When the number of hops of connectivity is increased, you may experience performance issues in the topology maps.

    Performance issues might occur when rendering the Redundancy Group, MPLS, Metro-E, and SDN connectivity map types in the Map Explorer view. This issue is observed on deployments with a complex topology, where the topology maps may stop working when the number of hops of connectivity is increased.

  • VMware Telco Cloud Operations currently does not support connections to a SAM server with broker authentication, EDAA authentication, or Edge Kafka authentication.

    For a workaround, see the Security Recommendation section in the VMware Telco Cloud Operations Deployment Guide.

    Note: EDAA-related operations, including Acknowledge, Ownership, Server Tools, Browse Details > Containment, and Browse Details > Domain Manager, are not supported when the Smarts Broker is configured in secure mode.

  • Broker failover is not supported in VMware Telco Cloud Operations.

    Primary Broker fails in the Smart Assurance failover environment.

    Workaround: When a Broker (multi-broker) failover happens in Smart Assurance, manual intervention is required: log in to VMware Telco Cloud Operations and change the Broker IP address to point to the new Broker.
    Procedure:

    1. Go to https://<IP address of the Control Plane Node>.
    2. Navigate to Administration > Configuration > Smarts Integration.
    3. Delete the existing Smarts Integration Details.
    4. Re-add the Smarts Integration Details, pointing them to the secondary Broker.
  • Statistics - Tunnel reports for SDWAN display an unknown Elastic error if a specific device is not selected in the Edge filter.

    Workaround: To avoid the error, remove the ALL option for Edge.

    Procedure to disable the ALL option: Statistics Tunnel > Dashboard Settings > Variables > Edge > Disable the Include All option.

  • When Smarts is restarted without repos multiple times, the Viptela ControlNode controller status changes to the OTHER/UNKNOWN state.

    Workaround: Run the following command on the master node to delete the stale Viptela collectors:

    kubectl delete deployments.apps <viptela deployment app instance>
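
To find the value for the placeholder in the command above, the deployments can be listed first. This is a minimal sketch that assumes the stale collector deployments have "viptela" somewhere in their names; that naming convention is an assumption, and the helper name list_viptela_deployments is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: list deployments whose name contains "viptela".
# The naming convention is an assumption; adjust the pattern to your environment.
list_viptela_deployments() {
    # "-o name" prints one resource per line in the form deployment.apps/<name>.
    kubectl get deployments -o name | grep -i 'viptela'
}
```

Each line of output has the form deployment.apps/<name>. Review the list before deleting anything, because the pattern may also match deployments that are not stale.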
     

  • The VMware Telco Cloud Operations Health Status Pod report displays empty values for some pods. The empty values indicate that some pods ran for some time and consumed some CPU and memory resources, but no longer exist.

    Workaround: To select a smaller time range, click the Gear icon at the top right of the report, clear the Hide time picker option, and return to the report.

  • Weekly indexes are not displayed while creating custom reports. Only daily and hourly indexes are shown as part of the reports.

    Workaround:

    1. Select Configurations > Data Sources from the left side menu bar.
    2. Click Add Data Source.
    3. Select Elasticsearch.
    4. Enter a relevant name based on the metric type for which the weekly index needs to be created (for example, Week-Network-Interface), and enter the Elasticsearch HTTP URL as http://elasticsearch:9200. For reference, see any of the existing TCOps data sources.
    5. Enter the Index Name based on the metric type for which the weekly index needs to be created (for example, [vsametrics-week-networkinterface-]YYYY.MM) and select the Pattern "Monthly".
    6. Enter the Time Field Name timestamp and Version 7+.
    7. Keep the default values for the rest of the fields.
    8. Click Save & Test.
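
In the index name from step 5, the square brackets mark the literal prefix, and with the Monthly pattern the YYYY.MM tokens resolve to the year and the zero-padded month. The following sketch prints the resulting index name for a given month. The helper name monthly_index_name is hypothetical, and the prefix is the one from the example above.

```shell
#!/bin/sh
# Literal prefix taken from the example index name in step 5.
PREFIX="vsametrics-week-networkinterface-"

# Hypothetical helper: print the index name that the Monthly pattern
# [PREFIX]YYYY.MM resolves to. $1 = year, $2 = month without a leading zero.
monthly_index_name() {
    printf '%s%04d.%02d\n' "$PREFIX" "$1" "$2"
}

monthly_index_name 2021 4
# prints vsametrics-week-networkinterface-2021.04
```

If the Elasticsearch service is reachable from your shell (an assumption about your environment), a request such as curl 'http://elasticsearch:9200/_cat/indices/vsametrics-week-*?v' lists the indexes that match the prefix.
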
  • Notification count mismatch between SAM and the VMware Telco Cloud Operations UI because notifications with the Owner field set to SYSTEM are not filtered out. By default, no filters are set in TCOps.

    Workaround: Manually apply a filter in the VMware Telco Cloud Operations Notification Console window so that only notifications whose Owner field does not contain SYSTEM are displayed:

    1. Go to Default Notification Console.
    2. Click Customize View.
    3. Go to Filters and provide a Filter Set Name, for example, Filterout SYSTEM Notifications.
    4. In the Filter section, add an attribute with the following condition:

      Property = Owner

      Expression = regex

      Value = ~(SYSTEM+)

    5. Click Update.

    Verify that the Default Notification Console shows only notifications whose owner is not set to SYSTEM. The default notification count must match between SAM and the VMware Telco Cloud Operations UI.

  • Netflow-9 Statistics, Netflow-9 Trends, Netflow-5 Statistics, and Netflow-5 Trends reports display the error message "Failed to parse query" with the default time interval of 3 hours.

    Workaround: Select a smaller time interval, for example, 15 minutes, 30 minutes, or 1 hour.

  • When the Kafka server is configured with a wrong IP address or the Kafka node goes down during discovery, the VeloCloud discovery hangs for 20 minutes before exiting. This happens even when the messagePollTimeout of the VCO Access setting is set to a lower value.

    Workaround: In the esm-param.conf file, add the following line, replacing <kafka ip address> and <time in seconds>, and restart the server.

    MessagePollTimeoutPeriodInSeconds-<kafka ip address> <time in seconds>
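
As an illustration of the substitution, the sketch below builds the line for one Kafka address and appends it to a file only if no entry for that address exists yet. The helper names poll_timeout_line and add_poll_timeout are hypothetical, and the path to esm-param.conf is passed in by the caller because these notes do not state its location.

```shell
#!/bin/sh
# Hypothetical helper: print the esm-param.conf line for one Kafka address.
# $1 = Kafka IP address, $2 = timeout in seconds.
poll_timeout_line() {
    printf 'MessagePollTimeoutPeriodInSeconds-%s %s\n' "$1" "$2"
}

# Hypothetical helper: append the line to the file given as $3 unless an
# entry for the same Kafka address is already present.
add_poll_timeout() {
    if grep -qF "MessagePollTimeoutPeriodInSeconds-$1 " "$3" 2>/dev/null; then
        echo "entry for $1 already present in $3"
    else
        poll_timeout_line "$1" "$2" >> "$3"
    fi
}
```

For example, add_poll_timeout 10.0.0.5 60 /path/to/esm-param.conf, where the address, the timeout, and the path are placeholders. The server restart described in the workaround is still required afterwards.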

  • The Containment, Browse Detail, and Notification Acknowledge/Unacknowledge operations do not work when the primary Tomcat server fails in a Smart Assurance HA environment.

    In a Smart Assurance Failover deployment, when the primary Tomcat fails, the UI operations including the Notification Acknowledgement, Containment, Browse Detail, and Domain Managers fail.

    Workaround: When the primary Tomcat instance fails in a Smart Assurance failover environment, manually point VMware Telco Cloud Operations to a secondary Tomcat instance.
    Procedure:

    1. Go to https://<IP address of the Control Plane Node>.
    2. Navigate to Administration > Configuration > Smarts Integration.
    3. Delete the existing Smarts Integration Details.
    4. Re-add the Smarts Integration Details, editing the EDAA URL to point to the secondary Tomcat instance.
  • An error message appears in the Grafana report.

    When a user logs out from the Operational UI and tries to launch a report from the Grafana user interface, an error message appears.

    Workaround: Refresh or relaunch the Grafana UI to log out.

  • The SDWAN Flow Top N Summary report displays an error message.

    In the SDWAN Flow Top N Summary report, the Grafana Bar Gauge widget does not support substantial time intervals.

    Workaround: Set a smaller time range (for example, 24 hours) for the flow reports. To use a substantial time range instead, increase the query interval as follows:

    1. Click Edit from the report.
    2. Expand the Interval in the last row (Date Histogram) of the query, and set it to a higher interval (for example, 7d).
    3. Save the report.
  • The SAM server is listed in the Domain Manager section instead of the Presentation SAM section.

    During Smarts integration and configuration, INCHARGE SA (SAM server) is listed in the Domain Manager section. This problem occurs only when the SAM server is started in Non-EDAA Mode.

    Workaround: To have the SAM server listed under the Presentation SAM section, start it in EDAA Mode.

  • Disk usage is not mentioned in the VMware Telco Cloud Operations Health Status Node report.

    In the Health Status Node report, the disk usage panel does not indicate which Kubernetes cluster node (Controlplane, Arango, Elasticsearch, Domain Manager, Kafka, and so on) each disk usage value belongs to.

    Workaround: 

    1. Click Edit in the panel of Disk usage.
    2. Click Field tab.
    3. Click the Display Name and No value fields (you do not need to enter any value).

      Node names appear.

    4. Click Save and Apply.
  • Some of the DataCenter Summary reports take longer than usual to display.

    On a 100k footprint with 10 million records sent per polling cycle, the DataCenter reports take longer than usual to display.

    Workaround: Perform the following procedure on the report side:

    1. Reduce the default time interval from 24hr to 12hr or 6hr.
    2. If the issue persists, point the data source to the hourly index for the panel that shows the error.
  • The Topology pod is down because of a Redis service failure.

    In a 100k deployment, the Topology pod went down because of a Redis service failure, and notification sync in VMware Telco Cloud Operations was very slow.

    Workaround: Restart the Redis cluster and the dependent services. On the master node, perform the following steps:

    1. Scale down the events pods using the command: kubectl scale deployment <events_POD> --replicas=0
    2. Scale down the topology pods using the command: kubectl scale deployment <topology_POD> --replicas=0
    3. Delete the Redis deployment using the command: kubectl delete deployment redis
    4. Change to the /home/clusteradmin/kubernetes directory and recreate Redis:
      • kubectl apply -f redis.yaml
    5. Once Redis comes up, scale up the Topology and Events pods:
      • kubectl scale deployment <events_POD> --replicas=1
      • kubectl scale deployment <topology_POD> --replicas=1
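
The five steps above can be collected into one function, sketched below under stated assumptions. The function name restart_redis and the KUBECTL override are hypothetical additions, and the sketch assumes that the recreated deployment is named redis (as in step 3) so that kubectl rollout status can wait for it. Run it from /home/clusteradmin/kubernetes so that redis.yaml is found.

```shell
#!/bin/sh
# KUBECTL can be overridden (for example, KUBECTL=echo) to preview the commands.
KUBECTL=${KUBECTL:-kubectl}

# $1 = events deployment name, $2 = topology deployment name
restart_redis() {
    events=$1
    topology=$2

    # Steps 1-3: stop the dependent services, then remove Redis.
    $KUBECTL scale deployment "$events" --replicas=0 || return 1
    $KUBECTL scale deployment "$topology" --replicas=0 || return 1
    $KUBECTL delete deployment redis || return 1

    # Step 4: recreate Redis from the manifest in the current directory.
    $KUBECTL apply -f redis.yaml || return 1

    # Wait for Redis before scaling the dependent services back up.
    # Assumes the recreated deployment is named "redis", as in step 3.
    $KUBECTL rollout status deployment/redis --timeout=300s || return 1

    # Step 5: start the dependent services again.
    $KUBECTL scale deployment "$events" --replicas=1 || return 1
    $KUBECTL scale deployment "$topology" --replicas=1
}
```

Setting KUBECTL=echo before calling restart_redis with the two deployment names prints the seven commands without running them, which is a cheap way to review the sequence first.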
  • Security Vulnerability

    CVE-2021-3449 -- An OpenSSL TLS server may crash if sent a maliciously crafted renegotiation ClientHello message from a client. If a TLSv1.2 renegotiation ClientHello omits the signature_algorithms extension (where it was present in the initial ClientHello), but includes a signature_algorithms_cert extension then a NULL pointer dereference will result, leading to a crash.

  • When the arangoworker node hosting the Flink services (Job Manager/Task Manager) goes down, ingestion of Topology, Metrics, and Events might not work correctly.

    Flink services are not deployed in HA mode. If an arangoworker node goes down, the enrichment service might not be fully operational, which causes the ingestion services to stop processing until the node is restored.

    Workaround: Bring up the arangoworker node, and restart the enrichment streams after the node is up. For more information, see the VMware Telco Cloud Operations Troubleshooting Guide.

  • An authorization error message appears as HTML code when the user does not have Grafana edit permission.

    When a Role is created for a user with only the "Dashboard & Reporting" view permission, and the user attempts to edit any Dashboard or Reporting settings in Grafana, the authorization error appears in HTML format.

  • A workaround must be applied to a VMware Telco Cloud Operations 1.2 cluster before installing the VMware Telco Cloud Operations 1.3 update.

    A defect in VMware Telco Cloud Operations 1.2 prevents installation of the VMware Telco Cloud Operations 1.3 patch, unless the following workaround is applied.

    Workaround: Extend expiry of the patcher account on all nodes in the VMware Telco Cloud Operations cluster. Run the following script on the Control Plane Node:

    #!/bin/sh
    # Extend the expiry of the patcher account on every node in the cluster.
    
    if [ -z "$SSH_PASSWORD" ]; then
        echo "Please set the SSH_PASSWORD environment variable to the root password"
        exit 1
    fi
    
    # Column 6 of "kubectl get nodes -o wide" is INTERNAL-IP.
    for ip in $(kubectl get nodes -o wide --no-headers | awk '{print $6}'); do
        sshpass -p "$SSH_PASSWORD" ssh -o StrictHostKeyChecking=no "root@$ip" \
            "chage -m 0 -M 99999 -I -1 -E -1 patcher"
    done
    
    

    The root password must be the same on all nodes for this script to work, and it must be exported as SSH_PASSWORD in the environment before you run the script.

    For example, if the script was created in /tmp as file 'unexpire.sh', and the password was 'rootpassword', the script must be run as:

    # export SSH_PASSWORD=rootpassword
    # /tmp/unexpire.sh
  • On 50k and 100k footprint deployments, disk space may be exhausted on the arangoworker nodes because checkpoint files are not removed.

    In some situations, checkpoint files used in the stream processing service are not removed when no longer required. This can eventually result in disk space exhaustion on some of the arangoworker nodes, which can lead to stream processing tasks, such as KPI computation and enrichment, failing.

    Workaround: Unwanted checkpoint files must be removed.

    If disk exhaustion occurs, execute the following script on all arangoworker nodes:

    echo "Cleaning up old checkpoints..."
    CHECKPOINT_PATH="/var/vmware/flink/checkpoints"
    if [ -d $CHECKPOINT_PATH ]; then
        JOB_IDS=$(ls $CHECKPOINT_PATH)
        if [ ! -z "$JOB_IDS" ]; then
            for JOB_ID in $JOB_IDS; do
                echo "cleaning up job $JOB_ID"
                if [ "$(ls -A $CHECKPOINT_PATH/$JOB_ID)" ]; then
                    find $CHECKPOINT_PATH/$JOB_ID/* -maxdepth 0 -mtime +1 -exec rm -rf {} \;
                fi
            done
        fi
    fi
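
Before running the cleanup, it can be useful to preview which entries qualify. The sketch below lists them without deleting anything; the helper name list_stale_checkpoints is hypothetical, and the default path is the one used in the script above.

```shell
#!/bin/sh
# Hypothetical helper: list checkpoint entries matched by -mtime +1 without
# deleting them. $1 = checkpoint root; defaults to the cleanup script's path.
list_stale_checkpoints() {
    root=${1:-/var/vmware/flink/checkpoints}
    [ -d "$root" ] || return 0
    # Depth 2 corresponds to <root>/<job id>/<entry>, the level that the
    # cleanup script removes. Unlike the shell glob in that script, find
    # also reports hidden entries.
    find "$root" -mindepth 2 -maxdepth 2 -mtime +1 -print
}
```

Running df -h /var/vmware/flink before and after the cleanup shows how much space was reclaimed.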
    

    To avoid disk exhaustion occurring in the first place:

    1. SSH to the control plane node as the clusteradmin user.
    2. Change to the /home/clusteradmin/kubernetes directory.
    3. Create a file called flink-cleanup.yaml with the following content:
      apiVersion: apps/v1
      kind: DaemonSet
      metadata:
        name: flink-cleanup
        namespace: vmware-smarts
      spec:
        selector:
          matchLabels:
            run: flink-cleanup
        template:
          metadata:
            labels:
              run: flink-cleanup
          spec:
            containers:
            - name: flink-cleanup
              image: registry.cluster.omega.local:8443/omega/omega-patching-runner:1.3.0-9
              command: 
              - sh
              - "-c"
              - |
                echo "Starting flink cleanup script."
                while true; do 
      
                  echo "Cleaning up old checkpoints..."
                  CHECKPOINT_PATH="/var/vmware/flink/checkpoints"
                  if [ -d $CHECKPOINT_PATH ]; then
                    JOB_IDS=$(ls $CHECKPOINT_PATH)
                    if [ ! -z "$JOB_IDS" ]; then
                      for JOB_ID in $JOB_IDS
                      do
                          echo "cleaning up job $JOB_ID"
                          if [ "$(ls -A $CHECKPOINT_PATH/$JOB_ID)" ]; then
                            find $CHECKPOINT_PATH/$JOB_ID/* -maxdepth 0 -mtime +1 -exec rm -rf {} \;
                          fi
                      done
                    fi
                  fi
      
                  echo "going to sleep..."
                  sleep 6h;
                done
              volumeMounts:
              - name: flink-data
                mountPath: /var/vmware/flink
            nodeSelector:
              runin: arango
            volumes:
              - name: flink-data
                persistentVolumeClaim:
                    claimName: flink-pvc  
      
      
    4. Apply the configuration by running the command:
      kubectl apply -f flink-cleanup.yaml
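
After applying the configuration, the DaemonSet can be checked as sketched below. The namespace, DaemonSet name, and label come from the flink-cleanup.yaml content in step 3; the function name check_flink_cleanup is hypothetical.

```shell
#!/bin/sh
# Names below come from the flink-cleanup.yaml manifest in step 3.
NAMESPACE="vmware-smarts"

check_flink_cleanup() {
    # One pod is expected per node that matches the nodeSelector.
    kubectl get daemonset flink-cleanup -n "$NAMESPACE" || return 1
    # Recent output from the cleanup loop in the matching pods.
    kubectl logs -n "$NAMESPACE" -l run=flink-cleanup --tail=20
}
```

The log output should include the "Cleaning up old checkpoints..." and "going to sleep..." messages that the script in the manifest prints on each pass.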
  • Enrichment key content assist dropdown menu is displayed partially.

    In the Enricher configuration UI, when the user enters a backslash ( \ ) in the Enrichment Key field, the content assist dropdown menu is displayed partially. Only the first few characters of each property name are displayed.

    Workaround: The user can still click the partially displayed entry, or enter the first few characters after the backslash ( \ ) in the Enrichment Key field, upon which the entry is displayed completely in the Enrichment Key field with the correct syntax. Refer to the following lists for the names and descriptions of all the entries in the dropdown:

    For data type TCOps Metric and MnR metric, here are the property names in the same order of the dropdown list:

    Data Source: An IP address or a name that indicates the event data source
    Device Name: Name of the device where metric is collected
    Device Type: Type of the device where metric is collected
    Entity Name: Name of the entity on a device
    Entity Type: Type of the entity on a device
    Instance: Event instance
    Metric Type: Metric type under the event type
    Tags: Event tags
    Type: Type of the event, usually indicates the event type per vendor interface

    For data type TCOps Event, here are the property names in the same order of the dropdown list:

    Acknowledged: Indicates if this event has been acknowledged
    Active: Indicates if this event is currently active
    Category: Category of this event. The event category represents a broad categorization of the event, for example: availability vs. performance.
    Certainty: The certainty of this event.
    Class Display Name: Display name for the event class.
    Class Name: Class name of the object where this event occurred.  This attribute along with InstanceName and EventName uniquely identify this event.
    Clear On Acknowledge: Indicates if this event should be cleared when it is acknowledged. Set this to TRUE only for events that do not expire nor have sources that generate a clear.
    Closed At: ClosedAt
    Element Class Name: The class name of the topology element associated with the event in the repository where this event resides. This may or may not have the same value as ClassName.
    Element Name: The name of the topology element associated with the event in the repository where this event resides.  This may or may not have the same value as InstanceName.  The string is empty if there is no related element.
    Event Display Name: Display name for the event Name.
    Event Name: Name of the event. This attribute along with ClassName and InstanceName uniquely identify this event.
    Event State: The current state of this event. ACTIVE: The event is currently active. WAS_ACTIVE: The event was active, but we lost contact with the event source.  INACTIVE: The event is inactive.  UNINITIALIZED:  The event has not been notified yet; the object does not yet represent a notified event.
    Event Text: The textual representation of the event.
    Event Type: Indicates the nature of the event.  A MOMENTARY event has no duration.  An authentication failure is a good example. A DURABLE event has a period during which the event is active and after which the event is no longer active.  An example of a durable event is a link failure.
    First Notified At: First notification time
    Impact: A quantification of the impact of this event on the infrastructure and/or business processes.  There are no pre-defined semantics to the value of this attribute other than a larger numeric value indicates a larger impact.
    In Maintenance: Indicate if this event occurs during maintenance.
    Instance Display Name: Display name for the event instance.
    Instance Name: Instance name of the object where this event occurred.  This attribute along with ClassName and EventName uniquely identify this event.
    Is Problem: A notification is a problem when all of the original event types are PROBLEM or UNKNOWN.  There must be at least one PROBLEM, i.e. UNKNOWN by itself is not a problem.
    Is Root: Is this a root notification?
    Last Changed At: Time of last event change.
    Name: Name of object.
    Occurrence Count: The number of occurrences of this event starting from FirstNotifiedAt until LastNotifiedAt.
    Opened At: The number of occurrences of this event starting from FirstNotifiedAt until LastNotifiedAt.
    Owner: The name of the user that is responsible for handling this event.
    Polling State: The name of the user that is responsible for handling this event.
    Severity: An enumerated value that describes the severity of the event from the notifier's point of view: 1 - Critical is used to indicate action is needed NOW and the scope is broad, e.g. an outage to a critical resource. 2 - Major is used to indicate action is needed NOW. 3 - Minor should be used to indicate action is needed, but the situation is not serious at this time. 4 - Unknown indicates that the element is unreachable, disconnected or in an otherwise unknown state. 5 - Normal is used when an event is purely informational.
    Source: Source of this event.
    Source Domain Name: The name(s) of the domain(s) or domainGroups that originally diagnosed and notified, directly or indirectly, current occurrences of this event. If there is more than one original domain, the attribute lists each, separated by a comma. When the notification is cleared, the last clearing domain stays in the value.
    Source Event Type: The type(s) of the event(s), i.e. 'PROBLEM', 'EVENT', 'AGGREGATE', in the source domains that have notified current occurrences of this event. If there is more than one domain, the attribute lists each, separated by a comma, in the same order as SourceDomainName.
    Source Info: The number of occurrences of this event starting from FirstNotifiedAt until LastNotifiedAt.
    Source Specific: Source Specific.
    Trouble Ticket ID: Trouble ticket ID
    User Defined1: User defined field1.
    User Defined10: User defined field10.
    User Defined11: User defined field11.
    User Defined12: User defined field12.
    User Defined13: User defined field13.
    User Defined14: User defined field14.
    User Defined15: User defined field15.
    User Defined16: User defined field16.
    User Defined17: User defined field17.
    User Defined18: User defined field18.
    User Defined19: User defined field19.
    User Defined2: User defined field2.
    User Defined20: User defined field20.
    User Defined3: User defined field3.
    User Defined4: User defined field4.
    User Defined5: User defined field5.
    User Defined6: User defined field6.
    User Defined7: User defined field7.
    User Defined8: User defined field8.
    User Defined9: User defined field9.

    For data type TCOps Topology, here are the property names in the same order of the dropdown list:

    Action: Action of the topology record
    Collector Name: Name of the collector that collects the topology record
    Collector Type: Type of the Collector where the topology record is collected
    Creation Class Name: The name of the most-derived class of this instance
    Description: A textual description of the object
    Discovery ID: Discovery ID of the topology record
    Display Class Name: The string shown in the GUI when this object's class is displayed
    Display Name: The string shown in the GUI when this object's name is displayed.
    Force Refresh: Indicate whether to force refresh topology information
    Group Name: Group name of the topology record
    ID: ID of the topology record
    Initialized: Indicate whether the topology information is initialized
    Is Managed: The IsManaged attribute determines if an ICIM_ManagedSystemElement should be monitored by the management system. An unmanaged object will never have associated instrumentation. This attribute is read-only.
    Job ID: Job ID of the topology record
    Name: Name of the topology record
    Network Number: The network number (computed from Address and Netmask)
    Observer: Indicate whether this record is an observer
    Opened At: Timestamp when the topology instance was opened
    Service Name:  Name of external server used for imported events and instrumented attributes
    Source: Creation MIB source for this entity.
    System Name: The name of the ICIM_System containing this element.
    Type: Type of the topology record
    Value: Value of the topology record
