VMware Telco Cloud Service Assurance Reporting Troubleshooting

This topic describes common issues with VMware Telco Cloud Service Assurance Dashboards and Reports, along with their causes and solutions.


Issues	Cause	Solution
Unable to start Dashboards & Reports. The web page displays HTTP Status 500 - Internal Server Error.	The Grafana pod is not fully up and running.	Check the pod events and logs by running the appropriate command (for example, kubectl describe pod and kubectl logs for the Grafana pod). Resolve the reported issues and ensure that the Grafana pod is running.
Dashboards & Reports starts successfully, but the reports show a “Query error: 502” error.	The Elasticsearch pod is not fully up and running.	Check the pod events and logs by running the appropriate command (for example, kubectl describe pod and kubectl logs for the Elasticsearch pod). Resolve the reported issues and ensure that the Elasticsearch pod is running.
Grafana reports display “No data to show”, “No data”, or “No data points”.	No data exists in the Elasticsearch indices for that specific report.	Verify that data is being collected from the Smart Assurance domains or from the network by the metric collectors. Verify that the collected data is being ingested into the relevant indices in the Elasticsearch datastore. Note: If you are not sure which Elasticsearch indices the report queries, check the data source configured in the report “Query” field and look for the index pattern set in that data source (under Configuration > Data Sources).
A data discrepancy is observed between the Smart Assurance domains and the data shown in reports.	Objects might have been deleted or unmanaged in the Smart Assurance domains.	Elasticsearch stores time-series data. If objects have been deleted or unmanaged in the Smart Assurance domains or the network, adjust the time range (at the top-right corner of the Grafana dashboard) to see the correct data in reports.
A report displays an “Unknown elastic error response” error.	The time interval used to plot the report might be short while the number of records is considerably high. Because the default maximum bucket size in Elasticsearch is 10k, any query whose result goes beyond that limit can throw such errors.	Edit the report and change the time interval to 24h. Save the report and verify whether the error is fixed.
In some reports, the displayed count (primarily in reports that show object counts) is not 100% accurate.	This issue can occur in reports where the ‘Unique Count’ metric is used to calculate the number of unique records. The accuracy of ‘Unique Count’ in Elasticsearch depends on the precision threshold. The default precision threshold is 3k. When the unique count goes beyond that threshold, the result is approximate and cannot be 100% accurate.	To improve the accuracy of such reports, edit the report and increase the precision threshold value. Note: Increasing the precision threshold normally consumes more memory.
In some tabular reports, not all records are displayed.	The report might be configured to show only the top N records.	Edit the report and change the size to “No limit”.
In some reports, a “Failed to parse query [<field>]” error is displayed. For example, the message can indicate that the value passed for the field ‘entityType.keyword’ is empty.	In many reports, the query is based on the value selected from the drop-down menu (top left on the dashboard). If Elasticsearch does not have data for that report, the default value in the drop-down menu is shown as “None”, and Grafana uses that value to build the query, which causes the error.	Determine why the data is missing in Elasticsearch, and ensure that the necessary indices exist and that data is being ingested.
VMware Telco Cloud Service Assurance Health Status Pod reports display empty values for some pods. These values indicate that the pods ran for some time and consumed CPU and memory resources, but no longer exist.	The time picker control is hidden by default, and the default range of this report is Last 12h. The time picker is hidden because a narrow time window loads Elasticsearch with heavy queries.	To select a smaller range, click the Gear icon at the top right of the report, clear the Hide time picker option, and return to the report.
If a report variable does not have any values listed or selected when the report is opened, Grafana displays an error on the report.	NA	This is the default behavior of a Grafana report when an empty variable value is passed as part of the query while fetching the report. If a value exists but is not selected by default, select it to query the report data.
Dashboard and Report functionality in VMware Telco Cloud Service Assurance provides summary and granular views for the performance metrics and events.	NA	A summary view needs to query a large data set across a given Grafana data source to show the summarized global view for all device types and entity types. For example, the data center summary view needs to query metrics for all VirtualMachines and Hypervisors within a data center to show the summarized health of all virtual resources. Similarly, other summary views, such as Network Health & Inventory, need all device types and entity types to show the summarized global view of networks. A granular view needs to query a smaller set of data for the given Grafana data source with a set of filters, for example, the health trends of all VirtualMachines for a given Hypervisor, or the network health of the selected device type. In large deployments, a summary view query for a time interval above 1 day spans raw indices with a large data set. This slows the Elasticsearch read response, which in turn slows report rendering and sometimes causes timeout errors. In such scenarios, the workaround is to point the report to the hourly or daily index. This reduces the data set and still serves the purpose of the summary report, a summarized global view, for time intervals above 1 day. The drawback of the hour or day index is that reports do not show any data for time intervals shorter than an hour or a day, respectively. In that case, switch the data source back to the default raw index. Granular views with adequate filters do not have the slowness issue. Grafana data sources for the Hour and Day indices are pre-created in VMware Telco Cloud Service Assurance for all the supported metric types. To change the data source to point to the hour or day index: 1. Edit the report that is experiencing slowness or Elasticsearch timeout errors. 2. Select the hour or day index data source from the data source drop-down menu in the query panel. For example, if the query points to Network-Interface, select Hour-Network-Interface or Day-Network-Interface. 3. Save the report and refresh.
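
The “Unknown elastic error response” row in the table above comes down to simple arithmetic: a time-series report needs roughly one bucket per interval step across the selected time range, multiplied by the number of series plotted. The following Python sketch illustrates that estimate. It is an approximation for reasoning about the problem, not how Elasticsearch counts buckets internally. The function names and the series parameter are hypothetical, and the limit of 10,000 is taken from the 10k default quoted in the table.

```python
import math

# Assumption: the 10k default maximum bucket count quoted in the table above.
MAX_BUCKETS = 10_000

HOUR = 3600          # seconds in one hour
DAY = 24 * HOUR      # seconds in one day


def estimate_buckets(range_seconds: int, interval_seconds: int, series: int = 1) -> int:
    """Rough bucket estimate: time slots in the range times the number of series."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    slots = math.ceil(range_seconds / interval_seconds)
    return slots * series


def exceeds_limit(range_seconds: int, interval_seconds: int,
                  series: int = 1, limit: int = MAX_BUCKETS) -> bool:
    """True when the rough estimate goes beyond the assumed bucket limit."""
    return estimate_buckets(range_seconds, interval_seconds, series) > limit


if __name__ == "__main__":
    # A 7-day range plotted at a 1-minute interval versus a 24h interval.
    for label, interval in (("1m", 60), ("24h", DAY)):
        buckets = estimate_buckets(7 * DAY, interval)
        print(label, buckets, "over limit" if buckets > MAX_BUCKETS else "within limit")
```

Under these assumptions, a 7-day range at a 1-minute interval needs 10,080 buckets for a single series, which is over the limit, while the same range at a 24h interval needs only 7. This is why changing the interval to 24h, as the table suggests, can clear the error.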