Problems can occur within different layers of the VMware Telco Cloud Service Assurance stack. The symptoms can manifest as application unavailability, incorrect operations, or degraded performance. These problems can occur during deployment as well as afterward, during operation.
What to read next
Elasticsearch-Kibana troubleshooting If Elasticsearch fails in the middle of Kibana initialization, the failure is critical to Kibana. You must manually remove the index because it is left in a broken state.
VMware Telco Cloud Service Assurance Installation issue VMware Telco Cloud Service Assurance installation is triggered by running the tcxctl CLI. For information on using tcxctl, see Using tcxctl Commands in the VMware Telco Cloud Service Assurance Deployment Guide.
Pod crashes after deployment The hdfs-datanode pod crashes after deployment.
Getting additional information for CNFs The VMware Telco Cloud Automation manager UI provides information about CNF instantiation.
Resolving edge services port conflict issue During initial deployment, the kafka-edge service assigns a random port to the ingress gateway in charge of exposing kafka-edge to external clients. In certain circumstances, this random port may conflict with a port assigned in another part of the deployment.
Support Bundle for offline troubleshooting Another way to gather troubleshooting information is to use the Application Support Bundle.
Service logs for troubleshooting Service logs are collected through the ELK pipeline and presented on the service logs page. You can search and explore the logs through the embedded Kibana log browser.
ElasticSearch data and the Events pods are crashing in longevity setup
Arango database cluster not reconciled
Postgres and dependent services not in reconciled state Sometimes Postgres and dependent services such as Keycloak, Grafana, Apiservice, Analytics-service, Alerting-rest, and Admin-api are not reconciled during the deployment.
VMware Telco Cloud Service Assurance user interface displays an error message After a successful login, the VMware Telco Cloud Service Assurance user interface intermittently displays an “Internal Server Error” message.
Flink service not reconciled The Flink service does not get reconciled after the AKS cluster on Azure is stopped and started.
Reconciliation is failing for some of the applications Reconciliation fails for some of the applications with the error etcdserver: leader changed.
VMware Telco Cloud Service Assurance pods are not coming up After a restart of seven worker nodes, the postgres pods are not coming up in VMware Telco Cloud Service Assurance.
Kafka-Strimzi and Kafka-Edge Service not Reconciled Sometimes the Kafka Strimzi service does not reconcile.
Cannot Deploy VMware Telco Cloud Service Assurance on AWS Unable to deploy VMware Telco Cloud Service Assurance on AWS due to an aws-load-balancer-controller issue.
Flink Manager and Topology Pods are Crashing after the Cluster Node VMs are Restarted
Clicking on the Topology Map or Topology Browser in VMware Telco Cloud Service Assurance is Throwing Error The following error occurs when using the Topology Map and Topology Browser in the VMware Telco Cloud Service Assurance UI: Database not found, response code 404, error code 1228.
Unable to reach the Kubernetes Cluster when the Control Plane Node 1 is completely Down The Kubernetes cluster cannot be reached through the KUBECONFIG file from outside the VM or the deployer host when control plane node 1 is completely down.
Demo Footprint of VMware Telco Cloud Service Assurance with Native Kubernetes Troubleshooting This chapter provides information about issues and solutions for deploying the Demo footprint of VMware Telco Cloud Service Assurance in VMs with Native Kubernetes.
Pod fails with ImagePullBackOff error After upgrading VMware Telco Cloud Service Assurance in a VM-based deployment, the istio-edge-ingressgateway pod fails with an ImagePullBackOff error.
Patching the CaaS Deployment Using the TCX CaaS Installer Patching refers to applying an update to one or more components of the CaaS deployment. The update can be a simple configuration change, such as changing the location of a log file, or a major upgrade of the Load Balancer, Storage interfaces, or the Harbor container registry. The CaaS installer is implemented using Ansible, so the patching process leverages the idempotency, failure resiliency, and ability to skip or select specific tasks provided by Ansible and its modules.
Multi-Attach error when the node VM shuts down during upgrade in VM Based deployment
Image not found error during upgrade in VM Based Deployment
Upgrading VMware Telco Cloud Service Assurance 2.3 to 2.4 on Azure Fails This section provides the troubleshooting procedure for the redis-cluster application failure that causes the upgrade of VMware Telco Cloud Service Assurance from 2.3 to 2.4 on Azure to fail.
VMware Telco Cloud Service Assurance Topology Troubleshooting
ImagePullBackOff Error Post Deployment of VMware Telco Cloud Service Assurance After deployment of VMware Telco Cloud Service Assurance, if the ImagePullBackOff error is observed for a few pods, perform the following procedure to resolve it.
SSL Handshake Error Post Upgrade If SSL Handshake errors occur and discovery fails in the Domain Managers while discovering VMware Aria Operations, VMware Telco Cloud Automation, Cisco ACI, VCD, and vIMS after upgrading the Domain Manager and VMware Telco Cloud Service Assurance core, perform the following steps.
Backup-and-restore and crds-tcx apps are not deleted from default namespace during upgrade While upgrading VMware Telco Cloud Service Assurance to 2.4, the backup-and-restore and crds-tcx apps are not deleted from the default namespace, and the following error is displayed.
vSphere-csi pod crashes when the Node VM gets restarted in VM Based Deployment The vSphere-csi pod crashes when the Node VM gets restarted in a VM Based Deployment. When a node VM gets restarted, the VRRP entry might be missing in that node, and this issue occurs.
crds-tcx-platform Application Fails to Reconcile or is Killed During Deployment If the crds-tcx-platform application fails to reconcile or is killed, you must increase the KAPP controller resource on TKG.