Problems can occur within different layers of the VMware Telco Cloud Service Assurance stack. The symptoms can manifest as application unavailability, incorrect operations, or degraded performance. These problems can occur during deployment as well as afterward, during operation.
Cannot connect to the Kubernetes cluster from the deployment container: Running kubectl get nodes --kubeconfig /root/.kube/<your-kubernetes-cluster-kubeconfig-file> from inside the deployment container sometimes hangs.
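Because the symptom is a hang rather than an error, it can help to bound the wait so that the check returns a clear result. The following is a minimal sketch, not a documented procedure: the helper name check_cluster, the KUBECTL override variable, and the 15-second default are assumptions introduced for illustration, and the sketch assumes the GNU coreutils timeout command is available inside the deployment container. The --request-timeout option is a standard kubectl global flag.

```shell
#!/usr/bin/env bash

# Hypothetical helper: run the connectivity check with a bounded wait.
#   $1 - path to the kubeconfig file
#   $2 - maximum wait in seconds (assumed default: 15)
# The KUBECTL variable is an assumption that lets you point at a different binary.
check_cluster() {
  local kubeconfig="$1"
  local wait_seconds="${2:-15}"

  # timeout stops the command if it hangs; --request-timeout bounds each API request.
  if timeout "${wait_seconds}" "${KUBECTL:-kubectl}" get nodes \
      --kubeconfig "${kubeconfig}" \
      --request-timeout="${wait_seconds}s"; then
    echo "cluster reachable"
  else
    echo "cluster unreachable or timed out after ${wait_seconds}s" >&2
    return 1
  fi
}
```

For example, check_cluster /root/.kube/<your-kubernetes-cluster-kubeconfig-file> 20 either lists the nodes and prints a confirmation, or returns a non-zero status after at most 20 seconds.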
Contents of the deployer bundle are not visible inside the container: Recreating the folder, either by extracting the tar.gz file again or by running mkdir, does not make the contents visible within the container. The contents may also not be visible if the tcx-deployer folder was deleted on the host.
Elasticsearch-Kibana troubleshooting: If Elasticsearch fails in the middle of Kibana initialization, the failure is critical to Kibana. You must manually remove the index because it is left in a broken state.
VMware Telco Cloud Service Assurance installation issue: The VMware Telco Cloud Service Assurance installation is triggered by running the tcx_app_deployment.sh script. This script runs two main stages: initialization and installation.
Pod crashes after deployment: Applies if the hdfs-datanode pod crashes after deployment.
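To confirm whether this entry applies, you can check the status of the hdfs-datanode pods. The following is a minimal sketch under stated assumptions: the helper name unhealthy_datanodes is hypothetical, and the column positions assume the default output of kubectl get pods --all-namespaces --no-headers (namespace, name, ready, status).

```shell
#!/usr/bin/env bash

# Hypothetical helper: read `kubectl get pods --all-namespaces --no-headers`
# output on stdin and print each hdfs-datanode pod whose STATUS is not Running.
# Assumed columns: $1 = namespace, $2 = pod name, $4 = status.
unhealthy_datanodes() {
  awk '$2 ~ /hdfs-datanode/ && $4 != "Running" { print $1 "/" $2 " " $4 }'
}

# Usage:
#   kubectl get pods --all-namespaces --no-headers | unhealthy_datanodes
```

An empty result means that every hdfs-datanode pod reports Running. For a pod that is listed, kubectl logs <pod-name> -n <namespace> --previous shows the output of the container instance that crashed.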
Getting additional information for CNFs: The VMware Telco Cloud Automation manager UI provides information about CNF instantiation.
Resolving edge services port conflict issue: During initial deployment, the kafka-edge service assigns a random port to the ingress gateway that exposes kafka-edge to external clients. In certain circumstances, this random port may conflict with a port assigned elsewhere in the deployment.
Support Bundle for offline troubleshooting: Another way to gather troubleshooting information is to use the Application Support Bundle.
Service logs for troubleshooting: Service logs are collected through the ELK pipeline and presented on the service logs page. You can search and explore the logs through the embedded Kibana log browser.
Elasticsearch data and the Events pods are crashing in a longevity setup
Arango database cluster not reconciled
Postgres and dependent services not reconciled: Sometimes Postgres and dependent services such as Keycloak, Grafana, Apiservice, Analytics-service, Alerting-rest, and Admin-api are not reconciled during deployment.
VMware Telco Cloud Service Assurance user interface displays an error message: After a successful login, the VMware Telco Cloud Service Assurance user interface intermittently displays an “Internal Server Error” message.
Flink service not reconciled: The Flink service is not reconciled after the AKS cluster on Azure is stopped and started.
Reconciliation is failing for some applications: Reconciliation fails for some applications with the error etcdserver: leader changed.
VMware Telco Cloud Service Assurance pods are not coming up: After a restart of seven worker nodes, the postgres pods do not come up in VMware Telco Cloud Service Assurance.
Kafka-Strimzi and Kafka-Edge service not reconciled: Sometimes the Kafka Strimzi service does not reconcile.