In this topic, you can find information about common issues and solutions for VMware Telco Cloud Operations deployment. During deployment, a few system-level services run automatically on each node (VM). These services ensure that the node can join the VMware Telco Cloud Operations cluster and start all the application services. After deployment, if VMware Telco Cloud Operations does not appear to function, the first step is to check the logs for these services on the node (control plane node or worker node).

  1. Ensure that VMware Telco Cloud Operations is deployed and started correctly:
    SSH into the control plane node as root and run the following commands.
    Note: If you cannot SSH into the control plane node VM, then start the VM console from the vCenter and log in to the node as the root user.

    If you log in as clusteradmin, run the su command to change to root.

    1. Determine if the VMware Telco Cloud Operations cluster has formed by running the command:

      # kubectl get nodes

      All the nodes must appear as “Ready”.

    2. Verify that all VMware Telco Cloud Operations services are running by running the command:

      # kubectl get pods --all-namespaces

      All pods must display "Running" under the STATUS column and "1/1" under the READY column.
      Note: It might take a few minutes for all services to reach this state.

    If the nodes or pods do not display the expected status, then it is likely that VMware Telco Cloud Operations did not deploy or start correctly. Follow the steps below to troubleshoot.
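
The two checks in step 1 can be scripted. The following is a minimal sketch and not part of the product: the function names not_ready_nodes and not_running_pods are hypothetical, and the column positions assume the default kubectl table output with the --no-headers flag. It treats a pod as ready when both numbers in the READY column match, which covers the "1/1" case described above.

```shell
#!/bin/sh
# Sketch: filter `kubectl get ... --no-headers` output read from stdin.
# Function names are illustrative, not part of VMware Telco Cloud Operations.

# Print the name of every node whose STATUS column is not exactly "Ready".
# Assumed columns: NAME STATUS ROLES AGE VERSION
not_ready_nodes() {
    awk '$2 != "Ready" { print $1 }'
}

# Print namespace/name, READY, and STATUS for every pod that is not
# "Running" or whose READY counts differ (for example 0/1).
# Assumed columns: NAMESPACE NAME READY STATUS RESTARTS AGE
not_running_pods() {
    awk '{
        split($3, ready, "/")
        if ($4 != "Running" || ready[1] != ready[2])
            print $1 "/" $2, $3, $4
    }'
}
```

With the functions defined in the root shell on the control plane node, they could be used as follows:

    # kubectl get nodes --no-headers | not_ready_nodes
    # kubectl get pods --all-namespaces --no-headers | not_running_pods

Empty output from both commands means that every node and pod is in the expected state.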

  2. Check whether the cluster deployment completed correctly. The cluster deployment process log is located on the control plane node. If the deployment fails, examine this log for errors. Access the log as follows:
    1. The control plane node VM must be deployed, that is, VM deployment must be complete.
    2. Log in (SSH) to the control plane node VM as user clusteradmin with the password you entered when you loaded the Node Configuration using the Administration UI. If you used the Deploy-Tool to complete the full deployment (that is, both phases), then the clusteradmin password is the one you set in the deploy.settings file.

      $ ssh clusteradmin@CONTROL_PLANE_NODE_VM_IP

    3. Change to root using the su command.

    4. Examine the logs in /var/log/deployServer/. Look for a log file whose name begins with tcoDeploy and check it for errors.
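
Searching the deployment log for errors can be done with grep. The following is a minimal sketch: the function name scan_deploy_logs is hypothetical, and the search pattern (error, exception, fail) is an assumption, because the exact wording of the log messages is not documented here.

```shell
#!/bin/sh
# Sketch: list suspicious lines in the deployment logs.
# The pattern below is an assumption; adjust it to the actual log wording.

# Print matching lines, prefixed with file name and line number, from every
# file in the given directory whose name begins with tcoDeploy.
# The directory defaults to /var/log/deployServer.
scan_deploy_logs() {
    log_dir=${1:-/var/log/deployServer}
    # /dev/null is added so grep prints the file name even for a single log.
    grep -inE 'error|exception|fail' "$log_dir"/tcoDeploy* /dev/null
}
```

Run as root on the control plane node, calling scan_deploy_logs without arguments searches the default directory. No output means that no line matched the pattern, which does not by itself prove that the deployment succeeded.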

  3. Indicators of an incorrect deployment or startup:
    1. If the node marked "control-plane-node" is not "Ready":

      Log in to the control plane node VM and run the following commands as root. Check the service logs for errors or exceptions.

      1. bootstrap-cluster:

        # journalctl -u bootstrap-cluster

      2. sa-registry:

        # journalctl -u docker-compose@sa-registry

      3. harbor-init:

        # journalctl -u harbor-init

    2. If a worker node is not "Ready":

      Log in to the worker node VM. Run the following command as root and verify that the service log does not have errors or exceptions.

      # journalctl -u bootstrap-cluster

    3. If the Administration UI is not available:

      If the Administration UI at https://CONTROL_PLANE_NODE_IP:1081 is not available, then log in to the control plane node VM. Run either of the following commands as root and verify that the service log does not have errors or exceptions:

      # journalctl -u docker-compose@admin.service
      OR
      # journalctl -u docker-compose@admin

    4. VMware Telco Cloud Operations services are not available:

      If the VMware Telco Cloud Operations services do not display "Running" and "1/1" in the STATUS and READY columns, then log in to the control plane node VM. Run the following command as root and verify that the service log does not have errors or exceptions:

      # journalctl -u tco-init

      Also, check the logs of the bootstrap-cluster, sa-registry, and harbor-init services as described earlier.
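
The symptom-to-service mapping in step 3 can be summarized in a small helper. The following is a minimal sketch: the function names units_for_symptom and journal_errors and the symptom keywords are hypothetical. The unit names are the ones used in the journalctl commands above, and the registry service is assumed to run under the docker-compose@sa-registry unit, as in the control plane step.

```shell
#!/bin/sh
# Sketch: map a symptom from step 3 to the systemd units to inspect.
# Symptom keywords and function names are illustrative only.

units_for_symptom() {
    case "$1" in
        control-plane) echo "bootstrap-cluster docker-compose@sa-registry harbor-init" ;;
        worker)        echo "bootstrap-cluster" ;;
        admin-ui)      echo "docker-compose@admin" ;;
        services)      echo "tco-init bootstrap-cluster docker-compose@sa-registry harbor-init" ;;
        *)             echo "unknown symptom: $1" >&2; return 1 ;;
    esac
}

# Keep only the journal lines that mention an error or exception.
journal_errors() {
    grep -iE 'error|exception'
}

# Example (run as root on the affected node):
#   for unit in $(units_for_symptom control-plane); do
#       echo "== $unit =="
#       journalctl -u "$unit" --no-pager | journal_errors
#   done
```

The filter only narrows down where to look. Reading the full journalctl output around each match is still needed to understand the failure.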

  4. During an automated deployment, if one of the worker nodes fails to become “Ready”, follow these steps to redeploy the worker node:
    1. To remove the unresponsive node, run the following uninstall script:

      $ uninstall vm-name

      Where vm-name is the name of the unresponsive VM.

    2. Run the deploy-cluster script again.
  5. When logging in to VMware Telco Cloud Operations as any user that is not preconfigured (that is, any user other than admin, maint, default, or oper), the following error appears:

    Unexpected error while handling authentication request to identity provider

    Possible causes of the error:
    • The LDAP server integrated with Keycloak is configured for SSL, but the LDAP certificates were not imported on the control plane node (CPN).
    • The user is trying to log in with an incorrect user name.
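
To investigate the first cause, it can help to see which certificates the LDAP server actually presents. The following is a minimal sketch using the standard openssl command-line tool: the function name extract_pem_certs and the file name ldap-chain.pem are hypothetical, LDAP_HOST is a placeholder, and port 636 is assumed because it is the conventional LDAPS port. It only retrieves and displays the certificates and does not import them.

```shell
#!/bin/sh
# Sketch: pull the PEM certificate blocks out of `openssl s_client` output.
# Function and file names are illustrative only.

# Print only the PEM certificate blocks found on stdin.
extract_pem_certs() {
    awk '/-----BEGIN CERTIFICATE-----/,/-----END CERTIFICATE-----/'
}

# Example (LDAP_HOST is a placeholder; 636 is the conventional LDAPS port):
#   openssl s_client -connect LDAP_HOST:636 -showcerts </dev/null 2>/dev/null \
#       | extract_pem_certs > ldap-chain.pem
#   openssl x509 -in ldap-chain.pem -noout -subject -issuer -enddate
```

The second openssl command prints the subject, issuer, and expiry date of the first certificate in the file, which can then be compared with the certificates imported on the control plane node.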
  6. When logging in to VMware Telco Cloud Operations as any user that is not preconfigured (that is, any user other than admin, maint, default, or oper), the following error appears:

    Failed to process request, cause JSONObject["groups"] is not a JSONArray

    The cause of this error is that the user who is trying to log in is not associated with any group.