Troubleshooting Tanzu Application Catalog

This section helps users troubleshoot problems on their Kubernetes clusters when deploying applications from the VMware Tanzu Application Catalog (Tanzu Application Catalog).

Troubleshoot Tanzu Application Catalog Helm Charts

Tanzu Application Catalog Helm charts provide an easy way to install and manage applications on Kubernetes, while following best practices in terms of security, efficiency and performance.

Common issues

The following are the most common issues that users face:

  • Credential errors while upgrading chart releases
  • Issues with existing Persistent Volumes (PVs) from previous releases
  • Permission errors when enabling persistence

Use the sections below to identify and debug these issues.

Troubleshoot credential errors while upgrading chart releases

Tanzu Application Catalog Helm charts support different alternatives for managing credentials such as passwords, keys or tokens:

  • Setting the available parameters in the charts to specify any desired value
  • Using existing Secrets (manually created before the chart installation) that contain the required credentials
  • Generating random alphanumeric credentials when neither of the previous methods is used (not recommended for production environments)

Relying on Helm to generate random credentials is the root of many issues when dealing with chart upgrades on stateful apps. However, this option is offered to improve the UX for developers.
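
For production deployments, the simplest way to avoid this class of issue is to set the credentials explicitly at install time. For example, using the PostgreSQL chart's postgresqlPassword parameter (replace MY-PASSWORD with the value you want; parameter names may differ in other charts or versions):

# Set the admin password explicitly instead of relying on an autogenerated one.
$ helm install MY-RELEASE REPOSITORY/postgresql --set postgresqlPassword=MY-PASSWORD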

Here is an example illustrating the issue:

  1. Install a chart that uses a stateful application, such as the PostgreSQL chart. Replace the REPOSITORY placeholder with a reference to your Tanzu Application Catalog chart repository.
    $ helm install MY-RELEASE REPOSITORY/postgresql
    
    
  2. Since no credentials were specified and no existing Secrets are being reused, a random alphanumeric password is generated for the postgres (admin) user of the database. This password is also used by readiness and liveness probes on PostgreSQL pods. Obtain the generated password using the commands below:

    $ export POSTGRES_PASSWORD=$(kubectl get secret --namespace default MY-RELEASE-postgresql -o jsonpath="{.data.postgresql-password}" | base64 --decode)
    $ echo $POSTGRES_PASSWORD
    8aQdvrEhr5
    
  3. Verify the password by using it to log in to the PostgreSQL server:

    $ kubectl run postgresql-client --rm --tty -i --restart='Never' --namespace default --image REPOSITORY/postgresql:latest --env="PGPASSWORD=$POSTGRES_PASSWORD" --command -- psql --host MY-RELEASE-postgresql -U postgres -d postgres -p 5432
    ...
    postgres=#
    
    
  4. Upgrade the chart release:

    $ helm upgrade MY-RELEASE REPOSITORY/postgresql
    
    
  5. Obtain the password again. You will find that a new password has been generated, as shown in the example below:

    $ export POSTGRES_PASSWORD=$(kubectl get secret --namespace default MY-RELEASE-postgresql -o jsonpath="{.data.postgresql-password}" | base64 --decode)
    $ echo $POSTGRES_PASSWORD
    7C91EMpVDH
    
    
  6. Try to log in to PostgreSQL using the new password:

    $ kubectl run postgresql-client --rm --tty -i --restart='Never' --namespace default --image REPOSITORY/postgresql:latest --env="PGPASSWORD=$POSTGRES_PASSWORD" --command -- psql --host MY-RELEASE-postgresql -U postgres -d postgres -p 5432
    psql: FATAL:  password authentication failed for user "postgres"
    As can be seen above, the credentials available in the Secret are no longer valid to access PostgreSQL after upgrading.
    
    

The reason for this behavior lies in how Tanzu Application Catalog containers/charts work:

  • Tanzu Application Catalog containers configure/initialize applications the first time they are deployed. However, they skip this configuration/initialization step and reuse persisted data if detected on subsequent deployments or upgrades.
  • Tanzu Application Catalog charts packaging stateful apps enable persistence by default using Persistent Volume Claims (PVCs).

Therefore, even if the containers are forcefully restarted after upgrading the chart, they will continue to reuse the persistent data that was created when the chart was first deployed. As a result, the persistent data and the Secret go out-of-sync with each other.

Note

Validations have recently been added to some charts to warn users when they try to upgrade without specifying credentials.

Solutions

This issue is easily resolved by rolling back the chart release. Continuing with the previous example:

  1. Obtain the history of the release:

    $ helm history MY-RELEASE
    REVISION    UPDATED                    STATUS       CHART               APP VERSION DESCRIPTION
    1           Thu Oct 22 16:12:34 2020   superseded   postgresql-9.8.5    11.9.0      Install complete
    2           Thu Oct 22 16:16:42 2020   deployed     postgresql-9.8.5    11.9.0      Upgrade complete
    
  2. Roll back to the first revision:

    $ helm rollback MY-RELEASE 1
    Rollback was a success! Happy Helming!
    
  3. Obtain the original credentials and upgrade the release by passing the original credentials to the upgrade process:

    $ export POSTGRESQL_PASSWORD=$(kubectl get secret --namespace default MY-RELEASE-postgresql -o jsonpath="{.data.postgresql-password}" | base64 --decode)
    $ helm upgrade MY-RELEASE REPOSITORY/postgresql --set postgresqlPassword=$POSTGRESQL_PASSWORD
    
  4. Check the history again:

    $ helm history MY-RELEASE
    REVISION    UPDATED                    STATUS       CHART               APP VERSION DESCRIPTION
    1           Thu Oct 22 16:12:34 2020   superseded   postgresql-9.8.5    11.9.0      Install complete
    2           Thu Oct 22 16:16:42 2020   superseded   postgresql-9.8.5    11.9.0      Upgrade complete
    3           Thu Oct 22 16:37:22 2020   superseded   postgresql-9.8.5    11.9.0      Rollback to 1
    4           Thu Oct 22 16:45:07 2020   deployed     postgresql-9.8.5    11.9.0      Upgrade complete
    
  5. Log in to PostgreSQL using the original credentials:
    $ kubectl run postgresql-client --rm --tty -i --restart='Never' --namespace default --image REPOSITORY/postgresql:latest --env="PGPASSWORD=$POSTGRESQL_PASSWORD" --command -- psql --host MY-RELEASE-postgresql -U postgres -d postgres -p 5432
    ...
    postgres=#
    Login should now succeed.
    
    
Note

When specifying an existing Secret, Helm does not autogenerate the password(s) inside it, so they remain at their original value(s) across upgrades.
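
For example, the credentials can be kept stable across upgrades by creating the Secret yourself and pointing the chart at it. This is a sketch: the existingSecret parameter and the postgresql-password key are assumptions based on the chart version used above, so check the chart's README for the exact names.

# Create the Secret holding the admin password before installing the chart.
$ kubectl create secret generic my-postgresql-credentials \
    --from-literal=postgresql-password=MY-PASSWORD

# Install the chart so that it reuses the pre-created Secret.
$ helm install MY-RELEASE REPOSITORY/postgresql --set existingSecret=my-postgresql-credentials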

Troubleshoot Persistent Volumes (PVs) retained from previous releases

If a Helm chart includes a StatefulSet that uses volumeClaimTemplates to generate new Persistent Volume Claims (PVCs) for each replica created, Helm does not track those PVCs. Therefore, when uninstalling a chart release with these characteristics, the PVCs (and associated Persistent Volumes) are not removed from the cluster. This is a known limitation of Helm.

Reproduce the issue by following the steps below:

  1. Install a chart that uses a StatefulSet, such as the MariaDB chart, and wait for the release to be marked as successful. Replace the REPOSITORY placeholder with a reference to your Tanzu Application Catalog chart repository.

    $ helm install MY-RELEASE REPOSITORY/mariadb --wait
    
  2. Uninstall the release:

    $ helm uninstall MY-RELEASE
    
  3. List the available PVC(s) to confirm they were not removed:
    $ kubectl get pvc
    NAME                        STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
    data-MY-RELEASE-mariadb-0   Bound    pvc-ec438751-efa8-4eb2-ad91-f5d808f913ba   8Gi        RWO            standard       1m52s
    This can cause issues when reinstalling charts that reuse the release name. For instance, following the example shown above:
    
    
  4. Install the MariaDB chart again using the same release name:

    $ helm install MY-RELEASE REPOSITORY/mariadb
    
  5. Helm does not complain since the previous release was uninstalled and therefore no name conflict is detected. However, if the PVCs are listed again, it can be seen that the previous PVCs are being reused:
    $ kubectl get pvc
    NAME                        STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
    data-MY-RELEASE-mariadb-0   Bound    pvc-ec438751-efa8-4eb2-ad91-f5d808f913ba   8Gi        RWO            standard       4m32s
    
    

This is problematic for the same reasons as those listed in the previous section related to credentials.

Solution

This issue appears when installing a new release. In other words, it happens before any data is added to the application. Therefore, it can be solved by simply uninstalling the chart, removing the existing PVCs and installing the chart again:

$ helm uninstall MY-RELEASE
$ kubectl delete pvc data-MY-RELEASE-mariadb-0
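
If the StatefulSet created several replicas, there is one PVC per replica. As a convenience, the sketch below removes them all by label selector; it assumes the chart labels its PVCs with the standard app.kubernetes.io/instance label, so verify the labels first:

# Check which labels the PVCs carry, then delete all PVCs belonging to the release.
$ kubectl get pvc --show-labels
$ kubectl delete pvc -l app.kubernetes.io/instance=MY-RELEASE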

Troubleshoot permission errors when enabling persistence

The great majority of Tanzu Application Catalog containers are, by default, non-root. This means they are executed with a non-privileged user to add an extra layer of security. However, because they run as a non-root user, privileged tasks are typically off-limits and there are a few considerations to keep in mind when using them.

The container hardening best practices guide explains the key considerations when using non-root containers. As explained there, one of the main drawbacks of using non-root containers relates to mounting volumes in these containers, as they do not have the necessary privileges to modify the ownership of the filesystem as needed.

The example below analyzes a common issue faced by Tanzu Application Catalog users in this context:

  1. Install a chart that uses a non-root container, such as the MongoDB chart. Replace the REPOSITORY placeholder with a reference to your Tanzu Application Catalog chart repository.

    $ helm install MY-RELEASE REPOSITORY/mongodb
    
  2. A CrashLoopBackOff error may occur:

    $ kubectl get pods
    NAME                                  READY   STATUS             RESTARTS   AGE
    MY-RELEASE-mongodb-58f6f48f87-vvc7m   0/1     CrashLoopBackOff   1          48s
    
  3. Inspect the logs of the container:

    $ kubectl logs MY-RELEASE-mongodb-58f6f48f87-vvc7m
    ...
    mongodb 08:56:27.12 INFO  ==> ** Starting MongoDB setup **
    mongodb 08:56:27.14 INFO  ==> Validating settings in MONGODB_* env vars...
    mkdir: cannot create directory '.../mongodb/data': Permission denied

  4. As the log displays a "Permission denied" error, inspect the pod:

    $ kubectl describe pod MY-RELEASE-mongodb-58f6f48f87-vvc7m
    ...
    Volumes:
      datadir:
        Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
        ClaimName:  MY-RELEASE-mongodb
        ReadOnly:   false
    
    

The container exits with an error code because it cannot create a directory under the directory where the Persistent Volume is mounted. This is caused by inadequate filesystem permissions.

Solution

Tanzu Application Catalog charts are configured to use, by default, a Kubernetes SecurityContext to automatically modify the ownership of the attached volumes. However, this feature does not work if:

  • Your Kubernetes distribution does not support SecurityContexts.
  • The Storage Class used to provision the Persistent Volumes does not support modifying the volumes' filesystem.

To address this challenge, Tanzu Application Catalog charts also support, as an alternative mechanism, using an initContainer to change the ownership of the volume before the application container starts. Continuing the example above, resolve the issue using the commands below:

  1. Upgrade the chart release enabling the initContainer that adapts the permissions:

    $ helm upgrade MY-RELEASE REPOSITORY/mongodb --set volumePermissions.enabled=true
    
  2. List the pods again to confirm that the CrashLoopBackOff error no longer occurs:
    $ kubectl get pods
    NAME                                  READY   STATUS    RESTARTS   AGE
    MY-RELEASE-mongodb-6d78bdc996-tdj2d   1/1     Running   0          1m31s
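
The automatic ownership change described above relies on the chart setting a pod-level security context (fsGroup) for the workload. As a quick sanity check, you can confirm what the chart actually rendered; this is a generic inspection command, and the exact values shown depend on the chart version:

# Show the pod- and container-level security contexts rendered by the chart
# (look for fsGroup at the pod level and a non-root runAsUser on the container).
$ kubectl get pod MY-RELEASE-mongodb-6d78bdc996-tdj2d -o yaml | grep -B 1 -A 3 securityContext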
    
    

Troubleshoot Kubernetes service names using DNS

When a Service is in a different namespace or is simply not available, the cause may be that the Service has not been properly registered.

Solution

  1. The first step is to check if the Service name you are using is correct. To do so, run these commands to check whether the Service is registered and which pods it selects:
    $ kubectl get svc
    $ kubectl get endpoints
    
  2. If the Service is registered, run the kubectl get pods command to obtain the identifier of your pod. Then run the command below; its output indicates whether DNS resolution is working. (Remember to replace the POD-UID placeholder with the pod identifier and the SVC-NAME placeholder with the DNS name of the Service.)

    $ kubectl exec -ti POD-UID -- nslookup SVC-NAME
    

If the error persists, then confirm that DNS is enabled for your Kubernetes cluster. For more information on how to debug DNS resolution, see the Kubernetes official documentation.
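
If the pod you are testing from does not include the nslookup tool, you can run the same check from a temporary pod instead. This is a sketch using a generic busybox image, which is not part of the catalog:

# Launch a short-lived pod and resolve the Service name from inside the cluster.
$ kubectl run -it --rm dns-test --image=busybox:1.36 --restart=Never -- nslookup SVC-NAME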

Tip: For Minikube users, the minikube addons list command shows all available add-ons and whether they are enabled. If the DNS add-on is disabled, enable it and try again.

Troubleshoot IP address issues

You are not able to find the external IP address of a node.

Solution

To get the IP address, execute:

$ kubectl get nodes -o yaml

Check node IP address

If you are using Minikube, you can alternatively use the command below:

$ minikube ip

Troubleshoot kubectl issues

There are two common error scenarios when running kubectl:

  • Error 1: kubectl does not permit access to certain resources. This occurs when you run the kubectl command and see the following error:
...the server does not allow access to the requested resource
  • Error 2: kubectl is not able to find your nodes. This occurs when you run the kubectl get nodes command and see the following error:
...the server doesn't have a resource type "nodes"

Solutions

Error 1: kubectl does not permit access to certain resources

You are probably experiencing issues with the Kubernetes Role-Based Access Control (RBAC) policies. This typically occurs when RBAC is enabled, which is the default situation from Kubernetes 1.6 onwards.

To resolve this issue, you must create and deploy the necessary RBAC policies for users and resources.
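
As an illustration only (the binding name, user, and namespace below are examples, not values from your cluster), kubectl can create a simple binding to one of the built-in roles:

# Grant read-only access to a user in a single namespace using the built-in "view" ClusterRole.
$ kubectl create rolebinding example-view-binding \
    --clusterrole=view --user=EXAMPLE-USER --namespace=default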

Error 2: kubectl is not able to find your nodes

This occurs because the authentication credentials are not correctly set.

To resolve this, copy the configuration file /etc/kubernetes/admin.conf to ~/.kube/config in a regular user account (with sudo if necessary) and try again.

Note

This command should not be performed in the root user account.

$ cp /etc/kubernetes/admin.conf ~/.kube/config

Troubleshoot persistent volumes

To check the status of your persistent volumes, run the kubectl get pvc command. If the output shows that your PVC status is Pending, this may be because your cluster does not support dynamic provisioning (such as a bare metal cluster). In this case, the pod is unable to start because the cluster is unable to fulfill the request for a persistent volume and attach it to the container.

Check the status of your PVC
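
For reference, a claim stuck in this state looks similar to the following (illustrative output):

$ kubectl get pvc
NAME                        STATUS    VOLUME   CAPACITY   ACCESS MODES   STORAGECLASS   AGE
data-MY-RELEASE-mariadb-0   Pending                                                     2m10s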

Solution

To fix this, you must manually configure some persistent volumes or set up a StorageClass resource and provisioner for dynamic volume provisioning, such as the NFS provisioner. Learn more about dynamic provisioning and storage classes.
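
As a minimal sketch of the first approach, you can pre-create a hostPath Persistent Volume for testing on a single-node cluster; the name, size, and path below are examples and this is not suitable for production:

# Manually create a PV so that a pending claim can bind (test clusters only).
$ kubectl apply -f - <<EOF
apiVersion: v1
kind: PersistentVolume
metadata:
  name: example-local-pv
spec:
  capacity:
    storage: 8Gi
  accessModes:
  - ReadWriteOnce
  hostPath:
    path: /tmp/example-data
EOF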

Troubleshoot performance of persistent volumes

If the application does not finish initializing and continuously restarts, it may be because both the readiness and liveness probes are failing. You can use the kubectl logs command to check the startup logs and obtain more detailed information.

To detect if the problem lies with the Persistent Volume Claim (PVC), launch the chart again, but this time, deactivate persistence by adding the persistence.enabled=false parameter, as shown in the example below.
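
For example, assuming the generic CHART placeholder stands for the chart you are testing (some charts name this parameter differently, so check the chart's README):

$ helm install MY-RELEASE REPOSITORY/CHART --set persistence.enabled=false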

If startup then completes without issues, the drop in performance is due to the PVC.

Solutions

There are two different ways to solve this issue:

  • Option 1: Check the available storage classes that your Kubernetes solution provides (these depend on your cloud provider). Check the cloud provider's documentation for alternatives and, if any exist, try them to see if they provide better performance.

  • Option 2: If changing the PVC storage class is not an option, tweak the liveness and readiness probes to higher values (such as the initialDelaySeconds or periodSeconds values), as shown in the sketch below.
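
A sketch of the second option through chart parameters; the parameter names (livenessProbe.initialDelaySeconds, readinessProbe.periodSeconds) and values below are assumptions that vary between charts, so confirm them in the chart's README:

# Give the application more time before probes start failing (example values).
$ helm upgrade MY-RELEASE REPOSITORY/CHART \
    --set livenessProbe.initialDelaySeconds=120 \
    --set readinessProbe.periodSeconds=30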

Troubleshoot pods

To check if the status of your pods is healthy, the easiest way is to run the kubectl get pods command. After that, you can use kubectl describe and kubectl logs to obtain more detailed information.

Check the status of your pods

Solutions

Once you run kubectl get pods, you may find your pods showing any of the following statuses:

  • Pending or CrashLoopBackOff
  • ImagePullBackOff or ErrImagePull

Pod status is Pending or CrashLoopBackOff

If you see any of your pods showing a Pending or CrashLoopBackOff status when running the kubectl get pods command, this means that the pod could not be scheduled on a node (Pending) or that it starts but keeps failing (CrashLoopBackOff).

Usually, this is because of insufficient CPU or memory resources. It could also arise due to the absence of a network overlay or a volume provider. To confirm the cause, note the pod identifier from the previous command and then run the command below, replacing the POD-UID placeholder with the correct identifier:

$ kubectl describe pod POD-UID

You should see an output message providing some information about why the pod is pending. Check the example below:

Check information about a pod with a pending status
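
To confirm whether the nodes actually have spare capacity, you can inspect them directly; note that kubectl top requires the metrics-server add-on to be installed:

# Show allocatable capacity and current resource requests per node.
$ kubectl describe nodes | grep -A 5 "Allocated resources"

# Show live CPU and memory usage per node (requires metrics-server).
$ kubectl top nodes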

If available resources are insufficient, try the following steps:

  • Option 1: Add more nodes to the cluster. Check the documentation of your Kubernetes provider to learn how to add more nodes to the cluster.
  • Option 2: Free up existing cluster resources by terminating unneeded pods and nodes. To do so:

  • Confirm the name of the node you want to remove using kubectl get nodes and make sure that all the pods on the node can terminate without issues (replace NODE-NAME with the name of the node you want to terminate):

    $ kubectl get nodes
    $ kubectl get pods -o wide | grep NODE-NAME
    
  • Evict all user pods from the node:

    $ kubectl drain NODE-NAME
    
  • Now you can safely remove the node:

    $ kubectl delete node NODE-NAME
    
  • Free up the cluster by terminating unneeded deployments. First check the name of your releases using the helm list command and then delete the unneeded deployment (replace RELEASE-NAME with the name of the release you want to delete):

    $ helm list
    $ helm delete RELEASE-NAME
    

Pod status is ImagePullBackOff or ErrImagePull

When your pod status is ImagePullBackOff or ErrImagePull, this means that the pod could not run because it could not pull the image. To confirm this, note the pod identifier from the previous command and then run the command below, replacing the POD-UID placeholder with the correct identifier:

$ kubectl describe pod POD-UID

The output of the command will show you more information about the failed pull. Check that the image name is correct and try pulling the image manually on the host using docker pull. For example, to manually pull an image from Docker Hub, use the command below by replacing IMAGE with the image ID:

$ docker pull IMAGE
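
It can also help to check exactly which image reference the pod is configured to pull, directly from the pod spec:

# Print the image reference(s) configured for the pod's containers.
$ kubectl get pod POD-UID -o jsonpath='{.spec.containers[*].image}'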

Troubleshoot services

Sometimes, when you start a new installation in Kubernetes, you may find that a Service doesn't respond when you try to access it, although it was created as part of a deployment.

Solutions

  1. Check that the Service you are trying to access actually exists, by running the following (replace SVC-NAME with the name of the service you want to access):

    $ kubectl get svc SVC-NAME
    
    • If the Service exists, you should see an output message similar to this:
    NAME           TYPE           CLUSTER-IP      EXTERNAL-IP      PORT(S)            AGE
    SVC-NAME       LoadBalancer   10.63.245.23    35.242.213.194   5432:32749/TCP     1m
    
    • If the Service doesn’t exist, you will get an output message similar to this:
    No resources found.
    Error from server (NotFound): services "SVC-NAME" not found
    
  2. Create the service if it does not exist. To create a Service you can either start the Service with a .yaml file or create it directly from the terminal window.

Here is an example .yaml file. Remember to replace the SVC-NAME, PORT, and TARGET-PORT placeholders with the name of the Service, and the ports you want to set for your Service.

apiVersion: v1
kind: Service
metadata:
  labels:
    app: SVC-NAME
  name: SVC-NAME
spec:
  selector:
    app: SVC-NAME
  ports:
  - name: default
    protocol: TCP
    port: PORT
    targetPort: TARGET-PORT

It is also possible to create a Service with the following commands. Remember to replace the SVC-NAME, PORT, and TARGET-PORT placeholders with the name of the Service, and the ports you want to set for your Service.

$ kubectl expose deployment SVC-NAME --port=PORT --target-port=TARGET-PORT

You should get a message similar to this:

service/SVC-NAME exposed

  3. Check that the Service was correctly created by running the kubectl get svc command again:

    $ kubectl get svc SVC-NAME
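
Once the Service exists, a quick way to confirm that it responds, even without an external IP address, is to forward a local port to it and test from your workstation. Replace LOCAL-PORT and PORT with the ports you want to use:

# Forward a local port to the Service and test the application locally.
$ kubectl port-forward svc/SVC-NAME LOCAL-PORT:PORT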

Troubleshoot application customization issues

Sometimes, when a user uploads a customization file, the build fails and an error message is displayed in the “Customization info” section. There are four possible scenarios that users may encounter when customizing a container in Tanzu Application Catalog, as explained below.

Build status is Build failed. Please contact support for more information on this issue.

This error message means that there was an unknown error during the build process.

Solution

File a support ticket via the Cloud Services Portal by following these instructions.

Build status is Build failed. There was an error applying the customization. Please try updating and uploading the customization again.

This error message means that there was an error applying the customization; for example, the user included a script in the customization files and it failed to execute. In this case, the output message provides details about the error, including the error returned by the customization script.

Click the “Download source code” button to download a file that includes all the recipes needed to build the original image and the customization tar.gz file.


Solution

Edit your customization and upload a new file by following these instructions.

Build status is Build failed. Something went wrong in the testing phase. The application will be unavailable until the issue is resolved.

This error message means that there was an issue during the testing phase and the team is investigating it.

Solution

This requires no action from the user. Wait until the container shows its release status as “Released”. To learn how to check the customization status, see this section.

Build status is Build failed. Something went wrong in the build phase. The application will be unavailable until the issue is resolved.

There was an issue during the build process and the team is investigating it.

Solution

This requires no action from the user. Wait until the container shows its release status as “Released”. To learn how to check the customization status, see this section.
