This section helps users troubleshoot problems on their Kubernetes clusters when deploying applications from the VMware Tanzu Application Catalog (Tanzu Application Catalog).
Tanzu Application Catalog Helm charts provide an easy way to install and manage applications on Kubernetes, while following best practices in terms of security, efficiency and performance.
The following sections describe the most common issues that users face. Use them to identify and debug these issues.
Tanzu Application Catalog Helm charts support different alternatives for managing credentials such as passwords, keys or tokens.
Relying on Helm to generate random credentials is the root of many issues when dealing with chart upgrades on stateful apps. However, this option is offered to improve the UX for developers.
Here is an example illustrating the issue:
$ helm install MY-RELEASE REPOSITORY/postgresql
Since no credentials were specified and no existing Secrets are being reused, a random alphanumeric password is generated for the postgres (admin) user of the database. This password is also used by the readiness and liveness probes on the PostgreSQL pods. Obtain the generated password using the commands below:
$ export POSTGRES_PASSWORD=$(kubectl get secret --namespace default MY-RELEASE-postgresql -o jsonpath="{.data.postgresql-password}" | base64 --decode)
$ echo $POSTGRES_PASSWORD
8aQdvrEhr5
Verify the password by logging in to the PostgreSQL server using this password:
$ kubectl run postgresql-client --rm --tty -i --restart='Never' --namespace default --image REPOSITORY/postgresql:latest --env="PGPASSWORD=$POSTGRES_PASSWORD" --command -- psql --host MY-RELEASE-postgresql -U postgres -d postgres -p 5432
...
postgres=#
Upgrade the chart release:
$ helm upgrade MY-RELEASE REPOSITORY/postgresql
Obtain the password again. You will find that a new password has been generated, as shown in the example below:
$ export POSTGRES_PASSWORD=$(kubectl get secret --namespace default MY-RELEASE-postgresql -o jsonpath="{.data.postgresql-password}" | base64 --decode)
$ echo $POSTGRES_PASSWORD
7C91EMpVDH
Try to log in to PostgreSQL using the new password:
$ kubectl run postgresql-client --rm --tty -i --restart='Never' --namespace default --image REPOSITORY/postgresql:latest --env="PGPASSWORD=$POSTGRES_PASSWORD" --command -- psql --host MY-RELEASE-postgresql -U postgres -d postgres -p 5432
psql: FATAL: password authentication failed for user "postgres"
As can be seen above, the credentials available in the Secret are no longer valid to access PostgreSQL after upgrading.
The reason for this behaviour lies in how Tanzu Application Catalog containers and charts work: the application's credentials are configured in the persistent volume the first time the chart is deployed, while Helm generates a new random password in the Secret on every upgrade where no password is supplied. Therefore, even if the containers are forcefully restarted after upgrading the chart, they will continue to reuse the persistent data that was created when the chart was first deployed. As a result, the persistent data and the Secret go out of sync with each other.
Note: Some validations have recently been added to some charts to warn users when trying to upgrade without specifying credentials.
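One way to avoid this situation altogether is to set the credentials explicitly at install time, so that Helm never generates a random value. A minimal sketch, reusing the postgresqlPassword parameter shown later in this section (parameter names vary between charts and chart versions):
# MY-PASSWORD is a placeholder for a password you choose.
$ helm install MY-RELEASE REPOSITORY/postgresql --set postgresqlPassword=MY-PASSWORD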
This issue is easily resolved by rolling back the chart release. Continuing with the previous example:
Obtain the history of the release:
$ helm history MY-RELEASE
REVISION UPDATED STATUS CHART APP VERSION DESCRIPTION
1 Thu Oct 22 16:12:34 2020 superseded postgresql-9.8.5 11.9.0 Install complete
2 Thu Oct 22 16:16:42 2020 deployed postgresql-9.8.5 11.9.0 Upgrade complete
Rollback to the first revision:
$ helm rollback MY-RELEASE 1
Rollback was a success! Happy Helming!
Obtain the original credentials and upgrade the release by passing the original credentials to the upgrade process:
$ export POSTGRES_PASSWORD=$(kubectl get secret --namespace default MY-RELEASE-postgresql -o jsonpath="{.data.postgresql-password}" | base64 --decode)
$ helm upgrade MY-RELEASE REPOSITORY/postgresql --set postgresqlPassword=$POSTGRES_PASSWORD
Check the history again:
$ helm history MY-RELEASE
REVISION UPDATED STATUS CHART APP VERSION DESCRIPTION
1 Thu Oct 22 16:12:34 2020 superseded postgresql-9.8.5 11.9.0 Install complete
2 Thu Oct 22 16:16:42 2020 deployed postgresql-9.8.5 11.9.0 Upgrade complete
3 Thu Oct 22 16:37:22 2020 superseded postgresql-9.8.5 11.9.0 Rollback to 1
4 Thu Oct 22 16:45:07 2020 deployed postgresql-9.8.5 11.9.0 Upgrade complete
$ kubectl run postgresql-client --rm --tty -i --restart='Never' --namespace default --image REPOSITORY/postgresql:latest --env="PGPASSWORD=$POSTGRES_PASSWORD" --command -- psql --host MY-RELEASE-postgresql -U postgres -d postgres -p 5432
...
postgres=#
Login should now succeed.
Note: When specifying an existing Secret, the password(s) inside the Secret are not autogenerated by Helm and therefore remain at their original value(s).
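For example, a sketch of this approach with the PostgreSQL chart used above. The existingSecret parameter and the postgresql-password key are assumptions based on this chart version; check the chart's README for yours:
# Create a Secret holding the password, then point the chart at it.
$ kubectl create secret generic MY-SECRET --from-literal=postgresql-password=MY-PASSWORD
$ helm install MY-RELEASE REPOSITORY/postgresql --set existingSecret=MY-SECRET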
If a Helm chart includes a StatefulSet that uses volumeClaimTemplates to generate new Persistent Volume Claims (PVCs) for each replica, Helm does not track those PVCs. Therefore, when a chart release with these characteristics is uninstalled, the PVCs (and their associated Persistent Volumes) are not removed from the cluster. This is a known limitation of Helm.
Reproduce the issue by following the steps below:
Install a chart that uses a StatefulSet, such as the MariaDB chart, and wait for the release to be marked as successful. Replace the REPOSITORY placeholder with a reference to your Tanzu Application Catalog chart repository.
$ helm install MY-RELEASE REPOSITORY/mariadb --wait
Uninstall the release:
$ helm uninstall MY-RELEASE
List the PVCs and note that the PVC created by the chart still exists:
$ kubectl get pvc
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
data-MY-RELEASE-mariadb-0 Bound pvc-ec438751-efa8-4eb2-ad91-f5d808f913ba 8Gi RWO standard 1m52s
This can cause issues when reinstalling charts that reuse the release name. For instance, following the example shown above:
Install the MariaDB chart again using the same release name:
$ helm install MY-RELEASE REPOSITORY/mariadb
List the PVCs again and note that the new release has reused the existing PVC:
$ kubectl get pvc
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
data-MY-RELEASE-mariadb-0 Bound pvc-ec438751-efa8-4eb2-ad91-f5d808f913ba 8Gi RWO standard 4m32s
This is problematic for the same reasons as those listed in the previous section related to credentials.
This issue appears when installing a new release. In other words, it happens before any data is added to the application. Therefore, it can be solved by simply uninstalling the chart, removing the existing PVCs and installing the chart again:
$ helm uninstall MY-RELEASE
$ kubectl delete pvc data-MY-RELEASE-mariadb-0
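Then install the chart again:
$ helm install MY-RELEASE REPOSITORY/mariadb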
The great majority of Tanzu Application Catalog containers are, by default, non-root. This means they are executed with a non-privileged user to add an extra layer of security. However, because they run as a non-root user, privileged tasks are typically off-limits and there are a few considerations to keep in mind when using them.
The container hardening best practices guide explains the key considerations when using non-root containers. As explained there, one of the main drawbacks of using non-root containers relates to mounting volumes in these containers, as they do not have the necessary privileges to modify the ownership of the filesystem as needed.
The example below analyzes a common issue faced by Tanzu Application Catalog users in this context:
Install a chart that uses a non-root container, such as the MongoDB chart. Replace the REPOSITORY placeholder with a reference to your Tanzu Application Catalog chart repository.
$ helm install MY-RELEASE REPOSITORY/mongodb
A CrashLoopBackOff error may occur:
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
MY-RELEASE-mongodb-58f6f48f87-vvc7m 0/1 CrashLoopBackOff 1 48s
Inspect the logs of the container:
$ kubectl logs MY-RELEASE-mongodb-58f6f48f87-vvc7m
...
mongodb 08:56:27.12 INFO ==> ** Starting MongoDB setup **
mongodb 08:56:27.14 INFO ==> Validating settings in MONGODB_* env vars...
mkdir: cannot create directory '.../mongodb/data': Permission denied
As the log displays a "Permission denied" error, inspect the pod:
$ kubectl describe pod MY-RELEASE-mongodb-58f6f48f87-vvc7m
...
Volumes:
  datadir:
    Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName: MY-RELEASE-mongodb
    ReadOnly: false
The container is exiting with an error code because it cannot create a directory under the directory where a Persistent Volume has been mounted. This is caused by inadequate filesystem permissions.
Tanzu Application Catalog charts are configured to use, by default, a Kubernetes SecurityContext to automatically modify the ownership of the attached volumes. However, there are environments where this feature does not work.
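For reference, here is a minimal sketch of the kind of pod-level SecurityContext these charts configure by default. The values are illustrative only; the actual user and group IDs depend on the chart:
# Illustrative pod spec fragment (not taken from a specific chart).
# fsGroup tells Kubernetes to set the group ownership of mounted
# volumes to GID 1001 so the non-root container can write to them.
securityContext:
  fsGroup: 1001
containers:
- name: mongodb
  securityContext:
    runAsUser: 1001
    runAsNonRoot: true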
To address this challenge, Tanzu Application Catalog charts also support, as an alternative mechanism, using an initContainer to change the ownership of the volume before mounting it. Continuing the example above, resolve the issue using the commands below:
Upgrade the chart release, enabling the initContainer that adapts the permissions:
$ helm upgrade MY-RELEASE REPOSITORY/mongodb --set volumePermissions.enabled=true
Check the pods again and confirm that the CrashLoopBackOff error no longer occurs:
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
MY-RELEASE-mongodb-6d78bdc996-tdj2d 1/1 Running 0 1m31s
## Troubleshoot Kubernetes service names using DNS
If a Service is in a different namespace or is simply not available, it may be because the Service has not been properly registered. Check the Services and their endpoints:
$ kubectl get svc
$ kubectl get endpoints
If the Service is registered, run the kubectl get pods command to get the UID for your pod. Then run the command below; its output will indicate whether DNS resolution is working or not. Remember to replace the POD-UID placeholder with the pod UID and the SVC-NAME placeholder with the DNS name of the service:
$ kubectl exec -ti POD-UID -- nslookup SVC-NAME
If the error persists, then confirm that DNS is enabled for your Kubernetes cluster. For more information on how to debug DNS resolution, see the Kubernetes official documentation.
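If you need a throwaway pod with DNS utilities for this kind of test, here is a sketch based on the upstream Kubernetes DNS debugging guide (the image name and tag come from that guide and are an assumption here; substitute any image with nslookup available in your environment):
# Start a long-running utility pod, then query DNS from inside it.
$ kubectl run dnsutils --image=registry.k8s.io/e2e-test-images/jessie-dnsutils:1.3 --restart=Never --command -- sleep infinity
$ kubectl exec -ti dnsutils -- nslookup SVC-NAME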
Tip: For Minikube users, the minikube addons list command gives you a list of all addons and whether each is enabled. If the DNS addon is deactivated, enable it and try again.
You are not able to find the external IP address of a node.
To get the IP address, execute:
$ kubectl get nodes -o yaml
If you are using Minikube, you can alternatively use the command below:
$ minikube ip
## Troubleshoot kubectl issues
There are two common error scenarios when running kubectl:
kubectl does not permit access to certain resources. This occurs when you run a kubectl command and see the following error:
...the server does not allow access to the requested resource
kubectl is not able to find your nodes. This occurs when you run the kubectl get nodes command and see the following error:
...the server doesn't have a resource type "nodes"
### kubectl does not permit access to certain resources
You are probably experiencing issues with the Kubernetes Role-Based Access Control (RBAC) policies. This typically occurs when RBAC is enabled, which is the default situation from Kubernetes 1.6 onwards.
To resolve this issue, you must create and deploy the necessary RBAC policies for users and resources.
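As an illustration, here is a minimal Role and RoleBinding granting a hypothetical user read-only access to pods in the default namespace (the names, subject and rules below are examples only):
# Example RBAC policy; USER-NAME is a placeholder for the affected user.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: default
  name: pod-reader
rules:
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  namespace: default
  name: read-pods
subjects:
- kind: User
  name: USER-NAME
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: pod-reader
  apiGroup: rbac.authorization.k8s.io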
### kubectl is not able to find your nodes
This occurs because the authentication credentials are not correctly set. To resolve this, copy the configuration file /etc/kubernetes/admin.conf to ~/.kube/config in a regular user account (with sudo if necessary) and try again.
Note: This command should not be performed in the root user account.
$ cp /etc/kubernetes/admin.conf ~/.kube/config
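If the file is owned by root, a common variant (as in the kubeadm setup instructions) is to copy it with sudo and then transfer ownership so that your regular user can read it:
$ sudo cp /etc/kubernetes/admin.conf ~/.kube/config
$ sudo chown $(id -u):$(id -g) ~/.kube/config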
To check the status of your persistent volumes, run the kubectl get pvc command. If the output shows that your PVC status is Pending, this may be because your cluster does not support dynamic provisioning (such as a bare metal cluster). In this case, the pod is unable to start because the cluster is unable to fulfil the request for a persistent volume and attach it to the container.
To fix this, you must manually configure some persistent volumes or set up a StorageClass resource and provisioner for dynamic volume provisioning, such as the NFS provisioner. Learn more about dynamic provisioning and storage classes.
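For example, here is a minimal sketch of a manually provisioned PersistentVolume using hostPath, suitable only for testing on a single-node cluster. The name, capacity, storage class and path are illustrative:
# Example only; adjust capacity and storageClassName to match your PVC.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-manual-0
spec:
  capacity:
    storage: 8Gi
  accessModes:
  - ReadWriteOnce
  storageClassName: standard
  hostPath:
    path: /mnt/data/pv-manual-0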
If the application does not finish initializing and continuously restarts, it may be because both the readinessProbes and livenessProbes are failing. You can use the kubectl logs command to check the startup logs and obtain more detailed information.
To detect if the problem lies with the Persistent Volume Claim (PVC), launch the chart again, but this time deactivate persistence by adding the persistence.enabled=false parameter, as shown below.
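For example, where CHART is a placeholder for the chart you are testing:
$ helm install MY-RELEASE REPOSITORY/CHART --set persistence.enabled=false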
If the startup then goes fine, the drop in performance is due to the PVC.
There are two different ways to solve this issue:
Option 1: Check the available storage classes that your Kubernetes solution provides (these depend on the cloud provider you chose). The best option here is to check the cloud provider documentation and find available alternatives. If alternatives exist, try them to see if they provide better performance.
Option 2: If changing the PVC storage class is not an available option, the best solution is to tweak the liveness and readiness probes to higher values (such as the initialDelaySeconds or the periodSeconds values), as in the sketch below.
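A sketch of such an upgrade follows; the exact parameter paths depend on the chart, so check its README, and the values shown are illustrative:
$ helm upgrade MY-RELEASE REPOSITORY/CHART --set livenessProbe.initialDelaySeconds=120 --set readinessProbe.periodSeconds=30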
To check if the status of your pods is healthy, the easiest way is to run the kubectl get pods command. After that, you can use kubectl describe and kubectl logs to obtain more detailed information. Once you run kubectl get pods, you may find your pods showing any of the following statuses:
Pending or CrashLoopBackOff
ImagePullBackOff or ErrImagePull
### Pending or CrashLoopBackOff
If you see any of your pods showing a Pending or CrashLoopBackOff status when running the kubectl get pods command, this usually means that the pod could not be scheduled on a node (Pending) or that its containers are repeatedly crashing after starting (CrashLoopBackOff).
Usually, this is because of insufficient CPU or memory resources. It could also arise due to the absence of a network overlay or a volume provider. To confirm the cause, note the pod identifier from the previous command and then run the command below, replacing the POD-UID placeholder with the correct identifier:
$ kubectl describe pod POD-UID
You should see an output message providing some information about why the pod is pending.
If available resources are insufficient, try the following steps:
Option 2: Free up existing cluster resources by terminating unneeded pods and nodes. To do so:
Confirm the name of the node you want to remove using kubectl get nodes and make sure that all the pods on the node can terminate without issues (replace NODE-NAME with the name of the node you want to terminate):
$ kubectl get nodes
$ kubectl get pods -o wide | grep NODE-NAME
Evict all user pods from the node:
$ kubectl drain NODE-NAME
Now you can safely remove the node:
$ kubectl delete node NODE-NAME
Free up the cluster by terminating unneeded deployments. First check the name of your releases using the helm list command and then delete the unneeded deployment (replace RELEASE-NAME with the name of the release you want to delete):
$ helm list
$ helm uninstall RELEASE-NAME
### ImagePullBackOff or ErrImagePull
When your pod status is ImagePullBackOff or ErrImagePull, this means that the pod could not run because it could not pull the image. To confirm this, note the pod identifier from the previous command and then run the command below, replacing the POD-UID placeholder with the correct identifier:
$ kubectl describe pod POD-UID
The output of the command will show you more information about the failed pull. Check that the image name is correct and try pulling the image manually on the host using docker pull. For example, to manually pull an image from Docker Hub, use the command below by replacing IMAGE with the image ID:
$ docker pull IMAGE
Sometimes, when you start a new installation in Kubernetes, you may find that a Service created as part of a deployment does not respond when you try to access it.
Check that the Service you are trying to access actually exists, by running the following (replace SVC-NAME with the name of the service you want to access):
$ kubectl get svc SVC-NAME
If the Service exists, the output is similar to this:
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
SVC-NAME LoadBalancer 10.63.245.23 35.242.213.194 5432:32749/TCP 1m
If it does not exist, you will see output like this instead:
No resources found.
Error from server (NotFound): services "SVC-NAME" not found
Create the Service if it does not exist. To create a Service, you can either start the Service with a .yaml file or create it directly from the terminal window. Here is an example .yaml file. Remember to replace the SVC-NAME, PORT, and TARGET-PORT placeholders with the name of the Service and the ports you want to set for your Service.
apiVersion: v1
kind: Service
metadata:
  labels:
    app: SVC-NAME
  name: SVC-NAME
spec:
  selector:
    app: SVC-NAME
  ports:
  - name: default
    protocol: TCP
    port: PORT
    targetPort: TARGET-PORT
It is also possible to create a Service with the following command. Remember to replace the SVC-NAME, PORT, and TARGET-PORT placeholders with the name of the Service and the ports you want to set for your Service.
$ kubectl expose deployment SVC-NAME --port=PORT --target-port=TARGET-PORT
You should get a message similar to this:
service/SVC-NAME exposed
Check the Service by running the kubectl get svc command again:
$ kubectl get svc SVC-NAME
Sometimes when a user uploads a customization file, the build fails and throws an error message in the “Customization info” section. There are four possible scenarios that users may encounter when customizing a container in Tanzu Application Catalog as explained below.
Build failed. Please contact support for more information on this issue.
This error message means that an unknown error occurred during the build process.
File a support ticket via the Cloud Services Portal by following these instructions.
Build failed. There was an error applying the customization. Please try updating and uploading the customization again.
This error message means that there was an error applying the customization: for example, the user included a script in the customization files and it failed to execute. In this case, the output message will provide some details about the error, including the error returned by the customization script that was executed.
Click “Download source code” to download a file that includes all the recipes needed to build the original image and the customization tar.gz file.
Edit your customization and upload a new file by following these instructions.
Build failed. Something went wrong in the testing phase. The application will be unavailable until the issue is resolved.
This error message means that there was an issue during the testing phase and the team is investigating it.
This requires no action from the user. Wait until the container shows its release status as “Released”. To learn how to check the customization status, see this section.
Build failed. Something went wrong in the build phase. The application will be unavailable until the issue is resolved.
There was an issue during the build process and the team is investigating it.
This requires no action from the user. Wait until the container shows its release status as “Released”. To learn how to check the customization status, see this section.