Troubleshoot using Tanzu Application Platform

In this topic, you’ll find troubleshooting information to help resolve issues using Tanzu Application Platform.

Missing build logs after creating a workload

You create a workload, but no logs appear when you check for them by running the following command:

tanzu apps workload tail WORKLOAD-NAME --since 10m --timestamp

Where WORKLOAD-NAME is the name of your workload.

Explanation

Common causes include:

  • Misconfigured repository
  • Misconfigured service account
  • Misconfigured registry credentials

Solution

To resolve this issue, run each of the following commands to receive the relevant error message:

kubectl get clusterbuilder.kpack.io -o yaml
kubectl get image.kpack.io WORKLOAD-NAME -o yaml
kubectl get build.kpack.io -o yaml
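
For example, to surface just the failure message recorded on the image resource instead of reading the full YAML, you can use a jsonpath query (a minimal sketch; WORKLOAD-NAME is a placeholder, and the exact condition messages depend on the failure):

kubectl get image.kpack.io WORKLOAD-NAME -o jsonpath='{.status.conditions[*].message}'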

“Workload already exists” error after updating the workload

When you update the workload, you receive the following error:

Error: workload "default/APP-NAME" already exists
Error: exit status 1

Where APP-NAME is the name of the app.

For example, when you run:

$ tanzu apps workload create tanzu-java-web-app \
--git-repo https://github.com/dbuchko/tanzu-java-web-app \
--git-branch main \
--type web \
--label apps.tanzu.vmware.com/has-tests=true \
--yes

You receive the following error:

Error: workload "default/tanzu-java-web-app" already exists
Error: exit status 1

Explanation

The app is already running, and you attempted a live update using the same app name.

Solution

To resolve this issue, either delete the app or use a different name for the app.
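
For example, to delete the existing workload from the example above before recreating it (a sketch; adjust the workload name and namespace to match your environment):

tanzu apps workload delete tanzu-java-web-app --yes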

Telemetry component logs show errors fetching the “reg-creds” secret

When you view the logs of the tap-telemetry controller by running kubectl logs -n tap-telemetry tap-telemetry-controller-HASH -f, where HASH is the suffix that Kubernetes appends to the pod name, you see the following error:

"Error retrieving secret reg-creds on namespace tap-telemetry","error":"secrets \"reg-creds\" is forbidden: User \"system:serviceaccount:tap-telemetry:controller\" cannot get resource \"secrets\" in API group \"\" in the namespace \"tap-telemetry\""

Explanation

The tap-telemetry namespace is missing a Role that allows the controller to list secrets in the tap-telemetry namespace. For more information about Roles, see Role and ClusterRole in Using RBAC Authorization in the Kubernetes documentation.

Solution

To resolve this issue, run:

kubectl patch roles -n tap-telemetry tap-telemetry-controller --type='json' -p='[{"op": "add", "path": "/rules/-", "value": {"apiGroups": [""],"resources": ["secrets"],"verbs": ["get", "list", "watch"]} }]'
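
To confirm that the new rule was added, you can inspect the patched Role:

kubectl get role tap-telemetry-controller -n tap-telemetry -o yaml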

Debug convention may not apply

If you upgrade from Tanzu Application Platform v0.4, the debug convention may not apply to the app run image.

Explanation

Images built with Tanzu Application Platform v0.4 lack SBOM data.

Solution

Delete existing app images that were built using Tanzu Application Platform v0.4.

Execute bit not set for App Accelerator build scripts

You cannot execute a build script provided as part of an accelerator.

Explanation

Build scripts provided as part of an accelerator do not have the execute bit set when a new project is generated from the accelerator.

Solution

Explicitly set the execute bit by running the chmod command:

chmod +x BUILD-SCRIPT-NAME

Where BUILD-SCRIPT-NAME is the name of the build script.

For example, for a project generated from the “Spring PetClinic” accelerator, run:

chmod +x ./mvnw
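
You can confirm that the execute bit is now set by listing the file's permissions:

ls -l ./mvnw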

“No live information for pod with ID” error

After deploying Tanzu Application Platform workloads, Tanzu Application Platform GUI shows a “No live information for pod with ID” error.

Explanation

The Application Live View connector has not yet discovered the application instances, so there are no details to render in Tanzu Application Platform GUI.

Solution

Recreate the Application Live View Connector pod by running:

kubectl -n app-live-view delete pods -l=name=application-live-view-connector

This allows the connector to discover the application instances and render the details in Tanzu Application Platform GUI.
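
You can then verify that a new connector pod is running, assuming the connector is deployed to the app-live-view namespace as in the command above:

kubectl get pods -n app-live-view -l name=application-live-view-connector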

“image-policy-webhook-service not found” error

When installing a Tanzu Application Platform profile, you receive the following error:

Internal error occurred: failed calling webhook "image-policy-webhook.signing.apps.tanzu.vmware.com": failed to call webhook: Post "https://image-policy-webhook-service.image-policy-system.svc:443/signing-policy-check?timeout=10s": service "image-policy-webhook-service" not found

Explanation

The “image-policy-webhook-service” service cannot be found.

Solution

Redeploy the trainingPortal resource.
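
One way to redeploy it is to delete the existing TrainingPortal resource and re-apply its manifest (a sketch; TRAINING-PORTAL-NAME and training-portal.yaml are placeholders for your resource name and manifest file):

kubectl delete trainingportal TRAINING-PORTAL-NAME
kubectl apply -f training-portal.yaml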

“Increase your cluster resources” error

You receive an “Increase your cluster’s resources” error.

Explanation

Node pressure may be caused by an insufficient number of nodes or a lack of resources on nodes necessary to deploy the workloads that you have.

Solution

Follow instructions from your cloud provider to scale out or scale up your cluster.
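
Before scaling, you can check current node resource usage to confirm the pressure (a sketch; kubectl top requires the metrics-server to be installed in the cluster):

kubectl top nodes
kubectl describe nodes | grep -A 5 "Allocated resources"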

MutatingWebhookConfiguration prevents pod admission

Admission of all pods is prevented when the image-policy-controller-manager deployment pods do not start before the MutatingWebhookConfiguration is applied to the cluster.

Explanation

Pods can be prevented from starting if nodes in a cluster are scaled to zero and the webhook is forced to restart at the same time as other system components. A deadlock can occur when some components expect the webhook to verify their image signatures and the webhook is not yet running.

A known rare condition during Tanzu Application Platform profile installation can cause this. If it occurs, you might see a message similar to one of the following in component statuses:

Events:
  Type     Reason            Age                   From                   Message
  ----     ------            ----                  ----                   -------
  Warning  FailedCreate      4m28s                 replicaset-controller  Error creating: Internal error occurred: failed calling webhook "image-policy-webhook.signing.apps.tanzu.vmware.com": Post "https://image-policy-webhook-service.image-policy-system.svc:443/signing-policy-check?timeout=10s": no endpoints available for service "image-policy-webhook-service"
Events:
  Type     Reason            Age                   From                   Message
  ----     ------            ----                  ----                   -------
  Warning FailedCreate 10m replicaset-controller Error creating: Internal error occurred: failed calling webhook "image-policy-webhook.signing.apps.tanzu.vmware.com": Post "https://image-policy-webhook-service.image-policy-system.svc:443/signing-policy-check?timeout=10s": service "image-policy-webhook-service" not found

Solution

Delete the MutatingWebhookConfiguration resource to resolve the deadlock and enable the system to restart. After the system is stable, restore the MutatingWebhookConfiguration resource to re-enable image signing enforcement.

Important: These steps temporarily disable signature verification in your cluster.

  1. Back up MutatingWebhookConfiguration to a file by running:

    kubectl get MutatingWebhookConfiguration image-policy-mutating-webhook-configuration -o yaml > image-policy-mutating-webhook-configuration.yaml
    
  2. Delete MutatingWebhookConfiguration by running:

    kubectl delete MutatingWebhookConfiguration image-policy-mutating-webhook-configuration
    
  3. Wait until all components are up and running in your cluster, including the image-policy-controller-manager pods (namespace image-policy-system).

  4. Re-apply MutatingWebhookConfiguration by running:

    kubectl apply -f image-policy-mutating-webhook-configuration.yaml
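
After re-applying the configuration, you can confirm that the webhook is back in place:

kubectl get mutatingwebhookconfiguration image-policy-mutating-webhook-configuration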
    

Priority class of webhook’s pods preempts less privileged pods

When viewing the output of kubectl get events, you see events similar to the following:

$ kubectl get events
LAST SEEN   TYPE      REASON             OBJECT               MESSAGE
28s         Normal    Preempted          pod/testpod          Preempted by image-policy-system/image-policy-controller-manager-59dc669d99-frwcp on node test-node

Explanation

The Supply Chain Security Tools - Sign component uses a privileged PriorityClass to start its pods to prevent node pressure from preempting its pods. This can cause less privileged components to have their pods preempted or evicted instead.

Solution

  • Solution 1: Reduce the number of pods deployed by the Sign component: If your deployment of the Sign component runs more pods than necessary, scale down the deployment as follows:

    1. Create a values file named scst-sign-values.yaml with the following contents:

      ---
      replicas: N
      

      Where N is an integer indicating the lowest number of pods necessary for your current cluster configuration.

    2. Apply the new configuration by running:

      tanzu package installed update image-policy-webhook \
        --package-name image-policy-webhook.signing.apps.tanzu.vmware.com \
        --version 1.0.0-beta.3 \
        --namespace tap-install \
        --values-file scst-sign-values.yaml
      
    3. Wait a few minutes for your configuration to take effect in the cluster. To verify the new replica count, see the sketch after this list.

  • Solution 2: Increase your cluster’s resources: Node pressure may be caused by an insufficient number of nodes or a lack of resources on nodes necessary to deploy the workloads that you have. Follow instructions from your cloud provider to scale out or scale up your cluster.
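
If you applied Solution 1, you can confirm the new replica count once the change takes effect, assuming the Sign component's pods run in the image-policy-system namespace shown in the event message above:

kubectl get pods -n image-policy-system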

CrashLoopBackOff from password authentication fails

Supply Chain Security Tools - Store does not start. You see the following error in the metadata-store-app Pod logs:

$ kubectl logs pod/metadata-store-app-* -n metadata-store -c metadata-store-app
...
[error] failed to initialize database, got error failed to connect to `host=metadata-store-db user=metadata-store-user database=metadata-store`: server error (FATAL: password authentication failed for user "metadata-store-user" (SQLSTATE 28P01))

Explanation

The database password has been changed between deployments. This is not supported.

Solution

Either redeploy the app with the original database password, or follow the steps below to erase the data on the volume:

  1. Deploy metadata-store app with kapp.

  2. Verify that the metadata-store-db-* Pod fails.

  3. Run:

    kubectl exec -it metadata-store-db-KUBERNETES-ID -n metadata-store -- /bin/bash
    

    Where KUBERNETES-ID is the ID generated by Kubernetes and appended to the Pod name.

  4. To delete all database data, run:

    rm -rf /var/lib/postgresql/data/*
    

    This is the path found in postgres-db-deployment.yaml.

  5. Delete the metadata-store app with kapp.

  6. Deploy the metadata-store app with kapp.
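
After redeploying, you can verify that the database and app Pods reach the Running state:

kubectl get pods -n metadata-store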

metadata-store-db pod fails to start

When Supply Chain Security Tools - Store is deployed, deleted, and then redeployed, the metadata-store-db Pod fails to start if the database password changed during redeployment.

Explanation

The persistent volume used by postgres retains old data, even though the retention policy is set to DELETE.

Solution

Either redeploy the app with the original database password, or follow the steps below to erase the data on the volume:

  1. Deploy metadata-store app with kapp.

  2. Verify that the metadata-store-db-* Pod fails.

  3. Run:

    kubectl exec -it metadata-store-db-KUBERNETES-ID -n metadata-store -- /bin/bash
    

    Where KUBERNETES-ID is the ID generated by Kubernetes and appended to the Pod name.

  4. To delete all database data, run:

    rm -rf /var/lib/postgresql/data/*
    

    This is the path found in postgres-db-deployment.yaml.

  5. Delete the metadata-store app with kapp.

  6. Deploy the metadata-store app with kapp.

Missing persistent volume

After Supply Chain Security Tools - Store is deployed, the metadata-store-db Pod fails because of a missing volume, and the postgres-db-pv-claim PVC remains in the PENDING state.

Explanation

The cluster where Supply Chain Security Tools - Store is deployed does not have a storageclass defined. The storageclass provisioner is responsible for creating the persistent volume after metadata-store-db attaches postgres-db-pv-claim.

Solution

  1. Verify whether your cluster has a storageclass by running:

    kubectl get storageclass
    
  2. Create a storageclass in your cluster before deploying Supply Chain Security Tools - Store. For example:

    # This is the storageclass that Kind uses
    kubectl apply -f https://raw.githubusercontent.com/rancher/local-path-provisioner/master/deploy/local-path-storage.yaml
    
    # set the storage class as default
    kubectl patch storageclass local-path -p '{"metadata": {"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'
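
After the storageclass exists and the Store is redeployed, you can check that the claim binds, assuming the Store runs in the metadata-store namespace:

kubectl get pvc postgres-db-pv-claim -n metadata-store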
    

Supply Chain Security Tools - Sign rejects images

Supply Chain Security Tools - Sign rejects images from private registries.

Explanation

The image is deployed to a non-default namespace.

Solution

Make the private registry secret available to the default namespace.
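
One way to do this is to recreate the registry credentials as a secret in the default namespace (a sketch; REGISTRY-SECRET-NAME, REGISTRY-SERVER, REGISTRY-USERNAME, and REGISTRY-PASSWORD are placeholders, and the secret name must match whatever your image policy configuration references):

kubectl create secret docker-registry REGISTRY-SECRET-NAME \
  --docker-server=REGISTRY-SERVER \
  --docker-username=REGISTRY-USERNAME \
  --docker-password=REGISTRY-PASSWORD \
  --namespace default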

Supply Chain Security Tools - Scan unable to decode CycloneDX

Supply Chain Security Tools - Scan has a known issue where it sets the phase of a scan to Error with the message unable to decode cyclonedx. This is an intermittent issue in which the CycloneDX XML stream written to the logs is truncated, so the scan controller cannot process the results properly.

Explanation

The root cause of the problem is unknown.

Workaround

See the Troubleshooting Guide for how to exit this error state.
