This topic tells you how to troubleshoot Tanzu Build Service when used with Tanzu Application Platform (commonly known as TAP).
After installing or upgrading Tanzu Application Platform on an Amazon Elastic Kubernetes Service (EKS) cluster running Kubernetes v1.23, build pods show:
'running PreBind plugin "VolumeBinding": binding volumes: timed out waiting for the condition'
This is due to the CSIMigrationAWS feature in this Kubernetes version, which requires you to install the Amazon EBS CSI driver to use AWS Elastic Block Store (EBS) volumes. See the Amazon documentation. For more information about EKS support for Kubernetes v1.23, see the Amazon blog post.
Tanzu Application Platform uses the default storage class, which on EKS is backed by EBS volumes.
Follow the AWS documentation to install the Amazon EBS CSI driver before installing Tanzu Application Platform, or before upgrading to Kubernetes v1.23.
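Before installing or upgrading, it can help to confirm that the driver is present. The following is a minimal sketch, not an official procedure: it assumes kubectl is configured for the target cluster and that the driver registers a CSIDriver object under its standard name, ebs.csi.aws.com. The function name is illustrative.

```shell
# Succeeds (exit status 0) if a CSIDriver object named ebs.csi.aws.com exists.
# Assumption: the Amazon EBS CSI driver registers under this name.
ebs_csi_driver_installed() {
  kubectl get csidriver ebs.csi.aws.com >/dev/null 2>&1
}

# Example call against a live cluster:
# ebs_csi_driver_installed && echo "EBS CSI driver found" || echo "EBS CSI driver missing"
```

If the check fails, install the driver before proceeding.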
When using dockerd as the cluster’s container runtime, you might see the smart-warmer-image-fetcher pods report a status of ErrImagePull.
This error might be due to dockerd’s layer depth limitation, in which the maximum supported image layer depth is 125.
To verify that the ErrImagePull status is due to dockerd’s maximum supported image layer depth, check for event messages containing the words max depth exceeded. For example:
$ kubectl get events -A | grep "max depth exceeded"
build-service 73s Warning Failed pod/smart-warmer-image-fetcher-wxtr8 Failed to pull image "harbor.somewhere.com/aws-repo/build-service:clusterbuilder-full@sha256:065bb361fd914a3970ad3dd93c603241e69cca214707feaa6d8617019e20b65e": rpc error: code = Unknown desc = failed to register layer: max depth exceeded
To work around this issue, configure your cluster to use containerd or CRI-O as its default container runtime. For instructions, see the following documentation for your Kubernetes cluster provider.
For AWS, see:
For AKS, see:
For GKE, see:
For OpenShift, see:
You see the following error, or similar, in a node status:
Warning ContainerGCFailed 119s (x2523 over 42h) kubelet rpc error: code = ResourceExhausted desc = grpc: trying to send message larger than max (16779959 vs. 16777216)
This is due to the way that the container runtime interface (CRI) handles garbage collection for unused images and containers.
Do not use Docker as the CRI because it is not supported. Some versions of EKS default to Docker as the runtime.
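To find out which nodes still run Docker, you can read the containerRuntimeVersion field from each node's status. The sketch below assumes kubectl access to the cluster. The function names are illustrative and are not part of any Tanzu tooling.

```shell
# Print one line per node: NAME<TAB>RUNTIME, for example "node-1<TAB>containerd://1.6.6".
list_node_runtimes() {
  kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.nodeInfo.containerRuntimeVersion}{"\n"}{end}'
}

# Print only the names of nodes whose reported runtime starts with "docker:".
docker_nodes() {
  list_node_runtimes | awk -F'\t' '$2 ~ /^docker:/ { print $1 }'
}

# Example call against a live cluster:
# docker_nodes
```

An empty result from docker_nodes suggests that no node reports Docker as its runtime.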
While upgrading apps to a later stack, you might encounter the build platform erroneously reusing the old build cache.
If you encounter this issue, delete and recreate the workload in Tanzu Application Platform, or delete and recreate the image in Tanzu Build Service.
Switching from buildservice.kp_default_repository to shared.image_registry
After switching to the shared.image_registry fields in tap-values.yaml, your workloads might start failing with a TemplateRejectedByAPIServer error, with the error message: admission webhook "validation.webhook.kpack.io" denied the request: validation failed: Immutable field changed: spec.tag.
Tanzu Application Platform automatically appends /buildservice to the end of the repository specified in shared.image_registry.project_path. This updates the existing workload image tags, which is not allowed by Tanzu Build Service.
Delete the images.kpack.io resource, which has the same name as the workload. The workload then recreates it with valid values.
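The deletion can be sketched as a small helper. It assumes kubectl access and that you know the workload's name and namespace. The function name and the example values my-app and dev are hypothetical.

```shell
# Delete the kpack Image resource that shares its name with the workload.
# $1: workload name, $2: namespace of the workload.
delete_kpack_image() {
  workload_name="$1"
  namespace="$2"
  kubectl delete images.kpack.io "$workload_name" --namespace "$namespace"
}

# Example call against a live cluster (hypothetical names):
# delete_kpack_image my-app dev
```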
Alternatively, re-add the buildservice.kp_default_repository_* fields to tap-values.yaml. You must set both the repository and the authentication fields to override the shared values. Set kp_default_repository, kp_default_repository_secret.name, and kp_default_repository_secret.namespace.