This topic describes how you can troubleshoot Cartographer Conventions.
When a PodIntent is submitted, no convention is applied.
When no convention servers (ClusterPodConvention) are deployed in the cluster, or none of the existing convention servers applied any conventions, the PodIntent is not mutated.
Deploy a convention server (ClusterPodConvention) in the cluster.
When a PodIntent is submitted, the conventions are not applied. The convention-controller logs report a "failed to get CABundle" error as follows:
{
  "level": "error",
  "ts": 1638222343.6839523,
  "logger": "controllers.PodIntent.PodIntent.ResolveConventions",
  "msg": "failed to get CABundle",
  "ClusterPodConvention": "base-convention",
  "error": "unable to find valid certificaterequests for certificate \"convention-template/webhook-certificate\"",
  "stacktrace": "reflect.Value.Call\n\treflect/value.go:339\ngithub.com/vmware-labs/reconciler-runtime/reconcilers.(*SyncReconciler).sync\n\tgithub.com/vmware-labs/reconciler-runtime@.../reconcilers/reconcilers.go:287\ngithub.com/vmware-labs/reconciler-runtime/reconcilers.(*SyncReconciler).Reconcile\n\tgithub.com/vmware-labs/reconciler-runtime@.../reconcilers/reconcilers.go:276\ngithub.com/vmware-labs/reconciler-runtime/reconcilers.Sequence.Reconcile\n\tgithub.com/vmware-labs/reconciler-runtime@.../reconcilers/reconcilers.go:815\ngithub.com/vmware-labs/reconciler-runtime/reconcilers.(*ParentReconciler).reconcile\n\tgithub.com/vmware-labs/reconciler-runtime@.../reconcilers/reconcilers.go:146\ngithub.com/vmware-labs/reconciler-runtime/reconcilers.(*ParentReconciler).Reconcile\n\tgithub.com/vmware-labs/reconciler-runtime@.../reconcilers/reconcilers.go:120\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile\n\tsigs.k8s.io/controller-runtime@.../pkg/internal/controller/controller.go:114\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\tsigs.k8s.io/controller-runtime@.../pkg/internal/controller/controller.go:311\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\tsigs.k8s.io/controller-runtime@.../pkg/internal/controller/controller.go:266\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\tsigs.k8s.io/controller-runtime@.../pkg/internal/controller/controller.go:227"
}
The convention server (ClusterPodConvention) is configured with the wrong certificates. As a result, the convention-controller cannot determine the CA bundle it needs to send requests to the server.
Ensure that the convention server (ClusterPodConvention) is configured with the correct certificates. To do so, verify that the annotation conventions.carto.run/inject-ca-from is set to the Certificate in use.
Important: Do not set the annotation conventions.carto.run/inject-ca-from if no certificate is used.
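If you generate ClusterPodConvention manifests from a script, the following Python sketch shows one way to follow this rule. The helper adds the annotation only when a certificate is supplied. The function name `cluster_pod_convention` is hypothetical, and several details are assumptions rather than something this page states:
- the `conventions.carto.run/v1alpha1` API version,
- a `spec.webhook.clientConfig.service` reference, which mirrors the ClientConfig fields that appear in the controller logs,
- an annotation value of the form `<namespace>/<certificate-name>`, inferred from the certificate reference in the error above.

```python
import json
from typing import Optional

ANNOTATION = "conventions.carto.run/inject-ca-from"


def cluster_pod_convention(name: str, service_name: str, service_namespace: str,
                           certificate: Optional[str] = None) -> dict:
    """Build a ClusterPodConvention manifest as a dict (hypothetical helper).

    Assumptions: apiVersion conventions.carto.run/v1alpha1, a
    spec.webhook.clientConfig.service reference, and an annotation value of
    the form "<namespace>/<certificate-name>".
    """
    metadata: dict = {"name": name}
    if certificate is not None:
        namespace, _, cert_name = certificate.partition("/")
        if not namespace or not cert_name or "/" in cert_name:
            raise ValueError(f"expected <namespace>/<certificate-name>, got {certificate!r}")
        metadata["annotations"] = {ANNOTATION: certificate}
    # Without a certificate, the annotation is omitted entirely.
    return {
        "apiVersion": "conventions.carto.run/v1alpha1",
        "kind": "ClusterPodConvention",
        "metadata": metadata,
        "spec": {
            "webhook": {
                "clientConfig": {
                    "service": {"name": service_name, "namespace": service_namespace},
                },
            },
        },
    }


if __name__ == "__main__":
    manifest = cluster_pod_convention(
        "base-convention", "webhook", "convention-template",
        certificate="convention-template/webhook-certificate",
    )
    # JSON is valid YAML, so this output can be piped to: kubectl apply -f -
    print(json.dumps(manifest, indent=2))
```

The helper rejects malformed values at build time. This surfaces the mistake before the convention-controller reports a CABundle error.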
When a PodIntent is submitted, the convention is not applied. The convention-controller logs report a "failed to apply convention" error like the following:
{"level":"error","ts":1638205387.8813763,"logger":"controllers.PodIntent.PodIntent.ApplyConventions","msg":"failed to apply convention","Convention":{"Name":"base-convention","Selectors":null,"Priority":"Normal","ClientConfig":{"service":{"namespace":"convention-template","name":"webhook","port":443},"caBundle":"..."}},"error":"Post \"https://webhook.convention-template.svc:443/?timeout=30s\": EOF","stacktrace":"reflect.Value.call\n\treflect/value.go:543\nreflect.Value.Call\n\treflect/value.go:339\ngithub.com/vmware-labs/reconciler-runtime/reconcilers.(*SyncReconciler).sync\n\tgithub.com/vmware-labs/reconciler-runtime@.../reconcilers/reconcilers.go:287\ngithub.com/vmware-labs/reconciler-runtime/reconcilers.(*SyncReconciler).Reconcile\n\tgithub.com/vmware-labs/reconciler-runtime@.../reconcilers/reconcilers.go:276\ngithub.com/vmware-labs/reconciler-runtime/reconcilers.Sequence.Reconcile\n\tgithub.com/vmware-labs/reconciler-runtime@.../reconcilers/reconcilers.go:815\ngithub.com/vmware-labs/reconciler-runtime/reconcilers.(*ParentReconciler).reconcile\n\tgithub.com/vmware-labs/reconciler-runtime@.../reconcilers/reconcilers.go:146\ngithub.com/vmware-labs/reconciler-runtime/reconcilers.(*ParentReconciler).Reconcile\n\tgithub.com/vmware-labs/reconciler-runtime@.../reconcilers/reconcilers.go:120\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile\n\tsigs.k8s.io/controller-runtime@.../pkg/internal/controller/controller.go:114\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\tsigs.k8s.io/controller-runtime@.../pkg/internal/controller/controller.go:311\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\tsigs.k8s.io/controller-runtime@.../pkg/internal/controller/controller.go:266\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\tsigs.k8s.io/controller-runtime@.../pkg/internal/controller/controller.go:227"}
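The convention-controller emits one JSON object per log line. Because of this, you can triage these errors with a short script instead of reading stack traces by eye. The following is a minimal Python sketch, not a supported tool:
- The matching substrings are copied from the log excerpts on this page.
- The cause labels summarize the sections of this page.
- The names `triage`, `triage_lines`, and `KNOWN_CAUSES` are hypothetical.

```python
import json
import sys
from typing import Iterable, Iterator

# Substrings of the "error" field (copied from the log excerpts on this page)
# mapped to the cause each troubleshooting section describes.
# Order matters: the first matching substring wins.
KNOWN_CAUSES = [
    ("unable to find valid certificaterequests", "ClusterPodConvention certificate misconfigured"),
    ("connect: connection refused", "convention server not accepting connections"),
    (": EOF", "unhandled error inside the convention server"),
]


def triage(entry: dict) -> dict:
    """Summarize one parsed convention-controller log entry."""
    error = entry.get("error", "")
    # CABundle errors name the convention in "ClusterPodConvention";
    # apply errors nest it under "Convention" -> "Name".
    convention = entry.get("ClusterPodConvention") or (entry.get("Convention") or {}).get("Name")
    cause = next((label for needle, label in KNOWN_CAUSES if needle in error), "unknown")
    return {"msg": entry.get("msg"), "convention": convention, "cause": cause}


def triage_lines(lines: Iterable[str]) -> Iterator[dict]:
    """Yield summaries for error-level JSON log lines, skipping everything else."""
    for line in lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # not a JSON log line (for example, a blank line)
        if isinstance(entry, dict) and entry.get("level") == "error":
            yield triage(entry)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python triage.py controller.log
    with open(sys.argv[1], encoding="utf-8") as log_file:
        for summary in triage_lines(log_file):
            print(summary)
```

For example, save the logs with `kubectl -n conventions-system logs deployment/cartographer-conventions-controller-manager > controller.log`, then pass that file to the script.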
The PodIntent status message is updated with: failed to apply convention from source base-convention: Post "https://webhook.convention-template.svc:443/?timeout=30s": EOF
An unhandled error occurs in the convention server while it processes a request.
Identify the error and deploy a fixed version of the convention server:
1. Inspect the convention server logs to identify the cause of the error. Retrieve the convention server logs by running:
   kubectl -n convention-template logs deployment/webhook
   Where:
   - webhook is the name of the convention server Deployment.
   - convention-template is the namespace where the convention server is deployed.
2. Fix the cause of the error and deploy the fixed version of the convention server.
The fixed deployment is not applied to existing PodIntent resources; it is applied only to new PodIntent resources. To apply it to an existing PodIntent, update that PodIntent so that the reconciler processes it again and applies the conventions whose criteria it matches.
When a PodIntent is submitted, the convention is not applied. The convention-controller logs report a "connection refused" error as follows:
{"level":"error","ts":1638202791.5734537,"logger":"controllers.PodIntent.PodIntent.ApplyConventions","msg":"failed to apply convention","Convention":{"Name":"base-convention","Selectors":null,"Priority":"Normal","ClientConfig":{"service":{"namespace":"convention-template","name":"webhook","port":443},"caBundle":"..."}},"error":"Post \"https://webhook.convention-template.svc:443/?timeout=30s\": dial tcp 10.56.13.206:443: connect: connection refused","stacktrace":"reflect.Value.call\n\treflect/value.go:543\nreflect.Value.Call\n\treflect/value.go:339\ngithub.com/vmware-labs/reconciler-runtime/reconcilers.(*SyncReconciler).sync\n\tgithub.com/vmware-labs/reconciler-runtime@.../reconcilers/reconcilers.go:287\ngithub.com/vmware-labs/reconciler-runtime/reconcilers.(*SyncReconciler).Reconcile\n\tgithub.com/vmware-labs/reconciler-runtime@.../reconcilers/reconcilers.go:276\ngithub.com/vmware-labs/reconciler-runtime/reconcilers.Sequence.Reconcile\n\tgithub.com/vmware-labs/reconciler-runtime@.../reconcilers/reconcilers.go:815\ngithub.com/vmware-labs/reconciler-runtime/reconcilers.(*ParentReconciler).reconcile\n\tgithub.com/vmware-labs/reconciler-runtime@.../reconcilers/reconcilers.go:146\ngithub.com/vmware-labs/reconciler-runtime/reconcilers.(*ParentReconciler).Reconcile\n\tgithub.com/vmware-labs/reconciler-runtime@.../reconcilers/reconcilers.go:120\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile\n\tsigs.k8s.io/controller-runtime@.../pkg/internal/controller/controller.go:114\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\tsigs.k8s.io/controller-runtime@.../pkg/internal/controller/controller.go:311\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\tsigs.k8s.io/controller-runtime@.../pkg/internal/controller/controller.go:266\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\tsigs.k8s.io/controller-runtime@.../pkg/internal/controller/controller.go:227"}
The convention server fails to start, and its probes report "server gave HTTP response to HTTPS client". To confirm, inspect the convention server events by running:
kubectl -n convention-template describe pod webhook-594d75d69b-4w4s8
Where:
- webhook-594d75d69b-4w4s8 is the name of the convention server Pod.
- convention-template is the namespace where the convention server is deployed.
For example:
$ kubectl -n convention-template describe pod webhook-594d75d69b-4w4s8
Name: webhook-594d75d69b-4w4s8
Namespace: convention-template
...
Containers:
webhook:
...
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 14m default-scheduler Successfully assigned convention-template/webhook-594d75d69b-4w4s8 to pool
Normal Pulling 14m kubelet Pulling image "awesome-repo/awesome-user/awesome-convention-..."
Normal Pulled 14m kubelet Successfully pulled image "awesome-repo/awesome-user/awesome-convention..." in 1.06032653s
Normal Created 13m (x2 over 14m) kubelet Created container webhook
Normal Started 13m (x2 over 14m) kubelet Started container webhook
Warning Unhealthy 13m (x9 over 14m) kubelet Readiness probe failed: Get "https://10.52.2.74:8443/healthz": http: server gave HTTP response to HTTPS client
Warning Unhealthy 13m (x6 over 14m) kubelet Liveness probe failed: Get "https://10.52.2.74:8443/healthz": http: server gave HTTP response to HTTPS client
Normal Pulled 9m13s (x6 over 13m) kubelet Container image "awesome-repo/awesome-user/awesome-convention" already present on machine
Warning BackOff 4m22s (x32 over 11m) kubelet Back-off restarting failed container
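When many Pods are involved, you can scan saved `kubectl describe` output for this particular probe failure instead of reading each Events table. The following Python sketch is a minimal example:
- It assumes that event lines keep the `<Probe> probe failed: Get "<url>": <reason>` shape shown in the output above.
- The function name `tls_probe_mismatches` is hypothetical.

```python
import re
import sys

# Assumed event shape, taken from the example output above:
#   Readiness probe failed: Get "https://10.52.2.74:8443/healthz": http: server gave HTTP response to HTTPS client
PROBE_FAILURE = re.compile(r'\b(Readiness|Liveness) probe failed: Get "(?P<url>[^"]+)": (?P<reason>.+)$')
TLS_MISMATCH = "server gave HTTP response to HTTPS client"


def tls_probe_mismatches(describe_output: str) -> list:
    """Return (probe, url) pairs for probes that used HTTPS against a plain-HTTP server."""
    hits = []
    for line in describe_output.splitlines():
        match = PROBE_FAILURE.search(line)
        if match and TLS_MISMATCH in match.group("reason"):
            hits.append((match.group(1), match.group("url")))
    return hits


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_probes.py describe.txt
    with open(sys.argv[1], encoding="utf-8") as describe_file:
        for probe, url in tls_probe_mismatches(describe_file.read()):
            print(f"{probe} probe uses HTTPS against {url}, but the server answered over plain HTTP")
```

For example, save the output of `kubectl -n convention-template describe pod webhook-594d75d69b-4w4s8 > describe.txt`, then pass that file to the script.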
This happens when a convention server is provided without Transport Layer Security (TLS) but the Deployment is configured to use TLS. The HTTPS readiness and liveness probes then fail, and Kubernetes repeatedly restarts the container, as the BackOff event shows.
To resolve this, configure the convention server and its ClusterPodConvention resource to use TLS consistently:
1. Deploy the convention server with TLS enabled.
2. Create a ClusterPodConvention resource for the convention server with the annotation conventions.carto.run/inject-ca-from pointing to the deployed Certificate resource.
The self-signed CA for a registry is not propagated to the Convention Service.
When you provide the self-signed CA for a registry through convention-controller.ca_cert_data, the CA is not propagated to the Convention Service.
Instead, define the CA by using the available .shared.ca_cert_data top-level key, which supplies the CA to the Convention Service.
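The CA value is the multi-line PEM text of the certificate. The following Python sketch renders it as a YAML literal block under the `shared` key, so that its indentation is correct. The function name `shared_ca_values` is hypothetical. If your tap-values.yaml already has a `shared` section, merge the `ca_cert_data` entry into it by hand rather than appending a second `shared` key.

```python
import sys


def shared_ca_values(pem: str) -> str:
    """Render a tap-values.yaml fragment that sets shared.ca_cert_data (hypothetical helper)."""
    lines = pem.strip().splitlines()
    if not lines or not lines[0].startswith("-----BEGIN CERTIFICATE-----"):
        raise ValueError("expected a PEM-encoded certificate")
    # A literal block scalar (|) preserves the PEM line breaks; each line
    # is indented one level deeper than the ca_cert_data key.
    body = "\n".join("    " + line for line in lines)
    return "shared:\n  ca_cert_data: |\n" + body + "\n"


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python shared_ca.py registry-ca.pem
    with open(sys.argv[1], encoding="utf-8") as pem_file:
        print(shared_ca_values(pem_file.read()), end="")
```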
When a PodIntent is submitted, inspecting the workload shows an "unauthorized to access repository" or "fetching metadata for Images failed" error.
These errors appear when a workload is created in a developer namespace where imagePullSecrets are not defined on the default serviceAccount or on the preferred serviceAccount.
Add the imagePullSecrets name to the default serviceAccount or the preferred serviceAccount. For example:
apiVersion: v1
kind: ServiceAccount
metadata:
  name: default
  namespace: my-workload-namespace
imagePullSecrets:
  - name: registry-credentials # ensure this secret is defined
secrets:
  - name: registry-credentials
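If you update ServiceAccount manifests from a script, an idempotent helper keeps the reference from being duplicated on repeated runs. The following Python sketch is a minimal example. It relies on the core/v1 ServiceAccount schema, where `imagePullSecrets` is a list of `{name: ...}` references. The function name `with_image_pull_secret` is hypothetical.

```python
import copy
import json


def with_image_pull_secret(service_account: dict, secret_name: str) -> dict:
    """Return a copy of a ServiceAccount manifest that references secret_name in imagePullSecrets."""
    if service_account.get("kind") != "ServiceAccount":
        raise ValueError("not a ServiceAccount manifest")
    updated = copy.deepcopy(service_account)  # leave the caller's manifest untouched
    pull_secrets = updated.get("imagePullSecrets") or []
    # Add the reference only if it is not already present.
    if not any(ref.get("name") == secret_name for ref in pull_secrets):
        pull_secrets.append({"name": secret_name})
    updated["imagePullSecrets"] = pull_secrets
    return updated


if __name__ == "__main__":
    default_sa = {
        "apiVersion": "v1",
        "kind": "ServiceAccount",
        "metadata": {"name": "default", "namespace": "my-workload-namespace"},
    }
    print(json.dumps(with_image_pull_secret(default_sa, "registry-credentials"), indent=2))
```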
OOMKilled convention controller
While processing workloads with a large SBOM, the Cartographer Conventions controller manager pod can fail with the status CrashLoopBackOff or OOMKilled.
To work around this problem, increase the controller's memory limit to 512Mi.
Symptom example:
NAME                                                          READY   STATUS             RESTARTS          AGE
cartographer-conventions-controller-manager-ff4cdf59d-5nzl5   0/1     CrashLoopBackOff   1292 (109s ago)   5d3h
The following is an example controller pod status:
containerStatuses:
- containerID: containerd://b7b7159a9e00ef726944d642a1b649108bba610b34d8d10f9b5270ea25d3db94
  image: sha256:9827e8e5b30d47c9373a1907dc5e7e15a76d2a4581e803eb6f2cb24e3a9ea62e
  imageID: my.image.registry.com/tanzu-application-platform/tap-packages@sha256:3cd1ae92f534ff935fbaf992b8308aa3dac3d1b6cbc8cf8a856451c8c92540f66
  lastState:
    terminated:
      containerID: containerd://b7b7159a9e00ef726944d642a1b649108bba610b34d8d10f9b5270ea25d3db94
      exitCode: 137
      finishedAt: "2023-11-06T21:02:56Z"
      reason: OOMKilled
      startedAt: "2023-11-06T21:02:10Z"
  name: manager
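To confirm this symptom from a saved Pod status rather than by reading the YAML, you can check `lastState.terminated.reason` for each container. The following Python sketch is a minimal example. It reads the JSON that `kubectl get pod <pod-name> -n <namespace> -o json` produces. The function name `oom_killed_containers` is hypothetical.

```python
import json
import sys


def oom_killed_containers(pod_status: dict) -> list:
    """Return the names of containers whose last termination reason was OOMKilled."""
    names = []
    for status in pod_status.get("containerStatuses") or []:
        terminated = (status.get("lastState") or {}).get("terminated") or {}
        # An OOM kill is delivered as SIGKILL, so the exit code is 137 (128 + 9);
        # the reason field is the unambiguous signal, so that is what is checked.
        if terminated.get("reason") == "OOMKilled":
            names.append(status.get("name", "<unnamed>"))
    return names


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_oom.py pod.json
    with open(sys.argv[1], encoding="utf-8") as pod_file:
        pod = json.load(pod_file)
    for name in oom_killed_containers(pod.get("status", {})):
        print(f"container {name} was OOMKilled; consider raising its memory limit")
```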
This error usually occurs when a workload image built by the supply chain contains a large SBOM. The default resource limit set during installation might not be large enough to process the pod conventions, which can cause the controller pod to crash.
Increase the Cartographer Conventions controller manager memory limit through tap-values.yaml. To do so, add the desired resource limit under the cartographer_conventions key in tap-values.yaml. For example:
cartographer_conventions:
  resource:
    memory: 512Mi
Update Tanzu Application Platform by running:
tanzu package installed update tap -p tap.tanzu.vmware.com -v 1.11.0 \
--values-file tap-values.yaml -n tap-install
For information about the package customization, see Customize your package installation.
You might also need to increase the memory limit for the convention webhook servers installed by the appliveview-conventions, spring-boot-conventions, and developer-conventions packages. Use this procedure to increase the memory limit:
Create the following Secrets. Each Secret contains a ytt overlay that sets the memory limit of one convention webhook Deployment:
apiVersion: v1
kind: Secret
metadata:
  name: patch-app-live-view-conventions
  namespace: tap-install
stringData:
  patch-conventions-controller.yaml: |
    #@ load("@ytt:overlay", "overlay")
    #@overlay/match by=overlay.subset({"kind":"Deployment", "metadata":{"name":"appliveview-webhook", "namespace": "app-live-view-conventions"}})
    ---
    spec:
      template:
        spec:
          containers:
          #@overlay/match by=overlay.subset({"name": "webhook"})
          - name: webhook
            resources:
              limits:
                memory: 512Mi
---
apiVersion: v1
kind: Secret
metadata:
  name: patch-spring-boot-conventions
  namespace: tap-install
stringData:
  patch-conventions-controller.yaml: |
    #@ load("@ytt:overlay", "overlay")
    #@overlay/match by=overlay.subset({"kind":"Deployment", "metadata":{"name":"spring-boot-webhook", "namespace": "spring-boot-convention"}})
    ---
    spec:
      template:
        spec:
          containers:
          #@overlay/match by=overlay.subset({"name": "webhook"})
          - name: webhook
            resources:
              limits:
                memory: 512Mi
---
apiVersion: v1
kind: Secret
metadata:
  name: patch-developer-conventions
  namespace: tap-install
stringData:
  patch-conventions-controller.yaml: |
    #@ load("@ytt:overlay", "overlay")
    #@overlay/match by=overlay.subset({"kind":"Deployment", "metadata":{"name":"webhook", "namespace": "developer-conventions"}})
    ---
    spec:
      template:
        spec:
          containers:
          #@overlay/match by=overlay.subset({"name": "webhook"})
          - name: webhook
            resources:
              limits:
                memory: 512Mi
Update tap-values.yaml to include a package_overlays field as follows:
package_overlays:
- name: appliveview-conventions
  secrets:
  - name: patch-app-live-view-conventions
- name: spring-boot-conventions
  secrets:
  - name: patch-spring-boot-conventions
- name: developer-conventions
  secrets:
  - name: patch-developer-conventions
Update Tanzu Application Platform by running:
tanzu package installed update tap -p tap.tanzu.vmware.com -v 1.11.0 \
--values-file tap-values.yaml -n tap-install
For information about the package customization, see Customize your package installation.
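The three overlay Secrets above differ only in their names, so generating them from a single table keeps the Secret names, Deployment targets, and package_overlays entries in sync. The following Python sketch is one way to do this:
- The table rows are copied from the overlays on this page.
- The helper names are hypothetical.
- The output is JSON, which is valid YAML.
- The Secrets are wrapped in a `v1` `List` so that `kubectl apply -f` accepts them as one document.

```python
import json

# (package name, Secret name, Deployment name, Deployment namespace),
# copied from the overlays on this page.
WEBHOOKS = [
    ("appliveview-conventions", "patch-app-live-view-conventions", "appliveview-webhook", "app-live-view-conventions"),
    ("spring-boot-conventions", "patch-spring-boot-conventions", "spring-boot-webhook", "spring-boot-convention"),
    ("developer-conventions", "patch-developer-conventions", "webhook", "developer-conventions"),
]

# Doubled braces are literal braces for str.format.
OVERLAY_TEMPLATE = """\
#@ load("@ytt:overlay", "overlay")
#@overlay/match by=overlay.subset({{"kind":"Deployment", "metadata":{{"name":"{deployment}", "namespace": "{namespace}"}}}})
---
spec:
  template:
    spec:
      containers:
      #@overlay/match by=overlay.subset({{"name": "webhook"}})
      - name: webhook
        resources:
          limits:
            memory: {memory}
"""


def overlay_secret(secret: str, deployment: str, namespace: str, memory: str) -> dict:
    """Build one overlay Secret in the tap-install namespace (hypothetical helper)."""
    overlay = OVERLAY_TEMPLATE.format(deployment=deployment, namespace=namespace, memory=memory)
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": secret, "namespace": "tap-install"},
        "stringData": {"patch-conventions-controller.yaml": overlay},
    }


def package_overlays() -> list:
    """Build the package_overlays entries that reference the Secrets above."""
    return [{"name": package, "secrets": [{"name": secret}]} for package, secret, _, _ in WEBHOOKS]


if __name__ == "__main__":
    # First document: the Secrets, to apply with kubectl apply -f.
    items = [overlay_secret(s, d, n, "512Mi") for _, s, d, n in WEBHOOKS]
    print(json.dumps({"apiVersion": "v1", "kind": "List", "items": items}, indent=2))
    # Second document: the entry to merge into tap-values.yaml.
    print(json.dumps({"package_overlays": package_overlays()}, indent=2))
```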
An error similar to the following appears when processing a workload with a config-provider step:
message: >-
  unable to apply object [workload-name] for resource [config-provider] in supply chain
  [source-test-scan-to-url]: create: Internal error occurred: failed calling webhook
  "podintents.conventions.carto.run": failed to call webhook: Post
  "https://cartographer-conventions-webhook-service.cartographer-system.svc:443/mutate-conventions-carto-run-v1alpha1-podintent?timeout=10s": x509: certificate signed by unknown authority
The CA certificate used to secure TLS communications with the Cartographer Conventions webhook pod might be out of sync: the certificate used by the running webhook pod no longer matches the CA configured in the MutatingWebhookConfiguration and ValidatingWebhookConfiguration resources.
Force cert-manager to re-create the certificates and ensure that they are in sync across all the places where they are used:
Delete the Cartographer Conventions webhook configurations by running:
kubectl delete mutatingwebhookconfiguration cartographer-conventions-mutating-webhook-configuration \
-n conventions-system
kubectl delete validatingwebhookconfiguration cartographer-conventions-validating-webhook-configuration \
-n conventions-system
The two webhook configurations are re-created, but their caBundle fields might be empty. Empty caBundle fields indicate that cert-manager might be failing. In that case, force the cert-manager deployments to restart by running:
kubectl rollout restart deployment cert-manager -n cert-manager
kubectl rollout restart deployment cert-manager-cainjector -n cert-manager
kubectl rollout restart deployment cert-manager-webhook -n cert-manager
Force the Cartographer Conventions deployment to restart and detect any new certificates by running:
kubectl rollout restart deployment cartographer-conventions-controller-manager -n conventions-system
Re-create the workload.
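To check whether the re-created webhook configurations have empty caBundle fields, you can inspect their JSON. In JSON output, `webhooks[].clientConfig.caBundle` holds a base64-encoded PEM bundle. The following Python sketch flags webhooks whose caBundle is empty or does not decode to a PEM certificate. It is a minimal check, not a full certificate validation, and the function name `invalid_ca_bundles` is hypothetical.

```python
import base64
import binascii
import json
import sys


def invalid_ca_bundles(webhook_configuration: dict) -> list:
    """Return webhook names whose clientConfig.caBundle is empty or not a base64-encoded PEM certificate."""
    bad = []
    for webhook in webhook_configuration.get("webhooks") or []:
        ca_bundle = (webhook.get("clientConfig") or {}).get("caBundle") or ""
        try:
            decoded = base64.b64decode(ca_bundle, validate=True)
        except binascii.Error:
            decoded = b""  # not valid base64, so treat it like an empty bundle
        if b"-----BEGIN CERTIFICATE-----" not in decoded:
            bad.append(webhook.get("name", "<unnamed>"))
    return bad


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_ca_bundle.py mutating.json validating.json
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8") as config_file:
            for name in invalid_ca_bundles(json.load(config_file)):
                print(f"{path}: webhook {name} has an empty or invalid caBundle")
```

For example, save the mutating configuration with `kubectl get mutatingwebhookconfiguration cartographer-conventions-mutating-webhook-configuration -o json > mutating.json`, and save the validating configuration the same way. Any webhook the script reports suggests that cert-manager has not injected the CA yet.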