This topic tells you how to troubleshoot the installation or configuration of Cloud Native Runtimes, commonly known as CNRs.
After upgrading to Tanzu Application Platform v1.6.4 or later, if you attempt to update a web workload created in Tanzu Application Platform v1.6.3 or earlier, you see the following error:
API server says: admission webhook "validation.webhook.serving.knative.dev" denied the request: validation failed: annotation value is immutable: metadata.annotations.serving.knative.dev/creator (reason: BadRequest)
Kapp controller, which is the orchestrator underneath workloads, deploys resources exactly as requested. However, Knative adds annotations to Knative Services to track the creator and last modified time of a resource. This conflict between kapp controller and Knative is a known issue and expected behavior that is mitigated by a kapp configuration that the supply chain defines and uses at deploy time. The kapp config specifies that the annotations Knative adds must not be modified during updates.
As of Tanzu Application Platform v1.6.4, the kapp configuration moved from the delivery supply chain to the build supply chain. When a web workload is updated, the delivery supply chain no longer provides the kapp configuration, which causes the validation error. Although the kapp configuration exists in v1.6.4 in a different part of the supply chain, existing deliverables are not rebuilt to include it.
To work around this issue:
Deploy the following overlay as a secret to your Tanzu Application Platform installation namespace. In the following example, Tanzu Application Platform is installed in the tap-install namespace:
apiVersion: v1
kind: Secret
metadata:
  name: old-deliverables-patch
  namespace: tap-install #! namespace where tap is installed
stringData:
  app-deploy-overlay.yaml: |
    #@ load("@ytt:overlay", "overlay")
    #@ def kapp_config_replace(left, right):
    #@   return left + "\n" + right
    #@ end
    #@overlay/match by=overlay.subset({"kind": "ClusterDeploymentTemplate", "metadata": {"name": "app-deploy"}})
    ---
    spec:
      #@overlay/replace via=kapp_config_replace
      ytt: |
        #@ load("@ytt:overlay", "overlay")
        #@ load("@ytt:yaml", "yaml")
        #@ def kapp_config_temp():
        apiVersion: kapp.k14s.io/v1alpha1
        kind: Config
        rebaseRules:
          - path: [metadata, annotations, serving.knative.dev/creator]
            type: copy
            sources: [new, existing]
            resourceMatchers: &matchers
              - apiVersionKindMatcher: {apiVersion: serving.knative.dev/v1, kind: Service}
          - path: [metadata, annotations, serving.knative.dev/lastModifier]
            type: copy
            sources: [new, existing]
            resourceMatchers: *matchers
        waitRules:
          - resourceMatchers:
              - apiVersionKindMatcher:
                  apiVersion: serving.knative.dev/v1
                  kind: Service
            conditionMatchers:
              - type: Ready
                status: "True"
                success: true
              - type: Ready
                status: "False"
                failure: true
        ownershipLabelRules:
          - path: [ spec, template, metadata, labels ]
            resourceMatchers:
              - apiVersionKindMatcher: { apiVersion: serving.knative.dev/v1, kind: Service }
        #@ end
        #@overlay/match by=overlay.subset({"apiVersion": "kappctrl.k14s.io/v1alpha1", "kind": "App", "metadata": { "name": data.values.deliverable.metadata.name}})
        ---
        spec:
          fetch:
            #@overlay/append
            - inline:
                paths:
                  overlay-config.yml: #@ yaml.encode(kapp_config_temp())
If you installed Tanzu Application Platform using a profile, apply the overlay to the ootb-templates package by following the instructions in Customize a package that was installed by using a profile.
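That customization typically looks like the following sketch in your tap-values.yaml, assuming the secret name used in the previous step; adjust the names to your environment:

package_overlays:
- name: ootb-templates
  secrets:
  - name: old-deliverables-patch

After updating the values file, update the tap package installation so that the overlay takes effect.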
After you complete these steps, updates to the application deploy successfully.
Note: VMware plans to include a fix in future releases.
On AWS, you see the following error when connecting to your app:
curl: (6) Could not resolve host: a***********************7.us-west-2.elb.amazonaws.com
Try connecting to your app again after five minutes. DNS name resolution for the AWS LoadBalancer takes several minutes to propagate.
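To check whether the load balancer hostname has started resolving, you can query it directly. This is only a sketch: the envoy service namespace depends on how Contour was installed, for example tanzu-system-ingress when installed through a profile.

kubectl get service envoy -n tanzu-system-ingress -o jsonpath='{.status.loadBalancer.ingress[0].hostname}'
nslookup LOAD-BALANCER-HOSTNAME

Where LOAD-BALANCER-HOSTNAME is the hostname returned by the first command.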
On minikube, you see the following error when installing Cloud Native Runtimes:
3:03:59PM: error: reconcile job/contour-certgen-v1.10.0 (batch/v1) namespace: contour-internal
Pod watching error: Creating Pod watcher: Get "https://192.168.64.17:8443/api/v1/pods?labelSelector=kapp.k14s.io%2Fapp%3D1618232545704878000&watch=true": dial tcp 192.168.64.17:8443: connect: connection refused
kapp: Error: waiting on reconcile job/contour-certgen-v1.10.0 (batch/v1) namespace: CONTOUR-NS:
Errored:
Listing schema.GroupVersionResource{Group:"", Version:"v1", Resource:"pods"}, namespaced: true:
Get "https://192.168.64.17:8443/api/v1/pods?labelSelector=kapp.k14s.io%2Fassociation%3Dv1.572a543d96e0723f858367fcf8c6af4e": unexpected EOF
Where CONTOUR-NS is the namespace where Contour is installed on your cluster. If Cloud Native Runtimes was installed as part of a Tanzu Application Platform profile, this value is likely tanzu-system-ingress.
Increase your available system RAM to at least 4 GB.
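For example, if you can recreate your minikube cluster, you can start it with more memory. This is only a sketch; adjust the values to what your machine can provide:

minikube delete
minikube start --memory=4g --cpus=4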
When relocating an image to a private registry and later pulling that image with imgpkg pull --lock LOCK-OUTPUT -o ./cloud-native-runtimes, the contents of the cloud-native-runtimes directory are overwritten.
Upgrade the imgpkg version to v0.13.0 or later.
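To confirm which version is installed before and after upgrading, run:

imgpkg version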
When installing Cloud Native Runtimes, you see one of the following errors:
11:41:16AM: ongoing: reconcile app/cloud-native-runtimes (kappctrl.k14s.io/v1alpha1) namespace: cloud-native-runtime
11:41:16AM: ^ Waiting for generation 1 to be observed
kapp: Error: Timed out waiting after 15m0s
Or,
3:15:34PM: ^ Reconciling
3:16:09PM: fail: reconcile app/cloud-native-runtimes (kappctrl.k14s.io/v1alpha1) namespace: cloud-native-runtimes
3:16:09PM: ^ Reconcile failed: (message: Deploying: Error (see .status.usefulErrorMessage for details))
kapp: Error: waiting on reconcile app/cloud-native-runtimes (kappctrl.k14s.io/v1alpha1) namespace: cloud-native-runtimes:
Finished unsuccessfully (Reconcile failed: (message: Deploying: Error (see .status.usefulErrorMessage for details)))
The cloud-native-runtimes deployment app installs the subcomponents of Cloud Native Runtimes. Error messages about reconciling indicate that one or more subcomponents have failed to install.
Use the following procedure to examine logs:
Get the logs from the cloud-native-runtimes app by running:
kubectl get app/cloud-native-runtimes -n cloud-native-runtimes -o jsonpath="{.status.deploy.stdout}"
Note: If the command does not return log messages, then kapp-controller is not installed or is not running correctly.
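To confirm whether kapp-controller is present and running, you can check its pods. This sketch assumes kapp-controller is deployed in the kapp-controller namespace, which can differ by installation method:

kubectl get pods -n kapp-controller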
Follow these steps to identify and resolve the problem of the cloud provider not supporting services of type LoadBalancer:
Search the log output for Load balancer, for example by running:
kubectl -n cloud-native-runtimes get app cloud-native-runtimes -ojsonpath="{.status.deploy.stdout}" | grep "Load balancer" -C 1
If the output looks similar to the following, ensure that your cloud provider supports services of type LoadBalancer. For more information, see Prerequisites.
6:30:22PM: ongoing: reconcile service/envoy (v1) namespace: CONTOUR-NS
6:30:22PM: ^ Load balancer ingress is empty
6:30:29PM: ---- waiting on 1 changes [322/323 done] ----
Where CONTOUR-NS is the namespace where Contour is installed on your cluster. If Cloud Native Runtimes was installed as part of a Tanzu Application Platform profile, this value is likely tanzu-system-ingress.
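You can also inspect the envoy service directly. If the EXTERNAL-IP column stays in a pending state, the cluster is not provisioning a load balancer:

kubectl get service envoy -n CONTOUR-NS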
Follow these steps to identify and resolve the problem of the webhook deployment failing in the vmware-sources namespace:
Review the logs for output similar to the following:
10:51:58PM: ok: reconcile customresourcedefinition/httpproxies.projectcontour.io (apiextensions.k8s.io/v1) cluster
10:51:58PM: fail: reconcile deployment/webhook (apps/v1) namespace: vmware-sources
10:51:58PM: ^ Deployment is not progressing: ProgressDeadlineExceeded (message: ReplicaSet "webhook-6f5d979b7d" has timed out progressing.)
Run kubectl get pods to find the name of the pod:
kubectl get pods --show-labels -n NAMESPACE
Where NAMESPACE is the namespace associated with the reconcile error, for example, vmware-sources.
For example:
$ kubectl get pods --show-labels -n vmware-sources
NAME READY STATUS RESTARTS AGE LABELS
webhook-6f5d979b7d-cxr9k 0/1 Pending 0 44h app=webhook,kapp.k14s.io/app=1626302357703846007,kapp.k14s.io/association=v1.9621e0a793b4e925077dd557acedbcfe,pod-template-hash=6f5d979b7d,role=webhook,sources.tanzu.vmware.com/release=v0.23.0
Run kubectl logs and kubectl describe:
kubectl logs PODNAME -n NAMESPACE
kubectl describe pod PODNAME -n NAMESPACE
Where:
PODNAME is found in the output of step 3, for example, webhook-6f5d979b7d-cxr9k.
NAMESPACE is the namespace associated with the reconcile error, for example, vmware-sources.
For example:
$ kubectl logs webhook-6f5d979b7d-cxr9k -n vmware-sources
$ kubectl describe pod webhook-6f5d979b7d-cxr9k -n vmware-sources
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 80s (x14 over 14m) default-scheduler 0/1 nodes are available: 1 Insufficient cpu.
Review the output from the kubectl logs and kubectl describe commands and take further action.
For this example of the webhook deployment, the output indicates that the scheduler does not have enough CPU to run the pod. In this case, the solution is to add nodes or CPU cores to the cluster. If you are using Tanzu Mission Control (TMC), increase the number of workers in the node pool to three or more through the TMC UI. See Edit a Node Pool, in the TMC documentation.
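To see how much CPU your nodes currently offer before resizing, you can review node capacity and allocations, for example:

kubectl describe nodes | grep -A 5 "Allocated resources"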
You see the following error message when you run the install script:
Could not proceed with installation. Refer to Cloud Native Runtimes documentation for details on how to utilize an existing Contour installation. Another app owns the custom resource definitions listed below.
Follow the procedure in Install Cloud Native Runtimes on a Cluster with Your Existing Contour Instances to resolve the issue.
When you create a Knative Service, it does not reach ready status. The corresponding Route resource has the status Ready=Unknown with Reason=EndpointsNotReady. When you check the logs for the net-contour-controller, you see an error like this:
{"severity":"ERROR","timestamp":"2022-12-08T16:27:08.320604183Z","logger":"net-contour-controller","caller":"ingress/reconciler.go:313","message":"Returned an error","commit":"041f9e3","knative.dev/controller":"knative.dev.net-contour.pkg.reconciler.contour.Reconciler","knative.dev/kind":"networking.internal.knative.dev.Ingress","knative.dev/traceid":"9d615387-f552-449c-a8cd-04c69dd1849e","knative.dev/key":"cody/foo-java","targetMethod":"ReconcileKind","error":"HTTPProxy.projectcontour.io \"foo-java-contour-5f549ae3e6f584a5f33d069a0650c0d8foo-java.cody.\" is invalid: metadata.name: Invalid value: \"foo-java-contour-5f549ae3e6f584a5f33d069a0650c0d8foo-java.cody.\": a lowercase RFC 1123 subdomain must consist of lower case alphanumeric characters, '-' or '.', and must start and end with an alphanumeric character (e.g. 'example.com', regex used for validation is '[a-z0-9]([-a-z0-9]*[a-z0-9])?(\\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*')","stacktrace":"knative.dev/networking/pkg/client/injection/reconciler/networking/v1alpha1/ingress.(*reconcilerImpl).Reconcile\n\tknative.dev/[email protected]/pkg/client/injection/reconciler/networking/v1alpha1/ingress/reconciler.go:313\nknative.dev/pkg/controller.(*Impl).processNextWorkItem\n\tknative.dev/[email protected]/controller/controller.go:542\nknative.dev/pkg/controller.(*Impl).RunContext.func3\n\tknative.dev/[email protected]/controller/controller.go:491"}
Due to a known upstream Knative issue, certain combinations of Name + Namespace + Domain yield invalid names for HTTPProxy resources because of the way the name is hashed and trimmed to fit the size requirement. The resulting name can end with a non-alphanumeric character.
The resolution is unique to each Knative Service. It likely involves shortening your app name so that, after the hash-and-trim procedure, the name is cut to end on an alphanumeric character.
For example, foo-java.cody.iterate.tanzu-azure-lab.winterfell.fun gets hashed and trimmed into foo-java-contour-5f549ae3e6f584a5f33d069a0650c0d8foo-java.cody., leaving an invalid . at the end.
However, changing the app name to foo-jav results in foo-jav-contour-<some different hash>foo-jav.cody.it, which is a valid name.
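One way to review the FQDNs that Knative generated, and to confirm that a rename takes effect, is to list your Knative Services and their URLs. A sketch, where NAMESPACE is the namespace containing your Knative Service:

kubectl get ksvc -n NAMESPACE -o custom-columns=NAME:.metadata.name,URL:.status.url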
When you create a Knative Service, it does not reach ready status, and it has the status CertificateNotReady. When you check the status of the kcert resource that belongs to the Knative Service, you see a message like this:
kubectl -n your-namespace get kcert route-76e387a2-cc35-4580-b2f1-bf7561371891 -ojsonpath='{.status}'
Output:
{
  "conditions": [
    {
      "lastTransitionTime": "2023-06-05T11:26:53Z",
      "message": "error creating Certmanager Certificate: cannot create valid length CommonName: (where-for-dinner.medium.longevityaks253.tapalong.cloudfocused.in) still longer than 63 characters, cannot shorten",
      "reason": "CommonName Too Long",
      "status": "False",
      "type": "Ready"
    }
  ],
  "observedGeneration": 1
}
Due to a restriction imposed by cert-manager, CNs cannot be longer than 64 bytes. For more information, see this cert-manager issue in GitHub. For Knative using cert-manager, this means that the FQDN for a Knative Service, usually composed of <ksvc name>.<namespace>.<domain> but configurable using domain_template in Cloud Native Runtimes, must not exceed 64 bytes.
Recent improvements to Knative catch this in some cases. When <ksvc name>.<namespace> is longer than 25 characters, Knative attempts to hash that value and create a new common name in the form <hash>.<domain>. However, if <ksvc name>.<namespace> is less than 25 characters long, Knative does not attempt to hash.
Knative is limited to a 25-character hash to preserve uniqueness in CommonNames. It also cannot shorten the domain portion, because that would break DNS resolution when performing HTTP01 challenges. As a result, this catches some cases, but not all. It is possible that your <domain> portion is still too long.
There is an open issue in the Knative Serving community that aims to solve this. The quickest way to avoid the problem is to disable TLS. See the Cloud Native Runtimes documentation on disabling auto TLS for more details.
If you wish to continue using TLS, there are a few ways to resolve this on your own, though each comes with its own risks and limitations.
Changing the domain_template alters how Knative creates FQDNs for Knative Services. See the Cloud Native Runtimes instructions on configuring External DNS.
You can use this option to shorten the template, either by shortening one of the fields:
{{.Name}}.{{slice .Namespace 0 3}}.{{.Domain}}
Note: Knative was not designed with shortening the name or namespace in mind. Due to a quirk in Knative's domain template validation, you can only slice up to a maximum of 3 characters.
Or by removing a field altogether:
{{.Name}}.{{.Domain}}
Warning: Removing the namespace from the domain_template makes it possible for Knative to create non-unique FQDNs for Knative Services across different namespaces. You must take manual care when naming Knative Services to make sure FQDNs remain unique.
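For a profile-based installation, the domain template is typically set through the Cloud Native Runtimes values in your tap-values.yaml. A minimal sketch, assuming the cnrs values key used by Tanzu Application Platform profiles:

cnrs:
  domain_template: "{{.Name}}.{{.Domain}}"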
Another option is to shorten the names of your Knative Services or namespaces, if you have that ability. This also requires some manual calculation to make sure that the shortened name, namespace, and domain, including the separating dots, come to less than 64 bytes.
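A quick way to check a candidate combination is to count the bytes of the resulting FQDN. For example, using the FQDN from the error message above:

echo -n "where-for-dinner.medium.longevityaks253.tapalong.cloudfocused.in" | wc -c
# prints 64, which the error above reports as too long

Substitute your own name, namespace, and domain to verify that the shortened combination stays within the limit.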