This topic tells you how to troubleshoot installation or configuration issues in Cloud Native Runtimes, commonly known as CNR.
On AWS, you see the following error when connecting to your app:
curl: (6) Could not resolve host: a***********************7.us-west-2.elb.amazonaws.com
Try connecting to your app again after five minutes. DNS name resolution for the AWS LoadBalancer can take several minutes to propagate.
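To check whether the DNS record has propagated, you can query the load balancer hostname directly. This is a minimal check using nslookup:

nslookup LB-HOSTNAME

Where LB-HOSTNAME is the load balancer hostname from the error message. When nslookup returns an IP address, try connecting to your app again.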
On minikube, you see the following error when installing Cloud Native Runtimes:
3:03:59PM: error: reconcile job/contour-certgen-v1.10.0 (batch/v1) namespace: contour-internal
Pod watching error: Creating Pod watcher: Get "https://192.168.64.17:8443/api/v1/pods?labelSelector=kapp.k14s.io%2Fapp%3D1618232545704878000&watch=true": dial tcp 192.168.64.17:8443: connect: connection refused
kapp: Error: waiting on reconcile job/contour-certgen-v1.10.0 (batch/v1) namespace: CONTOUR-NS:
Errored:
Listing schema.GroupVersionResource{Group:"", Version:"v1", Resource:"pods"}, namespaced: true:
Get "https://192.168.64.17:8443/api/v1/pods?labelSelector=kapp.k14s.io%2Fassociation%3Dv1.572a543d96e0723f858367fcf8c6af4e": unexpected EOF
Where CONTOUR-NS is the namespace where Contour is installed on your cluster. If Cloud Native Runtimes was installed as part of a Tanzu Application Platform profile, this value is likely tanzu-system-ingress.
Increase your available system RAM to at least 4 GB.
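For example, you can start minikube with 4 GB of memory. This is a sketch; if your minikube cluster already exists, you might need to delete and re-create it for the new setting to take effect:

minikube start --memory=4096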
When relocating an image to a private registry and later pulling that image with imgpkg pull --lock LOCK-OUTPUT -o ./cloud-native-runtimes, the contents of the cloud-native-runtimes directory are overwritten.
Upgrade the imgpkg version to v0.13.0 or later.
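You can confirm the version you are running before retrying the pull:

imgpkg version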
When installing Cloud Native Runtimes, you see one of the following errors:
11:41:16AM: ongoing: reconcile app/cloud-native-runtimes (kappctrl.k14s.io/v1alpha1) namespace: cloud-native-runtimes
11:41:16AM: ^ Waiting for generation 1 to be observed
kapp: Error: Timed out waiting after 15m0s
Or,
3:15:34PM: ^ Reconciling
3:16:09PM: fail: reconcile app/cloud-native-runtimes (kappctrl.k14s.io/v1alpha1) namespace: cloud-native-runtimes
3:16:09PM: ^ Reconcile failed: (message: Deploying: Error (see .status.usefulErrorMessage for details))
kapp: Error: waiting on reconcile app/cloud-native-runtimes (kappctrl.k14s.io/v1alpha1) namespace: cloud-native-runtimes:
Finished unsuccessfully (Reconcile failed: (message: Deploying: Error (see .status.usefulErrorMessage for details)))
The cloud-native-runtimes deployment app installs the subcomponents of Cloud Native Runtimes. Error messages about reconciling indicate that one or more subcomponents have failed to install.
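Because the failure message refers to .status.usefulErrorMessage, you can print just that field as a quick first check. This uses the same App resource as the procedure below:

kubectl get app/cloud-native-runtimes -n cloud-native-runtimes -o jsonpath="{.status.usefulErrorMessage}"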
Use the following procedure to examine logs:
Get the logs from the cloud-native-runtimes app by running:
kubectl get app/cloud-native-runtimes -n cloud-native-runtimes -o jsonpath="{.status.deploy.stdout}"
Note: If the command does not return log messages, then kapp-controller is not installed or is not running correctly.
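To verify that kapp-controller is present and running, you can search for its pod across namespaces. The namespace it runs in varies by installation:

kubectl get pods --all-namespaces | grep kapp-controller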
Follow these steps to identify and resolve the problem of the cloud provider not supporting services of type LoadBalancer:
Search the log output for Load balancer, for example by running:
kubectl -n cloud-native-runtimes get app cloud-native-runtimes -ojsonpath="{.status.deploy.stdout}" | grep "Load balancer" -C 1
If the output looks similar to the following, ensure that your cloud provider supports services of type LoadBalancer. For more information, see Prerequisites.
6:30:22PM: ongoing: reconcile service/envoy (v1) namespace: CONTOUR-NS
6:30:22PM: ^ Load balancer ingress is empty
6:30:29PM: ---- waiting on 1 changes [322/323 done] ----
Where CONTOUR-NS is the namespace where Contour is installed on your cluster. If Cloud Native Runtimes was installed as part of a Tanzu Application Platform profile, this value is likely tanzu-system-ingress.
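You can also inspect the envoy service directly. On a cluster that cannot provision a load balancer, the EXTERNAL-IP column remains in the pending state:

kubectl get service envoy -n CONTOUR-NS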
Follow these steps to identify and resolve the problem of the webhook deployment failing in the vmware-sources namespace:
Review the logs for output similar to the following:
10:51:58PM: ok: reconcile customresourcedefinition/httpproxies.projectcontour.io (apiextensions.k8s.io/v1) cluster
10:51:58PM: fail: reconcile deployment/webhook (apps/v1) namespace: vmware-sources
10:51:58PM: ^ Deployment is not progressing: ProgressDeadlineExceeded (message: ReplicaSet "webhook-6f5d979b7d" has timed out progressing.)
Run kubectl get pods to find the name of the pod:
kubectl get pods --show-labels -n NAMESPACE
Where NAMESPACE is the namespace associated with the reconcile error, for example, vmware-sources.
For example,
$ kubectl get pods --show-labels -n vmware-sources
NAME                       READY   STATUS    RESTARTS   AGE   LABELS
webhook-6f5d979b7d-cxr9k   0/1     Pending   0          44h   app=webhook,kapp.k14s.io/app=1626302357703846007,kapp.k14s.io/association=v1.9621e0a793b4e925077dd557acedbcfe,pod-template-hash=6f5d979b7d,role=webhook,sources.tanzu.vmware.com/release=v0.23.0
Run kubectl logs and kubectl describe:
kubectl logs PODNAME -n NAMESPACE
kubectl describe pod PODNAME -n NAMESPACE
Where:
PODNAME is found in the output of step 3, for example, webhook-6f5d979b7d-cxr9k.
NAMESPACE is the namespace associated with the reconcile error, for example, vmware-sources.
For example:
$ kubectl logs webhook-6f5d979b7d-cxr9k -n vmware-sources
$ kubectl describe pod webhook-6f5d979b7d-cxr9k -n vmware-sources
Events:
  Type     Reason            Age                 From               Message
  ----     ------            ----                ----               -------
  Warning  FailedScheduling  80s (x14 over 14m)  default-scheduler  0/1 nodes are available: 1 Insufficient cpu.
Review the output from the kubectl logs and kubectl describe commands and take action to resolve the issue.
For this example of the webhook deployment, the output indicates that the scheduler does not have enough CPU to run the pod. In this case, the solution is to add nodes or CPU cores to the cluster. If you are using Tanzu Mission Control (TMC), increase the number of workers in the node pool to three or more through the TMC UI. See Edit a Node Pool in the TMC documentation.
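To confirm that the cluster is short on CPU before resizing, you can compare each node's allocatable CPU with what is already requested:

kubectl describe nodes | grep -A 5 "Allocated resources"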
You see the following error message when you run the install script:
Could not proceed with installation. Refer to Cloud Native Runtimes documentation for details on how to utilize an existing Contour installation. Another app owns the custom resource definitions listed below.
Follow the procedure in Install Cloud Native Runtimes on a Cluster with Your Existing Contour Instances to resolve the issue.
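To identify which app owns a conflicting custom resource definition, you can inspect its labels. Resources deployed by kapp carry a kapp.k14s.io/app label, as seen in the pod listing earlier; httpproxies.projectcontour.io is used here as an example CRD name:

kubectl get crd httpproxies.projectcontour.io --show-labels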
This issue applies only to CNR 2.0.1.
You see the following error message when you describe a workload:
...
pods "<pod name>" is forbidden: unable to validate against any security context constraint: [provider "anyuid": Forbidden: not usable by user or serviceaccount, spec.containers[0].securityContext.runAsUser: Invalid value: 1000: must be in the ranges: [1000740000, 1000749999]
...
The service account (or user) used to create the workload is bound to the "restricted" SecurityContextConstraint, and therefore cannot run a pod with a UserID outside the permitted range.
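To inspect the UserID ranges that the restricted and nonroot SecurityContextConstraints allow, you can describe them. This assumes the OpenShift oc CLI is available:

oc describe scc restricted nonroot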
You must bind the service accounts on your cluster to the "nonroot" SecurityContextConstraint to allow them to run as any nonroot UserID.
You can apply the following YAML to the run cluster to achieve this:
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: cnrs-nonroot
rules:
- apiGroups:
  - security.openshift.io
  resourceNames:
  - nonroot
  resources:
  - securitycontextconstraints
  verbs:
  - use
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: cnrs-nonroot-crb
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cnrs-nonroot
subjects:
- apiGroup: rbac.authorization.k8s.io
  kind: Group
  name: system:serviceaccounts
This binds the nonroot SecurityContextConstraint to all ServiceAccounts on the cluster. If you know the specific ServiceAccounts used to deploy workloads, you can instead modify the .subjects section of the ClusterRoleBinding above to list them explicitly:
subjects:
- kind: ServiceAccount
  name: <sa name>
  namespace: <workload namespace>
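After editing the subjects, apply the manifest to the run cluster. This assumes you saved the YAML above as cnrs-nonroot.yaml, a file name used here for illustration:

kubectl apply -f cnrs-nonroot.yaml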