Configuring observability for Cloud Native Runtimes

This topic tells you how to configure observability for Cloud Native Runtimes, commonly known as CNRs.

Overview

You can set up integrations with third-party observability tools to use logging, metrics, and tracing with Cloud Native Runtimes. These integrations let you monitor your clusters and collect detailed metrics from Cloud Native Runtimes. You can collect logs and metrics for all workloads running on a cluster, including Cloud Native Runtimes components and any apps running on Cloud Native Runtimes. VMware recommends the integrations in this topic; however, you can use any Kubernetes-compatible logging, metrics, and tracing platform to monitor your cluster workloads.

Logging

You can collect and forward logs for all workloads on a cluster, including Cloud Native Runtimes components and any apps running on Cloud Native Runtimes. You can use any Kubernetes-compatible logging platform to collect and forward logs for Cloud Native Runtimes workloads. VMware recommends using Fluent Bit to collect logs and forward them to vRealize. The following sections describe how to configure logging for Cloud Native Runtimes, using Fluent Bit and vRealize as an example.

Configure Logging with Fluent Bit

You can use Fluent Bit to collect logs for all workloads on a cluster, including Cloud Native Runtimes components and any apps running on Cloud Native Runtimes. For more information about using Fluent Bit for logging, see Fluent Bit Kubernetes Logging.

Fluent Bit lets you collect logs from Kubernetes containers, add Kubernetes metadata to these logs, and forward logs to third-party log storage services. For more information about collecting logs, see the Knative documentation.
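
For example, the following is a minimal sketch of a Fluent Bit ConfigMap that tails container logs, enriches them with Kubernetes metadata, and forwards them over HTTP. The ConfigMap name, the logging namespace, and the output destination are placeholder assumptions; adapt them to the configuration described in the Fluent Bit documentation:

    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: fluent-bit-config
      namespace: logging
    data:
      fluent-bit.conf: |
        # Tail container logs from the node file system.
        [INPUT]
            Name    tail
            Path    /var/log/containers/*.log
            Tag     kube.*

        # Enrich records with Kubernetes pod and namespace metadata.
        [FILTER]
            Name    kubernetes
            Match   kube.*

        # Forward all records to an HTTP log storage endpoint.
        [OUTPUT]
            Name    http
            Match   *
            Host    LOG-STORAGE-HOST
            Port    443
            tls     On

Where LOG-STORAGE-HOST is the host name of your log storage service.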

If you are using Tanzu Mission Control (TMC), vSphere 7.0 with Tanzu, or Tanzu Kubernetes Cluster to manage your cloud-native environment, you must set up a role binding that grants the required permissions to Fluent Bit containers before configuring logging with any integration. Then follow the instructions in the Fluent Bit documentation to complete the logging configuration.

To configure logging with Fluent Bit for your Cloud Native Runtimes environment:

  1. VMware recommends that you add any integrations to the ConfigMap in your Knative Serving namespace. Follow the logging configuration steps in the Fluent Bit documentation to create the Namespace, ServiceAccount, Role, RoleBinding, and ConfigMap.

  2. If you are using TMC, vSphere with Tanzu, or Tanzu Kubernetes Cluster to manage your cloud-native environment, create a role binding in the Kubernetes namespace where your integration is deployed to grant permission for privileged Fluent Bit containers. For information about creating a role binding on a Tanzu platform, see Add a Role Binding. For information about viewing your Kubernetes namespaces, see the Kubernetes documentation.

    Create the following role binding:

    apiVersion: rbac.authorization.k8s.io/v1
    kind: RoleBinding
    metadata:
      name: fluentbit-psp-rolebinding
      namespace: FLUENTBIT-NAMESPACE
    roleRef:
      apiGroup: rbac.authorization.k8s.io
      kind: ClusterRole
      name: PRIVILEGED-CLUSTERROLE
    subjects:
    - kind: ServiceAccount
      name: FLUENTBIT-SERVICEACCOUNT
      namespace: FLUENTBIT-NAMESPACE
    

    Where:

    • FLUENTBIT-NAMESPACE is your Fluent Bit namespace.
    • PRIVILEGED-CLUSTERROLE is the name of your privileged cluster role.
    • FLUENTBIT-SERVICEACCOUNT is your Fluent Bit service account.
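
    For example, with a Fluent Bit installation in a namespace named logging, the completed role binding might look like the following. The cluster role and service account names here are assumptions based on a typical vSphere with Tanzu installation; verify them against your environment:

    apiVersion: rbac.authorization.k8s.io/v1
    kind: RoleBinding
    metadata:
      name: fluentbit-psp-rolebinding
      # Assumed namespace of the Fluent Bit installation.
      namespace: logging
    roleRef:
      apiGroup: rbac.authorization.k8s.io
      kind: ClusterRole
      # Assumed name of the privileged cluster role on your platform.
      name: psp:vmware-system-privileged
    subjects:
    - kind: ServiceAccount
      # Assumed name of the Fluent Bit service account.
      name: fluent-bit
      namespace: logging

    Save the YAML to a file and apply it to your cluster with kubectl apply.
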
  3. To verify that you configured logging successfully, run the following command to access logs through your web browser:

    kubectl port-forward --namespace logging service/log-collector 8080:80
    
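    After the port forwarding starts, you can view the collected logs at http://localhost:8080 in your web browser.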

    For more information about accessing Fluent Bit logs, see the Knative documentation.

Forward Logs to vRealize

After you configure log collection, you can forward logs to log management services. vRealize Log Insight is one service you can use with Cloud Native Runtimes. vRealize Log Insight is a scalable log management solution that provides log management, dashboards, analytics, and third-party extensibility for infrastructure and apps. For more information about vRealize Log Insight, see the VMware vRealize Log Insight Documentation.

To forward logs from your Cloud Native Runtimes environment to vRealize, you can use a new or existing instance of Tanzu Kubernetes Cluster. For information about how to configure log forwarding to vRealize from Tanzu Kubernetes Cluster, see the Configure Log forwarding from VMware Tanzu Kubernetes Cluster to vRealize Log Insight Cloud blog.

Metrics

Cloud Native Runtimes integrates with Prometheus and VMware Aria Operations for Applications (formerly known as Tanzu Observability by Wavefront) to collect metrics from components or apps. For more information about these integrations, see the Prometheus documentation and the Wavefront documentation.

You can configure Prometheus endpoints on Cloud Native Runtimes components to collect metrics on your components or apps. For information about configuring this, see the Prometheus documentation.

You can use annotation-based discovery with Prometheus to define, in a more automated way, which Kubernetes objects in your Cloud Native Runtimes environment to annotate and collect metrics from. For more information about annotation-based discovery, see the VMware Aria Operations for Applications documentation in GitHub.

You can then use the Wavefront Collector for Kubernetes to dynamically discover and scrape pods with the prometheus.io/scrape annotation prefix. For information about the collector, see the VMware Aria Operations for Applications documentation in GitHub.
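
For example, a pod that opts in to annotation-based discovery might look like the following sketch. The pod name, port, path, and image values are illustrative placeholders:

    apiVersion: v1
    kind: Pod
    metadata:
      name: metrics-example
      annotations:
        # Opt this pod in to scraping by the collector.
        prometheus.io/scrape: "true"
        # Port and path where the app exposes Prometheus metrics.
        prometheus.io/port: "9090"
        prometheus.io/path: "/metrics"
    spec:
      containers:
      - name: app
        image: APP-IMAGE

Where APP-IMAGE is the container image for your app.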

Note

All Cloud Native Runtimes-related metrics have the prefix tanzu.vmware.com/cloud-native-runtimes.*.

Tracing

Tracing is a method for understanding the performance of specific code paths in apps as they handle requests. You can configure tracing to collect performance metrics for your apps or Cloud Native Runtimes components. Tracing helps you identify which aspects of Cloud Native Runtimes, and of the workloads running on it, are performing poorly.

Configuring Tracing

You can configure tracing for your applications on Cloud Native Runtimes. You configure tracing for Knative Serving by editing the ConfigMap config-tracing in your Knative namespaces.

VMware recommends that you add any integrations to your Serving namespaces. For information about how to enable request traces in each component, see the Knative documentation.
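
For example, a minimal config-tracing ConfigMap with tracing enabled might look like the following sketch, which assumes a Zipkin-compatible backend; the endpoint and sample rate are illustrative values:

    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: config-tracing
      namespace: knative-serving
    data:
      # Send traces to a Zipkin-compatible backend.
      backend: "zipkin"
      # Illustrative Zipkin-compatible endpoint that receives the spans.
      zipkin-endpoint: "http://zipkin.default.svc.cluster.local:9411/api/v2/spans"
      # Sample 10% of requests.
      sample-rate: "0.1"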

Forwarding trace data to an observability platform or data visualization tool

You can use the OpenTelemetry integration to forward trace data to a data visualization tool that can ingest data in Zipkin format. For more information about using Zipkin for tracing, see the Zipkin documentation.
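
As an illustration, assuming you run a standalone OpenTelemetry Collector, a configuration that receives Zipkin spans and forwards them to a Zipkin-compatible visualization tool might look like the following sketch; the endpoint values are placeholders:

    receivers:
      zipkin:
        # Listen for incoming Zipkin spans.
        endpoint: 0.0.0.0:9411
    exporters:
      zipkin:
        # Forward spans to your Zipkin-compatible tool.
        endpoint: http://VISUALIZATION-TOOL-HOST:9411/api/v2/spans
    service:
      pipelines:
        traces:
          receivers: [zipkin]
          exporters: [zipkin]

Where VISUALIZATION-TOOL-HOST is the host name of your data visualization tool.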

VMware recommends integrating with VMware Aria Operations for Applications (formerly known as Tanzu Observability by Wavefront). For information about forwarding trace data, see the VMware Aria Operations for Applications documentation.

Sending trace data to VMware Aria Operations for Applications

You can send trace data to an observability and analytics platform such as VMware Aria Operations for Applications to view and monitor your trace data in dashboards. VMware Aria Operations for Applications offers several deployment options. During development, a single proxy is often sufficient for all data sources. For more information about other deployment options, see the VMware Aria Operations for Applications documentation.

To configure Cloud Native Runtimes to send traces to the Wavefront proxy, and then configure the Wavefront proxy to consume Zipkin spans:

  1. Deploy the Wavefront Proxy. For more information about Wavefront proxies, see Install and Manage Wavefront Proxies.

  2. Configure the namespace where you deployed the Wavefront Proxy with the credentials for its container image registry.

    The following example uses the Namespace Provisioner package to automatically configure namespaces labeled with apps.tanzu.vmware.com/tap-ns.

    export WF_NAMESPACE=default
    kubectl label namespace ${WF_NAMESPACE} apps.tanzu.vmware.com/tap-ns=""
    
    export WF_REGISTRY_HOSTNAME=HOSTNAME
    export WF_REGISTRY_USERNAME=USERNAME
    export WF_REGISTRY_PASSWORD=PASSWORD
    tanzu secret registry add registry-credentials \
     --username ${WF_REGISTRY_USERNAME} --password ${WF_REGISTRY_PASSWORD} \
     --server ${WF_REGISTRY_HOSTNAME} \
     --export-to-all-namespaces --yes --namespace tap-install
    

    Where:

    • WF_NAMESPACE is the namespace where you deployed the Wavefront Proxy.
    • HOSTNAME is the image registry where the Wavefront Proxy image is located.
    • USERNAME is your user name to access the image registry to pull the Wavefront Proxy image.
    • PASSWORD is your password to access the image registry to pull the Wavefront Proxy image.

    For more information about how to set up developer namespaces, see Provision developer namespaces.

  3. Configure the Wavefront Proxy to allow Zipkin/Istio traces.

    To enable consumption of Zipkin traces, uncomment the lines indicated in the YAML file for the Wavefront Deployment: edit the Deployment to set the WAVEFRONT_PROXY_ARGS environment variable to --traceZipkinListenerPorts 9411 and to expose containerPort 9411, as in the sketch that follows.
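
    The following excerpt is a minimal sketch of the edited Deployment. The Deployment name, container name, and image are assumptions based on a typical Wavefront Proxy installation:

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: wavefront-proxy
    spec:
      template:
        spec:
          containers:
          - name: wavefront-proxy
            image: WAVEFRONT-PROXY-IMAGE
            env:
            # Configure the proxy to listen for Zipkin spans on port 9411.
            - name: WAVEFRONT_PROXY_ARGS
              value: "--traceZipkinListenerPorts 9411"
            ports:
            # Expose the Zipkin listener port.
            - containerPort: 9411
              protocol: TCP

    Where WAVEFRONT-PROXY-IMAGE is the Wavefront Proxy container image.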

  4. Confirm that the Wavefront Proxy is running and working.

    Verify that the pods are running by using the following command. For more information about how to test a proxy, see the VMware Aria Operations for Applications documentation.

    kubectl get pods -n ${WF_NAMESPACE}
    

    Where WF_NAMESPACE is the namespace where you deployed the Wavefront Proxy.

  5. Edit the Serving ConfigMap config-tracing to enable the Zipkin tracing integration.

    You can configure Cloud Native Runtimes to send traces to the Wavefront proxy by editing the zipkin-endpoint property in the ConfigMap to point to the Wavefront proxy URL. You can configure the Wavefront proxy to consume Zipkin spans by listening on port 9411.

    Note

    There are two ways of editing a Knative ConfigMap on Cloud Native Runtimes. Depending on your installation, you can edit the ConfigMap directly on the cluster or by using overlays. For information about how to edit ConfigMaps using overlays, see Configuring Cloud Native Runtimes.

    The following example of a Kubernetes secret contains a ytt overlay with the suggested changes to the ConfigMap config-tracing:

    apiVersion: v1
    kind: Secret
    metadata:
      name: cnrs-patch
    stringData:
      patch.yaml: |
        #@ load("@ytt:overlay", "overlay")
        #@overlay/match by=overlay.subset({"kind":"ConfigMap","metadata":{"name":"config-tracing","namespace":"knative-serving"}})
        ---
        data:
          #@overlay/match missing_ok=True
          backend: "zipkin"
          #@overlay/match missing_ok=True
          zipkin-endpoint: "http://wavefront-proxy.default.svc.cluster.local:9411/api/v2/spans"
    

    After you follow the steps in Customizing Cloud Native Runtimes to configure your installation to use an overlay, you can examine the ConfigMap on the cluster to confirm that the changes were applied.

    kubectl get configmap config-tracing --namespace knative-serving --output yaml
    

    The ConfigMap looks like this example:

    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: config-tracing
    data:
      _example: |
        ...
      backend: "zipkin"
      zipkin-endpoint: "http://wavefront-proxy.default.svc.cluster.local:9411/api/v2/spans"
    
