This topic describes how to configure VMware Tanzu® Kubernetes Grid™ Integrated Edition (TKGI) cluster discovery in Healthwatch™ for VMware Tanzu® (Healthwatch).
In the TKGI Cluster Discovery pane of the Healthwatch tile, you configure the Prometheus instance in the Healthwatch tile to detect on-demand Kubernetes clusters created through the TKGI API and create scrape jobs for them. You only need to configure this pane if you have Ops Manager foundations with TKGI installed.
The Prometheus instance detects and scrapes TKGI clusters by connecting to the Kubernetes API through the TKGI API using a UAA client. To allow this, you must configure the Healthwatch tile, the Prometheus instance in the Healthwatch tile, the UAA client that the Prometheus instance uses to connect to the TKGI API, and the TKGI tile.
To configure TKGI cluster discovery:
Configure the TKGI Cluster Discovery pane in the Healthwatch tile. For more information, see Configure TKGI Cluster Discovery in Healthwatch below.
Configure TKGI to allow the Prometheus instance to scrape metrics from TKGI clusters. For more information, see Configure TKGI below.
If TKGI cluster discovery fails after you have completed both parts of the procedure in this topic, see Troubleshooting TKGI Cluster Discovery Failure below.
Note: To collect additional BOSH system metrics related to TKGI and view them in the Grafana UI, you must install and configure the Healthwatch Exporter for TKGI on your Ops Manager foundations with TKGI installed. To install the Healthwatch Exporter for TKGI tile, see Installing a Tile Manually. To configure the Healthwatch Exporter for TKGI tile, see Configuring Healthwatch Exporter for TKGI.
In the TKGI Cluster Discovery pane of the Healthwatch tile, you configure TKGI cluster discovery, including the UAA client that the Prometheus instance uses to connect to the Kubernetes API through the TKGI API.
To configure the TKGI Cluster Discovery pane:
Navigate to the Ops Manager Installation Dashboard.
Click the Healthwatch tile.
Select TKGI Cluster Discovery.
Under TKGI cluster discovery, select one of the following options:
For Discovery interval, enter how frequently, in seconds, you want the Prometheus instance to detect and scrape TKGI clusters. The minimum value is 60.
(Optional) To allow the Prometheus instance to communicate with the TKGI API over TLS, configure one of the following options:
Click Save.
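If you generate tile settings from a script, it can help to check the Discovery interval before you submit it. The following is a minimal sketch of that check. The function and constant names are hypothetical and are not part of Healthwatch; only the 60-second minimum comes from this topic.

```python
# Minimum value that this topic states for the Discovery interval field.
MIN_DISCOVERY_INTERVAL_SECONDS = 60


def validate_discovery_interval(seconds):
    """Return the interval unchanged if it is valid; otherwise raise ValueError.

    Hypothetical helper for pre-checking a value before entering it in the tile.
    """
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(seconds, bool) or not isinstance(seconds, int):
        raise ValueError("Discovery interval must be a whole number of seconds")
    if seconds < MIN_DISCOVERY_INTERVAL_SECONDS:
        raise ValueError(
            f"Discovery interval must be at least {MIN_DISCOVERY_INTERVAL_SECONDS} seconds"
        )
    return seconds
```

For example, validate_discovery_interval(300) returns 300, while validate_discovery_interval(30) raises ValueError.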
After you configure TKGI cluster discovery in the Healthwatch tile, you must configure TKGI to allow the Prometheus instance to scrape metrics from TKGI clusters.
To configure TKGI:
Return to the Ops Manager Installation Dashboard.
Click the Tanzu Kubernetes Grid Integrated Edition tile.
Select Host Monitoring.
Under Enable Telegraf Outputs?, select Yes.
Activate the Include etcd metrics checkbox to allow TKGI to send etcd server and debugging metrics to Healthwatch.
Activate the Include Kubernetes Controller Manager metrics checkbox to allow TKGI to send Kubernetes Controller Manager metrics to Healthwatch.
If you are using TKGI v1.14.2 or later, activate the Include Kubernetes Scheduler metrics checkbox to allow TKGI to send Kubernetes Scheduler metrics to Healthwatch.
For Setup Telegraf Outputs, provide the following TOML configuration file:
[[outputs.prometheus_client]]
listen = ":10200"
metric_version = 2
You must use 10200 as the listening port to allow the Prometheus instance to scrape Telegraf metrics from your TKGI clusters. For more information about creating a configuration file in TKGI, see the TKGI documentation.
Note: If you are configuring TKGI v1.12 or earlier, remove metric_version = 2 from the TOML configuration file.
Click Save.
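The version-dependent rules in the Host Monitoring steps can be sketched as a small script. This is an illustration only: the function names are hypothetical, and the sketch assumes that the TKGI version is a plain dotted string such as 1.14.2 or v1.12. It emits the TOML configuration from this topic and omits metric_version = 2 for TKGI v1.12 or earlier.

```python
import re


def parse_version(version):
    """Parse a string such as 'v1.14.2' or '1.12' into a tuple of three ints.

    A missing patch number is treated as 0. Assumes a plain dotted version.
    """
    match = re.match(r"[vV]?(\d+)\.(\d+)(?:\.(\d+))?", version.strip())
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    major, minor, patch = match.groups()
    return (int(major), int(minor), int(patch or 0))


def telegraf_output_toml(tkgi_version):
    """Build the Setup Telegraf Outputs value for the given TKGI version."""
    lines = [
        "[[outputs.prometheus_client]]",
        'listen = ":10200"',  # this topic requires port 10200
    ]
    # Per the note in this topic, v1.12 or earlier must not set metric_version.
    if parse_version(tkgi_version)[:2] > (1, 12):
        lines.append("metric_version = 2")
    return "\n".join(lines) + "\n"


def scheduler_metrics_available(tkgi_version):
    """Return True for TKGI v1.14.2 or later, where the Scheduler checkbox applies."""
    return parse_version(tkgi_version) >= (1, 14, 2)
```

For example, print(telegraf_output_toml("1.12.4")) prints only the first two lines of the configuration, and scheduler_metrics_available("1.14.1") returns False.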
For each plan you want to monitor:
For (Optional) Add-ons - Use with caution, enter the following YAML snippet to create the roles required to allow the Prometheus instance to scrape metrics from your TKGI clusters:
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: healthwatch
rules:
- resources:
  - pods/proxy
  - pods
  - nodes
  - nodes/proxy
  - namespace/pods
  - endpoints
  - services
  verbs:
  - get
  - watch
  - list
  apiGroups:
  - ""
- nonResourceURLs: ["/metrics"]
  verbs: ["get"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: healthwatch
roleRef:
  apiGroup: ""
  kind: ClusterRole
  name: healthwatch
subjects:
- apiGroup: ""
  kind: User
  name: healthwatch
If (Optional) Add-ons - Use with caution already contains other API resource definitions, append the above YAML snippet to the end of the existing resource definitions, followed by a newline character.
Click Save.
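If you maintain the add-ons content for each plan in a script, the append rule for existing API resource definitions can be sketched as follows. The function name is hypothetical. The sketch assumes that both inputs are plain YAML text and that resource definitions are separated by the standard --- document marker.

```python
def append_addon(existing, snippet):
    """Append a YAML snippet after existing add-on definitions.

    The result ends with exactly one newline character, and the snippet
    starts a new YAML document so that it does not merge with earlier ones.
    """
    existing = existing.rstrip("\n")
    snippet = snippet.strip("\n")
    # Nothing to append to: return the snippet on its own.
    if not existing.strip():
        return snippet + "\n"
    # Add a document separator only if the snippet does not already have one.
    if not snippet.startswith("---"):
        snippet = "---\n" + snippet
    return existing + "\n" + snippet + "\n"
```

For example, calling append_addon with the current field content and the healthwatch ClusterRole and ClusterRoleBinding definitions returns the existing definitions first, followed by the healthwatch definitions and a trailing newline.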
Select Errands.
Ensure that the Upgrade all clusters errand is running. Running this errand configures your TKGI clusters with the roles that you defined in a previous step, in the (Optional) Add-ons - Use with caution field of each plan you monitor.
Click Save.
TKGI cluster discovery can fail if the Prometheus instance fails to scrape metrics from your TKGI clusters. To troubleshoot TKGI cluster discovery failure, see Troubleshooting Missing TKGI Cluster Metrics in Troubleshooting Healthwatch.