Configuring Telegraf in TKGI

This topic describes how to configure Telegraf in VMware Tanzu Kubernetes Grid Integrated Edition (TKGI).

Overview

You can configure Telegraf to collect metrics from TKGI API, control plane node, and worker node VMs and send the metrics to a monitoring service, such as Wavefront or Datadog.

For more information about these metrics, see Metrics: Telegraf in Monitoring TKGI and TKGI-Provisioned Clusters.

Collect Metrics Using Telegraf

To collect metrics using Telegraf:

Create a configuration file for your output plugin. See Create a Configuration File below.
Configure Telegraf in the Tanzu Kubernetes Grid Integrated Edition tile. See Configure Telegraf in the Tile below.

Create a Configuration File

To connect a monitoring service to TKGI, you must create a configuration file for the service. The configuration file is written in TOML and consists of key-value pairs. After you create your configuration file, you can paste its contents into the Tanzu Kubernetes Grid Integrated Edition tile to connect the service.

To create a configuration file for your monitoring service:

Locate the required format for your monitoring service in the README.md file for your service's plugin in the telegraf repository on GitHub. For example, if you want to collect metrics from etcd, the etcd documentation recommends using the open-source Prometheus monitoring service.
Create your configuration file using the required format of your monitoring service. For example, if you want to create a configuration file for an HTTP output plugin, create a file similar to the following:
```
[[outputs.http]]
  url = "https://example.com"
  method = "POST"
  data_format = "json"
[[processors.override]]
  [processors.override.tags]
    director = "bosh-director-1"
```
Note: You can add tags to your configuration file to label etcd metrics. For example, the above code snippet adds a director tag with the value bosh-director-1 to the etcd metrics. If you have multiple BOSH Directors, VMware recommends adding tags so that you can filter your metrics by Director in your monitoring service.
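
As a further illustration, the following is a minimal sketch of a configuration file for the Telegraf Prometheus output plugin (`outputs.prometheus_client`), since Prometheus is the service mentioned above for etcd metrics. The listen address and the tag value are placeholder assumptions, not values required by TKGI. Check the plugin README for the options supported by your Telegraf version.

```toml
# Sketch only: expose collected metrics on an HTTP endpoint for Prometheus to scrape.
[[outputs.prometheus_client]]
  # Address and port to listen on (assumed placeholder value).
  listen = ":9273"
  # Should match the metric version used on the Prometheus input side.
  metric_version = 2

# Optional: label metrics with the BOSH Director they came from.
[[processors.override]]
  [processors.override.tags]
    director = "bosh-director-1"
```

As in the HTTP example, the `processors.override` table is optional and only adds the `director` tag to the metrics before they are written to the output.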

Configure Telegraf in the Tile

To configure TKGI to use Telegraf for metric collection:

Navigate to the Tanzu Kubernetes Grid Integrated Edition tile > Settings > Host Monitoring.
Under Enable Telegraf Outputs?, select Yes.

Configure Telegraf output settings as described in the table below.

Configuration Setting	Description…to send these metrics to your monitoring service
Prometheus input plugin Metric version	Controls the metrics mapping from Prometheus to Telegraf when scraping metrics using the Prometheus input plugin. The Prometheus input plugin scrapes the following metrics: `node_exporter`, `kube_apiserver`, `kube_controller_manager`, `kube_scheduler`, and `etcd` metrics. Requires TKGI v1.13.7 or later. Your Prometheus client must be configured with the matching `metric_version` setting. For more information, see Prometheus Input Plugin in the telegraf GitHub repository.
Enable node exporter on TKGI API	Enable to send Node Exporter metrics from the TKGI API VM.
Enable node exporter on control plane	Enable to send Node Exporter metrics from Kubernetes control plane nodes.
Include etcd metrics	Enable to send etcd server and debugging metrics.
Enable node exporter on worker	Enable to send Node Exporter metrics from Kubernetes worker nodes.
Include Kubernetes Controller Manager metrics	Enable to send Kubernetes controller manager metrics. These metrics provide information about the state of each cluster.
Include Kubernetes API Server metrics	Enable to send Kubernetes API Server metrics.
Include kubelet metrics	Enable to send kubelet metrics for all workloads running in all your Kubernetes clusters. If you enable Include kubelet metrics, be prepared for a high volume of metrics.

Note: The Telegraf output configuration options are visible to TKGI admins only.

In Setup Telegraf Outputs, replace the default value [[outputs.discard]] with the contents of the configuration file that you created in Create a Configuration File above. See the following example for an HTTP output plugin:
```
[[outputs.http]]
  url = "https://example.com"
  method = "POST"
  data_format = "json"
[[processors.override]]
  [processors.override.tags]
    director = "bosh-director-1"
```
Note: In TKGI v1.13.6 and earlier, if you use the Prometheus output plugin, your Prometheus client must be configured with `metric_version = 2`. For Telegraf Prometheus output plugin configuration information, see Configuration in the Telegraf GitHub repository.
In Setup Telegraf Agent, replace the default Telegraf agent property values with your custom values for the interval, buffering, and debugging-related properties. For more information about the configurable Telegraf agent properties, see Agent configuration in the Telegraf documentation.
Click Save.
To deploy the Tanzu Kubernetes Grid Integrated Edition tile, return to the Ops Manager Installation Dashboard and click Review Pending Changes > Apply Changes.
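
The interval, buffering, and debugging properties entered in Setup Telegraf Agent correspond to options in the Telegraf `[agent]` table. As a rough sketch, a standalone Telegraf configuration would express them as follows. The values are illustrative assumptions, not TKGI defaults, and whether the tile field expects the `[agent]` header or only the key-value pairs is not stated in this topic, so compare against the default value shown in the tile before replacing it.

```toml
# Sketch only: values are assumptions chosen for illustration.
[agent]
  # How often inputs are collected.
  interval = "10s"
  # How often buffered metrics are flushed to the outputs.
  flush_interval = "10s"
  # Maximum number of metrics sent to an output in one write.
  metric_batch_size = 1000
  # Maximum number of unwritten metrics held per output before old ones are dropped.
  metric_buffer_limit = 10000
  # Set to true to log at debug level while troubleshooting.
  debug = false
```

A larger `metric_buffer_limit` tolerates longer outages of the monitoring service at the cost of more memory on each VM, which matters if you also enable kubelet metrics.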

Troubleshoot etcd

VMware recommends working with Support to troubleshoot control plane/etcd node VMs. The monitoring and metrics data you retrieve from the control plane/etcd node VMs can help the Support team diagnose and troubleshoot errors.