VMware Edge Compute Stack integrates with existing industry-standard monitoring and metrics-gathering tools. It currently provides built-in Telegraf agents that gather metrics for both virtual machine and Kubernetes workloads.

Monitoring with Prometheus and Grafana

Edge Compute Stack hosts can be configured to send metrics from both virtual machine and Kubernetes workloads to a Prometheus endpoint. A sample deployment of both Prometheus and Grafana is available in the "metrics-in-a-box" folder of the ECS Sample Git Repository located here.

Note:

The current Edge Compute Stack Monitoring solution can only send metrics to Prometheus servers over unencrypted HTTP.

Technical Components

The ECS Monitoring solution uses Telegraf to collect metrics from the relevant Edge Compute Stack endpoints. It scrapes metrics with the following plugins:

  • inputs.vsphere → Edge Compute Stack host API

  • inputs.kubernetes → Kubelet interface

Edge Compute Stack Monitoring Configuration

The Telegraf agent(s) that scrape the metrics and send them to the data lake are configured through a YAML manifest in the Git repository associated with the Edge Compute Stack host.

The following options can be configured:

  • metadata/name: the configuration name

  • spec/metrics-collection/enabled: a switch to enable/disable the metrics collection

  • spec/metrics-collection/metrics-sink/url: the HTTP URL of the data lake

  • spec/metrics-collection/metrics-sink/data-format: the data serializer used to format the metrics. Two values are currently supported: prometheus, which lets the solution send data to a push gateway, and prometheusremotewrite, which sends data directly to a Prometheus server, provided that the server's remote write receiver is enabled.
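
The prometheusremotewrite option depends on the Prometheus server accepting remote writes, which upstream Prometheus (2.33 and later) enables with the --web.enable-remote-write-receiver flag. The fragment below is a minimal sketch of how that flag might be set on a Kubernetes Deployment. The Deployment name, labels, and image tag are assumptions for illustration, and this is not necessarily how the "metrics-in-a-box" sample configures its server.

```yaml
# Hypothetical Deployment; names and labels are illustrative only.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: prometheus
spec:
  replicas: 1
  selector:
    matchLabels:
      app: prometheus
  template:
    metadata:
      labels:
        app: prometheus
    spec:
      containers:
        - name: prometheus
          image: prom/prometheus
          args:
            # Setting args replaces the image defaults, so the config file is restated.
            - --config.file=/etc/prometheus/prometheus.yml
            # Accept pushed samples on /api/v1/write.
            - --web.enable-remote-write-receiver
          ports:
            - containerPort: 9090
```

With the receiver enabled, the server accepts samples on the /api/v1/write path, which is the path used in the metrics-sink URL.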

Example Configuration

apiVersion: esx.vmware.com/v1alpha1
kind: HostConfiguration
metadata:
  name: ecsmonitoring-metrics
  namespace: esx-system
spec:
  metrics-collection:
    enabled: true
    metrics-sink:
      # Edit the below URL to provide the worker node IP address
      url: "http://192.168.11.101:30777/api/v1/write"
      data-format: "prometheusremotewrite"
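
For comparison, a configuration that uses the prometheus data format to target a push gateway might look like the sketch below. The IP address, port 9091 (the upstream Pushgateway default), and the job name "ecs-host" in the URL path are assumptions for illustration; adjust them to match the actual push gateway deployment.

```yaml
apiVersion: esx.vmware.com/v1alpha1
kind: HostConfiguration
metadata:
  name: ecsmonitoring-metrics
  namespace: esx-system
spec:
  metrics-collection:
    enabled: true
    metrics-sink:
      # Assumed push gateway address; "ecs-host" is a hypothetical job name
      url: "http://192.168.11.101:9091/metrics/job/ecs-host"
      data-format: "prometheus"
```

Only the url and data-format values differ from the remote write example; the rest of the manifest is unchanged.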