System Metrics Monitoring

This section describes System Metrics Monitoring on the Orchestrator.

Orchestrator System Metrics Monitoring Overview

The Orchestrator includes a built-in system metrics monitoring stack consisting of a metrics collector and a time-series database. Use the monitoring stack to check the health and system load of the Orchestrator.

To enable the monitoring stack, run the following command on the Orchestrator:

sudo /opt/vc/scripts/vco_observability_manager.sh enable

To check the status of the monitoring stack, run:

sudo /opt/vc/scripts/vco_observability_manager.sh status

To disable the monitoring stack, run:

sudo /opt/vc/scripts/vco_observability_manager.sh disable
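
The three commands above differ only in their final argument, so they can be wrapped in one small shell helper. The following is a minimal sketch, not part of the Orchestrator: the function name observability is hypothetical, and only the script path and the three actions come from the commands above.

```shell
#!/bin/sh
# Script path as shown in the commands above.
MANAGER=/opt/vc/scripts/vco_observability_manager.sh

# observability (hypothetical helper) runs one of the three documented
# actions and rejects any other argument with exit status 2.
observability() {
    case "$1" in
        enable|status|disable)
            sudo "$MANAGER" "$1"
            ;;
        *)
            echo "usage: observability enable|status|disable" >&2
            return 2
            ;;
    esac
}
```

For example, observability status runs the same status command shown above, while a mistyped action is rejected before sudo is invoked.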

The Metrics Collector

The Orchestrator uses Telegraf as its system metrics collector. Telegraf gathers metrics through input plugins, and the following plugins are enabled by default.

Plugin Name	Description
inputs.cpu	Metrics about CPU usage.
inputs.mem	Metrics about memory usage.
inputs.net	Metrics about network interfaces.
inputs.system	Metrics about system load and uptime.
inputs.processes	The number of processes grouped by status.
inputs.disk	Metrics about disk usage.
inputs.diskio	Metrics about disk IO by device.
inputs.procstat	CPU and memory usage for specific processes.
inputs.nginx	Nginx's basic status information (ngx_http_stub_status_module).
inputs.mysql	Statistics from the MySQL server.
inputs.clickhouse	Metrics from one or many ClickHouse servers.
inputs.redis	Metrics from one or many Redis servers.
inputs.filecount	The number and total size of files in specified directories.
inputs.ntpq	Standard NTP query metrics (requires ntpq executable).
inputs.x509_cert	Metrics from an SSL certificate.

To enable additional input plugins or disable any of the defaults, edit the Telegraf configuration file on the Orchestrator, and then restart Telegraf:

sudo vi /etc/telegraf/telegraf.d/system_metrics_input.conf
sudo systemctl restart telegraf
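
Before and after editing, it can help to see which input plugins the file currently enables. The following is a minimal sketch that relies on standard Telegraf configuration syntax, where each enabled plugin is declared by an uncommented [[inputs.NAME]] table. The function name list_enabled_inputs is hypothetical, and the default path is the one from the command above.

```shell
#!/bin/sh
# Default path from the command above; pass another file as the first argument.
CONF=/etc/telegraf/telegraf.d/system_metrics_input.conf

# list_enabled_inputs (hypothetical helper) prints the NAME of every
# uncommented [[inputs.NAME]] table, one per line, without duplicates.
# Lines that start with '#' are skipped because the pattern is anchored.
list_enabled_inputs() {
    sed -n 's/^[[:space:]]*\[\[inputs\.\([A-Za-z0-9_]*\)]].*$/\1/p' "${1:-$CONF}" | sort -u
}
```

Running list_enabled_inputs with no arguments lists the plugins in the default file, provided the current user can read it. To deactivate a plugin, comment out its [[inputs.NAME]] table together with the settings beneath it. Telegraf also accepts a --test flag, which runs the configured inputs once and prints the gathered metrics, as a check before restarting the service.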

The Time-series Database

Prometheus stores the system metrics collected by Telegraf. Metrics are retained in the database for at most three weeks. By default, Prometheus listens on port 9090. If you have an external monitoring tool, add the Prometheus database as a data source so that you can view the Orchestrator system metrics in your monitoring UI.
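
To confirm that Prometheus is answering on port 9090 before pointing an external tool at it, one option is to send a query to the standard Prometheus HTTP API. The following is a minimal sketch under stated assumptions: Prometheus is reachable over plain HTTP on localhost, and the names PROM_HOST, PROM_PORT, prom_query_url, and prom_query are hypothetical. Only the port number comes from the text above.

```shell
#!/bin/sh
# Assumptions: plain HTTP on localhost. Override these for a remote host.
PROM_HOST=${PROM_HOST:-localhost}
PROM_PORT=${PROM_PORT:-9090}

# prom_query_url prints the URL of the Prometheus instant-query endpoint.
prom_query_url() {
    printf 'http://%s:%s/api/v1/query\n' "$PROM_HOST" "$PROM_PORT"
}

# prom_query sends one PromQL expression and prints the JSON response.
# curl -G appends the URL-encoded expression as a query string.
prom_query() {
    curl -sS -G --data-urlencode "query=$1" "$(prom_query_url)"
}
```

For example, prom_query up asks for the built-in up metric, which reports whether each scrape target was reachable. The metric names that Telegraf exports depend on the plugins enabled on the Orchestrator, so they are not listed here.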