This section describes System Metrics Monitoring on the Orchestrator.
Orchestrator System Metrics Monitoring Overview
The Orchestrator includes a built-in system metrics monitoring stack, which consists of a metrics collector and a time-series database. Use the monitoring stack to check the health and system load of the Orchestrator.
To enable the monitoring stack, run the following command on the Orchestrator:
sudo /opt/vc/scripts/vco_observability_manager.sh enable
To check the status of the monitoring stack, run:
sudo /opt/vc/scripts/vco_observability_manager.sh status
To deactivate the monitoring stack, run:
sudo /opt/vc/scripts/vco_observability_manager.sh disable
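The three commands above can be wrapped in a small shell function that rejects mistyped subcommands. This is a minimal sketch, not part of the product: the function name `vco_observability` is hypothetical, and it assumes only the `enable`, `status`, and `disable` subcommands described above.

```shell
#!/usr/bin/env bash
# Path of the manager script, as given in this section.
VCO_OBS_SCRIPT="/opt/vc/scripts/vco_observability_manager.sh"

# Hypothetical wrapper: accept only the three documented subcommands
# and pass the chosen one to the manager script through sudo.
vco_observability() {
  case "$1" in
    enable|status|disable)
      sudo "$VCO_OBS_SCRIPT" "$1"
      ;;
    *)
      echo "usage: vco_observability {enable|status|disable}" >&2
      return 2
      ;;
  esac
}

# Example: enable the stack, then print its status.
#   vco_observability enable && vco_observability status
```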
The Metrics Collector
The Orchestrator uses Telegraf as its system metrics collector. Telegraf gathers system metrics through input plugins. The following input plugins are enabled by default.
Metric Name | Description |
---|---|
inputs.cpu | Metrics about CPU usage. |
inputs.mem | Metrics about memory usage. |
inputs.net | Metrics about network interfaces. |
inputs.system | Metrics about system load and uptime. |
inputs.processes | The number of processes grouped by status. |
inputs.disk | Metrics about disk usage. |
inputs.diskio | Metrics about disk IO by device. |
inputs.procstat | CPU and memory usage for specific processes. |
inputs.nginx | Nginx's basic status information (ngx_http_stub_status_module). |
inputs.mysql | Statistic data from the MySQL server. |
inputs.clickhouse | Metrics from one or many ClickHouse servers. |
inputs.redis | Metrics from one or many Redis servers. |
inputs.filecount | The number and total size of files in specified directories. |
inputs.ntpq | Standard NTP query metrics (requires ntpq executable). |
inputs.x509_cert | Metrics from an SSL certificate. |
To activate additional metrics or deactivate enabled ones, edit the Telegraf configuration file on the Orchestrator, and then restart Telegraf:
- sudo vi /etc/telegraf/telegraf.d/system_metrics_input.conf
- sudo systemctl restart telegraf
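Before restarting Telegraf, it can help to see which input plugins are active and to confirm that the edited file still loads. The sketch below relies on two things that are assumptions here: active plugins appear in the file as uncommented `[[inputs.<name>]]` section headers (the standard Telegraf TOML layout), and the main configuration file is at the Telegraf default path `/etc/telegraf/telegraf.conf`. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Assumed default location of the main Telegraf configuration file.
TELEGRAF_CONF="${TELEGRAF_CONF:-/etc/telegraf/telegraf.conf}"
# Directory that holds system_metrics_input.conf, as given in this section.
TELEGRAF_CONF_DIR="${TELEGRAF_CONF_DIR:-/etc/telegraf/telegraf.d}"

# Print the uncommented [[inputs.<name>]] sections found in the file $1.
list_enabled_inputs() {
  sed -n 's/^[[:space:]]*\[\[\(inputs\.[A-Za-z0-9_]*\)\]\].*/\1/p' "$1" | sort -u
}

# Run Telegraf once in test mode; restart the service only if that succeeds.
test_then_restart_telegraf() {
  if sudo telegraf --config "$TELEGRAF_CONF" \
       --config-directory "$TELEGRAF_CONF_DIR" --test >/dev/null; then
    sudo systemctl restart telegraf
  else
    echo "Telegraf configuration test failed; service not restarted" >&2
    return 1
  fi
}

# Example:
#   list_enabled_inputs /etc/telegraf/telegraf.d/system_metrics_input.conf
#   test_then_restart_telegraf
```

Test mode gathers each input once and exits, so a syntax error or a misconfigured plugin shows up before the running collector is interrupted.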
The Time-series Database
Prometheus stores the system metrics collected by Telegraf. Metrics data is retained in the database for at most three weeks. By default, Prometheus listens on port 9090. If you have an external monitoring tool, add the Prometheus database as a data source so that you can view the Orchestrator system metrics in that tool's UI.
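Before adding the data source to an external tool, you can check that Prometheus is reachable by calling its HTTP API directly. The following is a minimal sketch: the host name is an assumption (replace `localhost` with the Orchestrator address), the helper names are hypothetical, and `up` is a generic Prometheus expression rather than an Orchestrator-specific metric.

```shell
#!/usr/bin/env bash
# Assumed host; 9090 is the default port stated in this section.
PROM_HOST="${PROM_HOST:-localhost}"
PROM_PORT="${PROM_PORT:-9090}"

# Build a Prometheus URL for the API path given in $1.
prom_url() {
  printf 'http://%s:%s%s' "$PROM_HOST" "$PROM_PORT" "$1"
}

# Succeed if the Prometheus health endpoint answers with a success status.
prom_healthy() {
  curl --silent --fail --output /dev/null "$(prom_url /-/healthy)"
}

# Run an instant query for the PromQL expression in $1 and print the JSON reply.
# curl URL-encodes the expression and appends it to the query string.
prom_query() {
  curl --silent --fail --get --data-urlencode "query=$1" "$(prom_url /api/v1/query)"
}

# Example (from a host that can reach the Orchestrator on port 9090):
#   prom_healthy && prom_query 'up'
```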