Monitoring your system metrics using Tanzu Salt

Tanzu Salt exposes several system metrics that can be used for monitoring and diagnostics. These metrics are available in graphical form on the Tanzu Salt user interface dashboard and in machine-readable form from the /metrics HTTP endpoint.

For more information about visualizing reports in the Tanzu Salt user interface using the Dashboard, see Dashboard Reports.

Machine-readable metrics

Tanzu Salt exports system metrics in OpenMetrics text-based format. This format is directly consumable by Prometheus and other monitoring and alerting tools.
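As an illustration of what the text-based format looks like to a consumer, the following minimal sketch parses sample lines into name, labels, and value. The sample values and the master ID in EXAMPLE are made up for illustration, and the parser is deliberately simplified: it ignores timestamps and does not unescape label values. For production use, an established client library or Prometheus itself is the better choice.

```python
import re

# Matches sample lines such as: name{label="value"} 12.5
# An optional trailing timestamp is ignored in this sketch.
_SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)'
    r'(?:\{(?P<labels>.*)\})?'
    r'\s+(?P<value>\S+)'
)
_LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def parse_samples(text):
    """Return a list of (name, labels_dict, value) tuples from exposition text."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and comment/metadata lines (# TYPE, # HELP, # EOF).
        if not line or line.startswith("#"):
            continue
        match = _SAMPLE_RE.match(line)
        if match is None:
            continue
        labels = dict(_LABEL_RE.findall(match.group("labels") or ""))
        samples.append((match.group("name"), labels, float(match.group("value"))))
    return samples


# Illustrative input only; the numbers and master ID are invented.
EXAMPLE = """\
# TYPE sse_minions gauge
sse_minions 42
# TYPE sse_minions_present gauge
sse_minions_present{master_id="master1"} 40
# EOF
"""

for name, labels, value in parse_samples(EXAMPLE):
    print(name, labels, value)
```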

Tanzu Salt metrics configuration

Configuration for system metrics collection consists of the following settings in the /etc/raas/raas configuration file. Default values are shown.

# System metrics settings
metrics:
  enabled: true                         # If True, enable the collection of system metrics
  prometheus: false                     # If True, enable the Prometheus endpoint at /metrics
  prometheus_username:                  # Static username for retrieving /metrics
  prometheus_password:                  # Static password for retrieving /metrics
  snapshot_interval: 60                 # How often to record snapshot metrics, in seconds
  max_query_timedelta: 86400            # Maximum timedelta for a single call to get_system_metrics, in seconds
  keep: 30                              # How long to retain metrics data, in days

The following settings control the handling of machine-readable system metrics:

To disable metrics collection, set enabled: false. Note that this also disables the Tanzu Salt built-in dashboard.
To enable the export of machine-readable metrics from the /metrics HTTP endpoint, set prometheus: true. This setting does not affect the Tanzu Salt built-in dashboard.
To access the /metrics HTTP endpoint using HTTP Basic Authentication, set your credentials in plaintext using the prometheus_username and prometheus_password variables. To authenticate using encrypted credentials or environment variables instead, leave these variables blank. See Configuring Prometheus to connect to Tanzu Salt for more information about authentication methods.
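To check the endpoint by hand once it is enabled, a request with an HTTP Basic Authorization header is enough. The sketch below uses only the Python standard library. The base URL http://localhost:8080 and the credentials are placeholders taken from the examples on this page, not defaults, and the helper names are hypothetical.

```python
import base64
import urllib.request


def build_metrics_request(base_url, username, password):
    """Build a GET request for /metrics with an HTTP Basic Authorization header."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(base_url.rstrip("/") + "/metrics")
    request.add_header("Authorization", "Basic " + token)
    return request


def fetch_metrics(base_url, username, password, timeout=10):
    """Return the body of the /metrics response as text."""
    request = build_metrics_request(base_url, username, password)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    # Placeholder host and credentials; replace with the values for your environment.
    print(fetch_metrics("http://localhost:8080", "metrics", "prometheus"))
```

Because Basic Authentication sends the credentials with only base64 encoding, consider using an https URL wherever the API (RaaS) server is reachable over TLS.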

The other settings shown above relate to the Tanzu Salt built-in dashboard and do not affect the collection or reporting of machine-readable system metrics. In particular, the snapshot_interval setting determines how often, in seconds, metrics are recorded for display on the dashboard, and the keep setting determines how long, in days, metrics data will be kept in the database before being trimmed.

Although the /metrics HTTP endpoint is the recommended way to gather machine-readable metrics data from Tanzu Salt, you can also use the API (RaaS) to retrieve the data presented on the built-in dashboard. The stats.get_system_metrics() API call lets you query metrics data by metric name, source, and date range. The max_query_timedelta setting limits the time span, in seconds, that a single API call can cover. To get metrics data from a longer time span, make multiple API calls with different start and end dates.
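Splitting a long date range into consecutive windows that each fit within max_query_timedelta can be sketched as follows. The function name is hypothetical, and the sketch only computes the windows. The parameter names of stats.get_system_metrics() are not given on this page, so the API call itself is left out.

```python
from datetime import datetime, timedelta


def split_time_range(start, end, max_seconds=86400):
    """Yield (window_start, window_end) pairs, each spanning at most max_seconds."""
    if max_seconds <= 0:
        raise ValueError("max_seconds must be positive")
    step = timedelta(seconds=max_seconds)
    cursor = start
    while cursor < end:
        window_end = min(cursor + step, end)
        yield cursor, window_end
        cursor = window_end


# Example: a 2.5-day range with the default limit of 86400 seconds.
for window_start, window_end in split_time_range(
    datetime(2024, 1, 1), datetime(2024, 1, 3, 12)
):
    # Issue one API call per window here, passing these as the start and end dates.
    print(window_start.isoformat(), window_end.isoformat())
```

Adjacent windows share a boundary timestamp. Whether the API treats the range end as inclusive is not stated here, so deduplicate results on that boundary if needed.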

Salt Master metrics configuration

The availability of some metrics depends on the configuration of the Salt masters connected to Tanzu Salt:

Metrics on Salt events and job returns will be accurate only if the sseapi returner is configured on the Salt masters.
Tanzu Salt collects low-level function runtime information from Salt masters that have master_stats: true set in their configuration. This option is disabled by default. See master_stats in the Salt documentation for details.
Metrics on Salt job states will be accurate only if the job completion engine is enabled in the Salt Master Plugin:
```
engines:
  - jobcompletion: {}
```
This engine is enabled in the Salt Master Plugin default configuration.

Configuring Prometheus to connect to Tanzu Salt

You can set the credentials that Prometheus uses to authenticate to Tanzu Salt using one of these methods:

To store your credentials in an encrypted bundle in /etc/raas/raas.secconf, run raas save_creds from the command-line interface (CLI). The CLI then prompts you for your Postgres, Redis, and Prometheus credentials. You can also pass your credentials through the CLI using this syntax, replacing the example text with your credentials:
```
raas save_creds 'postgres={"username": "root", "password": "salt"} redis={"username": "default", "password": "redis123"} prometheus={"username": "metrics", "password": "prometheus"}'
```
See Securing credentials in your SaltStack Enterprise configuration for more information.
To store your credentials in environment variables for use with container images, use the variables PROMETHEUS_USERNAME and PROMETHEUS_PASSWORD.
To store your credentials in plaintext, set the prometheus_username and prometheus_password variables in the /etc/raas/raas configuration file, as noted earlier on this page. These credentials are stored only in the /etc/raas/raas configuration file, are not associated with any Tanzu Salt account, and cannot be used to authenticate to Tanzu Salt other than for accessing the /metrics HTTP endpoint.

Note:
If you use all three authentication methods, Tanzu Salt prioritizes the encrypted credentials stored in /etc/raas/raas.secconf first, then the environment variables, and then the plaintext credentials stored in /etc/raas/raas. Setting credentials in plaintext in the configuration file won’t prevent the encrypted credentials or environment variables from being used. All three options must be blank (unused) in order to disable the endpoint.
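The precedence described in this note can be pictured with a small sketch. This is an illustration of the documented order only, not the Tanzu Salt implementation. The function name and the secconf_creds argument (a stand-in for the already-decrypted Prometheus entry from raas.secconf) are hypothetical, and treating a source as set only when both username and password are non-empty is an assumption.

```python
def resolve_prometheus_credentials(secconf_creds, environ, config):
    """Return (username, password, source) following the documented order, or None.

    secconf_creds: dict with "username"/"password", or None (hypothetical input).
    environ: mapping of environment variables, for example os.environ.
    config: the metrics section of /etc/raas/raas as a dict.
    """
    secconf_creds = secconf_creds or {}
    candidates = [
        ("secconf", secconf_creds.get("username"), secconf_creds.get("password")),
        ("environment", environ.get("PROMETHEUS_USERNAME"), environ.get("PROMETHEUS_PASSWORD")),
        ("config", config.get("prometheus_username"), config.get("prometheus_password")),
    ]
    for source, username, password in candidates:
        # Assumption: a source counts only if both values are non-empty.
        if username and password:
            return username, password, source
    # All three sources blank: per the note above, the endpoint is disabled.
    return None


# Plaintext credentials do not win when environment variables are also set.
print(resolve_prometheus_credentials(
    None,
    {"PROMETHEUS_USERNAME": "envuser", "PROMETHEUS_PASSWORD": "envpass"},
    {"prometheus_username": "cfguser", "prometheus_password": "cfgpass"},
))
```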

You can enable a Prometheus server to scrape metrics from Tanzu Salt by adding a scrape_configs job to the Prometheus configuration (typically prometheus.yml) for each API (RaaS) server instance you have. Use this example file as a guide, replacing the suggested variables with the variables for your environment:

scrape_configs:
  - job_name: 'sse'

    metrics_path: '/metrics'
    scheme: 'http'

    static_configs:
      - targets: ['localhost:8080']

    basic_auth:
      username: prometheus
      password: metrics

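The credentials in the Prometheus configuration must match the Prometheus credentials configured for Tanzu Salt using one of the methods described above, for example the prometheus_username and prometheus_password specified in the /etc/raas/raas configuration file. If you configured more than one method, use the credentials from the highest-priority method.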

See the Prometheus project documentation for more information on setting up scrape targets and other Prometheus configuration topics.

Metric descriptions

The machine-readable metrics that Tanzu Salt exports fall into several categories:

| Category | Metric Name | Metric Type | Labels | Description |
| --- | --- | --- | --- | --- |
| Salt master low-level metrics | `salt_event_size_bytes` | Histogram | `master_id` | Salt event size, in bytes |
| | `salt_master_cmd_duration_seconds` | Histogram | `master_id`, `cmd` | Salt master command duration, in seconds. Reported only if `master_stats` is configured on the Salt master. |
| Salt Master Plugin metrics | `raas_master_commands_processed` | Counter | `master_id` | SSE commands processed |
| | `raas_master_master_grains_pushed` | Counter | `master_id` | Salt master grain updates pushed to Tanzu Salt |
| | `raas_master_minion_keys_pushed` | Counter | `master_id` | Minion key state updates pushed to Tanzu Salt |
| | `raas_master_minion_cached_pushed` | Counter | `master_id` | Minion cache updates pushed to Tanzu Salt |
| | `raas_master_masterfs_pushed` | Counter | `master_id` | MasterFS updates pushed to Tanzu Salt |
| | `raas_master_sseapi_engine_iteration_seconds` | Histogram | `master_id` | API (RaaS) engine iteration duration, in seconds |
| Server metrics | `redis_commands_executed` | Counter | `redis_instance` | Redis commands executed (system cache) |
| | `redis_memory_bytes` | Gauge | `redis_instance` | Redis memory usage (system cache) |
| | `celery_tasks_queued` | Counter | `raas_instance`, `task` | Celery tasks queued (background jobs) |
| | `celery_tasks_executed` | Counter | `raas_instance`, `task` | Celery tasks executed (background jobs) |
| | `celery_queue_length` | Gauge | `raas_instance` | Celery queue length (background jobs waiting) |
| | `raas_rpc_request_duration_seconds` | Histogram | `raas_instance` | SSE RPC API call duration, in seconds |
| PostgreSQL metrics | `postgres_connections` | Gauge | `postgres_instance` | Postgres connections |
| | `postgres_transactions` | Counter | `postgres_instance` | Postgres transactions committed |
| | `postgres_rows_read` | Counter | `postgres_instance` | Postgres rows read |
| | `postgres_rows_inserted` | Counter | `postgres_instance` | Postgres rows inserted |
| | `postgres_rows_updated` | Counter | `postgres_instance` | Postgres rows updated |
| | `postgres_rows_deleted` | Counter | `postgres_instance` | Postgres rows deleted |
| System metrics | `highstate_minions` | Gauge | None | Number of minions that ran a highstate job |
| | `highstate_minions_changed` | Gauge | None | Number of minions that ran a highstate job resulting in one or more changes |
| | `highstate_minions_succeeded` | Gauge | None | Number of minions that ran a highstate job with no failures |
| | `highstate_minion_duration_seconds` | Gauge | None | Average per-minion duration for a highstate run |
| | `highstate_states` | Gauge | None | Number of unique states applied in highstate runs |
| | `highstate_states_changed` | Gauge | None | Number of states applied in highstate runs that resulted in one or more changes |
| | `highstate_states_succeeded` | Gauge | None | Number of states applied in highstate runs with no failures |
| | `sse_jobs_in_progress` | Counter | None | Tanzu Salt jobs in progress |
| | `sse_jobs_complete_all_successful` | Counter | None | Tanzu Salt jobs complete with all successful returns |
| | `sse_jobs_complete_missing_returns` | Counter | None | Tanzu Salt jobs complete with one or more missing returns |
| | `sse_jobs_complete_with_errors` | Counter | None | Tanzu Salt jobs complete with one or more errors |
| | `sse_masters` | Gauge | None | Total Salt masters in Tanzu Salt |
| | `sse_minion_grains_deleted` | Counter | `master_id` | Number of minion grains deleted |
| | `sse_minion_grains_indexing_duration_seconds` | Counter | `raas_instance` | Minion grains indexing calculations duration, in seconds |
| | `sse_minion_grains_saved` | Counter | `master_id` | Number of minion grains saved |
| | `sse_minion_target_match_calcs` | Counter | `raas_instance` | Number of minion versus target group matching calculations |
| | `sse_minion_target_match_duration_seconds` | Counter | `raas_instance` | Minion versus target group matching calculations duration, in seconds |
| | `sse_minions` | Gauge | None | Total minions in Tanzu Salt |
| | `sse_minions_present` | Gauge | `master_id` | Minions present within the configured time limit `raas_presence_expiration` |
| | `sse_minions_lost` | Gauge | `master_id` | Minions not present within the time limit |
| | `sse_minions_unknown` | Gauge | `master_id` | Unknown minions (never present) |
| | `sse_users_authenticated` | Gauge | None | Users authenticated to Tanzu Salt |
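Histogram metrics such as raas_rpc_request_duration_seconds are typically exported as several series per label set, including a running `_sum` and `_count`. Assuming the Tanzu Salt export includes those two series (this page does not list them explicitly), a mean duration per label set can be derived as sketched below. The function name, the input tuple layout, and the sample numbers are illustrative.

```python
def histogram_means(samples, base_name):
    """Map each label set to sum/count for the histogram named base_name.

    samples is an iterable of (metric_name, labels_dict, value) tuples.
    Label sets with a missing or zero count are skipped to avoid dividing by zero.
    """
    sums, counts = {}, {}
    for name, labels, value in samples:
        key = tuple(sorted(labels.items()))
        if name == base_name + "_sum":
            sums[key] = value
        elif name == base_name + "_count":
            counts[key] = value
    return {key: sums[key] / counts[key] for key in sums if counts.get(key)}


# Invented sample data: 48 RPC calls taking 12 seconds in total on one instance.
samples = [
    ("raas_rpc_request_duration_seconds_sum", {"raas_instance": "raas1"}, 12.0),
    ("raas_rpc_request_duration_seconds_count", {"raas_instance": "raas1"}, 48.0),
    ("raas_rpc_request_duration_seconds_bucket", {"raas_instance": "raas1", "le": "+Inf"}, 48.0),
]
print(histogram_means(samples, "raas_rpc_request_duration_seconds"))
```

The result is a mean over the whole lifetime of the counters. For a recent-window average, compute the difference between two scrapes before dividing, or let Prometheus do this with a rate-based query.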