Automation Config exposes several system metrics that can be used for monitoring and diagnostics. These metrics are available in graphical form on the dashboard in the Automation Config user interface and in machine-readable form from the /metrics HTTP endpoint.

For more information about visualizing reports in the Automation Config user interface using the Dashboard, see Dashboard Reports.

Machine-readable metrics

Automation Config exports system metrics in OpenMetrics text-based format. This format is directly consumable by Prometheus and other monitoring and alerting tools.
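As an illustration of what "text-based" means here, the following is a minimal sketch of a parser for sample lines in this format, using only the Python standard library. The payload in SAMPLE_TEXT is hypothetical and only mimics the general shape of the format. The parser ignores optional timestamps and does not unescape label values, so a maintained client library is a better choice for production use.

```python
import re

# One sample line: metric name, optional {labels}, then the value.
# Any trailing timestamp is ignored.
_SAMPLE = re.compile(r'^([A-Za-z_:][A-Za-z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
_LABEL = re.compile(r'([A-Za-z_][A-Za-z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_metrics(text):
    """Return a list of (name, labels_dict, value) tuples from exposition text."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks, HELP/TYPE metadata, and the "# EOF" marker
        match = _SAMPLE.match(line)
        if match is None:
            continue  # ignore anything that is not a well-formed sample
        name, raw_labels, value = match.groups()
        labels = dict(_LABEL.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


# Hypothetical payload, for illustration only.
SAMPLE_TEXT = """\
# TYPE sse_minions gauge
sse_minions 42.0
# TYPE sse_minions_present gauge
sse_minions_present{master_id="master1"} 40.0
# EOF
"""

for name, labels, value in parse_metrics(SAMPLE_TEXT):
    print(name, labels, value)
```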

Automation Config metrics configuration

Configuration for system metrics collection consists of the following settings in the /etc/raas/raas configuration file. Default values are shown.

# System metrics settings
metrics:
  enabled: true                         # If True, enable the collection of system metrics
  prometheus: false                     # If True, enable the Prometheus endpoint at /metrics
  prometheus_username:                  # Static username for retrieving /metrics
  prometheus_password:                  # Static password for retrieving /metrics
  snapshot_interval: 60                 # How often to record snapshot metrics, in seconds
  max_query_timedelta: 86400            # Maximum timedelta for a single call to get_system_metrics, in seconds
  keep: 30                              # How long to retain metrics data, in days

The following settings control the handling of machine-readable system metrics:

  • To disable metrics collection, set enabled: false. Note that this also disables the Automation Config built-in dashboard.
  • To enable the export of machine-readable metrics from the /metrics HTTP endpoint, set prometheus: true. This setting does not affect the Automation Config built-in dashboard.
  • Access to the /metrics HTTP endpoint is controlled by HTTP Basic Authentication, using the credentials configured in prometheus_username and prometheus_password. The /metrics endpoint is enabled only if both settings have non-empty values. These credentials are stored only in the /etc/raas/raas configuration file, are not associated with any Automation Config account, and cannot be used to authenticate to Automation Config for any purpose other than accessing the /metrics HTTP endpoint.

The other settings shown above relate to the Automation Config built-in dashboard and do not affect the collection or reporting of machine-readable system metrics. In particular, the snapshot_interval setting determines how often, in seconds, metrics are recorded for display on the dashboard, and the keep setting determines how long, in days, metrics data will be kept in the database before being trimmed.
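To check that the endpoint and its credentials work before pointing a monitoring system at it, a request with an HTTP Basic Authentication header can be sent by hand. The following is a rough sketch using only the Python standard library. The helper names are hypothetical, and the address and credentials in the usage note are placeholders for your own values.

```python
import base64
import urllib.request


def build_metrics_request(base_url, username, password):
    """Build a GET request for /metrics with an HTTP Basic Authorization header."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(base_url.rstrip("/") + "/metrics")
    request.add_header("Authorization", "Basic " + token)
    return request


def fetch_metrics(base_url, username, password, timeout=10):
    """Return the raw metrics text.

    urlopen raises urllib.error.HTTPError for HTTP error statuses,
    for example when the credentials are rejected.
    """
    request = build_metrics_request(base_url, username, password)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

For example, fetch_metrics("http://localhost:8080", "prometheus", "metrics") would return the metrics text from a server at that placeholder address, assuming those values match prometheus_username and prometheus_password in /etc/raas/raas.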

Although the /metrics HTTP endpoint is the recommended way to gather machine-readable metrics data from Automation Config, you can also use the API (RaaS) to retrieve the data presented on the built-in dashboard. The stats.get_system_metrics() API call lets you query metrics data by metric name, source, and date range. The max_query_timedelta setting limits the time span that a single API call can cover. To get metrics data from a longer time span, make multiple API calls with different start and end dates.
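One way to plan those multiple calls is to split the overall date range into consecutive windows that each fit within max_query_timedelta. The sketch below shows only the splitting step. The function name split_time_range is hypothetical, and the API call itself is left out because its exact parameters, and whether its date boundaries are inclusive, are not described here.

```python
from datetime import datetime, timedelta, timezone


def split_time_range(start, end, max_seconds=86400):
    """Yield consecutive (window_start, window_end) pairs covering start..end.

    Each window is at most max_seconds long. The default mirrors the
    default max_query_timedelta shown in the configuration above.
    """
    if max_seconds <= 0:
        raise ValueError("max_seconds must be positive")
    step = timedelta(seconds=max_seconds)
    cursor = start
    while cursor < end:
        window_end = min(cursor + step, end)
        yield cursor, window_end
        cursor = window_end


start = datetime(2024, 1, 1, tzinfo=timezone.utc)
end = datetime(2024, 1, 3, 12, tzinfo=timezone.utc)
for window_start, window_end in split_time_range(start, end):
    # One stats.get_system_metrics() call would be made per window here.
    print(window_start.isoformat(), "->", window_end.isoformat())
```

With the default limit, this two-and-a-half-day range is split into three windows: two full days and one half day.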

Salt Master metrics configuration

The availability of some metrics depends on the configuration of the Salt masters connected to Automation Config:

  • Metrics on Salt events and job returns will be accurate only if the sseapi returner is configured on the Salt masters. Metrics collection will not work properly with the sse_pgjsonb (direct-to-database) returner.
  • Automation Config collects low-level function runtime information from Salt masters that have master_stats: True set in their configuration. This option is disabled by default. See master_stats in the Salt documentation for details.
  • Metrics on Salt job states will be accurate only if the job completion engine is enabled in the Salt Master Plugin:
    engines:
      - jobcompletion: {}

    This engine is enabled in the Salt Master Plugin default configuration.

Configuring Prometheus to connect to Automation Config

You can enable a Prometheus server to scrape metrics from Automation Config by adding a scrape_configs job to the Prometheus configuration (typically prometheus.yml) for each API (RaaS) server instance you have:

scrape_configs:
  - job_name: 'sse'

    metrics_path: '/metrics'
    scheme: 'http'

    static_configs:
      - targets: ['localhost:8080']

    basic_auth:
      username: prometheus
      password: metrics

The credentials in the Prometheus configuration should match the prometheus_username and prometheus_password specified in the /etc/raas/raas configuration file, as noted above.

See the Prometheus project documentation for more information on setting up scrape targets and other Prometheus configuration topics.

Note: As part of VMware’s initiative to remove problematic terminology, the term Salt master will be replaced with a better term in Automation Config and related products and documentation. This terminology update may take a few release cycles before it is fully complete.

Metric descriptions

The machine-readable metrics that Automation Config exports fall into several categories:

| Category | Metric Name | Metric Type | Labels | Description |
| --- | --- | --- | --- | --- |
| Salt master low-level metrics | salt_event_size_bytes | Histogram | master_id | Salt event size, in bytes |
| Salt master low-level metrics | salt_master_cmd_duration_seconds | Histogram | master_id, cmd | Salt master command duration, in seconds. Reported only if master_stats is configured on the Salt master. |
| Salt Master Plugin metrics | raas_master_commands_processed | Counter | master_id | SSE commands processed |
| Salt Master Plugin metrics | raas_master_master_grains_pushed | Counter | master_id | Salt master grain updates pushed to Automation Config |
| Salt Master Plugin metrics | raas_master_minion_keys_pushed | Counter | master_id | Minion key state updates pushed to Automation Config |
| Salt Master Plugin metrics | raas_master_minion_cached_pushed | Counter | master_id | Minion cache updates pushed to Automation Config |
| Salt Master Plugin metrics | raas_master_masterfs_pushed | Counter | master_id | MasterFS updates pushed to Automation Config |
| Salt Master Plugin metrics | raas_master_sseapi_engine_iteration_seconds | Histogram | master_id | API (RaaS) engine iteration duration, in seconds |
| Server metrics | redis_commands_executed | Counter | redis_instance | Redis commands executed (system cache) |
| Server metrics | redis_memory_bytes | Gauge | redis_instance | Redis memory usage (system cache) |
| Server metrics | celery_tasks_queued | Counter | raas_instance, task | Celery tasks queued (background jobs) |
| Server metrics | celery_tasks_executed | Counter | raas_instance, task | Celery tasks executed (background jobs) |
| Server metrics | celery_queue_length | Gauge | raas_instance | Celery queue length (background jobs waiting) |
| Server metrics | raas_rpc_request_duration_seconds | Histogram | raas_instance | SSE RPC API call duration, in seconds |
| PostgreSQL metrics | postgres_connections | Gauge | postgres_instance | Postgres connections |
| PostgreSQL metrics | postgres_transactions | Counter | postgres_instance | Postgres transactions committed |
| PostgreSQL metrics | postgres_rows_read | Counter | postgres_instance | Postgres rows read |
| PostgreSQL metrics | postgres_rows_inserted | Counter | postgres_instance | Postgres rows inserted |
| PostgreSQL metrics | postgres_rows_updated | Counter | postgres_instance | Postgres rows updated |
| PostgreSQL metrics | postgres_rows_deleted | Counter | postgres_instance | Postgres rows deleted |
| System metrics | highstate_minions | Gauge | None | Number of minions that ran a highstate job |
| System metrics | highstate_minions_changed | Gauge | None | Number of minions that ran a highstate job resulting in one or more changes |
| System metrics | highstate_minions_succeeded | Gauge | None | Number of minions that ran a highstate job with no failures |
| System metrics | highstate_minion_duration_seconds | Gauge | None | Average per-minion duration for a highstate run |
| System metrics | highstate_states | Gauge | None | Number of unique states applied in highstate runs |
| System metrics | highstate_states_changed | Gauge | None | Number of states applied in highstate runs that resulted in one or more changes |
| System metrics | highstate_states_succeeded | Gauge | None | Number of states applied in highstate runs with no failures |
| System metrics | sse_jobs_in_progress | Counter | None | Automation Config jobs in progress |
| System metrics | sse_jobs_complete_all_successful | Counter | None | Automation Config jobs complete with all successful returns |
| System metrics | sse_jobs_complete_missing_returns | Counter | None | Automation Config jobs complete with one or more missing returns |
| System metrics | sse_jobs_complete_with_errors | Counter | None | Automation Config jobs complete with one or more errors |
| System metrics | sse_masters | Gauge | None | Total Salt masters in Automation Config |
| System metrics | sse_minion_grains_deleted | Counter | master_id | Number of minion grains deleted |
| System metrics | sse_minion_grains_indexing_duration_seconds | Counter | raas_instance | Minion grains indexing calculations duration, in seconds |
| System metrics | sse_minion_grains_saved | Counter | master_id | Number of minion grains saved |
| System metrics | sse_minion_target_match_calcs | Counter | raas_instance | Number of minion versus target group matching calculations |
| System metrics | sse_minion_target_match_duration_seconds | Counter | raas_instance | Minion versus target group matching calculations duration, in seconds |
| System metrics | sse_minions | Gauge | None | Total minions in Automation Config |
| System metrics | sse_minions_present | Gauge | master_id | Minions present within the time limit configured by raas_presence_expiration |
| System metrics | sse_minions_lost | Gauge | master_id | Minions not present within the time limit |
| System metrics | sse_minions_unknown | Gauge | master_id | Unknown minions (never present) |
| System metrics | sse_users_authenticated | Gauge | None | Users authenticated to Automation Config |
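
Metrics from the list above can be combined into simple derived indicators. The sketch below estimates the share of minions currently present by dividing the sum of sse_minions_present across all master_id labels by sse_minions. The function name presence_ratio and the sample values are hypothetical. Summing across masters is an assumption: if a minion can be reported by more than one Salt master, the result would overcount.

```python
def presence_ratio(samples):
    """Return the fraction of known minions that are present.

    `samples` is an iterable of (metric_name, labels, value) tuples.
    Returns None when the total minion count is zero or missing.
    """
    total = 0.0
    present = 0.0
    for name, _labels, value in samples:
        if name == "sse_minions":
            total = value      # unlabeled gauge: total minions
        elif name == "sse_minions_present":
            present += value   # labeled by master_id: sum across masters
    if total == 0:
        return None
    return present / total


# Hypothetical values, for illustration only.
samples = [
    ("sse_minions", {}, 50.0),
    ("sse_minions_present", {"master_id": "master1"}, 30.0),
    ("sse_minions_present", {"master_id": "master2"}, 15.0),
]
print(presence_ratio(samples))
```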