Monitoring Metric Store

Learn how to monitor Metric Store. This topic includes key scaling indicators (KSIs) to guide Metric Store scaling decisions.

You can scale Metric Store vertically. When disk resources approach full consumption, Metric Store starts dropping data, oldest first. When memory or CPU resources approach full consumption, scale Metric Store vertically.

Key Scaling Indicators

nozzle_dropped

Metrics Emitted by Metric Store

Metric Store publishes metrics for monitoring Metric Store itself. You can use these metrics to observe the health of Metric Store and verify that the VMs and disks are appropriately scaled.

Nozzle Job Metrics

The nozzle is the process that ingresses envelopes from Loggregator’s Reverse Log Proxy (RLP).

Reports	Metric	Type	Notes
Number of envelopes sent to co-located Metric Store	`metric_store_nozzle_ingress_envelopes_total`	counter	When increasing, nozzle is correctly consuming from the RLP
Number of envelopes dropped when reading from RLP	`metric_store_nozzle_dropped_envelopes_total`	counter	If this number is increasing at a steady rate, it might indicate that you need to scale up the size of your VMs
Number of points dropped in outbound channel	`metric_store_nozzle_dropped_points_total`	counter	This value should always be zero. A nonzero value might be useful in debugging.
Number of points written to its co-located Metric Store	`metric_store_nozzle_egress_points_total`	counter
Number of errors writing to a remote node	`metric_store_nozzle_egress_errors_total`	counter	If this number is consistently increasing, it might indicate network issues or an overloaded Metric Store node
Total duration spent writing points	`metric_store_nozzle_egress_duration_seconds`	gauge
Total number of envelopes skipped by a tag within the nozzle	`metric_store_nozzle_skipped_envelopes_by_tag_total`	counter	When Enable Envelope Selector is deactivated, which is the default, the value is 0.
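The notes above describe counters that matter when they are "increasing at a steady rate", such as `metric_store_nozzle_dropped_envelopes_total`. The following Python sketch shows one way to turn raw counter samples into per-second rates and test for a sustained increase. The helper names, the sample format, and the `min_rate` threshold are hypothetical and are not part of Metric Store. How you collect the samples is left open.

```python
from typing import List, Tuple

# Hypothetical sample format: (unix timestamp in seconds, counter value).
Sample = Tuple[float, float]


def counter_rates(samples: List[Sample]) -> List[float]:
    """Return the per-second increase between consecutive counter samples.

    A decrease is treated as a counter reset (for example, a process
    restart), so the new value is counted as the increase since zero.
    """
    rates = []
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        if t1 <= t0:
            continue  # skip out-of-order or duplicate timestamps
        delta = v1 - v0 if v1 >= v0 else v1
        rates.append(delta / (t1 - t0))
    return rates


def is_steadily_increasing(samples: List[Sample], min_rate: float = 0.0) -> bool:
    """True if every interval shows an increase above min_rate (assumed threshold)."""
    rates = counter_rates(samples)
    return bool(rates) and all(rate > min_rate for rate in rates)


if __name__ == "__main__":
    # Illustrative values only, not real Metric Store output.
    dropped = [(0, 0), (60, 120), (120, 240)]
    print(counter_rates(dropped))
    print(is_steadily_increasing(dropped))
```

If `is_steadily_increasing` returns true for dropped envelopes over several consecutive intervals, that matches the condition the table describes as a possible signal to scale up your VMs.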

CF Auth Proxy Job Metrics

Reports	Metric	Type	Notes
Duration in seconds of requests made to the auth proxy	`metric_store_auth_proxy_request_duration_seconds`	gauge
Duration in seconds of external requests made to CAPI	`metric_store_auth_proxy_capi_request_duration_seconds`	gauge

Metric Store Job Metrics

Reports	Metric	Type	Notes
Number of points ingressed to co-located Metric Store	`metric_store_ingress_points_total`	counter	This should be steadily increasing at a relatively consistent rate
Number of points successfully written to storage engine	`metric_store_written_points_total`	counter	This should be steadily increasing at a relatively consistent rate
Time spent writing points to the storage engine	`metric_store_write_duration_seconds`	gauge
Percentage of free space on persistent disk	`metric_store_disk_free_ratio`	gauge
Number of shards removed because of time-based expiration	`metric_store_expired_shards_total`	counter
Number of shards removed because of the disk space threshold	`metric_store_pruned_shards_total`	counter
Number of points dropped because their shard was pending deletion	`metric_store_pending_deletion_dropped_points_total`	counter
Days of data stored on disk	`metric_store_storage_days`	gauge
Size of the index	`metric_store_index_size_bytes`	gauge
Number of unique series stored in the index	`metric_store_series_count`	gauge
Number of unique measurements stored in the index	`metric_store_measurements_count`	gauge
Number of errors encountered reading from the storage engine	`metric_store_read_errors_total`	counter
Time spent retrieving tag values from the storage engine	`metric_store_tag_values_query_duration_seconds`	gauge
Time spent retrieving measurement names from the storage engine	`metric_store_measurement_names_query_duration_seconds`	gauge
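The table above notes that `metric_store_ingress_points_total` and `metric_store_written_points_total` should increase "at a relatively consistent rate". As a rough illustration, the Python sketch below checks whether the rates between consecutive samples of a counter stay within a relative spread. The function name, the sample format, and the `tolerance` default are assumptions made for this example, not values defined by Metric Store.

```python
from typing import List, Tuple

# Hypothetical sample format: (unix timestamp in seconds, counter value).
Sample = Tuple[float, float]


def rate_is_consistent(samples: List[Sample], tolerance: float = 0.5) -> bool:
    """Check that a counter grows in every interval at a similar rate.

    The spread between the fastest and slowest interval, relative to the
    mean rate, must not exceed `tolerance` (an assumed, tunable value).
    A stalled or reset counter yields a rate <= 0 and fails the check.
    """
    rates = [
        (v1 - v0) / (t1 - t0)
        for (t0, v0), (t1, v1) in zip(samples, samples[1:])
        if t1 > t0
    ]
    if not rates or min(rates) <= 0:
        return False
    mean_rate = sum(rates) / len(rates)
    return (max(rates) - min(rates)) / mean_rate <= tolerance


if __name__ == "__main__":
    # Illustrative values only, not real Metric Store output.
    written = [(0, 0), (60, 600), (120, 1230), (180, 1800)]
    print(rate_is_consistent(written))
```

A failed check does not identify a cause on its own. It only flags an interval worth comparing against the other metrics in this topic.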

Metric Store Remote Metrics

Reports	Metric	Type
Size of a replayer queue	`metric_store_replayer_disk_usage_bytes`	gauge
Number of errors encountered writing to a replayer queue	`metric_store_replayer_queue_errors_total`	counter
Number of bytes written to a replayer queue	`metric_store_replayer_queued_bytes_total`	counter
Number of errors encountered reading from a replayer queue	`metric_store_replayer_read_errors_total`	counter
Number of errors encountered replaying writes to a remote node	`metric_store_replayer_replay_errors_total`	counter
Number of bytes successfully replayed to a remote node	`metric_store_replayer_replayed_bytes_total`	counter
Number of points dropped while writing to a remote node	`metric_store_dropped_points_total`	counter
Number of points successfully distributed to a remote node	`metric_store_distributed_points_total`	counter
Time spent distributing points to a remote node	`metric_store_distributed_request_duration_seconds`	gauge
Number of points collected by a metric-store instance from remote nodes	`metric_store_collected_points_total`	counter