Learn how to monitor Metric Store. This topic includes key scaling indicators (KSIs) to guide Metric Store scaling decisions.
You can scale Metric Store vertically. When disk resources are nearly exhausted, Metric Store starts dropping data, oldest first. When memory or CPU resources are nearly exhausted, scale Metric Store vertically.
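
The decision rule above can be sketched as a small helper. This is only an illustration: the function name, the threshold values (`disk_floor`, `resource_ceiling`), and the assumption that all three inputs are fractions between 0 and 1 are hypothetical and not part of Metric Store. Convert your values first if they are reported as percentages.

```python
def scaling_hint(disk_free_ratio, cpu_utilization, memory_utilization,
                 disk_floor=0.10, resource_ceiling=0.85):
    """Return a list of human-readable hints for the given resource readings.

    All inputs are assumed to be fractions in the range 0..1.
    disk_floor and resource_ceiling are illustrative thresholds, not
    values defined by Metric Store.
    """
    hints = []
    # Low free disk space: Metric Store drops the oldest data first.
    if disk_free_ratio <= disk_floor:
        hints.append("disk nearly full: expect oldest data to be dropped")
    # High CPU or memory usage: the guidance is to scale vertically.
    if cpu_utilization >= resource_ceiling or memory_utilization >= resource_ceiling:
        hints.append("CPU or memory nearly exhausted: scale vertically")
    return hints


if __name__ == "__main__":
    for hint in scaling_hint(disk_free_ratio=0.05, cpu_utilization=0.90,
                             memory_utilization=0.40):
        print(hint)
```

Choose thresholds that match your own retention and capacity requirements. The defaults here are placeholders.
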
Metric Store publishes metrics about itself. You can use these metrics to observe the health of Metric Store and verify that its VMs and disks are appropriately scaled.
The nozzle is the process that ingresses envelopes from Loggregator’s Reverse Log Proxy (RLP).
Reports | Metric | Type | Notes |
---|---|---|---|
Duration in seconds of requests made to the auth proxy | metric_store_auth_proxy_request_duration_seconds | gauge | |
Duration in seconds of external requests made to CAPI | metric_store_auth_proxy_capi_request_duration_seconds | gauge | |
Reports | Metric | Type | Notes |
---|---|---|---|
Number of points ingressed to co-located Metric Store | metric_store_ingress_points_total | counter | This should be steadily increasing at a relatively consistent rate |
Number of points successfully written to storage engine | metric_store_written_points_total | counter | This should be steadily increasing at a relatively consistent rate |
Time spent writing points to the storage engine | metric_store_write_duration_seconds | gauge | |
Percentage of free space on persistent disk | metric_store_disk_free_ratio | gauge | |
Number of shards removed because of time-based expiration | metric_store_expired_shards_total | counter | |
Number of shards removed because of the disk space threshold | metric_store_pruned_shards_total | counter | |
Number of points dropped because of deleting the shard that is in the pending state | metric_store_pending_deletion_dropped_points_total | counter | |
Days of data stored on disk | metric_store_storage_days | gauge | |
Size of the index | metric_store_index_size_bytes | gauge | |
Number of unique series stored in the index | metric_store_series_count | gauge | |
Number of unique measurements stored in the index | metric_store_measurements_count | gauge | |
Number of errors encountered reading from the storage engine | metric_store_read_errors_total | counter | |
Time spent retrieving tag values from the storage engine | metric_store_tag_values_query_duration_seconds | gauge | |
Time spent retrieving measurement names from the storage engine | metric_store_measurement_names_query_duration_seconds | gauge | |
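
The Notes column says that the ingress and written point counters should increase steadily at a relatively consistent rate. One way to check that, sketched here under stated assumptions, is to take several timestamped readings of a counter such as metric_store_ingress_points_total and compare the per-interval rates. The function names, the sample format, and the `tolerance` value are hypothetical and not part of Metric Store.

```python
from statistics import mean


def counter_rates(samples):
    """Per-second increase between consecutive (timestamp_seconds, value) samples."""
    rates = []
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        if t1 <= t0:
            raise ValueError("timestamps must be strictly increasing")
        # A counter that goes down was reset (for example, by a process
        # restart); skip that interval instead of reporting a negative rate.
        if v1 < v0:
            continue
        rates.append((v1 - v0) / (t1 - t0))
    return rates


def is_steady(samples, tolerance=0.5):
    """True if every interval rate is within tolerance * average of the average.

    tolerance is an illustrative value; tune it for your own traffic.
    """
    rates = counter_rates(samples)
    if not rates:
        return False
    avg = mean(rates)
    if avg <= 0:
        return False  # the counter is not increasing at all
    return all(abs(rate - avg) <= tolerance * avg for rate in rates)


if __name__ == "__main__":
    healthy = [(0, 0), (60, 6000), (120, 12300), (180, 18000)]
    stalled = [(0, 0), (60, 6000), (120, 6000), (180, 6000)]
    print(is_steady(healthy), is_steady(stalled))
```

A counter that stops increasing, or whose rate swings widely between intervals, is a prompt to look at the nozzle and storage metrics more closely. It is not a diagnosis on its own.
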
Reports | Metric | Type | Notes |
---|---|---|---|
Size of a replayer queue | metric_store_replayer_disk_usage_bytes | gauge | |
Number of errors encountered writing to a replayer queue | metric_store_replayer_queue_errors_total | counter | |
Number of bytes written to a replayer queue | metric_store_replayer_queued_bytes_total | counter | |
Number of errors encountered reading from a replayer queue | metric_store_replayer_read_errors_total | counter | |
Number of errors encountered replaying writes to a remote node | metric_store_replayer_replay_errors_total | counter | |
Number of bytes successfully replayed to a remote node | metric_store_replayer_replayed_bytes_total | counter | |
Number of points dropped while writing to a remote node | metric_store_dropped_points_total | counter | |
Number of points successfully distributed to a remote node | metric_store_distributed_points_total | counter | |
Time spent distributing points to a remote node | metric_store_distributed_request_duration_seconds | gauge | |
Number of points collected by a metric-store instance from remote nodes | metric_store_collected_points_total | counter | |
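
As a rough illustration of how two of the counters above can be combined, the sketch below estimates the share of points dropped while writing to a remote node over a sampling window. It uses two readings each of metric_store_dropped_points_total and metric_store_distributed_points_total. Metric Store does not publish this ratio. The function name is hypothetical, and the calculation assumes that every point sent to a remote node is counted in exactly one of the two counters.

```python
def remote_drop_ratio(dropped_before, dropped_after,
                      distributed_before, distributed_after):
    """Fraction of points dropped between two readings of each counter.

    Assumes dropped + distributed covers all points sent to remote nodes
    in the window; this is an assumption, not documented behavior.
    """
    dropped = dropped_after - dropped_before
    distributed = distributed_after - distributed_before
    if dropped < 0 or distributed < 0:
        raise ValueError("counter went backwards; a reset occurred between samples")
    attempted = dropped + distributed
    if attempted == 0:
        return 0.0  # nothing was sent during the window
    return dropped / attempted


if __name__ == "__main__":
    ratio = remote_drop_ratio(dropped_before=10, dropped_after=15,
                              distributed_before=1000, distributed_after=1095)
    print(f"{ratio:.2%} of points were dropped in this window")
```
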