This topic describes how to monitor Log Store and the metrics that Log Store publishes.

Overview

You can use the metrics that Log Store publishes to observe the health of Log Store and determine if its VMs and disks are appropriately scaled.

You can scale Log Store vertically. When disk space is exhausted, Log Store begins dropping the oldest data first. When memory or CPU is exhausted, scale Log Store up.

The dropped_envelopes_count metric is a key capacity scaling indicator you can use to guide your scaling decisions for Log Store. When the dropped_envelopes_count metric increases at a steady rate, you may need to scale the size of your VMs up. For more information about this metric, see Nozzle Job Metrics below.
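As a rough illustration of what "increases at a steady rate" can mean in practice, the following Python sketch checks a series of dropped_envelopes_count samples. How the samples are collected, the `Sample` shape, and the `min_intervals` threshold are assumptions made for this example. They are not part of Log Store.

```python
from typing import List, Sequence, Tuple

# Hypothetical sample shape: (unix timestamp in seconds, counter value).
Sample = Tuple[float, float]


def per_second_rates(samples: Sequence[Sample]) -> List[float]:
    """Return the per-second increase between consecutive counter samples."""
    rates: List[float] = []
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        if t1 <= t0:
            continue  # skip duplicate or out-of-order timestamps
        if v1 < v0:
            continue  # counter went backwards, for example after a restart
        rates.append((v1 - v0) / (t1 - t0))
    return rates


def is_steadily_increasing(samples: Sequence[Sample], min_intervals: int = 3) -> bool:
    """True when the counter grew in every usable interval.

    min_intervals is an arbitrary threshold chosen for this sketch.
    """
    rates = per_second_rates(samples)
    return len(rates) >= min_intervals and all(rate > 0 for rate in rates)


# Example: one sample per minute, 120 dropped envelopes per minute.
growing = [(0, 0), (60, 120), (120, 240), (180, 360)]
flat = [(0, 5), (60, 5), (120, 5), (180, 5)]
```

With these inputs, `is_steadily_increasing(growing)` returns `True` and `is_steadily_increasing(flat)` returns `False`. A sustained `True` result over a longer window is the kind of signal that may justify scaling up.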

Metrics Emitted by Log Store

To understand the metrics that Log Store publishes, see the following sections:

Nozzle Job Metrics

In Loggregator, nozzles receive log envelopes from the Reverse Log Proxy (RLP). For more information, see [Loggregator Architecture](https://docs.pivotal.io/application-service/2-11/loggregator/architecture.html) in the VMware Tanzu Application Service for VMs (TAS for VMs) documentation.

The following table describes the nozzle-related metrics that Log Store publishes:

| Metric | Type | Description | Notes |
|---|---|---|---|
| ingress_count | counter | The number of logs a nozzle has received since the nozzle was deployed. | When this metric increases, the nozzle is consuming log envelopes from the RLP as expected. |
| dropped_envelopes_count | counter | The number of envelopes a nozzle drops. | If this number is increasing at a steady rate, it may indicate that the app is sending more logs than the nozzle can accommodate. To solve this issue, you may need to scale the size of your VMs up. |
| ingress_byte_count | counter | The number of bytes a nozzle has received since the nozzle was deployed. | |
| egress_failure_count | counter | The number of logs a nozzle has failed to send since the nozzle was deployed. | |
| egress_count | counter | The number of logs a nozzle has sent since the nozzle was deployed. | |
| queue_error_count | counter | The number of errors that have occurred while a nozzle has attempted to write to the handoff queue. | |
| queued_log_count | counter | The number of logs a nozzle has queued for hinted handoff. | |
| replay_error_count | counter | The number of errors that have occurred while a nozzle has attempted to replay logs. | |
| replayed_log_count | counter | The number of logs from handoff replays a nozzle has made successfully. | |
| queue_disk_usage | gauge | The amount of disk the handoff queue for a nozzle uses. | |
| ingress_connection_count | gauge | The number of incoming TCP connections to a nozzle. | |
| egress_connection_count | gauge | The number of outgoing TCP connections from a nozzle. | |
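The egress counters above can be combined into a failure ratio for a given time window. The following Python sketch assumes that egress_count counts only logs that were sent successfully, which this topic does not confirm, and that you have already computed the increase of each counter over the same window. The function name is hypothetical.

```python
from typing import Optional


def egress_failure_ratio(egress_delta: float, failure_delta: float) -> Optional[float]:
    """Fraction of send attempts that failed during one time window.

    egress_delta:  increase in egress_count over the window
                   (assumed here to count successful sends only)
    failure_delta: increase in egress_failure_count over the same window
    Returns None when there were no send attempts in the window.
    """
    attempts = egress_delta + failure_delta
    if attempts <= 0:
        return None
    return failure_delta / attempts
```

For example, 90 sent logs and 10 failed sends in a window give a ratio of about 0.1. If egress_count turns out to include failed sends, divide by the egress_count increase alone instead.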

CF Auth Proxy Job Metrics

The following table describes the CF Auth Proxy-related metrics that Log Store publishes:

| Metric | Type | Description | Notes |
|---|---|---|---|
| metric_store_auth_proxy_request_duration_seconds | histogram | The duration in seconds of requests made to the CF Auth Proxy. | |
| metric_store_auth_proxy_capi_request_duration_seconds | histogram | The duration in seconds of external requests CF Auth Proxy has made to the Cloud Controller API (CAPI). | |
| http_request_seconds | summary | The summary of seconds CF Auth Proxy has spent making HTTP requests. | |
| http_request_count | counter | The number of completed HTTP requests the CF Auth Proxy has made. | |

Log Store Job Metrics

The following table describes the Log Store-related metrics that Log Store publishes:

| Metric | Type | Description | Notes |
|---|---|---|---|
| retention_seconds | gauge | The age in seconds of the oldest stored log in Log Store. | |
| ingress_count | counter | The number of logs Log Store has received from Loggregator. | |
| ingress_byte_count | counter | The number of bytes Log Store has received from Loggregator. | |
| prune_count | counter | The number of prunes Log Store has performed. | This count increases by one when disk usage reaches 80% of the Log Store prune threshold. |
| expired_shards | counter | The number of expired shards in prunes Log Store has performed. | The retention period for shards is 42 days by default. You can configure the retention period to last a maximum of 183 days by editing the Log Store retention period field in the App Metrics Component Config pane. |
| stored_bytes_count | gauge | The number of bytes Log Store is using on disk. | |
| ingress_connection_count | gauge | The number of incoming TCP connections to Log Store. | |
| shard_count | gauge | The number of shards Log Store has stored in a partition. | |
| http_request_count | counter | The number of completed HTTP requests Log Store has made. | |
| stored_log_count | counter | The number of logs Log Store has stored to disk since Log Store was deployed. | |
| stored_log_failure_count | counter | The number of times Log Store has failed to store logs to a partition since Log Store was deployed. | |
| http_request_seconds | summary | The summary of seconds Log Store has spent making HTTP requests. | |
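The retention_seconds and prune_count metrics can be read together to judge whether disk is sized for the configured retention period. The following Python sketch converts retention_seconds to days and flags the case where Log Store is pruning while the oldest stored log is younger than the retention period. The heuristic and the function names are assumptions for illustration, not documented Log Store behavior. The 42-day default and 183-day maximum come from the table above.

```python
SECONDS_PER_DAY = 86_400
DEFAULT_RETENTION_DAYS = 42   # default retention period for shards
MAX_RETENTION_DAYS = 183      # maximum configurable retention period


def retention_days(retention_seconds: float) -> float:
    """Convert the retention_seconds gauge to days."""
    return retention_seconds / SECONDS_PER_DAY


def disk_may_be_undersized(
    retention_seconds: float,
    prune_count_delta: int,
    configured_days: int = DEFAULT_RETENTION_DAYS,
) -> bool:
    """Heuristic: prunes occurred in the window and the oldest log is
    younger than the configured retention period.

    prune_count_delta is the increase in prune_count over the window
    you are inspecting (hypothetical input for this sketch).
    """
    if not 0 < configured_days <= MAX_RETENTION_DAYS:
        raise ValueError("configured_days must be between 1 and 183")
    return prune_count_delta > 0 and retention_days(retention_seconds) < configured_days
```

For example, `disk_may_be_undersized(10 * SECONDS_PER_DAY, 3)` returns `True`: prunes occurred and only 10 days of logs remain against a 42-day retention period. A recently deployed Log Store also reports a low retention_seconds value, which is why the sketch requires prunes in the window as well.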

Router Job Metrics

The following table describes the router-related metrics that Log Store publishes:

| Metric | Type | Description | Notes |
|---|---|---|---|
| ingress_count | counter | The number of logs the router has received since the router was deployed. | |
| ingress_byte_count | counter | The number of bytes the router has received from Loggregator. | |
| egress_failure_count | counter | The number of logs the router has failed to send since the router was deployed. | |
| egress_count | counter | The number of logs the router has sent since the router was deployed. | |
| queue_error_count | counter | The number of errors that have occurred while the router has attempted to write to the handoff queue. | |
| queued_log_count | counter | The number of logs the router has queued for hinted handoff. | |
| replay_error_count | counter | The number of errors that have occurred while the router has attempted to replay logs. | |
| replayed_log_count | counter | The number of logs from handoff replays the router has made successfully. | |
| queue_disk_usage | gauge | The amount of disk the handoff queue for the router uses. | |
| ingress_connection_count | gauge | The number of incoming TCP connections to the router. | |
| egress_connection_count | gauge | The number of outgoing TCP connections from the router. | |
