This topic describes the metrics that the Healthwatch Exporter for VMware Tanzu® Application Service™ (TAS for VMs) tile and the Healthwatch Exporter for VMware Tanzu® Kubernetes Grid™ Integrated Edition (TKGI) tile generate.
Healthwatch Exporter for TAS for VMs and Healthwatch Exporter for TKGI deploy metric exporter VMs to generate component metrics and service level indicators (SLIs) related to the health of your TAS for VMs and TKGI deployments. Each metric exporter VM exposes these metrics and SLIs on a Prometheus exposition endpoint, `/metrics`.

The Prometheus instance within your metrics monitoring system then scrapes the `/metrics` endpoint on each metric exporter VM and imports those metrics into your monitoring system. You can configure the frequency at which the Prometheus instance scrapes the `/metrics` endpoints in the Prometheus pane of the Healthwatch for VMware Tanzu tile. To configure the scrape interval for the Prometheus instance, see Configure Prometheus in Configuring Healthwatch.
The name of each metric is in PromQL format. For more information, see the Prometheus documentation.
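For example, after the Prometheus instance scrapes the metric exporter VMs, you can retrieve any metric described in this topic by querying its PromQL name. A minimal illustrative query, using one of the exporter health metrics documented below:

```promql
# Returns 1 while the BOSH health metric exporter VM is running and healthy
bosh_sli_exporter_status{exported_job="bosh-health-exporter"}
```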
In a VMware Tanzu® Operations Manager™ (Ops Manager) foundation, the BOSH Director manages the VMs that each tile deploys. If the BOSH Director fails or is not responsive, the VMs that the BOSH Director manages also fail.
Healthwatch Exporter for TAS for VMs and Healthwatch Exporter for TKGI deploy two VMs that continuously test the functionality of the BOSH Director: the BOSH health metric exporter VM and the BOSH deployment metric exporter VM:
The BOSH health metric exporter VM, `bosh-health-exporter`, creates a BOSH deployment called `bosh-health` every ten minutes. This BOSH deployment deploys another VM, `bosh-health-check`, that runs a suite of SLI tests to validate the functionality of the BOSH Director. After the SLI tests are complete, the BOSH health metric exporter VM collects the metrics from the `bosh-health-check` VM, then deletes the `bosh-health` deployment and the `bosh-health-check` VM.
The following table describes each metric the BOSH health metric exporter VM generates:
Metric | Description |
---|---|
`bosh_sli_duration_seconds_bucket{exported_job="bosh-health-exporter"}` | The number of seconds the BOSH health SLI test suite takes to run, grouped by how many ran in less than a certain amount of time. This metric is also called a bucket of BOSH health SLI test suite duration metrics. |
`bosh_sli_duration_seconds_count{exported_job="bosh-health-exporter"}` | The total number of duration metrics across all BOSH health SLI test suite duration metric buckets. |
`bosh_sli_duration_seconds_sum{exported_job="bosh-health-exporter"}` | The total value of the duration metrics across all BOSH health SLI test suite duration metric buckets. |
`bosh_sli_exporter_status{exported_job="bosh-health-exporter"}` | The health status of the BOSH health metric exporter VM. A value of 1 indicates that the BOSH health metric exporter VM is running and healthy. |
`bosh_sli_failures_total{exported_job="bosh-health-exporter"}` | The total number of times the BOSH health SLI test suite fails. A failed test suite is one in which any number of tests within the test suite fail. |
`bosh_sli_run_duration_seconds{exported_job="bosh-health-exporter"}` | The number of seconds a single BOSH health SLI test suite takes to run. |
`bosh_sli_runs_total{exported_job="bosh-health-exporter"}` | The total number of times the BOSH health SLI test suite runs. To calculate the failure rate of the BOSH health SLI test suite, divide the value of `bosh_sli_failures_total{exported_job="bosh-health-exporter"}` by the value of `bosh_sli_runs_total{exported_job="bosh-health-exporter"}`. |
`bosh_sli_task_duration_seconds_bucket{exported_job="bosh-health-exporter"}` | The number of seconds it takes a task within the BOSH health SLI test suite to run, grouped by how many ran in less than a certain amount of time. This metric is also called a bucket of task duration metrics. |
`bosh_sli_task_duration_seconds_count{exported_job="bosh-health-exporter"}` | The total number of duration metrics across all task duration metric buckets. |
`bosh_sli_task_duration_seconds_sum{exported_job="bosh-health-exporter"}` | The total value of the duration metrics across all task duration metric buckets. |
`bosh_sli_task_run_duration_seconds{exported_job="bosh-health-exporter",task="delete"}` | The number of seconds it takes the `bosh delete-deployment` command test to run. |
`bosh_sli_task_run_duration_seconds{exported_job="bosh-health-exporter",task="deploy"}` | The number of seconds it takes the `bosh deploy` command test to run. |
`bosh_sli_task_run_duration_seconds{exported_job="bosh-health-exporter",task="deployments"}` | The number of seconds it takes the `bosh deployments` command test to run. |
`bosh_sli_task_runs_total{exported_job="bosh-health-exporter"}` | The total number of times a task runs. To calculate the per-task failure rate, divide the value of `bosh_sli_task_failures_total{exported_job="bosh-health-exporter"}` by the value of `bosh_sli_task_runs_total{exported_job="bosh-health-exporter"}`. |
`bosh_sli_task_failures_total{exported_job="bosh-health-exporter",task="delete"}` | The total number of times the `bosh delete-deployment` command fails. |
`bosh_sli_task_failures_total{exported_job="bosh-health-exporter",task="deploy"}` | The total number of times the `bosh deploy` command fails. |
`bosh_sli_task_failures_total{exported_job="bosh-health-exporter",task="deployments"}` | The total number of times the `bosh deployments` command fails. |
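The failure-rate calculation described above, and the `_bucket` histogram series, translate directly into PromQL. The following queries are a sketch; the one-hour window is an arbitrary choice, and the division assumes both series carry matching label sets:

```promql
# Fraction of BOSH health SLI test suite runs that failed over the past hour
increase(bosh_sli_failures_total{exported_job="bosh-health-exporter"}[1h])
  / increase(bosh_sli_runs_total{exported_job="bosh-health-exporter"}[1h])

# Approximate 95th-percentile test suite duration from the histogram buckets
histogram_quantile(0.95,
  sum by (le) (rate(bosh_sli_duration_seconds_bucket{exported_job="bosh-health-exporter"}[1h])))
```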
The BOSH deployment metric exporter VM, `bosh-deployments-exporter`, checks every 30 seconds whether any BOSH deployments other than the `bosh-health` deployment created by the BOSH health metric exporter VM are running.
The following table describes each metric the BOSH deployment metric exporter VM generates:
Metric | Description |
---|---|
`bosh_deployments_status` | Whether any BOSH deployments other than `bosh-health` are running. A value of 0 indicates that no other BOSH deployments are running on the BOSH Director. A value of 1 indicates that other BOSH deployments are running on the BOSH Director. |
`bosh_sli_duration_seconds_bucket{exported_job="bosh-deployments-exporter"}` | The number of seconds the BOSH deployment check takes to run, grouped by how many ran in less than a certain amount of time. This metric is also called a bucket of BOSH deployment check duration metrics. |
`bosh_sli_duration_seconds_count{exported_job="bosh-deployments-exporter"}` | The total number of duration metrics across all BOSH deployment check duration metric buckets. |
`bosh_sli_duration_seconds_sum{exported_job="bosh-deployments-exporter"}` | The total value of the duration metrics across all BOSH deployment check duration metric buckets. |
`bosh_sli_exporter_status{exported_job="bosh-deployments-exporter"}` | The health status of the BOSH deployment metric exporter VM. A value of 1 indicates that the BOSH deployment metric exporter VM is running and healthy. |
`bosh_sli_failures_total{exported_job="bosh-deployments-exporter"}` | The total number of times the BOSH deployment check fails. |
`bosh_sli_run_duration_seconds{exported_job="bosh-deployments-exporter"}` | The number of seconds a single BOSH deployment check takes to run. |
`bosh_sli_runs_total{exported_job="bosh-deployments-exporter"}` | The total number of times the BOSH deployment check runs. To calculate the failure rate of the BOSH deployment check, divide the value of `bosh_sli_failures_total{exported_job="bosh-deployments-exporter"}` by the value of `bosh_sli_runs_total{exported_job="bosh-deployments-exporter"}`. |
`bosh_sli_task_duration_seconds_bucket{exported_job="bosh-deployments-exporter"}` | The number of seconds it takes a task within the BOSH deployment check to run, grouped by how many ran in less than a certain amount of time. This metric is also called a bucket of task duration metrics. |
`bosh_sli_task_duration_seconds_count{exported_job="bosh-deployments-exporter"}` | The total number of duration metrics across all task duration metric buckets. |
`bosh_sli_task_duration_seconds_sum{exported_job="bosh-deployments-exporter"}` | The total value of the duration metrics across all task duration metric buckets. |
`bosh_sli_task_run_duration_seconds{exported_job="bosh-deployments-exporter",task="tasks"}` | The number of seconds it takes the `bosh tasks` command test to run. |
`bosh_sli_task_runs_total{exported_job="bosh-deployments-exporter"}` | The total number of times a task runs. To calculate the per-task failure rate, divide the value of `bosh_sli_task_failures_total{exported_job="bosh-deployments-exporter"}` by the value of `bosh_sli_task_runs_total{exported_job="bosh-deployments-exporter"}`. |
`bosh_sli_task_failures_total{exported_job="bosh-deployments-exporter",task="tasks"}` | The total number of times the `bosh tasks` command fails. |
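Because `bosh_deployments_status` is a simple 0/1 gauge, it lends itself to alert-style expressions. A minimal sketch:

```promql
# Fires while any BOSH deployment other than bosh-health is running
bosh_deployments_status == 1
```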
Healthwatch Exporter for TAS for VMs and Healthwatch Exporter for TKGI deploy VMs that generate metrics regarding the health of several Ops Manager and runtime components.
You can use the following platform metrics to calculate percent availability and error budgets:
Developers create and manage apps on TAS for VMs using the Cloud Foundry Command Line Interface (cf CLI). Healthwatch Exporter for TAS for VMs deploys the TAS for VMs SLI exporter VM, `pas-sli-exporter`, which continuously tests the functionality of the cf CLI.
The following table describes each metric the TAS for VMs SLI exporter VM generates:
Metric | Description |
---|---|
`tas_sli_duration_seconds_bucket` | The number of seconds the TAS for VMs SLI test suite takes to run, grouped by how many ran in less than a certain amount of time. This metric is also called a bucket of TAS for VMs SLI test suite duration metrics. |
`tas_sli_duration_seconds_count` | The total number of duration metrics across all TAS for VMs SLI test suite duration metric buckets. |
`tas_sli_duration_seconds_sum` | The total value of the duration metrics across all TAS for VMs SLI test suite duration metric buckets. |
`tas_sli_exporter_status` | The health status of the TAS for VMs SLI exporter VM. A value of 1 indicates that the TAS for VMs SLI exporter VM is running and healthy. |
`tas_sli_failures_total` | The total number of times the TAS for VMs SLI test suite fails. |
`tas_sli_run_duration_seconds` | The number of seconds the TAS for VMs SLI test suite takes to run. |
`tas_sli_runs_total` | The total number of times the TAS for VMs SLI test suite runs. To calculate the failure rate of the TAS for VMs SLI test suite, divide the value of `tas_sli_failures_total` by the value of `tas_sli_runs_total`. |
`tas_sli_task_duration_seconds_bucket` | The number of seconds it takes a task within the TAS for VMs SLI test suite to run, grouped by how many ran in less than a certain amount of time. This metric is also called a bucket of task duration metrics. |
`tas_sli_task_duration_seconds_count` | The total number of duration metrics across all task duration metric buckets. |
`tas_sli_task_duration_seconds_sum` | The total value of the duration metrics across all task duration metric buckets. |
`tas_sli_task_run_duration_seconds{task="delete"}` | The number of seconds it takes the `cf delete` command test to run. |
`tas_sli_task_run_duration_seconds{task="login"}` | The number of seconds it takes the `cf login` command test to run. |
`tas_sli_task_run_duration_seconds{task="logs"}` | The number of seconds it takes the `cf logs` command test to run. |
`tas_sli_task_run_duration_seconds{task="push"}` | The number of seconds it takes the `cf push` command test to run. |
`tas_sli_task_run_duration_seconds{task="setEnv"}` | The number of seconds it takes the `cf set-env` command test to run. |
`tas_sli_task_run_duration_seconds{task="start"}` | The number of seconds it takes the `cf start` command test to run. |
`tas_sli_task_run_duration_seconds{task="stop"}` | The number of seconds it takes the `cf stop` command test to run. |
`tas_sli_task_runs_total` | The total number of times a task runs. To calculate the per-task failure rate, divide the value of `tas_sli_task_failures_total` by the value of `tas_sli_task_runs_total`. |
`tas_sli_task_failures_total{task="delete"}` | The total number of times the `cf delete` command fails. |
`tas_sli_task_failures_total{task="login"}` | The total number of times the `cf login` command fails. |
`tas_sli_task_failures_total{task="logs"}` | The total number of times the `cf logs` command fails. |
`tas_sli_task_failures_total{task="push"}` | The total number of times the `cf push` command fails. |
`tas_sli_task_failures_total{task="setEnv"}` | The total number of times the `cf set-env` command fails. |
`tas_sli_task_failures_total{task="start"}` | The total number of times the `cf start` command fails. |
`tas_sli_task_failures_total{task="stop"}` | The total number of times the `cf stop` command fails. |
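As with the BOSH health metrics, the failure rate described above can be computed as a PromQL ratio, and the per-task duration gauges can be inspected over a window. A sketch, with an arbitrary one-hour window:

```promql
# Fraction of TAS for VMs SLI test suite runs that failed over the past hour
increase(tas_sli_failures_total[1h]) / increase(tas_sli_runs_total[1h])

# Slowest cf push test observed over the past hour, in seconds
max_over_time(tas_sli_task_run_duration_seconds{task="push"}[1h])
```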
Operators create and manage Kubernetes clusters using the TKGI Command Line Interface (TKGI CLI). Healthwatch Exporter for TKGI deploys the TKGI SLI exporter VM, `pks-sli-exporter`, which continuously tests the functionality of the TKGI CLI.
The following table describes each metric the TKGI SLI exporter VM generates:
Metric | Description |
---|---|
`tkgi_sli_duration_seconds_bucket` | The number of seconds the TKGI SLI test suite takes to run, grouped by how many ran in less than a certain amount of time. This metric is also called a bucket of TKGI SLI test suite duration metrics. |
`tkgi_sli_duration_seconds_count` | The total number of duration metrics across all TKGI SLI test suite duration metric buckets. |
`tkgi_sli_duration_seconds_sum` | The total value of the duration metrics across all TKGI SLI test suite duration metric buckets. |
`tkgi_sli_exporter_status` | The health status of the TKGI SLI exporter VM. A value of 1 indicates that the TKGI SLI exporter VM is running and healthy. |
`tkgi_sli_failures_total` | The total number of times the TKGI SLI test suite fails. |
`tkgi_sli_run_duration_seconds` | The number of seconds the TKGI SLI test suite takes to run. |
`tkgi_sli_runs_total` | The total number of times the TKGI SLI test suite runs. To calculate the failure rate of the TKGI SLI test suite, divide the value of `tkgi_sli_failures_total` by the value of `tkgi_sli_runs_total`. |
`tkgi_sli_task_duration_seconds_bucket` | The number of seconds it takes a task within the TKGI SLI test suite to run, grouped by how many ran in less than a certain amount of time. This metric is also called a bucket of task duration metrics. |
`tkgi_sli_task_duration_seconds_count` | The total number of duration metrics across all task duration metric buckets. |
`tkgi_sli_task_duration_seconds_sum` | The total value of the duration metrics across all task duration metric buckets. |
`tkgi_sli_task_run_duration_seconds{task="clusters"}` | The number of seconds it takes the `tkgi clusters` command test to run. |
`tkgi_sli_task_run_duration_seconds{task="get-credentials"}` | The number of seconds it takes the `tkgi get-credentials` command test to run. |
`tkgi_sli_task_run_duration_seconds{task="login"}` | The number of seconds it takes the `tkgi login` command test to run. |
`tkgi_sli_task_run_duration_seconds{task="plans"}` | The number of seconds it takes the `tkgi plans` command test to run. |
`tkgi_sli_task_runs_total` | The total number of times a task runs. To calculate the per-task failure rate, divide the value of `tkgi_sli_task_failures_total` by the value of `tkgi_sli_task_runs_total`. |
`tkgi_sli_task_failures_total{task="clusters"}` | The total number of times the `tkgi clusters` command fails. |
`tkgi_sli_task_failures_total{task="get-credentials"}` | The total number of times the `tkgi get-credentials` command fails. |
`tkgi_sli_task_failures_total{task="login"}` | The total number of times the `tkgi login` command fails. |
`tkgi_sli_task_failures_total{task="plans"}` | The total number of times the `tkgi plans` command fails. |
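The same failure-rate pattern applies to the TKGI SLI test suite. A sketch with an arbitrary one-hour window:

```promql
# Fraction of TKGI SLI test suite runs that failed over the past hour
increase(tkgi_sli_failures_total[1h]) / increase(tkgi_sli_runs_total[1h])
```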
Healthwatch Exporter for TAS for VMs and Healthwatch Exporter for TKGI deploy the certificate expiration metric exporter VM, `cert-expiration-exporter`, which collects metrics that show when Ops Manager certificates are due to expire. For more information, see Monitoring Certificate Expiration.
The following table describes the metric the certificate expiration metric exporter VM generates:
Metric | Description |
---|---|
`ssl_certificate_expiry_seconds{exported_instance=~"CERTIFICATE"}` | The time in seconds until a certificate expires, where `CERTIFICATE` is the name of the certificate. |
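Because the metric reports seconds until expiration, a simple threshold comparison finds certificates that are close to expiring. A sketch, using an arbitrary 30-day threshold:

```promql
# Certificates that expire within the next 30 days
ssl_certificate_expiry_seconds < 30 * 24 * 60 * 60
```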
In the Canary URLs pane of the Healthwatch tile, you configure target URLs to which the Blackbox Exporters in the Prometheus instance send canary tests. Testing a canary target URL allows you to gauge the overall health and accessibility of an app, runtime, or deployment.

On the Prometheus VM, `tsdb`, the Blackbox Exporter job, `blackbox-exporter`, generates canary test metrics.

The following table describes each metric the Blackbox Exporters in the Prometheus instance generate:
Metric | Description |
---|---|
`probe_dns_additional_rrs` | The number of entries in the additional resource record list of the DNS server for the canary target URL. |
`probe_dns_answer_rrs` | The number of entries in the answer resource record list of the DNS server for the canary target URL. |
`probe_dns_authority_rrs` | The number of entries in the authority resource record list of the DNS server for the canary target URL. |
`probe_dns_duration_seconds` | The duration of the canary test DNS request by phase. |
`probe_dns_lookup_time_seconds` | The number of seconds the canary test DNS lookup takes to complete. |
`probe_dns_serial` | The serial number of the DNS zone for your canary target URL. |
`probe_duration_seconds` | The number of seconds the canary test takes to complete. |
`probe_failed_due_to_regex` | Whether the canary test failed due to a regex error in the canary test configuration. A value of 0 indicates that the canary test did not fail due to a regex error. A value of 1 indicates that the canary test did fail due to a regex error. |
`probe_http_content_length` | The length of the HTTP content response from the canary target URL. |
`probe_http_duration_seconds` | The duration of the canary test HTTP request by phase, summed over all redirects. |
`probe_http_last_modified_timestamp_seconds` | The last-modified timestamp for the HTTP response header in Unix time. |
`probe_http_redirects` | The number of redirects the canary test goes through to reach the canary target URL. |
`probe_http_ssl` | Whether the canary test used TLS for the final redirect. A value of 0 indicates that the canary test did not use TLS for the final redirect. A value of 1 indicates that the canary test did use TLS for the final redirect. |
`probe_http_status_code` | The status code of the HTTP response from the canary target URL. |
`probe_http_uncompressed_body_length` | The length of the uncompressed response body. |
`probe_http_version` | The version of HTTP the canary test HTTP response uses. |
`probe_icmp_duration_seconds` | The duration of the canary test ICMP request by phase. |
`probe_icmp_reply_hop_limit` | If the canary test protocol is IPv6: the replied packet hop limit. If the canary test protocol is IPv4: the time-to-live count. |
`probe_ip_addr_hash` | The hash of the IP address of the canary target URL. |
`probe_ip_protocol` | Whether the IP protocol of the canary test is IPv4 or IPv6. |
`probe_ssl_earliest_cert_expiry` | The earliest TLS certificate expiration for the canary test URL in Unix time. |
`probe_ssl_last_chain_expiry_timestamp_seconds` | The last TLS chain expiration for the canary test URL in Unix time. |
`probe_ssl_last_chain_info` | Information about the TLS leaf certificate for the canary test URL. |
`probe_success` | Whether the canary test succeeded or failed. A value of 0 indicates that the canary test failed. A value of 1 indicates that the canary test succeeded. |
`probe_tls_version_info` | The TLS version the canary test uses, or NaN when unknown. |
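Two common uses of these probe metrics are alerting on failed canary tests and watching for expiring TLS certificates on canary endpoints. A sketch, using an arbitrary 30-day threshold:

```promql
# Canary tests that are currently failing
probe_success == 0

# Canary endpoints whose earliest TLS certificate expires within 30 days
probe_ssl_earliest_cert_expiry - time() < 30 * 24 * 60 * 60
```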
Super value metrics (SVMs) are composite metrics that the Prometheus instance in Healthwatch v2.2 generates in a similar format to the super metrics used in Pivotal Healthwatch v1.8 and earlier. The SVM Forwarder VM, `svm-forwarder`, then sends these metrics back into the Loggregator Firehose so third-party nozzles can send them to external destinations, such as a remote server or external aggregation service.
The SVM Forwarder VM sends SVMs related to platform metrics and Healthwatch component metrics to the Loggregator Firehose. For more information about SVMs related to Healthwatch component metrics, see SVM Forwarder VM - Healthwatch Component Metrics below.
The following table describes each platform metric the SVM Forwarder VM sends to the Loggregator Firehose:
Metric | Description |
---|---|
`Diego_AppsDomainSynced` | Whether Cloud Controller and Diego are in sync. A value of 0 indicates that Cloud Controller and Diego are not in sync. A value of 1 indicates that Cloud Controller and Diego are in sync. |
`Diego_AvailableFreeChunksDisk` | The available free chunks of disk across all Diego Cells. |
`Diego_AvailableFreeChunks` | The available free chunks of memory across all Diego Cells. |
`Diego_LRPsAdded_1H` | The rate of change in running app instances over a one-hour period. |
`Diego_TotalAvailableDiskCapacity_5M` | The remaining Diego Cell disk available across all Diego Cells over a five-minute period. |
`Diego_TotalAvailableMemoryCapacity_5M` | The remaining Diego Cell memory available across all Diego Cells over a five-minute period. |
`Diego_TotalPercentageAvailableContainerCapacity_5M` | The percentage of total available container capacity across all Diego Cells over a five-minute period. |
`Diego_TotalPercentageAvailableDiskCapacity_5M` | The percentage of total available disk across all Diego Cells over a five-minute period. |
`Diego_TotalPercentageAvailableMemoryCapacity_5M` | The percentage of total available memory across all Diego Cells over a five-minute period. |
`Doppler_MessagesAverage_1M` | The average Doppler message rate over a one-minute period. |
`Firehose_LossRate_1H` | The log transport loss rate over a one-hour period. |
`Firehose_LossRate_1M` | The log transport loss rate over a one-minute period. |
`SyslogAgent_LossRate_1M` | The Syslog Agent loss rate over a one-minute period. |
`SyslogDrain_RLP_LossRate_1M` | The Reverse Log Proxy loss rate over a one-minute period. |
`bosh_deployment` | Represents `bosh_deployments_status` from the BOSH deployment metric exporter VM, which indicates whether any BOSH deployments other than the one created by the BOSH health metric exporter VM are running. A value of 0 indicates that no other BOSH deployments are running on the BOSH Director. A value of 1 indicates that other BOSH deployments are running on the BOSH Director. |
`health_check_bosh_director_success` | Whether the BOSH SLI test suite that the BOSH health metric exporter VM ran succeeded or failed. A value of 0 indicates that the BOSH SLI test suite failed. A value of 1 indicates that the BOSH SLI test suite succeeded. |
`health_check_CanaryApp_available` | Whether the canary app is available. A value of 0 indicates that the canary app is unavailable. A value of 1 indicates that the canary app is available. |
`health_check_CanaryApp_responseTime` | The response time of the canary app in seconds. |
`health_check_cliCommand_delete` | Whether the `cf delete` command succeeds or fails. A value of 0 indicates that the `cf delete` command failed. A value of 1 indicates that the `cf delete` command succeeded. |
`health_check_cliCommand_login` | Whether the `cf login` command succeeds or fails. A value of 0 indicates that the `cf login` command failed. A value of 1 indicates that the `cf login` command succeeded. |
`health_check_cliCommand_logs` | Whether the `cf logs` command succeeds or fails. A value of 0 indicates that the `cf logs` command failed. A value of 1 indicates that the `cf logs` command succeeded. |
`health_check_cliCommand_probe_count` | The number of cf CLI health checks that Healthwatch completes in the measured time period. |
`health_check_cliCommand_pushTime` | The amount of time it takes the cf CLI to push an app. |
`health_check_cliCommand_push` | Whether the `cf push` command succeeds or fails. A value of 0 indicates that the `cf push` command failed. A value of 1 indicates that the `cf push` command succeeded. |
`health_check_cliCommand_start` | Whether the `cf start` command succeeds or fails. A value of 0 indicates that the `cf start` command failed. A value of 1 indicates that the `cf start` command succeeded. |
`health_check_cliCommand_stop` | Whether the `cf stop` command succeeds or fails. A value of 0 indicates that the `cf stop` command failed. A value of 1 indicates that the `cf stop` command succeeded. |
`health_check_cliCommand_success` | The overall success of the SLI tests that Healthwatch runs on the cf CLI. |
`uaa_throughput_rate` | The lifetime number of requests completed by the UAA VM, emitted per UAA instance in TAS for VMs. This number includes health checks. |
The following metrics exist to monitor the Healthwatch components themselves:
Healthwatch Exporter for TKGI deploys a TKGI metric exporter VM, `pks-exporter`, that collects BOSH system metrics for TKGI and converts them to a Prometheus exposition format.
The following table describes each metric the TKGI metric exporter VM collects and converts:
Metric | Description |
---|---|
`healthwatch_boshExporter_ingressLatency_seconds_bucket` | The number of seconds the TKGI metric exporter VM takes to process a batch of Loggregator envelopes, grouped by latency. This metric is also called a bucket of ingress latency metrics. |
`healthwatch_boshExporter_ingressLatency_seconds_count` | The total number of metrics across all ingress latency metric buckets. |
`healthwatch_boshExporter_ingressLatency_seconds_sum` | The total value of the metrics across all ingress latency metric buckets. |
`healthwatch_boshExporter_ingress_envelopes` | The number of Loggregator envelopes the observability metrics agent on the TKGI metric exporter VM receives. |
`healthwatch_boshExporter_metricConversion_seconds_bucket` | The number of seconds the TKGI metric exporter VM takes to convert a BOSH metric to a Prometheus gauge, grouped by how many ran in less than a certain amount of time. This metric is also called a bucket of gauge conversion duration metrics. |
`healthwatch_boshExporter_metricConversion_seconds_count` | The total number of metrics across all gauge conversion duration metric buckets. |
`healthwatch_boshExporter_metricConversion_seconds_sum` | The total value of the metrics across all gauge conversion duration metric buckets. |
`healthwatch_boshExporter_status` | The health status of the TKGI metric exporter VM. A value of 0 indicates that the TKGI metric exporter VM is not responding. A value of 1 indicates that the TKGI metric exporter VM is running and healthy. |
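The ingress latency buckets support `histogram_quantile()`. A sketch with an arbitrary five-minute window:

```promql
# Approximate 99th-percentile envelope-processing latency on the TKGI metric exporter VM
histogram_quantile(0.99,
  sum by (le) (rate(healthwatch_boshExporter_ingressLatency_seconds_bucket[5m])))
```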
Healthwatch Exporter for TAS for VMs deploys metric exporter VMs that collect metrics from the Loggregator Firehose and convert them into a Prometheus exposition format.
Each of the following metric exporter VMs collects and converts a single metric type from the Loggregator Firehose. The names of the metric exporter VMs correspond to the types of metrics they collect and convert:
The counter metric exporter VM, `pas-exporter-counter`, collects counter metrics from the Loggregator Firehose and converts them into a Prometheus exposition format.
The following table describes each metric the counter metric exporter VM collects and converts:
Metric | Description |
---|---|
`healthwatch_pasExporter_counterConversion_seconds` | The number of seconds the counter metric exporter VM takes to convert a Loggregator counter envelope to a Prometheus counter. |
`healthwatch_pasExporter_ingressLatency_seconds` | The number of seconds the counter metric exporter VM takes to process a batch of Loggregator counter envelopes. |
`healthwatch_pasExporter_ingress_envelopes` | The number of Loggregator counter envelopes the observability metrics agent on the counter metric exporter VM receives. |
`healthwatch_pasExporter_status` | The health status of the counter metric exporter VM. A value of 0 indicates that the counter metric exporter VM is not responding. A value of 1 indicates that the counter metric exporter VM is running and healthy. |
The gauge metric exporter VM, `pas-exporter-gauge`, collects gauge metrics from the Loggregator Firehose and converts them into a Prometheus exposition format.
The following table describes each metric the gauge metric exporter VM collects and converts:
Metric | Description |
---|---|
`healthwatch_pasExporter_gaugeConversion_seconds` | The number of seconds the gauge metric exporter VM takes to convert a Loggregator gauge envelope to a Prometheus gauge. |
`healthwatch_pasExporter_ingressLatency_seconds` | The number of seconds the gauge metric exporter VM takes to process a batch of Loggregator gauge envelopes. |
`healthwatch_pasExporter_ingress_envelopes` | The number of Loggregator gauge envelopes the observability metrics agent on the gauge metric exporter VM receives. |
`healthwatch_pasExporter_status` | The health status of the gauge metric exporter VM. A value of 0 indicates that the gauge metric exporter VM is not responding. A value of 1 indicates that the gauge metric exporter VM is running and healthy. |
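Both the counter and gauge metric exporter VMs report `healthwatch_pasExporter_status`; the series are distinguished by the labels Prometheus attaches at scrape time, such as `instance`. An alert-style sketch:

```promql
# Metric exporter VMs that are not responding
healthwatch_pasExporter_status == 0
```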
Most of the metric exporter VMs generate metrics concerning how the Prometheus instance interacts with the `/metrics` endpoint on each metric exporter VM.

The following table describes each metric the `/metrics` endpoint on each metric exporter VM generates:
Metric | Description |
---|---|
`healthwatch_prometheusExpositionLatency_seconds` | The number of seconds the metric exporter VM takes to render a Prometheus scrape page. |
`healthwatch_prometheusExposition_histogramMapConversion` | The number of seconds the metric exporter VM takes to convert the histogram collection to a map. |
`healthwatch_prometheusExposition_metricMapConversion` | The number of seconds the metric exporter VM takes to convert the metrics collection to a map. |
`healthwatch_prometheusExposition_metricSorting` | The number of seconds the metric exporter VM takes to sort metrics when rendering a Prometheus scrape page. |
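To compare exposition overhead across the metric exporter VMs, you can aggregate the render latency by scrape target. A sketch, assuming the standard `instance` label that Prometheus attaches to scraped series:

```promql
# Average time each metric exporter VM takes to render its scrape page
avg by (instance) (healthwatch_prometheusExpositionLatency_seconds)
```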
SVMs are composite metrics that the Prometheus instance in Healthwatch v2.2 generates in a similar format to the super metrics used in Pivotal Healthwatch v1.8 and earlier. The SVM Forwarder VM, `svm-forwarder`, then sends these metrics back into the Loggregator Firehose so third-party nozzles can send them to external destinations, such as a remote server or external aggregation service.
The SVM Forwarder VM sends SVMs related to platform metrics and Healthwatch component metrics to the Loggregator Firehose. For more information about SVMs related to platform metrics, see SVM Forwarder VM - Platform Metrics above.
The following table describes each Healthwatch component metric the SVM Forwarder VM sends to the Loggregator Firehose:
Metric | Description |
---|---|
`failed_scrapes_total` | The total number of failed scrapes for the target `source_id`. |
`last_total_attempted_scrapes` | The total number of attempted scrapes during the most recent round of scraping. |
`last_total_failed_scrapes` | The total number of failed scrapes during the most recent round of scraping. |
`last_total_scrape_duration` | The time in milliseconds to scrape all targets during the most recent round of scraping. |
`scrape_targets_total` | The total number of scrape targets identified from the configuration file for the Prometheus VM. |
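These totals are emitted to the Loggregator Firehose; if you also retain them in a Prometheus-compatible store, the per-round failure ratio can be expressed directly. A sketch:

```promql
# Fraction of scrape targets that failed during the most recent round of scraping
last_total_failed_scrapes / last_total_attempted_scrapes
```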