This topic describes the metrics that the Healthwatch Exporter for VMware Tanzu® Application Service™ (TAS for VMs) tile and the Healthwatch Exporter for VMware Tanzu® Kubernetes Grid™ Integrated Edition (TKGI) tile generate.
Healthwatch Exporter for TAS for VMs and Healthwatch Exporter for TKGI deploy metric exporter VMs to generate component metrics and service level indicators (SLIs) related to the health of your TAS for VMs and TKGI deployments. Each metric exporter VM exposes these metrics and SLIs on a Prometheus exposition endpoint, `/metrics`.

The Prometheus instance within your metrics monitoring system then scrapes the `/metrics` endpoint on each metric exporter VM and imports those metrics into your monitoring system. You can configure the frequency at which the Prometheus instance scrapes the `/metrics` endpoints in the Prometheus pane of the Healthwatch for VMware Tanzu tile. To configure the scrape interval for the Prometheus instance, see Configure Prometheus in Configuring Healthwatch.
The name of each metric is in PromQL format. For more information, see the Prometheus documentation.
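For example, after the Prometheus instance scrapes the metric exporter VMs, you can retrieve any metric described in this topic by querying its PromQL name. A minimal illustrative query, using one of the exporter health metrics documented below:

```promql
# Returns 1 while the BOSH health metric exporter VM is running and healthy
bosh_sli_exporter_status{exported_job="bosh-health-exporter"}
```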
In a VMware Tanzu® Operations Manager™ (Ops Manager) foundation, the BOSH Director manages the VMs that each tile deploys. If the BOSH Director fails or is not responsive, the VMs that the BOSH Director manages also fail.
Healthwatch Exporter for TAS for VMs and Healthwatch Exporter for TKGI deploy two VMs that continuously test the functionality of the BOSH Director: the BOSH health metric exporter VM and the BOSH deployment metric exporter VM:
The BOSH health metric exporter VM, `bosh-health-exporter`, creates a BOSH deployment called `bosh-health` every ten minutes. This BOSH deployment deploys another VM, `bosh-health-check`, that runs a suite of SLI tests to validate the functionality of the BOSH Director. After the SLI tests are complete, the BOSH health metric exporter VM collects the metrics from the `bosh-health-check` VM, then deletes the `bosh-health` deployment and the `bosh-health-check` VM.
The following table describes each metric the BOSH health metric exporter VM generates:
Metric | Description |
---|---|
`bosh_sli_duration_seconds_bucket{exported_job="bosh-health-exporter"}` | The number of seconds the BOSH health SLI test suite takes to run, grouped by how many ran in less than a certain amount of time. This metric is also called a bucket of BOSH health SLI test suite duration metrics. |
`bosh_sli_duration_seconds_count{exported_job="bosh-health-exporter"}` | The total number of duration metrics across all BOSH health SLI test suite duration metric buckets. |
`bosh_sli_duration_seconds_sum{exported_job="bosh-health-exporter"}` | The total value of the duration metrics across all BOSH health SLI test suite duration metric buckets. |
`bosh_sli_exporter_status{exported_job="bosh-health-exporter"}` | The health status of the BOSH health metric exporter VM. A value of 1 indicates that the BOSH health metric exporter VM is running and healthy. |
`bosh_sli_failures_total{exported_job="bosh-health-exporter"}` | The total number of times the BOSH health SLI test suite fails. A failed test suite is one in which any number of tests within the test suite fail. |
`bosh_sli_run_duration_seconds{exported_job="bosh-health-exporter"}` | The number of seconds a single BOSH health SLI test suite takes to run. |
`bosh_sli_runs_total{exported_job="bosh-health-exporter"}` | The total number of times the BOSH health SLI test suite runs. To calculate the failure rate of the BOSH health SLI test suite, divide the value of `bosh_sli_failures_total{exported_job="bosh-health-exporter"}` by the value of `bosh_sli_runs_total{exported_job="bosh-health-exporter"}`. |
`bosh_sli_task_duration_seconds_bucket{exported_job="bosh-health-exporter"}` | The number of seconds it takes a task within the BOSH health SLI test suite to run, grouped by how many ran in less than a certain amount of time. This metric is also called a bucket of task duration metrics. |
`bosh_sli_task_duration_seconds_count{exported_job="bosh-health-exporter"}` | The total number of duration metrics across all task duration metric buckets. |
`bosh_sli_task_duration_seconds_sum{exported_job="bosh-health-exporter"}` | The total value of the duration metrics across all task duration metric buckets. |
`bosh_sli_task_run_duration_seconds{exported_job="bosh-health-exporter",task="delete"}` | The number of seconds it takes the `bosh delete-deployment` command test to run. |
`bosh_sli_task_run_duration_seconds{exported_job="bosh-health-exporter",task="deploy"}` | The number of seconds it takes the `bosh deploy` command test to run. |
`bosh_sli_task_run_duration_seconds{exported_job="bosh-health-exporter",task="deployments"}` | The number of seconds it takes the `bosh deployments` command test to run. |
`bosh_sli_task_runs_total{exported_job="bosh-health-exporter"}` | The total number of times a task runs. To calculate the per-task failure rate, divide the value of `bosh_sli_task_failures_total{exported_job="bosh-health-exporter"}` by the value of `bosh_sli_task_runs_total{exported_job="bosh-health-exporter"}`. |
`bosh_sli_task_failures_total{exported_job="bosh-health-exporter",task="delete"}` | The total number of times the `bosh delete-deployment` command fails. |
`bosh_sli_task_failures_total{exported_job="bosh-health-exporter",task="deploy"}` | The total number of times the `bosh deploy` command fails. |
`bosh_sli_task_failures_total{exported_job="bosh-health-exporter",task="deployments"}` | The total number of times the `bosh deployments` command fails. |
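The failure-rate calculation described above, and the `_bucket` histogram series, translate directly into PromQL. The following queries are a sketch; the one-hour window is an arbitrary choice, and the division assumes both series carry matching label sets:

```promql
# Fraction of BOSH health SLI test suite runs that failed over the past hour
increase(bosh_sli_failures_total{exported_job="bosh-health-exporter"}[1h])
  / increase(bosh_sli_runs_total{exported_job="bosh-health-exporter"}[1h])

# Approximate 95th-percentile test suite duration from the histogram buckets
histogram_quantile(0.95,
  sum by (le) (rate(bosh_sli_duration_seconds_bucket{exported_job="bosh-health-exporter"}[1h])))
```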
The BOSH deployment metric exporter VM, `bosh-deployments-exporter`, checks every 30 seconds whether any BOSH deployments other than the `bosh-health` deployment created by the BOSH health metric exporter VM are running.
The following table describes each metric the BOSH deployment metric exporter VM generates:
Metric | Description |
---|---|
`bosh_deployments_status` | Whether any BOSH deployments other than `bosh-health` are running. A value of 0 indicates that no other BOSH deployments are running on the BOSH Director. A value of 1 indicates that other BOSH deployments are running on the BOSH Director. |
`bosh_sli_duration_seconds_bucket{exported_job="bosh-deployments-exporter"}` | The number of seconds the BOSH deployment check takes to run, grouped by how many ran in less than a certain amount of time. This metric is also called a bucket of BOSH deployment check duration metrics. |
`bosh_sli_duration_seconds_count{exported_job="bosh-deployments-exporter"}` | The total number of duration metrics across all BOSH deployment check duration metric buckets. |
`bosh_sli_duration_seconds_sum{exported_job="bosh-deployments-exporter"}` | The total value of the duration metrics across all BOSH deployment check duration metric buckets. |
`bosh_sli_exporter_status{exported_job="bosh-deployments-exporter"}` | The health status of the BOSH deployment metric exporter VM. A value of 1 indicates that the BOSH deployment metric exporter VM is running and healthy. |
`bosh_sli_failures_total{exported_job="bosh-deployments-exporter"}` | The total number of times the BOSH deployment check fails. |
`bosh_sli_run_duration_seconds{exported_job="bosh-deployments-exporter"}` | The number of seconds a single BOSH deployment check takes to run. |
`bosh_sli_runs_total{exported_job="bosh-deployments-exporter"}` | The total number of times the BOSH deployment check runs. To calculate the failure rate of the BOSH deployment check, divide the value of `bosh_sli_failures_total{exported_job="bosh-deployments-exporter"}` by the value of `bosh_sli_runs_total{exported_job="bosh-deployments-exporter"}`. |
`bosh_sli_task_duration_seconds_bucket{exported_job="bosh-deployments-exporter"}` | The number of seconds it takes a task within the BOSH deployment check to run, grouped by how many ran in less than a certain amount of time. This metric is also called a bucket of task duration metrics. |
`bosh_sli_task_duration_seconds_count{exported_job="bosh-deployments-exporter"}` | The total number of duration metrics across all task duration metric buckets. |
`bosh_sli_task_duration_seconds_sum{exported_job="bosh-deployments-exporter"}` | The total value of the duration metrics across all task duration metric buckets. |
`bosh_sli_task_run_duration_seconds{exported_job="bosh-deployments-exporter",task="tasks"}` | The number of seconds it takes the `bosh tasks` command test to run. |
`bosh_sli_task_runs_total{exported_job="bosh-deployments-exporter"}` | The total number of times a task runs. To calculate the per-task failure rate, divide the value of `bosh_sli_task_failures_total{exported_job="bosh-deployments-exporter"}` by the value of `bosh_sli_task_runs_total{exported_job="bosh-deployments-exporter"}`. |
`bosh_sli_task_failures_total{exported_job="bosh-deployments-exporter",task="tasks"}` | The total number of times the `bosh tasks` command fails. |
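Because `bosh_deployments_status` is a simple 0/1 gauge, it lends itself to alert-style expressions. A minimal sketch:

```promql
# Fires while any BOSH deployment other than bosh-health is running
bosh_deployments_status == 1
```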
Healthwatch Exporter for TAS for VMs and Healthwatch Exporter for TKGI deploy VMs that generate metrics regarding the health of several Ops Manager and runtime components.
You can use the following platform metrics to calculate percent availability and error budgets:
Developers create and manage apps on TAS for VMs using the Cloud Foundry Command Line Interface (cf CLI). Healthwatch Exporter for TAS for VMs deploys the TAS for VMs SLI exporter VM, `pas-sli-exporter`, which continuously tests the functionality of the cf CLI.
The following table describes each metric the TAS for VMs SLI exporter VM generates:
Metric | Description |
---|---|
`tas_sli_duration_seconds_bucket` | The number of seconds the TAS for VMs SLI test suite takes to run, grouped by how many ran in less than a certain amount of time. This metric is also called a bucket of TAS for VMs SLI test suite duration metrics. |
`tas_sli_duration_seconds_count` | The total number of duration metrics across all TAS for VMs SLI test suite duration metric buckets. |
`tas_sli_duration_seconds_sum` | The total value of the duration metrics across all TAS for VMs SLI test suite duration metric buckets. |
`tas_sli_exporter_status` | The health status of the TAS for VMs SLI exporter VM. A value of 1 indicates that the TAS for VMs SLI exporter VM is running and healthy. |
`tas_sli_failures_total` | The total number of times the TAS for VMs SLI test suite fails. |
`tas_sli_run_duration_seconds` | The number of seconds the TAS for VMs SLI test suite takes to run. |
`tas_sli_runs_total` | The total number of times the TAS for VMs SLI test suite runs. To calculate the failure rate of the TAS for VMs SLI test suite, divide the value of `tas_sli_failures_total` by the value of `tas_sli_runs_total`. |
`tas_sli_task_duration_seconds_bucket` | The number of seconds it takes a task within the TAS for VMs SLI test suite to run, grouped by how many ran in less than a certain amount of time. This metric is also called a bucket of task duration metrics. |
`tas_sli_task_duration_seconds_count` | The total number of duration metrics across all task duration metric buckets. |
`tas_sli_task_duration_seconds_sum` | The total value of the duration metrics across all task duration metric buckets. |
`tas_sli_task_run_duration_seconds{task="delete"}` | The number of seconds it takes the `cf delete` command test to run. |
`tas_sli_task_run_duration_seconds{task="login"}` | The number of seconds it takes the `cf login` command test to run. |
`tas_sli_task_run_duration_seconds{task="logs"}` | The number of seconds it takes the `cf logs` command test to run. |
`tas_sli_task_run_duration_seconds{task="push"}` | The number of seconds it takes the `cf push` command test to run. |
`tas_sli_task_run_duration_seconds{task="setEnv"}` | The number of seconds it takes the `cf set-env` command test to run. |
`tas_sli_task_run_duration_seconds{task="start"}` | The number of seconds it takes the `cf start` command test to run. |
`tas_sli_task_run_duration_seconds{task="stop"}` | The number of seconds it takes the `cf stop` command test to run. |
`tas_sli_task_runs_total` | The total number of times a task runs. To calculate the per-task failure rate, divide the value of `tas_sli_task_failures_total` by the value of `tas_sli_task_runs_total`. |
`tas_sli_task_failures_total{task="delete"}` | The total number of times the `cf delete` command fails. |
`tas_sli_task_failures_total{task="login"}` | The total number of times the `cf login` command fails. |
`tas_sli_task_failures_total{task="logs"}` | The total number of times the `cf logs` command fails. |
`tas_sli_task_failures_total{task="push"}` | The total number of times the `cf push` command fails. |
`tas_sli_task_failures_total{task="setEnv"}` | The total number of times the `cf set-env` command fails. |
`tas_sli_task_failures_total{task="start"}` | The total number of times the `cf start` command fails. |
`tas_sli_task_failures_total{task="stop"}` | The total number of times the `cf stop` command fails. |
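As with the BOSH health metrics, the failure rate described above can be computed as a PromQL ratio, and the per-task duration gauges can be inspected over a window. A sketch, with an arbitrary one-hour window:

```promql
# Fraction of TAS for VMs SLI test suite runs that failed over the past hour
increase(tas_sli_failures_total[1h]) / increase(tas_sli_runs_total[1h])

# Slowest cf push test observed over the past hour, in seconds
max_over_time(tas_sli_task_run_duration_seconds{task="push"}[1h])
```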
Operators create and manage Kubernetes clusters using the TKGI Command Line Interface (TKGI CLI). Healthwatch Exporter for TKGI deploys the TKGI SLI exporter VM, `pks-sli-exporter`, which continuously tests the functionality of the TKGI CLI.
The following table describes each metric the TKGI SLI exporter VM generates:
Metric | Description |
---|---|
`tkgi_sli_duration_seconds_bucket` | The number of seconds the TKGI SLI test suite takes to run, grouped by how many ran in less than a certain amount of time. This metric is also called a bucket of TKGI SLI test suite duration metrics. |
`tkgi_sli_duration_seconds_count` | The total number of duration metrics across all TKGI SLI test suite duration metric buckets. |
`tkgi_sli_duration_seconds_sum` | The total value of the duration metrics across all TKGI SLI test suite duration metric buckets. |
`tkgi_sli_exporter_status` | The health status of the TKGI SLI exporter VM. A value of 1 indicates that the TKGI SLI exporter VM is running and healthy. |
`tkgi_sli_failures_total` | The total number of times the TKGI SLI test suite fails. |
`tkgi_sli_run_duration_seconds` | The number of seconds the TKGI SLI test suite takes to run. |
`tkgi_sli_runs_total` | The total number of times the TKGI SLI test suite runs. To calculate the failure rate of the TKGI SLI test suite, divide the value of `tkgi_sli_failures_total` by the value of `tkgi_sli_runs_total`. |
`tkgi_sli_task_duration_seconds_bucket` | The number of seconds it takes a task within the TKGI SLI test suite to run, grouped by how many ran in less than a certain amount of time. This metric is also called a bucket of task duration metrics. |
`tkgi_sli_task_duration_seconds_count` | The total number of duration metrics across all task duration metric buckets. |
`tkgi_sli_task_duration_seconds_sum` | The total value of the duration metrics across all task duration metric buckets. |
`tkgi_sli_task_run_duration_seconds{task="clusters"}` | The number of seconds it takes the `tkgi clusters` command test to run. |
`tkgi_sli_task_run_duration_seconds{task="get-credentials"}` | The number of seconds it takes the `tkgi get-credentials` command test to run. |
`tkgi_sli_task_run_duration_seconds{task="login"}` | The number of seconds it takes the `tkgi login` command test to run. |
`tkgi_sli_task_run_duration_seconds{task="plans"}` | The number of seconds it takes the `tkgi plans` command test to run. |
`tkgi_sli_task_runs_total` | The total number of times a task runs. To calculate the per-task failure rate, divide the value of `tkgi_sli_task_failures_total` by the value of `tkgi_sli_task_runs_total`. |
`tkgi_sli_task_failures_total{task="clusters"}` | The total number of times the `tkgi clusters` command fails. |
`tkgi_sli_task_failures_total{task="get-credentials"}` | The total number of times the `tkgi get-credentials` command fails. |
`tkgi_sli_task_failures_total{task="login"}` | The total number of times the `tkgi login` command fails. |
`tkgi_sli_task_failures_total{task="plans"}` | The total number of times the `tkgi plans` command fails. |
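The same failure-rate pattern applies to the TKGI SLI test suite. A sketch with an arbitrary one-hour window:

```promql
# Fraction of TKGI SLI test suite runs that failed over the past hour
increase(tkgi_sli_failures_total[1h]) / increase(tkgi_sli_runs_total[1h])
```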
Healthwatch Exporter for TAS for VMs and Healthwatch Exporter for TKGI deploy the certificate expiration metric exporter VM, `cert-expiration-exporter`, which collects metrics that show when Ops Manager certificates are due to expire. For more information, see Monitoring Certificate Expiration.
The following table describes the metric the certificate expiration metric exporter VM generates:
Metric | Description |
---|---|
`ssl_certificate_expiry_seconds{exported_instance=~"CERTIFICATE"}` | The time in seconds until a certificate expires, where `CERTIFICATE` is the name of the certificate. |
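Because the metric reports seconds until expiration, a simple threshold comparison finds certificates that are close to expiring. A sketch, using an arbitrary 30-day threshold:

```promql
# Certificates that expire within the next 30 days
ssl_certificate_expiry_seconds < 30 * 24 * 60 * 60
```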
In the Canary URLs pane of the Healthwatch tile, you configure target URLs to which the Blackbox Exporters in the Prometheus instance send canary tests. Testing a canary target URL allows you to gauge the overall health and accessibility of an app, runtime, or deployment.

On the Prometheus VM, `tsdb`, the Blackbox Exporter job, `blackbox-exporter`, generates canary test metrics.

The following table describes each metric the Blackbox Exporters in the Prometheus instance generate:
Metric | Description |
---|---|
`probe_dns_additional_rrs` | The number of entries in the additional resource record list of the DNS server for the canary target URL. |
`probe_dns_answer_rrs` | The number of entries in the answer resource record list of the DNS server for the canary target URL. |
`probe_dns_authority_rrs` | The number of entries in the authority resource record list of the DNS server for the canary target URL. |
`probe_dns_duration_seconds` | The duration of the canary test DNS request by phase. |
`probe_dns_lookup_time_seconds` | The number of seconds the canary test DNS lookup takes to complete. |
`probe_dns_serial` | The serial number of the DNS zone for your canary target URL. |
`probe_duration_seconds` | The number of seconds the canary test takes to complete. |
`probe_failed_due_to_regex` | Whether the canary test failed due to a regex error in the canary test configuration. A value of 0 indicates that the canary test did not fail due to a regex error. A value of 1 indicates that the canary test did fail due to a regex error. |
`probe_http_content_length` | The length of the HTTP content response from the canary target URL. |
`probe_http_duration_seconds` | The duration of the canary test HTTP request by phase, summed over all redirects. |
`probe_http_last_modified_timestamp_seconds` | The last-modified timestamp for the HTTP response header in Unix time. |
`probe_http_redirects` | The number of redirects the canary test goes through to reach the canary target URL. |
`probe_http_ssl` | Whether the canary test used TLS for the final redirect. A value of 0 indicates that the canary test did not use TLS for the final redirect. A value of 1 indicates that the canary test did use TLS for the final redirect. |
`probe_http_status_code` | The status code of the HTTP response from the canary target URL. |
`probe_http_uncompressed_body_length` | The length of the uncompressed response body. |
`probe_http_version` | The version of HTTP the canary test HTTP response uses. |
`probe_icmp_duration_seconds` | The duration of the canary test ICMP request by phase. |
`probe_icmp_reply_hop_limit` | If the canary test protocol is IPv6: the replied packet hop limit. If the canary test protocol is IPv4: the time-to-live count. |
`probe_ip_addr_hash` | The hash of the IP address of the canary target URL. |
`probe_ip_protocol` | Whether the IP protocol of the canary test is IPv4 or IPv6. |
`probe_ssl_earliest_cert_expiry` | The earliest TLS certificate expiration for the canary test URL in Unix time. |
`probe_ssl_last_chain_expiry_timestamp_seconds` | The last TLS chain expiration for the canary test URL in Unix time. |
`probe_ssl_last_chain_info` | Information about the TLS leaf certificate for the canary test URL. |
`probe_success` | Whether the canary test succeeded or failed. A value of 0 indicates that the canary test failed. A value of 1 indicates that the canary test succeeded. |
`probe_tls_version_info` | The TLS version the canary test uses, or NaN when unknown. |
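Two common uses of these probe metrics are alerting on failed canary tests and watching for expiring TLS certificates on canary endpoints. A sketch, using an arbitrary 30-day threshold:

```promql
# Canary tests that are currently failing
probe_success == 0

# Canary endpoints whose earliest TLS certificate expires within 30 days
probe_ssl_earliest_cert_expiry - time() < 30 * 24 * 60 * 60
```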
Super value metrics (SVMs) are composite metrics that the Prometheus instance in Healthwatch v2.2 generates in a similar format to the super metrics used in Pivotal Healthwatch v1.8 and earlier. The SVM Forwarder VM, `svm-forwarder`, then sends these metrics back into the Loggregator Firehose so third-party nozzles can send them to external destinations, such as a remote server or external aggregation service.
The SVM Forwarder VM sends SVMs related to platform metrics and Healthwatch component metrics to the Loggregator Firehose. For more information about SVMs related to Healthwatch component metrics, see SVM Forwarder VM - Healthwatch Component Metrics below.
The following table describes each platform metric the SVM Forwarder VM sends to the Loggregator Firehose:
Metric | Description |
---|---|
`Diego_AppsDomainSynced` | Whether Cloud Controller and Diego are in sync. A value of 0 indicates that Cloud Controller and Diego are not in sync. A value of 1 indicates that Cloud Controller and Diego are in sync. |
`Diego_AvailableFreeChunksDisk` | The available free chunks of disk across all Diego Cells. |
`Diego_AvailableFreeChunks` | The available free chunks of memory across all Diego Cells. |
`Diego_LRPsAdded_1H` | The rate of change in running app instances over a one-hour period. |
`Diego_TotalAvailableDiskCapacity_5M` | The remaining Diego Cell disk available across all Diego Cells over a five-minute period. |
`Diego_TotalAvailableMemoryCapacity_5M` | The remaining Diego Cell memory available across all Diego Cells over a five-minute period. |
`Diego_TotalPercentageAvailableContainerCapacity_5M` | The percentage of total available container capacity across all Diego Cells over a five-minute period. |
`Diego_TotalPercentageAvailableDiskCapacity_5M` | The percentage of total available disk across all Diego Cells over a five-minute period. |
`Diego_TotalPercentageAvailableMemoryCapacity_5M` | The percentage of total available memory across all Diego Cells over a five-minute period. |
`Doppler_MessagesAverage_1M` | The average Doppler message rate over a one-minute period. |
`Firehose_LossRate_1H` | The log transport loss rate over a one-hour period. |
`Firehose_LossRate_1M` | The log transport loss rate over a one-minute period. |
`SyslogAgent_LossRate_1M` | The Syslog Agent loss rate over a one-minute period. |
`SyslogDrain_RLP_LossRate_1M` | The Reverse Log Proxy loss rate over a one-minute period. |
`bosh_deployment` | Represents `bosh_deployments_status` from the BOSH deployment metric exporter VM, which indicates whether any BOSH deployments other than the one created by the BOSH health metric exporter VM are running. A value of 0 indicates that no other BOSH deployments are running on the BOSH Director. A value of 1 indicates that other BOSH deployments are running on the BOSH Director. |
`health_check_bosh_director_success` | Whether the BOSH SLI test suite that the BOSH health metric exporter VM ran succeeded or failed. A value of 0 indicates that the BOSH SLI test suite failed. A value of 1 indicates that the BOSH SLI test suite succeeded. |
`health_check_CanaryApp_available` | Whether the canary app is available. A value of 0 indicates that the canary app is unavailable. A value of 1 indicates that the canary app is available. |
`health_check_CanaryApp_responseTime` | The response time of the canary app in seconds. |
`health_check_cliCommand_delete` | Whether the `cf delete` command succeeds or fails. A value of 0 indicates that the `cf delete` command failed. A value of 1 indicates that the `cf delete` command succeeded. |
`health_check_cliCommand_login` | Whether the `cf login` command succeeds or fails. A value of 0 indicates that the `cf login` command failed. A value of 1 indicates that the `cf login` command succeeded. |
`health_check_cliCommand_logs` | Whether the `cf logs` command succeeds or fails. A value of 0 indicates that the `cf logs` command failed. A value of 1 indicates that the `cf logs` command succeeded. |
`health_check_cliCommand_probe_count` | The number of cf CLI health checks that Healthwatch completes in the measured time period. |
`health_check_cliCommand_pushTime` | The amount of time it takes the cf CLI to push an app. |
`health_check_cliCommand_push` | Whether the `cf push` command succeeds or fails. A value of 0 indicates that the `cf push` command failed. A value of 1 indicates that the `cf push` command succeeded. |
`health_check_cliCommand_start` | Whether the `cf start` command succeeds or fails. A value of 0 indicates that the `cf start` command failed. A value of 1 indicates that the `cf start` command succeeded. |
`health_check_cliCommand_stop` | Whether the `cf stop` command succeeds or fails. A value of 0 indicates that the `cf stop` command failed. A value of 1 indicates that the `cf stop` command succeeded. |
`health_check_cliCommand_success` | The overall success of the SLI tests that Healthwatch runs on the cf CLI. |
`uaa_throughput_rate` | The lifetime number of requests completed by the UAA VM, emitted per UAA instance in TAS for VMs. This number includes health checks. |
The following metrics exist to monitor the Healthwatch components themselves:
Healthwatch Exporter for TKGI deploys a TKGI metric exporter VM, `pks-exporter`, that collects BOSH system metrics for TKGI and converts them to a Prometheus exposition format.
The following table describes each metric the TKGI metric exporter VM collects and converts:
Metric | Description |
---|---|
`healthwatch_boshExporter_ingressLatency_seconds_bucket` | The number of seconds the TKGI metric exporter VM takes to process a batch of Loggregator envelopes, grouped by latency. This metric is also called a bucket of ingress latency metrics. |
`healthwatch_boshExporter_ingressLatency_seconds_count` | The total number of metrics across all ingress latency metric buckets. |
`healthwatch_boshExporter_ingressLatency_seconds_sum` | The total value of the metrics across all ingress latency metric buckets. |
`healthwatch_boshExporter_ingress_envelopes` | The number of Loggregator envelopes the observability metrics agent on the TKGI metric exporter VM receives. |
`healthwatch_boshExporter_metricConversion_seconds_bucket` | The number of seconds the TKGI metric exporter VM takes to convert a BOSH metric to a Prometheus gauge, grouped by how many ran in less than a certain amount of time. This metric is also called a bucket of gauge conversion duration metrics. |
`healthwatch_boshExporter_metricConversion_seconds_count` | The total number of metrics across all gauge conversion duration metric buckets. |
`healthwatch_boshExporter_metricConversion_seconds_sum` | The total value of the metrics across all gauge conversion duration metric buckets. |
`healthwatch_boshExporter_status` | The health status of the TKGI metric exporter VM. A value of 0 indicates that the TKGI metric exporter VM is not responding. A value of 1 indicates that the TKGI metric exporter VM is running and healthy. |
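The ingress latency buckets support `histogram_quantile()`. A sketch with an arbitrary five-minute window:

```promql
# Approximate 99th-percentile envelope-processing latency on the TKGI metric exporter VM
histogram_quantile(0.99,
  sum by (le) (rate(healthwatch_boshExporter_ingressLatency_seconds_bucket[5m])))
```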
Healthwatch Exporter for TAS for VMs deploys metric exporter VMs that collect metrics from the Loggregator Firehose and convert them into a Prometheus exposition format.
Each of the following metric exporter VMs collects and converts a single metric type from the Loggregator Firehose. The names of the metric exporter VMs correspond to the types of metrics they collect and convert:
The counter metric exporter VM, `pas-exporter-counter`, collects counter metrics from the Loggregator Firehose and converts them into a Prometheus exposition format.
The following table describes each metric the counter metric exporter VM collects and converts:
Metric | Description |
---|---|
`healthwatch_pasExporter_counterConversion_seconds` | The number of seconds the counter metric exporter VM takes to convert a Loggregator counter envelope to a Prometheus counter. |
`healthwatch_pasExporter_ingressLatency_seconds` | The number of seconds the counter metric exporter VM takes to process a batch of Loggregator counter envelopes. |
`healthwatch_pasExporter_ingress_envelopes` | The number of Loggregator counter envelopes the observability metrics agent on the counter metric exporter VM receives. |
`healthwatch_pasExporter_status` | The health status of the counter metric exporter VM. A value of 0 indicates that the counter metric exporter VM is not responding. A value of 1 indicates that the counter metric exporter VM is running and healthy. |
The gauge metric exporter VM, `pas-exporter-gauge`, collects gauge metrics from the Loggregator Firehose and converts them into a Prometheus exposition format.
The following table describes each metric the gauge metric exporter VM collects and converts:
Metric | Description |
---|---|
`healthwatch_pasExporter_gaugeConversion_seconds` | The number of seconds the gauge metric exporter VM takes to convert a Loggregator gauge envelope to a Prometheus gauge. |
`healthwatch_pasExporter_ingressLatency_seconds` | The number of seconds the gauge metric exporter VM takes to process a batch of Loggregator gauge envelopes. |
`healthwatch_pasExporter_ingress_envelopes` | The number of Loggregator gauge envelopes the observability metrics agent on the gauge metric exporter VM receives. |
`healthwatch_pasExporter_status` | The health status of the gauge metric exporter VM. A value of 0 indicates that the gauge metric exporter VM is not responding. A value of 1 indicates that the gauge metric exporter VM is running and healthy. |
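Both the counter and gauge metric exporter VMs report `healthwatch_pasExporter_status`; the series are distinguished by the labels Prometheus attaches at scrape time, such as `instance`. An alert-style sketch:

```promql
# Metric exporter VMs that are not responding
healthwatch_pasExporter_status == 0
```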
Most of the metric exporter VMs generate metrics concerning how the Prometheus instance interacts with the `/metrics` endpoint on each metric exporter VM.

The following table describes each metric the `/metrics` endpoint on each metric exporter VM generates:
Metric | Description |
---|---|
`healthwatch_prometheusExpositionLatency_seconds` | The number of seconds the metric exporter VM takes to render a Prometheus scrape page. |
`healthwatch_prometheusExposition_histogramMapConversion` | The number of seconds the metric exporter VM takes to convert the histogram collection to a map. |
`healthwatch_prometheusExposition_metricMapConversion` | The number of seconds the metric exporter VM takes to convert the metrics collection to a map. |
`healthwatch_prometheusExposition_metricSorting` | The number of seconds the metric exporter VM takes to sort metrics when rendering a Prometheus scrape page. |
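To compare exposition overhead across the metric exporter VMs, you can aggregate the render latency by scrape target. A sketch, assuming the standard `instance` label that Prometheus attaches to scraped series:

```promql
# Average time each metric exporter VM takes to render its scrape page
avg by (instance) (healthwatch_prometheusExpositionLatency_seconds)
```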
SVMs are composite metrics that the Prometheus instance in Healthwatch v2.2 generates in a similar format to the super metrics used in Pivotal Healthwatch v1.8 and earlier. The SVM Forwarder VM, `svm-forwarder`, then sends these metrics back into the Loggregator Firehose so third-party nozzles can send them to external destinations, such as a remote server or external aggregation service.
The SVM Forwarder VM sends SVMs related to platform metrics and Healthwatch component metrics to the Loggregator Firehose. For more information about SVMs related to platform metrics, see SVM Forwarder VM - Platform Metrics above.
The following table describes each Healthwatch component metric the SVM Forwarder VM sends to the Loggregator Firehose:
Metric | Description |
---|---|
`failed_scrapes_total` | The total number of failed scrapes for the target `source_id`. |
`last_total_attempted_scrapes` | The total number of attempted scrapes during the most recent round of scraping. |
`last_total_failed_scrapes` | The total number of failed scrapes during the most recent round of scraping. |
`last_total_scrape_duration` | The time in milliseconds to scrape all targets during the most recent round of scraping. |
`scrape_targets_total` | The total number of scrape targets identified from the configuration file for the Prometheus VM. |
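These totals are emitted to the Loggregator Firehose; if you also retain them in a Prometheus-compatible store, the per-round failure ratio can be expressed directly. A sketch:

```promql
# Fraction of scrape targets that failed during the most recent round of scraping
last_total_failed_scrapes / last_total_attempted_scrapes
```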