This topic contains release notes for Healthwatch™ for VMware Tanzu® v2.2 (Healthwatch).
For information about the risks and limitations of Healthwatch v2.2, see Assumed Risks of Using Healthwatch v2.2 and Healthwatch v2.2 Limitations in Healthwatch for VMware Tanzu.
Release Date: October 12, 2023
[Breaking Change] Healthwatch now requires `system-metrics-agent` processes to gather `system` metrics from BOSH-deployed VMs. Before upgrading Healthwatch, select the Enable System Metrics checkbox in the Director Config pane of the BOSH Director tile, run Apply Changes on all tiles, and upgrade all VMs deployed through service brokers. If you skip this step, Healthwatch dashboards fail to populate with metrics about VM health, CPU, memory, disk, and other statistics, and alerts might also be affected. Any custom dashboards that query `system` metrics with an `origin` label value of `bosh-system-metrics-forwarder` must be updated to use `system-metrics-agent` as the `origin` label value.
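For example, only the `origin` matcher in a custom query needs to change. A minimal before-and-after sketch as a Prometheus recording rule, assuming a hypothetical rule name and the `system_cpu_user` system metric (both illustrative):

```yaml
groups:
  - name: custom-vm-health
    rules:
      - record: deployment:system_cpu_user:avg
        # Before the upgrade, this rule queried:
        #   avg by (deployment) (system_cpu_user{origin="bosh-system-metrics-forwarder"})
        # After the upgrade, query the new origin label value instead:
        expr: avg by (deployment) (system_cpu_user{origin="system-metrics-agent"})
```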
[Feature] Added support for VMware Tanzu Kubernetes Grid Integrated Edition v1.16.
Healthwatch v2.2.9 uses the following open-source component versions:
Component | Packaged Version |
---|---|
Prometheus | 2.46.0 |
Grafana | 9.5.9 |
Alertmanager | 0.26.0 |
PXC | 1.0.14 |
Release Date: August 17, 2023
[Feature] Added a Time Intervals field to the Alertmanager pane of the Healthwatch tile. A time interval is a named interval of time that can be referenced in the routing tree to mute or activate particular routes at particular times of day. For more information, see the Prometheus documentation.
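In raw Alertmanager configuration, the feature corresponds to a named time interval referenced from the routing tree. A minimal sketch, with placeholder names, times, and receivers (the tile generates the final configuration):

```yaml
time_intervals:
  - name: weekday-business-hours      # placeholder interval name
    time_intervals:
      - weekdays: ["monday:friday"]
        times:
          - start_time: "09:00"
            end_time: "17:00"
route:
  receiver: default                   # placeholder receiver
  routes:
    - receiver: pager                 # placeholder receiver
      matchers:
        - severity="critical"
      active_time_intervals:          # route fires only during the interval
        - weekday-business-hours
```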
[Bug Fix] Fixed a `user already exists` error that occurred during UAA authentication after upgrading Healthwatch to a newer version. For more information, see User already exists error during UAA authentication after upgrading Healthwatch below.
Healthwatch v2.2.8 uses the following open-source component versions:
Component | Packaged Version |
---|---|
Prometheus | 2.46.0 |
Grafana | 9.5.7 |
Alertmanager | 0.25.0 |
PXC | 1.0.14 |
Release Date: July 17, 2023
[Security Fix] The following CVEs were fixed by upgrading the Grafana release version: CVE-2023-28119 and CVE-2023-1387.
[Security Fix] Upgraded the `golang` version to address multiple CVEs. For more information, see CVE-2023-24537, CVE-2023-24534, CVE-2023-24538, and CVE-2023-24536 in the National Vulnerability Database.
[Feature] In the Router dashboard in the Grafana UI, a new chart has been added using the `system.cpu.sys` metric to better capture the full CPU load of the GoRouter VMs.
[Feature] In the Logging and Metrics Pipeline dashboard in the Grafana UI, a new chart has been added using the `messages_dropped_per_drain` metric. This metric provides `drain_url` and `drain_scope` tags, which allow for further filtering.
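For example, a custom alerting rule can use these tags to scope an alert to a single drain. A sketch only, assuming `messages_dropped_per_drain` is a counter; the `drain_url` value shown is a placeholder:

```yaml
groups:
  - name: drain-health
    rules:
      - alert: DrainDroppingMessages
        # Alert when a specific drain drops messages for 10 minutes.
        expr: rate(messages_dropped_per_drain{drain_url="syslog-tls://logs.example.com:6514"}[5m]) > 0
        for: 10m
        labels:
          severity: warning
```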
[Feature] The System at a Glance dashboard in the Grafana UI now includes the `bosh` (BOSH Director) VM in the VM Health panel.
[Feature] In the System at a Glance dashboard in the Grafana UI, hovering over a series in the VM Health panel and clicking Show job details takes you to the Job details page for that series.
[Feature] Added the option to log all access to the Grafana VM.
[Feature] Added the option to enable or disable Gravatar.
[Feature] Added the option to decrease traffic by filtering out custom application metrics collected by the Metric Registrar framework.
[Feature] The JVM maximum heap size is set to 80% of available memory for the VMs of Healthwatch Exporter for TAS for VMs. This setting is not configurable on the Healthwatch tile.
[Feature Improvement] All dashboards in Grafana now have a `1m` refresh interval by default.
[Known Issue] UAA authentication fails while trying to log in to Grafana after upgrading Healthwatch. For more information, see User already exists error during UAA authentication after upgrading Healthwatch below.
Healthwatch v2.2.7 uses the following open-source component versions:
Component | Packaged Version |
---|---|
Prometheus | 2.45.0 |
Grafana | 9.5.5 |
Alertmanager | 0.25.0 |
PXC | 1.0.14 |
Release Date: April 13, 2023
[Security Fix] Updates SnakeYaml to v2.0 to address a critical CVE. For more information, see CVE-2022-1471 in the National Vulnerability Database.
[Security Fix] Updates OpenJDK to v17.0.6_10 to address multiple CVEs. For more information, see CVE-2022-21540, CVE-2022-21541, CVE-2022-21549, and CVE-2022-34169 in the National Vulnerability Database.
[Security Fix] cf CLI v7 and v8 were updated to the latest versions built with supported Go v1.19.
Healthwatch v2.2.6 uses the following open-source component versions:
Component | Packaged Version |
---|---|
Prometheus | 2.39.1 |
Grafana | 9.0.6 |
Alertmanager | 0.24.0 |
PXC | 1.0.3 |
Release Date: December 1, 2022
[Feature] Healthwatch supports VMware Tanzu Kubernetes Grid Integrated Edition (TKGI) v1.13, v1.14.2, and v1.15 and later.
[Feature] Healthwatch supports VMware Tanzu Application Service for VMs (TAS for VMs) v2.11, v2.12, v2.13, and v3.0 and later.
[Feature] You can configure the Prometheus instance to use a bearer token to log in to a remote storage endpoint. For more information, see Prometheus Can Log In to Remote Storage with a Bearer Token below.
[Feature] When the SVM Forwarder VM and the BOSH deployment metric exporter VM are both deployed in the Healthwatch Exporter for VMware Tanzu Application Service for VMs (TAS for VMs) tile, the SVM Forwarder VM emits the `bosh_deployments_status` metric into the Loggregator Firehose. For more information, see BOSH Deployments Status Emitted Into the Loggregator Firehose below.
[Breaking Change] Healthwatch requires Ubuntu Jammy Stemcell 1.49 or later. For more information, see Healthwatch Uses Ubuntu Jammy Stemcell 1.49 or Later below.
[Known Issue] Compiling BOSH Backup and Restore SDK (BBR-SDK) on Ubuntu Jammy stemcells can cause it to trigger a false positive in some McAfee malware scans. For more information, see BBR-SDK Can Trigger False Positives in Malware Scans below.
Healthwatch v2.2.5 uses the following open-source component versions:
Component | Packaged Version |
---|---|
Prometheus | 2.39.1 |
Grafana | 9.0.6 |
Alertmanager | 0.24.0 |
PXC | 1.0.3 |
Release Date: September 21, 2022
[Feature] Healthwatch supports TKGI v1.12, v1.13, and v1.14.2 and later.
[Feature Improvement] In the Kubernetes Controller Manager dashboard in the Grafana user interface (UI), the Memory and CPU Usage panels are removed.
[Feature Improvement] In the TAS for VMs Metric Exporter VMs pane of the Healthwatch Exporter for TAS for VMs tile, you can configure the TAS for VMs service level indicator (SLI) exporter VM to run SLI tests for the Cloud Foundry Command-Line Interface (cf CLI) v8. To configure the cf CLI version for which the TAS for VMs SLI exporter VM runs SLI tests, see (Optional) Configure TAS for VMs Metric Exporter VMs in Configuring Healthwatch Exporter for TAS for VMs.
[Feature Improvement] The log levels for the Prometheus and Alertmanager instances are set to `info`.
[Feature Improvement] The MySQL Overview dashboard in the Grafana UI includes the MySQL Jumpbox VM persistent disk usage chart.
[Feature Improvement] When you re-deploy a highly available (HA) Healthwatch installation with multiple Grafana instances, the second Grafana instance does not start updating until after the first Grafana instance has updated and restarted.
[Breaking Change] When you manually configure the route used to access the Grafana UI, you must configure the Grafana root URL field. For more information, see (Optional) Configure Grafana in Configuring Healthwatch.
[Bug Fix] You can deploy Healthwatch with a single Prometheus instance.
[Bug Fix] In the Grafana UI, the Diego/Capacity dashboard shows data for TAS for VMs [Small Footprint] deployments.
Healthwatch v2.2.4 uses the following open-source component versions:
Component | Packaged Version |
---|---|
Prometheus | 2.36.2 |
Grafana | 9.0.6 |
Alertmanager | 0.24.0 |
PXC | 0.43.0 |
Release Date: August 18, 2022
[Feature Improvement] When you configure an alert receiver for Slack, you are not required to configure the Alert receiver configuration parameters field. The only fields you must configure are Alert receiver name and Slack API URL. For more information, see Configure a Slack Alert Receiver in Configuring Alerting.
[Feature Improvement] Email alerts that the Grafana instance sends include a link to the Grafana UI.
[Feature Improvement] In the Diego/Capacity dashboard in the Grafana UI, you can view metrics for Windows and Linux cells separately.
[Feature Improvement] In the System at a Glance dashboard in the Grafana UI, the VM Health panel no longer includes metrics for the `bosh-health-check` VM. For more information about the `bosh-health-check` VM, see BOSH Health Metric Exporter VM in Healthwatch Metrics.
[Feature Improvement] The BOSH Director Health dashboard in the Grafana UI includes the BOSH Health Check and BOSH Health Check Status History panels.
[Known Issue Fix] After you re-deploy an HA Healthwatch installation, the Grafana instance can load metrics data in the Grafana UI while multiple Prometheus instances update. This known issue fix improves upon the known issue fix in Healthwatch v2.2.2. For more information about this known issue, see No Data While Re-Deploying Highly-Available Healthwatch Installations below.
[Bug Fix] In the BOSH Director Health dashboard in the Grafana UI, the BOSH Director Status and BOSH Director Status History panels include the `system_healthy` metric.
[Bug Fix] In the Kubernetes Nodes dashboard in the Grafana UI, the panel legend is more readable.
Healthwatch v2.2.3 uses the following open-source component versions:
Component | Packaged Version |
---|---|
Prometheus | 2.36.2 |
Grafana | 9.0.6 |
Alertmanager | 0.24.0 |
PXC | 0.43.0 |
Release Date: May 18, 2022
[Feature] Healthwatch supports TAS for VMs v2.13 and earlier.
[Feature Improvement] The default alert templates are updated to better display grouped alerts.
[Feature Improvement] The Logging and Metrics Pipeline dashboard in the Grafana UI includes Syslog Agent metrics.
[Feature Improvement] The System At a Glance dashboard in the Grafana UI does not show metrics for compiler VMs.
[Known Issue Fix] The RabbitMQ dashboards in the Grafana UI show data for RabbitMQ on-demand instances that are configured to communicate over TLS. For more information about this known issue, see No Data on RabbitMQ Dashboards for RabbitMQ On-Demand Instances Using TLS below.
[Known Issue Fix] After you re-deploy an HA Healthwatch installation, the Grafana instance can load metrics data in the Grafana UI while multiple Prometheus instances update. For more information about this known issue, see No Data While Re-Deploying Highly-Available Healthwatch Installations below.
[Known Issue Fix] The Kubernetes Nodes dashboard in the Grafana UI shows data for Kubernetes clusters that use the containerd runtime. For more information about this known issue, see No Data for containerd Clusters on Kubernetes Nodes Dashboard for TKGI v1.12 and Later below.
[Known Issue Fix] The smoke test for Prometheus VMs no longer runs before the Prometheus VM is ready. For more information about this known issue, see Prometheus Smoke Test Fails as Healthwatch Re-Deploys below.
[Known Issue Fix] The Prometheus instance no longer fails to clean up the `chunks_head` directory. For more information about this known issue, see Prometheus Clean-Up Failure Leads to Full Disk below.
[Bug Fix] The System at a Glance dashboard in the Grafana UI does not show duplicate Canary URL panels.
Healthwatch v2.2.2 uses the following open-source component versions:
Component | Packaged Version |
---|---|
Prometheus | 2.35.0 |
Grafana | 8.5.2 |
Alertmanager | 0.24.0 |
PXC | 0.42.0 |
Release Date: February 28, 2022
[Feature] Healthwatch supports TKGI v1.13 and earlier.
[Feature Improvement] If you configured a generic OAuth provider to authenticate users who log in to the Grafana UI, you can configure a logout URL. For more information, see Grafana UI Logout URL below.
[Breaking Change] Healthwatch requires additional configuration in TKGI v1.13. For more information, see Healthwatch v2.2.1 Requires Additional Configuration in TKGI v1.13 below.
[Known Issue Fix] The SVM Forwarder VM does not create recursive labels. For more information about this known issue, see SVM Forwarder Creates Recursive Metric Labels below.
[Known Issue Fix] The TKGI SLI exporter VM cleans up the service accounts it creates while running the TKGI SLI test suite. For more information about this known issue, see Healthwatch Exporter for TKGI Does Not Clean Up TKGI Service Accounts below.
[Known Issue Fix] The backup scripts for Prometheus VMs clean up the intermediary snapshots created by BOSH Backup and Restore (BBR). For more information about this known issue, see BBR Backup Snapshots Fill Disk Space on Prometheus VMs below.
[Known Issue] The RabbitMQ dashboards in the Grafana UI show no data for RabbitMQ on-demand instances that are configured to communicate over TLS. For more information about this known issue, see No Data on RabbitMQ Dashboards for RabbitMQ On-Demand Instances Using TLS below.
[Known Issue] The Grafana instance cannot load metrics data in the Grafana UI while multiple Prometheus instances update after you re-deploy an HA Healthwatch installation. For more information about this known issue, see No Data While Re-Deploying Highly-Available Healthwatch Installations below.
[Known Issue] If you have TKGI v1.12 or later installed, the Kubernetes Nodes dashboard in the Grafana UI shows no data for Kubernetes clusters that use the containerd runtime. For more information about this known issue, see No Data for containerd Clusters on Kubernetes Nodes Dashboard for TKGI v1.12 and Later below.
[Known Issue] Under rare circumstances, the Prometheus instance fails to clean up the `chunks_head` directory, leading to a full disk. For more information about this known issue, see Prometheus Clean-Up Failure Leads to Full Disk below.
Healthwatch v2.2.1 uses the following open-source component versions:
Component | Packaged Version |
---|---|
Prometheus | 2.33.1 |
Grafana | 8.3.3 |
Alertmanager | 0.23.0 |
PXC | 0.40.0 |
Release Date: January 25, 2022
Note: Healthwatch v2.2.0 does not support TKGI v1.13. If you have TKGI v1.13 installed on your Ops Manager foundation, upgrade to Healthwatch v2.2.1.
[Security Fix] Apache Log4j dependencies are updated to v2.17.1 to address a critical CVE. For more information, see CVE-2021-44832 on the CVE website.
[Feature] In new installations of Healthwatch, the Routing rules field in the Alertmanager configuration pane is pre-configured with a default set of routing rules. For more information, see Default Routing Rules Are Pre-Configured for Alertmanager below.
[Feature] Healthwatch can automatically configure a route for the Grafana UI on Ops Manager foundations with TAS for VMs installed. For more information, see Automatic Grafana UI Route Configuration below.
[Feature] Healthwatch can automatically configure authentication with the User Account and Authentication (UAA) instances in TAS for VMs and TKGI for the Grafana UI. For more information, see Automatic UAA Authentication Configuration below.
[Feature] The Grafana UI includes the System at a Glance dashboard. For more information, see System at a Glance Dashboard in the Grafana UI below.
[Feature] The SVM Forwarder VM emits the `probe_success` and `probe_duration_seconds` metrics into the Loggregator Firehose. For more information, see Two Canary Test Metrics Emitted Into the Loggregator Firehose below.
[Feature] For Ops Manager v2.10.10 and later, the Prometheus instance scrapes BOSH Director metrics directly from the BOSH Director VM. For more information, see Prometheus Scrapes Metrics Directly from the BOSH Director VM below.
[Feature Improvement] Healthwatch uses Grafana v8, which requires a new open source license. For more information, see Healthwatch Requires New Open Source License for Grafana v8 below.
[Feature Improvement] Healthwatch automatically runs canary tests for the Ops Manager Installation Dashboard. For more information, see Healthwatch Automatically Runs Canary Tests for the Ops Manager Installation Dashboard below.
[Feature Improvement] Healthwatch automatically configures TKGI cluster discovery by default on Ops Manager foundations that have TKGI installed. For more information, see Automatic TKGI Cluster Discovery Configuration below.
[Feature Improvement] You can scale your Grafana, MySQL, and MySQL Proxy instances to `0`. For more information, see Remove Grafana below.
[Feature Improvement] Dashboards in the Grafana UI only show metrics for canary apps that are currently configured. For more information, see Grafana UI Dashboards Only Include Metrics for Current Canary Apps below.
[Feature Improvement] The timeouts for the TAS for VMs SLI test suite are increased to five minutes. For more information, see TAS for VMs SLI Test Timeouts Are Increased below.
[Breaking Change] If you use automated scripts to install and configure Healthwatch, you must update your scripts to reflect the new configuration requirements. For more information, see Update Automation Scripts below.
[Breaking Change] If you configured a UAA instance on a different Ops Manager foundation as the authentication method for logging in to the Grafana UI in Healthwatch v2.1, you must select Generic OAuth and configure the settings for the external UAA instance in the Grafana Authentication pane. For more information, see Authenticating with a UAA Instance on a Different Ops Manager Foundation below.
[Breaking Change] The timer metric exporter VM is removed from Healthwatch Exporter for TAS for VMs. For more information, see Timer Metric Exporter VM is Removed below.
[Known Issue] The SVM Forwarder VM creates recursive labels for certain metrics. For more information about this known issue, see SVM Forwarder Creates Recursive Metric Labels below.
[Known Issue] The TKGI SLI exporter VM does not clean up the service accounts it creates while running the TKGI SLI test suite. For more information about this known issue, see Healthwatch Exporter for TKGI Does Not Clean Up TKGI Service Accounts below.
[Known Issue] The backup scripts for Prometheus VMs do not clean up the intermediary snapshots created by BOSH Backup and Restore (BBR). For more information about this known issue, see BBR Backup Snapshots Fill Disk Space on Prometheus VMs below.
[Known Issue] The RabbitMQ dashboards in the Grafana UI show no data for RabbitMQ on-demand instances that are configured to communicate over TLS. For more information about this known issue, see No Data on RabbitMQ Dashboards for RabbitMQ On-Demand Instances Using TLS below.
[Known Issue] The Grafana instance cannot load metrics data in the Grafana UI while multiple Prometheus instances update after you re-deploy an HA Healthwatch installation. For more information about this known issue, see No Data While Re-Deploying Highly-Available Healthwatch Installations below.
[Known Issue] If you have TKGI v1.12 or later installed, the Kubernetes Nodes dashboard in the Grafana UI shows no data for Kubernetes clusters that use the containerd runtime. For more information about this known issue, see No Data for containerd Clusters on Kubernetes Nodes Dashboard for TKGI v1.12 and Later below.
[Bug Fix] The TAS for VMs SLI test for the `cf login` command now displays failure instead of success upon timeout.
[Bug Fix] The `SyslogAgent_LossRate_1M` super value metric (SVM) is corrected to follow the recommended calculation for TAS for VMs. For more information, see the TAS for VMs documentation.
Healthwatch v2.2.0 uses the following open-source component versions:
Component | Packaged Version |
---|---|
Prometheus | 2.32.1 |
Grafana | 8.3.3 |
Alertmanager | 0.23.0 |
PXC | 0.40.0 |
To upgrade from Healthwatch v2.1 to Healthwatch v2.2, see Upgrading Healthwatch.
Healthwatch v2.2 includes the following major features:
In new installations of Healthwatch, the Routing rules field in the Alertmanager pane of the Healthwatch tile is pre-configured with a default set of routing rules. You can edit these routing rules according to the needs of your deployment.
For more information about configuring routing rules for Alertmanager, see Configure Alerting in Configuring Alerting.
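The Routing rules field accepts standard Alertmanager routing-tree YAML. A minimal sketch with placeholder receiver names and grouping labels, not the Healthwatch defaults:

```yaml
route:
  receiver: default             # placeholder receiver name
  group_by: [alertname, deployment]
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 4h
  routes:
    - receiver: critical-pager  # placeholder receiver name
      matchers:
        - severity="critical"
```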
Healthwatch uses Grafana v8, which requires the Affero General Public License (AGPL).
For more information about the AGPL, see GNU Affero General Public License on the GNU site. For more information about Grafana v8, see the Grafana documentation.
If your Ops Manager foundation has TAS for VMs installed, you can configure Healthwatch to automatically create a route for the Grafana UI in the Grafana pane of the Healthwatch tile.
For more information about configuring a route for the Grafana UI, see (Optional) Configure Grafana in Configuring Healthwatch.
When you select UAA as your Grafana UI authentication method in the Grafana Authentication pane of the Healthwatch tile, Healthwatch automatically configures authentication with the UAA instances in TAS for VMs and TKGI for the Grafana UI. If you want to configure authentication with a UAA instance on a different Ops Manager foundation, you must select Generic OAuth and configure it manually through the Grafana Authentication pane.
For more information about configuring UAA as your Grafana UI authentication method, see (Optional) Configure Grafana Authentication in Configuring Healthwatch.
If you configured a generic OAuth provider to authenticate users who log in to the Grafana UI, you can configure a logout URL in the Grafana Authentication pane of the Healthwatch tile.
For more information about configuring a logout URL for the Grafana UI, see Configure Generic OAuth Authentication in Configuring Grafana Authentication.
If you do not want to use any Grafana instances in your Healthwatch deployment, you can set the number of Grafana, MySQL, and MySQL Proxy instances for your Healthwatch deployment to `0` in the Resource Config pane of the Healthwatch tile.
For more information about removing Grafana from your Healthwatch deployment, see Removing Grafana in Healthwatch Components and Resource Requirements.
In the Remote Write pane of the Healthwatch tile, you can configure the Prometheus instance to use a bearer token to log in to a remote storage endpoint.
For more information about configuring the Prometheus instance to use a bearer token to log in to a remote storage endpoint, see (Optional) Configure Remote Write in Configuring Healthwatch.
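This setting corresponds to standard Prometheus `remote_write` authorization. A sketch with a placeholder endpoint and token; the tile generates the actual configuration:

```yaml
remote_write:
  - url: https://metrics.example.com/api/v1/write   # placeholder endpoint
    authorization:
      type: Bearer
      credentials: REPLACE_WITH_TOKEN               # placeholder token
```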
Healthwatch automatically configures TKGI cluster discovery by default on Ops Manager foundations that have TKGI installed. If you do not want Healthwatch to configure TKGI cluster discovery, you can disallow it through the TKGI Cluster Discovery pane in the Healthwatch tile.
For more information about TKGI cluster discovery, see Configuring TKGI Cluster Discovery. For more information about allowing or disallowing TKGI cluster discovery, see Configure TKGI Cluster Discovery in Healthwatch in Configuring TKGI Cluster Discovery.
The Grafana UI includes the System at a Glance dashboard. This dashboard displays an overview of metrics related to the health of your Ops Manager foundation and the runtimes you have installed on that foundation.
For more information about the System at a Glance dashboard, see Default Dashboards in the Grafana UI in Using Healthwatch Dashboards in the Grafana UI.
For Ops Manager v2.10.10 and later, the Prometheus instance scrapes BOSH Director metrics directly from the BOSH Director VM instead of the Loggregator Firehose. This allows the Prometheus VM to gather more types of metrics related to the health of the BOSH Director. These metrics appear in the Director Health dashboard in the Grafana UI.
For more information about the BOSH Director metrics that the Prometheus instance scrapes, see BOSH SLIs in Healthwatch Metrics.
When the SVM Forwarder VM and the BOSH deployment metric exporter VM are both deployed in the Healthwatch Exporter for TAS for VMs tile, the SVM Forwarder VM emits the `bosh_deployments_status` metric into the Loggregator Firehose.
For more information about the `bosh_deployments_status` metric, see BOSH Deployment Metric Exporter VM in Healthwatch Metrics. For more information about the BOSH deployment metric exporter VM, see (Optional) Configure the BOSH Deployment Metric Exporter VM in Configuring Healthwatch Exporter for TAS for VMs. For more information about the SVM Forwarder VM, see (Optional) Configure Resources in Configuring Healthwatch Exporter for TAS for VMs.
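A custom alerting rule could watch this metric once it is scraped back out of the Firehose. A sketch only: the 0/1 semantics assumed here (1 when the most recent deployment succeeded) are an assumption, so confirm them in Healthwatch Metrics before using:

```yaml
groups:
  - name: bosh-deployments
    rules:
      - alert: BoshDeploymentFailing
        # Assumes bosh_deployments_status reports 0 for a failing deployment.
        expr: bosh_deployments_status == 0
        for: 15m
        labels:
          severity: warning
```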
Healthwatch automatically runs canary tests for the Ops Manager Installation Dashboard.
For more information about canary test metrics, see Prometheus VM in Healthwatch Metrics.
Dashboards in the Grafana UI only show metrics for canary apps that are currently configured. Metrics for canary apps that are no longer used in your Healthwatch deployment are removed from your dashboards, in order to avoid mixing outdated data with current data.
For more information about canary test metrics, see Prometheus VM in Healthwatch Metrics.
The timeouts for the TAS for VMs SLI test suite are increased to five minutes. This reduces the number of false positives you may see in your metrics data.
For more information about canary test metrics, see TAS for VMs SLI Exporter VM in Healthwatch Metrics.
If you deploy the SVM Forwarder VM in the Healthwatch Exporter for TAS for VMs tile, the SVM Forwarder VM emits the `probe_success` and `probe_duration_seconds` canary test metrics into the Loggregator Firehose.
For more information about canary test metrics, see Prometheus VM in Healthwatch Metrics. For more information about the SVM Forwarder VM, see (Optional) Configure Resources in Configuring Healthwatch Exporter for TAS for VMs.
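Because these are standard blackbox-style probe metrics, they compose naturally into recording rules. A minimal sketch with illustrative rule names:

```yaml
groups:
  - name: canary-slo
    rules:
      - record: canary:availability:avg_over_time_1h
        expr: avg_over_time(probe_success[1h])          # fraction of passing probes
      - record: canary:latency_seconds:avg_over_time_1h
        expr: avg_over_time(probe_duration_seconds[1h]) # mean probe latency
```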
Healthwatch v2.2 includes the following breaking changes:
To use Healthwatch v2.2.5 or later, you must upgrade its stemcell to Ubuntu Jammy Stemcell 1.49 or later. You can download Ubuntu Jammy Stemcell 1.49 and later from VMware Tanzu Network.
For more information about supported stemcells, see the Stemcells for VMware Tanzu documentation.
Many configuration options have been added, changed, or removed for Healthwatch v2.2. If you use automated scripts to install and configure Healthwatch, you must update your scripts to reflect the new configuration requirements.
For more information about installing and configuring Healthwatch through platform automation, see Installing, Configuring, and Deploying a Tile Through an Automated Pipeline.
If you are upgrading from Healthwatch v2.1 and configured UAA as your authentication method for logging in to the Grafana UI, Healthwatch v2.2 keeps UAA as your configured authentication method by default. If you configured a UAA instance on a different Ops Manager foundation as the authentication method for logging in to the Grafana UI in Healthwatch v2.1, you must select Generic OAuth and configure the settings for the external UAA instance in the Grafana Authentication pane.
For more information about configuring a UAA instance on a different Ops Manager foundation as the authentication method for logging in to the Grafana UI, see Configuring Authentication with a UAA Instance on a Different Ops Manager Foundation.
The timer metric exporter VM, `pas-exporter-timer`, is removed from Healthwatch Exporter for TAS for VMs. This removes unnecessary data and uses fewer IaaS resources.
For more information about the metrics for TAS for VMs that Healthwatch Exporter for TAS for VMs collects, see Healthwatch Exporter for TAS for VMs Metric Exporter VMs in Healthwatch Metrics.
After you install Healthwatch v2.2.1, you must configure TKGI v1.13 to send metrics for Kubernetes Controller Manager to Healthwatch.
For more information about configuring TKGI v1.13 to send metrics for Kubernetes Controller Manager to Healthwatch, see Configure TKGI in Configuring TKGI Cluster Discovery.
Healthwatch v2.2 includes the following known issues:
In Healthwatch v2.2.5, compiling BBR-SDK on Ubuntu Jammy stemcells can cause it to trigger a false positive in some McAfee malware scans. This false positive incorrectly identifies the `releases/backup-and-restore-sdk-1.18.56-ubuntu-jammy-1.18-20221101-135320-248952853/compiled_packages/database-backup-restorer-postgres-13/lib/earthdistance.so` file as infected.
However, when you compile BBR-SDK on Ubuntu Xenial stemcells, it does not trigger any alerts in McAfee malware scans.
For more information about BBR-SDK, see the BOSH documentation.
This known issue is fixed in Healthwatch v2.2.1 and later.
When the SVM Forwarder VM is deployed in Healthwatch Exporter for TAS for VMs, a change in the Prometheus server causes metrics with the `job` and `exported_job` labels to become recursive. For example, `exported_job` becomes `exported_exported_exported_exported_job`.
To work around this issue, set the number of SVM Forwarder VM instances for your Healthwatch deployment to `0` in the Resource Config pane of the Healthwatch Exporter for TAS for VMs tile. For more information about scaling Healthwatch resources, see (Optional) Configure Resources in Configuring Healthwatch Exporter for TAS for VMs.
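For background, the `exported_` prefix is standard Prometheus behavior: when a scraped series already carries a reserved label such as `job` and the scrape configuration does not honor it, Prometheus renames the incoming label, and re-ingesting forwarded metrics repeats the renaming. A sketch of the mechanism in a hand-written scrape configuration; this illustrates the general Prometheus option, not the Healthwatch fix, and the job name and target are placeholders:

```yaml
scrape_configs:
  - job_name: svm-forwarder           # placeholder job name
    honor_labels: true                # keep the scraped job/instance labels
                                      # instead of renaming them to exported_*
    static_configs:
      - targets: ["10.0.0.10:9090"]   # placeholder target
```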
This known issue is fixed in Healthwatch v2.2.2 and later.
If you have TKGI v1.12 or later installed, the Kubernetes Nodes dashboard in the Grafana UI might not show data for Kubernetes clusters that use the containerd runtime.
In TKGI v1.11 and earlier, the `name` label in Kubernetes cluster metrics starts with `k8s_`. However, in TKGI v1.12 and later, new Kubernetes clusters run on containerd instead of Docker. As a result, in TKGI v1.12 and later, the `name` label in Kubernetes cluster metrics starts with a hex value instead of `k8s_`, which the Grafana instance does not recognize.
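A sketch of the mismatch using cAdvisor-style metric and label names (assumptions, not the exact Healthwatch dashboard queries): a matcher keyed to the Docker naming convention returns nothing on containerd clusters, while a runtime-agnostic matcher works on both:

```yaml
groups:
  - name: node-dashboard-example
    rules:
      # Docker-era naming only; empty on containerd, where name is a hex value:
      - record: cluster:container_cpu:docker_naming
        expr: sum(rate(container_cpu_usage_seconds_total{name=~"k8s_.*"}[5m]))
      # Runtime-agnostic alternative:
      - record: cluster:container_cpu:any_runtime
        expr: sum(rate(container_cpu_usage_seconds_total{name!=""}[5m]))
```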
To fix this issue, upgrade to Healthwatch v2.2.2 or later.
If you are using TKGI v1.10.0 or v1.10.1, the Kubernetes Nodes dashboard in the Grafana UI might not show data for individual pods. This is due to a known issue in Kubernetes v1.19.6 and earlier and Kubernetes v1.20.1 and earlier.
To fix this issue, upgrade to TKGI v1.10.2 or later. For more information about upgrading to TKGI v1.10.2 or later, see the TKGI documentation.
If you are using TKGI to monitor Windows clusters, the Kubernetes Nodes dashboard in the Grafana UI might not show data. Healthwatch does not currently visualize node metrics for Windows clusters.
This known issue is fixed in Healthwatch v2.2.1 and later.
If you run SLI tests for TKGI through Healthwatch Exporter for TKGI, and you do not have an OpenID Connect (OIDC) provider for your Kubernetes clusters configured for TKGI, the TKGI SLI exporter VM does not automatically clean up the service accounts that it creates while running the TKGI SLI test suite.
To fix this issue, either upgrade to Healthwatch v2.2.1 or configure an OIDC provider as the identity provider for your Kubernetes clusters in the TKGI tile. This cleans up the service accounts that the TKGI SLI exporter VM creates in future TKGI SLI tests, but does not clean up existing service accounts from previous TKGI SLI tests. For more information about configuring an OIDC provider in TKGI, see OIDC Provider for Kubernetes Clusters in the TKGI documentation.
You may need to manually delete existing service accounts from previous TKGI SLI tests. For more information about manually deleting existing service accounts, see Healthwatch Exporter for TKGI Does Not Clean Up TKGI Service Accounts in Troubleshooting Healthwatch.
This known issue is fixed in Healthwatch v2.2.1 and later.
In Healthwatch v2.2.0, the backup scripts for Prometheus VMs do not clean up the intermediary snapshots created by BBR. This results in the disk space on Prometheus VMs filling up.
To fix this issue, either upgrade to Healthwatch v2.2.1 or manually clean up the snapshots. For more information about manually cleaning up the snapshots, see BBR Backup Snapshots Fill Disk Space on Prometheus VMs in Troubleshooting Healthwatch.
This known issue is fixed in Healthwatch v2.2.2 and later.
In Healthwatch v2.2.1 and earlier, the Prometheus instance does not scrape metrics from RabbitMQ on-demand instances that are configured to communicate over TLS. As a result, the RabbitMQ dashboards in the Grafana UI show no data for RabbitMQ on-demand instances that are configured to use TLS.
To fix this issue, upgrade to Healthwatch v2.2.2 or later and RabbitMQ v2.0.13 or later.
This known issue is fixed in Healthwatch v2.2.3 and later.
In Healthwatch v2.2.2 and earlier, the Grafana instance cannot load metrics data in the Grafana UI after you re-deploy an HA Healthwatch installation with multiple Prometheus instances. An HA Healthwatch installation is meant to allow the Grafana instance to continue loading data during re-deployment by ensuring that the second Prometheus instance does not start updating until after the first Prometheus instance has updated and restarted. In Healthwatch v2.2.2 and earlier, a bug causes the second Prometheus instance to start updating before the first Prometheus instance restarts.
To fix this issue, upgrade to Healthwatch v2.2.3 or later.
This known issue is fixed in Healthwatch v2.2.2 and later.
In Healthwatch v2.2.1 and earlier, a potential race condition sometimes causes the smoke test for Prometheus VMs to run before the Prometheus VM is ready. This leads to the smoke test failing when you re-deploy Healthwatch, even though it succeeds when you run the smoke test manually.
This known issue is fixed in Healthwatch v2.2.2 and later.
In Healthwatch v2.2.1, under rare circumstances, the Prometheus instance fails to clean up the `chunks_head` directory. This leads to a full disk and subsequent failures when the Prometheus instance attempts to process new metrics.
To fix this issue, upgrade to Healthwatch v2.2.2 or later.
This known issue is fixed in Healthwatch v2.2.8 and later.
After upgrading Healthwatch, a `user already exists` error occurs while attempting to log in to Grafana using UAA authentication. Healthwatch v2.2.8 adds a new option to the Grafana configuration file that resolves this error.
To fix this issue, upgrade Healthwatch to v2.2.8 or later.