This topic describes how to manually configure and deploy the Healthwatch™ for VMware Tanzu® (Healthwatch) tile.
To install, configure, and deploy Healthwatch through an automated pipeline, see Installing, Configuring, and Deploying a Tile Through an Automated Pipeline.
The Healthwatch tile monitors metrics across one or more Ops Manager foundations by scraping metrics from Healthwatch Exporter tiles installed on each foundation. For more information about the architecture of the Healthwatch tile, see Healthwatch Tile in Healthwatch Architecture.
After installing Healthwatch, you configure Healthwatch component VMs, including the configuration files associated with them, through the tile UI. You can also configure errands and system logging, as well as scale VM instances up or down and configure load balancers for multiple VM instances.
To configure and deploy the Healthwatch tile:
Note: To quickly deploy the Healthwatch tile and confirm that it deploys successfully before you fully configure it, you only need to configure the Assign AZs and Networks pane.
Navigate to the Healthwatch tile in the Ops Manager Installation Dashboard. For more information, see Navigate to the Healthwatch Tile below.
Assign jobs to your Availability Zones (AZs) and networks. For more information, see Assign AZs and Networks below.
Configure the Prometheus pane. For more information, see Configure Prometheus below.
(Optional) Configure the Alertmanager pane. For more information, see (Optional) Configure Alertmanager below.
(Optional) Configure the Grafana pane. For more information, see (Optional) Configure Grafana below.
(Optional) Configure the Grafana Authentication pane. For more information, see (Optional) Configure Grafana Authentication below.
(Optional) Configure the Grafana Dashboards pane. For more information, see (Optional) Configure Grafana Dashboards below.
(Optional) Configure the Canary URLs pane. For more information, see (Optional) Configure Canary URLs below.
(Optional) Configure the Remote Write pane. For more information, see (Optional) Configure Remote Write below.
(Optional) Configure the TKGI Cluster Discovery pane. For more information, see (Optional) Configure TKGI Cluster Discovery below.
(Optional) Configure the Errands pane. For more information, see (Optional) Configure Errands below.
(Optional) Configure the Syslog pane. For more information, see (Optional) Configure Syslog below.
(Optional) Configure the Resource Config pane. For more information, see (Optional) Configure Resources below.
Deploy Healthwatch. For more information, see Deploy Healthwatch below.
After you have configured and deployed the Healthwatch tile, you can configure and deploy the Healthwatch Exporter tiles for the Ops Manager foundations you want to monitor. For more information, see Next Steps below.
To navigate to the Healthwatch tile:
Navigate to the Ops Manager Installation Dashboard.
Click the Healthwatch tile.
In the Assign AZs and Networks pane, you assign jobs to your AZs and networks.
To configure the Assign AZs and Networks pane:
Select Assign AZs and Networks.
Under Place singleton jobs in, select the first AZ. Ops Manager runs any job with a single instance in this AZ.
Under Balance other jobs in, select one or more other AZs. Ops Manager balances instances of jobs with more than one instance across the AZs that you specify.
From the Network dropdown, select the runtime network that you created when configuring the BOSH Director tile.
Click Save.
In the Prometheus pane, you configure the Prometheus instance in the Healthwatch tile to scrape metrics from the Healthwatch Exporter tiles installed on each Ops Manager foundation, as well as any external services or databases from which you want to collect metrics.
The values that you configure in the Prometheus pane also configure their corresponding properties in the Prometheus configuration file. For more information, see Overview of Configuration Files in Healthwatch in Configuration File Reference Guide, Prometheus in Configuration File Reference Guide, and the Prometheus documentation.
To configure the Prometheus pane:
Select Prometheus.
For Scrape interval, specify the frequency at which you want the Prometheus instance to scrape Prometheus exposition endpoints for metrics. The Prometheus instance scrapes all Prometheus exposition endpoints at once through a global scrape. You can enter a value string that specifies ns, us, µs, ms, s, m, or h. To scrape detailed metrics without consuming too much storage, VMware recommends using the default value of 15s, or 15 seconds. For an illustration of how this value and any additional scrape jobs map into the Prometheus configuration file, see the sketch after this procedure.
(Optional) To configure the Prometheus instance to scrape metrics from the Healthwatch Exporter tiles installed on other Ops Manager foundations or from external services or databases, configure additional scrape jobs under Additional scrape jobs. You can configure scrape jobs for any app or service that exposes metrics using a Prometheus exposition format, such as Concourse CI. For more information about Prometheus exposition formats, see the Prometheus documentation.
Note: The Prometheus instance automatically discovers and scrapes Healthwatch Exporter tiles that are installed on the same Ops Manager foundation as the Healthwatch tile. You do not need to configure scrape jobs for these Healthwatch Exporter tiles. You only need to configure scrape jobs for Healthwatch Exporter tiles that are installed on other Ops Manager foundations.
For Scrape job configuration parameters, provide the configuration YAML for the scrape job you want to configure. This job can use any of the properties defined by Prometheus except those in the tls_config section. Do not prefix the configuration YAML with a dash. For example:
job_name: foundation-1
metrics_path: /metrics
scheme: https
static_configs:
- targets:
  - "1.2.3.4:9090"
  - "5.6.7.8:9090"
For more information, see the Prometheus documentation.
When configuring the job_name property, do not use the following job names:
Healthwatch-view-pas-exporter
Healthwatch-view-pks-exporter
tsdb
grafana
pks-master-kube-scheduler
pks-master-kube-controller-manager
(Optional) To allow the Prometheus instance to communicate with the server for your external service or database over TLS:
For Chunk size to calculate Diego_AvailableFreeChunksDisk SVM, enter in MB the size that you want to specify for free chunks of disk. The default value is 6144. Healthwatch uses this free chunk size to calculate the available free disk chunks super value metric (SVM), which it then uses to calculate the Diego_AvailableFreeChunksDisk metric. If you configure Healthwatch Exporter for TAS for VMs to deploy the SVM Forwarder VM, the SVM Forwarder VM sends the Diego_AvailableFreeChunksDisk metric back into the Loggregator Firehose so third-party nozzles can send it to external destinations, such as a remote server or external aggregation service. For more information about SVMs, see SVM Forwarder VM - Platform Metrics and SVM Forwarder VM - Healthwatch Component Metrics in Healthwatch Metrics. For more information about deploying the SVM Forwarder VM, see (Optional) Configure Resources in Configuring Healthwatch Exporter for TAS for VMs.
For Chunk size to calculate Diego_AvailableFreeChunksMemory SVM, enter in MB the size that you want to specify for free chunks of memory. The default value is 4096. Healthwatch uses this free chunk size to calculate the available free memory chunks SVM, which it then uses to calculate the Diego_AvailableFreeChunksMemory metric. If you configure Healthwatch Exporter for TAS for VMs to deploy the SVM Forwarder VM, the SVM Forwarder VM sends the Diego_AvailableFreeChunksMemory metric back into the Loggregator Firehose so third-party nozzles can send it to external destinations, such as a remote server or external aggregation service. For more information about SVMs, see SVM Forwarder VM - Platform Metrics and SVM Forwarder VM - Healthwatch Component Metrics in Healthwatch Metrics. For more information about deploying the SVM Forwarder VM, see (Optional) Configure Resources in Configuring Healthwatch Exporter for TAS for VMs.
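For example, with the default chunk sizes and assuming the SVM counts how many whole chunks of the configured size fit into each Diego cell's free capacity, a cell reporting 20,000 MB of free disk and 10,000 MB of free memory would contribute 3 free disk chunks (20,000 / 6,144, rounded down) and 2 free memory chunks (10,000 / 4,096, rounded down). These numbers are purely illustrative.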
(Optional) For Static IP addresses for Prometheus VMs, enter a comma-separated list of valid static IP addresses that you want to reserve for the Prometheus instance. You must enter a separate IP address for each VM in the Prometheus instance. These IP addresses must not be within the reserved IP ranges you configured in the BOSH Director tile. To find the IP addresses of the Prometheus VMs:
Note: The Prometheus instance includes two VMs by default. For more information about viewing or scaling your VMs, see Healthwatch Components and Resource Requirements.
Click Save.
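As a rough illustration of how the values in this pane land in the Prometheus configuration file that Healthwatch generates, the following minimal sketch combines the default Scrape interval with the foundation-1 scrape job from the example above. The structure follows the standard Prometheus configuration format; the file that Healthwatch actually generates may include additional properties.
global:
  scrape_interval: 15s              # value of the Scrape interval field
scrape_configs:
- job_name: foundation-1            # each additional scrape job becomes an entry under scrape_configs
  metrics_path: /metrics
  scheme: https
  static_configs:
  - targets:
    - "1.2.3.4:9090"
    - "5.6.7.8:9090"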
In the Alertmanager pane, you configure alerting for Healthwatch. To configure alerting for Healthwatch, you configure the alerting rules that Alertmanager follows and the alert receivers to which Alertmanager sends alerts.
To configure the Alertmanager pane, see Configuring Alerting.
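For orientation, alerting rules follow the standard Prometheus rule format. The following minimal sketch uses a hypothetical metric name, threshold, and alert name; see Configuring Alerting for the exact fields and format that Healthwatch expects.
groups:
- name: example-alerts              # hypothetical rule group
  rules:
  - alert: HighDiskUsage            # hypothetical alert name
    expr: system_disk_percent > 90  # hypothetical PromQL expression and threshold
    for: 10m                        # condition must hold for 10 minutes before the alert fires
    labels:
      severity: warning
    annotations:
      summary: "Persistent disk usage is above 90 percent"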
In the Grafana pane, you configure the route for the Grafana UI. You can also configure email alerts and HTTP and HTTPS proxy request settings for the Grafana instance.
The values that you configure in the Grafana pane also configure their corresponding properties in the Grafana configuration file. For more information, see Overview of Configuration Files in Healthwatch in Configuration File Reference Guide, Grafana in Configuration File Reference Guide, and the Grafana documentation.
To configure the Grafana pane:
Select Grafana.
Under Grafana UI route, configure the route used to access the Grafana UI by selecting one of the following options:
If you use the TAS for VMs system domain, the Grafana UI is available at https://grafana.sys.DOMAIN in a browser window, where DOMAIN is the system domain you configured in the Domains pane of the TAS for VMs tile. For more information, see the TAS for VMs documentation.
If you use a custom domain instead, Healthwatch does not automatically assign a default root URL to the Grafana UI. You must manually configure a root URL for the Grafana UI in the Grafana root URL field. After you deploy the Healthwatch tile for the first time, you must configure a DNS entry for the Grafana instance in the console for your IaaS using this root URL and the IP address of either the Grafana VMs or the load balancer associated with the Grafana instance. The Grafana instance listens on either port 443 or port 80, depending on whether you provide a TLS certificate in the Certificate and private key for HTTPS fields below. For the wildcard domain, enter *.DOMAIN, where DOMAIN is the domain of the DNS entry that you configured for the Grafana instance. For example, if the DNS entry you configured for the Grafana instance is grafana.example.com, enter *.example.com. For more information about configuring a DNS entry for the Grafana instance, see Configuring DNS for the Grafana Instance.
Under Grafana email alerts, choose whether to configure email alerts from the Grafana instance. VMware recommends using Alertmanager to configure and manage alerts in Healthwatch. If you require additional or alternative alerts, you can configure the SMTP server for the Grafana instance to send email alerts.
Under HTTP and HTTPS proxy request settings, choose whether to allow the Grafana instance to make HTTP and HTTPS requests through proxy servers:
Note: You only need to configure proxy settings if you are deploying Healthwatch in an air-gapped environment and want to configure alert channels to external addresses, such as an external Slack webhook.
If you allow proxy requests, include *.bosh and the range of your internal network IP addresses in the list of addresses that bypass the proxy so the Grafana instance can still access the Prometheus instance without going through the proxy server. For example, *.bosh,10.0.0.0/8,*.example.com allows the Grafana instance to access all BOSH DNS addresses, all internal network IP addresses within 10.0.0.0/8, and all addresses matching *.example.com directly, without going through the proxy server.
(Optional) For Static IP addresses for Grafana VMs, enter a comma-separated list of valid static IP addresses that you want to reserve for the Grafana instance. These IP addresses must not be within the reserved IP ranges you configured in the BOSH Director tile.
(Optional) To use Grafana legacy alerting instead of the newer Grafana Alerting, select the Opt out of Grafana Alerting checkbox. Opting out deletes any alerts and changes made in Grafana Alerting.
(Optional) To disable Gravatar profile images, select the Disable gravatar checkbox.
(Optional) To log all access to Grafana, select the Enable router logging checkbox. This allows you to audit all traffic into the system.
Click Save.
In the Grafana Authentication pane, you configure how users log in to the Grafana UI.
To configure the Grafana Authentication pane, see Configuring Grafana Authentication.
In the Grafana Dashboards pane, you configure which dashboards the Grafana instance creates in the Grafana UI. The Grafana instance can create dashboards for metrics from TAS for VMs, TKGI, VMware Tanzu SQL with MySQL for VMs (Tanzu SQL for VMs), and VMware Tanzu RabbitMQ for VMs (Tanzu RabbitMQ). For more information about these dashboards, see Default Dashboards in the Grafana UI in Using Healthwatch Dashboards in the Grafana UI.
To configure the Grafana Dashboards pane:
Select Grafana Dashboards.
Under TAS for VMs, select one of the following options:
Note: If you choose to include TAS for VMs dashboards, you must configure TAS for VMs to forward system metrics to the Loggregator Firehose. Otherwise, no metrics appear in the Router dashboard in the Grafana UI. For more information, see Troubleshooting Missing Router Metrics in Troubleshooting Healthwatch.
Under TKGI, select one of the following options:
Under Tanzu SQL for VMs, select one of the following options:
Under Tanzu RabbitMQ, select one of the following options:
Note: If you choose to include Tanzu RabbitMQ dashboards, set the Metrics polling interval field in the Tanzu RabbitMQ tile to -1. This prevents the Tanzu RabbitMQ tile from sending duplicate metrics to the Loggregator Firehose. To configure this field, see the Tanzu RabbitMQ documentation.
In the Canary URLs pane, you configure target URLs to which the Blackbox Exporters in the Prometheus instance send canary tests. Testing a canary target URL allows you to gauge the overall health and accessibility of an app, runtime, or deployment.
The Canary URLs pane configures the Blackbox Exporters in the Prometheus instance. For more information, see the Blackbox exporter repository on GitHub.
The Blackbox Exporters in the Prometheus instance run canary tests on the fully-qualified domain name (FQDN) of your Ops Manager deployment by default. The results from these canary tests appear in the Ops Manager Health dashboard in the Grafana UI.
To configure the Canary URLs pane:
Select Canary URLs.
For Port, specify the port that the Blackbox Exporter exposes to the Prometheus instance. The default port is 9115. You do not need to specify a different port unless port 9115 is already in use on the Prometheus instance.
(Optional) Under Additional target URLs, you can configure additional canary target URLs. The Prometheus instance runs continuous canary tests against these URLs and records the results. For an illustration of the kind of probe scrape job this produces, see the sketch after this procedure. To configure additional canary target URLs:
Note: The Prometheus instance automatically creates scrape jobs for these URLs. You do not need to create additional scrape jobs for them in the Prometheus pane.
Click Save.
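For orientation, canary testing through a Blackbox Exporter typically follows the standard Prometheus probe pattern shown below. This is a minimal sketch based on the Blackbox Exporter documentation, not the exact configuration that Healthwatch generates; the job name and target URL are hypothetical.
job_name: canary-example              # hypothetical job name
metrics_path: /probe
params:
  module: [http_2xx]                  # Blackbox Exporter module that expects an HTTP 2xx response
static_configs:
- targets:
  - https://canary.example.com        # canary target URL under test
relabel_configs:
- source_labels: [__address__]
  target_label: __param_target        # pass the target URL to the Blackbox Exporter as the probe target
- source_labels: [__param_target]
  target_label: instance
- target_label: __address__
  replacement: 127.0.0.1:9115         # address and default port of the Blackbox Exporter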
In the Remote Write pane, you can configure the Prometheus instance to write to remote storage, in addition to its local time series database (TSDB). Healthwatch stores monitoring data for six weeks before deleting it. Configuring remote write allows Healthwatch to store data that is older than six weeks in a remote database or storage endpoint. For a list of compatible remote databases and storage endpoints, see the Prometheus documentation.
The values that you configure in the Remote Write pane also configure their corresponding properties in the Prometheus configuration file. For more information, see Overview of Configuration Files in Healthwatch in Configuration File Reference Guide, Remote Write in Configuration File Reference Guide, and the Prometheus documentation.
To configure the Remote Write pane:
Select Remote Write.
Under Remote write destinations, click Add.
For Remote storage URL, enter the URL for your remote storage endpoint. For example, https://REMOTE-STORAGE-FQDN, where REMOTE-STORAGE-FQDN is the FQDN of your remote storage endpoint. For an illustration of how the fields in this pane map to the remote_write section of the Prometheus configuration file, see the sketch after this procedure.
In Remote timeout, enter in seconds the amount of time that the Prometheus VM tries to make a request to your remote storage endpoint before the request fails.
If your remote storage endpoint requires a username and password to log in to it, configure the following fields:
Note: If you configure a username and password for the Prometheus instance to use when logging in to your remote storage endpoint, you cannot also configure a bearer token.
If your remote storage endpoint requires a bearer token to log in to it, enter the bearer token that the Prometheus instance uses to log in to your remote storage endpoint in Bearer token.
Note: If you configure a bearer token for the Prometheus instance to use when logging in to your remote storage endpoint, you cannot also configure a username and password.
(Optional) To allow the Prometheus instance to communicate with the server for your remote storage endpoint over TLS:
(Optional) To allow the Prometheus instance to make HTTP or HTTPS requests to your remote storage endpoint through a proxy server, enter the URL for your proxy server in Proxy URL.
You can configure more granular settings for writing to your remote storage endpoint by specifying additional parameters for the shards containing in-memory queues that read from the write-ahead log in the Prometheus instance. To configure additional parameters for these shards:
Click Save.
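The fields in this pane correspond to properties in the remote_write section of the Prometheus configuration file. The following is a minimal sketch under that mapping; the URL path, credentials, proxy address, and queue values are hypothetical, and you configure either basic authentication or a bearer token, not both. See the Prometheus documentation for the full set of properties.
remote_write:
- url: https://REMOTE-STORAGE-FQDN/api/v1/write   # Remote storage URL; the path depends on your endpoint
  remote_timeout: 30s                             # Remote timeout
  basic_auth:                                     # omit if you use a bearer token instead
    username: healthwatch-writer
    password: example-password
  proxy_url: https://proxy.example.com:3128       # Proxy URL, if requests go through a proxy server
  queue_config:                                   # tuning parameters for the in-memory queue shards
    capacity: 2500
    max_shards: 200
    max_samples_per_send: 500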
In the TKGI Cluster Discovery pane, you configure TKGI cluster discovery for Healthwatch. You only need to configure this pane if you have Ops Manager foundations with TKGI installed.
To configure TKGI cluster discovery, see Configuring TKGI Cluster Discovery.
Errands are scripts that Ops Manager runs automatically when it installs or uninstalls a product, such as a new version of Healthwatch. There are two types of errands: post-deploy errands run after the product is installed, and pre-delete errands run before the product is uninstalled. However, there are no pre-delete errands for Healthwatch.
By default, Ops Manager always runs all errands.
In the Errands pane, you can select On to always run an errand or Off to never run it.
For more information about how Ops Manager manages errands, see the Ops Manager documentation.
To configure the Errands pane:
Select Errands.
(Optional) Choose whether to always run or never run the following errands:
Click Save.
In the Syslog pane, you can configure system logging in Healthwatch to forward log messages from Healthwatch component VMs to an external destination for troubleshooting, such as a remote server or external syslog aggregation service.
To configure the Syslog pane:
Select Syslog.
Under Do you want to configure Syslog forwarding?, select one of the following options:
For Address, enter the IP address or DNS domain name of your external destination.
For Port, enter a port on which your external destination listens.
For Transport Protocol, select TCP or UDP from the dropdown. This determines which transport protocol Healthwatch uses to forward system logs to your external destination.
(Optional) To transmit logs over TLS:
(Optional) For Queue Size, specify the number of log messages Healthwatch can hold in a buffer at a time before sending them to your external destination. The default value is 100000.
(Optional) To forward debug logs to your external destination, activate the Forward Debug Logs checkbox. This checkbox is deactivated by default.
(Optional) To specify a custom syslog rule, enter it in Custom rsyslog configuration in RainerScript syntax. For more information about custom syslog rules, see the TAS for VMs documentation. For more information about RainerScript syntax, see the rsyslog documentation.
Click Save Syslog Settings.
In the Resource Config pane, you can scale Healthwatch component VMs up or down according to the needs of your deployment, as well as associate load balancers with a group of VMs. For example, you can scale the persistent disk size of the Prometheus instance to allow longer data retention.
To configure the Resource Config pane:
Select Resource Config.
(Optional) To scale a job, select an option from the dropdown for the resource you want to modify:
(Optional) To add a load balancer to a job:
Click Save.
To complete your installation of the Healthwatch tile:
Return to the Ops Manager Installation Dashboard.
Click Review Pending Changes.
Click Apply Changes.
For more information, see the Ops Manager documentation.
After you have successfully installed the Healthwatch tile, continue to one of the following topics to configure and deploy the Healthwatch Exporter tiles for the Ops Manager foundations you want to monitor:
If you have TAS for VMs installed on an Ops Manager foundation you want to monitor, see Configuring Healthwatch Exporter for TAS for VMs.
If you have TKGI installed on an Ops Manager foundation you want to monitor, see Configuring Healthwatch Exporter for TKGI.