This topic describes how to manually configure and deploy the Healthwatch for VMware Tanzu tile.
To install, configure, and deploy Healthwatch through an automated pipeline, see Installing, Configuring, and Deploying a Tile Through an Automated Pipeline.
The Healthwatch tile monitors metrics across one or more Ops Manager foundations by scraping metrics from Healthwatch Exporter tiles installed on each foundation. For more information about the architecture of the Healthwatch tile, see Healthwatch Tile in Healthwatch Architecture.
After installing Healthwatch, you configure Healthwatch component VMs, including the configuration files associated with them, through the tile UI. You can also configure errands and system logging, as well as scale VM instances up or down and configure load balancers for multiple VM instances.
To configure and deploy the Healthwatch tile:
Navigate to the Healthwatch tile in the Ops Manager Installation Dashboard. For more information, see Navigate to the Healthwatch Tile below.
Assign jobs to your Availability Zones (AZs) and networks. For more information, see Assign AZs and Networks below.
Configure the Prometheus pane. For more information, see Configure Prometheus below.
(Optional) Configure the Alertmanager pane. For more information, see (Optional) Configure Alertmanager below.
(Optional) Configure the Grafana pane. For more information, see (Optional) Configure Grafana below.
(Optional) Configure the Canary URLs pane. For more information, see (Optional) Configure Canary URLs below.
(Optional) Configure the Remote Write pane. For more information, see (Optional) Configure Remote Write below.
Configure the TKGI Cluster Discovery pane. For more information, see Configure TKGI Cluster Discovery below.
(Optional) Configure the Errands pane. For more information, see (Optional) Configure Errands below.
(Optional) Configure the Syslog pane. For more information, see (Optional) Configure Syslog below.
(Optional) Configure the Resource Config pane. For more information, see (Optional) Configure Resources below.
Deploy Healthwatch. For more information, see Deploy Healthwatch below.
After you have configured and deployed the Healthwatch tile, you can configure and deploy the Healthwatch Exporter tiles for the Ops Manager foundations you want to monitor. For more information, see Next Steps below.
To navigate to the Healthwatch tile:
Navigate to the Ops Manager Installation Dashboard.
Click the Healthwatch tile.
In the Assign AZs and Networks pane, you assign jobs to your AZs and networks.
To configure the Assign AZs and Networks pane:
Select Assign AZs and Networks.
Under Place singleton jobs in, select the first AZ. Ops Manager runs any job with a single instance in this AZ.
Under Balance other jobs in, select one or more other AZs. Ops Manager balances instances of jobs with more than one instance across the AZs that you specify.
From the Network dropdown, select the runtime network that you created when configuring the BOSH Director tile.
Click Save.
In the Prometheus pane, you configure the Prometheus instance in the Healthwatch tile to scrape metrics from the Healthwatch Exporter tiles installed on each Ops Manager foundation, as well as any external services or databases from which you want to collect metrics.
The values that you configure in the Prometheus pane also configure their corresponding properties in the Prometheus configuration file. For more information, see Overview of Configuration Files in Healthwatch in Configuration File Reference Guide, Prometheus in Configuration File Reference Guide, and the Prometheus documentation.
To configure the Prometheus pane:
Select Prometheus.
For Scrape interval, specify the frequency at which you want the Prometheus instance to scrape Prometheus exposition endpoints for metrics. The Prometheus instance scrapes all Prometheus exposition endpoints at once through a global scrape. You can enter a value string that specifies ns, us, µs, ms, s, m, or h. To scrape detailed metrics without consuming too much storage, VMware recommends using the default value of 15s, or 15 seconds.
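The scrape interval that you configure in this field sets the corresponding global scrape interval property in the Prometheus configuration file. As a minimal sketch, assuming the default value, the relevant section of that file looks along these lines:
global:
  scrape_interval: 15s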
(Optional) To configure the Prometheus instance to scrape metrics from the Healthwatch Exporter tiles installed on other Ops Manager foundations or from external services or databases, configure additional scrape jobs under Additional scrape jobs. You can configure scrape jobs for any app or service that exposes metrics using a Prometheus exposition format, such as Concourse CI. For more information about Prometheus exposition formats, see the Prometheus documentation.
Note: The Prometheus instance automatically discovers and scrapes Healthwatch Exporter tiles that are installed on the same Ops Manager foundation as the Healthwatch tile. You do not need to configure scrape jobs for these Healthwatch Exporter tiles. You only need to configure scrape jobs for Healthwatch Exporter tiles that are installed on other Ops Manager foundations.
For Scrape job configuration parameters, provide the configuration YAML for the scrape job you want to configure. This job can use any of the properties defined by Prometheus except those in the tls_config section. Do not prefix the configuration YAML with a dash. For example:
job_name: foundation-1
metrics_path: /metrics
scheme: https
static_configs:
- targets:
  - "1.2.3.4:9090"
  - "5.6.7.8:9090"
For more information, see the Prometheus documentation.
For the job_name property, do not use the following job names:
Healthwatch-view-pas-exporter
Healthwatch-view-pks-exporter
tsdb
grafana
pks-master-kube-scheduler
pks-master-kube-controller-manager
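For example, the following sketch shows an additional scrape job for an external service that exposes metrics in a Prometheus exposition format, such as a Concourse deployment. The job name, target address, and port are illustrative assumptions and depend on how that service is deployed; note that the job name avoids the reserved names above:
job_name: concourse
metrics_path: /metrics
scheme: http
static_configs:
- targets:
  - "concourse.example.com:9391"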
For Free disk chunk size, enter in MB the size that you want to specify for free chunks of disk. The default value is 6144. Healthwatch uses this free chunk size to calculate the available free disk chunks super value metric (SVM), which it then uses to calculate the Diego_AvailableFreeChunksDisk metric. If you configure Healthwatch Exporter for TAS for VMs to deploy the SVM Forwarder VM, the SVM Forwarder VM sends the Diego_AvailableFreeChunksDisk metric back into the Loggregator Firehose so third-party nozzles can send it to external destinations, such as a remote server or external aggregation service. For more information about SVMs, see SVM Forwarder VM - Platform Metrics and SVM Forwarder VM - Healthwatch Component Metrics in Healthwatch Metrics. For more information about deploying the SVM Forwarder VM, see (Optional) Configure Resources in Configuring Healthwatch Exporter for TAS for VMs.
For Free memory chunk size, enter in MB the size that you want to specify for free chunks of memory. The default value is 4096. Healthwatch uses this free chunk size to calculate the available free memory chunks SVM, which it then uses to calculate the Diego_AvailableFreeChunksMemory metric. If you configure Healthwatch Exporter for TAS for VMs to deploy the SVM Forwarder VM, the SVM Forwarder VM sends the Diego_AvailableFreeChunksMemory metric back into the Loggregator Firehose so third-party nozzles can send it to external destinations, such as a remote server or external aggregation service. For more information about SVMs, see SVM Forwarder VM - Platform Metrics and SVM Forwarder VM - Healthwatch Component Metrics in Healthwatch Metrics. For more information about deploying the SVM Forwarder VM, see (Optional) Configure Resources in Configuring Healthwatch Exporter for TAS for VMs.
(Optional) For Static IP addresses for Prometheus VMs, enter a comma-separated list of valid static IP addresses that you want to reserve for the Prometheus instance. You must enter a separate IP address for each VM in the Prometheus instance. These IP addresses must not be within the reserved IP ranges you configured in the BOSH Director tile. To find the IP addresses of the Prometheus VMs:
Note: The Prometheus instance includes two VMs by default. For more information about viewing or scaling your VMs, see Healthwatch Components and Resource Requirements.
Click Save.
In the Alertmanager pane, you configure alerting for Healthwatch. To configure alerting for Healthwatch, you configure the alerting rules that Alertmanager follows and the alert receivers to which Alertmanager sends alerts.
To configure the Alertmanager pane, see Configuring Alerting.
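As a rough illustration only, alert receivers use standard Alertmanager configuration. The following sketch defines a single Slack receiver with placeholder values; see Configuring Alerting for the fields that the Alertmanager pane actually accepts:
route:
  receiver: slack-alerts
receivers:
- name: slack-alerts
  slack_configs:
  - api_url: https://hooks.slack.com/services/EXAMPLE
    channel: '#platform-alerts'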
In the Grafana pane, you configure how users access and authenticate with the Grafana UI, as well as which dashboards appear in the Grafana UI. For more information about the Grafana UI as it relates to Healthwatch, see Healthwatch.
The values that you configure in the Grafana pane also configure their corresponding properties in the Grafana configuration file. For more information, see Overview of Configuration Files in Healthwatch in Configuration File Reference Guide, Configuring the Grafana Configuration File in Configuration File Reference Guide, and the Grafana documentation.
To configure the Grafana pane:
Select Grafana.
(Optional) If you configured generic OAuth or UAA authentication for users to log in to the Grafana UI, or if you configured alerts through Alertmanager, enter a URL for the Grafana UI in Grafana root URL. You must configure this URL to allow a generic OAuth provider or UAA to redirect users to the Grafana UI. Alertmanager also uses this URL to generate links to the Grafana UI in alert messages.
Note: Healthwatch v2.1 does not automatically assign a default root URL to the Grafana UI. You must manually configure a root URL for the Grafana UI in the Grafana root URL field.
After you deploy the Healthwatch tile for the first time, you must use this root URL and the public IP address of either a single Grafana VM or the load balancer associated with your Grafana instance to configure a DNS entry for the Grafana instance in the console for your IaaS. Your Grafana instance listens on either port 443 or 80, depending on whether you provide a TLS certificate in the Certificate and private key for HTTPS fields below. For more information about configuring DNS for the Grafana instance, see Configuring DNS for the Grafana Instance.
Under HTTP and HTTPS proxy request settings, choose whether to allow the Grafana instance to make HTTP and HTTPS requests through proxy servers:
Include *.bosh and the range of your internal network IP addresses so the Grafana instance can still access the Prometheus instance without going through the proxy server. For example, *.bosh,10.0.0.0/8,*.example.com allows the Grafana instance to access all BOSH DNS addresses and all internal network IP addresses containing 10.0.0.0/8 or *.example.com directly, without going through the proxy server.
Note: You only need to configure proxy settings if you are deploying Healthwatch in an air-gapped environment and want to configure alert channels to external addresses, such as the external Slack webhook.
(Optional) For Static IP addresses for Grafana VMs, enter a comma-separated list of valid static IP addresses that you want to reserve for the Grafana instance. These IP addresses must not be within the reserved IP ranges you configured in the BOSH Director tile.
(Optional) To prevent users from logging in to the Grafana UI with basic authentication, including admin users, deactivate the Allow Grafana UI login with basic authentication checkbox. This checkbox is activated by default.
Under Runtime discovery, select how you want the Grafana instance to discover the runtimes installed on your Ops Manager foundations:
Note: If you are using Pivotal Application Service (PAS) v2.7, you must allow PAS v2.7 to forward system metrics to the Loggregator Firehose, or no metrics appear in the Router dashboard in the Grafana UI. For more information, see Troubleshooting Missing Router Metrics in Troubleshooting Healthwatch.
(Optional) If you want the Grafana instance to create a dashboard for metrics from the VMware Tanzu SQL with MySQL for VMs (Tanzu SQL for VMs) tile, activate the Create Tanzu SQL for VMs dashboard checkbox. This checkbox is deactivated by default.
(Optional) If you want the Grafana instance to create dashboards for metrics from the VMware Tanzu RabbitMQ for VMs (Tanzu RabbitMQ) tile, activate the Create Tanzu RabbitMQ dashboards checkbox. This checkbox is deactivated by default.
Note: If you activate the Create Tanzu RabbitMQ dashboards checkbox, set the Metrics polling interval field in the Tanzu RabbitMQ tile to -1. This prevents the Tanzu RabbitMQ tile from sending duplicate metrics to the Loggregator Firehose. To configure this field, see the Tanzu RabbitMQ documentation.
(Optional) To allow HTTPS connections to one or more Grafana instances, you must provide a certificate and private key for the Grafana instance to use for TLS connections in Certificate and private key for HTTPS.
VMware recommends also providing a certificate signed by a third-party CA in CA certificate for HTTPS. You can generate a self-signed certificate using the Ops Manager root CA, but doing so causes your browser to warn you that your CA is invalid every time you access the Grafana UI.
The certificate that you provide or generate must cover *.DOMAIN, where DOMAIN is the domain of the DNS entry that you configured for the Grafana instance. For example, if the DNS entry you configured for the Grafana instance is grafana.example.com, enter *.example.com. For more information about configuring a DNS entry for the Grafana instance, see Configuring DNS for the Grafana Instance.
(Optional) To configure an additional cipher suite for TLS connections to the Grafana instance, enter a comma-separated list of ciphers in Additional ciphers for TLS. For a list of supported cipher suites, see cipher_suites.go in the Go repository on GitHub.
Under Grafana UI authentication method, select the user authentication method you want to configure for users to log in to the Grafana UI:
Under Grafana email alerts, choose whether to allow email alerts from the Grafana UI.
Click Save.
In the Canary URLs pane, you configure target URLs to which the Blackbox Exporters in the Prometheus instance send canary tests. Testing a canary target URL allows you to gauge the overall health and accessibility of an app, runtime, or deployment.
The Canary URLs pane configures the Blackbox Exporters in the Prometheus instance. For more information, see the Blackbox exporter repository on GitHub.
To configure the Canary URLs pane:
Select Canary URLs.
For Port, specify the port that the Blackbox Exporter exposes to the Prometheus instance. The default port is 9115. You do not need to specify a different port unless port 9115 is already in use on the Prometheus instance.
(Optional) For Ops Manager FQDN, enter https://OPS-MANAGER-FQDN/api/v0/info, where OPS-MANAGER-FQDN is the fully-qualified domain name (FQDN) of your Ops Manager deployment. This creates a canary target URL that allows the Blackbox Exporter to test whether the Ops Manager Installation Dashboard is accessible. The results from these canary tests appear in the Ops Manager Health dashboard in the Grafana UI.
(Optional) If your Ops Manager deployment uses self-signed certificates, activate the Skip TLS certificate verification checkbox. When this checkbox is activated, the Prometheus instance does not verify the identity of your Ops Manager deployment. This checkbox is deactivated by default.
(Optional) Under Target URLs, you can configure canary target URLs. The Prometheus instance runs continuous canary tests to these URLs and records the results. To configure canary target URLs:
For example, enter apps.sys.FOUNDATION-URL if you have TAS for VMs installed, or api.pks.FOUNDATION-URL:8443 if you have TKGI installed, where FOUNDATION-URL is the root URL of your Ops Manager foundation.
Note: The Prometheus instance automatically creates scrape jobs for these URLs, as illustrated in the sketch below. You do not need to create additional scrape jobs for them in the Prometheus pane.
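For illustration only, canary test scrape jobs typically follow the standard Blackbox Exporter pattern, in which the Prometheus instance scrapes the exporter's /probe endpoint and passes the target URL as a parameter. The following sketch is not the exact configuration that Healthwatch generates; the job name and target are placeholders, and 9115 is the default Blackbox Exporter port described above:
job_name: canary-example
metrics_path: /probe
params:
  module: [http_2xx]
static_configs:
- targets:
  - "https://apps.sys.example.com"
relabel_configs:
- source_labels: [__address__]
  target_label: __param_target
- source_labels: [__param_target]
  target_label: instance
- target_label: __address__
  replacement: "127.0.0.1:9115"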
Click Save.
In the Remote Write pane, you can configure the Prometheus instance to write to remote storage, in addition to its local time series database (TSDB). Healthwatch stores monitoring data for six weeks before deleting it. Configuring remote write allows Healthwatch to store data that is older than six weeks in a remote database or storage endpoint. For a list of compatible remote databases and storage endpoints, see the Prometheus documentation at https://prometheus.io/docs/operating/integrations/#remote-endpoints-and-storage.
The values that you configure in the Remote Write pane also configure their corresponding properties in the Prometheus configuration file. For more information, see Overview of Configuration Files in Healthwatch in Configuration File Reference Guide, Remote Write in Configuration File Reference Guide, and the Prometheus documentation.
To configure the Remote Write pane:
Select Remote Write.
Under Remote write destinations, click Add.
For Remote storage URL, enter the URL for your remote storage endpoint. For example, https://REMOTE-STORAGE-FQDN, where REMOTE-STORAGE-FQDN is the FQDN of your remote storage endpoint.
In Remote timeout, enter in seconds the amount of time that the Prometheus VM tries to make a request to your remote storage endpoint before the request fails.
If your remote storage endpoint requires a username and password to log in to it, configure the following fields:
(Optional) To allow the Prometheus instance to communicate with the server for your remote storage endpoint over TLS:
(Optional) To allow the Prometheus instance to make HTTP or HTTPS requests to your remote storage endpoint through a proxy server, enter the URL for your proxy server in Proxy URL.
You can configure more granular settings for writing to your remote storage endpoint by specifying additional parameters for the shards containing in-memory queues that read from the write-ahead log in the Prometheus instance. To configure additional parameters for these shards:
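Together, the fields in this pane correspond to a remote_write entry in the Prometheus configuration file. The following is a rough sketch of such an entry with placeholder values; the queue_config block corresponds to the shard parameters described above, and the exact values that Healthwatch writes depend on what you configure in this pane:
remote_write:
- url: https://remote-storage.example.com/api/v1/write
  remote_timeout: 30s
  basic_auth:
    username: remote-write-user
    password: example-password
  tls_config:
    ca_file: /path/to/ca.pem
  proxy_url: https://proxy.example.com:3128
  queue_config:
    capacity: 2500
    max_shards: 200
    max_samples_per_send: 500
    batch_send_deadline: 5s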
Click Save.
In the TKGI Cluster Discovery pane, you configure TKGI cluster discovery for Healthwatch. You only need to configure this pane if you have Ops Manager foundations with TKGI installed.
To configure TKGI cluster discovery, see Configuring TKGI Cluster Discovery.
Errands are scripts that Ops Manager runs automatically when it installs or uninstalls a product, such as a new version of Healthwatch. There are two types of errands: post-deploy errands run after the product is installed, and pre-delete errands run before the product is uninstalled. However, there are no pre-delete errands for Healthwatch.
By default, Ops Manager always runs all errands.
In the Errands pane, you can select On to always run an errand or Off to never run it.
For more information about how Ops Manager manages errands, see the Ops Manager documentation.
To configure the Errands pane:
Select Errands.
(Optional) Choose whether to always run or never run the following errands:
Click Save.
In the Syslog pane, you can configure system logging in Healthwatch to forward log messages from Healthwatch component VMs to an external destination for troubleshooting, such as a remote server or external syslog aggregation service.
To configure the Syslog pane:
Select Syslog.
Under Do you want to configure Syslog forwarding?, select one of the following options:
For Address, enter the IP address or DNS domain name of your external destination.
For Port, enter a port on which your external destination listens.
For Transport Protocol, select TCP or UDP from the dropdown. This determines which transport protocol Healthwatch uses to forward system logs to your external destination.
(Optional) To transmit logs over TLS:
(Optional) For Queue Size, specify the number of log messages Healthwatch can hold in a buffer at a time before sending them to your external destination. The default value is 100000.
(Optional) To forward debug logs to your external destination, activate the Forward Debug Logs checkbox. This checkbox is deactivated by default.
(Optional) To specify a custom syslog rule, enter it in Custom rsyslog configuration in RainerScript syntax. For more information about custom syslog rules, see the TAS for VMs documentation. For more information about RainerScript syntax, see the rsyslog documentation.
Click Save Syslog Settings.
In the Resource Config pane, you can scale Healthwatch component VMs up or down according to the needs of your deployment, as well as associate load balancers with a group of VMs. For example, you can scale the persistent disk size of the Prometheus instance to allow longer data retention.
To configure the Resource Config pane:
Select Resource Config.
(Optional) To scale a job, select an option from the dropdown for the resource you want to modify:
(Optional) To add a load balancer to a job:
Click Save.
To complete your installation of the Healthwatch tile:
Return to the Ops Manager Installation Dashboard.
Click Review Pending Changes.
Click Apply Changes.
For more information, see the Ops Manager documentation.
After you have successfully installed the Healthwatch tile, continue to one of the following topics to configure and deploy the Healthwatch Exporter tiles for the Ops Manager foundations you want to monitor:
If you have TAS for VMs installed on an Ops Manager foundation you want to monitor, see Configuring Healthwatch Exporter for TAS for VMs.
If you have TKGI installed on an Ops Manager foundation you want to monitor, see Configuring Healthwatch Exporter for TKGI.