This topic explains how to configure alerting in Healthwatch™ for VMware Tanzu® (Healthwatch).
In Healthwatch, you can configure the Prometheus instance to send alerts to Alertmanager according to alerting rules you configure. Alertmanager then manages those alerts by removing duplicate alerts, grouping alerts together, and routing those groups to alert receiver integrations such as email, PagerDuty, or Slack. Alertmanager also silences and inhibits alerts according to the alerting rules you configure.
For more information, see the Prometheus documentation.
In the Alertmanager pane, you configure alerting rules, routing rules, and alert receivers for Alertmanager to use.
The values that you configure in the Alertmanager pane also configure their corresponding properties in the Alertmanager configuration file. For more information, see Overview of Configuration Files in Healthwatch in Configuration File Reference Guide, Configuring the Alertmanager Configuration File in Configuration File Reference Guide, and the Prometheus documentation.
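For orientation, the skeleton below sketches how the top-level sections of a standard Alertmanager configuration file line up with the fields in this pane. It is an illustration of the file structure only, not output generated by Healthwatch:
route:            # populated from the Routing rules field
  receiver: 'example-receiver'
receivers:        # populated from the alert receiver fields described below
- name: 'example-receiver'
inhibit_rules: [] # populated from the Inhibit rules field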
To configure alerting through the Alertmanager pane:
Navigate to the Ops Manager Installation Dashboard.
Click the Healthwatch tile.
Select Alertmanager.
For Alerting rules, provide in YAML format the rule statements that define which alerts Alertmanager sends to your alert receivers. In your rule statements, replace OPS_MANAGER_URL with the fully-qualified domain name (FQDN) of your Ops Manager deployment.
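For illustration, a minimal rule group might look like the following sketch. The group name, alert name, and expression are hypothetical, and the sketch assumes that the field accepts standard Prometheus rule-group YAML and that a scrape target for Ops Manager exposes the up metric:
groups:
- name: example-group
  rules:
  - alert: OpsManagerUnreachable
    expr: up{instance="OPS_MANAGER_URL"} == 0
    for: 5m
    labels:
      severity: critical
    annotations:
      summary: 'Ops Manager has been unreachable for 5 minutes'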
For Routing rules, provide in YAML format the route block that defines where Alertmanager sends alerts, how frequently Alertmanager sends alerts, and how Alertmanager groups alerts together. The following example shows a possible set of routing rules:
receiver: 'example-receiver'
group_wait: 30s
group_interval: 5m
repeat_interval: 4h
group_by: [cluster, alertname]
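A route block can also contain nested routes that send matching alerts to a different receiver. The following sketch assumes a second alert receiver named example-pagerduty has been configured; the severity label value is hypothetical:
receiver: 'example-receiver'
group_by: [cluster, alertname]
routes:
- match:
    severity: critical
  receiver: 'example-pagerduty'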
group_by gathers all alerts with the same label into a single alert. For example, including cluster in the group_by property groups together all alerts from the same cluster. You can see the labels for the metrics that Healthwatch collects, such as cluster, index, deployment, and origin, within the braces at the end of each metric.
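For example, a metric that Healthwatch collects might look like the following line, where the metric name and label values are hypothetical:
example_vm_health{cluster="cf", deployment="cf-abc123", index="0", origin="example-agent"} 1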
The Routing rules field accepts all route configuration parameters. For more information about the parameters you must provide, see the Prometheus documentation.
(Optional) For Inhibit rules, provide in YAML format the rule statements that define which alerts Alertmanager does not send to your alert receivers. For more information, see the Prometheus documentation.
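A minimal sketch of an inhibit rule follows. It assumes that the field accepts a list of standard Alertmanager inhibit rule blocks and that your alerting rules attach a severity label; here, warning-level alerts are suppressed while a critical alert with the same alertname and cluster labels is firing:
- source_match:
    severity: 'critical'
  target_match:
    severity: 'warning'
  equal: ['alertname', 'cluster']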
Configure the alert receivers that you specified in Routing rules in a previous step. For more information, see Configure Alert Receivers below.
You can configure email, PagerDuty, Slack, and webhook alert receivers in the Healthwatch tile. For more information, see the Prometheus documentation.
You can also configure custom alert receiver integrations that are not natively supported by Alertmanager through webhook receivers. For more information about configuring custom alert receiver integrations, see the Prometheus documentation.
If you configure two or more alert receivers with the same name, Alertmanager merges them into a single alert receiver. For more information, see Combining Alert Receivers below.
The following sections describe how to configure each type of alert receiver:
Note: If you want to provide authentication and TLS communication settings for your alert receivers, you must provide them in the associated alert receiver configuration fields described in the sections below. If the base configuration YAML for your alert receivers includes fields for authentication and TLS communication settings, do not include them when you provide the configuration YAML in the Alert receiver configuration parameters fields.
To configure an email alert receiver:
Under Email alert receivers, click Add.
For Alert receiver name, enter the name you want to give your email alert receiver. The name you enter in this field must match the name you specified in the route block you entered in the Routing rules field in Configure Alerting above.
For Alert receiver configuration parameters, provide the configuration parameters for your email alert receiver in YAML format. Do not prefix the YAML with a dash. The following example shows a set of required configuration parameters:
to: '[email protected]'
from: example.healthwatch.foundation.com
smarthost: smtp.example.org:587
At minimum, your configuration parameters must include the to, from, and smarthost properties. The other properties you must include depend on both the SMTP server for which you are configuring an alert receiver and the needs of your Ops Manager foundation. For more information about the properties you can include in this configuration, see the Prometheus documentation.
If you do not configure the html and headers properties or leave them blank, Healthwatch automatically populates them with a default template. To view the default email template for Healthwatch, see email_template.yml on GitHub.
Do not include the auth_password property, the auth_secret property, or the <tls_config> section in the configuration parameters for your email alert receiver. You can configure these properties in the next steps of this procedure.
(Optional) To configure SMTP authentication between Alertmanager and your email alert receiver, configure one of the following fields:
(Optional) To allow Alertmanager to communicate with your email alert receiver over TLS, configure the following fields:
To configure a PagerDuty alert receiver:
Under PagerDuty alert receivers, click Add.
For Alert receiver name, enter the name you want to give your PagerDuty alert receiver. The name you enter in this field must match the name you specified in the route block you entered in the Routing rules field in Configure Alerting above.
For Alert receiver configuration parameters, provide the configuration parameters for your PagerDuty alert receiver in YAML format. Do not prefix the YAML with a dash. The following example shows a possible set of configuration parameters:
url: https://api.pagerduty.com/api/v2/alerts
client: '{{ template "pagerduty.example.client" . }}'
client_url: '{{ template "pagerduty.example.clientURL" . }}'
description: '{{ template "pagerduty.example.description" . }}'
severity: 'error'
The properties you must include depend on both the PagerDuty instance for which you are configuring an alert receiver and the needs of your Ops Manager foundation. For more information about the properties you can include in this configuration, see the Prometheus documentation.
If you do not configure the description, details, or links properties or leave them blank, Healthwatch automatically populates them with a default template. To view the default PagerDuty template for Healthwatch, see pagerduty_template.yml on GitHub.
Do not include the routing_key property, the service_key property, the <http_config> section, or the <tls_config> section in the configuration parameters for your PagerDuty alert receiver. You can configure these properties in the next steps of this procedure.
Enter your PagerDuty integration key in one of the following fields:
(Optional) To configure an HTTP client for Alertmanager to use to communicate with the PagerDuty API, configure one of the following options:
(Optional) To allow Alertmanager to communicate with your PagerDuty alert receiver over TLS, configure the following fields:
To configure a Slack alert receiver:
Under Slack alert receivers, click Add.
For Alert receiver name, enter the name you want to give your Slack alert receiver. The name you enter in this field must match the name you specified in the route block you entered in the Routing rules field in Configure Alerting above.
(Optional) For Alert receiver configuration parameters, provide the configuration parameters for your Slack alert receiver in YAML format. Do not prefix the YAML with a dash. The following example shows a possible set of configuration parameters:
channel: '#operators'
username: 'Example Alerting Integration'
The properties you must include depend on both the Slack instance for which you are configuring an alert receiver and the needs of your Ops Manager foundation. For more information about the properties you can include in this configuration, see the Prometheus documentation.
If you do not configure the title, title_link, or text properties or leave them blank, Healthwatch automatically populates them with a default template. To view the default Slack template for Healthwatch, see slack_template.yml on GitHub.
Do not include the api_url property, the api_url_file property, the <http_config> section, or the <tls_config> section in the configuration parameters for your Slack alert receiver. You can configure these properties in the next steps of this procedure.
For Slack API URL, enter the webhook URL for your Slack instance from your Slack app directory.
(Optional) To configure an HTTP client for Alertmanager to use to communicate with the server for your Slack instance, configure one of the following options:
(Optional) To allow Alertmanager to communicate with your Slack alert receiver over TLS, configure the following fields:
To configure a webhook alert receiver:
Under Webhook alert receivers, click Add.
For Alert receiver name, enter the name you want to give your webhook alert receiver. The name you enter in this field must match the name you specified in the route block you entered in the Routing rules field in Configure Alerting above.
For Alert receiver configuration parameters, provide the configuration parameters for your webhook alert receiver in YAML format. Do not prefix the YAML with a dash. The following example shows a possible set of configuration parameters:
url: https://example.com/data/12345
max_alerts: 0
The properties you must include depend on both the webhook for which you are configuring an alert receiver and the needs of your Ops Manager foundation. For more information about the properties you can include in this configuration, see the Prometheus documentation.
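For reference when building the server that processes your webhook, Alertmanager POSTs alerts as a JSON payload. The following trimmed sketch shows the general shape of that payload as described in the Prometheus documentation; the alert names and values are hypothetical:
{
  "version": "4",
  "status": "firing",
  "receiver": "example-receiver",
  "alerts": [
    {
      "status": "firing",
      "labels": { "alertname": "ExampleAlert", "severity": "critical" },
      "annotations": { "summary": "An example summary" },
      "startsAt": "2023-01-01T00:00:00Z"
    }
  ]
}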
Do not include the <http_config> section or the <tls_config> section in the configuration parameters for your webhook alert receiver. You can configure these properties in the next steps of this procedure.
(Optional) To configure an HTTP client for Alertmanager to use to communicate with the server that processes your webhook, configure one of the following options:
(Optional) To allow Alertmanager to communicate with your webhook alert receiver over TLS, configure the following fields:
Click Save.
If you configure two or more alert receivers with the same name, Alertmanager merges them into a single alert receiver. For example, if you configure:
Two email receivers named “Foundation” with distinct email addresses
One PagerDuty receiver named “Foundation”
One email receiver named “Clusters”
Then Alertmanager merges them into the following alert receivers:
One alert receiver named “Foundation” containing two email configurations and a PagerDuty configuration
One alert receiver named “Clusters” containing one email configuration
The example below shows how Alertmanager combines the alert receivers described above in its configuration file:
receivers:
- name: 'Foundation'
email_configs:
- to: '[email protected]'
from: example.healthwatch.foundation.com
smarthost: smtp.example.org:587
headers: { subject: "[ALERT] - [{{ .ExampleLabels.severity }}] - {{ .ExampleAnnotations.summary }}" }
html: '{{ template "email.example.html" . }}'
text: "This is an alert."
- to: '[email protected]'
from: example.healthwatch.foundation.com
smarthost: smtp.example.org:587
headers: { subject: "[ALERT] - [{{ .ExampleLabels.severity }}] - {{ .ExampleAnnotations.summary }}" }
html: '{{ template "email.example.html" . }}'
text: "This is an alert."
pagerduty_configs:
- url: https://api.pagerduty.com/api/v2/alerts
client: '{{ template "pagerduty.example.client" . }}'
client_url: '{{ template "pagerduty.example.clientURL" . }}'
description: '{{ template "pagerduty.example.description" . }}'
severity: 'error'
- name: 'Clusters'
email_configs:
- to: '[email protected]'
from: example.healthwatch.foundation.com
smarthost: smtp.example.org:587
headers: { subject: "[ALERT] - [{{ .ExampleLabels.severity }}] - {{ .ExampleAnnotations.summary }}" }
html: '{{ template "email.example.html" . }}'
text: "This is an alert."
Alertmanager includes a command-line tool called amtool. You can use amtool to temporarily silence Alertmanager alerts without modifying your alerting rules. For more information about how to use amtool, see the Alertmanager documentation on GitHub.
You can also use the Alertmanager UI to view and silence alerts. To access the Alertmanager UI, see Viewing the Alertmanager UI in Troubleshooting Healthwatch.
To silence alerts using amtool:
SSH into one of the Prometheus VMs deployed by the Healthwatch tile. Alertmanager replicates any changes you make in one Prometheus VM to all other Prometheus VMs. To SSH into one of the Prometheus VMs, see the Ops Manager documentation.
Navigate to the amtool directory by running:
cd /var/vcap/jobs/alertmanager/packages/alertmanager/bin
View all of your currently running alerts by running:
amtool -o extended alert --alertmanager.url http://localhost:10401
This command returns a list of all currently running alerts, with detailed information about each alert, including its name and the Prometheus instance on which it runs.
You can also query the list of alerts by name and instance to view specific alerts.
To query alerts by name, run:
amtool -o extended alert query alertname="ALERT-NAME" --alertmanager.url http://localhost:10401
Where ALERT-NAME is the name of the alert you want to silence. You can query the exact name of the alert, or you can query a partial name and include the regular expression .* to see all alerts that include the partial name, as in the following example:
amtool -o extended alert query alertname=~"Test.*" --alertmanager.url http://localhost:10401
To query alerts by instance, run:
amtool -o extended alert query instance=~".+INSTANCE-NUMBER" --alertmanager.url http://localhost:10401
Where INSTANCE-NUMBER is the number of the Prometheus instance for which you want to silence alerts.
To query alerts by name and instance, run:
amtool -o extended alert query alertname=~"ALERT-NAME" instance=~".+INSTANCE-NUMBER" --alertmanager.url http://localhost:10401
Where:
ALERT-NAME is the name of the alert you want to silence.
INSTANCE-NUMBER is the number of the Prometheus instance for which you want to silence an alert.
Run one of the following commands to silence either a specific alert or all alerts:
To silence a specific alert for a specified amount of time, run:
amtool silence add alertname=ALERT-NAME instance=~".+INSTANCE-NUMBER" -d TIME-TO-SILENCE -c 'COMMENT' --alertmanager.url http://localhost:10401
Where:
ALERT-NAME is the name of the alert you want to silence.
INSTANCE-NUMBER is the number of the Prometheus instance for which you want to silence an alert.
TIME-TO-SILENCE is the amount of time in minutes or hours you want to silence the alert. For example, 30m or 4h.
COMMENT is any notes about this silence you want to add.
To silence all alerts for a specified amount of time, run:
amtool silence add 'alertname=~.+' -d TIME-TO-SILENCE -c 'COMMENT' --alertmanager.url http://localhost:10401
Where:
TIME-TO-SILENCE is the amount of time in minutes or hours you want to silence alerts. For example, 30m or 4h.
COMMENT is any notes about this silence you want to add.
Note: ~.+ is a regular expression that includes all alerts in the silence you set.
To silence an alert indefinitely, run:
amtool silence add ALERT-NAME -c 'COMMENT' --alertmanager.url http://localhost:10401
Where:
ALERT-NAME is the name of the alert you want to silence.
COMMENT is a note about why the alert is being silenced.
Record the ID string from the output. You can use this ID to unmute the alert later by expiring the silence:
amtool silence --alertmanager.url http://localhost:10401 expire SILENCED-ID
Where SILENCED-ID is the ID string you recorded.
To see which alerts are currently silenced, run:
amtool silence query --alertmanager.url http://localhost:10401
For more information, run amtool --help or see the Alertmanager documentation on GitHub.