Configuring Federation for Multi-Foundation Monitoring

This topic describes how to configure federation for your multi-foundation Healthwatch™ for VMware Tanzu® (Healthwatch) deployment.

Overview of Federation

When you configure your Healthwatch deployment to federate metrics, the Prometheus instance in the Healthwatch tile on a monitoring VMware Tanzu® Operations Manager™ (Ops Manager) foundation scrapes a subset of metrics from the Prometheus instances in the Healthwatch tiles installed on the Ops Manager foundations you monitor. This is useful if you want to monitor a subset of metrics from multiple Ops Manager foundations without storing all metrics from those Ops Manager foundations in a single Prometheus instance. Because federation allows you to choose which metrics the Healthwatch deployment on your monitoring Ops Manager foundation receives, you can monitor a large number of Ops Manager foundations without overwhelming the Prometheus instance in the Healthwatch deployment on your monitoring Ops Manager foundation.

To configure federation for your Healthwatch deployment, you must install the Healthwatch tile on your monitoring Ops Manager foundation and on each Ops Manager foundation you want to monitor. You must also install the Healthwatch Exporter tile on each Ops Manager foundation you want to monitor. Then, you must configure the Healthwatch tile on your monitoring Ops Manager foundation to federate metrics from the Prometheus instances in the Healthwatch tiles on the Ops Manager foundations you want to monitor. If you want to federate metrics from Ops Manager foundations with TKGI installed, you must also configure TKGI cluster discovery on the Ops Manager foundations you want to monitor.

To configure federation for your multi-foundation Healthwatch deployment:

1. Set up your multi-foundation deployment for federation by following the procedure in the section for your runtime:
   - Set Up Your Multi-Foundation TAS for VMs Deployment
   - Set Up Your Multi-Foundation TKGI Deployment
2. Configure scrape jobs for the Prometheus instances in the Healthwatch tiles on the Ops Manager foundations you want to monitor. To configure these scrape jobs, see Configure Scrape Jobs below.
3. Test your federation configuration to see whether it is working correctly. To test your federation configuration, see Test Your Federation Configuration below.

If your multi-foundation Healthwatch deployment contains one or more highly available (HA) Healthwatch deployments, see Federation for a Highly Available Healthwatch Deployment below.

For more information about federation, see the Prometheus documentation.

Caution: Federating all metrics from an Ops Manager foundation you monitor negatively affects the performance of the Prometheus instance in the Healthwatch tile installed on your monitoring Ops Manager foundation, sometimes even causing it to crash. To avoid this, VMware recommends federating only certain metrics, such as service level indicator (SLI) metrics, from each Ops Manager foundation you monitor. For more information about the metrics you can collect, see Healthwatch Metrics.
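As a rough way to preview which metric names a match[] selector would keep before you federate them, the sketch below applies the example regular expression used later in this topic to a list of names. The metric names are hypothetical placeholders, not real Healthwatch metrics. Prometheus anchors selector regular expressions at both ends, which re.fullmatch imitates here, but Python's re module only approximates the RE2 syntax that Prometheus uses, so treat the result as an estimate.

```python
import re


def select_metric_names(pattern, names):
    """Return the names that the given selector regex would match.

    re.fullmatch mimics the full anchoring that Prometheus applies to
    regular expressions in label matchers such as __name__=~"...".
    """
    compiled = re.compile(pattern)
    return [name for name in names if compiled.fullmatch(name)]


# Hypothetical metric names, for illustration only.
sample_names = [
    "metric_name_regex_latency",
    "metric_name_regex_errors",
    "unrelated_metric",
]

selected = select_metric_names(r"^metric_name_regex.*", sample_names)
print(selected)
```

A narrow pattern that keeps only the metrics you need is one way to follow the recommendation above to avoid federating everything.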

Set Up Your Multi-Foundation TAS for VMs Deployment

To configure TAS for VMs deployments on multiple Ops Manager foundations to federate metrics to a single monitoring Ops Manager foundation:

1. Install and configure the Healthwatch and Healthwatch Exporter for TAS for VMs tiles on each Ops Manager foundation you want to monitor. For instructions, see the installation and configuration topics for those tiles.
2. Install and configure the Healthwatch tile on your monitoring Ops Manager foundation. To install and configure the Healthwatch tile, see the following topics:
   - Installing a Tile Manually or Installing, Configuring, and Deploying a Tile Through an Automated Pipeline
   - Configuring Healthwatch
3. In the Healthwatch tile on your monitoring Ops Manager foundation, configure scrape jobs for the Prometheus instances in the Healthwatch tiles on the Ops Manager foundations you want to monitor. To configure these scrape jobs, see Configure Scrape Jobs below.

Set Up Your Multi-Foundation TKGI Deployment

When you install the Healthwatch tile on an Ops Manager foundation that has TKGI installed, you can configure the Prometheus instance to detect on-demand Kubernetes clusters created through the TKGI API and create scrape jobs for them. However, the Prometheus instance in a Healthwatch deployment can only detect Kubernetes clusters for TKGI deployments on the same Ops Manager foundation.

For a Healthwatch deployment on one Ops Manager foundation to receive metrics for Kubernetes clusters created through TKGI deployments on other Ops Manager foundations, you must configure the Healthwatch Exporter for TKGI deployment on those Ops Manager foundations to federate metrics to the Prometheus instance in the Healthwatch deployment on the Ops Manager foundation you use to monitor the other Ops Manager foundations. If you do not configure federation for TKGI deployments on the Ops Manager foundations you want to monitor, the Healthwatch Exporter for TKGI deployments on those Ops Manager foundations can only send component metrics and SLIs related to the health of those TKGI deployments.

To configure TKGI deployments on multiple Ops Manager foundations to federate metrics to a single monitoring Ops Manager foundation:

1. Install and configure the Healthwatch and Healthwatch Exporter for TKGI tiles on each Ops Manager foundation you want to monitor. For instructions, see the installation and configuration topics for those tiles.
2. Install and configure the Healthwatch tile on your monitoring Ops Manager foundation. To install and configure the Healthwatch tile, see the following topics:
   - Installing a Tile Manually or Installing, Configuring, and Deploying a Tile Through an Automated Pipeline
   - Configuring Healthwatch
3. Configure TKGI cluster discovery in the Healthwatch tile on each Ops Manager foundation you want to monitor. Do not configure TKGI cluster discovery in the Healthwatch tile on your monitoring Ops Manager foundation. To configure TKGI cluster discovery on the Ops Manager foundations you want to monitor, see Configuring TKGI Cluster Discovery.
4. In the Healthwatch tile on your monitoring Ops Manager foundation, configure scrape jobs for the Prometheus instances in the Healthwatch tiles on the Ops Manager foundations you want to monitor. To configure these scrape jobs, see Configure Scrape Jobs below.

Configure Scrape Jobs

To configure the Prometheus instance in the Healthwatch tile on your monitoring Ops Manager foundation to scrape metrics from the Prometheus instances in the Healthwatch tiles on the Ops Manager foundations you want to monitor:

For each Ops Manager foundation you want to monitor, open port 4450 for the Prometheus instance in the Healthwatch tile in the user console for your IaaS. For more information, see the documentation for your IaaS.
For each Ops Manager foundation you want to monitor:
1. Navigate to the Ops Manager Installation Dashboard for the Ops Manager foundation you want to monitor.
2. Click the Healthwatch tile.
3. Select the Credentials tab.
4. In the Promxy Client Mtls row of the TSDB section, click Link to Credential.
5. Record the values of private_key_pem and cert_pem. These values are the private key and certificate for Promxy Client mTLS.
  
  Note: The values of private_key_pem and cert_pem are in JSON format and contain several \n markers. Ensure that you convert all \n markers into newlines before you use these values in an upcoming step.
6. Retrieve the certificate for the Ops Manager root certificate authority (CA) of the Ops Manager foundation you want to monitor. For more information, see the Ops Manager documentation.
7. Navigate to the Ops Manager Installation Dashboard for your monitoring Ops Manager foundation.
8. Click the Healthwatch tile.
9. Select Prometheus.
10. Under Additional scrape jobs, click Add.
11. For Scrape job configuration parameters, provide in YAML format the configuration parameters for a scrape job for the Prometheus instance in the Healthwatch tile on the Ops Manager foundation you want to monitor. In the example below, the scrape job federates all metrics with names that match the regular expression ^metric_name_regex.* from the Prometheus VMs at the addresses listed under the targets property:
```
job_name: example-job-name
scheme: https
metrics_path: '/federate'
params:
  'match[]':
    - '{__name__=~"^metric_name_regex.*"}'
static_configs:
  - targets:
    - 'source-tsdb-1:4450'
    - 'source-tsdb-2:4450'
```
  Note: If you have configured a load balancer or DNS entry for the Prometheus instance, include the IP address for your load balancer or DNS entry in each target listed under the targets property instead of the IP address for the Prometheus instance.
12. For Certificate and private key for TLS, enter the certificate and private key that you recorded in a previous step from the Promxy Client Mtls row of the Credentials tab in the Healthwatch tile on the Ops Manager foundation you want to monitor.
13. For CA certificate for TLS, enter the Ops Manager root CA certificate that you retrieved in a previous step for the Ops Manager foundation you want to monitor.
14. For Target server name, enter promxy.
15. Click Save.
  
  If you are using the om CLI to configure the Healthwatch tile, the example below shows how you would enter the example configuration parameters above in an automation script:
```
product-properties:
  .properties.scrape_configs:
    value:
      - ca: |
          -----BEGIN CERTIFICATE-----
          SECRET
          -----END CERTIFICATE-----
        scrape_job: |
          job_name: example-job-name
          scheme: https
          metrics_path: '/federate'
          params:
            'match[]':
              - '{__name__=~"^metric_name_regex.*"}'
          static_configs:
            - targets:
              - 'source-tsdb-1:4450'
              - 'source-tsdb-2:4450'
        server_name: promxy
        tls_certificates:
          cert_pem: |
            -----BEGIN CERTIFICATE-----
            SECRET
            -----END CERTIFICATE-----
          private_key_pem: |
            -----BEGIN RSA PRIVATE KEY-----
            SECRET
            -----END RSA PRIVATE KEY-----
```
  For more information, see Configure and Deploy Your Tile Using the om CLI in Installing, Configuring, and Deploying a Tile Through an Automated Pipeline.

For more information about configuring scrape jobs, see Configure Prometheus in Configuring Healthwatch and the Prometheus documentation.
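The conversion of \n markers described in the note on recording private_key_pem and cert_pem can be scripted. The following is a minimal sketch, not an official tool. It assumes the credential is available as JSON text with the two PEM values nested under credential.value; that nesting, the function names, and the SECRET placeholders are assumptions for illustration, so adjust the lookup to match the JSON you actually see.

```python
import json


def normalize_pem(text):
    """Replace any literal backslash-n markers with real newlines."""
    return text.replace("\\n", "\n").strip() + "\n"


def pem_fields_from_credential(raw_json):
    """Return cert_pem and private_key_pem as multi-line PEM text.

    json.loads already decodes the \\n escapes inside JSON strings;
    normalize_pem also covers values that were copied by hand and
    still contain the two-character marker.
    """
    document = json.loads(raw_json)
    # Assumed layout: {"credential": {"value": {...}}}
    value = document["credential"]["value"]
    return {
        key: normalize_pem(value[key])
        for key in ("cert_pem", "private_key_pem")
    }


# Placeholder credential; SECRET stands in for the real key material.
raw = r'''{"credential": {"value": {
  "cert_pem": "-----BEGIN CERTIFICATE-----\nSECRET\n-----END CERTIFICATE-----\n",
  "private_key_pem": "-----BEGIN RSA PRIVATE KEY-----\nSECRET\n-----END RSA PRIVATE KEY-----\n"
}}}'''

pems = pem_fields_from_credential(raw)
print(pems["cert_pem"], end="")
```

The resulting strings contain real line breaks, so you can paste them into the Certificate and private key for TLS fields or into the block scalars of an om configuration file.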

After you have finished configuring federation for your Healthwatch deployment, you can confirm that your federation configuration is working correctly using the Grafana UI. For more information, see Test Your Federation Configuration below.

Test Your Federation Configuration

To confirm that your federation configuration is working correctly:

1. In your web browser, navigate to the Grafana UI.
2. Log in to the Grafana UI.
3. On the left side of the Grafana UI homepage, click the Explore icon. An empty Explore tab appears.
4. In the query field to the right of the Metrics browser menu tab, enter up.
5. Click Run query.
6. Under Table, review the query results. If your federation configuration is working, the job column includes the job_name from the scrape jobs you configured for each Ops Manager foundation you monitor in Configure Scrape Jobs above.
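If you prefer to script this check rather than read the table by eye, the sketch below shows one possible approach. It assumes you have the result of the up query as JSON in the format returned by the Prometheus HTTP API query endpoint, which is not part of the Grafana procedure above. The function name and the second job name are hypothetical.

```python
def missing_federation_jobs(query_response, expected_jobs):
    """Return expected job names that have no series with up == 1."""
    results = query_response.get("data", {}).get("result", [])
    healthy_jobs = {
        series["metric"].get("job")
        for series in results
        # Instant-query samples are [timestamp, "value-as-string"].
        if series["value"][1] == "1"
    }
    return set(expected_jobs) - healthy_jobs


# Sample response shaped like a Prometheus instant query for `up`.
sample_response = {
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {
                "metric": {"__name__": "up", "job": "example-job-name",
                           "instance": "source-tsdb-1:4450"},
                "value": [1700000000.0, "1"],
            },
        ],
    },
}

missing = missing_federation_jobs(
    sample_response, {"example-job-name", "second-foundation-job"}
)
print(sorted(missing))
```

An empty result means every expected scrape job reported at least one healthy target. Any names that remain are jobs to investigate.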

Federation for a Highly Available Healthwatch Deployment

In an HA Healthwatch deployment, each VM in the Prometheus instance in the Healthwatch tile scrapes the same data from the metric exporter VMs that the Healthwatch Exporter tiles deploy.

When federating metrics, you can configure the Prometheus instance in the Healthwatch tile on your monitoring Ops Manager foundation to scrape both copies of that data from the Prometheus instance in the Healthwatch tile on each Ops Manager foundation you monitor. To do this, include both VMs in each Prometheus instance from the Ops Manager foundations you want to monitor in the scrape job configuration parameters. While including both VMs creates duplicate sets of metrics, it also ensures that you do not lose metric data if one of the two VMs goes down. However, doubling the number of metrics that the Prometheus instance collects also negatively affects the performance of the Prometheus instance.
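To illustrate listing both VMs in one scrape job, the sketch below renders scrape job parameters in the same shape as the example in Configure Scrape Jobs. The helper name is hypothetical, the host names are the placeholders from that example, and the default port follows the port 4450 mentioned in this topic. It does no escaping, so it is only suitable for simple job names and patterns that contain no quote characters.

```python
def federation_scrape_job(job_name, metric_regex, hosts, port=4450):
    """Render federation scrape job parameters as YAML text."""
    lines = [
        f"job_name: {job_name}",
        "scheme: https",
        "metrics_path: '/federate'",
        "params:",
        "  'match[]':",
        f"    - '{{__name__=~\"{metric_regex}\"}}'",
        "static_configs:",
        "  - targets:",
    ]
    # One target entry per Prometheus VM on the monitored foundation.
    lines += [f"    - '{host}:{port}'" for host in hosts]
    return "\n".join(lines) + "\n"


job_yaml = federation_scrape_job(
    "example-job-name",
    "^metric_name_regex.*",
    ["source-tsdb-1", "source-tsdb-2"],
)
print(job_yaml, end="")
```

Passing a single load balancer or DNS name in hosts instead of two VM addresses produces the alternative configuration, with one target per monitored foundation.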

Alternatively, you can create load balancers or DNS entries in your IaaS user console for the Prometheus instances on each Ops Manager foundation you monitor, then include the IP addresses for each load balancer or DNS entry in the targets listed under the targets property in your scrape job configuration parameters. For more information, see Configure Scrape Jobs above.

In both cases, VMware recommends configuring static IP addresses for both VMs in each of the Prometheus instances. For more information about configuring static IP addresses for Prometheus instances, see Configure Prometheus in Configuring Healthwatch.