This topic explains how to deploy Prometheus into a workload cluster. The procedures below apply to vSphere, Amazon Web Services (AWS), and Azure deployments.
Prometheus is an open-source systems monitoring and alerting toolkit. Tanzu Kubernetes Grid includes signed binaries for Prometheus that you can deploy on workload clusters to monitor cluster health and services.
To deploy Prometheus, you need kubectl, as described in Install the Tanzu CLI and Kubernetes CLI for Use with a vSphere with Tanzu Supervisor or Install the Tanzu CLI and Kubernetes CLI for Use with Standalone Management Clusters. If you plan to enable ingress, you also need the cert-manager and contour packages.

Important: Support for IPv6 addresses in Tanzu Kubernetes Grid is limited; see Deploy Clusters on IPv6 (vSphere Only). If you are not deploying to an IPv6-only networking environment, you must provide IPv4 addresses in the following steps.
To prepare the cluster:
Get the admin credentials of the workload cluster into which you want to deploy Prometheus. For example:
tanzu cluster kubeconfig get my-cluster --admin
Set the context of kubectl to the cluster. For example:
kubectl config use-context my-cluster-admin@my-cluster
(Optional) Enable Ingress for Prometheus
To enable ingress, you can install the optional cert-manager and contour packages.
Continue to Deploy Prometheus into the Workload Cluster below.
To install Prometheus:
If the cluster does not have a package repository with the Prometheus package installed, such as the tanzu-standard
repository, install one:
tanzu package repository add PACKAGE-REPO-NAME --url PACKAGE-REPO-ENDPOINT --namespace tkg-system
Where:

- PACKAGE-REPO-NAME is the name of the package repository, such as tanzu-standard, or the name of a private image registry configured with ADDITIONAL_IMAGE_REGISTRY variables.
- PACKAGE-REPO-ENDPOINT is the URL of the package repository. The tanzu-standard URL is projects.registry.vmware.com/tkg/packages/standard/repo:v2023.10.16. See List Package Repositories to obtain this value from the Tanzu CLI, or in Tanzu Mission Control see the Addons > Repositories list in the Cluster pane.

Confirm that the Prometheus package is available in your workload cluster:
tanzu package available list -A
Retrieve the version of the available package:
tanzu package available list prometheus.tanzu.vmware.com -A
| Retrieving package versions for prometheus.tanzu.vmware.com...
NAME VERSION RELEASED-AT NAMESPACE
prometheus.tanzu.vmware.com 2.43.0+vmware.1-tkg.4 2020-11-24T18:00:00Z tanzu-package-repo-global
When you are ready to deploy Prometheus, you can install the package with its default values or with user-provided values.

vSphere with Tanzu: To deploy the Prometheus package to a workload cluster created by a vSphere Supervisor cluster, you must deploy it with custom values. The Prometheus package has not been validated for workload clusters on vSphere 7.0 U3.

After you confirm that the package is available and retrieve its version, you can install the package.
Install the Prometheus package using its default values:
tanzu package install prometheus \
--package prometheus.tanzu.vmware.com \
--version AVAILABLE-PACKAGE-VERSION \
--namespace TARGET-NAMESPACE
Where:

- TARGET-NAMESPACE is the namespace in which you want to install the Prometheus package. For example, the my-packages or tanzu-cli-managed-packages namespace. If the --namespace flag is not specified, the Tanzu CLI uses the default namespace. The Prometheus pods and any other resources associated with the Prometheus component are created in the tanzu-system-monitoring namespace; do not install the Prometheus package into this namespace. If the target namespace does not exist, create it first, for example with kubectl create namespace my-packages.
- AVAILABLE-PACKAGE-VERSION is the version that you retrieved above, for example 2.43.0+vmware.1-tkg.4.
For example:
tanzu package install prometheus --package prometheus.tanzu.vmware.com --namespace my-packages --version 2.43.0+vmware.1-tkg.4
\ Installing package 'prometheus.tanzu.vmware.com'
| Getting package metadata for 'prometheus.tanzu.vmware.com'
| Creating service account 'prometheus-my-packages-sa'
| Creating cluster admin role 'prometheus-my-packages-cluster-role'
| Creating cluster role binding 'prometheus-my-packages-cluster-rolebinding'
- Creating package resource
\ Package install status: Reconciling
Added installed package 'prometheus' in namespace 'my-packages'
vSphere with Tanzu: On vSphere 8 and vSphere 7.0 U2 with the vSphere with Tanzu feature enabled, the tanzu package install prometheus
command may return the error Failed to get final advertise address: No private IP address found, and explicit IP not provided
.
To fix this error, create and apply a package overlay to reconfigure the alertmanager
component:
Create a file overlay-alertmanager.yaml
containing:
---
#@ load("@ytt:overlay", "overlay")
#@overlay/match by=overlay.and_op(overlay.subset({"kind": "Deployment"}), overlay.subset({"metadata": {"name": "alertmanager"}}))
---
spec:
  template:
    spec:
      containers:
      #@overlay/match by="name",expects="0+"
      - name: alertmanager
        args:
        - --cluster.listen-address=
Create a secret from the overlay:
kubectl create secret generic alertmanager-overlay -n tanzu-package-repo-global -o yaml --dry-run=client --from-file=overlay-alertmanager.yaml | kubectl apply -f -
Annotate the package with the secret:
kubectl annotate PackageInstall prometheus -n tanzu-package-repo-global ext.packaging.carvel.dev/ytt-paths-from-secret-name.1=alertmanager-overlay
Continue to Verify Prometheus Deployment below.
To install the Prometheus package using user-provided values:
Create a configuration file. This file configures the Prometheus package.
tanzu package available get prometheus.tanzu.vmware.com/PACKAGE-VERSION --default-values-file-output FILE-PATH
Where PACKAGE-VERSION is the version of the Prometheus package that you want to install and FILE-PATH is the location to which you want to save the configuration file, for example, prometheus-data-values.yaml. The above command creates a configuration file named prometheus-data-values.yaml containing the default values.
For information about configuration parameters to use in prometheus-data-values.yaml
, see Prometheus Package Configuration Parameters below.
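For example, to lengthen the metric retention period, you could set prometheus.deployment.containers.args in the data values file. This is a sketch, not a validated configuration: the 45d retention value is an assumption to size for your environment, and longer retention requires a larger prometheus.pvc.storage value. How the args list merges with the package defaults depends on the package, so verify the rendered Deployment after install.

```yaml
#! prometheus-data-values.yaml (excerpt)
prometheus:
  deployment:
    containers:
      #! --storage.tsdb.retention.time is a standard Prometheus server flag.
      #! 45d is an example; back it with enough persistent volume capacity.
      args:
      - --storage.tsdb.retention.time=45d
```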
vSphere with Tanzu: If you are deploying Prometheus to a workload cluster created by a vSphere Supervisor cluster, set a non-null value for prometheus.pvc.storageClassName
and alertmanager.pvc.storageClassName
in the prometheus-data-values.yaml
file:
ingress:
  enabled: true
  virtual_host_fqdn: "prometheus.corp.tanzu"
  prometheus_prefix: "/"
  alertmanager_prefix: "/alertmanager/"
  prometheusServicePort: 80
  alertmanagerServicePort: 80
prometheus:
  pvc:
    storageClassName: STORAGE-CLASS
alertmanager:
  pvc:
    storageClassName: STORAGE-CLASS
Where STORAGE-CLASS
is the name of the cluster’s storage class, as returned by kubectl get storageclass
.
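As a concrete sketch, assuming a cluster whose storage class is named vsan-default-storage-policy (a hypothetical name; substitute a class returned by kubectl get storageclass), the pvc settings might look like:

```yaml
prometheus:
  pvc:
    storageClassName: vsan-default-storage-policy  #! hypothetical class name
    storage: 150Gi  #! package default; increase for longer retention
alertmanager:
  pvc:
    storageClassName: vsan-default-storage-policy
    storage: 2Gi    #! package default
```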
After you make any changes needed to your prometheus-data-values.yaml
file, remove all comments in it:
yq -i eval '... comments=""' prometheus-data-values.yaml
Deploy the package:
tanzu package install prometheus \
--package prometheus.tanzu.vmware.com \
--version PACKAGE-VERSION \
--values-file prometheus-data-values.yaml \
--namespace TARGET-NAMESPACE
Where:

- TARGET-NAMESPACE is the namespace in which you want to install the Prometheus package, Prometheus package app, and any other Kubernetes resources that describe the package. For example, the my-packages or tanzu-cli-managed-packages namespace. If the --namespace flag is not specified, the Tanzu CLI uses the default namespace. The Prometheus pods and any other resources associated with the Prometheus component are created in the tanzu-system-monitoring namespace; do not install the Prometheus package into this namespace.
- PACKAGE-VERSION is the version that you retrieved above, for example 2.43.0+vmware.1-tkg.4.

Continue to Verify Prometheus Deployment below.
After you deploy Prometheus, you can verify that the deployment is successful:
Confirm that the Prometheus package is installed. For example:
tanzu package installed list -A
/ Retrieving installed packages...
NAME PACKAGE-NAME PACKAGE-VERSION STATUS NAMESPACE
cert-manager cert-manager.tanzu.vmware.com 1.10.1+vmware.1-tkg.2 Reconcile succeeded my-packages
prometheus prometheus.tanzu.vmware.com 2.43.0+vmware.1-tkg.4 Reconcile succeeded my-packages
antrea antrea.tanzu.vmware.com Reconcile succeeded tkg-system
metrics-server metrics-server.tanzu.vmware.com Reconcile succeeded tkg-system
vsphere-cpi vsphere-cpi.tanzu.vmware.com Reconcile succeeded tkg-system
vsphere-csi vsphere-csi.tanzu.vmware.com Reconcile succeeded tkg-system
The prometheus
package and the prometheus
app are installed in the namespace that you specify when running the tanzu package install
command.
Confirm that the prometheus
app is successfully reconciled:
kubectl get apps -A
For example:
NAMESPACE NAME DESCRIPTION SINCE-DEPLOY AGE
my-packages cert-manager Reconcile succeeded 74s 29m
my-packages prometheus Reconcile succeeded 20s 33m
tkg-system antrea Reconcile succeeded 70s 3h43m
[...]
If the status is not Reconcile succeeded
, view the full status details of the prometheus
app. Viewing the full status can help you troubleshoot the problem:
kubectl get app prometheus --namespace PACKAGE-NAMESPACE -o yaml
Where PACKAGE-NAMESPACE
is the namespace in which you installed the package.
Confirm that the new services are running by listing all of the pods that are running in the cluster:
kubectl get pods -A
In the tanzu-system-monitoring namespace, you should see the prometheus, alertmanager, node_exporter, pushgateway, cadvisor, and kube_state_metrics services running in pods:
NAMESPACE NAME READY STATUS RESTARTS AGE
[...]
tanzu-system-monitoring alertmanager-d6bb4d94d-7fgmb 1/1 Running 0 35m
tanzu-system-monitoring prometheus-cadvisor-pgfck 1/1 Running 0 35m
tanzu-system-monitoring prometheus-kube-state-metrics-868b5b749d-9w5f2 1/1 Running 0 35m
tanzu-system-monitoring prometheus-node-exporter-97x6c 1/1 Running 0 35m
tanzu-system-monitoring prometheus-node-exporter-dnrkk 1/1 Running 0 35m
tanzu-system-monitoring prometheus-pushgateway-84cc9b85c6-tgmv6 1/1 Running 0 35m
tanzu-system-monitoring prometheus-server-6479964fb6-kk9g2 2/2 Running 0 35m
[...]
The Prometheus pods and any other resources associated with the Prometheus component are created in the namespace you provided in prometheus-data-values.yaml
. If you are using the default namespace, these are created in the tanzu-system-monitoring
namespace.
There are two ways to view the configuration parameters of the Prometheus package: retrieve the package schema, or review the table below.

To retrieve the package schema:

tanzu package available get prometheus.tanzu.vmware.com/2.43.0+vmware.1-tkg.4 -n AVAILABLE-PACKAGE-NAMESPACE --values-schema

This command lists the configuration parameters of the Prometheus package and their default values. You can use the output to update your prometheus-data-values.yaml file created in Deploy Prometheus with Custom Values above.

The following table also lists the configuration parameters of the Prometheus package and describes their default values. You can set these values in your prometheus-data-values.yaml file created in Deploy Prometheus with Custom Values above.
| Parameter | Description | Type | Default |
|---|---|---|---|
| namespace | Namespace where Prometheus will be deployed. | String | tanzu-system-monitoring |
| prometheus.deployment.replicas | Number of Prometheus replicas. | String | 1 |
| prometheus.deployment.containers.args | Prometheus container arguments. You can configure this parameter to change retention time. For information about configuring Prometheus storage parameters, see the Prometheus documentation. Note: longer retention times require more storage capacity than shorter retention times. It might be necessary to increase the persistent volume claim size if you are significantly increasing the retention time. | List | n/a |
| prometheus.deployment.containers.resources | Prometheus container resource requests and limits. | Map | {} |
| prometheus.deployment.podAnnotations | The Prometheus deployment's pod annotations. | Map | {} |
| prometheus.deployment.podLabels | The Prometheus deployment's pod labels. | Map | {} |
| prometheus.deployment.configMapReload.containers.args | Configmap-reload container arguments. | List | n/a |
| prometheus.deployment.configMapReload.containers.resources | Configmap-reload container resource requests and limits. | Map | {} |
| prometheus.service.type | Type of service to expose Prometheus. Supported value: ClusterIP. | String | ClusterIP |
| prometheus.service.port | Prometheus service port. | Integer | 80 |
| prometheus.service.targetPort | Prometheus service target port. | Integer | 9090 |
| prometheus.service.labels | Prometheus service labels. | Map | {} |
| prometheus.service.annotations | Prometheus service annotations. | Map | {} |
| prometheus.pvc.annotations | Storage class annotations. | Map | {} |
| prometheus.pvc.storageClassName | Storage class to use for the persistent volume claim. By default this is null and the default provisioner is used. | String | null |
| prometheus.pvc.accessMode | Access mode for the persistent volume claim. Supported values: ReadWriteOnce, ReadOnlyMany, ReadWriteMany. | String | ReadWriteOnce |
| prometheus.pvc.storage | Storage size for the persistent volume claim. | String | 150Gi |
| prometheus.config.prometheus_yml | For information about the global Prometheus configuration, see the Prometheus documentation. | YAML file | prometheus.yaml |
| prometheus.config.alerting_rules_yml | For information about the Prometheus alerting rules, see the Prometheus documentation. | YAML file | alerting_rules.yaml |
| prometheus.config.recording_rules_yml | For information about the Prometheus recording rules, see the Prometheus documentation. | YAML file | recording_rules.yaml |
| prometheus.config.alerts_yml | Additional Prometheus alerting rules are configured here. | YAML file | alerts_yml.yaml |
| prometheus.config.rules_yml | Additional Prometheus recording rules are configured here. | YAML file | rules_yml.yaml |
| alertmanager.deployment.replicas | Number of Alertmanager replicas. | Integer | 1 |
| alertmanager.deployment.containers.resources | Alertmanager container resource requests and limits. | Map | {} |
| alertmanager.deployment.podAnnotations | The Alertmanager deployment's pod annotations. | Map | {} |
| alertmanager.deployment.podLabels | The Alertmanager deployment's pod labels. | Map | {} |
| alertmanager.service.type | Type of service to expose Alertmanager. Supported value: ClusterIP. | String | ClusterIP |
| alertmanager.service.port | Alertmanager service port. | Integer | 80 |
| alertmanager.service.targetPort | Alertmanager service target port. | Integer | 9093 |
| alertmanager.service.labels | Alertmanager service labels. | Map | {} |
| alertmanager.service.annotations | Alertmanager service annotations. | Map | {} |
| alertmanager.pvc.annotations | Storage class annotations. | Map | {} |
| alertmanager.pvc.storageClassName | Storage class to use for the persistent volume claim. By default this is null and the default provisioner is used. | String | null |
| alertmanager.pvc.accessMode | Access mode for the persistent volume claim. Supported values: ReadWriteOnce, ReadOnlyMany, ReadWriteMany. | String | ReadWriteOnce |
| alertmanager.pvc.storage | Storage size for the persistent volume claim. | String | 2Gi |
| alertmanager.config.alertmanager_yml | For information about the global YAML configuration for Alertmanager, see the Prometheus documentation. | YAML file | alertmanager_yml |
| kube_state_metrics.deployment.replicas | Number of kube-state-metrics replicas. | Integer | 1 |
| kube_state_metrics.deployment.containers.resources | kube-state-metrics container resource requests and limits. | Map | {} |
| kube_state_metrics.deployment.podAnnotations | The kube-state-metrics deployment's pod annotations. | Map | {} |
| kube_state_metrics.deployment.podLabels | The kube-state-metrics deployment's pod labels. | Map | {} |
| kube_state_metrics.service.type | Type of service to expose kube-state-metrics. Supported value: ClusterIP. | String | ClusterIP |
| kube_state_metrics.service.port | kube-state-metrics service port. | Integer | 80 |
| kube_state_metrics.service.targetPort | kube-state-metrics service target port. | Integer | 8080 |
| kube_state_metrics.service.telemetryPort | kube-state-metrics service telemetry port. | Integer | 81 |
| kube_state_metrics.service.telemetryTargetPort | kube-state-metrics service target telemetry port. | Integer | 8081 |
| kube_state_metrics.service.labels | kube-state-metrics service labels. | Map | {} |
| kube_state_metrics.service.annotations | kube-state-metrics service annotations. | Map | {} |
| node_exporter.daemonset.replicas | Number of node-exporter replicas. | Integer | 1 |
| node_exporter.daemonset.containers.resources | node-exporter container resource requests and limits. | Map | {} |
| node_exporter.daemonset.hostNetwork | Host networking requested for this pod. | Boolean | false |
| node_exporter.daemonset.podAnnotations | The node-exporter daemonset's pod annotations. | Map | {} |
| node_exporter.daemonset.podLabels | The node-exporter daemonset's pod labels. | Map | {} |
| node_exporter.service.type | Type of service to expose node-exporter. Supported value: ClusterIP. | String | ClusterIP |
| node_exporter.service.port | node-exporter service port. | Integer | 9100 |
| node_exporter.service.targetPort | node-exporter service target port. | Integer | 9100 |
| node_exporter.service.labels | node-exporter service labels. | Map | {} |
| node_exporter.service.annotations | node-exporter service annotations. | Map | {} |
| pushgateway.deployment.replicas | Number of pushgateway replicas. | Integer | 1 |
| pushgateway.deployment.containers.resources | pushgateway container resource requests and limits. | Map | {} |
| pushgateway.deployment.podAnnotations | The pushgateway deployment's pod annotations. | Map | {} |
| pushgateway.deployment.podLabels | The pushgateway deployment's pod labels. | Map | {} |
| pushgateway.service.type | Type of service to expose pushgateway. Supported value: ClusterIP. | String | ClusterIP |
| pushgateway.service.port | pushgateway service port. | Integer | 9091 |
| pushgateway.service.targetPort | pushgateway service target port. | Integer | 9091 |
| pushgateway.service.labels | pushgateway service labels. | Map | {} |
| pushgateway.service.annotations | pushgateway service annotations. | Map | {} |
| cadvisor.daemonset.replicas | Number of cadvisor replicas. | Integer | 1 |
| cadvisor.daemonset.containers.resources | cadvisor container resource requests and limits. | Map | {} |
| cadvisor.daemonset.podAnnotations | The cadvisor daemonset's pod annotations. | Map | {} |
| cadvisor.daemonset.podLabels | The cadvisor daemonset's pod labels. | Map | {} |
| ingress.enabled | Activate/deactivate ingress for Prometheus and Alertmanager. | Boolean | false |
| ingress.virtual_host_fqdn | Hostname for accessing Prometheus and Alertmanager. | String | prometheus.system.tanzu |
| ingress.prometheus_prefix | Path prefix for Prometheus. | String | / |
| ingress.alertmanager_prefix | Path prefix for Alertmanager. | String | /alertmanager/ |
| ingress.prometheusServicePort | Prometheus service port to proxy traffic to. | Integer | 80 |
| ingress.alertmanagerServicePort | Alertmanager service port to proxy traffic to. | Integer | 80 |
| ingress.tlsCertificate.tls.crt | Optional certificate for ingress if you want to use your own TLS certificate. A self-signed certificate is generated by default. Note: tls.crt is a key and not nested. | String | Generated cert |
| ingress.tlsCertificate.tls.key | Optional certificate private key for ingress if you want to use your own TLS certificate. Note: tls.key is a key and not nested. | String | Generated cert key |
| ingress.tlsCertificate.ca.crt | Optional CA certificate. Note: ca.crt is a key and not nested. | String | CA certificate |
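As an illustration of prometheus.config.alerts_yml, the following sketch adds a custom alerting rule in standard Prometheus rule-file syntax. The group and alert names are hypothetical, and the expression assumes the kube-state-metrics restart-count metric, which this package deploys; adapt both to your own alerting needs.

```yaml
#! prometheus-data-values.yaml (excerpt)
prometheus:
  config:
    alerts_yml: |
      groups:
      - name: example-rules           # hypothetical group name
        rules:
        - alert: HighPodRestartRate   # hypothetical alert name
          expr: increase(kube_pod_container_status_restarts_total[1h]) > 5
          for: 10m
          labels:
            severity: warning
          annotations:
            summary: "Pod {{ $labels.pod }} is restarting frequently"
```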
To make changes to the configuration of the Prometheus package after deployment, update your deployed Prometheus package:
Update the Prometheus configuration in the prometheus-data-values.yaml
file.
Update the installed package:
tanzu package installed update prometheus \
--version 2.43.0+vmware.1-tkg.4 \
--values-file prometheus-data-values.yaml \
--namespace my-packages
Expected output:
| Updating package 'prometheus'
- Getting package install for 'prometheus'
| Updating secret 'prometheus-my-packages-values'
| Updating package install for 'prometheus'
Updated package install 'prometheus' in namespace 'my-packages'
The Prometheus package is reconciled using the new value or values that you added. It can take up to five minutes for kapp-controller
to apply the changes.
For information about updating, see Update a Package.
To remove the Prometheus package on your cluster, run:
tanzu package installed delete prometheus --namespace my-packages
For information about deleting, see Delete a Package.
To configure notifications for Alertmanager, edit the alertmanager.config.alertmanager_yml section in your prometheus-data-values.yaml file.
For information about configuring notifications, such as Slack or Email, see Configuration in the Prometheus documentation.
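For example, a minimal Slack receiver might look like the following sketch. The webhook URL and channel are placeholders, and the keys follow the standard Alertmanager configuration schema rather than anything specific to this package.

```yaml
#! prometheus-data-values.yaml (excerpt)
alertmanager:
  config:
    alertmanager_yml: |
      route:
        group_by: [alertname]
        receiver: slack-notifications
      receivers:
      - name: slack-notifications
        slack_configs:
        - api_url: https://hooks.slack.com/services/REPLACE-ME  # placeholder webhook URL
          channel: '#alerts'                                    # placeholder channel
          send_resolved: true
```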
By default, ingress is not enabled on Prometheus. This is because access to the Prometheus dashboard is not authenticated. To access the Prometheus dashboard:
Deploy Contour on the cluster.
For information about deploying Contour, see Install Contour for Ingress Control.
Copy the ingress section below into prometheus-data-values.yaml.
ingress:
  enabled: false
  virtual_host_fqdn: "prometheus.system.tanzu"
  prometheus_prefix: "/"
  alertmanager_prefix: "/alertmanager/"
  prometheusServicePort: 80
  alertmanagerServicePort: 80

  #! [Optional] The certificate for the ingress if you want to use your own TLS certificate.
  #! We will issue the certificate by cert-manager when it's empty.
  tlsCertificate:
    #! [Required] the certificate
    tls.crt:
    #! [Required] the private key
    tls.key:
    #! [Optional] the CA certificate
    ca.crt:
Update ingress.enabled
from false
to true
.
Create a DNS record to map prometheus.system.tanzu
to the address of the Envoy load balancer.
To obtain the address of the Envoy load balancer, see Install Contour for Ingress Control.
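For example, in a BIND-style zone file, the record might look like the following sketch, assuming the Envoy load balancer received the address 10.92.108.4 (a placeholder; substitute your load balancer's actual address):

```
prometheus.system.tanzu.    300    IN    A    10.92.108.4
```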
Access the Prometheus dashboard by navigating to https://prometheus.system.tanzu
in a browser.
The Prometheus package is now running and scraping data from your cluster. To visualize the data in Grafana dashboards, see Deploy Grafana on Workload Clusters.