You can use Prometheus to collect vSphere Container Storage Plug-in metrics. You can then visualize these metrics with Grafana dashboards to monitor health and stability of vSphere Container Storage Plug-in.

What Are Prometheus and Grafana?

Prometheus is open-source monitoring software that collects, organizes, and stores metrics along with unique identifiers and timestamps. vSphere Container Storage Plug-in exposes its metrics so that Prometheus can collect them.

Using the information captured in Prometheus, you can build Grafana dashboards that help you analyze and understand the health and behavior of vSphere Container Storage Plug-in.

For more information, see the Prometheus documentation at https://prometheus.io/docs/introduction/overview/.

Exposing Prometheus Metrics

Prometheus collects metrics from targets by scraping metrics HTTP endpoints.

In the controller pod of vSphere Container Storage Plug-in, the following two containers expose metrics:

  • The vsphere-csi-controller container exposes Prometheus metrics on port 2112.

    The container provides communication from the Kubernetes Cluster API server to the CNS component on vCenter Server for volume lifecycle operations.

  • The vsphere-syncer container exposes Prometheus metrics on port 2113.

    The container sends metadata information about persistent volumes to the CNS component on vCenter Server, so that it can be displayed in the vSphere Client in the Container Volumes view.
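If you run Prometheus without the Prometheus Operator, these two endpoints can be scraped with a static configuration. The following is a minimal sketch, assuming the default vmware-system-csi namespace and in-cluster DNS resolution of the vsphere-csi-controller service; the kube-prometheus workflow later in this section uses a ServiceMonitor instead:

```yaml
scrape_configs:
  - job_name: vsphere-csi
    metrics_path: /metrics
    static_configs:
      - targets:
          # vsphere-csi-controller container metrics
          - vsphere-csi-controller.vmware-system-csi.svc:2112
          # vsphere-syncer container metrics
          - vsphere-csi-controller.vmware-system-csi.svc:2113
```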

View Prometheus Metrics

You can view Prometheus metrics exposed by the vsphere-csi-controller service of vSphere Container Storage Plug-in.

  1. Get the Cluster IP of the vsphere-csi-controller service.
    # kubectl get service -n vmware-system-csi
    NAME                     TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)             AGE
    vsphere-csi-controller   ClusterIP   10.100.XXX.XX   <none>        2112/TCP,2113/TCP   23h
  2. View Prometheus metrics exposed by the vsphere-csi-controller service.

    To get metrics exposed at a specific port, use the appropriate command.

    Action                             Command
    Get metrics exposed at port 2112   # curl 10.100.XXX.XX:2112/metrics
    Get metrics exposed at port 2113   # curl 10.100.XXX.XX:2113/metrics
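The endpoints return metrics in the Prometheus text exposition format, one sample per line. As a rough illustration of that format, the following hypothetical helper (not part of the plug-in; production code would typically use an existing Prometheus client library) parses sample lines like the ones listed in the next section:

```python
import re

def parse_metric(line):
    """Parse one Prometheus text-format sample into (name, labels, value)."""
    m = re.match(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)$', line)
    if not m:
        return None
    name, raw_labels, value = m.groups()
    labels = dict(re.findall(r'(\w+)="([^"]*)"', raw_labels or ""))
    return name, labels, float(value)

# Example lines as exposed on port 2112 (taken from the metrics table below)
sample = [
    'vsphere_csi_info{version="16b7a33"} 1',
    'vsphere_cns_volume_ops_histogram_count{optype="attach-volume",status="pass"} 3',
]
parsed = [parse_metric(l) for l in sample]
```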

Prometheus Metrics Exposed by vSphere Container Storage Plug-in

Name: vsphere_csi_info
Type: Gauge
Description: Metric that indicates the vsphere-csi-controller container version.
Example:
  vsphere_csi_info{version="16b7a33"} 1

Name: vsphere_syncer_info
Type: Gauge
Description: Metric that indicates the vsphere-syncer container version.
Example:
  vsphere_syncer_info{version="16b7a33"} 1

Name: vsphere_cns_volume_ops_histogram
Type: Histogram vector
Description: Histogram vector metric that observes various control operations on CNS.
  The optype field indicates the type of the CNS volume operation. The value of optype can be one of the following:
    • create-volume
    • delete-volume
    • attach-volume
    • detach-volume
    • update-volume-metadata
    • expand-volume
    • query-volume
    • query-all-volume
    • query-volume-info
    • relocate-volume
    • configure-volume-acl
    • query-snapshots
    • create-snapshot
    • delete-snapshot
  The value of the status field can be pass or fail.
Example:
  vsphere_cns_volume_ops_histogram_bucket{optype="attach-volume",status="pass",le="1"} 1
  vsphere_cns_volume_ops_histogram_sum{optype="attach-volume",status="pass"} 6.611152518
  vsphere_cns_volume_ops_histogram_count{optype="attach-volume",status="pass"} 3

Name: vsphere_csi_volume_ops_histogram
Type: Histogram vector
Description: Histogram vector metric that observes various control operations in vSphere Container Storage Plug-in.
  The optype field indicates the type of the volume operation performed by vSphere Container Storage Plug-in. The value of optype can be one of the following:
    • create-volume
    • delete-volume
    • attach-volume
    • detach-volume
    • expand-volume
    • create-snapshot
    • delete-snapshot
    • list-snapshot
  The value of the status field can be pass or fail.
Example:
  vsphere_csi_volume_ops_histogram_bucket{optype="create-volume",status="pass",voltype="block",le="7"} 3
  vsphere_csi_volume_ops_histogram_sum{optype="create-volume",status="pass",voltype="block"} 9.983518201
  vsphere_csi_volume_ops_histogram_count{optype="create-volume",status="pass",voltype="block"} 3

Name: vsphere_full_sync_ops_histogram
Type: Histogram vector
Description: Histogram vector metric that observes the full synchronization operation of vSphere Container Storage Plug-in.
  The value of the status field can be pass or fail.
Example:
  vsphere_full_sync_ops_histogram_bucket{status="pass",le="7"} 73
  vsphere_full_sync_ops_histogram_sum{status="pass"} 7.559699346999998
  vsphere_full_sync_ops_histogram_count{status="pass"} 73
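Because these are Prometheus histograms, the _sum and _count series can be combined to derive an average operation latency. A quick sanity check using the attach-volume example values above (in PromQL you would typically divide rate() of the two series instead of the raw values):

```python
# Example values from the vsphere_cns_volume_ops_histogram metric above
# (attach-volume operations with status="pass").
hist_sum = 6.611152518   # total seconds spent across all observed operations
hist_count = 3           # number of observed operations

# Average latency per attach-volume operation, in seconds.
avg_latency_seconds = hist_sum / hist_count
```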

Deploy Prometheus and Build Grafana Dashboards

Follow this sample workflow to deploy Prometheus and build Grafana dashboards.

Deploy Prometheus Monitoring Stack

Deploy a Prometheus monitoring stack that includes AlertManager and Grafana.

Procedure

  1. Clone the kube-prometheus repository from GitHub.
    % git clone https://github.com/prometheus-operator/kube-prometheus
    Cloning into 'kube-prometheus'...
    remote: Enumerating objects: 15523, done.
    remote: Counting objects: 100% (209/209), done.
    remote: Compressing objects: 100% (119/119), done.
    remote: Total 15523 (delta 126), reused 123 (delta 78), pack-reused 15314
    Receiving objects: 100% (15523/15523), 7.79 MiB | 542.00 KiB/s, done.
    Resolving deltas: 100% (9884/9884), done.
    
    % cd kube-prometheus
    
    % ls
    CHANGELOG.md experimental
    CONTRIBUTING.md go.mod
    DCO go.sum
    LICENSE jsonnet
    Makefile jsonnetfile.json
    README.md jsonnetfile.lock.json
    RELEASE.md kubescape-exceptions.json
    build.sh kustomization.yaml
    code-of-conduct.md manifests
    developer-workspace scripts
    docs sync-to-internal-registry.jsonnet
    example.jsonnet tests
    examples
    %
  2. Apply the kube-prometheus manifests.
    1. Create the CRDs used by the Prometheus stack.
      % kubectl apply --server-side -f manifests/setup
      customresourcedefinition.apiextensions.k8s.io/alertmanagerconfigs.monitoring.coreos.com serverside-applied
      customresourcedefinition.apiextensions.k8s.io/alertmanagers.monitoring.coreos.com serverside-applied
      customresourcedefinition.apiextensions.k8s.io/podmonitors.monitoring.coreos.com serverside-applied
      customresourcedefinition.apiextensions.k8s.io/probes.monitoring.coreos.com serverside-applied
      customresourcedefinition.apiextensions.k8s.io/prometheuses.monitoring.coreos.com serverside-applied
      customresourcedefinition.apiextensions.k8s.io/prometheusrules.monitoring.coreos.com serverside-applied
      customresourcedefinition.apiextensions.k8s.io/servicemonitors.monitoring.coreos.com serverside-applied
      customresourcedefinition.apiextensions.k8s.io/thanosrulers.monitoring.coreos.com serverside-applied
      namespace/monitoring serverside-applied
      
      
      % kubectl get crd
      NAME                                             CREATED AT
      alertmanagerconfigs.monitoring.coreos.com        2022-03-09T09:16:24Z
      alertmanagers.monitoring.coreos.com              2022-03-09T09:16:25Z
      backups.velero.io                                2022-02-18T11:12:17Z
      backupstoragelocations.velero.io                 2022-02-18T11:12:17Z
      cnsvolumeoperationrequests.cns.vmware.com        2022-02-10T11:38:35Z
      csinodetopologies.cns.vmware.com                 2022-02-10T11:38:55Z
      deletebackuprequests.velero.io                   2022-02-18T11:12:17Z
      downloadrequests.velero.io                       2022-02-18T11:12:17Z
      podmonitors.monitoring.coreos.com                2022-03-09T09:16:25Z
      podvolumebackups.velero.io                       2022-02-18T11:12:17Z
      podvolumerestores.velero.io                      2022-02-18T11:12:17Z
      probes.monitoring.coreos.com                     2022-03-09T09:16:25Z
      prometheuses.monitoring.coreos.com               2022-03-09T09:16:26Z
      prometheusrules.monitoring.coreos.com            2022-03-09T09:16:27Z
      resticrepositories.velero.io                     2022-02-18T11:12:17Z
      restores.velero.io                               2022-02-18T11:12:17Z
      schedules.velero.io                              2022-02-18T11:12:18Z
      serverstatusrequests.velero.io                   2022-02-18T11:12:18Z
      servicemonitors.monitoring.coreos.com            2022-03-09T09:16:27Z
      thanosrulers.monitoring.coreos.com               2022-03-09T09:16:27Z
      volumesnapshotclasses.snapshot.storage.k8s.io    2022-02-10T11:48:15Z
      volumesnapshotcontents.snapshot.storage.k8s.io   2022-02-10T11:48:16Z
      volumesnapshotlocations.velero.io                2022-02-18T11:12:18Z
      volumesnapshots.snapshot.storage.k8s.io          2022-02-10T11:48:17Z
    2. Deploy and verify the Prometheus stack objects.
      % kubectl apply -f manifests/
      alertmanager.monitoring.coreos.com/main created
      poddisruptionbudget.policy/alertmanager-main created
      prometheusrule.monitoring.coreos.com/alertmanager-main-rules created
      secret/alertmanager-main created
      service/alertmanager-main created
      serviceaccount/alertmanager-main created
      servicemonitor.monitoring.coreos.com/alertmanager-main created
      clusterrole.rbac.authorization.k8s.io/blackbox-exporter created
      clusterrolebinding.rbac.authorization.k8s.io/blackbox-exporter created
      configmap/blackbox-exporter-configuration created
      deployment.apps/blackbox-exporter created
      service/blackbox-exporter created
      serviceaccount/blackbox-exporter created
      servicemonitor.monitoring.coreos.com/blackbox-exporter created
      secret/grafana-config created
      secret/grafana-datasources created
      configmap/grafana-dashboard-alertmanager-overview created
      configmap/grafana-dashboard-apiserver created
      configmap/grafana-dashboard-cluster-total created
      configmap/grafana-dashboard-controller-manager created
      configmap/grafana-dashboard-grafana-overview created
      configmap/grafana-dashboard-k8s-resources-cluster created
      configmap/grafana-dashboard-k8s-resources-namespace created
      configmap/grafana-dashboard-k8s-resources-node created
      configmap/grafana-dashboard-k8s-resources-pod created
      configmap/grafana-dashboard-k8s-resources-workload created
      configmap/grafana-dashboard-k8s-resources-workloads-namespace created
      configmap/grafana-dashboard-kubelet created
      configmap/grafana-dashboard-namespace-by-pod created
      configmap/grafana-dashboard-namespace-by-workload created
      configmap/grafana-dashboard-node-cluster-rsrc-use created
      configmap/grafana-dashboard-node-rsrc-use created
      configmap/grafana-dashboard-nodes created
      configmap/grafana-dashboard-persistentvolumesusage created
      configmap/grafana-dashboard-pod-total created
      configmap/grafana-dashboard-prometheus-remote-write created
      configmap/grafana-dashboard-prometheus created
      configmap/grafana-dashboard-proxy created
      configmap/grafana-dashboard-scheduler created
      configmap/grafana-dashboard-workload-total created
      configmap/grafana-dashboards created
      deployment.apps/grafana created
      prometheusrule.monitoring.coreos.com/grafana-rules created
      service/grafana created
      serviceaccount/grafana created
      servicemonitor.monitoring.coreos.com/grafana created
      prometheusrule.monitoring.coreos.com/kube-prometheus-rules created
      clusterrole.rbac.authorization.k8s.io/kube-state-metrics created
      clusterrolebinding.rbac.authorization.k8s.io/kube-state-metrics created
      deployment.apps/kube-state-metrics created
      prometheusrule.monitoring.coreos.com/kube-state-metrics-rules created
      service/kube-state-metrics created
      serviceaccount/kube-state-metrics created
      servicemonitor.monitoring.coreos.com/kube-state-metrics created
      prometheusrule.monitoring.coreos.com/kubernetes-monitoring-rules created
      servicemonitor.monitoring.coreos.com/kube-apiserver created
      servicemonitor.monitoring.coreos.com/coredns created
      servicemonitor.monitoring.coreos.com/kube-controller-manager created
      servicemonitor.monitoring.coreos.com/kube-scheduler created
      servicemonitor.monitoring.coreos.com/kubelet created
      clusterrole.rbac.authorization.k8s.io/node-exporter created
      clusterrolebinding.rbac.authorization.k8s.io/node-exporter created
      daemonset.apps/node-exporter created
      prometheusrule.monitoring.coreos.com/node-exporter-rules created
      service/node-exporter created
      serviceaccount/node-exporter created
      servicemonitor.monitoring.coreos.com/node-exporter created
      clusterrole.rbac.authorization.k8s.io/prometheus-k8s created
      clusterrolebinding.rbac.authorization.k8s.io/prometheus-k8s created
      poddisruptionbudget.policy/prometheus-k8s created
      prometheus.monitoring.coreos.com/k8s created
      prometheusrule.monitoring.coreos.com/prometheus-k8s-prometheus-rules created
      rolebinding.rbac.authorization.k8s.io/prometheus-k8s-config created
      rolebinding.rbac.authorization.k8s.io/prometheus-k8s created
      rolebinding.rbac.authorization.k8s.io/prometheus-k8s created
      rolebinding.rbac.authorization.k8s.io/prometheus-k8s created
      role.rbac.authorization.k8s.io/prometheus-k8s-config created
      role.rbac.authorization.k8s.io/prometheus-k8s created
      role.rbac.authorization.k8s.io/prometheus-k8s created
      role.rbac.authorization.k8s.io/prometheus-k8s created
      service/prometheus-k8s created
      serviceaccount/prometheus-k8s created
      servicemonitor.monitoring.coreos.com/prometheus-k8s created
      apiservice.apiregistration.k8s.io/v1beta1.metrics.k8s.io created
      clusterrole.rbac.authorization.k8s.io/prometheus-adapter created
      clusterrole.rbac.authorization.k8s.io/system:aggregated-metrics-reader created
      clusterrolebinding.rbac.authorization.k8s.io/prometheus-adapter created
      clusterrolebinding.rbac.authorization.k8s.io/resource-metrics:system:auth-delegator created
      clusterrole.rbac.authorization.k8s.io/resource-metrics-server-resources created
      configmap/adapter-config created
      deployment.apps/prometheus-adapter created
      poddisruptionbudget.policy/prometheus-adapter created
      rolebinding.rbac.authorization.k8s.io/resource-metrics-auth-reader created
      service/prometheus-adapter created
      serviceaccount/prometheus-adapter created
      servicemonitor.monitoring.coreos.com/prometheus-adapter created
      clusterrole.rbac.authorization.k8s.io/prometheus-operator created
      clusterrolebinding.rbac.authorization.k8s.io/prometheus-operator created
      deployment.apps/prometheus-operator created
      prometheusrule.monitoring.coreos.com/prometheus-operator-rules created
      service/prometheus-operator created
      serviceaccount/prometheus-operator created
      servicemonitor.monitoring.coreos.com/prometheus-operator created
      
      
      % kubectl get servicemonitors -A
      NAMESPACE    NAME                      AGE
      monitoring   alertmanager-main         46s
      monitoring   blackbox-exporter         45s
      monitoring   coredns                   25s
      monitoring   grafana                   27s
      monitoring   kube-apiserver            25s
      monitoring   kube-controller-manager   24s
      monitoring   kube-scheduler            24s
      monitoring   kube-state-metrics        26s
      monitoring   kubelet                   24s
      monitoring   node-exporter             23s
      monitoring   prometheus-adapter        17s
      monitoring   prometheus-k8s            20s
      monitoring   prometheus-operator       16s
      
      
      % kubectl get pod -n monitoring
      NAME                                  READY   STATUS    RESTARTS   AGE
      alertmanager-main-0                    2/2    Running   0          95s
      alertmanager-main-1                    2/2    Running   0          95s
      alertmanager-main-2                    2/2    Running   0          95s
      blackbox-exporter-7d89b9b799-svr4t     3/3    Running   0          2m5s
      grafana-5577bc8799-b5bnd               1/1    Running   0          107s
      kube-state-metrics-d5754d6dc-spx4w     3/3    Running   0          106s
      node-exporter-8b44z                    2/2    Running   0          103s
      node-exporter-jrxrc                    2/2    Running   0          103s
      node-exporter-pj7nb                    2/2    Running   0          103s
      prometheus-adapter-6998fcc6b5-dlqk6    1/1    Running   0          97s
      prometheus-adapter-6998fcc6b5-qswk4    1/1    Running   0          97s
      prometheus-k8s-0                       2/2    Running   0          94s
      prometheus-k8s-1                       2/2    Running   0          94s
      prometheus-operator-59647c66cf-ldppj   2/2    Running   0          96s
      
      
      % kubectl get svc -n monitoring
      NAME                    TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)                      AGE
      alertmanager-main       ClusterIP   10.96.161.166   <none>        9093/TCP,8080/TCP            7m18s
      alertmanager-operated   ClusterIP   None            <none>        9093/TCP,9094/TCP,9094/UDP   6m47s
      blackbox-exporter       ClusterIP   10.104.28.233   <none>        9115/TCP,19115/TCP           7m17s
      grafana                 ClusterIP   10.97.77.202    <none>        3000/TCP                     7m
      kube-state-metrics      ClusterIP   None            <none>        8443/TCP,9443/TCP            6m58s
      node-exporter           ClusterIP   None            <none>        9100/TCP                     6m55s
      prometheus-adapter      ClusterIP   10.104.10.57    <none>        443/TCP                      6m50s
      prometheus-k8s          ClusterIP   10.99.185.136   <none>        9090/TCP,8080/TCP            6m53s
      prometheus-operated     ClusterIP   None            <none>        9090/TCP                     6m46s
      prometheus-operator     ClusterIP   None            <none>        8443/TCP         
  3. Adjust the ClusterRole prometheus-k8s.
    When deployed through kube-prometheus, the ClusterRole prometheus-k8s does not include the apiGroups, resources, and verbs rules necessary to pick up metrics of vSphere Container Storage Plug-in. You must modify the ClusterRole to add these rules.
    1. Display the ClusterRole after it was first created.
      % kubectl get ClusterRole prometheus-k8s -o yaml
      apiVersion: rbac.authorization.k8s.io/v1
      kind: ClusterRole
      metadata:
        annotations:
          kubectl.kubernetes.io/last-applied-configuration: |
            {"apiVersion":"rbac.authorization.k8s.io/v1","kind":"ClusterRole","metadata":{"annotations":{},"labels":{"app.kubernetes.io/component":"prometheus","app.kubernetes.io/instance":"k8s","app.kubernetes.io/name":"prometheus","app.kubernetes.io/part-of":"kube-prometheus","app.kubernetes.io/version":"2.33.4"},"name":"prometheus-k8s"},"rules":[{"apiGroups":[""],"resources":["nodes/metrics"],"verbs":["get"]},{"nonResourceURLs":["/metrics"],"verbs":["get"]}]}
        creationTimestamp: "2022-03-09T09:19:39Z"
        labels:
          app.kubernetes.io/component: prometheus
          app.kubernetes.io/instance: k8s
          app.kubernetes.io/name: prometheus
          app.kubernetes.io/part-of: kube-prometheus
          app.kubernetes.io/version: 2.33.4
        name: prometheus-k8s
        resourceVersion: "7283142"
        uid: e18f021c-3e6e-4162-98ca-bbf912b75b06
      rules:
      - apiGroups:
        - ""
        resources:
        - nodes/metrics
        verbs:
        - get
      - nonResourceURLs:
        - /metrics
        verbs:
        - get
    2. Update the apiGroups, resources, and verbs rules in the manifest.
      % cat prometheus-clusterRole-updated.yaml
      apiVersion: rbac.authorization.k8s.io/v1
      kind: ClusterRole
      metadata:
        labels:
          app.kubernetes.io/component: prometheus
          app.kubernetes.io/instance: k8s
          app.kubernetes.io/name: prometheus
          app.kubernetes.io/part-of: kube-prometheus
          app.kubernetes.io/version: 2.33.0
        name: prometheus-k8s
      rules:
      - apiGroups:
        - ""
        resources:
        - nodes
        - services
        - endpoints
        - pods
        verbs: ["get", "list", "watch"]
      - nonResourceURLs:
        - /metrics
        verbs:
        - get
      % kubectl apply -f prometheus-clusterRole-updated.yaml
      clusterrole.rbac.authorization.k8s.io/prometheus-k8s configured
    3. Display the updated ClusterRole.
      % kubectl get ClusterRole prometheus-k8s -o yaml
      apiVersion: rbac.authorization.k8s.io/v1
      kind: ClusterRole
      metadata:
        annotations:
          kubectl.kubernetes.io/last-applied-configuration: |
            {"apiVersion":"rbac.authorization.k8s.io/v1","kind":"ClusterRole","metadata":{"annotations":{},"labels":{"app.kubernetes.io/component":"prometheus","app.kubernetes.io/instance":"k8s","app.kubernetes.io/name":"prometheus","app.kubernetes.io/part-of":"kube-prometheus","app.kubernetes.io/version":"2.33.0"},"name":"prometheus-k8s"},"rules":[{"apiGroups":[""],"resources":["nodes","services","endpoints","pods"],"verbs":["get","list","watch"]},{"nonResourceURLs":["/metrics"],"verbs":["get"]}]}
        creationTimestamp: "2022-03-09T09:19:39Z"
        labels:
          app.kubernetes.io/component: prometheus
          app.kubernetes.io/instance: k8s
          app.kubernetes.io/name: prometheus
          app.kubernetes.io/part-of: kube-prometheus
          app.kubernetes.io/version: 2.33.0
        name: prometheus-k8s
        resourceVersion: "7284231"
        uid: e18f021c-3e6e-4162-98ca-bbf912b75b06
      rules:
      - apiGroups:
        - ""
        resources:
        - nodes
        - services
        - endpoints
        - pods
        verbs:
        - get
        - list
        - watch
      - nonResourceURLs:
        - /metrics
        verbs:
        - get
  4. Create a ServiceMonitor.
    You must create a ServiceMonitor object to monitor any service, such as vSphere Container Storage Plug-in, through Prometheus.
    1. Create the manifest and deploy the ServiceMonitor object.
      The object will be used to monitor the vsphere-csi-controller service. The endpoints refer to ports 2112 (ctlr) and 2113 (syncer).
      % cat vsphere-csi-controller-service-monitor.yaml
      apiVersion: monitoring.coreos.com/v1
      kind: ServiceMonitor
      metadata:
        name: vsphere-csi-controller-prometheus-servicemonitor
        namespace: monitoring
        labels:
          name: vsphere-csi-controller-prometheus-servicemonitor
      spec:
        selector:
          matchLabels:
            app: vsphere-csi-controller
        namespaceSelector:
          matchNames:
          - vmware-system-csi
        endpoints:
        - port: ctlr
        - port: syncer
      
      
      % kubectl apply -f vsphere-csi-controller-service-monitor.yaml
      servicemonitor.monitoring.coreos.com/vsphere-csi-controller-prometheus-servicemonitor created
    2. Verify that ServiceMonitors are running.
      % kubectl get servicemonitors -A
      NAMESPACE    NAME                                              AGE
      monitoring  alertmanager-main                                  9m32s
      monitoring  blackbox-exporter                                  9m31s
      monitoring  coredns                                            9m11s
      monitoring  grafana                                            9m13s
      monitoring  kube-apiserver                                     9m11s
      monitoring  kube-controller-manager                            9m10s
      monitoring  kube-scheduler                                     9m10s
      monitoring  kube-state-metrics                                 9m12s
      monitoring  kubelet                                            9m10s
      monitoring  node-exporter                                      9m9s
      monitoring  prometheus-adapter                                 9m3s
      monitoring  prometheus-k8s                                     9m6s
      monitoring  prometheus-operator                                9m2s
      monitoring  vsphere-csi-controller-prometheus-servicemonitor   42s
    3. Check the logs of the prometheus-k8s-* pods in the monitoring namespace.
      The logs list any potential issues with scraping metrics from vSphere Container Storage Plug-in. For example, if you have not correctly updated the ClusterRole, you can observe errors similar to the following:
      ts=2022-03-07T15:15:06.580Z caller=klog.go:116 level=error component=k8s_client_runtime    
      func=ErrorDepth msg="pkg/mod/k8s.io/client-go@v0.22.4/tools/cache/reflector.go:167:       
      Failed to watch *v1.Endpoints: failed to list *v1.Endpoints: endpoints is forbidden: 
      User \"system:serviceaccount:monitoring:prometheus-k8s\" cannot list resource \"endpoints\"
      in API group \"\" in the namespace \"vmware-system-csi\""
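The ServiceMonitor created in this step selects scrape endpoints by port name, so the ports on the vsphere-csi-controller Service must be named ctlr and syncer. For reference, the relevant fragment of that Service might look like the following sketch; the port names and labels here are assumptions derived from the ServiceMonitor above, so verify them against your deployment manifest:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: vsphere-csi-controller
  namespace: vmware-system-csi
  labels:
    app: vsphere-csi-controller
spec:
  selector:
    app: vsphere-csi-controller
  ports:
    - name: ctlr      # matched by the ServiceMonitor endpoint "ctlr"
      port: 2112
      targetPort: 2112
    - name: syncer    # matched by the ServiceMonitor endpoint "syncer"
      port: 2113
      targetPort: 2113
```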

Launch Prometheus UI

Access the Prometheus UI and view the vSphere Container Storage Plug-in metrics that Prometheus collects.

Procedure

  1. Make Prometheus server UI accessible.
    By default, the prometheus-k8s service that you deployed earlier is of type ClusterIP, which means that it is an internal service and is not accessible externally.
    % kubectl get svc prometheus-k8s -n monitoring
    NAME            TYPE        CLUSTER-IP      EXTERNAL-IP  PORT(S)            AGE
    prometheus-k8s  ClusterIP  10.99.185.136  <none>        9090/TCP,8080/TCP  97m
    You can use various methods to address this, such as changing the service type to NodePort, or to LoadBalancer if you have a load balancer available to provide external IPs.
    For the purposes of this testing, port-forward the service port (9090) to make it accessible from the local host.
    % kubectl --namespace monitoring port-forward svc/prometheus-k8s 9090
    Forwarding from 127.0.0.1:9090 -> 9090
    Forwarding from [::1]:9090 -> 9090
  2. Open a browser on your desktop and connect to http://localhost:9090 to see the Prometheus UI.
  3. View the following metrics.
    • vsphere_csi_info, exposed by the vsphere-csi-controller container on port 2112.

    • vsphere_syncer_info, exposed by the vsphere-syncer container on port 2113.

Create Grafana Dashboard

Launch the Grafana portal and create a dashboard to display metrics of vSphere Container Storage Plug-in.

Procedure

  1. Make Grafana UI accessible.
    As with Prometheus, Grafana is deployed as ClusterIP and is not accessible externally.

    Use the port-forward functionality to access Grafana from a browser on the local host. This time the port is 3000.

    % kubectl get svc grafana -n monitoring
    NAME      TYPE        CLUSTER-IP    EXTERNAL-IP   PORT(S)    AGE
    grafana   ClusterIP   10.97.77.202  <none>        3000/TCP  98m
    
    
    % kubectl --namespace monitoring port-forward svc/grafana 3000
    Forwarding from 127.0.0.1:3000 -> 3000
    Forwarding from [::1]:3000 -> 3000
  2. Access the Grafana UI through http://localhost:3000.
  3. In Grafana UI, set up the dashboard for vSphere Container Storage Plug-in.
    You can import sample dashboards from a GitHub location at https://github.com/kubernetes-sigs/vsphere-csi-driver/tree/master/grafana-dashboard.
    For information on how to import a Grafana dashboard, see Export and import in the Grafana documentation.
  4. Review the Grafana dashboard.
    The dashboard displays the vSphere Container Storage Plug-in metrics that Prometheus has scraped and stored.

Set Up a Prometheus Alert

You can define an alert for vsphere-csi-controller to receive notifications when a problem occurs, such as a drop in the volume operation success rate.

Procedure

  1. Verify that an alert manager has been deployed.
    The alert manager is normally deployed in addition to other services when you deploy kube-prometheus.
    % kubectl get service -n monitoring
    NAME                    TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)                         AGE
    alertmanager-main       ClusterIP   10.102.4.82     <none>        9093/TCP,8080/TCP               19h
    alertmanager-operated   ClusterIP   None            <none>        9093/TCP,9094/TCP,9094/UDP      19h
    blackbox-exporter       ClusterIP   10.98.242.67    <none>        9115/TCP,19115/TCP              19h
    grafana                 NodePort    10.102.89.156   <none>        3000:31926/TCP                  19h
    kube-state-metrics      ClusterIP   None            <none>        8443/TCP,9443/TCP               19h
    node-exporter           ClusterIP   None            <none>        9100/TCP                        19h
    prometheus-adapter      ClusterIP   10.98.39.123    <none>        443/TCP                         19h
    prometheus-k8s          NodePort    10.105.37.241   <none>        9090:30091/TCP,8080:32536/TCP   19h
    prometheus-operated     ClusterIP   None            <none>        9090/TCP                        19h
    prometheus-operator     ClusterIP   None            <none>        8443/TCP                        19h
    
    % kubectl get pod -n monitoring
    NAME                                   READY   STATUS    RESTARTS   AGE
    alertmanager-main-0                    2/2     Running   0          19h
    alertmanager-main-1                    2/2     Running   0          19h
    alertmanager-main-2                    2/2     Running   0          19h
    blackbox-exporter-55bb7b586b-dpfxb     3/3     Running   0          19h
    grafana-7474bdd9cb-5mklg               1/1     Running   0          19h
    kube-state-metrics-c655879df-vg8qc     3/3     Running   0          19h
    node-exporter-2bk6j                    2/2     Running   0          19h
    node-exporter-fmzhq                    2/2     Running   0          19h
    node-exporter-n446g                    2/2     Running   0          19h
    node-exporter-q4qk2                    2/2     Running   0          19h
    node-exporter-zjkrf                    2/2     Running   0          19h
    node-exporter-ztvmm                    2/2     Running   0          19h
    prometheus-adapter-6b59dfc556-ks2c9    1/1     Running   0          19h
    prometheus-adapter-6b59dfc556-m9rpb    1/1     Running   0          19h
    prometheus-k8s-0                       2/2     Running   0          19h
    prometheus-k8s-1                       2/2     Running   0          19h
    prometheus-operator-7b997546f8-pzs8t   2/2     Running   0          19h
  2. Define an alert for vsphere-csi-controller.
    To do this, create a PrometheusRule for vsphere-csi-controller.
    The following example shows an alert when the CreateVolume success rate is lower than 95%.
    % /kube-prometheus/manifests# cat vsphereCSIController-prometheusRule.yaml
    apiVersion: monitoring.coreos.com/v1
    kind: PrometheusRule
    metadata:
      labels:
        app.kubernetes.io/component: vsphere-csi-controller
        app.kubernetes.io/name: vsphere-csi-controller
        app.kubernetes.io/part-of: kube-prometheus
        prometheus: k8s
        role: alert-rules
      name: vsphere-csi-controller-rules
      namespace: monitoring
    spec:
      groups:
      - name: vsphere.csi.controller.rules
        rules:
        - alert: CreateVolumeSuccessRateLow
          annotations:
            description: Success rate of CSI volume OP "create-volume" is lower than 95% in the last 6 hours.
            summary: Success rate of CSI volume OP "create-volume" is lower than 95% in the last 6 hours.
          expr: sum(rate(vsphere_csi_volume_ops_histogram_count{status="pass", optype="create-volume"}[6h]))/sum(rate(vsphere_csi_volume_ops_histogram_count{optype="create-volume"}[6h]))*100 < 95
          for: 5m
          labels:
            issue: Success rate of CSI volume OP "create-volume" is lower than 95% in the last 6 hours.
            severity: warning
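The alert expression divides the rate of successful create-volume operations by the rate of all create-volume operations. The arithmetic can be sketched with hypothetical counts; the real expression operates on rate() over a 6-hour window, but the ratio works the same way:

```python
# Hypothetical operation counts over the evaluation window.
pass_count = 92   # create-volume operations with status="pass"
fail_count = 8    # create-volume operations with status="fail"

# Mirrors: sum(rate(...{status="pass"}))/sum(rate(...))*100 < 95
success_rate = pass_count * 100 / (pass_count + fail_count)
alert_fires = success_rate < 95
```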