Horizon Storage Performance

This dashboard gives an overview of the performance of the storage components of Desktop as a Service. It shows storage-related performance problems such as high latency, high outstanding IO, and low utilization. The dashboard is designed for all three roles (Horizon administrator, VMware administrator, and Storage administrator), with the goal of fostering close collaboration among the teams.

This dashboard is a superset of the vSphere Datastore Performance dashboard in terms of functionality, but it lists only the datastores that are used by Horizon.

Design Consideration

This dashboard combines contention and utilization metrics in a single view, but visually separates them for ease of use. Local datastores are not covered because they are not generally used in Horizon.

How to Use the Dashboard

Review the two Datastore Performance bar charts

The breadth bar chart measures the population and the percentage of VMs affected.
Expect all of them to be in the green range. At the very least, none of them should be in the red.
Selecting one of the bars reveals the objects within that bucket. Click the Maximize button in the toolbar of the widget to see the list clearly. You cannot select one of the rows to drive other widgets.

Review the Datacenters in Horizon table

Focus on the datacenter with the worst latency. The column is colour coded. If your operations require a different threshold, edit the widget to adjust accordingly.

Select a datacenter from the Datacenters table

The shared datastores in the datacenter are listed with their KPIs. Datastores that are unavailable are not shown.

Review the Datastores in selected DC table

Expect all of them to be in the green range. At the very least, none of them should be in the red.
Pay attention to the datastores that are not performing well.

Select one of the entries in the table.

Its KPIs are shown in the scoreboards. Utilization and Contention are shown separately.
Read latency and Write latency are shown separately for better insight. The nature of a read problem and a write problem may not be the same, so it is useful to see the difference.
Its relevant properties are shown in the property widget.

Select one or more entries in the scoreboard.

The line chart below the scoreboard plots the selected metrics.
Use the metric chart widget to compare metrics to see if there is any correlation.
You can also stack them. For example, you can combine Read IOPS and Write IOPS to get the Total IOPS. However, do not add Read Latency and Write Latency to get a total latency, because the combined latency depends on the read-to-write ratio.
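
The difference between stacking IOPS and combining latency can be sketched as follows. This is a minimal illustration, not part of the dashboard: the function names and the sample numbers are assumptions, and the combined latency is computed as an IOPS-weighted average, which is one common way to account for the read-to-write ratio.

```python
def total_iops(read_iops: float, write_iops: float) -> float:
    """IOPS are counts per second, so read and write can simply be added."""
    return read_iops + write_iops


def combined_latency_ms(read_iops: float, write_iops: float,
                        read_latency_ms: float, write_latency_ms: float) -> float:
    """Average latency per IO, weighted by how many reads and writes occurred.

    Assumption for illustration: an IOPS-weighted average is used as the
    combined latency. Returns 0.0 when there is no IO at all.
    """
    total = read_iops + write_iops
    if total == 0:
        return 0.0
    weighted_sum = read_iops * read_latency_ms + write_iops * write_latency_ms
    return weighted_sum / total


# Hypothetical datastore: read-heavy workload with slow writes.
read_iops, write_iops = 900.0, 100.0
read_latency_ms, write_latency_ms = 2.0, 20.0

print(total_iops(read_iops, write_iops))              # sum of the two counters
print(read_latency_ms + write_latency_ms)             # naive sum, misleading
print(combined_latency_ms(read_iops, write_iops,
                          read_latency_ms, write_latency_ms))  # weighted average
```

With these sample values the naive sum suggests 22 ms, while the weighted average is much lower because most of the IO is fast reads. The slow writes remain a problem in their own right, which is why the scoreboard shows the two latencies separately.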

Points to Note

The vSphere storage is represented as a datastore. The underlying storage protocol can be file (NFS) or block (VMFS). vSAN also uses VMFS as its consumption layer, but its underlying layer is unique to vSAN and therefore has its own monitoring needs.
Latency can occur even when IOPS and throughput are not high. When latency occurs, troubleshooting can take a lot of time. Observe the logs and queues in the various layers of the storage stack (for example, the driver) and monitor their performance.
Datastores that share an underlying physical array can experience problems at the same time. The underlying array can also experience a hot spot on its own, as it is made of independent magnetic disks or SSDs.
The dashboard does not include datastore clusters. If your environment uses them, add a View List to list them, and have this View List drive the Datastore Performance view list.
If you have many VMs with virtual disks on multiple datastores, add a View List widget to list the individual virtual disks. Use this list to plot the latency of each virtual disk.
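
The note above that latency can occur without high IOPS can be illustrated with Little's Law, which relates the average number of outstanding IOs to IOPS and average latency. The sketch below is an illustration only: the function name and the sample figures are assumptions, and real storage stacks have several queues that this single-queue view does not model.

```python
def implied_latency_ms(outstanding_io: float, iops: float) -> float:
    """Average latency implied by Little's Law.

    Little's Law: outstanding IO = IOPS x latency (in seconds),
    so latency (ms) = outstanding IO x 1000 / IOPS.
    """
    if iops <= 0:
        raise ValueError("iops must be positive")
    return outstanding_io * 1000.0 / iops


# Hypothetical readings: the same queue depth at two very different IOPS levels.
print(implied_latency_ms(outstanding_io=8, iops=8000))  # busy but healthy datastore
print(implied_latency_ms(outstanding_io=8, iops=200))   # low IOPS, yet high latency
```

In the second case the datastore is nearly idle in terms of IOPS, but the same number of IOs waiting in the queue translates into a far higher latency per IO. This is why outstanding IO and queues are worth checking alongside IOPS and throughput.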