vSphere Availability Dashboard

There are two layers of availability: the Consumer layer and the Provider layer. The vSphere Availability dashboard covers the Provider layer. The dashboard reports on clusters rather than individual ESXi hosts because a cluster is operationally a single compute provider. It assumes an N+1 design, where the cluster can withstand one host failure. Logically, a cluster with fewer hosts carries a higher risk, because each host represents a larger share of the total capacity.
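
The relationship between cluster size and risk can be illustrated with a little arithmetic. The following is a minimal sketch, assuming all hosts in a cluster are identically sized; the function name `capacity_loss_fraction` is hypothetical and not part of the product.

```python
def capacity_loss_fraction(running_hosts: int, failed_hosts: int = 1) -> float:
    """Fraction of cluster compute capacity lost when hosts fail.

    Assumption for illustration: every host contributes the same capacity.
    """
    if running_hosts < 1:
        raise ValueError("a cluster needs at least one running host")
    if not 0 <= failed_hosts <= running_hosts:
        raise ValueError("failed_hosts must be between 0 and running_hosts")
    return failed_hosts / running_hosts


if __name__ == "__main__":
    # Compare the impact of a single host failure across cluster sizes.
    for hosts in (2, 4, 8, 16):
        loss = capacity_loss_fraction(hosts)
        print(f"{hosts:>2} hosts: one failure removes {loss:.1%} of capacity")
```

Under this assumption, a single failure in a 4-host cluster removes a quarter of the capacity, while the same failure in a 16-host cluster removes one sixteenth.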

Design Considerations

The vSphere Availability dashboard helps you analyze and report the uptime, as availability is typically part of the official business SLA. It is also often required in the monthly operational summary report.
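
When reporting uptime against an SLA, it can help to translate a percentage into a downtime budget. The following is a minimal sketch of that arithmetic, assuming a 30-day reporting period; the function name `allowed_downtime_minutes` is hypothetical and the calculation is not taken from the dashboard itself.

```python
def allowed_downtime_minutes(uptime_percent: float, days: int = 30) -> float:
    """Minutes of downtime permitted in a period for a given uptime target.

    Assumption for illustration: the reporting period is `days` full days.
    """
    if not 0.0 <= uptime_percent <= 100.0:
        raise ValueError("uptime_percent must be between 0 and 100")
    if days < 1:
        raise ValueError("days must be at least 1")
    minutes_in_period = days * 24 * 60
    return (100.0 - uptime_percent) / 100.0 * minutes_in_period


if __name__ == "__main__":
    for target in (99.0, 99.9, 99.99):
        budget = allowed_downtime_minutes(target)
        print(f"{target}% uptime over 30 days allows about {budget:.1f} minutes of downtime")
```

For example, a 99.9% target over 30 days leaves roughly 43 minutes of downtime.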

This dashboard is not designed for live monitoring of uptime. A NOC-style dashboard is better suited for that use case. For early warning, use a log analysis tool such as VMware Aria Operations for Logs, as a fault is typically preceded by soft errors.

How to Use the Dashboard

The Clusters widget lists all the clusters in the environment. It is sorted by uptime in ascending order so that the cluster with the lowest uptime in the last month is displayed first.
- The Running Hosts column is color-coded because a smaller cluster logically has a higher risk: a single host failure results in a relatively higher capacity degradation.
- The vSAN? column indicates whether the cluster is hyper-converged, which means both the compute and the storage parts are considered.
- The Admission Control Policy column is based on the Cluster Configuration \ DAS Configuration \ Active property. The mapping from code to name is:
  - -1 : Disabled
  - 0 : Cluster Resource Percentage
  - 1 : Slot Policy (Powered-on VMs)
  - 2 : Dedicated Failover Hosts
- In a large environment, creating a filter for the list of clusters can make it more manageable. Group by the class of services such as gold, silver, and bronze and default the selection to Gold. In this way, you can easily view your gold clusters.
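
If you export or post-process the Admission Control Policy values, the code-to-name mapping listed above can be expressed as a lookup table. The following is a minimal sketch; the names `ADMISSION_CONTROL_POLICY` and `admission_control_policy_name` are hypothetical, and the fallback for unlisted codes is an assumption.

```python
# Code-to-name mapping as listed for the Admission Control Policy column.
ADMISSION_CONTROL_POLICY = {
    -1: "Disabled",
    0: "Cluster Resource Percentage",
    1: "Slot Policy (Powered-on VMs)",
    2: "Dedicated Failover Hosts",
}


def admission_control_policy_name(code: int) -> str:
    """Return the policy name for a code.

    Assumption: codes outside the documented list are reported as unknown
    rather than raising an error.
    """
    return ADMISSION_CONTROL_POLICY.get(code, f"Unknown ({code})")


if __name__ == "__main__":
    for code in (-1, 0, 1, 2, 7):
        print(code, "->", admission_control_policy_name(code))
```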
Click any cluster from the Clusters widget.
- The cluster uptime is automatically plotted in the Selected Cluster Uptime Trend widget. It uses 99%, 99.9%, and 99.99% as the thresholds for red, orange, and yellow colors respectively.
- The ESXi host details in the ESXi in the Selected Cluster widget are automatically updated. For more context, you can add a property widget that lists the selected ESXi host properties.
- In the ESXi in the Selected Cluster widget, the Connected to vCenter and Maintenance State columns are not average values, as both are strings. Instead, they display the last state in the selected period. This allows you to go back to a specific point in time and view availability at that point.
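
The color thresholds of the Selected Cluster Uptime Trend widget can be sketched as a simple classification. This is an illustration only: it assumes the middle threshold is 99.9%, that an uptime below a threshold takes that threshold's color, and that anything at or above 99.99% is shown as green. The function name `uptime_color` is hypothetical.

```python
# Thresholds in percent, ordered from most to least severe.
# Assumptions: the middle value is 99.9, and uptime strictly below a
# threshold takes that threshold's color.
THRESHOLDS = [(99.0, "red"), (99.9, "orange"), (99.99, "yellow")]


def uptime_color(uptime_percent: float) -> str:
    """Classify an uptime percentage into a color band."""
    if not 0.0 <= uptime_percent <= 100.0:
        raise ValueError("uptime_percent must be between 0 and 100")
    for limit, color in THRESHOLDS:
        if uptime_percent < limit:
            return color
    # Assumed default when no threshold is breached.
    return "green"


if __name__ == "__main__":
    for value in (98.5, 99.5, 99.95, 100.0):
        print(f"{value}% -> {uptime_color(value)}")
```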
The Datastores not available widget lists only the datastores with a powered-off status. This covers both local and shared datastores. To add context, consider adding extra columns such as the data center where the datastore resides and the datastore type, such as NFS or VMFS.
The Port Group Availability widget lists port groups that currently have an uptime of less than 100%. To add context, consider adding extra columns such as the data center where the port group resides, the number of used ports, and the maximum number of ports.

For more context, you can add a property widget that lists the selected object properties. Multiple tables can drive the same property widget, but the object type must be the same.
In a large environment, you can create a filter for this dashboard. Group by the class of services such as gold, silver, and bronze and default the selection to Gold. In this way, the monitoring is not cluttered with less critical workloads.

Points to Note

You can add the availability of vCenter Server and NSX components to this dashboard. This requires the VMware SDDC Health Monitoring Solution.