The Cluster Capacity dashboard helps you visualize information differently by providing choices for customization. Use this dashboard to highlight the clusters that need attention. The dashboard is designed for the Capacity team, not the Operations team. It provides a long-term, top-down view that enables the Capacity team to plan future expansion and the refresh of aging hardware.
Contention is included because it directly measures performance. If your cluster is unable to serve its existing workload, do not add a new workload. By definition, a cluster that has no room for a new workload is at full capacity. Ideally, the cluster runs at 100% utilization with 0% contention. In that case, the cluster is productive and your investment is well used.
Utilization is the primary counter for capacity because it reflects the actual live usage of the resources. When utilization is high, it does not matter that the overcommit ratio is far below your target, because the cluster is full. Utilization must not be very low either. Consider the following cases when you review low utilization:
- Newly provisioned VM
- Disaster Recovery
- Undersized VM
- Auto-scale VM (a group of web servers behind a Load Balancer)
Reclamation is included because it can change your decision, and wastage is common. Capacity can be low, but if you can reclaim a sizeable chunk of wastage, you can defer the hardware purchase.
Wastage is displayed in a new color. Dark gray indicates wastage, that is, capacity that is not used. A performance problem that occurs while utilization is low can be caused by a bottleneck elsewhere.
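The way contention, remaining capacity, and reclamation combine into a decision can be sketched as follows. This is a minimal illustration, not the dashboard's own logic: the `ClusterCapacity` fields, the `assess` function, the verdict names, and both threshold values are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class ClusterCapacity:
    name: str
    contention_pct: float          # measured contention
    capacity_remaining_pct: float  # capacity still available
    reclaimable_pct: float         # wastage that could be reclaimed


def assess(cluster, max_contention_pct=0.0, min_remaining_pct=10.0):
    """Return a coarse capacity verdict for one cluster.

    Both thresholds are illustrative assumptions, not product defaults.
    """
    # Contention means the existing workload is not served well,
    # so the cluster is treated as full regardless of other numbers.
    if cluster.contention_pct > max_contention_pct:
        return "full"
    if cluster.capacity_remaining_pct >= min_remaining_pct:
        return "has_room"
    # Low on capacity: check whether reclaiming wastage would defer a purchase.
    if cluster.capacity_remaining_pct + cluster.reclaimable_pct >= min_remaining_pct:
        return "reclaim_first"
    return "plan_purchase"


if __name__ == "__main__":
    print(assess(ClusterCapacity("cluster-a", 0.0, 5.0, 15.0)))
```

In this sketch, a cluster with 5% capacity remaining and 15% reclaimable wastage is flagged for reclamation rather than for a hardware purchase.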
How to Use the Dashboard
The Cluster Capacity dashboard is layered, gradually providing details as you work top-down in the dashboard.
- The Clusters by Capacity Remaining and Clusters by Time Remaining (days) bar charts summarize the clusters based on capacity remaining and time remaining. Just because you are running low on capacity does not mean you are running out of time.
- The two bar charts work together. The ideal situation is low capacity remaining and high time remaining. This means that your resources are cost effective and are working as expected.
- The three heat maps are Time Remaining, Capacity Remaining, and VM Remaining.
- The cluster size is made constant for ease of use. If your cluster sizes are not standardized, consider using the number of ESXi hosts to display the difference in sizes.
- In the Clusters Capacity List widget, select any cluster that needs attention to view the related details.
- Utilization is displayed for three months, not one week. The daily average is displayed, not the hourly average, and the focus is on RAM consumed, not RAM active.
- Reservation can impact the efficiency of your cluster. If your cluster size varies, complement the reservation number by displaying a relative value.
- The number of VMs is displayed because newly provisioned VMs might not be active yet. They are often mistaken for idle VMs, as they can remain unused for months. When the VM count increases while demand remains low, it is a sign of potential future demand.
- Workload can be low, but is the overcommit ratio high? Newly provisioned VMs tend to be idle for weeks, and then their demand suddenly increases. Use the VM Count widget to see whether there was recent growth.
- Check why the cluster is low on capacity. Is it because of real workload or just reservation?
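The difference between capacity remaining and time remaining noted above can be illustrated with a simple trend projection. This is a sketch under stated assumptions: it averages hourly samples into daily values and fits a straight line, which is not necessarily how the product computes time remaining. The function names `daily_averages` and `days_remaining` and all sample numbers are hypothetical.

```python
from statistics import mean


def daily_averages(hourly, hours_per_day=24):
    """Collapse hourly utilization samples into one average per full day."""
    last_start = len(hourly) - hours_per_day
    return [mean(hourly[i:i + hours_per_day])
            for i in range(0, last_start + 1, hours_per_day)]


def days_remaining(daily, capacity):
    """Project days until the daily average reaches capacity.

    Uses a least-squares straight line through the daily values.
    Returns None when there are too few points or no upward trend.
    """
    n = len(daily)
    if n < 2:
        return None
    x_mean = (n - 1) / 2
    y_mean = mean(daily)
    num = sum((i - x_mean) * (y - y_mean) for i, y in enumerate(daily))
    den = sum((i - x_mean) ** 2 for i in range(n))
    slope = num / den
    if slope <= 0:
        return None  # flat or shrinking usage: not running out of time
    return max(0.0, (capacity - daily[-1]) / slope)


if __name__ == "__main__":
    flat_but_full = [90, 90, 90, 90]   # low capacity remaining, no growth
    growing = [57, 58, 59, 60]         # more capacity remaining, steady growth
    print(days_remaining(flat_but_full, 100))
    print(days_remaining(growing, 100))
```

The first series has little capacity remaining but no growth, so no end date is projected. The second has more capacity remaining but is growing, so it is the one that is running out of time.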
Points to Note
- Add a drill-down to the ESXi Capacity dashboard. A logical place to initiate this drill-down is the Cluster Capacity List widget. Link this widget to the table of ESXi hosts in the destination dashboard.
- If you have screen real estate, add cluster size information. Small clusters are less efficient from a capacity perspective due to higher overheads and the inability to support larger VMs.
- Add the peak to complement the average utilization. The peak is defined as the highest utilization among the ESXi hosts in the cluster. If the peak is much higher than the cluster-wide average, the cluster is unbalanced, which is a common reason for suboptimal capacity. Find the cause of the imbalance, as it can be an opportunity for optimization.
- This dashboard is not designed for the stretched cluster as it requires its own capacity model.
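The peak-versus-average comparison from the notes above can be sketched as a small calculation. The `imbalance` function, the host names, and the sample values are hypothetical, and the sketch assumes that all hosts are the same size so that a plain mean represents the cluster-wide average.

```python
def imbalance(host_utilization_pct):
    """Compare the busiest host against the cluster-wide average.

    host_utilization_pct maps a host name to its utilization in percent.
    Assumes equally sized hosts, so the plain mean is the cluster average.
    """
    values = list(host_utilization_pct.values())
    average = sum(values) / len(values)
    # The peak is the highest utilization among all hosts.
    peak_host = max(host_utilization_pct, key=host_utilization_pct.get)
    peak = host_utilization_pct[peak_host]
    return {
        "average": average,
        "peak": peak,
        "peak_host": peak_host,
        "gap": peak - average,
    }


if __name__ == "__main__":
    hosts = {"esx-01": 40, "esx-02": 50, "esx-03": 90}
    print(imbalance(hosts))
```

A large gap points at the host to investigate first. What counts as large depends on your environment and is not defined here.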