The Cluster Capacity dashboard covers ESXi hosts and resource pools because they affect cluster capacity.
Design Considerations
See Capacity Dashboards for common design considerations among all the dashboards for capacity management.
How to Use the Dashboard
The Cluster Capacity dashboard is layered, gradually providing details as you work top-down in the dashboard.
- The three bar charts, Clusters by Capacity Remaining, Clusters by Time Remaining, and Clusters by VM Remaining, summarize the overall situation. Use the first two charts together to identify when you need to add capacity to address growth. Time remaining uses the historical growth of a cluster to forecast when more capacity is needed. This lets you operate more efficiently by confirming that you have enough capacity now and by planning capacity additions proactively. The third bar chart, Clusters by VM Remaining, completes the context, because different clusters can have different VM sizes.
For a large environment, a heat map is helpful. The three heat maps are Time Remaining, Capacity Remaining, and VM Remaining. If your cluster sizes are not standardized, create another heat map, and use the number of ESXi hosts to show the size difference.
The Clusters Capacity widget provides a table with details. The number of ESXi hosts is color coded because smaller clusters have a relatively higher overhead. Select a cluster from the table to display its capacity details automatically.
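To illustrate the idea behind a time-remaining forecast, the following minimal sketch fits a straight line to daily utilization and estimates how many days remain until a limit is reached. It is an assumption for illustration only: the source does not state which forecasting method the product uses, and the function name `days_until_full` and its parameters are hypothetical.

```python
def days_until_full(daily_usage_pct, limit_pct=100.0):
    """Estimate days until usage reaches limit_pct using a least-squares line.

    Hypothetical helper: a simple linear trend, not the product's algorithm.
    Returns None when usage is flat or declining.
    """
    n = len(daily_usage_pct)
    if n < 2:
        raise ValueError("need at least two daily samples")

    # Day index 0..n-1 is the x axis; utilization (%) is the y axis.
    mean_x = (n - 1) / 2
    mean_y = sum(daily_usage_pct) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(daily_usage_pct))

    slope = sxy / sxx                      # growth in percentage points per day
    intercept = mean_y - slope * mean_x
    if slope <= 0:
        return None                        # no growth, so no forecast date

    fitted_today = intercept + slope * (n - 1)
    return max(0.0, (limit_pct - fitted_today) / slope)


growing = [50.0, 51.0, 52.0, 53.0, 54.0]   # made-up sample data
flat = [60.0, 60.0, 60.0]
print(days_until_full(growing), days_until_full(flat))
```

A cluster can have plenty of capacity remaining today and still have little time remaining if its growth is steep, which is why the two charts are read together.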
Performance
Ensure that the performance of the cluster meets your SLAs.
Utilization
The next two charts, Memory Workload (%) and CPU Workload (%), show values relative to your usable capacity. Utilization is displayed over three months rather than one week, and as a daily average rather than an hourly average, so you can focus on the overall trend. For memory, the focus is on consumed memory, not active memory.
Allocation
You can view the trend of the three components, CPU, disk, and memory, together on the Overcommit Ratio chart. In general, your CPU overcommit should be the highest, followed by disk (because of thin provisioning). Memory overcommit tends to be near one because of its nature as a cache.
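As a rough illustration, an overcommit ratio can be thought of as the amount allocated to VMs divided by the capacity available to them. The sketch below assumes that definition; the `ClusterTotals` fields and the sample figures are hypothetical and are not taken from the product.

```python
from dataclasses import dataclass


@dataclass
class ClusterTotals:
    """Hypothetical per-cluster totals; field names are illustrative only."""
    allocated_vcpus: int
    physical_cores: int
    allocated_memory_gb: float
    usable_memory_gb: float
    provisioned_disk_gb: float
    usable_disk_gb: float


def overcommit_ratios(c: ClusterTotals) -> dict:
    """Return allocated / capacity for CPU, memory, and disk (assumed definition)."""
    return {
        "cpu": c.allocated_vcpus / c.physical_cores,
        "memory": c.allocated_memory_gb / c.usable_memory_gb,
        "disk": c.provisioned_disk_gb / c.usable_disk_gb,
    }


# Made-up figures that follow the pattern described above:
# CPU highest, then disk (thin provisioning), memory near one.
example = ClusterTotals(384, 96, 1536.0, 1536.0, 60000.0, 40000.0)
ratios = overcommit_ratios(example)
print(ratios)
```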
Use the line chart in the Allocation widget to see the trend. The data is averaged hourly.
In the VM Count widget, the trend line of the number of VMs over time helps you spot whether many VMs have been newly provisioned. If the number of VMs is increasing but demand remains low, it is a sign of potential future demand.
Reservation
Reservation can impact the efficiency of your cluster. Your cluster could be low on capacity because of real workload or only because of reservation. If your cluster sizes vary, complement the absolute reservation number with a relative value. Once you have a standardized number, you can visualize it on a heat map.
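One way to obtain a relative value is to express the reservation as a percentage of the cluster's total capacity. This is an assumption, since the text does not say what the value should be relative to; the function name and the cluster figures below are hypothetical.

```python
def reservation_percent(reserved, total_capacity):
    """Express a reservation as a percentage of total capacity (assumed basis)."""
    if total_capacity <= 0:
        raise ValueError("total_capacity must be positive")
    return 100.0 * reserved / total_capacity


# Made-up clusters: (reserved memory in GB, total memory in GB).
clusters = {
    "cluster-a": (64.0, 512.0),
    "cluster-b": (64.0, 2048.0),
}

relative = {
    name: reservation_percent(reserved, total)
    for name, (reserved, total) in clusters.items()
}
print(relative)
```

Both clusters reserve the same absolute amount, but the smaller cluster gives up a four times larger share of its capacity, which is the difference a heat map on the relative value would expose.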
- ESXi Analysis
Good cluster capacity does not mean that there are no issues at the ESXi level. Imbalance is a common problem, especially in large clusters and stretched clusters.
The ESXi Hosts in a Cluster table displays all the member ESXi hosts. The color coding reflects the imbalance, so you can see it clearly.
The 99th percentile Performance column takes the 99th percentile value of the ESXi Performance (%) metric.
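For readers unfamiliar with percentiles, the sketch below computes one with the nearest-rank method. The exact method the product uses is not stated here, so treat this as an illustration of the concept; `percentile_nearest_rank` is a hypothetical helper.

```python
def percentile_nearest_rank(samples, p):
    """Return the p-th percentile (integer p, 1..100) by the nearest-rank method."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 1 <= p <= 100:
        raise ValueError("p must be between 1 and 100")

    ordered = sorted(samples)
    # rank = ceil(p * n / 100), computed with integers to avoid float rounding.
    rank = (p * len(ordered) + 99) // 100
    return ordered[rank - 1]


# Made-up metric samples: the values 1 through 100.
samples = list(range(1, 101))
p99 = percentile_nearest_rank(samples, 99)
print(p99)
```

A percentile summarizes a whole period in one number while ignoring the most extreme 1% of samples, so a single spike does not dominate the column.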
Select an ESXi host to view the details. Both the CPU Workload (%) and the Memory Workload (%) trend line charts show whether demand is steady, cyclical, rising, or declining. The trend is as important as the present value, so view trends over a longer period. Utilization is displayed over three months rather than one week, and as a daily average rather than an hourly average. The focus is on memory consumed, not memory active. Memory consumed is the total memory consumed, so it includes the memory consumed by VMkernel. Both total and usable utilization are displayed for memory and CPU, which provides the absolute amount of capacity.
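To show why a daily average exposes the trend better than hourly values, the following sketch rolls hourly samples up into one value per day. The input format, a list of `(datetime, value)` pairs, and the function name `daily_averages` are assumptions for illustration.

```python
from collections import defaultdict
from datetime import date, datetime


def daily_averages(hourly_samples):
    """Average (timestamp, value) samples per calendar day, in date order."""
    buckets = defaultdict(list)
    for timestamp, value in hourly_samples:
        buckets[timestamp.date()].append(value)
    return {day: sum(vals) / len(vals) for day, vals in sorted(buckets.items())}


# Made-up workload samples (%).
hourly = [
    (datetime(2024, 1, 1, 0), 40.0),
    (datetime(2024, 1, 1, 12), 60.0),
    (datetime(2024, 1, 2, 0), 30.0),
]
per_day = daily_averages(hourly)
print(per_day)
```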
- VM Analysis
Use the VMs in the selected Cluster or Host table to analyze the cause of the low capacity remaining and to identify which VMs are impacting infrastructure resources such as CPU, memory, and disk space. The table lists the VMs in either the selected cluster or the selected host. When you select a VM, additional relevant information is displayed.
If many large VMs are running low on capacity, you can stop provisioning until you have upsized the existing VMs.