Use the VM Availability dashboard to track the availability of the Guest OS. The dashboard measures Guest OS availability rather than VM power state because the Guest OS might not be running even when the VM is powered on. Availability has two layers: the Consumer layer and the Provider layer. This dashboard covers the Consumer layer. You can view the VMs in a selected data center, the uptime trend for a selected cluster, and other details.

Design Considerations

The VM Availability dashboard helps you check the availability (uptime as a percentage) of VMs, because availability is typically part of the service that an IaaS provider delivers.
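To make "uptime as a percentage" concrete, the following minimal sketch computes it from a series of equally spaced up/down samples. It assumes one boolean sample per collection interval (True when VMware Tools was running), and the function name `uptime_percent` and the 5-minute interval are hypothetical. vRealize Operations Cloud computes its own Guest Tool Uptime (%) metric, and its exact method may differ.

```python
from typing import Sequence


def uptime_percent(samples: Sequence[bool]) -> float:
    """Return the share of intervals in which the Guest OS was reported up, as a percentage.

    Assumes equally spaced samples (hypothetical: one per 5-minute collection
    cycle), where True means VMware Tools was running during that interval.
    """
    if not samples:
        raise ValueError("no samples collected")
    up = sum(1 for s in samples if s)
    return 100.0 * up / len(samples)


# 30 days of 5-minute samples is 8640 intervals; 2 hours of downtime is 24 intervals.
samples = [True] * (8640 - 24) + [False] * 24
print(f"{uptime_percent(samples):.2f}%")
```

Even two hours of downtime in a month pulls the figure below 99.8%, which is why production VMs are usually expected to sit very close to 100%.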

This dashboard does not check application uptime. An application such as a database or a web server can be down while the underlying Windows or Linux OS is up, and the service provided by the IaaS team generally covers only the Windows or Linux layer. To monitor the application itself, use a network ping or an application-specific agent, such as application monitoring.

How to Use the Dashboard

  • In the Datacenters widget, click any data center from the list.
    • To view the overall information, click the vSphere World object.
    • The other widgets are updated automatically when you select a data center.
    • Consider creating a filter for this widget that reflects your classes of service, such as Gold, Silver, and Bronze, with Gold selected by default. This keeps the view free of less critical workloads so that you can focus on the important VMs. To do this, create a vRealize Operations Cloud custom group for each class of service.
  • The VMs by Uptime in the last 30 days widget groups VMs into buckets by their average uptime. The default bucket distribution is designed to suit a wide range of environments. If you monitor only production VMs, where uptime is expected to be near 100% at all times, edit the buckets to match your operational needs.
  • The VMs in the Selected Datacenter widget displays all the VMs that are currently deployed to the selected data center, along with their average uptime over the last month. For a production VM, expect this number to be 100% or very close to it.
      Note: The Services column is blank unless Service Discovery is enabled and services or processes have been discovered on the virtual machine.
    • The VMs column includes all VMs, including powered-off VMs.
  • Click any VM in the VMs by Uptime in the last 30 days widget to view its details in the Selected VM Powered On Status, Selected VM Uptime Trend, and Selected Cluster Uptime Trend widgets.
    • The Selected VM Uptime Trend widget displays the selected VM’s Guest Tool Uptime (%) across the last 30 days.
  • The Guest OS: Services widget displays the services or processes running inside the Guest OS and their state over time. When services or processes are discovered inside a VM, their availability is analyzed. This widget requires Service Discovery.
  • The ESXi Host(s) where the VM has run widget displays the VM's migration history across hosts, which can help you determine the cause of VM downtime.
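The class-of-service filter and the uptime buckets described in the steps above can be illustrated with a small sketch. It is not how the dashboard is implemented; the `VM` record, its `service_class` field (standing in for membership in a custom group such as Gold), and the bucket edges narrowed toward 100% for a production-only view are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass
class VM:
    name: str
    service_class: str  # hypothetical: the custom group the VM belongs to, e.g. "Gold"
    avg_uptime: float   # average uptime (%) over the last 30 days


# Hypothetical bucket lower bounds, checked from highest to lowest.
# Narrow ranges near 100% suit an environment that holds only production VMs.
BUCKETS = [(99.9, "99.9-100%"), (99.0, "99-99.9%"), (95.0, "95-99%"), (0.0, "<95%")]


def bucket_for(uptime: float) -> str:
    """Return the label of the first bucket whose lower bound the uptime reaches."""
    for lower, label in BUCKETS:
        if uptime >= lower:
            return label
    raise ValueError(f"uptime out of range: {uptime}")


def distribution(vms: Iterable[VM], service_class: str = "Gold") -> Dict[str, int]:
    """Count VMs per uptime bucket, keeping only the selected class of service."""
    counts = {label: 0 for _, label in BUCKETS}
    for vm in vms:
        if vm.service_class == service_class:
            counts[bucket_for(vm.avg_uptime)] += 1
    return counts


vms = [
    VM("db01", "Gold", 100.0),
    VM("web01", "Gold", 99.5),
    VM("app07", "Gold", 97.2),
    VM("test03", "Bronze", 60.0),
]
print(distribution(vms))  # Gold is the default selection, so test03 is excluded
```

Defaulting the filter to Gold mirrors the recommendation to keep less critical workloads out of the default view.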

Points to Note

  • The uptime metric tracks only the availability of VMware Tools, not the entire Guest OS. If VMware Tools is not running, the Guest OS is assumed to be down. To rule out a false negative, add a few line charts that show evidence of activity. IO counters such as Disk IOPS, Disk Throughput, and Network Transmit Throughput are good choices, because IO requires CPU processing, which indicates that the Guest OS is running. CPU usage is not a reliable indicator, because work that the VMkernel performs on behalf of the VM is charged to the VM's CPU counters.
  • vRealize Operations Cloud includes a ping adapter. You can improve the accuracy of the uptime measurement by creating a super metric that incorporates the ping results, or by checking processes with an agent, such as application monitoring.
  • Consider adding a property widget that lists the selected VM's properties to give you more context. In a large environment, the VM name alone might not provide enough context.
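The cross-checks described in the points above (IO counters as evidence of activity, plus optional ping results) can be combined as in the following sketch. This is plain Python that only illustrates the decision logic. It does not use vRealize Operations Cloud super metric syntax, and the `Sample` fields, the activity thresholds, and the function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical activity thresholds; tune them for your environment.
DISK_IOPS_THRESHOLD = 1.0
NET_TX_KBPS_THRESHOLD = 1.0


@dataclass
class Sample:
    """One collection interval for a single VM (field names are illustrative)."""
    tools_running: bool             # VMware Tools status in this interval
    disk_iops: float                # Disk IOPS
    net_tx_kbps: float              # Network Transmit Throughput, in KBps
    ping_ok: Optional[bool] = None  # ping result, or None if no ping data exists


def classify(s: Sample) -> str:
    """Return "up", "suspect" (Tools down but other evidence of activity), or "down"."""
    if s.tools_running:
        return "up"
    # CPU usage is deliberately ignored: VMkernel work is charged to the VM's CPU counters.
    io_active = s.disk_iops > DISK_IOPS_THRESHOLD or s.net_tx_kbps > NET_TX_KBPS_THRESHOLD
    if s.ping_ok or io_active:
        return "suspect"  # likely a false negative worth investigating
    return "down"


def adjusted_uptime_percent(samples: List[Sample]) -> float:
    """Uptime (%) that counts "suspect" intervals as up."""
    if not samples:
        raise ValueError("no samples collected")
    up = sum(1 for s in samples if classify(s) != "down")
    return 100.0 * up / len(samples)


samples = [
    Sample(True, 120.0, 50.0),
    Sample(False, 85.0, 30.0),               # Tools down, but disk and network IO continue
    Sample(False, 0.0, 0.0, ping_ok=False),  # no evidence of activity
    Sample(False, 0.0, 0.0, ping_ok=True),   # Tools down, but the VM answers ping
]
print([classify(s) for s in samples])
print(adjusted_uptime_percent(samples))
```

Flagging intervals as "suspect" rather than silently counting them as up keeps the raw VMware Tools metric intact while still surfacing the cases worth a closer look.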