Use the Troubleshooting tabs to identify the root cause of problems that alert recommendations or simple analysis do not resolve.

To troubleshoot the symptoms of the capacity problems that are occurring on the cluster and host system, and to determine when those problems occurred, use the Troubleshooting tabs to investigate the problem.

Procedure

  1. From the left menu, click Environment, click Object Browser > vSphere Hosts and Clusters, and then select the object, for example, USA-Cluster.
  2. Click the Alerts tab and review the symptoms.
    The Symptoms tab displays the symptoms that were triggered on the selected cluster. You notice that several critical symptoms exist:
    • Cluster Compute Resource Time Remaining with committed projects is critically low
    • Cluster Compute Resource Time Remaining is critically low
    • Capacity remaining is critically low
  3. Investigate the critical symptoms.
    1. Point to each critical symptom to identify the metric used.
    2. To view only the symptoms that affect the cluster, enter cluster in the quick filter text box.
      When you point to Cluster Compute Resource Time Remaining is critically low, the metric Capacity|Time Remaining appears. You notice that its value is less than or equal to zero, which caused the capacity symptom to trigger and generate an alert on the USA-Cluster.
  4. Click the Events > Timeline tab to review the triggered symptoms, alerts, and events that occurred on the USA-Cluster over time, and identify when the problems occurred.
    1. Click the calendar and select Last 7 Days as the range.
      Several events appear in red.
    2. Point to each event to view the details.
    3. To display the events that occurred on the cluster's data center, click View From, and select Datacenter.
      Warning events for the data center appear in yellow.
    4. Point to the warning events.
      You notice that a hard threshold violation occurred on the data center late in the evening. The violation shows that the Badge|Workload metric value was below the acceptable value, which triggered the violation.
    5. To view the affected child objects, click View From and select Host System.
  5. Click the Events tab to examine the changes that occurred on the USA-Cluster, and determine whether a change occurred that contributed to the root cause of the alert or other problems with the cluster.
    1. Review the graph.
      By reviewing the graph, you can determine whether a recurring event has caused the errors. Each event indicates that the guest file system is out of disk space. The affected objects appear in the pane following the graph.
    2. Click each red triangle to identify the affected object and highlight it in that pane.
  6. Click the Capacity tab to evaluate details of capacity and time remaining.
  7. Click the All Metrics tab to evaluate the objects in their context in the environment topology to help identify the possible cause of a problem.
    1. In the top view, select USA-Cluster.
    2. In the metrics pane, expand All Metrics > Capacity Analytics Generated and double-click Capacity Remaining (%).
      The Capacity Remaining (%) calculation appears in the right pane.
    3. In the metrics pane, expand All Metrics > Badge and double-click Workload (%).
      The Workload (%) calculation appears in the right pane.
    4. On the toolbar, click Date Controls and select Last 7 Days.
      The metric chart indicates that the capacity for the cluster remained at a steady level for the past week, but that the Badge|Workload (%) calculation displays workload extremes.
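
The symptom condition described in step 3 can be illustrated with a minimal sketch. It assumes only what the procedure states: the critical symptom triggers when the Capacity|Time Remaining value is less than or equal to zero. The `Symptom` class and the `is_triggered` function are hypothetical names used for illustration and are not part of the product's API.

```python
from dataclasses import dataclass


@dataclass
class Symptom:
    """Hypothetical model of a symptom definition (illustration only)."""
    name: str
    metric: str
    threshold: float


def is_triggered(symptom: Symptom, value: float) -> bool:
    """Return True when the metric value is at or below the threshold."""
    return value <= symptom.threshold


# Symptom and metric names are taken from the procedure; the threshold of
# zero reflects the "less than or equal to zero" condition in step 3.
time_remaining = Symptom(
    name="Cluster Compute Resource Time Remaining is critically low",
    metric="Capacity|Time Remaining",
    threshold=0.0,
)

# Hypothetical metric values: 0 triggers the symptom, 45 does not.
for value in (0, 45):
    state = "triggered" if is_triggered(time_remaining, value) else "not triggered"
    print(f"{time_remaining.metric} = {value}: {state}")
```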

Results

You have analyzed the symptoms, timeline, events, and metrics related to the problems on your cluster. Through your analysis, you have determined that the heavy workload on the cluster has caused the cluster to start running out of capacity.
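
If you record the charted values outside the product, the contrast between a steady metric and one with extremes can be summarized as in the following sketch. The sample values and the `summarize` helper are hypothetical and serve only to illustrate the comparison made in the All Metrics step; they are not real measurements.

```python
from statistics import mean


def summarize(samples):
    """Return the minimum, maximum, mean, and spread (max - min) of a series."""
    low, high = min(samples), max(samples)
    return {"min": low, "max": high, "mean": mean(samples), "spread": high - low}


# Hypothetical daily samples for a seven-day range (illustration only).
capacity_remaining_pct = [12.0, 12.5, 11.8, 12.1, 12.0, 11.9, 12.2]
workload_pct = [35.0, 98.0, 40.0, 110.0, 38.0, 105.0, 42.0]

for label, series in (
    ("Capacity Remaining (%)", capacity_remaining_pct),
    ("Badge|Workload (%)", workload_pct),
):
    stats = summarize(series)
    print(f"{label}: min={stats['min']}, max={stats['max']}, spread={stats['spread']:.1f}")
```

A small spread corresponds to a metric that stayed at a steady level, and a large spread corresponds to the workload extremes described in the procedure.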

What to do next

Examine the Details views and heat maps to interpret the properties, metrics, and alerts. Also, look for trends and spikes that occur in the resources for your objects, the distributions of resources across your objects, and data maps. You can examine the use of various object types across your objects. See Examine the Environment Details.