Diagnostics for VMware Cloud Foundation is a centralized platform that monitors the overall operational status of the VMware Cloud Foundation software stack.

It is a self-service platform that helps you analyze and troubleshoot the components of VMware Cloud Foundation, including vCenter, ESXi, and vSAN; capabilities such as vSphere vMotion, snapshots, and VM provisioning; and other issues, including security advisories and certificates. As an infrastructure administrator, you can monitor the operational state of your environment using Diagnostics findings and custom dashboards. The built-in dashboards are an extension to the native VMware Aria Operations dashboards. Diagnostics checks whether your environment is up to date with the important VMware Security Advisories.

With Diagnostics, you can address issues or vulnerabilities related to certificates, such as expired SSL certificates.

Diagnostics also provides relevant information in self-help flows for vCenter capabilities such as vSphere vMotion to help you diagnose migration issues.

Key Benefits:
  1. Ensures platform availability by proactively identifying and diagnosing operational issues.

  2. Preserves the security posture of your environment.

  3. Provides built-in known issue detection with remediation guidance and links to supporting Knowledge Base articles.

  4. Shortens, through self-service, the time needed to understand the cause of an issue and determine the next steps for your VMware software environment.

  5. Helps your business run with less disruption by quickly identifying the cause of an issue and the remediation options.

Diagnostics monitors the operational state of the following components. It also reports findings that indicate the occurrence of a known issue and provides recommendations:
  • vCenter Operational State: Ping reachability
  • ESXi Operational State: Connectivity from vCenter
  • VMware vSAN Operational State: Disk group status, physical disk status
  • General issues: Certificate expiry

Diagnostics includes the following set of capabilities that help you understand the operational state of the VMware Cloud Foundation software stack:
  • Workload Provisioning (VMs): VM provisioning requests and failures, provisioning findings and recommendations, and general troubleshooting.
  • vSphere vMotion: Successful and failed vMotion operations, findings and recommendations, and general troubleshooting.
  • Snapshots: Snapshot failures, findings and recommendations, and general troubleshooting.

Diagnostics for VMware Cloud Foundation uses interactive cards to display data. Click View Details or View Dashboard on a card for more information. See the topic Working with VMware Cloud Foundation Diagnostics.

Diagnostics cards with content appear only if your environment has VMware Aria Operations integrations configured. To configure the VMware Aria Operations integrations, follow the instructions in Setting up VMware Cloud Foundation Diagnostics.

How Diagnostics for VMware Cloud Foundation works

Diagnostics for VMware Cloud Foundation consolidates signatures from VMware Skyline Advisor and VMware Skyline Health Diagnostics, and integrates VMware Aria Operations for Logs to provide a single pane for monitoring and troubleshooting. The issues that Diagnostics detects are called findings. Diagnostics scans system properties and product logs, and displays findings that you can act upon. You review the findings and decide on the next steps appropriate for your VMware software environment. Findings are different from reports on the operational state of a system, such as connectivity, service status, or interface issues. Property-based findings inform you about issues that might affect your environment. Log-based findings inform you about issues that have already affected your system. Diagnostics 5.2 works with more than 300 property-based and log-based rules. You can see a list of all signatures here.

When you experience an issue in your environment, you can initiate a log scan within Diagnostics, which uses existing signatures to detect issues. When a signature matches the information in the log files, Diagnostics displays a finding. Each finding contains information about the matching signature, along with remediation steps or a Knowledge Base article to help resolve the issue.
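The idea of matching signatures against log files can be illustrated with a minimal Python sketch. It is not the actual Diagnostics implementation: the Signature and LogFinding types, the regular-expression format, the signature ID, and the sample log lines are all hypothetical and exist only to show the concept.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Signature:
    """Hypothetical signature: a pattern plus remediation guidance."""
    signature_id: str
    pattern: str       # regular expression tested against each log line
    remediation: str   # remediation step or Knowledge Base reference


@dataclass(frozen=True)
class LogFinding:
    """Hypothetical finding produced when a signature matches a log line."""
    signature_id: str
    line_number: int
    remediation: str


def scan_log(lines, signatures):
    """Return one finding for each (log line, signature) match."""
    compiled = [(sig, re.compile(sig.pattern)) for sig in signatures]
    findings = []
    for number, line in enumerate(lines, start=1):
        for sig, regex in compiled:
            if regex.search(line):
                findings.append(
                    LogFinding(sig.signature_id, number, sig.remediation)
                )
    return findings


# Made-up signature and log lines, for illustration only.
SIGNATURES = [
    Signature(
        "SIG-EXAMPLE-1",
        r"migration .* failed",
        "See the linked Knowledge Base article.",
    ),
]
LOG_LINES = [
    "2024-01-01T00:00:00Z info: host connected",
    "2024-01-01T00:05:00Z error: migration of vm-42 failed",
]
FINDINGS = scan_log(LOG_LINES, SIGNATURES)
```

In this sketch, only the second log line matches the example signature, so the scan yields a single finding that carries the signature ID and its remediation text.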

Proactive findings are based on rules that inspect system properties using APIs. These rules run automatically every four hours. To detect issues that have already occurred in your environment, you can initiate a log scan by clicking Refresh Findings. To run a log scan, you must have VMware Aria Operations for Logs installed and integrated in your environment. See Setting up Diagnostics for VMware Cloud Foundation.
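As a rough illustration of a proactive, property-based check, the following Python sketch tests a build number against a set of affected builds and computes the next scheduled run. The four-hour interval comes from the text above. The property names, build numbers, and function names are hypothetical and do not reflect the product's APIs.

```python
from datetime import datetime, timedelta

# From the text: property-based rules run automatically every four hours.
SCAN_INTERVAL = timedelta(hours=4)


def next_scheduled_run(last_run: datetime) -> datetime:
    """Return the time of the next automatic rule run."""
    return last_run + SCAN_INTERVAL


def build_is_affected(properties: dict, affected_builds: set) -> bool:
    """Hypothetical property-based rule: flag an object whose build
    number is in the set of builds known to carry the issue."""
    return properties.get("build") in affected_builds


# Made-up object properties and build numbers, for illustration only.
EXAMPLE_PROPERTIES = {"name": "host-01", "build": "0000001"}
AFFECTED_BUILDS = {"0000001", "0000002"}
```

A rule of this kind reports that an issue could occur on the object. It does not show that the issue has occurred, which is what a log scan is for.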

Architecture Diagram and Data Flow of Diagnostics for VMware Cloud Foundation

See the VMware Aria Operations release notes for updates to the management packs.

How you discover data in the Diagnostics dashboard

Your self-help flow in Diagnostics can start either from one of the cards or from Overall Findings. The starting point for troubleshooting depends on how the issue is reported or identified. You might see triggers for investigation and corrective action in the Diagnostics dashboard, or you might receive a report from an external source, such as an issue reported by an end user who is not an infrastructure administrator.

For more details, see the Diagnostics Card Self-help Flow and Diagnostics Findings Self-help Flow flowcharts.

Diagnostics Card Self-help Flow

Diagnostics Card Self-help Flowchart

Diagnostics Findings Self-help Flow

Diagnostics Findings Self-help Flowchart

Diagnostics Rules

Log-based rules detect an actual occurrence of a known issue, while property-based rules state that the issue is present in the build and could occur. All these rules relate to specific build numbers. Depending on the findings, you can decide whether corrective actions are applicable to your environment and how urgently to apply them. In some cases, the corrective action is to apply a patch or upgrade, which requires planning, so you cannot apply the recommendations immediately. Diagnostics categorizes findings by severity, component, type, and capability, and reports the number of affected objects for each finding, which helps you assess the impact of findings and prioritize their remediation.
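One way to think about prioritizing remediation is to order findings by severity and then by the number of affected objects. The Python sketch below assumes three severity levels (critical, warning, info) and a simple Finding record. Both are hypothetical and are not taken from the product.

```python
from dataclasses import dataclass

# Assumed severity levels; a lower rank means more urgent.
SEVERITY_RANK = {"critical": 0, "warning": 1, "info": 2}


@dataclass(frozen=True)
class Finding:
    """Hypothetical finding record using the categories named above."""
    title: str
    severity: str          # one of the keys in SEVERITY_RANK
    component: str         # for example "vCenter", "ESXi", or "vSAN"
    finding_type: str      # "property" or "log"
    affected_objects: int  # number of objects the finding applies to


def prioritize(findings):
    """Sort by severity first, then by affected object count (descending)."""
    return sorted(
        findings,
        key=lambda f: (SEVERITY_RANK[f.severity], -f.affected_objects),
    )


# Made-up findings, for illustration only.
EXAMPLE_FINDINGS = [
    Finding("Example A", "warning", "ESXi", "property", 12),
    Finding("Example B", "critical", "vCenter", "log", 1),
    Finding("Example C", "critical", "vSAN", "property", 5),
]
ORDERED_TITLES = [f.title for f in prioritize(EXAMPLE_FINDINGS)]
```

With this ordering, critical findings come first, and among findings of equal severity the one that affects more objects is listed earlier. Other orderings, such as placing log-based findings ahead of property-based ones, might suit some environments better.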

While investigating an issue, logging in to a vCenter instance might help you get more details, but it is not a required step. If logging in to a vCenter instance does not provide sufficient details, refreshing the log-based findings in Diagnostics might provide more relevant data. If you cannot identify a possible cause for the issue in the Diagnostics findings page, you can refer to the Troubleshooting Guide, which lists Knowledge Base articles specific to the conditions that each Diagnostics card reflects and provides steps for resolving such conditions. The Troubleshooting Guide link is available in the Workload Provisioning, vMotion, and Snapshots cards, but the guide includes troubleshooting information for all components that Diagnostics monitors. If the issue relates to a vMotion failure, you can use the log analysis function in VMware Aria Operations to retrieve the log statements. You can look up the operation ID on the vMotion detail page.

What To Do When a Log Scan Fails

When a log scan fails because it takes too long to complete on the VMware Aria Operations for Logs instance, refer to this troubleshooting Knowledge Base article for the resolution.