The logical design provides a high-level overview of the Health Reporting and Monitoring for VMware Cloud Foundation validated solution.

Logical Design

The design consists of a host virtual machine, deployed in the management domain of your VMware Cloud Foundation instance, that hosts the PowerShell Module for VMware Cloud Foundation Reporting and the Python Module for VMware Cloud Foundation Health Monitoring in VMware Aria Operations. The host virtual machine uses the two modules to periodically connect to the VMware Cloud Foundation Supportability and Serviceability (SoS) utility and SDDC component APIs to collect health metrics, generate HTML reports, and send the data to VMware Aria Operations. This data is then presented through custom VMware Aria Operations dashboards to provide active health monitoring of your VMware Cloud Foundation instance.

Figure 1. Logical Design of Health Reporting and Monitoring for VMware Cloud Foundation
The host virtual machine is deployed in the management domain. The PowerShell Module for VMware Cloud Foundation Reporting and the Python Module for VMware Cloud Foundation Health Monitoring in VMware Aria Operations are installed, together with their dependencies, on the host virtual machine. In an environment with multiple VMware Cloud Foundation instances, multiple copies of the Python module are installed, one for each VMware Cloud Foundation instance. The PowerShell module connects to SDDC Manager to gather health data on SDDC components. The Python module integrates with VMware Aria Operations and uses the health data to populate custom dashboards and to generate alerts and notifications with remediation steps.

PowerShell Module for VMware Cloud Foundation Reporting

The PowerShell Module for VMware Cloud Foundation Reporting is an open-source PowerShell module that ships with a library of cmdlets that connect to SDDC management components, collect health data, and publish that data in different formats. The cmdlet library contains combined operation, health check, system alert, configuration, and system overview functions. These functions provide insight into the operational state of your VMware Cloud Foundation instance.

The PowerShell module uses the VMware Cloud Foundation Supportability and Serviceability (SoS) utility as well as SDDC component APIs to collect and publish health data for SDDC Manager, vCenter Server, vSAN, NSX, and VMware Aria Suite Lifecycle. The PowerShell module collects storage, networking, configuration, and security data. You install and configure the PowerShell module on the host virtual machine.

The PowerShell module can generate the following reports:

  • System overview report
  • Health report
  • Alert report
  • Configuration report
  • Upgrade precheck report

Python Module for VMware Cloud Foundation Health Monitoring in VMware Aria Operations

The Python Module for VMware Cloud Foundation Health Monitoring in VMware Aria Operations is an open-source collection of Python scripts and VMware Aria Operations artifacts. It uses the VMware Cloud Foundation Supportability and Serviceability (SoS) utility and the supporting PowerShell modules to collect health data for a VMware Cloud Foundation instance, and then sends this data to objects in VMware Aria Operations as custom metrics. Dashboards use these metrics to monitor the platform's health. This enables the creation and configuration of custom dashboards, alerts, notifications, and remediation in VMware Aria Operations. You install and configure the Python module on the host virtual machine.
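
As an illustration of what "sending health data as custom metrics" can look like, the following is a minimal sketch and not the module's actual implementation. It assumes a request body with a `stat-content` list of `statKey`, `timestamps`, and `data` entries, which is the shape of the VMware Aria Operations REST API "add stats" request as best understood here; verify it against the API reference for your version. The metric keys and the `build_stats_payload` helper are hypothetical, and the sketch only builds the payload without sending it.

```python
import json
import time


def build_stats_payload(health_checks, timestamp_ms=None):
    """Turn {metric_key: numeric_value} into a 'stat-content' style payload.

    The payload shape is an assumption based on the Aria Operations
    REST API; the metric keys are hypothetical examples.
    """
    if timestamp_ms is None:
        # The API is assumed to expect epoch time in milliseconds.
        timestamp_ms = int(time.time() * 1000)
    return {
        "stat-content": [
            {
                "statKey": key,
                "timestamps": [timestamp_ms],
                "data": [float(value)],
            }
            # Sort keys so the output is deterministic.
            for key, value in sorted(health_checks.items())
        ]
    }


# Hypothetical health results: 0 = healthy, 2 = critical.
sample = {
    "VCF Health|NTP|status": 0,
    "VCF Health|DNS|status": 2,
}
payload = build_stats_payload(sample, timestamp_ms=1700000000000)
print(json.dumps(payload, indent=2))
```

Keeping the payload construction separate from the HTTP call makes the mapping from health data to metrics easy to inspect before anything is sent to VMware Aria Operations.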

The Python module includes predefined custom VMware Aria Operations dashboards in the VCF Health dashboard group that cover individual component health metrics and an aggregated single pane of glass rollup dashboard.
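
The source does not state how the rollup dashboard combines the individual dashboards. One common way to reason about a rollup is "worst status wins", sketched below under that assumption. The `GREEN`/`YELLOW`/`RED` status names, the `SEVERITY` ranking, and the `rollup_status` helper are hypothetical and are not taken from the module.

```python
# Hypothetical severity ranking: a higher number means a worse status.
SEVERITY = {"GREEN": 0, "YELLOW": 1, "RED": 2}


def rollup_status(dashboard_status):
    """Return the worst status among the individual dashboards.

    An empty input yields "UNKNOWN" rather than a misleading "GREEN".
    """
    if not dashboard_status:
        return "UNKNOWN"
    return max(dashboard_status.values(), key=lambda status: SEVERITY[status])


# Illustrative statuses only; these are not real results.
example = {
    "VCF DNS Health": "GREEN",
    "VCF NTP Health": "RED",
    "VCF vSAN Health": "YELLOW",
}
print(rollup_status(example))
```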

Table 1. Python Module for VMware Cloud Foundation Health Monitoring Predefined VMware Aria Operations Dashboards

| Dashboard | Description |
|---|---|
| VCF Health Rollup | Rollup of all individual VCF Health dashboards. |
| VCF Backup Health | Displays the backup status for SDDC Manager, vCenter Servers, and NSX Local Managers. |
| VCF Certificate Health | Displays whether the component certificates are valid (within the expiry date). |
| VCF Compute Health | Displays ESXi health, including host licenses, disk storage, disk partitions, core dumps, free pool, and overall health status. Shows overall health of vCenter Server instances. |
| VCF Connectivity Health | Displays connectivity health, which verifies the connection between SDDC Manager and the underlying components of VMware Cloud Foundation. Includes ping, SSH connectivity, and API connectivity health checks for SDDC components. |
| VCF DNS Health | Displays the forward and reverse DNS health summary. |
| VCF Hardware Compatibility | Displays the data from the hardware compatibility check, which validates ESXi hosts and vSAN devices. |
| VCF Networking Health | Displays the health of NSX Local Managers, Edge Clusters, Edge Nodes, Transport Nodes, Transport Node Tunnels, and Tier-0 Gateway BGP connections. |
| VCF NTP Health | Displays the NTP health, which verifies that components have their time synchronized with the NTP server used by SDDC Manager. It also verifies that the hardware and software time stamps of ESXi hosts are within 5 minutes of the SDDC Manager appliance. |
| VCF Password Health | Displays password health, checking for password expiry across the VMware Cloud Foundation instance. |
| VCF SDDC Manager and vCenter Services Health | Displays service health for services running within SDDC Manager and vCenter Server. |
| VCF Snapshot Health | Displays the snapshot status for SDDC Manager, vCenter Servers, and NSX Local Managers. |
| VCF Storage Health | Displays disk capacity health for SDDC Manager, vCenter Servers, ESXi hosts, and datastores. Also displays VMs with connected CD-ROMs. |
| VCF vSAN Health | Displays vSAN health across ESXi hosts and vSphere clusters. |
| VCF Version Health | Displays component versions and compares the SDDC Manager inventory, the component versions actually installed, and the Bill of Materials (BoM) component versions to detect any drift. |
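
The VCF NTP Health description mentions a 5-minute tolerance between ESXi host time and the SDDC Manager appliance. The following minimal sketch shows how such a drift check can be expressed. It is not how the SoS utility or the module implements the check: the host names, the status labels, and the `clock_drift` and `is_time_in_sync` helpers are hypothetical, and treating exactly 5 minutes as in sync is an assumption.

```python
from datetime import datetime, timedelta, timezone

# Tolerance taken from the dashboard description (5 minutes).
MAX_DRIFT = timedelta(minutes=5)


def clock_drift(host_time, reference_time):
    """Absolute difference between a host clock and the reference clock."""
    return abs(host_time - reference_time)


def is_time_in_sync(host_time, reference_time, max_drift=MAX_DRIFT):
    """True if the host clock is within max_drift of the reference clock.

    The boundary is treated as inclusive, which is an assumption.
    """
    return clock_drift(host_time, reference_time) <= max_drift


# Fixed reference time standing in for the SDDC Manager appliance clock.
reference = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)

# Hypothetical ESXi hosts and their reported times.
hosts = {
    "esxi-01.example.com": reference + timedelta(seconds=42),
    "esxi-02.example.com": reference - timedelta(minutes=7),
}

for name, host_time in hosts.items():
    state = "GREEN" if is_time_in_sync(host_time, reference) else "RED"
    print(f"{name}: drift={clock_drift(host_time, reference)} status={state}")
```

Using timezone-aware timestamps avoids false drift caused by comparing local time on one system with UTC on another.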

Table 2. Logical Components for Health Reporting and Monitoring

| Single VMware Cloud Foundation Instance with a Single Availability Zone | Single VMware Cloud Foundation Instance with Multiple Availability Zones | Multiple VMware Cloud Foundation Instances |
|---|---|---|
| A host virtual machine is deployed on the management VLAN in the management domain. | A host virtual machine is deployed on the management VLAN in the management domain. A vSphere Distributed Resource Scheduler VM/Host rule ensures that the host virtual machine is running on an ESXi host group in the first availability zone of the management domain. | In the first VMware Cloud Foundation instance, a host virtual machine is deployed on the management VLAN in the management domain. |