An alert is a record of a known problem that the system has detected. Alerts can be raised during Power On System Validation (POSV), at a set interval when the system polls the alert catalog, or when an event is generated. An event is a record of a system condition that is potentially significant or of interest to you, such as a degradation, a failure, or a user-initiated configuration change.
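
The three detection mechanisms described above correspond to the values in the "Detected By" column of the table that follows. As a minimal sketch, the column text can be parsed into a set of mechanisms in Python. The `Detection` enum and the `parse_detected_by` helper are hypothetical names introduced for illustration; they are not part of Cloud Foundation.

```python
import re
from enum import Enum


class Detection(Enum):
    """Ways an alert can be raised, per the description above (hypothetical type)."""
    POSV = "POSV"                # Power On System Validation
    SYSTEM_POLL = "system poll"  # periodic poll of the alert catalog
    EVENT = "event"              # raised when an event is generated


def parse_detected_by(cell: str) -> frozenset:
    """Parse a 'Detected By' cell such as 'POSV, system poll, and event'."""
    # Split on commas and on the standalone word 'and'.
    parts = re.split(r",|\band\b", cell)
    # Match labels case-insensitively ('Event' and 'event' both occur).
    by_label = {d.value.lower(): d for d in Detection}
    found = set()
    for part in parts:
        label = part.strip().lower()
        if not label:
            continue  # empty fragment left over between ',' and 'and'
        if label not in by_label:
            raise ValueError(f"unrecognized detection mechanism: {part.strip()!r}")
        found.add(by_label[label])
    return frozenset(found)
```

Raising `ValueError` on unrecognized text is a design choice in this sketch: it surfaces any cell that does not name one of the three mechanisms instead of silently ignoring it.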

Table 1. Cloud Foundation Alerts

| Alert Name | Short Description | Severity Level | Detected By |
| --- | --- | --- | --- |
| BMC_AUTHENTICATION_FAILURE_ALERT | The system is unable to authenticate to the server's out-of-band (OOB) management port. | ERROR | Event |
| BMC_MANAGEMENT_FAILURE_ALERT | The system failed to perform a management operation using the server's OOB management port. | ERROR | Event |
| BMC_NOT_REACHABLE_ALERT | The system is unable to communicate with the server's out-of-band (OOB) management port. | ERROR | Event |
| COORDINATION_SERVICE_DOWN_ALERT | Cannot establish a connection with ZooKeeper. | ERROR | |
| CPU_CAT_FAILURE_ALERT | A processor has shut down due to a catastrophic error. | ERROR | Event |
| CPU_EXTRA_ALERT | Mismatch between the CPU specification in the manifest file and the physical CPU inventory reported by HMS. An extra CPU is present in the physical inventory. | WARNING | Event |
| CPU_INITIALIZATION_ERROR_ALERT | The system detected that a CPU initialization error has occurred. | ERROR | Event |
| CPU_INVALID_ALERT | A CPU type detected in the server does not match what the manifest specifies. | ERROR | POSV and system poll |
| CPU_MACHINE_CHECK_ERROR_ALERT | A server CPU has failed due to a CPU Machine Check Error. | ERROR | Event |
| CPU_POST_FAILURE_ALERT | A server CPU has shut down due to a POST failure. | ERROR | Event |
| CPU_TEMPERATURE_ABOVE_UPPER_THRESHOLD_ALERT | A CPU temperature has reached its maximum safe operating temperature. | WARNING | Event |
| CPU_TEMPERATURE_BELOW_LOWER_THRESHOLD_ALERT | A CPU temperature has reached its minimum safe operating temperature. | WARNING | Event |
| CPU_THERMAL_TRIP_ERROR_ALERT | A server CPU has shut down due to a thermal error. | ERROR | Yes |
| CPU_UNDETECTED_ALERT | A CPU matching the manifest was not detected. | ERROR | Event |
| DIMM_ECC_MEMORY_ERROR_ALERT | The system detected an uncorrectable Error Correction Code (ECC) error for a server's memory. | ERROR | Event |
| DIMM_TEMPERATURE_ABOVE_THRESHOLD_ALERT | Memory temperature has reached its maximum safe operating temperature. | WARNING | Event |
| DIMM_THERMAL_TRIP_ALERT | Memory has shut down due to a thermal error. | ERROR | Event |
| EVO_SDDC_BUNDLE_INCOMPLETE_ALERT | The Cloud Foundation ISO file is missing some elements. | CRITICAL | POSV |
| EVO_SDDC_BUNDLE_INVALID_ALERT | The MD5 checksum generated on the Cloud Foundation ISO bundle does not match the MD5 checksum provided by VIA in the vSAN datastore. | CRITICAL | POSV |
| EVO_SDDC_BUNDLE_MISSING_ALERT | The Cloud Foundation bundle ISO file or MD5 checksum file is missing. | CRITICAL | POSV |
| HDD_DOWN_ALERT | Operational status is down for an HDD. | ERROR | Event |
| HDD_EXCESSIVE_READ_ERRORS_ALERT | Excessive read errors reported for an HDD. | WARNING | Event |
| HDD_EXCESSIVE_WRITE_ERRORS_ALERT | Excessive write errors reported for an HDD. | WARNING | Event |
| HDD_EXTRA_ALERT | Additional HDD detected that does not match the manifest. | WARNING | POSV and system poll |
| HDD_INVALID_ALERT | Detected HDD does not match the manifest. | ERROR | POSV and system poll |
| HDD_TEMPERATURE_ABOVE_THRESHOLD_ALERT | HDD temperature has reached its maximum safe operating temperature. | WARNING | Event |
| HDD_UNDETECTED_ALERT | HDD matching the manifest was not detected. | ERROR | POSV and system poll |
| HDD_WEAROUT_ABOVE_THRESHOLD_ALERT | Wear-out state of an HDD is above its defined threshold. | WARNING | Event |
| HMS_AGENT_DOWN_ALERT | A physical rack's Hardware Management Services agent is down. | CRITICAL | POSV |
| HMS_DOWN_ALERT | The HMS is down. | CRITICAL | POSV and event |
| HOST_AGENT_NOT_ALIVE_ALERT | ESXi on a server in a physical rack is not running. | | POSV |
| MANAGEMENT_SWITCH_DOWN_ALERT | Operational status is down for a physical rack's management switch. | WARNING | POSV, system poll, and event |
| MANAGEMENT_SWITCH_EXTRA_ALERT | Additional management switch detected that does not match the manifest. | WARNING | POSV, system poll, and event |
| MANAGEMENT_SWITCH_INVALID_ALERT | Detected management switch does not match the manifest. | CRITICAL | POSV, system poll, and event |
| MANAGEMENT_SWITCH_PORT_DOWN_ALERT | Operational status is down for a port in a physical rack's management switch. | WARNING | Event |
| MEMORY_EXTRA_ALERT | Detected additional memory that does not match the manifest. | WARNING | POSV and system poll |
| MEMORY_INVALID_ALERT | Detected memory type does not match the manifest. | ERROR | POSV and system poll |
| MEMORY_UNDETECTED_ALERT | Memory matching the manifest was not detected. | ERROR | POSV and system poll |
| PCH_TEMPERATURE_ABOVE_THRESHOLD_ALERT | Platform controller hub (PCH) temperature has reached its maximum safe operating temperature. | WARNING | Event |
| SERVER_DOWN_ALERT | Server is in the powered-down state. | ERROR | POSV and system poll |
| SERVER_EXTRA_ALERT | Detected additional server that does not match the manifest. | WARNING | POSV and system poll |
| SERVER_INVALID_ALERT | Detected server does not match the manifest. | ERROR | POSV and system poll |
| SERVER_PCIE_ERROR_ALERT | A server's system has PCIe errors. | ERROR | Event |
| SERVER_POST_ERROR_ALERT | A server has POST failures. | ERROR | Event |
| SERVER_UNDETECTED_ALERT | Server matching the manifest was not detected. | ERROR | POSV and system poll |
| SPINE_SWITCH_DOWN_ALERT | Operational status is down for a physical rack's spine switch. | ERROR | POSV, system poll, and event |
| SPINE_SWITCH_EXTRA_ALERT | Detected additional spine switch that does not match the manifest. | WARNING | POSV and system poll |
| SPINE_SWITCH_INVALID_ALERT | Detected spine switch does not match the manifest. | ERROR | POSV and system poll |
| SPINE_SWITCH_PORT_DOWN_ALERT | Operational status is down for a port in a physical rack's spine switch. | WARNING | Event |
| SSD_DOWN_ALERT | Operational status is down for an SSD. | ERROR | Event |
| SSD_EXCESSIVE_READ_ERRORS_ALERT | Excessive read errors reported for an SSD. | WARNING | Event |
| SSD_EXCESSIVE_WRITE_ERRORS_ALERT | Excessive write errors reported for an SSD. | WARNING | Event |
| SSD_EXTRA_ALERT | Detected additional SSD that does not match the manifest. | WARNING | POSV and system poll |
| SSD_INVALID_ALERT | Detected SSD does not match the manifest. | ERROR | POSV and system poll |
| SSD_TEMPERATURE_ABOVE_THRESHOLD_ALERT | SSD temperature has reached its maximum safe operating temperature. | WARNING | Event |
| SSD_UNDETECTED_ALERT | SSD matching the manifest was not detected. | ERROR | POSV and system poll |
| SSD_WEAROUT_ABOVE_THRESHOLD_ALERT | Wear-out state of an SSD is above its defined threshold. | WARNING | Event |
| STORAGE_CONTROLLER_DOWN_ALERT | Operational status is down for a storage adapter. | ERROR | Event |
| TOR_SWITCH_DOWN_ALERT | Operational status is down for a physical rack's ToR switch. | ERROR | POSV and system poll |
| TOR_SWITCH_EXTRA_ALERT | Detected extra ToR switch that does not match the manifest. | WARNING | POSV and system poll |
| TOR_SWITCH_INVALID_ALERT | Detected ToR switch does not match the manifest. | ERROR | POSV and system poll |
| TOR_SWITCH_PORT_DOWN_ALERT | Operational status is down for a port in a physical rack's ToR switch. | WARNING | Event |
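
A catalog like the one above lends itself to simple triage queries. The following sketch copies a handful of rows from Table 1 into Python and filters them by severity. The `Alert` tuple and the `at_least` helper are hypothetical names, and the severity ranking (WARNING below ERROR below CRITICAL) is an assumption made for illustration, because the table does not state an ordering.

```python
from typing import NamedTuple


class Alert(NamedTuple):
    """One row of the alert table (hypothetical type for this sketch)."""
    name: str
    severity: str
    detected_by: str


# A few rows copied from Table 1; this is not the full catalog.
CATALOG = [
    Alert("BMC_NOT_REACHABLE_ALERT", "ERROR", "Event"),
    Alert("EVO_SDDC_BUNDLE_MISSING_ALERT", "CRITICAL", "POSV"),
    Alert("HDD_EXTRA_ALERT", "WARNING", "POSV and system poll"),
    Alert("HMS_DOWN_ALERT", "CRITICAL", "POSV and event"),
    Alert("TOR_SWITCH_PORT_DOWN_ALERT", "WARNING", "Event"),
]

# ASSUMPTION: this ranking is not given in the source table.
SEVERITY_RANK = {"WARNING": 0, "ERROR": 1, "CRITICAL": 2}


def at_least(alerts, minimum):
    """Return alerts at or above `minimum` severity, most severe first."""
    floor = SEVERITY_RANK[minimum]
    selected = [a for a in alerts if SEVERITY_RANK[a.severity] >= floor]
    # sorted() is stable, so alerts of equal severity keep their input order.
    return sorted(selected, key=lambda a: SEVERITY_RANK[a.severity], reverse=True)
```

For example, `at_least(CATALOG, "ERROR")` returns the two CRITICAL rows followed by the single ERROR row, and drops both WARNING rows.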