The vCenter adapter provides alert definitions that generate alerts on the Host System objects in your environment.

Health/Symptom-Based

These alert definitions have the following impact and criticality information.

Impact
Health
Criticality
Symptom-based
Alert Definition Symptoms Recommendations
Standalone host has CPU contention caused by less than half of the virtual machines. Symptoms include the following:
  • Host not inside a cluster
  • Host CPU contention is at warning/immediate/critical level
  • > 0 child virtual machines have [Virtual machine CPU demand at warning/immediate/critical level]
  • <= 50% of child virtual machines have [Virtual machine CPU demand at warning/immediate/critical level]
  1. Add the host to a fully-automated-DRS cluster to allow vSphere to move virtual machines as needed when resources are available on other hosts in the cluster.
  2. Use vMotion to migrate some virtual machines with high CPU workload to other hosts that have available CPU capacity.
  3. Right-size large virtual machines to help reduce overall resource contention. Use the Reclaimable Capacity feature within VMware Aria Operations for recommended rightsizing of VMs.
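Before migrating anything, it can help to confirm CPU contention on the host itself. The following is a minimal command-line sketch, assuming SSH access to the ESXi host; the output file name /tmp/cpustats.csv is only an example:

  # Interactive: press 'c' for the CPU view and watch the %RDY column.
  # Sustained ready time is a common sign of CPU contention.
  esxtop

  # Batch mode: capture two 5-second samples to a CSV file for offline review.
  esxtop -b -d 5 -n 2 > /tmp/cpustats.csv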
Standalone host has CPU contention caused by more than half of the virtual machines. Symptoms include the following:
  • Host not inside a cluster
  • Host CPU contention is at warning/immediate/critical level
  • Host CPU demand at warning/immediate/critical level
  • > 50% of child virtual machines have [Virtual machine CPU demand at warning/immediate/critical level]
  1. Add the host to a fully-automated-DRS cluster to allow vSphere to move virtual machines as needed when resources are available on other hosts in the cluster.
  2. Use vMotion to migrate some virtual machines with high CPU workload to other hosts that have available CPU capacity.
  3. Right-size large virtual machines to help reduce overall resource contention. Use the Reclaimable Capacity feature within VMware Aria Operations for recommended rightsizing of VMs.
Standalone host has CPU contention caused by overpopulation of virtual machines. Symptoms include the following:
  • Host not inside a cluster
  • Host CPU contention is at warning/immediate/critical level
  • Host CPU demand at warning/immediate/critical level
  • = 0 child virtual machines have [Virtual machine CPU demand at warning/immediate/critical level]
  1. Add the host to a fully-automated-DRS cluster to allow vSphere to move virtual machines as needed when resources are available on other hosts in the cluster.
  2. Use vMotion to migrate some virtual machines with high CPU workload to other hosts that have available CPU capacity.
  3. Right-size large virtual machines to help reduce overall resource contention. Use the Reclaimable Capacity feature within VMware Aria Operations for recommended rightsizing of VMs.
Host in a cluster that does not have fully-automated DRS enabled has CPU contention caused by less than half of the virtual machines. Symptoms include the following:
  • Host inside a cluster
  • [ !DRS Enabled OR !DRS fully automated ]
  • Host CPU contention is at warning/immediate/critical level
  • > 0 child virtual machines have [Virtual machine CPU demand at warning/immediate/critical level]
  • <= 50% of child virtual machines have [Virtual machine CPU demand at warning/immediate/critical level]
  1. Enable fully-automated DRS in the cluster to allow vSphere to move virtual machines as needed when resources are available on other hosts in the cluster.
  2. Use vMotion to migrate some virtual machines with high CPU workload to other hosts that have available CPU capacity.
  3. Right-size large virtual machines to help reduce overall resource contention. Use the Reclaimable Capacity feature within VMware Aria Operations for recommended rightsizing of VMs.
Host in a cluster that does not have fully-automated DRS enabled has CPU contention caused by more than half of the virtual machines. Symptoms include the following:
  • Host inside a cluster
  • [ !DRS Enabled OR !DRS fully automated ]
  • Host CPU contention at warning/immediate/critical level
  • Host CPU demand at warning/immediate/critical level
  • > 50% of child virtual machines have [Virtual machine CPU demand at warning/immediate/critical level]
  1. Enable fully-automated DRS in the cluster to allow vSphere to move virtual machines as needed when resources are available on other hosts in the cluster.
  2. Use vMotion to migrate some virtual machines with high CPU workload to other hosts that have available CPU capacity.
  3. Right-size large virtual machines to help reduce overall resource contention. Use the Reclaimable Capacity feature within VMware Aria Operations for recommended rightsizing of VMs.
Host in a cluster that does not have fully-automated DRS enabled has CPU contention caused by overpopulation of virtual machines. Symptoms include the following:
  • Host inside a cluster
  • [ !DRS Enabled OR !DRS fully automated ]
  • Host CPU contention at warning/immediate/critical level
  • Host CPU demand at warning/immediate/critical level
  • = 0 child virtual machines have [Virtual machine CPU demand at warning/immediate/critical level]
  1. Enable fully-automated DRS in the cluster to allow vSphere to move virtual machines as needed when resources are available on other hosts in the cluster.
  2. Use vMotion to migrate some virtual machines with high CPU workload to other hosts that have available CPU capacity.
  3. Right-size large virtual machines to help reduce overall resource contention. Use the Reclaimable Capacity feature within VMware Aria Operations for recommended rightsizing of VMs.
Standalone host has memory contention caused by less than half of the virtual machines. Symptoms include the following:
  • Host not inside a cluster
  • Host memory workload at warning/immediate/critical level
  • Host memory contention at warning/immediate/critical level
  • > 0 child virtual machines have [Virtual machine memory workload at warning/immediate/critical level]
  • <= 50% of child virtual machines have [Virtual machine memory workload at warning/immediate/critical level]
  1. Add the host to a fully-automated-DRS cluster to allow vSphere to move virtual machines as needed when resources are available on other hosts in the cluster.
  2. Use vMotion to migrate some virtual machines with high memory workload to other hosts that have available memory capacity.
  3. Upgrade the host to one that has larger memory capacity.
  4. Right-size large virtual machines to help reduce overall resource contention. Use the Reclaimable Capacity feature within VMware Aria Operations for recommended rightsizing of VMs.
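Before choosing between migrating virtual machines and moving to a larger host, you can confirm the host's physical memory and memory pressure directly. A minimal sketch, assuming SSH access to the host:

  # Show the host's installed physical memory.
  esxcli hardware memory get

  # Interactive: press 'm' for the memory view and watch ballooning (MCTLSZ)
  # and swapping (SWCUR) per virtual machine.
  esxtop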
Standalone host has memory contention caused by more than half of the virtual machines. Symptoms include the following:
  • Host not inside a cluster
  • Host memory workload at warning/immediate/critical level
  • Host memory contention at warning/immediate/critical level
  • > 50% of child virtual machines have [Virtual machine memory workload at warning/immediate/critical level]
  1. Add the host to a fully-automated-DRS cluster to allow vSphere to move virtual machines as needed when resources are available on other hosts in the cluster.
  2. Use vMotion to migrate some virtual machines with high memory workload to other hosts that have available memory capacity.
  3. Upgrade the host to one that has larger memory capacity.
  4. Right-size large virtual machines to help reduce overall resource contention. Use the Reclaimable Capacity feature within VMware Aria Operations for recommended rightsizing of VMs.
Standalone host has memory contention caused by overpopulation of virtual machines. Symptoms include the following:
  • Host not inside a cluster
  • Host memory workload at warning/immediate/critical level
  • Host memory contention at warning/immediate/critical level
  • = 0 child virtual machines have [Virtual machine memory workload at warning/immediate/critical level]
  1. Add the host to a fully-automated-DRS cluster to allow vSphere to move virtual machines as needed when resources are available on other hosts in the cluster.
  2. Use vMotion to migrate some virtual machines with high memory workload to other hosts that have available memory capacity.
  3. Upgrade the host to one that has larger memory capacity.
  4. Right-size large virtual machines to help reduce overall resource contention. Use the Reclaimable Capacity feature within VMware Aria Operations for recommended rightsizing of VMs.
Host in a cluster that does not have fully-automated DRS enabled has memory contention caused by less than half of the virtual machines. Symptoms include the following:
  • Host inside a cluster
  • [ !DRS Enabled OR !DRS fully automated ]
  • Host memory contention at warning/immediate/critical level
  • > 0 child virtual machines have [Virtual machine memory workload at warning/immediate/critical level]
  • <= 50% of child virtual machines have [Virtual machine memory workload at warning/immediate/critical level]
  1. Enable fully-automated DRS in the cluster to allow vSphere to move virtual machines as needed when resources are available on other hosts in the cluster.
  2. Use vMotion to migrate some virtual machines with high memory workload to other hosts that have available memory capacity.
  3. Upgrade the host to one that has larger memory capacity.
  4. Right-size large virtual machines to help reduce overall resource contention. Use the Reclaimable Capacity feature within VMware Aria Operations for recommended rightsizing of VMs.
Host in a cluster that does not have fully-automated DRS enabled has memory contention caused by more than half of the virtual machines. Symptoms include the following:
  • Host inside a cluster
  • [ !DRS Enabled OR !DRS fully automated ]
  • Host memory workload at warning/immediate/critical level
  • Host memory contention at warning/immediate/critical level
  • > 50% of child virtual machines have [Virtual machine memory workload at warning/immediate/critical level]
  1. Enable fully-automated DRS in the cluster to allow vSphere to move virtual machines as needed when resources are available on other hosts in the cluster.
  2. Use vMotion to migrate some virtual machines with high memory workload to other hosts that have available memory capacity.
  3. Upgrade the host to one that has larger memory capacity.
  4. Right-size large virtual machines to help reduce overall resource contention. Use the Reclaimable Capacity feature within VMware Aria Operations for recommended rightsizing of VMs.
Host in a cluster that does not have fully-automated DRS enabled has memory contention caused by overpopulation of virtual machines. Symptoms include the following:
  • Host inside a cluster
  • [ !DRS Enabled OR !DRS fully automated ]
  • Host memory workload at warning/immediate/critical level
  • Host memory contention at warning/immediate/critical level
  • = 0 child virtual machines have [Virtual machine memory workload at warning/immediate/critical level]
  1. Enable fully-automated DRS in the cluster to allow vSphere to move virtual machines as needed when resources are available on other hosts in the cluster.
  2. Use vMotion to migrate some virtual machines with high memory workload to other hosts that have available memory capacity.
  3. Upgrade the host to one that has larger memory capacity.
  4. Right-size large virtual machines to help reduce overall resource contention. Use the Reclaimable Capacity feature within VMware Aria Operations for recommended rightsizing of VMs.
Host is experiencing a high number of dropped packets. Symptoms include the following:
  • Host network received packets dropped
  • Host network transmitted packets dropped
  1. Reduce the amount of network traffic being generated by virtual machines by moving some of them to a host with lower network traffic.
  2. Verify the health of the physical network adapter and check its configuration, driver, and firmware versions.
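To identify which physical adapter is dropping packets, you can query per-NIC statistics on the host. A minimal sketch, assuming SSH access and that vmnic0 is the adapter under suspicion:

  # List physical adapters and their link state.
  esxcli network nic list

  # Per-adapter counters; check 'Receive packets dropped' and
  # 'Transmit packets dropped'.
  esxcli network nic stats get -n vmnic0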
ESXi host has detected a link status 'flapping' on a physical NIC. Physical NIC link state flapping (fault symptom). ESXi disables the device to avoid the link flapping state. You might need to replace the physical NIC. The alert will be canceled when the NIC is repaired and functioning. If you replace the physical NIC, you might need to manually cancel the alert.
ESXi host has detected a link status down on a physical NIC. Physical NIC link state down (fault symptom). You might need to replace the physical NIC. The alert will be canceled when the NIC is repaired and functioning. If you replace the physical NIC, you might need to manually cancel the alert.
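After the NIC or cabling has been serviced, you can verify the link state and re-enable the adapter from the host. A minimal sketch, assuming SSH access and that vmnic1 is the affected adapter:

  # Check the current link state of all physical adapters.
  esxcli network nic list

  # Bring the repaired adapter back up.
  esxcli network nic up -n vmnic1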
Battery sensors are reporting problems. Symptoms include the following:
  • Battery sensor health is red OR
  • Battery sensor health is yellow
Change or replace the hardware if necessary. Contact the hardware vendor for assistance. After the problem is resolved, the alert will be canceled when the sensor that reported the problem indicates that the problem no longer exists.
Baseboard Management Controller sensors are reporting problems. Symptoms include the following:
  • Baseboard Management Controller sensor health is red OR
  • Baseboard Management Controller sensor health is yellow
Change or replace the hardware if necessary. Contact the hardware vendor for assistance. After the problem is resolved, the alert will be canceled when the sensor that reported the problem indicates that the problem no longer exists.
Fan sensors are reporting problems. Symptoms include the following:
  • Fan sensor health is red OR
  • Fan sensor health is yellow
Change or replace the hardware if necessary. Contact the hardware vendor for assistance. After the problem is resolved, the alert will be canceled when the sensor that reported the problem indicates that the problem no longer exists.
Hardware sensors are reporting problems. Symptoms include the following:
  • Hardware sensor health is red OR
  • Hardware sensor health is yellow
Change or replace the hardware if necessary. Contact the hardware vendor for assistance. After the problem is resolved, the alert will be canceled when the sensor that reported the problem indicates that the problem no longer exists.
Memory sensors are reporting problems. Symptoms include the following:
  • Memory sensor health is red OR
  • Memory sensor health is yellow
Change or replace the hardware if necessary. Contact the hardware vendor for assistance. After the problem is resolved, the alert will be canceled when the sensor that reported the problem indicates that the problem no longer exists.
Path redundancy to storage device degraded. Symptoms include the following:
  • A path to storage device went down
  • Host has no redundancy to storage device
See the VMware Knowledge Base article, Path redundancy to the storage device is degraded (1009555).
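To see which paths are dead and which remain active, you can list the storage paths from the host. A minimal sketch, assuming SSH access; the device identifier naa.xxx is a placeholder:

  # List all storage paths and their states (active/dead).
  esxcli storage core path list

  # Restrict the output to the affected device.
  esxcli storage core path list -d naa.xxx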
Power sensors are reporting problems. Symptoms include the following:
  • Power sensor health is red OR
  • Power sensor health is yellow
Change or replace the hardware if necessary. Contact the hardware vendor for assistance. After the problem is resolved, the alert will be canceled when the sensor that reported the problem indicates that the problem no longer exists.
Processor sensors are reporting problems. Symptoms include the following:
  • Processor sensor health is red OR
  • Processor sensor health is yellow
Change or replace the hardware if necessary. Contact the hardware vendor for assistance. After the problem is resolved, the alert will be canceled when the sensor that reported the problem indicates that the problem no longer exists.
IPMI System Event Log for the host is becoming full. Symptoms include the following:
  • SEL sensor health is red OR
  • SEL sensor health is yellow
Change or replace the hardware if necessary. Contact the hardware vendor for assistance. After the problem is resolved, the alert will be canceled when the sensor that reported the problem indicates that the problem no longer exists.
Storage sensors are reporting problems. Symptoms include the following:
  • Storage sensor health is red OR
  • Storage sensor health is yellow
Change or replace the hardware if necessary. Contact the hardware vendor for assistance. After the problem is resolved, the alert will be canceled when the sensor that reported the problem indicates that the problem no longer exists.
System Board sensors are reporting problems. Symptoms include the following:
  • System board sensor health is red OR
  • System board sensor health is yellow
Change or replace the hardware if necessary. Contact the hardware vendor for assistance. After the problem is resolved, the alert will be canceled when the sensor that reported the problem indicates that the problem no longer exists.
Temperature sensors are reporting problems. Symptoms include the following:
  • Temperature sensor health is red OR
  • Temperature sensor health is yellow
Change or replace the hardware if necessary. Contact the hardware vendor for assistance. After the problem is resolved, the alert will be canceled when the sensor that reported the problem indicates that the problem no longer exists.
Voltage sensors are reporting problems. Symptoms include the following:
  • Voltage sensor health is red OR
  • Voltage sensor health is yellow
Change or replace the hardware if necessary. Contact the hardware vendor for assistance. After the problem is resolved, the alert will be canceled when the sensor that reported the problem indicates that the problem no longer exists.

Health/Critical

These alert definitions have the following impact and criticality information.

Impact
Health
Criticality
Critical
Alert Definition Symptoms Recommendations
Host has lost connection to vCenter. Host disconnected from vCenter. Click "Open Host in vSphere Web Client" in the Actions menu at the top of the Alert details page to connect to the vCenter Server instance that manages this host and manually reconnect the host. After vCenter Server restores the connection to the host, the alert will be canceled.
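If the host is otherwise healthy but does not reconnect, restarting the management agents on the host is a common first step. A minimal sketch, assuming SSH or DCUI shell access; note that restarting the agents briefly interrupts management traffic:

  # Restart the host agent and the vCenter agent.
  /etc/init.d/hostd restart
  /etc/init.d/vpxa restart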
vSphere High Availability (HA) has detected a network-isolated host. vSphere HA detected a network isolated host (fault symptom). Resolve the networking problem that prevents the host from pinging its isolation addresses and communicating with other hosts. Make sure that the management networks that vSphere HA uses include redundancy. With redundancy, vSphere HA can communicate over more than one path, which reduces the chance of a host becoming isolated.
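To test whether the host can reach its isolation addresses, ping them through the management VMkernel interface. A minimal sketch, assuming SSH or console access; vmk0 and 192.168.0.1 are placeholders for the management interface and the isolation address:

  # Ping the isolation address through the management VMkernel port.
  vmkping -I vmk0 192.168.0.1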
vSphere High Availability (HA) has detected a possible host failure. vSphere HA detected a host failure (fault symptom). Find the computer that has the duplicate IP address and reconfigure it to have a different IP address. This fault is cleared and the alert canceled when the underlying problem is resolved, and the vSphere HA primary agent is able to connect to the HA agent on the host.
Note: You can use the Duplicate IP warning in the /var/log/vmkernel log file on an ESX host or the /var/log/messages log file on an ESXi host to identify the computer that has the duplicate IP address.
The host has lost connectivity to a dvPort. Lost network connectivity to dvPorts (fault symptom). Replace the physical adapter or reset the physical switch. The alert will be canceled when connectivity is restored to the dvPort.
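To confirm which distributed switch and uplinks the host is using, you can list them from the host. A minimal sketch, assuming SSH access:

  # List the distributed switches visible to this host, including
  # their uplink adapters.
  esxcli network vswitch dvs vmware list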
The host has lost connectivity to the physical network. Lost network connectivity (fault symptom). To determine the actual failure or to eliminate possible problems, check the status of the vmnic in the vSphere Client or from the ESX service console:
  • To check the status in the vSphere Client, select the ESX host, click Configuration, and then click Networking. The vmnics currently assigned to virtual switches appear in the diagrams. If a vmnic displays a red X, that link is currently down.
  • From the service console, run the command esxcfg-nics -l. The output is similar to the following:

    Name    PCI      Driver Link Speed    Duplex Description
    vmnic0  04:04.00 tg3    Up   1000Mbps Full   Broadcom BCM5780 Gigabit Ethernet
    vmnic1  04:04.01 tg3    Up   1000Mbps Full   Broadcom BCM5780 Gigabit Ethernet

    The Link column shows the status of the link between the network adapter and the physical switch. The status can be either Up or Down. If some network adapters are up and others are down, you might need to verify that the adapters are connected to the intended physical switch ports. To verify the connections, bring down each ESX host port on the physical switch, run esxcfg-nics -l, and observe the affected vmnics.
Verify that the vmnic identified in the alert is still connected to the switch and configured properly:
  • Make sure that the network cable is still connected to the switch and to the host.
  • Make sure that the switch is connected to the system, is still functioning properly, and has not been inadvertently misconfigured. For more information, see the switch documentation.
  • Check for activity between the physical switch and the vmnic. You can check activity by performing a network trace or observing activity LEDs.
  • Check the network port settings on the physical switch.
To reconfigure the service console IP address if the affected vmnic is associated with a service console, see http://kb.vmware.com/kb/1000258. If the problem is caused by your hardware, contact your hardware vendor for replacement hardware.
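On current ESXi versions, the same link information is also available through esxcli. A minimal sketch, assuming SSH access and that vmnic0 is the adapter named in the alert:

  # Link state, speed, and duplex for all physical adapters.
  esxcli network nic list

  # Detailed settings for one adapter, including driver information.
  esxcli network nic get -n vmnic0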
The host lost connectivity to a Network File System (NFS) server. Lost connection to NFS server (fault symptom).
  1. Verify the NFS server is running.
  2. Check the network connection to make sure the ESX host can connect to the NFS server.
  3. Determine whether the other hosts that use the same NFS mount are experiencing the same problem, and check the NFS server status and share points.
  4. Make sure that you can reach the NFS server by logging into the service console and using vmkping to ping the NFS server: "vmkping <nfs server>".
  5. For advanced troubleshooting information, see http://kb.vmware.com/kb/1003967.
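To check how the host currently sees its NFS mounts, you can list them directly. A minimal sketch, assuming SSH access; nfs-server.example.com is a placeholder name:

  # List NFS datastores and whether each is accessible.
  esxcli storage nfs list

  # Confirm reachability of the NFS server over the VMkernel network.
  vmkping nfs-server.example.com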
A fatal error occurred on a PCIe bus during system reboot. A fatal PCIe error occurred. Check and replace the PCIe device identified in the alert as the cause of the problem. Contact the vendor for assistance.
A fatal memory error was detected at system boot time. A fatal memory error occurred. Replace the faulty memory or contact the vendor.

Health/Immediate

These alert definitions have the following impact and criticality information.

Impact
Health
Criticality
Immediate
Alert Definition Symptom Recommendations
The host has lost redundant connectivity to a dvPort. Lost network redundancy to dvPorts (fault symptom). Replace the physical adapter or reset the physical switch. The alert will be canceled when connectivity is restored to the dvPort.
The host has lost redundant uplinks to the network. Lost network redundancy (fault symptom). To determine the actual failure or to eliminate possible problems, first connect to ESX through SSH or the console:
  1. Identify the available uplinks by running esxcfg-nics -l.
  2. Remove the reported vmnic from the port groups by running esxcfg-vswitch -U <affected vmnic#> <affected vSwitch>.
  3. Link available uplinks to the affected port groups by running esxcfg-vswitch -L <available vmnic#> <affected vSwitch>.
Next, check the status of the vmnic in vSphere Client or the ESX service console:
  1. In vSphere Client, select the ESX host, click Configuration, and then click Networking.

    The vmnics currently assigned to virtual switches appear in the diagrams. If a vmnic displays a red X, that link is currently unavailable.

  2. From the service console, run esxcfg-nics -l. The output is similar to the following example:

    Name    PCI      Driver Link Speed    Duplex Description
    vmnic0  04:04.00 tg3    Up   1000Mbps Full   Broadcom BCM5780 Gigabit Ethernet
    vmnic1  04:04.01 tg3    Up   1000Mbps Full   Broadcom BCM5780 Gigabit Ethernet

    The Link column shows the status of the link between the network adapter and the physical switch. The status can be either Up or Down. If some network adapters are up and others are down, you might need to verify that the adapters are connected to the intended physical switch ports. To verify the connections, shut down each ESX host port on the physical switch, run esxcfg-nics -l, and observe the affected vmnics.

Verify that the vmnic identified in the alert is still connected to the switch and configured properly:
  1. Make sure that the network cable is still connected to the switch and to the host.
  2. Make sure that the switch is connected to the system, is still functioning properly, and was not inadvertently misconfigured. (See the switch documentation.)
  3. Perform a network trace or observe activity LEDs to check for activity between the physical switch and the vmnic.
  4. Check the network port settings on the physical switch.

    If the problem is caused by hardware, contact your hardware vendor for a hardware replacement.
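Before unlinking or relinking uplinks with the esxcfg-vswitch commands above, it can help to list the current vSwitch-to-uplink mapping so that you operate on the correct names. A minimal sketch, assuming SSH or console access:

  # Show each vSwitch with its port groups and currently linked uplinks.
  esxcfg-vswitch -l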

A PCIe error occurred during system boot, but the error is recoverable. A recoverable PCIe error occurred. System behavior depends on how the OEM vendor's firmware handles the error. Contact the vendor for assistance.
A recoverable memory error has occurred on the host. A recoverable memory error occurred. Since recoverable memory errors are vendor-specific, contact the vendor for assistance.

Risk/Symptom-Based

These alert definitions have the following impact and criticality information.

Impact
Risk
Criticality
Symptom-based
Alert Definition Symptom Recommendations
ESXi host is violating the vSphere 5.5 Hardening Guide. Symptoms include the following:
  • Active Directory authentication disabled OR
  • Non-compliant NTP service startup policy OR
  • SSH service is running OR
  • NTP service stopped OR
  • Non-compliant timeout value for automatically disabling local and remote shell access OR
  • vSphere Authentication Proxy not used for password protection when adding ESXi hosts to Active Directory OR
  • Persistent logging disabled OR
  • Bidirectional CHAP for iSCSI traffic disabled OR
  • Non-compliant firewall setting to restrict access to NTP client OR
  • NTP server for time synchronization not configured OR
  • Non-compliant ESXi Shell service startup policy OR
  • Non-compliant firewall setting to restrict access to SNMP server OR
  • ESXi Shell service is running OR
  • Non-compliant DCUI service startup policy OR
  • Dvfilter bind IP address configured OR
  • Non-compliant SSH service startup policy OR
  • DCUI service is running OR
  • Non-compliant idle time before an interactive shell is automatically logged out OR
  • Non-compliant DCUI access user list OR
  • Remote syslog is not enabled
Fix the vSphere 5.5 Hardening Guide rule violations according to the recommendations in the vSphere 5.5 Hardening Guide.
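Several of these symptoms can be checked and remediated directly on the host. A minimal sketch, assuming SSH access; syslog.example.com is a placeholder log host:

  # Check whether remote syslog and persistent logging are configured.
  esxcli system syslog config get

  # Point the host at a remote syslog server (example destination), then reload.
  esxcli system syslog config set --loghost='tcp://syslog.example.com:514'
  esxcli system syslog reload

  # Stop the SSH service if it should not be running.
  vim-cmd hostsvc/stop_ssh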