The vCenter adapter provides alert definitions that generate alerts on the Host System objects in your environment.

Health/Symptom-Based

These alert definitions have the following impact and criticality information.

Impact: Health

Criticality: Symptom-based

Alert Definition | Symptoms | Recommendations

Host has CPU contention caused by less than half of the virtual machines.

Symptoms include all of the following:

  • ! Host inside a cluster

  • Host CPU contention is at warning/immediate/critical level

  • > 0 child virtual machines have [ Virtual machine CPU demand at warning/immediate/critical level ]

  • <= 50% of child virtual machines have [ Virtual machine CPU demand at warning/immediate/critical level ]

Use vSphere vMotion to migrate some virtual machines with high CPU workload to other hosts that have available CPU capacity.

Host has CPU contention caused by more than half of the virtual machines.

Symptoms include all of the following:

  • ! Host inside a cluster

  • Host CPU contention is at warning/immediate/critical level

  • Host CPU demand at warning/immediate/critical level

  • > 50% of child virtual machines have [ Virtual machine CPU demand at warning/immediate/critical level ]

  1. Use vSphere vMotion to migrate some virtual machines with high CPU workload to other hosts that have available CPU capacity.

  2. Upgrade the host or use a host that has larger CPU capacity.

Host has CPU contention due to overpopulation of virtual machines.

Symptoms include all of the following:

  • ! Host inside a cluster

  • Host CPU contention is at warning/immediate/critical level

  • Host CPU demand at warning/immediate/critical level

  • Zero child virtual machines have [ Virtual machine CPU demand at warning/immediate/critical level ]

  1. Use vSphere vMotion to migrate some virtual machines with high CPU workload to other hosts that have available CPU capacity.

  2. Upgrade the host or use a host that has larger CPU capacity.

Host in a non-DRS cluster has CPU contention caused by less than half of the virtual machines.

Symptoms include all of the following:

  • Host inside a cluster

  • [ ! DRS Enabled OR ! DRS fully automated ]

  • Host CPU contention is at warning/immediate/critical level

  • > 0 child virtual machines have [ Virtual machine CPU demand at warning/immediate/critical level ]

  • <= 50% of child virtual machines have [ Virtual machine CPU demand at warning/immediate/critical level ]

Use vSphere vMotion to migrate some virtual machines with high CPU workload to other hosts that have available CPU capacity.

Host in a non-DRS cluster has CPU contention caused by more than half of the virtual machines.

Symptoms include all of the following:

  • Host inside a cluster

  • [ ! DRS Enabled OR ! DRS fully automated ]

  • Host CPU contention at warning/immediate/critical level

  • Host CPU demand at warning/immediate/critical level

  • > 50% of child virtual machines have [ Virtual machine CPU demand at warning/immediate/critical level ]

  1. Use vSphere vMotion to migrate some virtual machines with high CPU workload to other hosts that have available CPU capacity.

  2. Upgrade the host or use a host that has larger CPU capacity.

Host in a non-DRS cluster has CPU contention due to overpopulation of virtual machines.

Symptoms include all of the following:

  • Host inside a cluster

  • [ ! DRS Enabled OR ! DRS fully automated ]

  • Host CPU contention at warning/immediate/critical level

  • Host CPU demand at warning/immediate/critical level

  • Zero child virtual machines have [ Virtual machine CPU demand at warning/immediate/critical level ]

  1. Use vSphere vMotion to migrate some virtual machines with high CPU workload to other hosts that have available CPU capacity.

  2. Upgrade the host or use a host that has larger CPU capacity.
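The six CPU contention definitions above differ only in the host's cluster and DRS state and in the share of child virtual machines with high CPU demand. The following Python sketch restates that decision logic. The function name, parameters, and boolean inputs are hypothetical illustrations, not part of the vCenter adapter, and the sketch assumes each "warning/immediate/critical level" symptom has already been reduced to true or false.

```python
from typing import List, Optional


def classify_cpu_contention(
    in_cluster: bool,
    drs_enabled: bool,
    drs_fully_automated: bool,
    host_contention: bool,
    host_demand: bool,
    vm_demand_flags: List[bool],
) -> Optional[str]:
    """Return the matching alert definition name, or None if none applies.

    vm_demand_flags holds one boolean per child virtual machine: True when
    that virtual machine's CPU demand symptom is triggered (hypothetical input).
    """
    # Hosts in a cluster with fully automated DRS are outside these definitions.
    if in_cluster and drs_enabled and drs_fully_automated:
        return None
    # Every definition requires the host CPU contention symptom.
    if not host_contention:
        return None

    prefix = "Host in a non-DRS cluster" if in_cluster else "Host"
    total = len(vm_demand_flags)
    hot = sum(1 for flag in vm_demand_flags if flag)

    # > 0 and <= 50% of child virtual machines have high CPU demand.
    if hot > 0 and hot * 2 <= total:
        return f"{prefix} has CPU contention caused by less than half of the virtual machines."
    # > 50% of child virtual machines, plus the host CPU demand symptom.
    if host_demand and hot * 2 > total:
        return f"{prefix} has CPU contention caused by more than half of the virtual machines."
    # Zero child virtual machines, plus the host CPU demand symptom.
    if host_demand and hot == 0:
        return f"{prefix} has CPU contention due to overpopulation of virtual machines."
    return None
```

For example, a standalone host with contention where one of four virtual machines shows high demand maps to the "less than half" definition. The memory contention definitions in this section follow the same pattern, with memory workload in place of CPU demand.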

Host has memory contention caused by less than half of the virtual machines.

Symptoms include all of the following:

  • ! Host inside a cluster

  • Host memory contention at warning/immediate/critical level

  • > 0 child virtual machines have [ Virtual machine memory workload at warning/immediate/critical level ]

  • <= 50% of child virtual machines have [ Virtual machine memory workload at warning/immediate/critical level ]

Use vSphere vMotion to migrate some virtual machines with high memory workload to other hosts that have available memory capacity.

Host has memory contention caused by more than half of the virtual machines.

Symptoms include all of the following:

  • ! Host inside a cluster

  • Host memory workload at warning/immediate/critical level

  • Host memory contention at warning/immediate/critical level

  • > 50% of child virtual machines have [ Virtual machine memory workload at warning/immediate/critical level ]

  1. Use vSphere vMotion to migrate some virtual machines with high memory workload to other hosts that have available memory capacity.

  2. Upgrade the host or use a host that has larger memory capacity.

Host has memory contention due to overpopulation of virtual machines.

Symptoms include all of the following:

  • ! Host inside a cluster

  • Host memory workload at warning/immediate/critical level

  • Host memory contention at warning/immediate/critical level

  • Zero child virtual machines have [ Virtual machine memory workload at warning/immediate/critical level ]

  1. Use vSphere vMotion to migrate some virtual machines with high memory workload to other hosts that have available memory capacity.

  2. Upgrade the host or use a host that has larger memory capacity.

Host in a non-DRS cluster has memory contention caused by less than half of the virtual machines.

Symptoms include all of the following:

  • Host inside a cluster

  • [ ! DRS Enabled OR ! DRS fully automated ]

  • Host memory contention at warning/immediate/critical level

  • > 0 child virtual machines have [ Virtual machine memory workload at warning/immediate/critical level ]

  • <= 50% of child virtual machines have [ Virtual machine memory workload at warning/immediate/critical level ]

Use vSphere vMotion to migrate some virtual machines with high memory workload to other hosts that have available memory capacity.

Host in a non-DRS cluster has memory contention caused by more than half of the virtual machines.

Symptoms include all of the following:

  • Host inside a cluster

  • [ ! DRS Enabled OR ! DRS fully automated ]

  • Host memory workload at warning/immediate/critical level

  • Host memory contention at warning/immediate/critical level

  • > 50% of child virtual machines have [ Virtual machine memory workload at warning/immediate/critical level ]

  1. Use vSphere vMotion to migrate some virtual machines with high memory workload to other hosts that have available memory capacity.

  2. Upgrade the host or use a host that has larger memory capacity.

Host in a non-DRS cluster has memory contention due to overpopulation of virtual machines.

Symptoms include all of the following:

  • Host inside a cluster

  • [ ! DRS Enabled OR ! DRS fully automated ]

  • Host memory workload at warning/immediate/critical level

  • Host memory contention at warning/immediate/critical level

  • Zero child virtual machines have [ Virtual machine memory workload at warning/immediate/critical level ]

  1. Use vSphere vMotion to migrate some virtual machines with high memory workload to other hosts that have available memory capacity.

  2. Upgrade the host or use a host that has larger memory capacity.

Host is experiencing high number of received packets dropped.

Symptoms include all of the following:

  • Host network received packets dropped

  • Host network received packets dropped above DT

  • Host network data receive workload at Warning level

  • Host network data receive workload above DT

  • Host CPU demand at Critical level

  1. If the host has one CPU, upgrade the host or use a host that has larger CPU capacity.

  2. Add an additional NIC to the host.

  3. Reduce the amount of network traffic being generated by virtual machines by moving some of them to a host with lower network traffic.

Host is experiencing high number of transmitted packets dropped.

Symptoms include all of the following:

  • Host network transmitted packets dropped

  • Host network transmitted packets dropped above DT

  • Host network data transmit workload at Warning level

  • Host network data transmit workload above DT

  • Host is dropping high percentage of packets

  1. Add an additional NIC to the host.

  2. Reduce the amount of network traffic being generated by virtual machines by moving some of them to a host with lower network traffic.

ESXi host has detected a link status 'flapping' on a physical NIC.

Physical NIC link state flapping (fault symptom).

ESXi disables the device to avoid the link flapping state. You might need to replace the physical NIC. The alert will be canceled when the NIC is repaired and functioning. If you replace the physical NIC, you might need to manually cancel the alert.

ESXi host has detected a link status down on a physical NIC.

Physical NIC link state down (fault symptom).

ESXi disables the device to avoid the link flapping state. You might need to replace the physical NIC. The alert will be canceled when the NIC is repaired and functioning. If you replace the physical NIC, you might need to manually cancel the alert.

Battery sensors are reporting problems.

  • Battery sensor health is red OR

  • Battery sensor health is yellow

Change or replace the hardware if necessary. Contact the hardware vendor for assistance. After the problem is resolved, the alert will be canceled when the sensor that reported the problem indicates that the problem no longer exists.

BMC sensors are reporting problems.

  • BMC sensor health is red OR

  • BMC sensor health is yellow

Change or replace the hardware if necessary. Contact the hardware vendor for assistance. After the problem is resolved, the alert will be canceled when the sensor that reported the problem indicates that the problem no longer exists.

Fan sensors are reporting problems.

  • Fan sensor health is red OR

  • Fan sensor health is yellow

Change or replace the hardware if necessary. Contact the hardware vendor for assistance. After the problem is resolved, the alert will be canceled when the sensor that reported the problem indicates that the problem no longer exists.

Hardware sensors are reporting problems.

  • Hardware sensor health is red OR

  • Hardware sensor health is yellow

Change or replace the hardware if necessary. Contact the hardware vendor for assistance. After the problem is resolved, the alert will be canceled when the sensor that reported the problem indicates that the problem no longer exists.

Memory sensors are reporting problems.

  • Memory sensor health is red OR

  • Memory sensor health is yellow

Change or replace the hardware if necessary. Contact the hardware vendor for assistance. After the problem is resolved, the alert will be canceled when the sensor that reported the problem indicates that the problem no longer exists.

Power sensors are reporting problems.

  • Power sensor health is red OR

  • Power sensor health is yellow

Change or replace the hardware if necessary. Contact the hardware vendor for assistance. After the problem is resolved, the alert will be canceled when the sensor that reported the problem indicates that the problem no longer exists.

Processor sensors are reporting problems.

  • Processor sensor health is red OR

  • Processor sensor health is yellow

Change or replace the hardware if necessary. Contact the hardware vendor for assistance. After the problem is resolved, the alert will be canceled when the sensor that reported the problem indicates that the problem no longer exists.

SEL sensors are reporting problems.

  • SEL sensor health is red OR

  • SEL sensor health is yellow

Change or replace the hardware if necessary. Contact the hardware vendor for assistance. After the problem is resolved, the alert will be canceled when the sensor that reported the problem indicates that the problem no longer exists.

Storage sensors are reporting problems.

  • Storage sensor health is red OR

  • Storage sensor health is yellow

Change or replace the hardware if necessary. Contact the hardware vendor for assistance. After the problem is resolved, the alert will be canceled when the sensor that reported the problem indicates that the problem no longer exists.

System Board sensors are reporting problems.

  • System board sensor health is red OR

  • System board sensor health is yellow

Change or replace the hardware if necessary. Contact the hardware vendor for assistance. After the problem is resolved, the alert will be canceled when the sensor that reported the problem indicates that the problem no longer exists.

Temperature sensors are reporting problems.

  • Temperature sensor health is red OR

  • Temperature sensor health is yellow

Change or replace the hardware if necessary. Contact the hardware vendor for assistance. After the problem is resolved, the alert will be canceled when the sensor that reported the problem indicates that the problem no longer exists.

Voltage sensors are reporting problems.

  • Voltage sensor health is red OR

  • Voltage sensor health is yellow

Change or replace the hardware if necessary. Contact the hardware vendor for assistance. After the problem is resolved, the alert will be canceled when the sensor that reported the problem indicates that the problem no longer exists.
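The sensor alert definitions above share one rule: the alert is raised when the sensor health for that category is red or yellow. The Python sketch below illustrates that rule. The function name, the dictionary input, and the assumption that any unreported sensor counts as "green" are hypothetical and are not taken from the vCenter adapter.

```python
# Sensor categories listed in the alert definitions above.
SENSOR_TYPES = [
    "Battery", "BMC", "Fan", "Hardware", "Memory", "Power",
    "Processor", "SEL", "Storage", "System Board", "Temperature", "Voltage",
]


def sensor_alerts(health):
    """Return alert definition names for sensors whose health is red or yellow.

    health maps a sensor category to a color string (hypothetical input).
    Categories missing from the mapping are assumed to be "green".
    """
    alerts = []
    for sensor in SENSOR_TYPES:
        if health.get(sensor, "green").lower() in ("red", "yellow"):
            alerts.append(f"{sensor} sensors are reporting problems.")
    return alerts
```

Calling sensor_alerts({"Fan": "yellow", "Power": "green"}) returns only the fan alert, because either red or yellow is enough to trigger the definition.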

Health/Critical

These alert definitions have the following impact and criticality information.

Impact: Health

Criticality: Critical

Alert Definition | Symptoms | Recommendations

Host has lost connection to vCenter.

  • Connection to the host has been lost (fault symptom) OR

  • Host disconnected from vCenter

Log on to the vSphere Client or the vSphere Web Client and manually reconnect the host to the vCenter Server. After the connection between the host and the vCenter Server is restored, the alert is canceled.

vSphere High Availability (HA) has detected a network-isolated host.

vSphere HA detected a network isolated host (fault symptom).

Resolve the networking problem that prevents the host from pinging its isolation addresses and communicating with other hosts. Make sure that the management networks that vSphere HA uses include redundancy. With redundancy, vSphere HA can communicate over more than one path, which reduces the chance of a host becoming isolated.

vSphere High Availability (HA) has detected a possible host failure.

vSphere HA detected a host failure (fault symptom).

Find the computer that has the duplicate IP address and reconfigure it to have a different IP address. This fault is cleared and the alert canceled when the underlying problem is resolved, and the vSphere HA master agent is able to connect to the HA agent on the host.

Note:

You can use the Duplicate IP warning in the /var/log/vmkernel log file on an ESX host or the /var/log/messages log file on an ESXi host to identify the computer that has the duplicate IP address.
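As a rough aid for the log check described in the note, the following Python sketch filters log lines that mention a duplicate IP. The exact wording of the warning is not given here, so the sketch assumes a case-insensitive match on the phrase "duplicate ip". The function name is hypothetical.

```python
def find_duplicate_ip_lines(lines):
    """Return log lines that mention a duplicate IP (case-insensitive match).

    lines is any iterable of strings, such as an open log file. The search
    phrase is an assumption about how the warning is worded.
    """
    return [line.rstrip("\n") for line in lines if "duplicate ip" in line.lower()]
```

To scan a real log, pass an open file object, for example the /var/log/vmkernel or /var/log/messages file named in the note.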

Host is experiencing network contention caused by too much traffic.

Symptoms include all of the following:

  • Host is experiencing dropped network packets

  • Host network workload at warning/immediate/critical level

  1. Review the load balancing policy in the Port Group and the vSwitch.

  2. Add an additional NIC to the host.

  3. Reduce the amount of network traffic being generated by virtual machines by moving some of them to a host with lower network traffic.

The host has lost connectivity to a dvPort.

Lost network connectivity to dvPorts (fault symptom).

Replace the physical adapter or reset the physical switch. The alert will be canceled when connectivity is restored to the dvPort.

The host has lost connectivity to the physical network.

Lost network connectivity (fault symptom).

To determine the actual failure or to eliminate possible problems, check the status of the vmnic in the vSphere Client or from the ESX service console:

  • To check the status in the vSphere Client, select the ESX host, click Configuration, and then click Networking. The vmnics currently assigned to virtual switches appear in the diagrams. If a vmnic displays a red X, that link is currently down.

  • From the service console, run the command esxcfg-nics -l. The output that appears is similar to the following:

    Name    PCI      Driver Link Speed    Duplex Description
    ------------------------------------------------------------------
    vmnic0  04:04.00 tg3    Up   1000Mbps Full   Broadcom BCM5780 Gigabit Ethernet
    vmnic1  04:04.01 tg3    Up   1000Mbps Full   Broadcom BCM5780 Gigabit Ethernet

    The Link column shows the status of the link between the network adapter and the physical switch. The status can be either Up or Down. If some network adapters are up and others are down, you might need to verify that the adapters are connected to the intended physical switch ports. To verify the connections, bring down each ESX host port on the physical switch, run esxcfg-nics -l, and observe the affected vmnics.

Verify that the vmnic identified in the alert is still connected to the switch and configured properly:

  • Make sure that the network cable is still connected to the switch and to the host.

  • Make sure that the switch is connected to the system, is still functioning properly, and has not been inadvertently misconfigured. For more information, see the switch documentation.

  • Check for activity between the physical switch and the vmnic. You can check activity by performing a network trace or observing activity LEDs.

  • Check for network port settings on the physical switch.

To reconfigure the service console IP address if the affected vmnic is associated with a service console, see http://kb.vmware.com/kb/1000258. If the problem is caused by your hardware, contact your hardware vendor for replacement hardware.
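When many adapters are listed, the Link column of the esxcfg-nics -l output can be checked with a short script. The Python sketch below assumes the column layout shown in the sample output for this alert (Name, PCI, Driver, Link, ...) and that adapter names begin with "vmnic". The function name is hypothetical.

```python
def down_vmnics(output):
    """Return the names of adapters whose Link column reads Down.

    output is the text printed by esxcfg-nics -l. Header and separator
    lines are skipped because their first field does not start with "vmnic".
    """
    down = []
    for line in output.splitlines():
        fields = line.split()
        if len(fields) < 4 or not fields[0].startswith("vmnic"):
            continue
        # The fourth whitespace-separated field is the Link column.
        if fields[3].lower() == "down":
            down.append(fields[0])
    return down
```

Any adapter the function returns is a candidate for the cable, switch, and port checks listed above.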

The host lost connectivity to a Network File System (NFS) server.

Lost connection to NFS server (fault symptom).

  1. Verify the NFS server is running.

  2. Check the network connection to make sure the ESX host can connect to the NFS server.

  3. Determine whether the other hosts that use the same NFS mount are experiencing the same problem, and check the NFS server status and share points.

  4. Make sure that you can reach the NFS server by logging into the service console and using vmkping to ping the NFS server: "vmkping <nfs server>".

  5. For advanced troubleshooting information, see http://kb.vmware.com/kb/1003967

A fatal error occurred on a PCIe bus during system reboot.

A fatal PCIe error occurred.

Check and replace the PCIe device identified in the alert as the cause of the problem. Contact the vendor for assistance.

A fatal memory error was detected at system boot time.

A fatal memory error occurred.

Replace the faulty memory or contact the vendor.

Health/Immediate

These alert definitions have the following impact and criticality information.

Impact: Health

Criticality: Immediate

Alert Definition | Symptom | Recommendations

The host has lost redundant connectivity to a dvPort.

Lost network redundancy to dvPorts (fault symptom).

Replace the physical adapter or reset the physical switch. The alert will be canceled when connectivity is restored to the dvPort.

The host has lost redundant uplinks to the network.

Lost network redundancy (fault symptom).

To determine the actual failure or to eliminate possible problems, first connect to ESX through SSH or the console:

  1. Identify the available uplinks by running esxcfg-nics -l.

  2. Remove the reported vmnic from the port groups by running esxcfg-vswitch -U <affected vmnic#> <affected vSwitch>.

  3. Link available uplinks to the affected port groups by running esxcfg-vswitch -L <available vmnic#> <affected vSwitch>.
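The three steps above can be sketched as a small helper that assembles the command lines for a given vmnic and vSwitch. This Python sketch only builds the strings and does not run anything on the host. The function name and the example values vmnic1, vmnic2, and vSwitch0 are hypothetical.

```python
def uplink_swap_commands(failed_vmnic, spare_vmnic, vswitch):
    """Build the command sequence from the steps above, in order.

    The commands are returned as strings so they can be reviewed before
    being run on the host over SSH or the console.
    """
    return [
        "esxcfg-nics -l",                               # step 1: list uplinks
        f"esxcfg-vswitch -U {failed_vmnic} {vswitch}",  # step 2: unlink the reported vmnic
        f"esxcfg-vswitch -L {spare_vmnic} {vswitch}",   # step 3: link an available vmnic
    ]
```

For example, uplink_swap_commands("vmnic1", "vmnic2", "vSwitch0") yields the three commands for replacing vmnic1 with vmnic2 on vSwitch0.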

Next, check the status of the vmnic in vSphere Client or the ESX service console:

  1. In vSphere Client, select the ESX host, click Configuration, and then click Networking.

    The vmnics currently assigned to virtual switches appear in the diagrams. If a vmnic displays a red X, that link is currently unavailable.

  2. From the service console, run esxcfg-nics -l. The output that appears is similar to the following example:

    Name    PCI      Driver Link Speed    Duplex Description
    ------------------------------------------------------------------
    vmnic0  04:04.00 tg3    Up   1000Mbps Full   Broadcom BCM5780 Gigabit Ethernet
    vmnic1  04:04.01 tg3    Up   1000Mbps Full   Broadcom BCM5780 Gigabit Ethernet

    The Link column shows the status of the link between the network adapter and the physical switch. The status can be either Up or Down. If some network adapters are up and others are down, you might need to verify that the adapters are connected to the intended physical switch ports. To verify the connections, shut down each ESX host port on the physical switch, run the esxcfg-nics -l command, and observe the affected vmnics.

Verify that the vmnic identified in the alert is still connected to the switch and configured properly:

  1. Make sure that the network cable is still connected to the switch and to the host.

  2. Make sure that the switch is connected to the system, is still functioning properly, and was not inadvertently misconfigured. (See the switch documentation.)

  3. Perform a network trace or observe activity LEDs to check for activity between the physical switch and the vmnic.

  4. Check for network port settings on the physical switch.

    If the problem is caused by hardware, contact your hardware vendor for a hardware replacement.

A PCIe error occurred during system boot, but the error is recoverable.

A recoverable PCIe error occurred.

The PCIe error is recoverable, but the system behavior is dependent on how the error is handled by the OEM vendor's firmware. Contact the vendor for assistance.

A recoverable memory error has occurred on the host.

A recoverable memory error occurred.

Since recoverable memory errors are vendor-specific, contact the vendor for assistance.

Risk/Symptom-Based

These alert definitions have the following impact and criticality information.

Impact: Risk

Criticality: Symptom-based

Alert Definition | Symptom | Recommendations

ESXi Host is violating vSphere 5.5 Hardening Guide.

  • Active directory authentication disabled OR

  • Non-compliant NTP service startup policy OR

  • SSH service is running OR

  • NTP service stopped OR

  • Non-compliant timeout value for automatically disabling local and remote shell access OR

  • vSphere Authentication Proxy not used for password protection when adding ESXi hosts to active directory OR

  • Persistent logging disabled OR

  • Bidirectional CHAP for iSCSI traffic disabled OR

  • Non-compliant firewall setting to restrict access to NTP client OR

  • NTP server for time synchronization not configured OR

  • Non-compliant ESXi Shell service startup policy OR

  • Non-compliant firewall setting to restrict access to SNMP server OR

  • ESXi Shell service is running OR

  • Non-compliant DCUI service startup policy OR

  • Dvfilter bind IP address configured OR

  • Non-compliant SSH service startup policy OR

  • DCUI service is running OR

  • Non-compliant idle time before an interactive shell is automatically logged out OR

  • Non-compliant DCUI access user list OR

  • Remote syslog is not enabled

Fix the vSphere 5.5 Hardening Guide rule violations according to the recommendations in the vSphere 5.5 Hardening Guide.
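Because the hardening symptoms are joined with OR, a single violated rule is enough to raise the alert. The Python sketch below illustrates that any-of evaluation. The function names and the dictionary input are hypothetical, and the symptom names in the usage note are copied from the list above.

```python
def hardening_violations(checks):
    """Return the names of violated hardening symptoms, sorted alphabetically.

    checks maps a symptom name to True when that symptom is triggered
    (hypothetical input; how each check is collected is out of scope).
    """
    return sorted(name for name, violated in checks.items() if violated)


def hardening_alert_active(checks):
    """The alert is active when at least one symptom is triggered."""
    return any(checks.values())
```

For example, a host where only "SSH service is running" is true still triggers the alert, and hardening_violations reports that single symptom as the item to fix.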