Troubleshooting External Health Monitor

This section discusses how to troubleshoot external health monitor issues.

External health monitors on NSX Advanced Load Balancer use scripts to provide highly customized, granular health checks. The scripts can be written in Linux shell, Python, or Perl, and can execute tools such as wget, netcat, curl, and snmpget.

Troubleshooting Steps

The directory structure of NSX Advanced Load Balancer is not exposed in the NSX Advanced Load Balancer UI. It is available only through the admin shell or console access. External health monitor scripts run with limited access so that they do not affect the normal functioning of the NSX Advanced Load Balancer system. The CPU, memory, disk, and other resources available to these scripts are also limited, so relaxed timeouts are recommended for external health monitors.

Using NSX Advanced Load Balancer CLI

When building an external monitor, it is common to manually test the successful execution of the commands. To execute commands from an SE, it is necessary to switch to the proper namespace or tenant. The production external monitor will correctly use the proper tenant.

To attach to an NSX Advanced Load Balancer SE using the NSX Advanced Load Balancer CLI, see SSH Access for the Super User in the VMware NSX Advanced Load Balancer Administration Guide.

For more information on the script parameters, see External Health Monitor.

If the external health monitor script writes any output to stdout, the health check is treated as successful. If the script writes nothing to stdout, the check is treated as a failure.
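As a minimal sketch of this stdout rule, the following shell script prints a line only when its check succeeds. It assumes the server address and port are available as the IP and PORT environment variables, as in the examples in this section. The helper names mark_up_if and http_check, the probed URL path, and the 3-second curl limit are illustrative choices, not part of the product.

```shell
#!/bin/bash
# Sketch only: helper names and the probed URL are illustrative assumptions.

# Run the given check command and write to stdout only if it succeeds.
# The check's own output is discarded so only the success marker is printed.
mark_up_if() {
    if "$@" >/dev/null 2>&1; then
        echo "server is up"
    fi
    # On failure nothing is printed, so the check counts as failed.
}

# Hypothetical check: HTTP GET against the server. --fail makes curl return
# a non-zero status on HTTP error responses (4xx/5xx).
http_check() {
    curl --silent --fail --max-time 3 "http://${IP}:${PORT}/"
}

# Probe only when the server address is provided.
if [ -n "${IP:-}" ] && [ -n "${PORT:-}" ]; then
    mark_up_if http_check
fi
```

Keeping the success marker in one place makes it easy to verify by hand that a failing check leaves stdout empty.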

Troubleshooting Examples

Check that the output goes to stdout and not stderr.

For example, the following usage fails:

netcat -v -n -z -w 3 $IP $PORT | grep "open" 2>&1 > /dev/null

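The netcat command writes its status line to stderr, while the pipe passes only stdout to grep. As a result, grep never sees the "open" line, and the script produces no output on stdout. The trailing 2>&1 > /dev/null applies only to grep and does not change where netcat writes.

You can confirm this by running the command manually: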

root@avi-se-iihyz:/run/hmuser# netcat -v -n -z -w 3 $IP $PORT | grep "open" 2>&1 > /dev/null
(UNKNOWN) [10.10.30.34] 80 (http) open    <-- still shows up

Redirecting the stderr of netcat to stdout before the pipe fixes the issue:

netcat -v -n -z -w 3 $IP $PORT 2>&1 | grep "open"
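The difference between the two forms can be reproduced without a live server. The sketch below uses a stand-in function, fake_netcat (a hypothetical name), that writes a netcat-style status line to stderr, and then captures what each pipeline actually delivers. The address 192.0.2.10 is a documentation placeholder.

```shell
#!/bin/bash
# Stand-in for "netcat -v": writes its status line to stderr, not stdout.
fake_netcat() {
    echo "(UNKNOWN) [192.0.2.10] 80 (http) open" >&2
}

# Broken form: grep reads only stdout from the pipe, so it never sees the line.
# The stderr line still leaks to the terminal. "|| true" keeps the script
# going, because grep exits non-zero when nothing matches.
broken=$(fake_netcat | grep "open" 2>&1 >/dev/null) || true

# Fixed form: merge stderr into stdout before the pipe so grep can match it.
fixed=$(fake_netcat 2>&1 | grep "open")

echo "broken form captured ${#broken} characters"
echo "fixed form captured: ${fixed}"
```

The broken form captures nothing, while the fixed form captures the full status line.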

Using Show Command

The show pool <pool-name> server hmonstat command provides information about the failure code, the request, and response strings.

Using NSX Advanced Load Balancer UI

Log in to the NSX Advanced Load Balancer UI, navigate to Applications > Pools, select the desired pool, and click Events to check the health monitor logs.

Error Codes from the Script

The return code of the external health monitor script is used to pick the failure reason code. The valid error codes are:

EINTR/ETIMEDOUT: Connection timeout (generated by the NSX Advanced Load Balancer infrastructure when the script times out)
ECONNREFUSED: Connection refused
ECONNRESET: Connection reset
EADDRINUSE/EADDRNOTAVAIL: Address unavailable
EHOSTDOWN/EHOSTUNREACH: Host unreachable
ENETDOWN/ENETUNREACH: Network unreachable
ENOBUFS/ENOMEM: Out of resources (can be generated by the NSX Advanced Load Balancer infrastructure if resource allocation fails)

Any other return code is reported as a generic "other" error.
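One way a script might surface these reason codes is to translate the exit status of its probe into an errno-style return code. The sketch below is an assumption-heavy illustration: it assumes the script's exit status is interpreted as a numeric errno, it uses the Linux errno numbers for common architectures such as x86, and the helper name errno_for_curl_exit is hypothetical. Verify the mapping against your deployment before relying on it.

```shell
#!/bin/bash
# Sketch only: assumes the script's exit status is read as a numeric errno.
# Values are the Linux errno numbers on common architectures such as x86.
ECONNREFUSED=111
ETIMEDOUT=110

# Translate a curl exit status into an errno-style return code.
# curl exits with 7 when it cannot connect and 28 when an operation times out.
errno_for_curl_exit() {
    case "$1" in
        0)  echo 0 ;;
        7)  echo "$ECONNREFUSED" ;;   # treated here as "connection refused"
        28) echo "$ETIMEDOUT" ;;
        *)  echo 1 ;;                 # not in the list above: generic bucket
    esac
}

# Hypothetical usage inside a monitor script:
#   curl --silent --max-time 3 "http://${IP}:${PORT}/" >/dev/null
#   exit "$(errno_for_curl_exit $?)"
```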

Note:

The script can write an error to $HM_NAME.$IP.$PORT.out, and this output is then included in the output of the show command described above to aid debugging. This works only when external health monitor debugging is activated.

To run the script manually for troubleshooting, the superuser can log in to the Service Engine console with root privileges, switch to the hmuser account, and run the script, which is stored in the /run/hmuser directory.

Although you can modify the script on the Service Engine for troubleshooting, this change is temporary. Once the Service Engine restarts or you modify the pool/health monitor, the changes will be lost. The correct way to modify the health monitor configuration is from the NSX Advanced Load Balancer UI/CLI/API.
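Returning to the $HM_NAME.$IP.$PORT.out file mentioned in the note above, a script could write its diagnostics through a small helper such as the one sketched below. The helper name hm_debug and the explicit directory argument are illustrative assumptions. The source does not state which directory the file must be written to, so the sketch leaves that to the caller. HM_NAME, IP, and PORT are expected to be set in the environment.

```shell
#!/bin/bash
# Sketch only: the helper name and directory argument are assumptions.

# Append a timestamped message to <dir>/$HM_NAME.$IP.$PORT.out.
hm_debug() {
    local dir="$1"
    shift
    printf '%s %s\n' "$(date '+%Y-%m-%dT%H:%M:%S')" "$*" \
        >> "${dir}/${HM_NAME}.${IP}.${PORT}.out"
}

# Hypothetical usage, writing to the current directory:
#   hm_debug . "curl exited with status 7"
```

Because the helper appends rather than overwrites, messages from consecutive runs remain available for comparison.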

Packet Capture

External health monitor packets are not captured by the option available under Operations > Packet Capture. Instead, use the tcpdump command with filter options from the shell prompt of the NSX Advanced Load Balancer Controller.

tcpdump -i <avi_ethX>

The output of the above command shows the external health monitor traffic.
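To narrow the capture to a single back-end server, a filter expression can be appended to the tcpdump command. The sketch below only prints the command line so it can be reviewed before it is run. The helper name capture_cmd is hypothetical, and the server address and port are taken from the netcat example earlier in this section.

```shell
#!/bin/bash
# Sketch only: the helper name is an illustrative assumption.

# Print a tcpdump command line limited to one server and port.
# -nn disables host and port name resolution; "host ... and port ..." is a
# standard capture filter expression.
capture_cmd() {
    local iface="$1" server_ip="$2" server_port="$3"
    echo "tcpdump -nn -i ${iface} host ${server_ip} and port ${server_port}"
}

# Keep <avi_ethX> as a placeholder until the real interface name is known.
capture_cmd "<avi_ethX>" 10.10.30.34 80
```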

For more information on SSH Key-based Login to NSX Advanced Load Balancer Controller, see Password-less SSH-based Login for Admin User in the VMware NSX Advanced Load Balancer Administration Guide.

SNAT IP and External Health Monitor

SNAT is not supported for external health monitors. Configuring an SNAT IP together with an external health monitor is therefore not a supported combination.

When an SNAT IP and an external health monitor are both configured for the same virtual service, the built-in health monitors honor the SNAT IP and initiate their TCP connections through it. The external health monitor, however, does not honor the SNAT IP. This is expected behavior.