This section discusses how to troubleshoot external health monitor issues.

External health monitors on NSX Advanced Load Balancer use scripts to provide highly customized and granular health checks. The scripts can be written in Linux shell, Python, or Perl, and can run tools such as wget, netcat, curl, and snmpget.

Troubleshooting Steps

The directory structure of NSX Advanced Load Balancer is not exposed in the NSX Advanced Load Balancer UI. It is available only through admin shell or console access. External health monitor scripts run with limited access and with restricted CPU, memory, disk, and other resources, so that they do not affect the normal functioning of the NSX Advanced Load Balancer system. For this reason, it is recommended to configure relaxed timeouts for external health monitors.

Using NSX Advanced Load Balancer CLI

When building an external monitor, it is common to test the commands manually first. To execute commands from an SE, it is necessary to switch to the proper namespace or tenant. The production external monitor uses the proper tenant automatically.

To attach to an NSX Advanced Load Balancer SE using NSX Advanced Load Balancer CLI, refer to SSH Access for Super User.

For more information on the script parameters, refer to External Health Monitors.

If the external health monitor script writes any output to stdout, the health check is treated as successful. If the script does not write any output to stdout, the health check is treated as a failure.
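As a minimal sketch of this contract, the following shell fragment prints a marker on stdout only when a check succeeds and stays silent otherwise. The report helper and the "UP" marker are hypothetical names chosen for illustration, and the true and false commands stand in for a real probe.

```shell
#!/bin/sh
# report: run the given check command; print a marker on stdout only on success.
# "report" and the "UP" marker are illustrative names, not part of the product.
report() {
    if "$@" >/dev/null 2>&1; then
        echo "UP"   # any output on stdout marks the server as healthy
    fi              # no output at all is treated as a failure
}

# Stand-ins for a real probe. A real monitor might instead run something like:
#   report curl -s -f "http://$IP:$PORT/"
report true     # prints UP
report false    # prints nothing
```

Discarding the probe's own output inside the helper keeps the success signal under the script's control, so stray tool output cannot mark a server as healthy by accident.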

Troubleshooting Examples:

Check that the output goes to stdout and not stderr.

For example, the following usage fails:

netcat -v -n -z -w 3 $IP $PORT | grep "open" 2>&1 > /dev/null

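The netcat command writes its status output to stderr, but the pipe passes only stdout to grep. The grep command therefore receives no input and prints nothing. The 2>&1 > /dev/null redirection applies only to grep, so the netcat message still goes to stderr and never reaches stdout.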

You can confirm this by running the command from the Service Engine shell:

root@avi-se-iihyz:/run/hmuser# netcat -v -n -z -w 3 $IP $PORT | grep "open" 2>&1 > /dev/null
(UNKNOWN) [10.10.30.34] 80 (http) open    <-- the netcat message still shows up

Redirecting stderr to stdout before the pipe fixes the issue:

netcat -v -n -z -w 3 $IP $PORT 2>&1 | grep "open"
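The difference between the two forms can be reproduced without a network connection. The sketch below uses a hypothetical probe function that writes its message to stderr, as netcat -v does, and counts how many lines grep matches in each case.

```shell
#!/bin/sh
# probe: hypothetical stand-in for a tool that reports status on stderr,
# as netcat -v does in the example above.
probe() {
    echo "(UNKNOWN) [10.10.30.34] 80 (http) open" >&2
}

# Only stdout crosses the pipe, so grep counts zero matching lines.
# stderr is discarded here only to keep the demo output tidy, and
# "|| true" absorbs grep's non-zero exit status when nothing matches.
without_merge=$(probe 2>/dev/null | grep -c "open" || true)

# Merging stderr into stdout before the pipe lets grep see the message.
with_merge=$(probe 2>&1 | grep -c "open")

echo "without merge: $without_merge, with merge: $with_merge"
# prints: without merge: 0, with merge: 1
```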

Using Show Command

The show pool <pool-name> server hmonstat command provides information about the failure code, the request, and response strings.

Using NSX Advanced Load Balancer UI

Log in to the NSX Advanced Load Balancer UI, navigate to Applications > Pools, select the desired pool, and click Events to check the health monitor logs.

Using Errors Output from the Script

The return code of the external health monitor script is used to pick the failure reason code. The valid error codes are:

  • EINTR, ETIMEDOUT: Connection Timeout. (Generated by NSX Advanced Load Balancer infra upon script timeout)

  • ECONNREFUSED: Connection Refused

  • ECONNRESET: Connection Reset

  • EADDRINUSE/EADDRNOTAVAIL: Address not available

  • EHOSTDOWN/EHOSTUNREACH: Host unreachable

  • ENETDOWN/ENETUNREACH: Network unreachable

  • ENOBUFS/ENOMEM: Out of resources (this could be generated by NSX Advanced Load Balancer Infra if resource allocation fails.)

All other return codes are treated as a generic "other" error.
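One way a script might signal these reasons is to exit with the matching errno number. The sketch below assumes that the return code is interpreted as a standard Linux errno value, which the list above suggests but does not state outright. The numbers are the Linux values from errno.h, and errno_value and fail_with are hypothetical helper names.

```shell
#!/bin/sh
# errno_value: map a symbolic error name to its Linux errno number.
errno_value() {
    case "$1" in
        ETIMEDOUT)    echo 110 ;;
        ECONNREFUSED) echo 111 ;;
        ECONNRESET)   echo 104 ;;
        EHOSTUNREACH) echo 113 ;;
        ENETUNREACH)  echo 101 ;;
        *)            echo 1 ;;    # unlisted names: generic non-zero failure
    esac
}

# fail_with: write nothing to stdout and return the errno as the exit status.
fail_with() {
    return "$(errno_value "$1")"
}

status=0
fail_with ECONNREFUSED || status=$?
echo "exit status: $status" >&2   # sent to stderr so that stdout stays empty
```

In a real script, the final step would typically be an exit with that value once the probe has failed, with nothing written to stdout.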

Note:
  • The script can write an error message to $HM_NAME.$IP.$PORT.out. This message is then included in the output of the show pool <pool-name> server hmonstat command to aid debugging. This works only when external health monitor debugging is enabled.

  • To run the script manually for troubleshooting, the superuser can log in to the Service Engine console with root privileges, switch to the hmuser account using sudo, and run the script, which is stored in the /run/hmuser directory.

  • Although you can modify the script on the Service Engine for troubleshooting, this change is temporary. Once the Service Engine restarts or you modify the pool/health monitor, the changes will be lost. The correct way to modify the health monitor configuration is from the NSX Advanced Load Balancer UI/CLI/API.
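The debug file mentioned in the first note could be written along the following lines. This is a sketch under assumptions: HM_NAME, IP, and PORT are taken to be set for the script by the system, the default values exist only so that the fragment runs outside a Service Engine, and OUT_DIR is a hypothetical variable, since the notes do not state which directory the file belongs in.

```shell
#!/bin/sh
# HM_NAME, IP and PORT are assumed to be provided to the script by the system.
# The defaults and OUT_DIR below are hypothetical, so the sketch runs anywhere.
HM_NAME="${HM_NAME:-demo-monitor}"
IP="${IP:-192.0.2.10}"
PORT="${PORT:-80}"
OUT_DIR="${OUT_DIR:-$(mktemp -d)}"

debug_file="$OUT_DIR/$HM_NAME.$IP.$PORT.out"

# Record the failure reason in the debug file. Nothing goes to stdout,
# so the check itself still counts as failed.
echo "connection to $IP:$PORT failed" > "$debug_file"
```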

Packet Capture

External health monitor packets are not captured using the option available under Operations > Packet Capture. Use the tcpdump command with filter options from the shell prompt of NSX Advanced Load Balancer Controller.

tcpdump -i <avi_ethX>

The output of the above command shows the external health monitor traffic.
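To narrow the capture to a single pool member, a standard tcpdump filter expression can be appended to the command. The helper below is a hypothetical sketch that only prints the command line to run; the interface name avi_eth0 and the address and port are sample values, not values taken from a real deployment.

```shell
#!/bin/sh
# capture_cmd: print a tcpdump command line limited to one server and port.
# The helper name and the sample values below are hypothetical.
capture_cmd() {
    iface=$1; ip=$2; port=$3
    # -nn disables name and port lookups; the trailing words are a capture filter.
    echo "tcpdump -nn -i $iface host $ip and port $port"
}

capture_cmd avi_eth0 10.10.30.34 80
# prints: tcpdump -nn -i avi_eth0 host 10.10.30.34 and port 80
```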

For details on logging in to the Controller over SSH, refer to SSH Key-based Login to NSX Advanced Load Balancer Controller.