Verifying Monitor Results

NSX Advanced Load Balancer does not include health monitor traffic when it records client traffic logs for a virtual service. This topic explains how to inspect the results received by active health monitors.

Using GUI

From the GUI, you can check the status of a server in the following ways:

Hover over a down (red) server icon.
Navigate to the Pool > Server page and click the Failed Monitor in the health monitor table to expand the results.
Check the events for the virtual service and the pool, which record status changes and their reasons.

For more information, see the Reasons Servers Marked Down section in the VMware NSX Advanced Load Balancer Monitoring and Operability guide.

Using CLI and API

You can view extensive health monitor information for each server in the pool from the CLI and API. The following example shows an abbreviated view:

show pool [poolname] server hmonstat
+---------------------------------+----------------------------------------+
| Field                           | Value                                  |
+---------------------------------+----------------------------------------+
| server_hm_stat[1]               |                                        |
|   server_name                   | 10.90.15.61:8000                       |
|   oper_status                   |                                        |
|     state                       | OPER_UP                                |
|   shm_runtime[1]                |                                        |
|     health_monitor_name         | healthmonitor-1                        |
|     health_monitor_type         | HEALTH_MONITOR_TCP                     |
|     last_transition_timestamp_3 | Tue May 24 20:42:51 2016 ms            |
|     last_transition_timestamp_2 | Tue May 24 20:42:38 2016 ms            |
|     last_transition_timestamp_1 | Tue May 24 20:37:10 2016 ms            |
|     rise_count                  | 255                                    |
|     fall_count                  | 0                                      |
|     total_checks                | 1414                                   |
|     total_failed_checks         | 5                                      |
|     total_count[1]              |                                        |
|       type                      | CONNECTION_TIMEOUT                     |
|       count                     | 5                                      |
|     avg_response_time           | 1                                      |
|     recent_response_time        | 1                                      |
|     min_response_time           | 1                                      |
|     max_response_time           | 1999                                   |
|     port                        | 8000                                   |
|     curr_failed_checks          | 1                                      |
|   ip_addr                       | 10.90.15.61                            |
|   port                          | 8000                                   |
+---------------------------------+----------------------------------------+
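
When this output is collected repeatedly, it can help to reduce it to a single number. The following is a minimal sketch, assuming the table text has already been captured as a string; the function names parse_cli_table and failure_ratio are hypothetical helpers, not part of the product CLI or API. It reads the two-column table and computes the share of failed checks from total_checks and total_failed_checks.

```python
SAMPLE = """\
+---------------------------------+----------------------------------------+
| Field                           | Value                                  |
+---------------------------------+----------------------------------------+
|   server_name                   | 10.90.15.61:8000                       |
|     state                       | OPER_UP                                |
|     total_checks                | 1414                                   |
|     total_failed_checks         | 5                                      |
|     curr_failed_checks          | 1                                      |
+---------------------------------+----------------------------------------+
"""


def parse_cli_table(text):
    """Return (field, value) pairs from a '| field | value |' style table."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip borders such as +-----+ and any non-table lines
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if len(cells) == 2 and cells[0] != "Field":
            rows.append((cells[0], cells[1]))
    return rows


def failure_ratio(rows):
    """Fraction of health checks that failed, based on the two counters."""
    # Assumption: the text holds one server and one monitor, so each
    # counter name appears only once.
    stats = dict(rows)
    total = int(stats["total_checks"])
    failed = int(stats["total_failed_checks"])
    return failed / total if total else 0.0


rows = parse_cli_table(SAMPLE)
print(f"failed checks: {failure_ratio(rows):.2%}")
```

If the output covers several servers or monitors, split it per server_hm_stat and shm_runtime entry before calling failure_ratio, because the dictionary keeps only the last value seen for each field name.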

Using Packet Capture

By default, NSX Advanced Load Balancer does not include health monitor traffic when performing packet captures. However, you can change this through the CLI using the following flags:

debug_vs_hm_include: Include health monitor packets in the capture.
debug_vs_hm_none: Omit health monitor packets from the capture. This is the default.
debug_vs_hm_only: Capture only health monitor packets.

For more information, see the Packet Capture section in the VMware NSX Advanced Load Balancer Monitoring and Operability guide.

Using Manual Test

You can manually run ping, curl, or a similar Linux command-line utility to validate the response of a server.

For more information, see Manually Validating Server Health.

Common Monitor Issues

Review these common issues if a server returns the desired response but NSX Advanced Load Balancer still marks the server DOWN.

General Monitor Issues

The following issues apply to monitors in general:

The system inspects the content returned from servers and compares it to the monitor's Server Response Data. The comparison is case-sensitive.
Most monitors inspect only the first 2 KB of the server response, which includes both headers and content. If the desired result appears later in the response, the server is marked DOWN.
Duplicate IP addresses are one of the most common causes of intermittent health check failures.
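
The first two points can be illustrated with a short sketch. It is only an approximation of the behavior described above, not the product's implementation: the 2048-byte limit is an assumed reading of "2k", and response_matches is a hypothetical helper name.

```python
INSPECT_LIMIT = 2048  # assumption: "2k" interpreted as 2048 bytes


def response_matches(raw_response: bytes, expected: bytes,
                     limit: int = INSPECT_LIMIT) -> bool:
    """True if `expected` occurs, case-sensitively, in the first `limit` bytes.

    `raw_response` is the full server reply, headers included, because the
    inspected window covers both headers and content.
    """
    return expected in raw_response[:limit]


headers = b"HTTP/1.0 200 OK\r\nContent-Type: text/plain\r\n\r\n"

# Exact text near the start of the response: matches.
print(response_matches(headers + b"server is good", b"server is good"))

# Same words with different capitalization: does not match.
print(response_matches(headers + b"Server Is Good", b"server is good"))

# Correct text, but pushed past the inspected window by 3000 filler bytes.
print(response_matches(headers + b"x" * 3000 + b"server is good",
                       b"server is good"))
```

Running a captured server response through a check like this is a quick way to see whether a failure comes from capitalization or from the position of the expected text.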

Passive

A significant error triggers the passive monitor and automatically generates logs for the virtual service. On a server page, the passive monitor can show less than 100%. To investigate, filter the virtual service logs for the server in question, and then click the Significance tile in the Log Analytics sidebar.

You can check if failures are occurring and increasing over time using the following CLI:

    : > show pool p1 detail | grep suspect
    |   lb_fail_suspect_state             | 0

Ping

Some devices, including servers and firewalls, rate-limit ICMP messages and can silently discard the excess. In such cases, send pings less often by increasing the monitor's Send Interval.

HTTP

The send string must contain the exact request headers that the servers expect. For instance, a space in a host header, such as Host: Avi Server, can cause issues for IIS. The HTTP monitor adds a few headers to emulate a valid request. To omit these extra headers, use a TCP monitor, which sends only the string defined in the Client Request Data field. If you use a TCP monitor, ensure that you add \r\n characters wherever a carriage return and line feed are required.

NSX Advanced Load Balancer adds \r\n at the end of each line of the request. HTTP 1.0 requires a second \r\n after the last line, as shown in the following request:

[Health monitor send string]\r\n
User-Agent: avi/1.0\r\n
Host: [Avi inserted server name]\r\n
Accept: */*\r\n\r\n
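
To reproduce this request byte for byte, for example in a TCP monitor's Client Request Data or in a manual test, the layout above can be assembled as follows. This is a minimal sketch: build_probe_request is a hypothetical helper, and the send string and host value in the example are placeholders rather than product defaults.

```python
def build_probe_request(send_string: str, host: str) -> bytes:
    """Assemble the request layout shown above.

    Each line ends with \r\n, and a second \r\n follows the last header
    to terminate the request.
    """
    lines = [
        send_string,            # the health monitor send string
        "User-Agent: avi/1.0",
        f"Host: {host}",        # the inserted server name
        "Accept: */*",
    ]
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


# Placeholder values for illustration only.
request = build_probe_request("GET /health HTTP/1.0", "10.90.15.61")
print(request)
```

Printing the bytes makes a missing or doubled \r\n easy to spot before the string is pasted into a monitor.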

For HTTP/S, NSX Advanced Load Balancer does not render the results but inspects them literally. For instance, a server can send a 302 redirect back to NSX Advanced Load Balancer, and the redirect itself does not include the expected text, such as server is good. A browser would follow the redirect and display the page with the correct content. URI encoding of the content can also cause an HTTP/S check to fail.
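
When a captured response does not match, a small triage helper can suggest which of these two causes is involved. This is a sketch under stated assumptions: explain_mismatch and its return labels are hypothetical, and the logic only mirrors the two cases described above (a redirect status, or expected text that appears only after URI decoding).

```python
from urllib.parse import unquote


def explain_mismatch(raw_response: bytes, expected: str) -> str:
    """Classify why a literal check for `expected` might fail."""
    text = raw_response.decode("latin-1")  # latin-1 maps every byte, never fails
    status_line = text.split("\r\n", 1)[0]
    parts = status_line.split()
    status = parts[1] if len(parts) > 1 else ""

    if expected in text:
        return "match"
    if status.startswith("3"):
        return "redirect"      # a browser would follow it; a literal check sees only this reply
    if expected in unquote(text):
        return "uri-encoded"   # text is present only after percent-decoding
    return "no-match"


redirect = b"HTTP/1.1 302 Found\r\nLocation: /login\r\n\r\n"
encoded = b"HTTP/1.1 200 OK\r\n\r\nserver%20is%20good"
print(explain_mismatch(redirect, "server is good"))
print(explain_mismatch(encoded, "server is good"))
```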

External

External health monitors run under the hmuser account, which has lower privileges. To work as this user, attach to a Service Engine, log in as root, and run su - hmuser:

root@test-se2:~# su - hmuser
hmuser@10-10-25-28:~$ pwd
/run/hmuser

UDP Health Monitor

UDP health monitors that are configured with no receive string rely on ICMP unreachable messages to detect an error. The absence of an ICMP message results in the server being marked up. In a deployment with many servers, the number of ICMP messages can be large enough that some are dropped by the ICMP rate limiter, and servers can be erroneously marked up.
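
The detection logic described above can be sketched with a connected UDP socket, which lets the operating system report an ICMP port unreachable message as a connection-refused error. This is an illustration of the principle only, not the Service Engine's implementation; udp_probe, its payload, and its timeout are hypothetical.

```python
import socket


def udp_probe(host: str, port: int, payload: bytes = b"ping",
              timeout: float = 1.0) -> str:
    """Return "down" only if an ICMP unreachable error is seen, else "up"."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        # Connecting a UDP socket sends nothing, but it allows ICMP errors
        # for this destination to be delivered to the socket.
        sock.connect((host, port))
        try:
            sock.send(payload)
            sock.recv(1024)
        except ConnectionRefusedError:
            return "down"  # ICMP port unreachable was received
        except socket.timeout:
            return "up"    # silence: no reply and no ICMP error
        return "up"        # the server answered
```

Note that silence is treated the same as a healthy server. If the ICMP message is dropped anywhere on the path, the probe times out and reports "up", which is the false positive described above.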

To have the server or virtual service correctly marked down in this situation, tune the ICMP rate limit configuration.

To confirm whether ICMP unreachable messages are being dropped by the ICMP unreachable rate limiter in a high-scale deployment, use the following command:

show serviceengine [se-name] flowtablestat | grep icmp_rx_rl

 |   icmp_rx_rl_cfg_pps            | 100                        |
 |   icmp_rx_rl_confirming         | 30                         |
 |   icmp_rx_rl_drops              | 0                          |
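
To compare these counters across repeated runs, the output can be reduced to integers. The following is a minimal sketch, assuming the command output has been captured as text; icmp_rl_counters is a hypothetical helper, and the field names are taken from the sample output above.

```python
import re

SAMPLE = """\
 |   icmp_rx_rl_cfg_pps            | 100                        |
 |   icmp_rx_rl_confirming         | 30                         |
 |   icmp_rx_rl_drops              | 0                          |
"""


def icmp_rl_counters(output: str) -> dict:
    """Extract the icmp_rx_rl_* counters from the command output as integers."""
    counters = {}
    for match in re.finditer(r"\|\s*(icmp_rx_rl_\w+)\s*\|\s*(\d+)\s*\|", output):
        counters[match.group(1)] = int(match.group(2))
    return counters


counters = icmp_rl_counters(SAMPLE)
if counters.get("icmp_rx_rl_drops", 0) > 0:
    print("ICMP messages are being dropped by the rate limiter")
else:
    print("no rate-limiter drops recorded")
```

A drops value that grows between two samples suggests that the rate limiter is discarding ICMP unreachable messages and that the limit needs tuning.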

The following are the commands to configure ICMP rate limit:

 [admin:controller]: configure serviceengineproperties
 [admin:controller]: seproperties:se_runtime_properties
 [admin:controller]: seproperties:se_runtime_properties  se_rate_limiters
 [admin:controller]: seproperties:se_runtime_properties:se_rate_limiters : icmp_rl 100