The following are the common reasons for a server to be marked down:

  • ARP Unresolved: The Service Engine is unable to resolve the MAC address of the server's IP address (when the server is in the same layer 2 domain) or is unable to initiate a TCP connection (when the server is a layer 3 hop away).

  • Payload Mismatch: The health monitor expects specific content in the body of the response (HTTP or TCP), and the server returned something else. In the example, an excerpt of the server's response is shown. This error often occurs when the server's first response is a redirect. A browser follows the redirect, so the expected content appears in the client browser, but from Avi Load Balancer's perspective, the response is only the redirect.

  • Response Code Mismatch: HTTP health checks can be configured to expect a specific response code, such as 2xx. The check fails when the server sends back a different code, such as 404.

  • Response Timeout with a Threshold Violation: A health monitor waits for a response for the duration of its timeout period, and each health monitor can be assigned its own threshold and timeout period. If a valid response is not received within the timeout period for N consecutive attempts, where N equals the threshold, the server is marked down.
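The failure reasons above can be illustrated with a minimal Python sketch. It is not Avi Load Balancer's implementation: the names classify_response and MonitorState, the order of the checks, and the rule that a single success marks the server up again are all assumptions made for illustration.

```python
import re
from dataclasses import dataclass


def classify_response(status, body, expected_codes, expected_pattern=None):
    """Return None for a healthy response, else a failure reason (hypothetical helper).

    status is None when no response arrived within the timeout period.
    """
    if status is None:
        return "Response Timeout"
    if status not in expected_codes:
        return "Response Code Mismatch"
    if expected_pattern and not re.search(expected_pattern, body):
        return "Payload Mismatch"
    return None


@dataclass
class MonitorState:
    """Tracks consecutive failed checks for one server (hypothetical structure)."""
    failure_threshold: int
    consecutive_failures: int = 0
    is_up: bool = True

    def record(self, success: bool) -> bool:
        """Record one check result and return whether the server is considered up."""
        if success:
            # Simplifying assumption: one success resets the count and marks the server up.
            self.consecutive_failures = 0
            self.is_up = True
        else:
            self.consecutive_failures += 1
            if self.consecutive_failures >= self.failure_threshold:
                self.is_up = False
        return self.is_up


# A redirect accepted by the code check still fails the payload check.
reason = classify_response(302, "", range(200, 400), "Welcome")

# With a threshold of 3, the server is marked down on the third consecutive failure.
state = MonitorState(failure_threshold=3)
results = [state.record(False) for _ in range(3)]
```

In this sketch, a redirect whose code is accepted but whose body lacks the expected content is reported as a payload mismatch, which matches the redirect scenario described above.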

Although Avi Load Balancer is engineered for easy troubleshooting, some problems require more advanced tools. In such cases, you can capture a trace of the conversation between the SE and the server by navigating to Operations > Traffic Capture.

For more details on traffic capture, refer to Traffic Capture.

You can use tools such as ping and curl from a client machine to test the server. However, these tools are not reliable when administrators run them from an SE, because the SE uses separate network stacks for the data plane and for management. For instance, ping executed from the SE's Linux shell uses the SE management IP and network, so its results can differ from those of the health checks, which the SE sends via its data NICs and networks. To verify the interface used, run ping with the -I option.
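As a small illustration of pinning ping to a specific interface, the following Python sketch assembles a Linux ping command that uses the -I option. The helper name build_ping_command, the interface name eth1, and the address 10.10.10.20 are placeholders, not values from an actual SE.

```python
import shlex


def build_ping_command(target_ip, interface, count=3):
    """Build a Linux ping command bound to one interface (hypothetical helper).

    -c limits the number of echo requests; -I selects the source interface.
    """
    return ["ping", "-c", str(count), "-I", interface, target_ip]


# Placeholder interface name and server address, for illustration only.
command = build_ping_command("10.10.10.20", "eth1")
print(shlex.join(command))
```

The resulting list can be passed to subprocess.run on a Linux host, and the printed string can be pasted into a shell.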