Server flapping between UP and DOWN is a common issue. Generally, server flapping is caused by the server reaching or slightly exceeding the health monitor's maximum allowed response time.
To validate if a server is flapping, check the specific server's analytics page within the pool as follows:
Navigate to the list of pools and click the pool name.
Click the Servers tab.
Click a server name to view its metrics, as illustrated below:
Select the Alerts and System Events Overlay icons for the main chart to view server UP and DOWN events over the selected duration. The page also displays the list of failed health monitors.
Compare the server's response times with the health monitor's configured receive timeout window. If the failures can be attributed to these timers, take one or more of the following steps to resolve the issue:
Add servers: This will not help if the slowdown is caused by a backend database, but it can be a quick and permanent fix for servers that are simply busy or overloaded.
Increase the health monitor's receive timeout window: The timeout value can be 1-300 seconds and must always be shorter than the health monitor's send interval.
Raise the number of successful checks required and decrease the number of failed checks allowed. This ensures that the server is not brought back into rotation as quickly, potentially giving it more time to handle the processes causing the slow response.
Change the connection ramp-up (if using the least connections load-balancing algorithm): Servers can receive too many connections too quickly when first brought up. For instance, if one server has 1 connection and the rest have 100 connections each, the least connections algorithm sends the next 99 connections to the new server. This can easily overwhelm the server, leaving a flash crowd of connections that the remaining servers must handle and causing a domino effect. You can configure the connection ramp-up feature on the Advanced tab of the pool's configuration. The feature slowly ramps up the percentage of new connections sent to a new server. Increasing the ramp-up time can be beneficial if you see a cascading failure of servers.
Set the maximum number of connections per server. This option, configurable on the Advanced tab of the pool configuration, ensures that servers are not overloaded and can handle connections at optimal speed.
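The timer comparison and the least connections example above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names MonitorTimers, timer_problems, timeout_failure_ratio, and least_connections_burst are hypothetical helpers, not product settings or API calls, and the 0.9 margin is an arbitrary threshold chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class MonitorTimers:
    """Hypothetical container for health monitor timers, in seconds."""
    send_interval: int
    receive_timeout: int


def timer_problems(timers: MonitorTimers) -> list[str]:
    """Return the timer rules from this article that the values violate."""
    problems = []
    if not 1 <= timers.receive_timeout <= 300:
        problems.append("receive timeout must be 1-300 seconds")
    if timers.receive_timeout >= timers.send_interval:
        problems.append("receive timeout must be shorter than the send interval")
    return problems


def timeout_failure_ratio(response_times: list[float],
                          receive_timeout: float,
                          margin: float = 0.9) -> float:
    """Fraction of samples at or above margin * receive_timeout.

    The margin is an assumed, illustrative threshold for "reaching or
    slightly exceeding" the timeout; it is not a product setting.
    """
    if not response_times:
        return 0.0
    limit = margin * receive_timeout
    slow = sum(1 for t in response_times if t >= limit)
    return slow / len(response_times)


def least_connections_burst(open_connections: list[int],
                            new_connections: int) -> list[int]:
    """Send each new connection to the server with the fewest open
    connections and return how many new connections each server received."""
    counts = list(open_connections)
    received = [0] * len(counts)
    for _ in range(new_connections):
        target = counts.index(min(counts))  # first server with the minimum
        counts[target] += 1
        received[target] += 1
    return received


if __name__ == "__main__":
    print(timer_problems(MonitorTimers(send_interval=10, receive_timeout=4)))  # []
    print(timeout_failure_ratio([1.0, 3.8, 4.2, 2.0], receive_timeout=4))      # 0.5
    print(least_connections_burst([1, 100, 100], 99))                          # [99, 0, 0]
```

A high ratio from timeout_failure_ratio suggests the failures are timer-related, and the last call reproduces the 1-versus-100 example: without ramp-up, all 99 new connections land on the newly added server.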