Health monitor sharding enables selective monitoring for SEs on NSX Advanced Load Balancer. With this feature, not every SE has to health monitor all the GSLB pool members. A particular SE is responsible for one set of GSLB services, while other SEs are responsible for the remaining GSLB services.

When health monitoring (HM) sharding is not enabled, and datapath monitoring is enabled for GSLB services, all the Service Engines where the DNS virtual service is placed are responsible for monitoring all the GSLB pool members.

For example, if there are 1000 GSLB services and a DNS virtual service is placed over two Service Engines, both SEs health monitor all 1000 GSLB services based on the configuration. If multiple DNS virtual services serve a particular domain, all of their SEs perform health probing. If a DNS virtual service is present on multiple sites, the DNS SEs on those sites also probe the members.

Use Case

  • It is helpful in deployments where SEs are deployed in large numbers.

  • Resiliency within a site. Without HM sharding, in-site resiliency for DNS is not cost-effective, as another SE with the same configuration is required. Another option is to have DNS virtual services across sites, that is, site-level resiliency. However, this requires a third-party application (for example, Infoblox) to monitor the availability of the NSX Advanced Load Balancer DNS virtual services and remove them from the active resolvers if a DNS virtual service goes down.

  • To reduce the load on the back-end system, as only selective monitoring is performed.

How it Works

A shard server runs on the NSX Advanced Load Balancer Controller leader, and a sharding client runs on each SE.

Status is shared between SEs by the NSX Advanced Load Balancer Controller. Each SE is responsible for a defined number of services.

Example:

There are 10k GSLB services, each with one health monitor. A DNS virtual service is placed on four SEs, so each SE is responsible for monitoring roughly 2500 GSLB services.

  • SE1 - GSLB services 1 to 2500

  • SE2 - GSLB services 2501 to 5000

  • SE3 - GSLB services 5001 to 7500

  • SE4 - GSLB services 7501 to 10000

Note:

The order mentioned above is for illustration purposes only.

The HM sharding feature reduces health-monitor traffic and load by lowering the number of health-monitor probes sent across the DNS virtual services and SEs serving a domain.
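
The reduction can be quantified with a quick back-of-the-envelope calculation. The following Python sketch is purely illustrative and simply reuses the numbers from the example above:

# Illustrative only: probe load with and without HM sharding.
num_gslb_services = 10000   # GSLB services, each with one health monitor
num_ses = 4                 # SEs on which the DNS virtual service is placed

# Without sharding, every SE probes every GSLB pool member.
probes_without_sharding = num_gslb_services * num_ses   # 40000 probe streams

# With sharding, each GSLB service is probed by exactly one owner SE.
probes_with_sharding = num_gslb_services                # 10000 probe streams
services_per_se = num_gslb_services // num_ses          # ~2500 services per SE

print(probes_without_sharding, probes_with_sharding, services_per_se)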

Status Propagation of GSLB Services Across SEs

SEs require an up-to-date health monitoring state to process DNS requests correctly. Each SE performs health monitoring for its set of GSLB services and propagates this information to the state cache manager (SCM). The SCM then propagates the status of the GSLB services across the SEs.

The following assumptions are considered to explain the feature:

  • SE1 executes health-monitor probes for GSLB services Gs1 to Gs1k.

  • SE2 executes health-monitor probes for GSLB services Gs1k to Gs2k.

  • SE1 needs the status of GSLB services Gs1k to Gs2k.

  • SE2 needs the status of GSLB services Gs1 to Gs1k.

  • The SCM propagates the statuses from SE1 and SE2.

The SCM knows which SEs the status has to be propagated to. It retrieves this information by registering with the shard server (SS). The SS propagates the shard map to the SCM whenever a change occurs.
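
The interaction can be summarized with a simplified sketch. The class and method names below (ShardServer, StateCacheManager, and so on) are illustrative and do not reflect the actual internal implementation:

# Simplified model of status propagation; names are illustrative only.

class ShardServer:
    """Runs on the Controller leader; owns the shard map (GSLB service -> owner SE)."""
    def __init__(self, shard_map):
        self.shard_map = shard_map          # e.g. {"Gs1": "SE1", "Gs1001": "SE2"}
        self.subscribers = []

    def register(self, subscriber):
        # The SCM registers here and is notified whenever the shard map changes.
        self.subscribers.append(subscriber)
        subscriber.on_shard_map(self.shard_map)

class StateCacheManager:
    """Runs on the Controller; propagates GSLB service status between SEs."""
    def __init__(self, all_ses):
        self.all_ses = all_ses
        self.shard_map = {}

    def on_shard_map(self, shard_map):
        self.shard_map = dict(shard_map)

    def on_status_update(self, gslb_service, status):
        # Forward the owner's probe result to every watcher SE (all SEs except the owner).
        owner = self.shard_map.get(gslb_service)
        for se in self.all_ses:
            if se != owner:
                print("push %s=%s to %s" % (gslb_service, status, se))

# Usage: SE1 owns Gs1; its probe result is pushed to SE2 (the watcher).
ss = ShardServer({"Gs1": "SE1", "Gs1001": "SE2"})
scm = StateCacheManager(all_ses=["SE1", "SE2"])
ss.register(scm)
scm.on_status_update("Gs1", "UP")   # prints: push Gs1=UP to SE2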

SEs in Headless State

Assume that SE1 goes headless. When an SE goes headless, it waits for a period equal to the configured send_interval; this is the amount of time the SE waits before declaring itself headless. After the send_interval expires, SE1 starts monitoring all the GSLB services. This is done to maintain the correct state of the GSLB pool members.

The moment the NSX Advanced Load Balancer Controller sees that the state of a given member has changed, it immediately tries to push the new state to the other SEs.

The send_interval time is a standard knob that controls multiple functionalities. It is the interval at which all the Controllers query each other. This field is configurable through the API and the CLI.

The following CLI commands can be used to configure or edit the value of send_interval for a GSLB cluster, for example, avidemo_gslb_cluster.

[admin:ctrl]: > configure gslb avidemo_gslb_cluster
[admin:ctrl]: gslb> send_interval <value>
Overwriting the previously entered value for send_interval
[admin:ctrl]: gslb> save
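
The same field can also be updated through the API. The following sketch uses the Avi Python SDK; the host name, credentials, and value shown are placeholders, and the object and field names should be verified against your Controller version:

# Assumption-based sketch using the Avi Python SDK (pip install avisdk).
from avi.sdk.avi_api import ApiSession

# Placeholder Controller address and credentials.
api = ApiSession.get_session("controller.example.com", "admin", "<password>",
                             tenant="admin")

# Fetch the GSLB configuration object by name and update send_interval.
gslb = api.get_object_by_name("gslb", "avidemo_gslb_cluster")
gslb["send_interval"] = 15          # seconds; illustrative value
resp = api.put("gslb/%s" % gslb["uuid"], data=gslb)
print(resp.status_code)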

On the SE, send_interval defines:

  • The debounce timer for an SE to react to headless behavior.

  • The batching of incoming messages from the shard server, to improve responsiveness during a warm boot (see the sketch after this list).

    For example, if a DNS virtual service is placed on n SEs, an SE can receive n-1 messages during a warm boot. All n-1 messages that arrive within the send interval are batched and processed in one go.
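
The batching behavior can be illustrated with a short sketch; the names and structure are illustrative and are not the actual SE implementation:

# Illustrative sketch of send_interval-based batching of shard-server messages.
import queue
import time

SEND_INTERVAL = 15  # seconds; mirrors the configured send_interval

def batch_shard_messages(msg_queue):
    # Collect all shard-server messages that arrive within one send_interval
    # (for example, the n-1 messages an SE can receive during a warm boot)
    # and process them in a single pass.
    batch = [msg_queue.get()]                 # block until the first message arrives
    deadline = time.monotonic() + SEND_INTERVAL
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break
        try:
            batch.append(msg_queue.get(timeout=remaining))
        except queue.Empty:
            break
    process_shard_map_updates(batch)

def process_shard_map_updates(batch):
    # Recompute ownership once for the whole batch instead of once per message.
    print("processing %d shard-map update(s) in one go" % len(batch))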

Scale Out: Assumption

To demonstrate a scale-out event, it is assumed that the DNS virtual service is placed on four SEs. During the scale-out event, the virtual service is placed on a new, fifth SE.

Whenever virtual service placement happens, the new SE sends a request to register for the configured domain. Once the NSX Advanced Load Balancer Controller receives this request, it broadcasts the message for the domain xyz.com. There are now five potential receivers (SE1 to SE5).

Since a new SE is added, it needs to know which GSLB services it must monitor. The SE that health monitors a GSLB service is the owner of that GSLB service, and the other SEs are watchers for it. Each SE maintains a consistent hash, which it uses to determine whether ownership has changed. A lookup in the consistent hash reveals the owner SE for a given GSLB service. Because every SE computes the same consistent hash, all SEs come to the same conclusion when it changes.

In this example, five SEs are available for health monitoring after the consistent hash is recomputed. With five SEs, each SE monitors roughly 2000 GSLB services instead of 2500.

Whenever a new SE (resource) is added, the consistent hash does not completely remap the ownership of GSLB services; only a small delta of the services moves to the new SE. The scale-out time is directly proportional to the number of GSLB services.
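
The following minimal consistent-hash sketch (not the actual hashing scheme used by the SEs) shows why adding an SE moves only a small delta of the GSLB services:

# Minimal consistent-hash ring; illustrative only, not the SE implementation.
import bisect
import hashlib

def h(key):
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class ConsistentHash:
    def __init__(self, ses, vnodes=100):
        # Each SE is placed on the ring at multiple virtual points.
        self.ring = sorted((h("%s#%d" % (se, v)), se) for se in ses for v in range(vnodes))
        self.keys = [k for k, _ in self.ring]

    def owner(self, gslb_service):
        # The owner is the first SE point at or after the service's hash (wrapping around).
        idx = bisect.bisect(self.keys, h(gslb_service)) % len(self.ring)
        return self.ring[idx][1]

services = ["Gs%d" % i for i in range(1, 10001)]
old = ConsistentHash(["SE1", "SE2", "SE3", "SE4"])
new = ConsistentHash(["SE1", "SE2", "SE3", "SE4", "SE5"])

moved = sum(1 for s in services if old.owner(s) != new.owner(s))
print("%d of %d GSLB services change owner" % (moved, len(services)))

In this sketch, roughly one fifth of the 10000 services move to the new SE5, while the rest keep their existing owner; removing an SE (scale-in) redistributes only that SE's services in the same way.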

Scale In

In the case of scale-in, the procedure remains the same: the consistent hash is recomputed when the number of SEs changes.

Scale-in time is constant; it does not vary linearly with the number of GSLB services. However, the convergence time would vary in this case and would be longer as compared to the scale-out scenario.

Notes

  • Neither the NSX Advanced Load Balancer Controller nor the SE makes a sharding decision when a new GSLB service is configured. The SE instantiates the GSLB service and unconditionally runs the health-monitor probes.

  • If two DNS virtual services are responsible for different domains, that is, DNS VS 1 serves xyz.com whereas DNS VS 2 serves abc.com, then the sharding decisions for DNS VS 1 are independent of DNS VS 2.

  • To ensure that the system is not overloaded due to unstable connections, a mechanism is followed where any state update is consumed only after the send interval.

    • If an SE gets a message from the Controller that the shard map has changed, it does not consume it immediately. It starts a timer of send_interval duration and waits for that time to confirm that the change persists.

    • After the send_interval has passed, the SE recomputes the hash to determine whether the state has changed. This ensures that a momentary event does not churn the entire system; a simplified sketch follows this list.

    • In a scenario where an SE flaps every 10-12 seconds, recalculating the hash with each flap and changing the state would make the entire system unstable.
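
The sketch below is illustrative only; the class and method names do not correspond to the actual SE implementation:

# Illustrative debounce of shard-map change events; not the actual SE code.
import threading

SEND_INTERVAL = 15  # seconds; mirrors the configured send_interval

class ShardMapDebouncer:
    def __init__(self, recompute_hash):
        self.recompute_hash = recompute_hash   # callback that recomputes the consistent hash
        self.timer = None

    def on_shard_map_change(self, event):
        # Do not consume the change immediately; wait one send_interval and only
        # then recompute, so that a momentary event does not churn the system.
        if self.timer is None or not self.timer.is_alive():
            self.timer = threading.Timer(SEND_INTERVAL, self.recompute_hash)
            self.timer.start()

# Usage: flaps arriving within one send_interval result in a single recompute.
debouncer = ShardMapDebouncer(lambda: print("recomputing consistent hash"))
debouncer.on_shard_map_change("SE3 down")
debouncer.on_shard_map_change("SE3 up")    # coalesced into the pending recompute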