This section explains how high availability (HA) operates within an NSX Advanced Load Balancer Controller cluster.

Quorum

NSX Advanced Load Balancer Controller-level HA requires a quorum of NSX Advanced Load Balancer Controller nodes to be up. In a 3-node NSX Advanced Load Balancer Controller cluster, quorum is maintained as long as at least 2 of the 3 NSX Advanced Load Balancer Controller nodes are up. If one of the Controllers fails, the remaining 2 nodes continue to provide service and NSX Advanced Load Balancer continues to operate. However, if 2 of the 3 nodes go down, quorum is lost, the entire cluster goes down, and NSX Advanced Load Balancer stops working.
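The quorum rule above can be expressed as a small check. This is an illustrative sketch only, not part of the product: the function name `has_quorum` is hypothetical, and generalizing "2 of 3" to a strict majority of any cluster size is an assumption.

```python
def has_quorum(nodes_up: int, cluster_size: int = 3) -> bool:
    """Return True when enough Controller nodes are up to keep quorum.

    Assumption: quorum means a strict majority of the cluster, which
    matches the documented "at least 2 of 3" rule for a 3-node cluster.
    """
    if not 0 <= nodes_up <= cluster_size:
        raise ValueError("nodes_up must be between 0 and cluster_size")
    # Strict majority: more than half of the nodes must be up.
    return nodes_up >= cluster_size // 2 + 1
```

For a 3-node cluster, `has_quorum(2)` is true (one node failed, the cluster keeps operating) and `has_quorum(1)` is false (two nodes failed, the cluster goes down).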

Failover

Each NSX Advanced Load Balancer Controller node periodically sends heartbeat messages to the other NSX Advanced Load Balancer Controller nodes in the cluster through an encrypted SSH tunnel using TCP port 22 (port 5098 if running as Docker containers).



The heartbeat interval is 10 seconds. The maximum number of consecutive heartbeat messages that can be missed is 4. If an NSX Advanced Load Balancer Controller does not hear from another NSX Advanced Load Balancer Controller for 40 seconds (4 missed heartbeats), the other NSX Advanced Load Balancer Controller is assumed to be down.
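The timeout arithmetic above can be sketched as follows. This is a simplified model for illustration, not the Controller's actual implementation: the constant and function names are hypothetical, and timestamps are assumed to be plain seconds from a monotonic clock.

```python
HEARTBEAT_INTERVAL_S = 10      # documented heartbeat interval
MAX_MISSED_HEARTBEATS = 4      # documented limit of consecutive misses

# 4 missed heartbeats at 10-second intervals give a 40-second timeout.
PEER_TIMEOUT_S = HEARTBEAT_INTERVAL_S * MAX_MISSED_HEARTBEATS


def peer_is_down(last_heard_s: float, now_s: float) -> bool:
    """Return True when a peer has been silent for the full timeout.

    Both arguments are timestamps in seconds (hypothetical inputs,
    for example values taken from time.monotonic()).
    """
    return now_s - last_heard_s >= PEER_TIMEOUT_S
```

With these values, a peer last heard at t=0 is still considered up at t=39.9 and is assumed down from t=40 onward.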

If only one node is down, quorum is still maintained and the cluster can continue to operate. The outcome depends on which node has failed:

  • If a follower node goes down but the Primary (leader) node remains up, access to virtual services continues without any interruption.



  • If the Primary (leader) node goes down, the remaining member nodes form a new quorum and elect a new cluster leader. The election process takes about 50-60 seconds. During this period, there is no impact on the data plane: the Service Engines (SEs) continue to operate in "headless mode". The control plane service, however, is not available, so users cannot create a VIP through LBaaS or use the NSX Advanced Load Balancer user interface, API, or CLI.
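The failure scenarios described in this section can be summarized as a small decision function. This is a hedged sketch for a 3-node cluster: the function name `describe_cluster_state` and the returned strings are hypothetical and exist only to illustrate the logic, not to mirror any product API.

```python
def describe_cluster_state(leader_up: bool, followers_up: int) -> str:
    """Summarize the expected behavior of a 3-node Controller cluster.

    leader_up:    whether the Primary (leader) node is up
    followers_up: number of follower nodes that are up (0, 1, or 2)
    """
    if followers_up not in (0, 1, 2):
        raise ValueError("a 3-node cluster has at most 2 followers")

    nodes_up = int(leader_up) + followers_up
    if nodes_up < 2:
        # Two or more nodes are down: quorum is lost.
        return "cluster down"
    if leader_up:
        # All nodes up, or only a follower is down.
        return "no interruption"
    # Leader is down but both followers are up: a new leader is elected.
    # The data plane keeps running (SEs in headless mode) while the
    # control plane is unavailable for roughly 50-60 seconds.
    return "leader election: control plane unavailable, data plane unaffected"
```

For example, `describe_cluster_state(leader_up=False, followers_up=2)` corresponds to the leader-failure case, while `describe_cluster_state(leader_up=True, followers_up=0)` corresponds to loss of quorum.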