This section explains how high availability (HA) operates within an NSX Advanced Load Balancer Controller cluster.

Quorum

NSX Advanced Load Balancer Controller-level HA requires a quorum (a majority) of Controller nodes to be up. In a three-node Controller cluster, quorum is maintained as long as at least two of the three Controller nodes are up. If one Controller fails, the remaining two nodes continue to provide service and NSX Advanced Load Balancer keeps operating. However, if two of the three nodes go down, the entire cluster goes down and NSX Advanced Load Balancer stops working.
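The quorum rule can be expressed as a small calculation. The sketch below assumes a simple strict-majority rule (more than half of the nodes must be up), which matches the two-of-three behavior described above; the function names are hypothetical and are not part of any NSX Advanced Load Balancer API.

```python
def quorum_size(cluster_size: int) -> int:
    """Minimum number of nodes that must be up (assumed strict majority)."""
    return cluster_size // 2 + 1


def has_quorum(cluster_size: int, nodes_up: int) -> bool:
    """Return True if enough nodes are up for the cluster to keep operating."""
    return nodes_up >= quorum_size(cluster_size)


# Three-node cluster: check each possible number of surviving nodes.
for up in (3, 2, 1):
    print(f"{up} of 3 up -> quorum: {has_quorum(3, up)}")
```

For a three-node cluster this reports quorum with three or two nodes up and no quorum with only one, which is why losing a single Controller is tolerated but losing two is not.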

Failover

Each Controller node in a cluster periodically sends heartbeat messages to the other Controller nodes in the cluster through an encrypted SSH tunnel over TCP port 22 (port 5098 if the Controllers run as Docker containers).

The heartbeat interval is ten seconds, and at most four consecutive heartbeat messages can be missed. If one Controller does not hear from another Controller for 40 seconds (four missed heartbeats), it assumes that the other Controller is down.
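The detection rule above amounts to a timeout of interval × allowed misses (10 s × 4 = 40 s). The following is a minimal sketch of that rule, assuming each node simply records when it last heard from each peer; the `PeerMonitor` class and the peer names are hypothetical illustrations, not the Controller's actual implementation.

```python
HEARTBEAT_INTERVAL_S = 10   # heartbeat interval from the text
MAX_MISSED_HEARTBEATS = 4   # consecutive misses tolerated, from the text
DOWN_THRESHOLD_S = HEARTBEAT_INTERVAL_S * MAX_MISSED_HEARTBEATS  # 40 s


class PeerMonitor:
    """Hypothetical tracker of the last heartbeat received from each peer."""

    def __init__(self, peers: list[str], now: float) -> None:
        # Treat every peer as freshly seen when monitoring starts.
        self.last_seen = {peer: now for peer in peers}

    def record_heartbeat(self, peer: str, now: float) -> None:
        self.last_seen[peer] = now

    def peers_down(self, now: float) -> list[str]:
        # A peer silent for the full threshold (four missed heartbeats) is assumed down.
        return sorted(
            peer for peer, seen in self.last_seen.items()
            if now - seen >= DOWN_THRESHOLD_S
        )


monitor = PeerMonitor(["ctrl-2", "ctrl-3"], now=0)
monitor.record_heartbeat("ctrl-2", now=30)
print(monitor.peers_down(now=39))  # []
print(monitor.peers_down(now=40))  # ['ctrl-3']
```

Whether the real Controller uses `>=` or `>` at the 40-second boundary is an assumption here; the point is only that the timeout is the product of the interval and the number of tolerated misses.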

If only one node is down, quorum is maintained, and the cluster can continue to operate.

  • If a follower node goes down but the leader node remains up, access to virtual services continues without interruption.

  • If the leader (primary) node goes down, the remaining follower nodes form a new quorum and elect a new cluster leader. The election takes about 50 to 60 seconds. During this period there is no impact on the data plane: the Service Engines (SEs) continue to operate in headless mode, but the control plane is unavailable. Users cannot create a VIP through LBaaS or use the NSX Advanced Load Balancer UI, API, or CLI until a new leader is elected.
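The failover cases above can be summarized as a decision function. This sketch assumes the same strict-majority quorum rule as a three-node cluster and uses hypothetical node names (`ctrl-1` to `ctrl-3`) and a hypothetical `failover_outcome` function; it only classifies the situation and does not model the election itself.

```python
def failover_outcome(cluster: list[str], up: set[str], leader: str) -> str:
    """Classify a failure scenario according to the cases described above."""
    quorum = len(cluster) // 2 + 1          # assumed strict majority
    nodes_up = up & set(cluster)
    if len(nodes_up) < quorum:
        # Too few Controllers: the cluster goes down.
        return "cluster down"
    if leader in nodes_up:
        # Only followers failed: virtual services are not interrupted.
        return "no interruption"
    # Leader lost but quorum held: survivors elect a new leader (~50-60 s),
    # SEs run headless and the control plane is unavailable meanwhile.
    return "elect new leader"


nodes = ["ctrl-1", "ctrl-2", "ctrl-3"]
print(failover_outcome(nodes, {"ctrl-1", "ctrl-2"}, leader="ctrl-1"))  # no interruption
print(failover_outcome(nodes, {"ctrl-2", "ctrl-3"}, leader="ctrl-1"))  # elect new leader
print(failover_outcome(nodes, {"ctrl-3"}, leader="ctrl-1"))            # cluster down
```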