Elastic High Availability for NSX Advanced Load Balancer Service Engines

This section explains elastic HA for NSX Advanced Load Balancer Service Engines (SE).

High Availability Modes

NSX Advanced Load Balancer supports the following two modes:

Service Engine Elastic HA mode: This combines scale-out performance and high availability.
- N+M mode (the default mode)
- Active/Active
Legacy HA mode: This enables a smooth migration from legacy appliance-based load balancers.

Elastic HA N+M Mode

The N+M mode is the default mode of Elastic HA. In this mode, each virtual service is placed on only one SE.

The 'N' in N+M is the minimum number of SEs required to place the virtual services in the SE group. The NSX Advanced Load Balancer Controller performs this calculation based on the Virtual Services per Service Engine parameter. 'N' varies over time as virtual services are placed on or removed from the group. The maximum number of Service Engines is labeled 'E'.

The 'M' in N+M is the number of additional SEs that the NSX Advanced Load Balancer Controller spins up so that the SE group can tolerate 'M' SE failures without losing capacity. 'M' is set in the Buffer Service Engines field.

The minimum scale per virtual service is labeled as 'B' and the maximum scale per virtual service is labeled as 'C'.
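The definitions above reduce to simple arithmetic. The following Python sketch computes N as the number of placements divided by Virtual Services per Service Engine, rounded up, and then adds M. The function name `n_plus_m` is hypothetical, and capping the total at E is an assumption based on the maximum-SE limit; the Controller's actual calculation may take more inputs into account.

```python
import math


def n_plus_m(placements: int, vs_per_se: int, buffer_se: int, max_se: int) -> tuple[int, int]:
    """Return (N, target SE count) for an N+M SE group (illustrative model).

    N is the minimum number of SEs that can hold `placements` virtual
    services at `vs_per_se` each. The target adds M buffer SEs and, as an
    assumption of this sketch, is capped at E (`max_se`).
    """
    if vs_per_se <= 0:
        raise ValueError("vs_per_se must be positive")
    n = math.ceil(placements / vs_per_se)  # round up: a partial SE still needs a whole SE
    return n, min(n + buffer_se, max_se)


if __name__ == "__main__":
    print(n_plus_m(placements=24, vs_per_se=9, buffer_se=1, max_se=10))  # (3, 4)
```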

Note:

The buffer SE value in N+M mode is the number of SE failures that the system can tolerate while keeping the virtual services up and operational (placed on at least one SE), though not necessarily at the same capacity. If a minimum scale per virtual service is set in the SE group and an additional SE is required, increase the buffer SE value accordingly.

You can configure the N+M mode parameters by navigating to Infrastructure > Cloud Resources > Service Engine Group and either creating a new SE group or editing an existing one.

High Availability Mode options are available under the Placement tab.

Elastic HA N+M Mode Example

In the Prior to SE Failure image, there are twenty-four virtual service placements on an SE group.

With virtual services per SE set to 9, N is 3 (24/9 = 2.67, rounded up to 3).

With M = 1, a total of N+M = 3 + 1 = 4 SEs are required in the group.

Note that no single SE in the group is completely idle. The Controller places virtual services on all available SEs. In N+M mode, NSX Advanced Load Balancer ensures enough buffer capacity exists in aggregate to handle one (M=1) SE failure. In this example, each of the four SEs has six virtual services placed. A total of 12 spare slots are still available for additional virtual service placements, which is sufficient to handle one SE failure.
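The arithmetic of this example can be checked with a short Python sketch. It assumes the 24 placements are spread evenly (six per SE) and reads "sufficient to handle one SE failure" as: the free slots on the three surviving SEs can hold every virtual service from the failed SE. The variable names are illustrative only.

```python
import math

VS_PLACEMENTS = 24  # placements in the Prior to SE Failure example
VS_PER_SE = 9       # Virtual Services per Service Engine
BUFFER_SE = 1       # M

n = math.ceil(VS_PLACEMENTS / VS_PER_SE)             # 24 / 9 = 2.67, rounded up to 3
total_se = n + BUFFER_SE                              # 3 + 1 = 4
per_se_load = VS_PLACEMENTS // total_se               # even spread: 6 per SE
spare_slots = total_se * VS_PER_SE - VS_PLACEMENTS    # 36 - 24 = 12

# When one SE fails, its virtual services must fit into the survivors' free slots.
survivor_free = (total_se - 1) * (VS_PER_SE - per_se_load)  # 3 * 3 = 9
can_absorb_one_failure = survivor_free >= per_se_load         # 9 >= 6

print(n, total_se, per_se_load, spare_slots, survivor_free, can_absorb_one_failure)
```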

The New Placements Just After SE Failure image shows the SE group just after SE2 has failed. The six virtual services from SE2 have been placed onto spare slots on the surviving SEs, namely SE1, SE3, and SE4.
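A minimal sketch of this re-placement, assuming the Controller fills the least-loaded surviving SE first (the same preference this section describes for new placements). `replace_failed_se` is a hypothetical helper, not part of any NSX Advanced Load Balancer API.

```python
def replace_failed_se(loads: dict[str, int], failed: str, vs_per_se: int) -> dict[str, int]:
    """Move the failed SE's virtual services onto survivors, least-loaded first."""
    survivors = {se: n for se, n in loads.items() if se != failed}
    for _ in range(loads[failed]):
        # min() returns the first SE with the lowest load, so ties follow SE order
        target = min(survivors, key=lambda se: survivors[se])
        if survivors[target] >= vs_per_se:
            raise RuntimeError("no spare slot left on surviving SEs")
        survivors[target] += 1
    return survivors


before = {"SE1": 6, "SE2": 6, "SE3": 6, "SE4": 6}
after = replace_failed_se(before, failed="SE2", vs_per_se=9)
free = sum(9 - n for n in after.values())
print(after, free)  # {'SE1': 8, 'SE3': 8, 'SE4': 8} 3
```

Each survivor absorbs two of SE2's six virtual services, leaving three free slots in the group.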

Figure: SE group before and after the SE failure

The imbalance in loading disappears over time if either or both of the following happen:

New virtual services are placed on the group. As many as four virtual services can be placed without compromising the M=1 condition. They will be placed on SE5 because NSX Advanced Load Balancer chooses the least-loaded SE first.
The Auto-Rebalance option is selected.

With 'M' set to 1, the SE group is single-SE fault tolerant. Customers desiring multiple-SE fault tolerance can set 'M' higher. NSX Advanced Load Balancer permits 'M' to be dynamically increased by the administrator without interrupting any services. You can start with M=1 (typical of most N+M deployments), and increase it if the conditions warrant.

If an N+M group has scaled out to the maximum number of Service Engines and N × (virtual services per SE) virtual services are already placed, NSX Advanced Load Balancer still permits additional virtual service placements, which consume the spare capacity represented by 'M'.
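The behavior above can be sketched as a decision function. This is a hypothetical model: the name `placement_decision` and its return values are illustrative, and it assumes a write access cloud so that scaling out is possible. It is not the Controller's actual placement logic.

```python
import math


def placement_decision(placed: int, vs_per_se: int, se_count: int,
                       max_se: int, buffer_se: int) -> str:
    """Decide what happens when one more virtual service is placed (illustrative)."""
    needed = math.ceil((placed + 1) / vs_per_se) + buffer_se  # N + M after this placement
    if needed <= se_count:
        return "place"              # the M buffer stays intact
    if se_count < max_se:
        return "scale_out"          # add an SE first to keep M spare SEs
    if placed < se_count * vs_per_se:
        return "place_into_buffer"  # at E: consume the spare capacity reserved by M
    return "reject"                 # every slot in the group is in use


print(placement_decision(placed=27, vs_per_se=9, se_count=4, max_se=4, buffer_se=1))
```

With four SEs at the maximum and 27 placements (N = 3, M = 1), the 28th virtual service goes into the buffer instead of being refused.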

For a Write Access cloud, the Controller will attempt to recover the failed SE after five minutes by rebooting the virtual machine. After a further five minutes, the Controller will attempt to delete the failed SE virtual machine after which a new SE will be spun up to restore the configured buffer capacity.

After the six re-placements, only three slots remain. If NSX Advanced Load Balancer's orchestrator mode is set to write access, NSX Advanced Load Balancer spins up SE5 to restore the M=1 condition when new virtual services are placed in the SE group. SE5 provides nine new slots, and four new virtual services occupy four of them, as depicted in the Back to M=1 state image.

Figure: SE group with failed SE2 and the new replacement SE5 populated
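Continuing the example, a least-loaded-first placement sends all four new virtual services to the newly spun-up SE5. The helper `place_new` is hypothetical and assumes the least-loaded preference stated earlier in this section.

```python
def place_new(loads: dict[str, int], count: int, vs_per_se: int) -> list[str]:
    """Place `count` new virtual services, least-loaded SE first; return the chosen SEs."""
    chosen = []
    for _ in range(count):
        target = min(loads, key=lambda se: loads[se])  # first SE with the lowest load
        if loads[target] >= vs_per_se:
            raise RuntimeError("SE group is full")
        loads[target] += 1
        chosen.append(target)
    return chosen


loads = {"SE1": 8, "SE3": 8, "SE4": 8, "SE5": 0}  # SE5 has just been spun up
print(place_new(loads, count=4, vs_per_se=9))      # ['SE5', 'SE5', 'SE5', 'SE5']
```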

Note:

To provide time to identify the cause of a failure, the first SE that fails in an SE group is not automatically deleted even after five minutes. You can then troubleshoot the failed SE and delete the virtual machine manually if restoration is not possible. If you have not deleted it manually, the Controller deletes the SE virtual machine after three days.
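The recovery timeline for a write access cloud can be summarized as a function of the time since failure. This is a hypothetical model: the function and action names are illustrative, and for the first failed SE in a group it captures only the deletion behavior, because the text does not state whether a reboot or a replacement SE also occurs in that case.

```python
from datetime import timedelta

REBOOT_AFTER = timedelta(minutes=5)     # Controller attempts a reboot
DELETE_AFTER = timedelta(minutes=10)    # five more minutes, then delete and replace
FIRST_FAILURE_HOLD = timedelta(days=3)  # first failed SE is kept for troubleshooting


def recovery_action(elapsed: timedelta, first_failure_in_group: bool) -> str:
    """Illustrative model of the SE recovery timeline described above."""
    if first_failure_in_group:
        # Assumption: only the deletion timing is modeled for the first failed SE.
        return "delete_vm" if elapsed >= FIRST_FAILURE_HOLD else "keep_for_troubleshooting"
    if elapsed >= DELETE_AFTER:
        return "delete_and_replace"
    if elapsed >= REBOOT_AFTER:
        return "reboot_attempted"
    return "wait"


print(recovery_action(timedelta(minutes=7), first_failure_in_group=False))
```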

Elastic HA Active/Active

In active/active mode, NSX Advanced Load Balancer places each virtual service on more than one SE, as specified by the Minimum Scale per Virtual Service parameter (the default minimum is two). If an SE in the group fails:

Virtual services that had been running are not interrupted. They continue to run on other SEs with degraded capacity until they can be placed once again.
If NSX Advanced Load Balancer’s orchestrator mode is set to write access, a new SE is automatically deployed to bring the SE group back to its previous capacity. After waiting for the new SE to spin up, the Controller places on it the virtual services that had been running on the failed SE.
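A minimal sketch of active/active placement, assuming each virtual service is placed on `min_scale` distinct SEs chosen least-loaded first. The helper `place_active_active` is hypothetical. The loop at the end checks that, with two instances per virtual service, losing any single SE still leaves every virtual service running.

```python
def place_active_active(vs: str, min_scale: int, placements: dict[str, list[str]],
                        vs_per_se: int) -> list[str]:
    """Place `vs` on `min_scale` distinct SEs, least-loaded first (illustrative)."""
    candidates = [se for se, hosted in placements.items()
                  if len(hosted) < vs_per_se and vs not in hosted]
    if len(candidates) < min_scale:
        raise RuntimeError("not enough SEs with free slots")
    # sorted() is stable, so ties keep the group's SE order
    chosen = sorted(candidates, key=lambda se: len(placements[se]))[:min_scale]
    for se in chosen:
        placements[se].append(vs)
    return chosen


group = {"SE1": [], "SE2": [], "SE3": []}
print(place_active_active("VS1", 2, group, vs_per_se=3))  # ['SE1', 'SE2']
print(place_active_active("VS2", 2, group, vs_per_se=3))  # ['SE3', 'SE1']

# With two instances on distinct SEs, losing any one SE leaves every VS running.
for failed in group:
    running = {vs for se, hosted in group.items() if se != failed for vs in hosted}
    assert running == {"VS1", "VS2"}
```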

Elastic HA Active/Active Example

This section illustrates SE failure and full recovery. It depicts an SE group with the following specifications:

Virtual Services per Service Engine = 3 (label A in the UI)
Minimum Scale per Virtual Service = 2
Maximum Scale per Virtual Service = 4
Max Number of Service Engines = 6 (label E)

Over time, five virtual services (VS1 through VS5) are placed. VS3 is then scaled from its initial two placements to a third, illustrating NSX Advanced Load Balancer's support for 'N-way active' virtual services. This image depicts five virtual services placed on an active/active SE group.

When SE3 fails, one of the two VS2 instances and one of the three VS3 instances fail with it. The other three virtual services (VS1, VS4, VS5) are unaffected. Neither VS2 nor VS3 is interrupted, because their remaining instances were already placed on SE4, SE5, and SE6, and they continue to work with degraded performance. This image depicts a single SE failure in an active/active SE group.

The NSX Advanced Load Balancer Controller deploys SE7 as a replacement for SE3 and places VS2 and VS3 on it. This brings both virtual services up to their prior level of performance. This image depicts the recovery of a single SE in an active/active SE group.
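The failure and recovery sequence can be traced with a small Python model. The exact layout of VS1 through VS5 on SE1 through SE6 is not given in the text, so the layout below is hypothetical; it is chosen only to match the description, with SE3 hosting one instance each of VS2 and VS3 and their other instances on SE4, SE5, and SE6.

```python
# Hypothetical layout (at most 3 virtual services per SE, 6 SEs).
group = {
    "SE1": ["VS1", "VS4"],
    "SE2": ["VS1", "VS4"],
    "SE3": ["VS2", "VS3"],
    "SE4": ["VS2", "VS5"],
    "SE5": ["VS3", "VS5"],
    "SE6": ["VS3"],
}


def instance_counts(g: dict[str, list[str]]) -> dict[str, int]:
    """Count running instances of each virtual service across the SE group."""
    counts: dict[str, int] = {}
    for hosted in g.values():
        for vs in hosted:
            counts[vs] = counts.get(vs, 0) + 1
    return counts


before = instance_counts(group)
lost = group.pop("SE3")            # SE3 fails
degraded = instance_counts(group)  # VS2 and VS3 keep running on fewer SEs
group["SE7"] = list(lost)          # replacement SE7 receives the lost instances
after = instance_counts(group)

print(before["VS2"], degraded["VS2"], after["VS2"])  # 2 1 2
print(before["VS3"], degraded["VS3"], after["VS3"])  # 3 2 3
```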

Compact Placement

When Compact placement is enabled, NSX Advanced Load Balancer uses the minimum number of SEs required. When Distributed placement is enabled, NSX Advanced Load Balancer uses as many SEs as needed, up to the maximum number of Service Engines. By default, Compact placement is enabled for Elastic HA N+M (buffer) mode, and Distributed placement is enabled for Elastic HA Active/Active mode.

Compact Placement Example

This section describes the effect of compact placement on an Elastic HA N+M mode SE group where the maximum number of Service Engines is four. In both the compact placement and distributed placement examples, you can observe the following:

Eight virtual services are created in sequence.
After VS1 is placed, SE2 is deployed because M=1 (handles one SE failure).
When VS2 requires placement, NSX Advanced Load Balancer assigns it to the idle SE2 to make the best use of all the running SEs.

At this point, placement behavior diverges as follows:

Compact Placement ON: Subsequent placements of VS3 through VS8 do not require additional SEs to maintain HA (M=1, that is, one SE failure). With Compact placement ON, NSX Advanced Load Balancer prefers to place virtual services on existing SEs.

Distributed Placement ON: Subsequent placements of VS3 and VS4 result in scaling the SE group out to its maximum of four SEs, illustrating NSX Advanced Load Balancer's preference for performance at the expense of resource usage. After reaching four deployed SEs, the maximum for this group, NSX Advanced Load Balancer places VS5 through VS8 on the existing, least-loaded SEs. The Compact Placement ON and Compact Placement OFF images show the Elastic HA N+1 SE group after eight successive virtual service placements, with Compact placement ON and OFF respectively.
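The two placement behaviors can be compared with a hypothetical simulation of eight successive placements into an N+1 group with a maximum of four SEs. It assumes Virtual Services per Service Engine is 10 (an illustrative value, large enough for compact placement to fit all eight on two SEs), that the group always keeps N+M SEs deployed, and that distributed placement adds a fresh SE whenever no idle SE exists and the maximum has not been reached. The function `simulate` is illustrative, not the Controller's algorithm.

```python
import math


def simulate(num_vs: int, vs_per_se: int, buffer_se: int, max_se: int,
             compact: bool) -> list[list[str]]:
    """Place VS1..VSn one at a time in an N+M SE group (illustrative model)."""
    ses: list[list[str]] = []
    for i in range(1, num_vs + 1):
        vs = f"VS{i}"
        placed = sum(len(se) for se in ses) + 1
        # Keep N + M SEs deployed for the new total, capped at max_se.
        needed = min(math.ceil(placed / vs_per_se) + buffer_se, max_se)
        while len(ses) < needed:
            ses.append([])
        idle = [se for se in ses if not se]
        if not compact and not idle and len(ses) < max_se:
            ses.append([])  # distributed: prefer a fresh SE while below the maximum
        # Place on the least-loaded SE with a free slot (first one on ties).
        target = min((se for se in ses if len(se) < vs_per_se), key=len)
        target.append(vs)
    return ses


for mode in (True, False):
    result = simulate(8, vs_per_se=10, buffer_se=1, max_se=4, compact=mode)
    print("compact" if mode else "distributed", [len(se) for se in result])
# compact [4, 4]
# distributed [2, 2, 2, 2]
```

In both modes VS1 triggers the buffer SE and VS2 lands on the idle SE2; the modes diverge from VS3 onward, as described above.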

Interaction of Compact Placement with Elastic HA Modes

Compact placement interacts subtly with the Elastic HA modes with respect to timing.

Elastic HA N+M mode: Because compact placement is ON by default in N+M mode, the NSX Advanced Load Balancer Controller immediately packs the virtual services from a failed SE densely onto the existing SEs and defers deploying additional spare capacity.

Elastic HA active/active mode: Because distributed placement is ON by default in active/active mode, the NSX Advanced Load Balancer Controller delays placing VS2 and VS3 until the replacement SE7 spins up. No additional load is placed on the four surviving SEs (SE1, SE2, SE4, SE5). Instead, both virtual services are placed on a fresh SE so that all the virtual services perform as they did before the failure.

Auto-Rebalance

The Auto-Rebalance option applies only to the Elastic HA modes and is deactivated by default. When Auto-Rebalance is not enabled, an event is logged instead of migrations being performed automatically. To enable Auto-Rebalance, see How To Configure Auto-rebalance Using NSX Advanced Load Balancer CLI.