Avi Load Balancer adds load balancing capacity for a virtual service by placing the virtual service on additional Service Engines (SEs).

For instance, when a virtual service needs more capacity, you can scale it out to additional SEs within the SE group, and then scale it in (remove the additional SEs) when they are no longer needed. In this case, the primary SE for the virtual service coordinates the distribution of the virtual service traffic among the other SEs, while also continuing to process some of the virtual service’s traffic.

An alternative method for scaling a virtual service is to combine a Border Gateway Protocol (BGP) feature, route health injection (RHI), with a layer 3 routing feature, equal-cost multi-path (ECMP). Using RHI with ECMP for virtual service scaling avoids the management overhead placed on the primary SE to coordinate the scaled-out traffic among the SEs.

BGP is supported in legacy (active/standby) and elastic (active/active and N+M) high availability modes.

If a virtual service is marked down by its health monitor or for any other reason, the Avi Load Balancer SE withdraws the route advertisement for its virtual IP (VIP) and restores it only when the virtual service is marked up again.
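The withdraw-and-restore behavior can be pictured as a small state transition. The following is a minimal sketch only, not Avi Load Balancer code; the function name `rhi_action` and the action labels are hypothetical.

```python
from typing import Optional, Tuple


def rhi_action(was_up: bool, is_up: bool, vip_route: str) -> Optional[Tuple[str, str]]:
    """Return the route update implied by a virtual service state change.

    Hypothetical helper: a transition to down withdraws the VIP route,
    a transition back to up advertises it again, and no change in state
    produces no update.
    """
    if was_up and not is_up:
        return ("withdraw", vip_route)
    if not was_up and is_up:
        return ("advertise", vip_route)
    return None


# A health monitor marks the virtual service down, then up again.
print(rhi_action(True, False, "10.10.10.5/32"))  # ('withdraw', '10.10.10.5/32')
print(rhi_action(False, True, "10.10.10.5/32"))  # ('advertise', '10.10.10.5/32')
```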

Notes on Limits

Service Engine Count:

The default value of max_scale_per_vs in a Service Engine group is four SEs per virtual service. It can be configured up to 64 SEs.

Each SE uses RHI to advertise a host route (/32 for an IPv4 VIP, /128 for an IPv6 VIP) to the virtual service’s VIP address, and each SE can accept the traffic. The upstream router uses ECMP to select a path to one of the SEs.

The limit on SE count is imposed by the ECMP support on the upstream router. If the router supports up to 64 equal-cost routes, then a virtual service enabled for RHI can be supported on up to 64 SEs. Similarly, if the router supports fewer paths, then the number of SEs that can serve an RHI-enabled virtual service will be lower.
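The interaction between the two limits can be sketched as a simple minimum. This is an illustration under stated assumptions, not Avi Load Balancer code: the function name `effective_se_count`, the parameter `router_ecmp_paths`, and the lower bound of 1 are hypothetical; only `max_scale_per_vs` and the values 4 and 64 come from the text above.

```python
def effective_se_count(max_scale_per_vs: int = 4, router_ecmp_paths: int = 64) -> int:
    """Estimate how many SEs can receive traffic for one RHI-enabled virtual service.

    The SE group setting caps how far the virtual service scales out, and
    the upstream router's ECMP path limit caps how many of those SEs the
    router can actually spread traffic across.
    """
    if not 1 <= max_scale_per_vs <= 64:  # lower bound of 1 is an assumption
        raise ValueError("max_scale_per_vs must be between 1 and 64")
    if router_ecmp_paths < 1:
        raise ValueError("router_ecmp_paths must be at least 1")
    return min(max_scale_per_vs, router_ecmp_paths)


print(effective_se_count())        # 4  (default SE group setting)
print(effective_se_count(64, 16))  # 16 (router supports only 16 equal-cost paths)
```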

Subnets and Peers:

When advertising a VIP over BGP, Avi Load Balancer supports advertising the VIP through four different interfaces. Avi Load Balancer supports the configuration of 64 peers in a VRF. Since a BGP peer is configured for an interface subnet, the 64 peers can be distributed in any way among the interfaces. To illustrate:

  • A VIP can be advertised to 64 peers, all belonging to a single subnet.

  • A VIP can be advertised to 16 peers on each of the four interfaces.
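The two limits above can be checked against a planned peer layout with a short script. This is a sketch only; the function name `check_peer_layout` and the interface names are hypothetical, while the limits of 64 peers per VRF and four interfaces come from the text above.

```python
from collections import Counter
from typing import Dict, List

MAX_PEERS_PER_VRF = 64          # from the text above
MAX_ADVERTISING_INTERFACES = 4  # from the text above


def check_peer_layout(peer_interfaces: List[str]) -> Dict[str, int]:
    """Validate a peer layout and return the number of peers per interface.

    peer_interfaces holds one interface name per configured BGP peer.
    """
    per_interface = Counter(peer_interfaces)
    if len(peer_interfaces) > MAX_PEERS_PER_VRF:
        raise ValueError("more than 64 peers configured in the VRF")
    if len(per_interface) > MAX_ADVERTISING_INTERFACES:
        raise ValueError("VIP advertised through more than four interfaces")
    return dict(per_interface)


# All 64 peers in a single subnet, reached through one interface.
print(check_peer_layout(["eth1"] * 64))

# 16 peers on each of four interfaces.
print(check_peer_layout(["eth1"] * 16 + ["eth2"] * 16 + ["eth3"] * 16 + ["eth4"] * 16))
```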

BGP-based Scaling

Avi Load Balancer supports the use of the following routing features to dynamically perform virtual service load balancing and scaling:

Route health injection (RHI):

RHI allows traffic to reach a VIP that is not in the same subnet as its SE. The Avi Load Balancer Service Engine (SE) where a virtual service is located advertises a host route to the VIP for that virtual service, with the SE’s IP address as the next-hop router address. Based on this update, the BGP peer connected to the Avi Load Balancer SE updates its route table to use the Avi Load Balancer SE as the next hop for reaching the VIP. The peer BGP router also advertises itself to its upstream BGP peers as a next hop for reaching the VIP.

Note:

In a parent-child virtual service deployment, the child virtual service is associated with the parent VIP. Enabling or disabling Advertise VIP through BGP (the enable_rhi key) on the parent VIP takes effect, but toggling the enable_rhi key on a child virtual service has no effect. For a child virtual service, the enable_rhi key is a no-op.

Equal cost multi-path (ECMP):

If the virtual service is scaled out to multiple Avi Load Balancer SEs, each SE advertises the VIP on each of its links to the peer BGP router. The BGP peer router sees multiple next-hop paths to the virtual service's VIP and uses ECMP to balance traffic across the paths.
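One way to picture ECMP path selection is a hash over the flow's 5-tuple, so that packets of the same flow keep reaching the same SE. The sketch below is illustrative only: real routers use vendor-specific hash functions and fields, and the function name `pick_next_hop` and all addresses are hypothetical.

```python
import hashlib
from typing import Sequence, Tuple

Flow = Tuple[str, int, str, int, str]  # src IP, src port, dst IP, dst port, protocol


def pick_next_hop(flow: Flow, next_hops: Sequence[str]) -> str:
    """Map a flow to one of the equal-cost next hops (the SEs).

    Hashing the whole 5-tuple keeps every packet of a flow on one path
    while spreading different flows across the available paths.
    """
    key = "|".join(str(part) for part in flow).encode()
    digest = hashlib.sha256(key).digest()
    index = int.from_bytes(digest[:4], "big") % len(next_hops)
    return next_hops[index]


se_next_hops = ["10.0.0.11", "10.0.0.12", "10.0.0.13"]
flow = ("192.0.2.10", 40000, "10.10.10.5", 443, "tcp")
print(pick_next_hop(flow, se_next_hops))
```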

When a virtual service enabled for BGP is placed on an Avi Load Balancer SE, the SE performs RHI for the virtual service’s VIP by advertising a host route (a /32 network mask for an IPv4 VIP and /128 for an IPv6 VIP) over BGP sessions to the peers. The Avi Load Balancer SE sends the advertisement as a BGP route update to each of its BGP peers. When a BGP peer receives this update from the Avi Load Balancer SE, the peer updates its route table with a route to the VIP that uses the SE as the next hop. Typically, the BGP peer also advertises the VIP route to its other BGP peers.
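The host route prefix follows directly from the address family of the VIP. A minimal sketch using the Python standard library is shown here; the function name `host_route` and the example addresses are hypothetical.

```python
import ipaddress


def host_route(vip: str) -> str:
    """Return the host route for a VIP: /32 for IPv4, /128 for IPv6."""
    addr = ipaddress.ip_address(vip)
    prefix_len = 32 if addr.version == 4 else 128
    return f"{addr}/{prefix_len}"


print(host_route("10.10.10.5"))   # 10.10.10.5/32
print(host_route("2001:db8::5"))  # 2001:db8::5/128
```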

The BGP peer IP addresses, the local Autonomous System (AS) number, and a few other settings are specified in a BGP profile on the Avi Load Balancer Controller. RHI support is enabled or deactivated (the default) within the individual virtual service’s configuration. If an Avi Load Balancer SE has more than one link to the same BGP peer, this also enables ECMP support for the VIP. The Avi Load Balancer SE advertises a separate host route to the VIP on each of its interfaces with the BGP peer.

If the Avi Load Balancer SE fails, the BGP peers withdraw the routes that were advertised to them by the Avi Load Balancer SE.

Modifying BGP Profile

BGP peer changes are handled as follows:

  • If a new peer is added to the BGP profile, the virtual service IP is advertised to the new BGP peer router without needing to deactivate and re-enable the virtual service.

  • If a BGP peer is deleted from the BGP profile, any virtual service IPs that had been advertised to the BGP peer will be withdrawn.

  • When a BGP peer IP is updated, it is handled as an add/delete of the BGP peer.
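The three cases above reduce to a set difference between the old and new peer lists. This is a minimal sketch of that reasoning, not the Controller's implementation; the function name `diff_peers` and the result keys are hypothetical.

```python
from typing import Dict, List, Set


def diff_peers(old_peers: Set[str], new_peers: Set[str]) -> Dict[str, List[str]]:
    """Work out which peers need VIP advertisements and which need withdrawals.

    An updated peer IP shows up as one removed address plus one added
    address, which matches the add/delete handling described above.
    """
    return {
        "advertise_to": sorted(new_peers - old_peers),
        "withdraw_from": sorted(old_peers - new_peers),
    }


before = {"10.0.0.1", "10.0.0.2"}
after = {"10.0.0.1", "10.0.0.3"}  # peer 10.0.0.2 changed to 10.0.0.3
print(diff_peers(before, after))
# {'advertise_to': ['10.0.0.3'], 'withdraw_from': ['10.0.0.2']}
```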

Configuring BGP Upstream Router

The BGP control plane can consume significant CPU on the router in large-scale setups. Changes to the control plane policing (CoPP) policy are needed to admit more BGP packets on the router. Otherwise, BGP packets can be dropped on the router when churn happens.

Note:

The ECMP route groups or ECMP next-hop groups on the router can be exhausted if unique sets of SE BGP next-hops are advertised for different sets of virtual service VIPs. When such exhaustion happens, the router may fall back to a single SE next-hop, causing traffic issues.
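To estimate how close a deployment is to this limit, count the distinct sets of SE next-hops across all advertised VIPs. The sketch below assumes the router shares one ECMP group among prefixes that have an identical next-hop set; the function names, the addresses, and the `router_group_limit` parameter are hypothetical.

```python
from typing import Dict, Iterable


def count_ecmp_groups(vip_next_hops: Dict[str, Iterable[str]]) -> int:
    """Count the distinct next-hop sets used across all VIP host routes."""
    unique_sets = {frozenset(hops) for hops in vip_next_hops.values()}
    return len(unique_sets)


def exceeds_router_limit(vip_next_hops: Dict[str, Iterable[str]], router_group_limit: int) -> bool:
    """Report whether the placements need more ECMP groups than the router has."""
    return count_ecmp_groups(vip_next_hops) > router_group_limit


placements = {
    "10.10.10.5/32": ["10.0.0.11", "10.0.0.12"],
    "10.10.10.6/32": ["10.0.0.12", "10.0.0.11"],  # same SE set, so the same group
    "10.10.10.7/32": ["10.0.0.12", "10.0.0.13"],  # different SE set, new group
}
print(count_ecmp_groups(placements))  # 2
```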

Sample Configuration

The following is a sample configuration on a Dell S4048 switch for adding 5k network entries and 20k paths:

w1g27-avi-s4048-1#show ip protocol-queue-mapping
 Protocol   Src-Port   Dst-Port   TcpFlag  Queue   EgPort     Rate (kbps)
 --------   --------   --------   -------  -----   ------     -----------
TCP (BGP)     any/179    179/any    _        Q9      _           10000
UDP (DHCP)    67/68      68/67      _        Q10     _           _
UDP (DHCP-R)  67         67         _        Q10     _           _
TCP (FTP)     any        21         _        Q6      _           _
ICMP          any        any        _        Q6      _           _
IGMP          any        any        _        Q11     _           _
TCP (MSDP)    any/639    639/any    _        Q11     _           _
UDP (NTP)     any        123        _        Q6      _           _
OSPF          any        any        _        Q9      _           _
PIM           any        any        _        Q11     _           _
UDP (RIP)     any        520        _        Q9      _           _
TCP (SSH)     any        22         _        Q6      _           _
TCP (TELNET)  any        23         _        Q6      _           _
VRRP          any        any        _        Q10     _           _
MCAST         any        any        _        Q2      _           _
w1g27-avi-s4048-1#show cpu-queue rate cp
 Service-Queue         Rate (PPS)      Burst (Packets)
 --------------        -----------      ----------
Q0                        600             512
Q1                        1000            50
Q2                        300             50
Q3                        1300            50
Q4                        2000            50
Q5                        400             50
Q6                        400             50
Q7                        400             50
Q8                        600             50
Q9                        30000           40000
Q10                       600             50
Q11                       300             50

Bidirectional Forwarding Detection (BFD)

BFD is supported for the fast detection of failed links. BFD enables networking peers on each end of a link to quickly detect and recover from a link failure. Typically, BFD detects a broken link faster than BGP alone would detect that the link is down, so recovery starts sooner.

For instance, if an Avi Load Balancer SE fails, BFD on the BGP peer router can quickly detect and correct the link failure.

Note:

The BFD feature supports BGP multi-hop implementation.
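The speed advantage of BFD comes from its detection time, which is the transmit interval multiplied by the detect multiplier. The following back-of-the-envelope sketch uses illustrative values only: the 300 ms interval, the multiplier of 3, and the 180-second BGP hold time are assumptions, not Avi Load Balancer defaults.

```python
def bfd_detection_time_ms(interval_ms: int, multiplier: int) -> int:
    """Time after which a peer is declared down when BFD packets stop arriving."""
    return interval_ms * multiplier


bfd_ms = bfd_detection_time_ms(300, 3)  # assumed BFD timers
bgp_hold_ms = 180 * 1000                # assumed BGP hold time

print(bfd_ms)                 # 900
print(bgp_hold_ms // bfd_ms)  # 200, BFD detects the failure 200 times sooner
```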

Message Digest 5 (MD5) Authentication

BGP supports an authentication mechanism using the Message Digest 5 (MD5) algorithm. When authentication is enabled, any TCP segment belonging to BGP that is exchanged between the peers is verified and accepted only if authentication is successful. For authentication to be successful, both peers must be configured with the same password. If authentication fails, the BGP peer session will not be established. BGP authentication is useful because it makes it difficult for a malicious user to disrupt network routing tables.
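The reason both peers need the same password can be shown with a simplified digest check. This is a conceptual sketch only: the actual TCP MD5 signature option also covers the TCP pseudo-header and the TCP header, and the function names and sample values here are hypothetical.

```python
import hashlib
import hmac


def sign_segment(segment: bytes, secret: str) -> bytes:
    """Compute an MD5 digest over the segment data followed by the shared secret."""
    return hashlib.md5(segment + secret.encode()).digest()


def accept_segment(segment: bytes, received_digest: bytes, local_secret: str) -> bool:
    """Accept the segment only if the receiver computes the same digest."""
    expected = sign_segment(segment, local_secret)
    return hmac.compare_digest(expected, received_digest)


segment = b"example BGP segment data"
digest = sign_segment(segment, "shared-password")

print(accept_segment(segment, digest, "shared-password"))  # True
print(accept_segment(segment, digest, "other-password"))   # False
```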

Enabling MD5 Authentication for BGP

To enable MD5 authentication, specify md5_secret in the respective BGP peer configuration. MD5 support extends to OpenShift clouds, where the Service Engine runs as a Docker container but peers with other routers while masquerading as the host.

Enabling BGP Features in Avi Load Balancer

Configuration of BGP features in Avi Load Balancer is accomplished by configuring a BGP profile, and by enabling RHI in the virtual service’s configuration.

  • Configure a BGP profile. The BGP profile specifies the local Autonomous System (AS) ID that the Avi Load Balancer SE and each of the peer BGP routers are in, and the IP address of each peer BGP router.

  • Enable the Advertise VIP using the BGP option on the Advanced tab of the virtual service’s configuration. This option advertises a host route to the VIP address, with the Avi Load Balancer SE as the next hop.
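The relationship between the two configuration steps can be modeled as follows. This is a sketch under stated assumptions, not the Avi Load Balancer object model: the class names and the fields `local_as`, `peers`, `peer_ip`, `name`, and `vip` are hypothetical, while `md5_secret` and `enable_rhi` are the keys named in this document.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class BgpPeer:
    peer_ip: str                      # hypothetical field name
    md5_secret: Optional[str] = None  # key named in this document


@dataclass
class BgpProfile:
    local_as: int                     # hypothetical field name
    peers: List[BgpPeer] = field(default_factory=list)


@dataclass
class VirtualService:
    name: str
    vip: str
    enable_rhi: bool = False          # RHI is deactivated by default


def advertisements(profile: BgpProfile, vs: VirtualService) -> List[Tuple[str, str]]:
    """List (peer, VIP) pairs that would receive a host route advertisement."""
    if not vs.enable_rhi:
        return []
    return [(peer.peer_ip, vs.vip) for peer in profile.peers]


profile = BgpProfile(local_as=65000, peers=[BgpPeer("10.0.0.1"), BgpPeer("10.0.0.2")])
vs = VirtualService(name="vs-web", vip="10.10.10.5", enable_rhi=True)
print(advertisements(profile, vs))
# [('10.0.0.1', '10.10.10.5'), ('10.0.0.2', '10.10.10.5')]
```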

Note:

When BGP is configured on the global VRF on LSC in-band, the BGP configuration is applied on an SE only when a virtual service is configured on that SE. Until then, peering between the SE and the peer router does not happen.