For a site deployed in a High Availability (HA) topology that also uses BGP, an HA failover can be both slow and disruptive to customer traffic because the peer Edges delete all routes when the failover occurs. In Release 5.1.0 and later, VMware adds the BGP Graceful Restart feature for HA deployments, which ensures faster and less disruptive HA failovers.

Overview

BGP Graceful Restart with Graceful Switchover ensures faster Edge restarts and HA failovers by having the neighboring BGP devices participate in the restart, so that no route changes occur in the network for the duration of the restart. Without BGP Graceful Restart, the peer Edge deletes all routes once the TCP session between the BGP peers terminates, and these routes must be rebuilt after the Edge restart or HA failover. BGP Graceful Restart changes this behavior by ensuring that peer Edges retain routes as long as a new session is established within a configurable restart time.
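The difference in peer behavior described above can be illustrated with a small Python model. This is a sketch only: the class and method names are hypothetical, it does not represent Edge or Orchestrator code, and it assumes that a session re-established exactly at the restart time still counts as being in time.

```python
from dataclasses import dataclass, field


@dataclass
class PeerRouteTable:
    """Toy model (hypothetical) of the routes a BGP peer learned from an Edge."""
    graceful_restart: bool
    restart_time_s: int = 120  # documented default Restart Time, in seconds
    routes: set = field(default_factory=set)

    def on_session_down(self, reestablished_after_s: float) -> set:
        """Return the routes the peer still holds after the BGP session drops.

        reestablished_after_s is the time until a new session comes up.
        """
        if self.graceful_restart and reestablished_after_s <= self.restart_time_s:
            # Routes are retained while the peer waits for the new session.
            return set(self.routes)
        # Without Graceful Restart, or once the timer expires, routes are deleted
        # and must be rebuilt after the restart or failover.
        self.routes.clear()
        return set()


learned = {"10.0.1.0/24", "10.0.2.0/24"}
without_gr = PeerRouteTable(graceful_restart=False, routes=set(learned))
with_gr = PeerRouteTable(graceful_restart=True, routes=set(learned))
print(without_gr.on_session_down(5))      # routes deleted
print(sorted(with_gr.on_session_down(5)))  # routes retained
```

In this model, the only thing that changes between the two peers is whether the routes survive the gap between the old session ending and the new one starting.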

Note: BGP Graceful Restart is available only for sites deployed in a High Availability topology. This feature is not yet available for sites deployed with a single, standalone Edge, even if the Edge uses the BGP routing protocol.

Prerequisites

To use the BGP Graceful Restart feature, a customer site must meet the following requirements:
  • The site must be deployed with a High Availability topology. This can be either Active/Standby or VRRP with a third-party router. BGP Graceful Restart has no effect on a standalone Edge site, only on sites using HA.
  • The customer enterprise must have BGP configured as the routing protocol.
Important: To get the full benefit of BGP Graceful Restart, it is strongly recommended that Distributed Cost Calculation (DCC) is also activated for the customer enterprise. With DCC activated, preference and advertisement decisions are local to the Edge, and the Edge synchronizes routes from Active to Standby as soon as it learns them from the routing process. DCC's value is not limited to HA sites. For more information on this feature, see VMware SD-WAN Routing Overview and Configure Distributed Cost Calculation.

Configuring BGP Graceful Restart

Configuring BGP Graceful Restart is a two-part process: the first part is done in the BGP configuration section, and the second part in the High Availability configuration section. The steps are:
  1. Activate BGP Graceful Restart on Configure > Device > BGP.
    1. In the Customer portal, click either Configure > Profile or Configure > Edges, depending on your preference. The steps below are shown for a single HA Edge.
    2. Click the Device icon next to an Edge, or click the link to the Edge, and then click the Device tab.
    3. Scroll down to the Routing & NAT section and open up the BGP section for the Edge or Profile.

    4. In the BGP section, check the box for Graceful Restart.

    5. Once the box is checked, two additional parameters related to Graceful Restart appear: Restart Time and Stalepath Time.
      1. Restart Time represents the maximum time the route processor (RP) waits for the RP peer to begin talking before expiring route entries. The default time for this parameter is 120 seconds, and it can be manually configured within a range of 1 to 600 seconds.
      2. Stalepath Time represents the maximum time routes are retained after a restart (HA failover). Updated routes from a route processor peer are expected to have been received by this time. The default time for this parameter is 300 seconds, and it can be manually configured within a range of 1 to 3600 seconds.
    6. Once you have activated BGP Graceful Restart and are satisfied with the two secondary settings, move to the High Availability section.
  2. Activate Graceful Switchover on Configure > Device > High Availability.
    1. From the BGP section, scroll down to the High Availability section.

    2. In the High Availability section, the Graceful Switchover check box is now available because BGP Graceful Restart has been activated.
    3. Check the box for Graceful Switchover.
    4. Nothing further is required in the High Availability section and there are no secondary parameters for Graceful Switchover.
  3. Scroll down to the bottom of the Configure > Device page and click Save Changes in the bottom right corner. This applies the configuration changes made above.
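
The constraints in the steps above (the documented ranges and defaults for the two timers, and the dependency of Graceful Switchover on BGP Graceful Restart) can be summarized as a pre-check before applying a planned configuration. The following Python sketch is illustrative only: the HaBgpSettings class and validate function are hypothetical names and are not part of any Orchestrator API.

```python
from dataclasses import dataclass

RESTART_TIME_RANGE = (1, 600)     # seconds, documented Restart Time range
STALEPATH_TIME_RANGE = (1, 3600)  # seconds, documented Stalepath Time range


@dataclass
class HaBgpSettings:
    """Hypothetical container for the settings on the Configure > Device page."""
    graceful_restart: bool = False
    restart_time_s: int = 120    # documented default
    stalepath_time_s: int = 300  # documented default
    graceful_switchover: bool = False


def validate(settings: HaBgpSettings) -> list[str]:
    """Return a list of problems; an empty list means the settings are consistent."""
    problems = []
    # Graceful Switchover only becomes available once Graceful Restart is activated.
    if settings.graceful_switchover and not settings.graceful_restart:
        problems.append("Graceful Switchover requires BGP Graceful Restart")
    # The timers only apply when Graceful Restart is activated.
    if settings.graceful_restart:
        low, high = RESTART_TIME_RANGE
        if not low <= settings.restart_time_s <= high:
            problems.append(f"Restart Time must be {low}-{high} seconds")
        low, high = STALEPATH_TIME_RANGE
        if not low <= settings.stalepath_time_s <= high:
            problems.append(f"Stalepath Time must be {low}-{high} seconds")
    return problems


planned = HaBgpSettings(graceful_restart=True, graceful_switchover=True)
print(validate(planned))  # no problems with the default timers
```

A check like this can be useful when settings are recorded in a change plan before being entered in the UI, since the UI itself only exposes Graceful Switchover after BGP Graceful Restart is activated.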

Limitations/Known Behaviors

  • BGP Graceful Restart and HA Graceful Switchover are segment agnostic: when activated on one segment (for example, the Global Segment), these settings are applied to all other segments on a customer site. This means that the Edge will also synchronize routes on the other segments and hold stale routes on them during an HA failover.
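
The segment-agnostic behavior can be pictured with a short Python sketch. The function name and the "Guest" segment are hypothetical and used only for illustration; this models the documented behavior and is not how the Edge implements it.

```python
def effective_segments(configured: dict[str, bool], all_segments: list[str]) -> dict[str, bool]:
    """Return the per-segment state that is actually in effect on the site.

    If the feature is activated on any one segment, it applies to every segment.
    """
    active_anywhere = any(configured.values())
    return {segment: active_anywhere for segment in all_segments}


# Activated only on the Global Segment, but in effect on both segments.
print(effective_segments({"Global Segment": True}, ["Global Segment", "Guest"]))
```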