For a site deployed in a High Availability (HA) topology where BGP is also used, an HA failover can be both slow and disruptive to customer traffic because the peer Edges delete all routes on a failover. In Release 5.1.0 and later, VMware adds the BGP Graceful Restart feature for HA deployments, which ensures faster and less disruptive HA failovers.
Overview
BGP Graceful Restart with Graceful Switchover ensures faster Edge restarts and HA failovers by having the neighboring BGP devices participate in the restart, so that no route changes occur in the network for the duration of the restart. Without BGP Graceful Restart, the peer Edge deletes all routes once the TCP session between the BGP peers terminates, and these routes must be rebuilt after the Edge restart or HA failover. BGP Graceful Restart changes this behavior by ensuring that peer Edges retain routes as long as a new session is established within a configurable restart time.
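The difference in peer behavior can be illustrated with a minimal Python sketch. This is a toy model, not Edge or Orchestrator code: the function name `routes_retained_by_peer`, its parameters, and the example prefixes are hypothetical, and treating a session that returns exactly at the timer limit as "within" the timer is an assumption.

```python
def routes_retained_by_peer(routes, graceful_restart, outage_seconds,
                            restart_time_seconds=120):
    """Return the routes a peer still holds when the new BGP session comes up.

    Toy model only: it ignores stale-path handling and the real BGP
    state machine. All names here are hypothetical.
    """
    if not graceful_restart:
        # Without Graceful Restart the peer deletes all routes as soon as
        # the TCP session terminates, so nothing survives the outage.
        return set()
    if outage_seconds <= restart_time_seconds:
        # New session established within the restart timer: routes are kept.
        return set(routes)
    # Restart timer expired before the new session: routes are dropped.
    return set()


learned = {"10.0.1.0/24", "10.0.2.0/24"}  # illustrative prefixes only

# Same 5-second outage, with and without Graceful Restart.
print(routes_retained_by_peer(learned, graceful_restart=False, outage_seconds=5))
print(sorted(routes_retained_by_peer(learned, graceful_restart=True, outage_seconds=5)))
```

In this model the only thing that changes between the two calls is whether the peer participates in the restart, which is the behavior the feature relies on.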
Prerequisites
- A site deployed with a High Availability topology. This can be either Active/Standby or VRRP with a third-party router. BGP Graceful Restart has no effect on a standalone Edge site; it applies only to sites using HA.
- The customer enterprise must have BGP configured as the routing protocol.
Configuring BGP Graceful Restart
- Activate BGP Graceful Restart.
- In the Customer portal, open the configuration for either an Edge or a Profile, depending on your preference. The screenshots show the steps for a single HA Edge.
- Click the Device icon next to an Edge, or click the link to the Edge, and then click the Device tab.
- Scroll down to the Routing & NAT section and open up the BGP section for the Edge or Profile.
- In the BGP section check the box for Graceful Restart.
- Once the box is checked, two additional parameters appear related to Enable Graceful Restart: Restart Time, and Stalepath Time:
- Restart Time represents the maximum time the route processor (RP) waits for the RP peer to begin talking before expiring route entries. The default value is 120 seconds, and it can be manually configured within a range of 1 to 600 seconds.
- Stalepath Time represents the maximum time routes are retained after a restart (HA failover). Updated routes from a route processor peer are expected to have been received by this time. The default time for this parameter is 300 seconds and can be manually configured within a range of 1 to 3600 seconds.
- Once BGP Graceful Restart is activated and the two secondary settings are set as desired, move to the High Availability section.
- Activate Graceful Switchover.
- From the BGP section, scroll down to the High Availability section.
- In the High Availability section the option to check the box for Graceful Switchover is now available as a result of BGP Graceful Restart being activated.
- Check the box for Graceful Switchover.
- Nothing further is required in the High Availability section and there are no secondary parameters for Graceful Switchover.
- Scroll down to the bottom of the page and click Save Changes in the bottom right corner. This applies the configuration changes made above.
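The constraints in the steps above (the Restart Time and Stalepath Time ranges and defaults, and Graceful Switchover becoming available only after BGP Graceful Restart is activated) can be summarized in a short Python sketch. The class `HaBgpSettings` and the function `validate` are hypothetical names used for illustration only; they are not an Orchestrator API, and the actual UI may enforce these rules differently.

```python
from dataclasses import dataclass
from typing import List

# Ranges as stated in the configuration steps (values in seconds).
RESTART_TIME_RANGE = (1, 600)
STALEPATH_TIME_RANGE = (1, 3600)


@dataclass(frozen=True)
class HaBgpSettings:
    """Hypothetical container for the settings; not an Orchestrator object."""
    graceful_restart: bool = False
    restart_time: int = 120      # documented default
    stalepath_time: int = 300    # documented default
    graceful_switchover: bool = False


def validate(settings: HaBgpSettings) -> List[str]:
    """Return a list of problems; an empty list means the settings are consistent."""
    errors = []
    if settings.graceful_restart:
        # The two timers only appear once Graceful Restart is checked.
        low, high = RESTART_TIME_RANGE
        if not low <= settings.restart_time <= high:
            errors.append(f"Restart Time must be {low} to {high} seconds")
        low, high = STALEPATH_TIME_RANGE
        if not low <= settings.stalepath_time <= high:
            errors.append(f"Stalepath Time must be {low} to {high} seconds")
    if settings.graceful_switchover and not settings.graceful_restart:
        # Graceful Switchover is only offered after Graceful Restart is on.
        errors.append("Graceful Switchover requires BGP Graceful Restart")
    return errors


print(validate(HaBgpSettings(graceful_restart=True, graceful_switchover=True)))
print(validate(HaBgpSettings(graceful_restart=True, restart_time=900)))
```

Checking the timers only when Graceful Restart is enabled mirrors the UI flow described above, where the two parameters appear after the box is checked.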
Limitations/Known Behaviors
- BGP Graceful Restart and HA Graceful Switchover are segment agnostic: when activated on one segment (for example, the Global Segment), these settings are applied to all other segments on a customer site. This means that the Edge will synchronize routes on other segments and hold stale routes during an HA failover.
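The segment-agnostic behavior can be pictured with a small Python sketch. The function `segments_with_graceful_restart` and the segment names other than the Global Segment are hypothetical and serve only to illustrate that activation on one segment takes effect on every segment of the site.

```python
def segments_with_graceful_restart(site_segments, activated_on):
    """Return the segments on which the settings take effect.

    Toy model: activating the feature on any one segment of the site
    applies it to all segments of that site.
    """
    site = list(site_segments)
    if any(segment in site for segment in activated_on):
        return site
    return []


site = ["Global Segment", "Guest", "Voice"]  # illustrative segment names
print(segments_with_graceful_restart(site, ["Global Segment"]))
print(segments_with_graceful_restart(site, []))
```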