NSX Advanced Load Balancer’s behaviour after a failure in a GSLB deployment depends on whether the failure is

  1. At the leader site or at one of the follower sites

  2. Of the entire site or of just the NSX Advanced Load Balancer Controller

Follower Site Failures

Note:

In our follower-site failure examples, we focus on infrastructure deployed in Santa Clara, Chicago, and New York (the NY-1 and NY-2 sites).

Full-Site Failure
A full-site failure occurs at the NY-1 follower site, as shown below.

  1. The leader site in Santa Clara and the follower site in Chicago are active sites and therefore detect the failure.

  2. Administrative changes to the GSLB configuration can still be made on the leader, but they will not propagate to the NY-1 site.

  3. Both control-plane and data-plane health monitors mark NY-1's GS members as Down (a status-check sketch follows this list). Refer to NSX Advanced Load Balancer GSLB Service and Health Monitors for more details.

  4. DNS service for the GSLB configuration remains operational at the two surviving sites.

  5. Global application service continues on the surviving sites (Santa Clara, Chicago, and NY-2).
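
To see this from an operator's point of view, the following is a minimal sketch that asks the leader Controller which GS members are currently marked Down. It assumes the Python requests library, HTTP basic authentication, an illustrative X-Avi-Version header value, and a /api/gslbservice/<uuid>/runtime path; verify the exact endpoints and response fields against the API guide for your release.

    # Minimal sketch: list GSLB services on the leader Controller and dump each
    # one's runtime so the operator can see which members are marked Down.
    # The credentials, header value, and runtime path below are illustrative
    # assumptions, not values taken from this document.
    import requests

    CONTROLLER = "https://leader.example.com"   # hypothetical leader Controller FQDN
    AUTH = ("gslbadmin", "password")            # hypothetical credentials
    HEADERS = {"X-Avi-Version": "22.1.3"}       # assumed API version

    session = requests.Session()
    session.auth = AUTH
    session.headers.update(HEADERS)
    session.verify = False                      # lab only; use a trusted CA bundle in production

    # List all GSLB services defined on the leader.
    services = session.get(f"{CONTROLLER}/api/gslbservice").json()
    for svc in services.get("results", []):
        # Assumed runtime path; dump the JSON rather than guessing field names.
        runtime = session.get(f"{CONTROLLER}/api/gslbservice/{svc['uuid']}/runtime").json()
        print(svc["name"], runtime)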

Partial-Site Failure

If only the NSX Advanced Load Balancer Controller at the NY-1 site fails, the SEs continue to serve applications in headless mode.

  1. The leader and Chicago Controllers detect the failure using their control-plane health monitors.

  2. Any administrative changes made on the leader do not propagate to the NY-1 site.

  3. Data-plane health monitors running in Santa Clara and Chicago continue to perceive NY-1's members as Up.

  4. DNS service for the GSLB configuration remains operational at all three sites, because it is provided by the SEs, none of which have failed (a verification sketch follows this list).

  5. Global application service continues on all four sites (Santa Clara, Chicago, NY-1, and NY-2).
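
Because the DNS service is delivered by SEs rather than by the Controllers, an external query against each site's DNS virtual service can confirm it is still answering. The following minimal sketch assumes the dnspython package; the FQDN and VIP values are hypothetical placeholders.

    # Minimal sketch: confirm that the GSLB DNS virtual service at each site
    # still answers queries for a global application FQDN.
    import dns.resolver

    GSLB_FQDN = "app.gslb.example.com"     # hypothetical global FQDN
    DNS_VS_VIPS = {
        "Santa Clara": "10.10.10.10",      # hypothetical DNS VS VIPs
        "Chicago":     "10.20.10.10",
        "NY-1":        "10.30.10.10",
    }

    for site, vip in DNS_VS_VIPS.items():
        resolver = dns.resolver.Resolver(configure=False)
        resolver.nameservers = [vip]       # query this site's DNS VS directly
        resolver.lifetime = 3              # seconds before declaring the query failed
        try:
            answers = resolver.resolve(GSLB_FQDN, "A")
            print(site, "->", [a.to_text() for a in answers])
        except Exception as exc:
            print(site, "-> query failed:", exc)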

Follower Site Recovery

The following holds true for either full-site or partial-site failures.

  1. The leader Controller in Santa Clara detects connectivity to the (newly rebooted) follower Controller at NY-1 and pushes the latest GSLB configuration to it (a verification sketch follows this list).

  2. Other active sites likewise detect restored connectivity to the NY-1 follower Controller through their control-plane health monitors.

  3. If the data plane never went down (partial-site failure), no further action is required.

  4. If data-plane health monitors had been configured for NY-1's GS members and had previously marked them as Down, those members are marked Up and traffic to them resumes only after the monitors once again perceive good health.
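
One way to confirm that the configuration push in step 1 succeeded is to compare the GSLB object returned by the leader and by the recovered follower. The sketch below is illustrative only: the requests library, basic authentication, the X-Avi-Version value, and the sites/name field names are assumptions to check against the API guide for your release.

    # Minimal sketch: compare the /api/gslb object on the leader and on the
    # recovered NY-1 Controller to confirm the configuration has been pushed.
    import requests

    AUTH = ("gslbadmin", "password")            # hypothetical shared GSLB credentials
    HEADERS = {"X-Avi-Version": "22.1.3"}       # assumed API version

    def gslb_sites(controller):
        """Return the sorted site names from one Controller's GSLB object."""
        resp = requests.get(f"{controller}/api/gslb", auth=AUTH,
                            headers=HEADERS, verify=False, timeout=10)
        resp.raise_for_status()
        gslb = resp.json()["results"][0]        # assumes a single GSLB object exists
        return sorted(s.get("name", "") for s in gslb.get("sites", []))

    leader_sites   = gslb_sites("https://leader.example.com")   # Santa Clara
    follower_sites = gslb_sites("https://ny1.example.com")      # recovered NY-1

    # Site names are a coarse sync indicator; uuids and refs naturally differ per site.
    print("In sync" if leader_sites == follower_sites else "Not yet in sync",
          leader_sites, follower_sites)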

Leader Site Failures

Full-Site Failure
A full-site failure occurs at the Santa Clara leader site, as shown below.

  1. As they are active sites, both Chicago and NY-1 detect the failure.

  2. No administrative changes to the GSLB configuration can be made.

  3. Both control-plane and data-plane health monitors mark Santa Clara's GS members as Down.

  4. DNS service for the GSLB configuration remains operational at the two surviving active sites (Chicago and NY-1).

  5. Global application service continues on the three surviving sites (Chicago, NY-1, and NY-2).

Partial-Site Failure

If only the NSX Advanced Load Balancer Controller at the Santa Clara site fails, the site’s SEs continue to serve applications in headless mode.

  1. As they are active sites, both Chicago and NY-1 detect the Controller failure using their control-plane health monitors.

  2. No administrative changes to the GSLB configuration can be made.

  3. Data-plane health monitors running in Chicago and NY-1 continue to perceive Santa Clara's members as Up.

  4. DNS service for the GSLB configuration remains operational at Santa Clara, Chicago, and NY-1.

  5. Global application service continues on all sites.

Site Configuration Errors

Errors in site configuration related to IP addresses and credentials surface when the site information is saved. Sample error screens are as follows:

Authentication Failure

The username and password for the admin of the Boston site can be unique to that site, or they can be the same credentials used at all NSX Advanced Load Balancer GSLB sites.

Max Retry Login Failure

Appropriately authenticated individuals log in to a leader to perform GSLB-related functions, such as reading a GSLB configuration or making changes to it. In addition, behind the scenes, the leader GSLB site robotically logs in to a follower GSLB site to pass on configuration changes, which can be initiated only from the leader. In both cases, a login-attempt lockout rule may be in force, whereby a certain number of failures locks out the administrative account for a specified number of minutes (default = 30 minutes).

Redress

When defining a new GSLB configuration or adding a GSLB site to an existing configuration, one specifies account credentials to be associated with the site. It is a best practice to define the same GSLB administrative account (e.g., gslbadmin) for all participating GSLB sites. By associating the No-Lockout-User-Account-Profile with that account, as shown below, one can eliminate max retry login failures.
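
As a rough illustration of that association, the following sketch re-points a gslbadmin user at the built-in No-Lockout-User-Account-Profile through the Controller API. The requests library, basic authentication, the header value, the name filter, the field names, and the use of PUT are all assumptions; the same change can be made from the UI, and the API guide for your release is the authority on the exact calls.

    # Minimal sketch: associate the No-Lockout-User-Account-Profile with the
    # shared gslbadmin account so automated leader-to-follower logins cannot
    # trip the lockout rule. Paths, filters, and field names are assumptions.
    import requests

    CONTROLLER = "https://leader.example.com"   # hypothetical Controller FQDN
    AUTH = ("admin", "password")                # hypothetical admin credentials
    HEADERS = {"X-Avi-Version": "22.1.3"}       # assumed API version

    s = requests.Session()
    s.auth, s.verify = AUTH, False
    s.headers.update(HEADERS)

    # Look up the no-lockout profile and the gslbadmin user by name (assumed filter).
    profile = s.get(f"{CONTROLLER}/api/useraccountprofile",
                    params={"name": "No-Lockout-User-Account-Profile"}).json()["results"][0]
    user = s.get(f"{CONTROLLER}/api/user",
                 params={"name": "gslbadmin"}).json()["results"][0]

    # Re-point the user's account profile and save the user object.
    user["user_profile_ref"] = profile["url"]
    resp = s.put(f"{CONTROLLER}/api/user/{user['uuid']}", json=user)
    resp.raise_for_status()
    print("gslbadmin now uses", profile["name"])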

To track robotic actions separately from those of GSLB administrator personnel, assign staff members individual IDs of their own.

HTTP 400 Error

There are several GSLB contexts in which a 400 error may occur. This particular example illustrates an understandable restriction: an NSX Advanced Load Balancer site can participate in exactly one GSLB configuration. Invitations to join a second are rejected.
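
From an API client's perspective, that rejection surfaces as an HTTP 400 response whose body carries the reason. The sketch below is purely illustrative: the endpoint, payload shape, field names, and credentials are assumptions, and in practice the add-site workflow is normally driven from the leader's UI.

    # Minimal sketch: attempt to add a site that already belongs to another
    # GSLB configuration and show how the resulting 400 is surfaced.
    # All values and field names below are hypothetical.
    import requests

    LEADER = "https://leader.example.com"       # hypothetical leader Controller
    AUTH = ("gslbadmin", "password")            # hypothetical credentials
    HEADERS = {"X-Avi-Version": "22.1.3"}       # assumed API version

    # Fetch the current GSLB object, append the already-claimed site, and save it.
    gslb = requests.get(f"{LEADER}/api/gslb", auth=AUTH, headers=HEADERS,
                        verify=False).json()["results"][0]
    gslb.setdefault("sites", []).append({       # assumed site fields
        "name": "Boston",
        "ip_addresses": [{"type": "V4", "addr": "10.40.10.10"}],
        "username": "gslbadmin",
        "password": "password",
    })

    resp = requests.put(f"{LEADER}/api/gslb/{gslb['uuid']}", json=gslb,
                        auth=AUTH, headers=HEADERS, verify=False)
    if resp.status_code == 400:
        # The response body explains the rejection, e.g. that the site already
        # participates in another GSLB configuration.
        print("Rejected:", resp.text)
    else:
        resp.raise_for_status()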