NSX Advanced Load Balancer GSLB consists of leader and follower sites. Any federated object in the GSLB configuration must be configured on the leader site. The leader site eventually replicates the object to the follower sites.

Whenever a GSLB configuration change is made on the GSLB leader site, the change is propagated to the active follower sites using one of the following replication modes:

  1. Continuous replication mode (default)

  2. Adaptive replication mode

  3. Manual replication mode

Continuous Replication Mode

Continuous replication is the default mode for synchronizing configuration across all sites. As soon as a configuration change is performed on the leader site, the leader automatically initiates replication of that change to all follower sites, so no user action is required.

In this mode, the leader site maintains a queue of configuration changes to be replicated to the GSLB sites. Once a configuration change is completed on the leader site, it is automatically pushed to the follower sites. This mode works well in most use cases. Its limitation is that the user has no control over when replication happens, because it is an automatic process initiated by the leader site.
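The following is a minimal, hypothetical Python model of continuous replication, assuming one in-memory queue per follower site that the leader drains as soon as a change is committed. The class and method names (LeaderSite, FollowerSite, commit) are illustrative only and are not part of the NSX Advanced Load Balancer API.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class FollowerSite:
    name: str
    applied: list = field(default_factory=list)  # changes received from the leader


@dataclass
class LeaderSite:
    followers: list
    queues: dict = field(default_factory=dict)  # follower name -> pending changes

    def __post_init__(self) -> None:
        # Hypothetical model: one replication queue per follower site.
        for follower in self.followers:
            self.queues[follower.name] = deque()

    def commit(self, change: str) -> None:
        # Continuous mode: every committed change is queued for all followers
        # and pushed right away; the administrator does not choose the timing.
        for follower in self.followers:
            self.queues[follower.name].append(change)
        self._push_all()

    def _push_all(self) -> None:
        # Drain each follower's queue in order.
        for follower in self.followers:
            queue = self.queues[follower.name]
            while queue:
                follower.applied.append(queue.popleft())


leader = LeaderSite([FollowerSite("siteB"), FollowerSite("siteC")])
leader.commit("gslb-service-app1")
```

Because commit() drains every queue immediately, there is no point at which an administrator can hold a change back, which is the limitation described above.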

Adaptive Replication Mode

In adaptive replication mode, GSLB configuration changes are first applied on the leader site. NSX Advanced Load Balancer evaluates the local feedback on the leader site before replicating to the other active follower sites; that is, federated objects or GSLB configuration changes are applied to the leader first, and the result is considered in the replication decision. If a change is applied successfully on the leader site, it is propagated to the active follower sites.

DNS service is mission-critical, and maximum uptime is essential. In GSLB, configuration is propagated to all sites or locations, so a minor error can cause issues in multiple data centers and lead to application failure. Adaptive replication prevents this because a faulty configuration object is not replicated to the follower sites.

In adaptive replication mode, if a federated configuration object causes an issue on the leader site, it is not replicated to the peer follower sites and replication stalls. When replication stalls, adaptive replication generates an event reporting the stall, along with a reason and a possible recommendation.
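A minimal, hypothetical Python model of the adaptive flow is sketched below. Each federated object is first validated against the leader site (the validate_locally callback is an assumed stand-in for the leader's local feedback), replicated only if that succeeds, and otherwise left pending while a single stall event with a reason and recommendation is recorded. The sketch also assumes that objects replicate in order, so one failing object holds back the objects queued behind it. None of these names come from the product.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional, Tuple


@dataclass
class StallEvent:
    object_name: str
    reason: str
    recommendation: str


@dataclass
class AdaptiveLeader:
    # Returns None if the object applies cleanly on the leader,
    # otherwise a (reason, recommendation) tuple. Hypothetical hook.
    validate_locally: Callable[[str], Optional[Tuple[str, str]]]
    followers: dict = field(default_factory=dict)  # follower name -> applied objects
    pending: list = field(default_factory=list)    # objects not yet replicated
    events: list = field(default_factory=list)     # stall events raised so far
    stalled: bool = False

    def commit(self, obj: str) -> None:
        self.pending.append(obj)
        self._replicate()

    def _replicate(self) -> None:
        while self.pending:
            obj = self.pending[0]
            problem = self.validate_locally(obj)
            if problem is not None:
                # Raise one event per stall, then stop: nothing behind the
                # faulty object is replicated to the followers.
                if not self.stalled:
                    self.stalled = True
                    reason, recommendation = problem
                    self.events.append(StallEvent(obj, reason, recommendation))
                return
            self.stalled = False
            for applied in self.followers.values():
                applied.append(obj)
            self.pending.pop(0)
```

Under this model, the number of objects left in pending corresponds to the "Number of Pending Objects" shown in the replication status.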

Adaptive replication currently generates events in two cases:

  • When configured domains or subdomains are not hosted by existing and enabled DNS virtual services.

  • When any federated config version causes replication to stall.
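The first event condition can be sketched as a check that every GSLB domain is covered by an enabled DNS virtual service. This hypothetical Python helper assumes that a subdomain counts as hosted when a parent domain is configured on an enabled DNS virtual service; DnsVirtualService and unhosted_domains are illustrative names, not product objects.

```python
from dataclasses import dataclass


@dataclass
class DnsVirtualService:
    name: str
    enabled: bool
    domains: frozenset  # domains this DNS virtual service is configured to serve


def is_hosted(domain: str, hosted_domain: str) -> bool:
    # Assumption: a domain is covered if it equals a hosted domain
    # or is a subdomain of it (suffix match on a label boundary).
    return domain == hosted_domain or domain.endswith("." + hosted_domain)


def unhosted_domains(gslb_domains, dns_virtual_services) -> list:
    # Only enabled DNS virtual services count as hosting a domain.
    enabled = [vs for vs in dns_virtual_services if vs.enabled]
    return sorted(
        d for d in gslb_domains
        if not any(is_hosted(d, h) for vs in enabled for h in vs.domains)
    )


dns_vs = DnsVirtualService("dns-vs", enabled=False, domains=frozenset({"com", "local"}))
print(unhosted_domains(["com", "local"], [dns_vs]))  # ['com', 'local']
```

With the DNS virtual service deactivated, both com and local are reported as unhosted, which matches the situation described in the use case below.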

Use Case - Adaptive Replication

During peak traffic and major events, the ability to change the configuration safely is crucial. The following is an example of an event generated by adaptive replication.

This event was raised because the configured GSLB domains, com and local, were not hosted by an enabled DNS virtual service. In addition, the DNS virtual service was deactivated.



Enable Adaptive Replication

  1. Navigate to Infrastructure > GSLB > Site Configuration. Click the pencil icon.

  2. Select the Adaptive option in Replication Mode.

  3. Click Save.

Adaptive Replication Status

The GSLB Site Configuration screen shows the replication status. Hover over the replication status to open a pop-up window that shows details for the current replication state.

The following parameters are displayed:

Replication Status

The common statuses are Sync In Progress, In Sync, and Sync Stalled.

Number of Pending Objects

The number of GSLB configuration objects still waiting to be replicated.

Reason

The likely cause of the replication issue.

Recommendation

Hints or recommended steps for resolving the replication issue. If the cause of the replication issue cannot be determined, the recommendation displays the message Contact VMware support team.
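The four fields above can be modeled as a small record. The Python sketch below is hypothetical: it assumes the status is derived from whether replication is stalled and how many objects are pending, and that a stalled status without a known cause falls back to the support message. ReplicationStatus and summarize are illustrative names, not part of the product.

```python
from dataclasses import dataclass
from typing import Optional

SUPPORT_MESSAGE = "Contact VMware support team"


@dataclass
class ReplicationStatus:
    status: str                           # Sync In Progress, In Sync, or Sync Stalled
    pending_objects: int                  # objects not yet replicated
    reason: Optional[str] = None          # likely cause, when stalled
    recommendation: Optional[str] = None  # hint for resolving the stall


def summarize(pending_objects: int, stalled: bool,
              reason: Optional[str] = None,
              recommendation: Optional[str] = None) -> ReplicationStatus:
    if stalled:
        # Assumption: if no specific recommendation is known, show the support message.
        return ReplicationStatus("Sync Stalled", pending_objects, reason,
                                 recommendation or SUPPORT_MESSAGE)
    if pending_objects > 0:
        return ReplicationStatus("Sync In Progress", pending_objects)
    return ReplicationStatus("In Sync", 0)
```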



Note:

Adaptive replication has the following mandatory requirements:

  • All GSLB-configured domains or subdomains must be hosted on an existing, enabled DNS virtual service on the leader site.

  • All DNS virtual services involved in GSLB domains or subdomains must be on a version that supports adaptive replication (21.1.3 or later) on the leader site.
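A pre-flight check for the version requirement might look like the following Python sketch. It assumes version strings start with a dotted numeric part, optionally followed by a build suffix (for example, 22.1.4-9196); supports_adaptive and parse_version are hypothetical helpers, not product functions.

```python
import re

MIN_ADAPTIVE_VERSION = (21, 1, 3)  # minimum version stated in the requirements


def parse_version(version: str) -> tuple:
    # Take the leading dotted-numeric part, e.g. "22.1.4-9196" -> (22, 1, 4).
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def supports_adaptive(version: str) -> bool:
    # Pad to three components so "22.1" compares as (22, 1, 0).
    parts = (parse_version(version) + (0, 0, 0))[:3]
    return parts >= MIN_ADAPTIVE_VERSION
```

Tuples compare element by element in Python, so (21, 1, 10) correctly sorts above (21, 1, 3), which a plain string comparison would get wrong.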

Manual Replication Mode

When replicating configuration across GSLB sites in manual mode, you can:

  • Select the follower site.

  • Select the time when a replication must be performed, considering the minimum impact to end-users and other factors.

In manual replication, configuration synchronization is initiated by a pull request from a specific follower site, so you can synchronize the configuration when required. This helps avoid application downtime or inaccessibility at the selected GSLB sites. Because the leader site does not initiate synchronization instantly, this method gives more control than continuous replication.

You can use manual mode to replicate the GSLB configuration from the leader to the follower sites. The administrator creates manual checkpoints on NSX Advanced Load Balancer, and the GSLB leader site replicates all federated objects up to the checkpoint to the peer sites.

Manual replication mode is useful when deploying new applications to GSLB sites while the availability of other applications and GSLB sites is critical, such as during peak hours of application usage. Configuration changes can be deployed to the leader site and pushed to the followers only when traffic is low or the impact on end users is minimal. The follower site requests replication only when you are confident that the new changes do not disrupt the application.

When a follower site requests replication up to a specific checkpoint, the leader pushes the configuration only up to that checkpoint, not the complete configuration.
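A minimal, hypothetical Python model of checkpoint-bounded replication follows. The leader keeps an ordered log of federated changes, a checkpoint records a position in that log, and a follower's pull returns only the changes between its last synced position and the requested checkpoint. ManualLeader, pull, and synced_till are illustrative names, not the product's API.

```python
from dataclasses import dataclass, field


@dataclass
class ManualLeader:
    log: list = field(default_factory=list)          # ordered federated config changes
    checkpoints: dict = field(default_factory=dict)  # checkpoint name -> log position
    synced_till: dict = field(default_factory=dict)  # follower name -> log position

    def change(self, obj: str) -> None:
        # CUD operation on the leader; nothing is pushed to followers yet.
        self.log.append(obj)

    def create_checkpoint(self, name: str) -> None:
        # A checkpoint marks everything committed up to this moment.
        self.checkpoints[name] = len(self.log)

    def pull(self, follower: str, checkpoint: str) -> list:
        # Return only the changes between the follower's last sync and the checkpoint.
        start = self.synced_till.get(follower, 0)
        end = self.checkpoints[checkpoint]
        if end <= start:
            return []
        self.synced_till[follower] = end
        return self.log[start:end]


leader = ManualLeader()
leader.change("existing-config")
leader.create_checkpoint("CP1")
leader.change("APP1")
leader.create_checkpoint("CP2")
leader.change("post-CP2-change")  # made after CP2, so not covered by it
print(leader.pull("siteB", "CP2"))  # ['existing-config', 'APP1']
```

The change made after CP2 stays on the leader until a later checkpoint covers it, mirroring the behavior of the active checkpoint in the procedure below.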

For manual replication, do the following:

  • Perform CUD (create, update, or delete) operations on the leader site.

  • Verify the application on the leader.

  • Create a configuration checkpoint on the leader.

  • At the scheduled time, initiate GSLB synchronization from the GSLB leader site to the selected follower site.

  • Test the newly synchronized applications on the follower site.

  • Schedule a change window for the other follower sites, and use the previously created checkpoint on the leader site as the reference point for the replication process.

Enable Manual Replication

The manual replication method uses a checkpoint on the leader site to define the configuration that followers can safely consume. The checkpoint serves as the reference point for the last-saved configuration.

  1. Log in to the NSX Advanced Load Balancer UI and navigate to Infrastructure > GSLB > Site Configuration. In this example, the leader site is siteA and the follower site is siteB.



  2. Log in to the GSLB leader site (siteA) using the NSX Advanced Load Balancer CLI. Create a checkpoint CP1 using the configure federationcheckpoint <checkpoint name> command.

    [admin:controller]: > configure federationcheckpoint CP1
    [admin:controller]: federationcheckpoint> save

    You can also create a checkpoint in the NSX Advanced Load Balancer UI. Navigate to Infrastructure > GSLB > Federation Checkpoints and click Create to create a new checkpoint CP1.



  3. Enable the manual replication mode on the leader site, and use CP1 as the checkpoint reference for the manual mode.

    [admin:controller]: > configure gslb Default
    [admin:controller]: gslb> replication_policy replication_mode replication_mode_manual checkpoint_ref CP1
    [admin:controller]: gslb:replication_policy> save


  4. Perform the scheduled operation on the desired applications or create new applications as required.

    For demonstration purposes, the application APP1 is added on the leader site.



  5. After adding APP1, perform the required performance and stability checks on it.

  6. Create a new checkpoint. If the configuration changes are eligible to be replicated to the other follower sites, create another checkpoint, CP2: navigate to Infrastructure > GSLB > Federation Checkpoints and click Create.

  7. Make CP2 the active checkpoint. To use CP2 as the reference point for replication, select it as the active checkpoint. In the NSX Advanced Load Balancer UI, the current active checkpoint (CP1) is marked with a star.

    1. Click the star icon or the Set to Active option for CP2 to make it the active checkpoint.

    2. The star moves to CP2, which is now the active checkpoint.

  8. With CP2 selected as the active checkpoint for the follower replication process, any configuration changes made after CP2 was created are not replicated to the active follower site.

  9. Synchronize the configuration. Select the active follower site that requires replication, and click Synced Till Checkpoint to start replicating the configuration to it. Then click Kick off Replication to proceed with GSLB replication to the active follower site.

  10. Check the configuration update status on the follower site, siteB. The follower site reflects the configuration changes replicated from the leader site, and the new application APP1 is now available on siteB as well.

Limitations of Replication Modes

The following are the limitations of replication:

  • The replication queue is maintained in the memory of the NSX Advanced Load Balancer SE. In large-scale deployments, the replication queue can grow significantly if replication does not complete before a warm restart.

  • If the configuration replication process among sites experiences an issue, accurate replication status for configuration objects spanning different sites is not available.