When an active NSX Edge node fails, the NSX Control Plane ensures that traffic fails over to the standby NSX Edge node.

In a topology where you have created L2 segments that are stretched across multiple sites, each segment relies on two NSX Edge nodes in an Active/Standby configuration. For example, as shown in image 1, NSX Edge 1 is the active node and NSX Edge 2 is the standby node.

When these NSX Edge nodes are not on the same subnet, traffic between the stretched L2 segments is carried over the NSX Edge VTEP addresses.

To ensure HA functionality for NSX Edge nodes, each NSX Edge node sends a VTEP Group State Message to the control plane. In turn, the control plane distributes all VTEP group information received from the NSX Edge nodes to all transport nodes that host these stretched segments. The VTEP Group State Message includes the latest state of each NSX Edge node: Active or Standby.

If NSX Edge 1 (the active node) shuts down ungracefully, fails, or goes into maintenance mode, the control plane removes the VTEP entry of the failed NSX Edge node. The transport nodes then know that the standby node is now the active node. When the stretched segments send traffic, it reaches the active NSX Edge node and not the failed one.
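The behavior described above can be illustrated with a toy model. This is only a sketch for building intuition, not NSX code: the `ControlPlane` class, its method names, and the `group-1` identifier are hypothetical, and the real control plane protocol is not shown here.

```python
from dataclasses import dataclass, field


@dataclass
class ControlPlane:
    """Toy stand-in for the control plane (hypothetical, not an NSX API)."""
    # Maps a VTEP group ID to {Edge VTEP IP: HA state}.
    vtep_groups: dict = field(default_factory=dict)

    def report_state(self, group_id, vtep_ip, ha_state):
        """An Edge node reports its HA state for a VTEP group."""
        self.vtep_groups.setdefault(group_id, {})[vtep_ip] = ha_state

    def remove_failed_edge(self, group_id, vtep_ip):
        """Drop the VTEP entry of an Edge node that has failed."""
        self.vtep_groups.get(group_id, {}).pop(vtep_ip, None)

    def active_vtep(self, group_id):
        """Return the VTEP IP that transport nodes should use, or None."""
        for ip, state in self.vtep_groups.get(group_id, {}).items():
            if state == "ACTIVE":
                return ip
        return None


cp = ControlPlane()
cp.report_state("group-1", "172.20.1.151", "ACTIVE")
cp.report_state("group-1", "172.20.1.152", "STANDBY")
before = cp.active_vtep("group-1")

# Edge 1 fails: its entry is removed and the standby reports ACTIVE.
cp.remove_failed_edge("group-1", "172.20.1.151")
cp.report_state("group-1", "172.20.1.152", "ACTIVE")
after = cp.active_vtep("group-1")

print(before, "->", after)
```

In this model, transport nodes never select the failed node because its entry is no longer in the group by the time they look up the active VTEP.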

Use the following procedure to view the HA state of the active NSX Edge VTEP after a failover.

Note: This procedure applies to both preemptive and non-preemptive modes for Tier-0 or Tier-1 gateways.

Prerequisites

Procedure

  1. Copy the UUID of the stretched segment that is attached to a Tier-1 or Tier-0 gateway.
  2. Run get vtep-group to list the VTEP groups present on the NSX Edge node.
  3. Verify that the output of the get vtep-group command includes the stretched segment UUID.
  4. Copy the VTEP-Group ID corresponding to the segment.
  5. In the NSX Manager node CLI terminal, run get vtep-group <vtep-group-ID> vteps-staleness-status.
    VNI    IP            MAC                LABEL    Segment     TransportNode-Id                      TN-Connection  HA-STATE  STALE-RECORD
    26625  172.20.1.151  00:0c:29:9e:64:5e  0x18001  172.20.1.0  32330174-32bc-11ee-8063-000c299e6454  true           ACTIVE    False
    26625  172.20.1.152  00:0c:29:ea:8e:aa  0xFC01   172.20.1.0  914d0362-32bc-11ee-ba27-000c29ea8ea0  true           STANDBY   False
    The output displays the HA state of both NSX Edge nodes. The STALE-RECORD value is False for both nodes, which indicates that the HA state is accurate.
  6. Verify that one of the NSX Edge nodes is Active and the other is Standby.
  7. If the active NSX Edge node goes down, the HA state changes: the standby node becomes the active node.
  8. To view the updated state, run get vtep-group <vtep-group-ID> vteps-staleness-status.
    VNI    IP            MAC                LABEL    Segment     TransportNode-Id                      TN-Connection  HA-STATE  STALE-RECORD
    26625  172.20.1.151  00:0c:29:9e:64:5e  0x18001  172.20.1.0  32330174-32bc-11ee-8063-000c299e6454  true           ACTIVE    True
    26625  172.20.1.152  00:0c:29:ea:8e:aa  0xFC01   172.20.1.0  914d0362-32bc-11ee-ba27-000c29ea8ea0  true           ACTIVE    False

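    In the output, the NSX Edge node with the address 172.20.1.151 has gone down, so its STALE-RECORD value is True. The node with the address 172.20.1.152 is now ACTIVE and its STALE-RECORD value is False.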

Results

The NSX Control Plane ensures that stale entries of failed NSX Edge nodes are correctly flagged in the VTEP group output.
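If you capture the command output as text, the check in the procedure can be scripted. The following is a minimal sketch that assumes the output is a whitespace-separated table with the column headers shown in the procedure. The function names `parse_vtep_status` and `current_active` are hypothetical helpers, not part of NSX, and `SAMPLE` reuses the post-failover output from step 8.

```python
SAMPLE = """\
VNI    IP            MAC                LABEL    Segment     TransportNode-Id                      TN-Connection  HA-STATE  STALE-RECORD
26625  172.20.1.151  00:0c:29:9e:64:5e  0x18001  172.20.1.0  32330174-32bc-11ee-8063-000c299e6454  true           ACTIVE    True
26625  172.20.1.152  00:0c:29:ea:8e:aa  0xFC01   172.20.1.0  914d0362-32bc-11ee-ba27-000c29ea8ea0  true           ACTIVE    False
"""


def parse_vtep_status(text):
    """Parse whitespace-separated rows into dicts keyed by the header names."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    header = lines[0].split()
    rows = []
    for ln in lines[1:]:
        fields = ln.split()
        if len(fields) != len(header):
            continue  # skip anything that does not look like a table row
        rows.append(dict(zip(header, fields)))
    return rows


def current_active(rows):
    """Return the rows that are ACTIVE and not flagged as stale."""
    return [
        r for r in rows
        if r["HA-STATE"].upper() == "ACTIVE"
        and r["STALE-RECORD"].lower() == "false"
    ]


rows = parse_vtep_status(SAMPLE)
active = current_active(rows)
stale = [r["IP"] for r in rows if r["STALE-RECORD"].lower() == "true"]

print("active:", [r["IP"] for r in active])
print("stale:", stale)
```

Filtering on both HA-STATE and STALE-RECORD matters here because, after a failover, the failed node's last reported state can still read ACTIVE while its record is marked stale.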