This topic describes how traffic flows on stateful Tier-0 and Tier-1 gateways that are configured in active-active HA mode.
South-North Traffic Flow
- Based on a deterministic hash, an incoming packet from a southbound VM is punted to the backplane of Edge-2.
- Edge-2 determines that Edge-4 is actively managing the traffic flows and forwards the flow out through the external interfaces (which are part of the interface group).
- An IP hash is performed on the destination IP address of the external server, and traffic is punted from Edge-2 to Edge-4. The packet is then forwarded to the Tier-0 gateway service router (SR), where SNAT changes the source IP address to the translated IP address.
- After the flow reaches the Tier-0 SR on Edge-4, the shadow port forwards the NAT traffic to the uplink interface, and the traffic is then sent out to the physical router.
- If the Tier-0 SR on Edge-4 fails, NSX punts traffic to its backup node in the sub-cluster, Edge-3, where SNAT changes the source IP address to the translated IP address. The backup interface on Edge-3 takes over the backplane IP and the uplink IP of the Tier-0 gateway before beginning to process traffic. The backup interface on Edge-3 is operationally Up and the shadow interface on Edge-4 is Down.
- All traffic flows processed by firewall and NAT rules are synchronized on the Tier-0 SR on Edge-3.
- When Edge-4 comes back up, the flow is resynchronized back to Edge-4. When the shadow port comes back up, NSX punts traffic to it.
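The hash-and-punt step in the list above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the text does not specify which hash NSX uses, so CRC32 stands in for it, and the names `EDGES`, `flow_owner`, and `next_hop` as well as the sample address are hypothetical.

```python
import ipaddress
import zlib

# Hypothetical four-node Edge cluster; node names follow the example in the text.
EDGES = ["Edge-1", "Edge-2", "Edge-3", "Edge-4"]


def flow_owner(external_ip: str, edges=EDGES) -> str:
    """Pick the Edge node that owns all flows for one external server IP.

    CRC32 is a stand-in for the unspecified deterministic hash.
    """
    key = ipaddress.ip_address(external_ip).packed
    return edges[zlib.crc32(key) % len(edges)]


def next_hop(receiving_edge: str, external_ip: str) -> str:
    """Describe what the Edge node that received the packet does next."""
    owner = flow_owner(external_ip)
    if owner == receiving_edge:
        return f"process locally on {owner}"
    return f"punt from {receiving_edge} to {owner}"


# A southbound VM's packet lands on Edge-2; the owner depends only on the
# external server's IP address, so every packet of the flow is steered alike.
print(next_hop("Edge-2", "203.0.113.10"))
```

Because the owner is a pure function of the external server's address, any Edge node that receives the packet reaches the same decision without consulting the others.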
North-South Traffic Flow
- The physical router applies its own hashing algorithm to a packet from a northbound VM and, based on its ECMP routing choice, sends the packet to Edge-3. The Tier-0 gateway is running on Edge-3.
- Edge-3 determines that Edge-4 is actively managing the traffic flow and forwards the flow to Edge-4. The flow is managed by the shadow interface of Edge-4.
- An IP hash is performed on the source IP address of the external server, and traffic is punted from the Tier-0 SR on Edge-3 to the Tier-0 SR on Edge-4, where NAT is enabled. The source IP address is changed to the translated IP address.
- The packet is sent from the Tier-0 SR on Edge-4 to the Tier-0 DR on Edge-4 and then to the Tier-1 gateway, finally reaching the destination VM.
- If the Tier-0 SR on Edge-4 fails, NSX punts traffic to its peer node in the sub-cluster (sub-cluster 2), which is Edge-3. NAT enabled on Edge-3 changes the source IP address to the translated IP address.
- The backup shadow port on Edge-3 takes over the traffic flow before Edge-3 begins to process traffic. The backup shadow port on Edge-3 is then operationally Up and the shadow port on Edge-4 is Down.
- All traffic flows processed by firewall and NAT rules are synchronized on the Tier-0 SR on Edge-3.
- When Edge-4 comes back up, the flow is resynchronized back to it. The shadow port on Edge-4 comes back up and manages the punted traffic.
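Both directions hash on the external server's IP address: the destination for south-north traffic and the source for north-south traffic. The sketch below illustrates why that choice lets the same Edge node see both halves of a NAT flow. It is a simplified model, not the NSX implementation: CRC32 stands in for the unspecified hash, and the `Packet` type, the function names, and the sample addresses are hypothetical.

```python
import ipaddress
import zlib
from dataclasses import dataclass

EDGES = ["Edge-1", "Edge-2", "Edge-3", "Edge-4"]  # hypothetical cluster


@dataclass(frozen=True)
class Packet:
    src_ip: str
    dst_ip: str
    direction: str  # "south-north" or "north-south"


def hash_key(pkt: Packet) -> str:
    """Return the external server IP, the field the text says is hashed."""
    # South-north: the external server is the destination.
    # North-south: the external server is the source.
    return pkt.dst_ip if pkt.direction == "south-north" else pkt.src_ip


def flow_owner(pkt: Packet) -> str:
    key = ipaddress.ip_address(hash_key(pkt)).packed
    return EDGES[zlib.crc32(key) % len(EDGES)]  # CRC32 is a stand-in hash


outbound = Packet("10.0.0.5", "203.0.113.10", "south-north")
# The reply is addressed to the translated IP, not to the VM's own address.
inbound = Packet("203.0.113.10", "198.51.100.1", "north-south")

# Same external server IP, so the same Edge node owns both directions.
assert flow_owner(outbound) == flow_owner(inbound)
```

Hashing on the VM-side address instead would not work in this model, because NAT rewrites that address between the two directions.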
Sub-cluster Failure
If both nodes in a sub-cluster go down, the sub-cluster goes down:
- Existing flows are disrupted, causing traffic loss.
- New flows are punted to the other sub-cluster.
- When the failed sub-cluster comes back up again, the flows return to the original sub-cluster.
If a sub-cluster goes down for any reason, then the other sub-cluster in the cluster takes over.
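The sub-cluster behavior can be modeled as follows. This is a rough sketch under assumptions: the text only places Edge-3 and Edge-4 in sub-cluster 2, so pairing Edge-1 with Edge-2 is assumed, CRC32 stands in for the unspecified hash, and `pick_sub_cluster` and `sub_cluster_up` are hypothetical helper names.

```python
import ipaddress
import zlib

SUB_CLUSTERS = {
    "sub-cluster-1": ["Edge-1", "Edge-2"],  # assumed pairing
    "sub-cluster-2": ["Edge-3", "Edge-4"],
}


def sub_cluster_up(name, node_up):
    """A sub-cluster is down only when both of its nodes are down."""
    return any(node_up[n] for n in SUB_CLUSTERS[name])


def pick_sub_cluster(external_ip, node_up):
    """Choose the sub-cluster for a new flow, skipping failed sub-clusters."""
    names = sorted(SUB_CLUSTERS)
    start = zlib.crc32(ipaddress.ip_address(external_ip).packed) % len(names)
    # Try the hashed (original) sub-cluster first, then the others in order.
    for offset in range(len(names)):
        candidate = names[(start + offset) % len(names)]
        if sub_cluster_up(candidate, node_up):
            return candidate
    return None  # every sub-cluster is down


node_up = {n: True for nodes in SUB_CLUSTERS.values() for n in nodes}
original = pick_sub_cluster("203.0.113.10", node_up)

for n in SUB_CLUSTERS[original]:       # both nodes fail
    node_up[n] = False
failover = pick_sub_cluster("203.0.113.10", node_up)

for n in SUB_CLUSTERS[original]:       # the sub-cluster recovers
    node_up[n] = True
restored = pick_sub_cluster("203.0.113.10", node_up)

print(original, failover, restored)
```

The model covers only placement of new flows. It does not represent the loss of existing flows, which the text says are disrupted when the whole sub-cluster fails.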
Single Node Failure
When an Edge node fails, the following events occur:
- Interface links of the Edge node fail.
- The shadow port on the failed Edge node is in Down state.
- The backup port of the peer node in the sub-cluster takes over.
- The firewall and the NAT states are synchronized on the peer Edge node.
- The backup port on the peer node provides connectivity to new traffic flows.
- When the interface links of the failed node come back up, the firewall and the NAT states are resynchronized with the shadow port of the active node.
- NSX punts traffic flows back to the original node.
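The event sequence above can be expressed as a small state model. It is a simplified sketch, not NSX code: the `EdgeNode` and `SubClusterPair` classes and their methods are hypothetical, the firewall and NAT state is reduced to a dictionary, and the model assumes that state is mirrored to the peer whenever a flow is added.

```python
from dataclasses import dataclass, field


@dataclass
class EdgeNode:
    name: str
    links_up: bool = True
    flow_state: dict = field(default_factory=dict)  # firewall and NAT entries


@dataclass
class SubClusterPair:
    active: EdgeNode            # hosts the shadow port
    peer: EdgeNode              # hosts the backup port
    shadow_port_up: bool = True
    backup_port_up: bool = False

    def owner(self) -> EdgeNode:
        """Return the node that currently processes traffic."""
        return self.peer if self.backup_port_up else self.active

    def add_flow(self, flow_id: str, entry: str) -> None:
        src = self.owner()
        src.flow_state[flow_id] = entry
        dst = self.active if src is self.peer else self.peer
        if dst.links_up:  # mirror state to the other node when reachable
            dst.flow_state = dict(src.flow_state)

    def fail_active(self) -> None:
        self.active.links_up = False
        self.active.flow_state = {}          # the failed node loses its state
        self.shadow_port_up = False
        self.backup_port_up = True           # backup port takes over

    def recover_active(self) -> None:
        self.active.links_up = True
        self.active.flow_state = dict(self.peer.flow_state)  # resync first
        self.shadow_port_up = True
        self.backup_port_up = False          # traffic returns to the original node


pair = SubClusterPair(active=EdgeNode("Edge-4"), peer=EdgeNode("Edge-3"))
pair.add_flow("flow-1", "snat")
pair.fail_active()
pair.add_flow("flow-2", "snat")              # new flow served by Edge-3
pair.recover_active()
print(pair.owner().name, sorted(pair.active.flow_state))
```

Resynchronizing before the shadow port comes back up matters in this model: flows created during the outage exist only on the peer until they are copied back.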