In this design, the NSX Edge cluster for a VI workload domain runs on a shared edge and workload cluster. vSphere HA and vSphere DRS protect the NSX Edge appliances. In an environment with multiple availability zones, you use vSphere DRS to set the first availability zone as the main location for the NSX Edge nodes.

NSX Edge Cluster Design

The NSX Edge cluster is a logical grouping of NSX Edge transport nodes. These NSX Edge appliances run on a vSphere cluster and provide north-south routing and network services for customer workloads. You can dedicate this cluster only to edge appliances, or you can share it with customer workloads.

Shared edge and workload cluster in the VI workload domain

The cluster in the VI workload domain contains NSX Edge components and customer workloads. See the vSphere Cluster Design for a Virtual Infrastructure Workload Domain.

Dedicated edge cluster

A dedicated edge vSphere cluster contains only NSX Edge appliances for the VI workload domain.

Note:

For added availability, the NSX Edge appliances that form an NSX Edge cluster can be deployed across different vSphere clusters. The networks do not need to be stretched between the vSphere clusters because each edge appliance can be connected to different VLANs.

Table 1. Design Decisions on the NSX Edge Cluster Configuration

Decision ID

Design Decision

Design Justification

Design Implications

VCF-WLD-NSX-EDGE-CFG-002

Deploy the NSX Edge virtual appliances in a shared edge and workload cluster in the VI workload domain.
  • Keeps customer network traffic local to the VI workload domain.

  • Simplifies configuration and minimizes the number of ESXi hosts required for initial deployment.

NSX Edge appliances are co-located with customer workloads. Ensure that customer workloads do not prevent the NSX Edge nodes from handling network traffic.

VCF-WLD-NSX-EDGE-CFG-003

Deploy two NSX Edge appliances in the edge cluster in the shared edge and workload cluster.

Creates the NSX Edge cluster to satisfy the requirements for availability and scale.

None.

VCF-WLD-NSX-EDGE-CFG-004

  • Create a resource pool for the NSX Edge appliances in the root of the shared edge and workload cluster object.

  • Create a resource pool for customer workloads in the root of the shared edge and workload cluster object.

A configuration sketch for these resource pools follows this table.

Guarantees that the edge cluster receives sufficient compute resources during times of contention.

  • Customer workloads must be deployed to a separate Resource Pool at the root of the cluster.

  • To ensure adequate resources for customer and control plane workloads, the root of the cluster must not run any virtual machines.

  • Customer workloads might not be able to use their allocated memory during times of contention.

VCF-WLD-NSX-EDGE-CFG-005

Configure the edge resource pool with a 64-GB memory reservation and normal CPU share value.

Guarantees that the edge cluster receives sufficient memory resources during times of contention.

Edge appliances might not be able to use their allocated CPU capacity during times of contention.

VCF-WLD-NSX-EDGE-CFG-006

Apply VM-VM anti-affinity rules for vSphere DRS to the virtual machines of the NSX Edge cluster.

Keeps the NSX Edge nodes running on different ESXi hosts for high availability (see the DRS and vSphere HA sketch after this table).

None.

VCF-WLD-NSX-EDGE-CFG-007

In vSphere HA, set the restart priority policy for each NSX Edge appliance to high.

  • The NSX Edge nodes are part of the north-south data path for overlay segments. vSphere HA restarts the NSX Edge appliances first so that other virtual machines that are being powered on or migrated by using vSphere vMotion while the edge nodes are offline lose connectivity only for a short time.

  • Setting the restart priority to high reserves the highest restart priority for future needs.

If the restart priority for another customer workload is set to highest, the connectivity delays for other virtual machines will be longer.

VCF-WLD-NSX-EDGE-CFG-008

Configure all edge nodes as transport nodes.

Enables the edge nodes to participate in the overlay network and deliver services, such as routing and load balancing, to the SDDC workloads.

None.

VCF-WLD-NSX-EDGE-CFG-009

Create an NSX Edge cluster with the default Bidirectional Forwarding Detection (BFD) configuration between the NSX Edge nodes in the cluster.

  • Satisfies the availability requirements by default.

  • Edge nodes must remain available to provide services such as NAT, routing to physical networks, and load balancing.

None.
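
In VMware Cloud Foundation, you typically implement decisions VCF-WLD-NSX-EDGE-CFG-004 and VCF-WLD-NSX-EDGE-CFG-005 through SDDC Manager and the vSphere Client. The following pyVmomi sketch only illustrates the equivalent settings at the vSphere API level: it creates an edge resource pool with a 64-GB memory reservation and normal CPU shares, plus a separate resource pool for customer workloads, at the root of the shared edge and workload cluster. The vCenter Server address, credentials, and the cluster and resource pool names are placeholders, not values defined by this design.

import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

# Placeholder connection details.
si = SmartConnect(host="vcenter.example.com", user="administrator@vsphere.local",
                  pwd="password", sslContext=ssl._create_unverified_context())
content = si.RetrieveContent()

# Find the shared edge and workload cluster by name (placeholder name).
view = content.viewManager.CreateContainerView(
    content.rootFolder, [vim.ClusterComputeResource], True)
cluster = next(c for c in view.view if c.name == "shared-edge-and-workload-cluster")
view.DestroyView()

def allocation(reservation=0):
    # Expandable allocation, no limit, normal shares.
    # The reservation is specified in MB for memory and in MHz for CPU.
    return vim.ResourceAllocationInfo(
        reservation=reservation,
        expandableReservation=True,
        limit=-1,
        shares=vim.SharesInfo(level="normal", shares=0),
    )

# Edge resource pool: 64-GB memory reservation and normal CPU share value.
edge_spec = vim.ResourceConfigSpec(
    cpuAllocation=allocation(),
    memoryAllocation=allocation(reservation=64 * 1024),
)
edge_pool = cluster.resourcePool.CreateResourcePool(name="rp-nsx-edge", spec=edge_spec)

# Separate resource pool for customer workloads, with no reservations.
workload_spec = vim.ResourceConfigSpec(
    cpuAllocation=allocation(),
    memoryAllocation=allocation(),
)
workload_pool = cluster.resourcePool.CreateResourcePool(
    name="rp-customer-workloads", spec=workload_spec)

Disconnect(si)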
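
Decisions VCF-WLD-NSX-EDGE-CFG-006 and VCF-WLD-NSX-EDGE-CFG-007 can be sketched in the same way. The fragment below reuses the content and cluster objects from the previous sketch, adds a VM-VM anti-affinity rule for two NSX Edge appliances, and sets their vSphere HA restart priority to high. The edge appliance and rule names are placeholders.

# Look up the NSX Edge appliance virtual machines by name (placeholder names).
vm_view = content.viewManager.CreateContainerView(
    content.rootFolder, [vim.VirtualMachine], True)
edge_vms = [vm for vm in vm_view.view if vm.name in ("edge-node-01", "edge-node-02")]
vm_view.DestroyView()

# VM-VM anti-affinity rule that keeps the edge nodes on different ESXi hosts.
rule_spec = vim.cluster.RuleSpec(
    operation="add",
    info=vim.cluster.AntiAffinityRuleSpec(
        name="anti-affinity-rule-nsx-edge",
        enabled=True,
        mandatory=False,
        vm=edge_vms,
    ),
)

# Per-VM vSphere HA override: restart priority high for each edge appliance.
das_vm_specs = [
    vim.cluster.DasVmConfigSpec(
        operation="add",
        info=vim.cluster.DasVmConfigInfo(
            key=vm,
            dasSettings=vim.cluster.DasVmSettings(restartPriority="high"),
        ),
    )
    for vm in edge_vms
]

spec = vim.cluster.ConfigSpecEx(rulesSpec=[rule_spec], dasVmConfigSpec=das_vm_specs)
task = cluster.ReconfigureComputeResource_Task(spec=spec, modify=True)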

High Availability for a Single VMware Cloud Foundation Instance with Multiple Availability Zones

NSX Edge nodes connect to the top-of-rack switches in each data center to support northbound uplinks and route peering for SDN network advertisement. This connection is specific to the top-of-rack switches to which the edge node is connected.

If an outage of an availability zone occurs, vSphere HA fails over the edge appliances to the other availability zone. The second availability zone must provide an analog of the network infrastructure to which the edge node is connected in the first availability zone.

To support failover of the NSX Edge appliances, the following networks are stretched across the first and second availability zones. For information about the networks in a VI workload domain with multiple availability zones, see Physical Network Infrastructure Design for NSX Data Center for a Virtual Infrastructure Workload Domain.

Function                                        HA Layer 3 Gateway - Across Availability Zones

Management for the first availability zone

Uplink01                                        x

Uplink02                                        x

Edge Overlay

Note:

These stretched networks apply only to the vSphere clusters that contain NSX Edge appliances.

Note:

The VLAN ID and Layer 3 network must be the same across both the availability zones. Additionally, the Layer 3 gateway at the first hop must be highly available such that it tolerates the failure of an entire availability zone.

Table 2. Design Decisions on High Availability of the NSX Edge Nodes for Multiple Availability Zones

Decision ID

Design Decision

Design Justification

Design Implication

VCF-WLD-NSX-EDGE-CFG-010

Add the NSX Edge appliances to the virtual machine group for the first availability zone.

Ensures that, by default, the NSX Edge appliances are powered on within the host group for the first availability zone (see the sketch after this table).

None.
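
If you manage the availability zone placement directly in vSphere rather than through SDDC Manager, the membership change in VCF-WLD-NSX-EDGE-CFG-010 can be expressed with pyVmomi as in the sketch below. It edits an existing DRS VM group so that the group also contains the edge appliances. The group name is a placeholder, the cluster and edge_vms objects are assumed from the earlier sketches, and the VM-to-host rule that binds this VM group to the host group for the first availability zone is assumed to already exist.

# Name of the existing DRS VM group for the first availability zone (placeholder).
az1_group_name = "az1-vm-group"

# Editing a group replaces its membership, so keep the current members.
az1_group = next(
    g for g in cluster.configurationEx.group
    if isinstance(g, vim.cluster.VmGroup) and g.name == az1_group_name
)
current_members = list(az1_group.vm or [])
members = current_members + [vm for vm in edge_vms if vm not in current_members]

group_spec = vim.cluster.GroupSpec(
    operation="edit",
    info=vim.cluster.VmGroup(name=az1_group_name, vm=members),
)
spec = vim.cluster.ConfigSpecEx(groupSpec=[group_spec])
task = cluster.ReconfigureComputeResource_Task(spec=spec, modify=True)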

High Availability for Multiple VMware Cloud Foundation Instances

The VI workload domain in each VMware Cloud Foundation instance has its own NSX Edge cluster. In each instance, the edge nodes and clusters are deployed by using the same design but with instance-specific settings such as IP addressing, VLAN IDs, and names. Each edge cluster is managed by the NSX Local Manager instance for that VI workload domain.

Workload traffic between VMware Cloud Foundation instances traverses the inter-instance overlay tunnels, which terminate on the RTEPs of the NSX Edge nodes. These tunnels are the data plane for inter-instance traffic.

To support communication between VMware Cloud Foundation instances, you must allocate an additional RTEP VLAN on the edge nodes. If the instance also contains multiple availability zones, this network must be stretched across all availability zones.

Take into account the following considerations:

  • The RTEP network segment has a VLAN ID and Layer 3 range that are specific to the individual data center fault domain.

  • If a VMware Cloud Foundation instance is deployed with multiple availability zones, the RTEP network segment must be stretched between the zones with the same VLAN ID and IP range. Additionally, the Layer 3 gateway at the first hop must be highly available such that it tolerates the failure of an entire availability zone.

  • In a deployment with multiple VMware Cloud Foundation instances, each instance requires an edge RTEP VLAN with a VLAN ID and IP range that are appropriate for that instance.

Table 3. Edge RTEP VLAN Configuration for the VI Workload Domains for Multiple VMware Cloud Foundation Instances

Function

First Availability Zone

Second Availability Zone

High Availability Layer 3 Gateway

Edge RTEP in the first VMware Cloud Foundation instance

Edge RTEP in the second VMware Cloud Foundation instance