Use this list of design decisions as a reference for configuring NSX-T Data Center in an environment with one or more VMware Cloud Foundation instances. The design also considers whether an instance contains a single or multiple availability zones.

The NSX-T Data Center design covers the following areas:

  • Physical network infrastructure

  • Deployment of and secure access to the NSX-T Data Center nodes

  • Dynamic routing configuration and load balancing

  • NSX segment organization

  • NSX Federation and Tier-0 gateway configuration for north-south routing

After you set up the physical network infrastructure, the configuration tasks for most design decisions are automated in VMware Cloud Foundation. You must perform the configuration manually only for a limited number of decisions as noted in the design implication.

For full design details, see NSX-T Data Center Design for a Virtual Infrastructure Workload Domain.

Physical Network Infrastructure Design

Table 1. Design Decisions on the Physical Network Infrastructure for NSX-T Data Center

Decision ID

Design Decision

Design Justification

Design Implication

VCF-WLD-NSX-PHY-001

Use two ToR switches for each rack.

Supports the use of two 10 GbE (25 GbE or greater recommended) links to each server, provides redundancy, and reduces the overall design complexity.

Requires two ToR switches per rack, which can increase costs.

VCF-WLD-NSX-PHY-002

Implement the following physical network architecture:

  • One 25 GbE (10 GbE minimum) port on each ToR switch for ESXi host uplinks.

  • Layer 3 device that supports BGP.

  • Guarantees availability during a switch failure.

  • Provides support for BGP as the only dynamic routing protocol that is supported by NSX Federation.

  • Might limit the hardware choices.

  • Requires dynamic routing protocol configuration in the physical network.

VCF-WLD-NSX-PHY-003

Do not use EtherChannel (LAG, LACP, or vPC) configuration for ESXi host uplinks.

  • Simplifies configuration of top of rack switches.

  • Teaming options available with vSphere Distributed Switch and N-VDS provide load balancing and failover.

  • EtherChannel implementations might have vendor-specific limitations.

None.

VCF-WLD-NSX-PHY-004

Use a physical network that is configured for BGP routing adjacency.

  • Supports flexibility in network design for routing multi-site and multi-tenancy workloads.

  • Uses BGP as the only dynamic routing protocol that is supported by NSX-T Data Center.

  • Supports failover between ECMP Edge uplinks.

Requires BGP configuration in the physical network.

Table 2. Design Decisions on Access Ports for NSX-T Data Center

Decision ID

Design Decision

Design Justification

Design Implication

VCF-WLD-NSX-PHY-005

Assign persistent IP configurations to each management component in the SDDC, with the exception of NSX tunnel endpoints (TEPs), which use dynamic IP allocation.

Ensures that endpoints have a persistent management IP address. In VMware Cloud Foundation, you assign storage (vSAN and NFS) and vSphere vMotion IP configurations by using user-defined network pools.

Requires precise IP address management.

VCF-WLD-NSX-PHY-006

Set the lease duration for the DHCP scope for the host overlay network to at least 7 days.

IP addresses of the host overlay VMkernel ports are assigned by using a DHCP server.

  • Host overlay VMkernel ports do not have an administrative endpoint. As a result, they can use DHCP for automatic IP address assignment. IP pools are an option, but the NSX-T Data Center administrator must create them. If you must change or expand the subnet, changing the DHCP scope is simpler than creating an IP pool and assigning it to the ESXi hosts.

  • DHCP simplifies the configuration of the default gateway for host overlay VMkernel ports if hosts within the same cluster are on separate Layer 2 domains.

Requires configuration and management of a DHCP server.

VCF-WLD-NSX-PHY-007

Use VLANs to separate physical network functions.

  • Supports physical network connectivity without requiring many NICs.

  • Isolates the different network functions of the SDDC so that you can have differentiated services and prioritized traffic as needed.

Requires uniform configuration and presentation on all the trunks that are made available to the ESXi hosts.

Table 3. Design Decisions on Jumbo Frames for NSX-T Data Center

Decision ID

Design Decision

Design Justification

Design Implication

VCF-WLD-NSX-PHY-008

Set the MTU size to at least 1700 bytes (recommended 9000 bytes for jumbo frames) on the physical switch ports, vSphere Distributed Switches, vSphere Distributed Switch port groups, and N-VDS switches that support the following traffic types:

  • Geneve (overlay)

  • vSAN

  • vSphere vMotion

  • NFS

  • Improves traffic throughput.

  • Supports Geneve by increasing the MTU size to a minimum of 1600 bytes.

  • Geneve is an extensible protocol. The MTU size might increase with future capabilities. While 1600 is sufficient, an MTU size of 1700 bytes provides more room for increasing the Geneve MTU size without the need to change the MTU size of the physical infrastructure.

When adjusting the MTU size, you must also configure the entire network path (VMkernel ports, virtual switches, physical switches, and routers) to support the same MTU size.
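The MTU figures in VCF-WLD-NSX-PHY-008 follow from the Geneve encapsulation overhead. The following Python sketch shows the arithmetic only and is not a VMware tool. It assumes an IPv4 underlay without IP options, a 1500-byte guest MTU, and no VLAN tag on the inner frame. The function names `required_underlay_mtu` and `option_headroom` are hypothetical.

```python
# Fixed header sizes in bytes for a Geneve-encapsulated frame on an IPv4 underlay.
OUTER_IPV4 = 20      # outer IPv4 header without IP options (assumption)
OUTER_UDP = 8        # outer UDP header
GENEVE_BASE = 8      # fixed part of the Geneve header; options are variable length
INNER_ETHERNET = 14  # inner Ethernet header (assumes no inner VLAN tag)


def required_underlay_mtu(guest_mtu: int = 1500, geneve_options: int = 0) -> int:
    """Return the smallest underlay MTU that carries a full-size guest packet."""
    return (
        guest_mtu + INNER_ETHERNET + GENEVE_BASE + geneve_options
        + OUTER_UDP + OUTER_IPV4
    )


def option_headroom(underlay_mtu: int, guest_mtu: int = 1500) -> int:
    """Return how many bytes of Geneve options fit at a given underlay MTU."""
    return underlay_mtu - required_underlay_mtu(guest_mtu)


if __name__ == "__main__":
    for mtu in (1600, 1700):
        print(f"underlay MTU {mtu}: {option_headroom(mtu)} bytes left for Geneve options")
```

Under these assumptions, an underlay MTU of 1600 bytes leaves 50 bytes for Geneve options and 1700 bytes leaves 150 bytes, which is the additional room that the justification refers to.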

Table 4. Design Decisions on the Physical Network for Multiple Availability Zones for NSX-T Data Center

Decision ID

Design Decision

Design Justification

Design Implication

VCF-WLD-NSX-PHY-009

Set the MTU size to at least 1700 bytes (recommended 9000 bytes for jumbo frames) on the physical inter-availability zone networking components that are part of the networking path between availability zones for the following traffic types:

  • Geneve (overlay)

  • vSAN

  • vSphere vMotion

  • NFS

  • Improves traffic throughput.

  • Geneve packets are tagged as do not fragment.

  • For optimal performance, provides a consistent MTU size across the environment.

  • Geneve is an extensible protocol. The MTU size might increase with future capabilities. While 1600 is sufficient, an MTU size of 1700 bytes provides more room for increasing the Geneve MTU size without the need to change the MTU size of the physical infrastructure.

When adjusting the MTU size, you must also configure the entire network path (VMkernel ports, virtual switches, physical switches, and routers) to support the same MTU size.

In multi-AZ deployments, the MTU must be configured on the entire network path between AZs.

VCF-WLD-NSX-PHY-010

Configure VRRP, HSRP, or another Layer 3 gateway availability method.

Ensures that the VLANs that are stretched between availability zones are connected to a highly available gateway if a failure of an availability zone occurs. Otherwise, a failure in the Layer 3 gateway will cause disruption in traffic in the SDN setup.

Requires configuration of a high availability technology for the Layer 3 gateways in the data center.
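The implication of VCF-WLD-NSX-PHY-009, that every component on the path between availability zones must support the same MTU, can be checked with a small script. The following Python sketch is one possible approach, not a VMware utility. The hop names and MTU values are invented for illustration, and in practice you would collect the values from your own switches and hosts.

```python
def effective_path_mtu(path_mtus: dict[str, int]) -> int:
    """The usable MTU of a path is the smallest MTU of any hop on it."""
    return min(path_mtus.values())


def undersized_hops(path_mtus: dict[str, int], required_mtu: int = 1700) -> list[str]:
    """Return the names of hops whose MTU is below the required value."""
    return [hop for hop, mtu in path_mtus.items() if mtu < required_mtu]


# Hypothetical inter-availability-zone path; hop names and values are made up.
example_path = {
    "az1-vmkernel": 9000,
    "az1-vds": 9000,
    "az1-tor": 9216,
    "inter-az-router": 1500,
    "az2-tor": 9216,
    "az2-vds": 9000,
    "az2-vmkernel": 9000,
}

if __name__ == "__main__":
    print("effective path MTU:", effective_path_mtu(example_path))
    print("hops below 1700:", undersized_hops(example_path))
```

In this made-up path, a single router left at 1500 bytes reduces the effective MTU of the whole path to 1500, even though every other hop is configured for jumbo frames.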

Table 5. Design Decisions on the Physical Network Infrastructure between VMware Cloud Foundation Instances

Decision ID

Design Decision

Design Justification

Design Implication

VCF-WLD-NSX-PHY-011

Set the MTU size to at least 1,500 bytes (1,700 bytes preferred, 9,000 bytes recommended for jumbo frames) on the physical inter-instance network components that are part of the network path between VMware Cloud Foundation instances for edge RTEP traffic.

  • Jumbo frames are not required between regions. However, increased MTU improves traffic throughput.

  • Increasing the RTEP MTU to 1,700 bytes minimizes fragmentation for standard size workload packets between VMware Cloud Foundation instances.

When adjusting the MTU packet size, you must also configure the entire network path, that is, virtual interfaces, virtual switches, physical switches, and routers to support the same MTU packet size.

VCF-WLD-NSX-PHY-012

Provide a connection between VMware Cloud Foundation instances that is capable of routing between each NSX Manager cluster.

Configuring NSX Federation requires connectivity between NSX Global Managers, NSX Local Managers, and NSX Edge clusters.

Requires unique routable IP addresses for each region.

VCF-WLD-NSX-PHY-013

Ensure that latency between regions is less than 150 ms.

A latency below 150 ms is required for the following features:

  • Cross vCenter Server vMotion

  • NSX-T Data Center design

None.

VCF-WLD-NSX-PHY-014

Provide BGP routing between all VMware Cloud Foundation instances.

Automated failover of networks requires a dynamic routing protocol, such as BGP.

None.
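The 150 ms limit in VCF-WLD-NSX-PHY-013 can be verified against measured values before you connect two instances. The following Python sketch is a minimal example under stated assumptions: the samples are round-trip times in milliseconds that you collected with your own tooling, and the check uses the worst sample as a conservative choice. The function name `latency_ok` is hypothetical.

```python
LATENCY_LIMIT_MS = 150.0  # threshold from VCF-WLD-NSX-PHY-013


def latency_ok(rtt_samples_ms: list[float], limit_ms: float = LATENCY_LIMIT_MS) -> bool:
    """Return True only if every measured sample is below the limit."""
    if not rtt_samples_ms:
        raise ValueError("at least one latency sample is required")
    # Using the worst sample is a conservative choice made for this sketch.
    return max(rtt_samples_ms) < limit_ms


if __name__ == "__main__":
    samples = [41.2, 39.8, 62.5, 44.0]  # made-up measurements
    print("within limit:", latency_ok(samples))
```

Whether you evaluate the worst sample, an average, or a percentile is an operational decision that this document does not specify.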

NSX Manager Deployment Specification

Table 6. Design Decisions on NSX Manager Deployment Type

Decision ID

Design Decision

Design Justification

Design Implication

VCF-WLD-NSX-CFG-001

Deploy three NSX Manager nodes for the VI workload domain in the first cluster in the management domain for configuring and managing the network services for customer workloads.

Customer workloads can be placed on isolated virtual networks, using load balancing, logical switching, dynamic routing, and logical firewalls services.

None.

Table 7. Design Decisions on Sizing Resources for NSX Manager

Decision ID

Design Decision

Design Justification

Design Implication

VCF-WLD-NSX-CFG-002

Deploy each node in the NSX Manager cluster for the workload domain as a large-size appliance.

A large-size appliance is sufficient for providing network services to the SDDC tenant workloads.

You must provide enough compute and storage resources in the management domain to support this NSX Manager cluster.

Table 8. Design Decisions on High Availability for NSX Manager

Decision ID

Design Decision

Design Justification

Design Implication

VCF-WLD-NSX-CFG-003

Create a virtual IP (VIP) address for the NSX Manager cluster for the VI workload domain.

Provides high availability of the user interface and API of NSX Manager.

  • The VIP address feature provides high availability only. It does not load balance requests across the cluster.

  • When using the VIP address feature, all NSX Manager nodes must be deployed on the same Layer 2 network.

VCF-WLD-NSX-CFG-004

Apply VM-VM anti-affinity rules in vSphere Distributed Resource Scheduler (vSphere DRS) to the NSX Manager appliances.

Keeps the NSX Manager appliances running on different ESXi hosts for high availability.

  • You must allocate at least four physical hosts so that the three NSX Manager appliances continue running if an ESXi host failure occurs.

  • You must perform additional configuration for the anti-affinity rules.

VCF-WLD-NSX-CFG-005

In vSphere HA, set the restart priority policy for each NSX Manager appliance to high.

  • NSX Manager implements the control plane for virtual network segments. If the NSX Manager cluster is restarted, applications that are connected to NSX VLAN-backed or overlay-backed segments lose connectivity only for a short time until the control plane quorum is re-established.

  • Setting the restart priority to high reserves the highest priority for future needs.

If the restart priority for another management appliance is set to highest, the connectivity delays for services will be longer.
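The host count in VCF-WLD-NSX-CFG-004 can be reasoned about with a simple placement model. The following Python sketch is illustrative only and does not call any vSphere API. The appliance and host names are hypothetical, and the model ignores host capacity and other DRS constraints.

```python
from collections import Counter


def anti_affinity_violations(placement: dict[str, str]) -> list[str]:
    """Return the hosts that run more than one appliance of the group."""
    per_host = Counter(placement.values())
    return sorted(host for host, count in per_host.items() if count > 1)


def can_restart_apart(placement: dict[str, str], hosts: set[str], failed_host: str) -> bool:
    """Return True if, after one host fails, enough hosts remain
    to keep every appliance on a host of its own."""
    surviving = hosts - {failed_host}
    return len(surviving) >= len(placement)


# Hypothetical placement of the three NSX Manager appliances.
manager_placement = {
    "nsx-mgr-a": "esxi01",
    "nsx-mgr-b": "esxi02",
    "nsx-mgr-c": "esxi03",
}

if __name__ == "__main__":
    four_hosts = {"esxi01", "esxi02", "esxi03", "esxi04"}
    three_hosts = {"esxi01", "esxi02", "esxi03"}
    print(can_restart_apart(manager_placement, four_hosts, "esxi01"))   # four-host cluster
    print(can_restart_apart(manager_placement, three_hosts, "esxi01"))  # three-host cluster
```

With four hosts, three hosts survive a single failure, so the three appliances can still run apart. With only three hosts, two survive, and the anti-affinity rule cannot be satisfied for all three appliances.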

Table 9. Design Decisions on High Availability for NSX Manager for Multiple Availability Zones

Decision ID

Design Decision

Design Justification

Design Implication

VCF-WLD-NSX-CFG-006

Add the NSX Manager appliances to the virtual machine group for the first availability zone.

Ensures that, by default, the NSX Manager appliances are powered on within the primary availability zone hosts group.

None.

NSX Manager Network Design

Table 10. Design Decisions on the Network Segment for NSX Manager

Decision ID

Design Decision

Design Justification

Design Implication

VCF-WLD-NSX-PHY-007

Place the appliances of the NSX Manager cluster on the management VLAN in the management domain.

  • Provides direct secure connection to the ESXi hosts and vCenter Server for edge node management and distributed network services.

  • Reduces the number of required VLANs because a single VLAN can be allocated to both vCenter Server and NSX-T Data Center.

None.

Table 11. Design Decisions on the IP Addressing Scheme for NSX Manager

Decision ID

Design Decision

Design Justification

Design Implication

VCF-WLD-NSX-NET-001

Allocate a statically assigned IP address and host name to the nodes of the NSX Manager cluster.

Ensures stability across the SDDC, makes it simpler to maintain and track, and to implement a DNS configuration.

Requires precise IP address management.

Table 12. Design Decisions on Name Resolution for NSX Manager

Decision ID

Design Decision

Design Justification

Design Implication

VCF-WLD-NSX-NET-002

Configure forward and reverse DNS records for the nodes of the NSX Manager cluster for the VI workload domain.

The NSX Manager nodes and VIP address are accessible by using fully qualified domain names instead of by using IP addresses only.

You must provide DNS records for the NSX Manager nodes for the VI workload domain in each VMware Cloud Foundation instance.
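The forward and reverse records from VCF-WLD-NSX-NET-002 can be verified before deployment. The following Python sketch uses the standard library `socket.gethostbyname` and `socket.gethostbyaddr` calls. The function name `check_dns`, the host name, and the IP address in the usage example are hypothetical, and the resolver functions are passed as parameters only so that the check can be exercised without a live DNS server.

```python
import socket
from typing import Callable


def check_dns(
    fqdn: str,
    expected_ip: str,
    forward: Callable[[str], str] = socket.gethostbyname,
    reverse: Callable[[str], tuple] = socket.gethostbyaddr,
) -> dict[str, bool]:
    """Check that fqdn resolves to expected_ip and that expected_ip resolves back to fqdn."""
    try:
        forward_ok = forward(fqdn) == expected_ip
    except OSError:  # socket.gaierror and socket.herror are subclasses of OSError
        forward_ok = False
    try:
        # gethostbyaddr returns (primary_name, alias_list, address_list).
        reverse_name = reverse(expected_ip)[0]
        reverse_ok = reverse_name.rstrip(".").lower() == fqdn.rstrip(".").lower()
    except OSError:
        reverse_ok = False
    return {"forward": forward_ok, "reverse": reverse_ok}


if __name__ == "__main__":
    # Hypothetical record; replace with the names and addresses from your own plan.
    print(check_dns("nsx-mgr-a.example.com", "192.0.2.10"))
```

Run the check for each NSX Manager node and for the cluster VIP address, because the justification expects all of them to be reachable by fully qualified domain name.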

Table 13. Design Decisions on Time Synchronization for NSX Manager

Decision ID

Design Decision

Design Justification

Design Implication

VCF-WLD-NSX-NET-003

Configure NTP on each NSX Manager appliance.

NSX Manager depends on time synchronization.

None.

NSX Global Manager Deployment Specification

Table 14. Design Decisions on the Deployment Type of NSX Global Manager

Decision ID

Design Decision

Design Justification

Design Implication

VCF-WLD-NSX-FED-CFG-001

Deploy three NSX Global Manager nodes for the VI workload domain in the default management cluster.

Some customer workloads must be placed on isolated virtual networks, using load balancing, logical switching, dynamic routing, and logical firewalls services.

  • You must turn on vSphere HA in the default management cluster.

  • The default management cluster requires four physical ESXi hosts for vSphere HA and for high availability of the NSX Manager cluster.

Table 15. Design Decisions on Sizing Resources for NSX Global Manager

Decision ID

Design Decision

Design Justification

Design Implication

VCF-WLD-NSX-FED-CFG-002

Deploy each node in the NSX Global Manager cluster for the VI workload domain as a large-size appliance.

A large-size appliance is sufficient for providing network services to the SDDC customer workloads.

You must provide enough compute and storage resources in the management domain to support this NSX Global Manager cluster.

If you extend the workload domain, increasing the size of the NSX Global Manager appliances might be required.

Table 16. Design Decisions on High Availability Configuration for NSX Global Manager

Decision ID

Design Decision

Design Justification

Design Implication

VCF-WLD-NSX-FED-CFG-003

Create a virtual IP (VIP) address for the NSX Global Manager cluster for the VI workload domain.

Provides high availability of the user interface and API of NSX Global Manager.

  • The VIP address feature provides high availability only. It does not load-balance requests across the cluster.

  • When using the VIP address feature, all NSX Global Manager nodes must be deployed on the same Layer 2 network.

VCF-WLD-NSX-FED-CFG-004

Apply VM-VM anti-affinity rules in vSphere Distributed Resource Scheduler (vSphere DRS) to the NSX Global Manager appliances.

Keeps the NSX Global Manager appliances running on different ESXi hosts for high availability.

  • You must allocate at least four physical hosts in the management domain so that the three NSX Global Manager appliances continue running if an ESXi host failure occurs.

  • You must perform additional configuration for the anti-affinity rules.

VCF-WLD-NSX-FED-CFG-005

In vSphere HA, set the restart priority policy for each NSX Global Manager appliance to medium.

  • NSX Global Manager implements the management plane for global segments and firewalls.

    NSX Global Manager is not required for control plane and data plane connectivity.

  • Setting the restart priority to medium reserves the high priority for services that impact the NSX control or data planes.

  • Management of NSX global components will be unavailable until at least one NSX Global Manager virtual machine restarts.

  • The NSX Global Manager cluster is deployed in the management domain, where the total number of virtual machines is limited and where it competes with other management components for restart priority.
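The restart priorities assigned in this design determine the order in which vSphere HA brings the NSX appliances back after a host failure. The following Python sketch is a simplified model of that ordering, not a vSphere API call. It ignores VM dependencies and restart conditions, and the function name `restart_groups` is hypothetical.

```python
# vSphere HA restart priority levels, from first restarted to last.
PRIORITY_ORDER = ["highest", "high", "medium", "low", "lowest"]

# Restart priorities taken from the design decisions in this document.
appliances = {
    "NSX Manager": "high",           # VCF-WLD-NSX-CFG-005
    "NSX Edge": "high",              # VCF-WLD-NSX-EDGE-CFG-007
    "NSX Global Manager": "medium",  # VCF-WLD-NSX-FED-CFG-005
}


def restart_groups(vm_priorities: dict[str, str]) -> list[list[str]]:
    """Group VMs by restart priority, highest first; empty groups are omitted."""
    groups = []
    for level in PRIORITY_ORDER:
        members = sorted(vm for vm, prio in vm_priorities.items() if prio == level)
        if members:
            groups.append(members)
    return groups


if __name__ == "__main__":
    for position, group in enumerate(restart_groups(appliances), start=1):
        print(position, group)
```

In this model, NSX Manager and NSX Edge restart in the first group and NSX Global Manager in the second. Adding any virtual machine with the highest priority moves all NSX appliances back by one group, which is the delay that the design implications describe.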

Table 17. Design Decisions on High Availability for NSX Global Manager for Multiple Availability Zones

Decision ID

Design Decision

Design Justification

Design Implication

VCF-WLD-NSX-FED-CFG-006

Add the NSX Global Manager appliances to the virtual machine group for the first availability zone.

Ensures that, by default, the NSX Global Manager appliances are powered on within the first availability zone.

None.

Table 18. Design Decisions on High Availability Configuration for NSX Global Manager

Decision ID

Design Decision

Design Justification

Design Implication

VCF-WLD-NSX-FED-CFG-007

Deploy an additional NSX Global Manager cluster in the second VMware Cloud Foundation instance.

Enables recoverability of NSX Global Manager in a second VMware Cloud Foundation instance if a failure in the first instance occurs.

Requires additional NSX Global Manager nodes in the VMware Cloud Foundation instance.

VCF-WLD-NSX-FED-CFG-008

Set the NSX Global Manager cluster in the second VMware Cloud Foundation instance as standby for the VI workload domain.

Enables recoverability of the NSX Global Manager in a second VMware Cloud Foundation instance if a failure in the first instance occurs.

None.

NSX Global Manager Network Design

Table 19. Design Decisions on the Network Segment for NSX Global Manager

Decision ID

Design Decision

Design Justification

Design Implication

VCF-WLD-NSX-FED-NET-001

Place the appliances of the NSX Global Manager cluster on the management VLAN network in the default management cluster in the management domain.

Reduces the number of required VLANs because a single VLAN can be allocated to both vCenter Server and NSX-T Data Center.

None.

Table 20. Design Decisions on the IP Addressing Scheme for NSX Global Manager

Decision ID

Design Decision

Design Justification

Design Implication

VCF-WLD-NSX-FED-NET-002

Allocate a statically assigned IP address and host name to the nodes of the NSX Global Manager cluster.

Ensures stability across the SDDC, makes it simpler to maintain and track, and to implement a DNS configuration.

Requires precise IP address management.

Table 21. Design Decisions on Name Resolution for NSX Global Manager

Decision ID

Design Decision

Design Justification

Design Implication

VCF-WLD-NSX-FED-NET-003

Configure forward and reverse DNS records for the nodes of the NSX Global Manager cluster for the VI workload domain, assigning the record to the child domain in the region.

The NSX Global Manager nodes and VIP address are accessible by using fully qualified domain names instead of by using IP addresses only.

You must provide DNS records for the NSX Global Manager nodes for the VI workload domain in each VMware Cloud Foundation instance.

Table 22. Design Decisions on Time Synchronization for NSX Global Manager

Decision ID

Design Decision

Design Justification

Design Implication

VCF-WLD-NSX-FED-NET-004

Configure NTP on each NSX Global Manager appliance.

NSX Global Manager depends on time synchronization across all SDDC components.

None.

NSX Edge Deployment Specification

Table 23. Design Decisions on the Form Factor and Sizing for the NSX Edge Nodes

Decision ID

Design Decision

Design Justification

Design Implication

VCF-WLD-NSX-EDGE-CFG-001

Use large-size NSX Edge virtual appliances.

The large-size appliance provides the required performance characteristics for most tenant workloads.

None.

Table 24. Design Decisions on the NSX Edge Cluster Configuration

Decision ID

Design Decision

Design Justification

Design Implication

VCF-WLD-NSX-EDGE-CFG-002

Deploy the NSX Edge virtual appliances in a shared edge and workload cluster in the VI workload domain.

  • Keeps customer network traffic local to the VI workload domain.

  • Simplifies configuration and minimizes the number of ESXi hosts required for initial deployment.

NSX Edge appliances are co-located with customer workloads. Ensure that customer workloads do not prevent NSX Edge nodes from handling network traffic.

VCF-WLD-NSX-EDGE-CFG-003

Deploy two NSX Edge appliances in the edge cluster in the shared edge and workload cluster.

Creates the edge cluster for satisfying the requirements for availability and scale.

None.

VCF-WLD-NSX-EDGE-CFG-004

Create a resource pool for the NSX Edge appliances in the root of the shared edge and workload cluster object.

Create a resource pool in the root of the shared edge and workload cluster object for customer workloads.

Guarantees that the edge cluster receives sufficient compute resources during times of contention.

  • Customer workloads must be deployed to a separate resource pool at the root of the cluster.

  • To ensure adequate resources for customer and control plane workloads, the root of the cluster must not run any virtual machines.

  • Customer workloads might not be able to use their allocated memory during times of contention.

VCF-WLD-NSX-EDGE-CFG-005

Configure the edge resource pool with a 64-GB memory reservation and high CPU share value.

  • Guarantees that the edge cluster receives sufficient memory resources during times of contention.

  • Setting the CPU share value to high gives priority to edge appliances in times of CPU contention.

Customer workloads might not be able to use their allocated memory or CPU during times of contention.

VCF-WLD-NSX-EDGE-CFG-006

Apply VM-VM anti-affinity rules for vSphere DRS to the virtual machines of the NSX Edge cluster.

Keeps the NSX Edge nodes running on different ESXi hosts for high availability.

None.

VCF-WLD-NSX-EDGE-CFG-007

In vSphere HA, set the restart priority policy for each NSX Edge appliance to high.

  • The NSX Edge nodes are part of the north-south data path for overlay segments. vSphere HA restarts the NSX Edge appliances first so that other virtual machines that are being powered on or migrated by using vSphere vMotion while the edge nodes are offline lose connectivity only for a short time.

  • Setting the restart priority to high reserves the highest priority for future needs.

If the restart priority for another customer workload is set to highest, the connectivity delays for other virtual machines will be longer.

VCF-WLD-NSX-EDGE-CFG-008

Configure all edge nodes as transport nodes.

Enables the participation of edge nodes in the overlay network for delivery of services to the SDDC workloads such as routing and load balancing.

None.

VCF-WLD-NSX-EDGE-CFG-009

Create an NSX Edge cluster with the default Bidirectional Forwarding Detection (BFD) configuration between the NSX Edge nodes in the cluster.

  • Satisfies the availability requirements by default.

  • Edge nodes must remain available to create services such as NAT, routing to physical networks, and load balancing.

None.
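The 64-GB memory reservation in VCF-WLD-NSX-EDGE-CFG-005 is consistent with fully reserving memory for the two edge appliances from VCF-WLD-NSX-EDGE-CFG-003. The following Python sketch shows that arithmetic. The value of 32 GB per large-size NSX Edge appliance is an assumption that this document does not state, so verify it against the sizing documentation for your NSX-T Data Center version. The function names are hypothetical.

```python
EDGE_NODE_COUNT = 2                # VCF-WLD-NSX-EDGE-CFG-003
ASSUMED_LARGE_EDGE_MEMORY_GB = 32  # assumption, not stated in this document
POOL_RESERVATION_GB = 64           # VCF-WLD-NSX-EDGE-CFG-005


def pool_reservation_gb(node_count: int, memory_per_node_gb: int) -> int:
    """Memory needed to fully reserve every appliance in the pool."""
    return node_count * memory_per_node_gb


def reservation_covers(reservation_gb: int, node_count: int, memory_per_node_gb: int) -> bool:
    """Return True if the pool reservation fully backs all appliances."""
    return reservation_gb >= pool_reservation_gb(node_count, memory_per_node_gb)


if __name__ == "__main__":
    print(pool_reservation_gb(EDGE_NODE_COUNT, ASSUMED_LARGE_EDGE_MEMORY_GB))
    print(reservation_covers(POOL_RESERVATION_GB, EDGE_NODE_COUNT, ASSUMED_LARGE_EDGE_MEMORY_GB))
```

Under the stated assumption, two appliances need 64 GB. If you add edge nodes to the cluster, the same check shows that the reservation would have to grow accordingly.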

Table 25. Design Decisions on High Availability of the NSX Edge Nodes for Multiple Availability Zones

Decision ID

Design Decision

Design Justification

Design Implication

VCF-WLD-NSX-EDGE-CFG-010

Add the NSX Edge appliances to the virtual machine group for the first availability zone.

Ensures that, by default, the NSX Edge appliances are powered on within the primary availability zone hosts group.

None.

NSX Edge Network Design

Table 26. Design Decisions on the Network Configuration of the NSX Edge Appliances

Decision ID

Design Decision

Design Justification

Design Implication

VCF-WLD-NSX-EDGE-NET-001

Connect the management interface eth0 of each NSX Edge node to the management VLAN.

Provides connection to the NSX Manager cluster.

None.

VCF-WLD-NSX-EDGE-NET-002

  • Connect the fp-eth0 interface of each NSX Edge appliance to a VLAN trunk port group pinned to physical NIC 0 of the host.

  • Connect the fp-eth1 interface of each NSX Edge appliance to a VLAN trunk port group pinned to physical NIC 1 of the host.

  • Leave the fp-eth2 interface of each NSX Edge appliance unused.

Because VLAN trunk port groups pass traffic for all VLANs, VLAN tagging can occur in the NSX Edge node itself for easy post-deployment configuration.

  • By using two separate VLAN trunk port groups, you can direct traffic from the NSX Edge node to a particular host network interface and top of rack switch as needed.

  • If a top of rack switch fails, the VLAN trunk port group fails over to the other physical NIC so that both fp-eth0 and fp-eth1 remain available.

None.

VCF-WLD-NSX-EDGE-NET-003

Use a single N-VDS in the NSX Edge nodes.

  • Simplifies deployment of the edge nodes.

  • The same N-VDS switch design can be used regardless of edge form factor.

  • Supports multiple TEP interfaces in the edge node.

  • vSphere Distributed Switch is not supported in the edge node.

None.

VCF-WLD-NSX-EDGE-NET-004

Use a dedicated VLAN for the edge overlay network that is segmented from the host overlay VLAN.

The edge overlay network must be isolated from the host overlay network to protect the host overlay traffic from edge-generated overlay traffic.

  • You must have a route between the VLANs for edge overlay and host overlay.

  • You must allocate another VLAN in the data center infrastructure for NSX edge overlay traffic.

Table 27. Design Decisions on the Network Configuration of the NSX Edge Appliances for an Environment with Multiple VMware Cloud Foundation Instances

Decision ID

Design Decision

Design Justification

Design Implication

VCF-WLD-NSX-EDGE-NET-005

Allocate a separate VLAN for edge RTEP overlay that is different from the edge overlay VLAN.

  • The RTEP network must be on a VLAN that is different from the edge overlay VLAN.

  • Provides a dedicated VLAN for inter-site communication.

You must allocate another VLAN in the data center infrastructure for edge RTEP overlay.

NSX Edge Uplink Policy Design

Table 28. Design Decisions on the NSX Edge Uplink Policy

Decision ID

Design Decision

Design Justification

Design Implication

VCF-WLD-NSX-EDGE-NET-006

Create one uplink profile for the edge node with three teaming policies.

  • Default teaming policy of load balance source with both uplinks, uplink-1 and uplink-2, active.

  • Named teaming policy of failover order with a single active uplink uplink-1 without standby uplinks.

  • Named teaming policy of failover order with a single active uplink uplink-2 without standby uplinks.

  • An NSX Edge node that uses a single N-VDS can have only one uplink profile.

  • For increased resiliency and performance, supports the concurrent use of both edge uplinks through both physical NICs on the ESXi hosts.

  • The default teaming policy increases overlay performance and availability by using multiple Host Overlay VMkernel ports and appropriate balancing of overlay traffic.

  • By using named teaming policies, you can connect an edge uplink to a specific host uplink and from there to a specific top of rack switch in the data center.

  • Enables ECMP in each availability zone because the NSX Edge nodes can uplink to the physical network over two different VLANs.

You can use this policy only with ESXi hosts. Edge virtual machines must use the failover order teaming policy.
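The uplink profile from VCF-WLD-NSX-EDGE-NET-006 can be modeled as a small data structure to make the three teaming policies explicit. The following Python sketch is a plain data model for illustration and does not reproduce the NSX-T Data Center API schema. The class names, the mode strings, and the policy names `uplink1-only` and `uplink2-only` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TeamingPolicy:
    name: str
    mode: str                       # "load-balance-source" or "failover-order"
    active: tuple[str, ...]
    standby: tuple[str, ...] = ()


@dataclass(frozen=True)
class UplinkProfile:
    default: TeamingPolicy
    named: tuple[TeamingPolicy, ...] = ()


edge_profile = UplinkProfile(
    default=TeamingPolicy("default", "load-balance-source", ("uplink-1", "uplink-2")),
    named=(
        TeamingPolicy("uplink1-only", "failover-order", ("uplink-1",)),
        TeamingPolicy("uplink2-only", "failover-order", ("uplink-2",)),
    ),
)


def matches_design(profile: UplinkProfile) -> bool:
    """Check a profile against the three policies in VCF-WLD-NSX-EDGE-NET-006."""
    default_ok = (
        profile.default.mode == "load-balance-source"
        and set(profile.default.active) == {"uplink-1", "uplink-2"}
    )
    # Exactly one named policy per uplink, each failover order without standby uplinks.
    named_ok = sorted(p.active for p in profile.named) == [("uplink-1",), ("uplink-2",)] and all(
        p.mode == "failover-order" and not p.standby for p in profile.named
    )
    return default_ok and named_ok


if __name__ == "__main__":
    print(matches_design(edge_profile))
```

The default policy carries overlay traffic over both uplinks, while each named policy pins a VLAN segment to exactly one uplink, which is what allows an edge uplink to be tied to a specific top of rack switch.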

Life Cycle Management Design

Table 29. Design Decisions on Life Cycle Management of NSX-T Data Center

Decision ID

Design Decision

Design Justification

Design Implication

VCF-WLD-NSX-LCM-001

Use SDDC Manager to perform the life cycle management of NSX Manager and related components in the workload domain.

Because the deployment scope of SDDC Manager covers the full SDDC stack, SDDC Manager performs patching, update, or upgrade of the workload domain as a single process.

The operations team must understand and be aware of the impact of a patch, update, or upgrade operation by using SDDC Manager.

Table 30. Design Decisions on Life Cycle Management of NSX-T Data Center for Multiple VMware Cloud Foundation Instances

Decision ID

Design Decision

Design Justification

Design Implication

VCF-WLD-NSX-LCM-FED-001

Use the upgrade coordinator in NSX-T Data Center to perform life cycle management on the NSX Global Manager appliances.

The version of SDDC Manager in this design is not currently capable of life cycle operations (patching, update, or upgrade) for NSX Global Manager.

You must always align the version of the NSX Global Manager nodes with the rest of the SDDC stack in VMware Cloud Foundation.

You must explicitly plan upgrades of the NSX Global Manager nodes. An upgrade of the NSX Global Manager nodes might require a cascading upgrade of the NSX Local Manager nodes and underlying SDDC Manager infrastructure prior to the upgrade of the NSX Global Manager nodes.

An upgrade of the VI workload domain from SDDC Manager might include an upgrade of the NSX Local Manager cluster which might require an upgrade of the NSX Global Manager cluster. An upgrade of NSX Global Manager might then require that you upgrade all other VI workload domains connected to it before you can proceed with upgrading the NSX Global Manager instance.

VCF-WLD-NSX-LCM-FED-002

Establish an operations practice to ensure that prior to the upgrade of any VI workload domain, the impact of any version upgrades is evaluated against the need to upgrade NSX Global Manager.

The versions of NSX Global Manager and NSX Local Manager nodes must be compatible with each other.

Because the version of SDDC Manager in this design does not provide life cycle operations (patching, update, or upgrade) for the NSX Global Manager nodes, upgrade to an unsupported version cannot be prevented.

The administrator must establish and follow an operational practice by using a runbook or automated process to ensure a fully supported and compliant bill of materials prior to any upgrade operation.

VCF-WLD-NSX-LCM-FED-003

Establish an operations practice to ensure that prior to the upgrade of the NSX Global Manager, the impact of any version change is evaluated against the existing NSX Local Manager nodes and VI workload domains.

The versions of NSX Global Manager and NSX Local Manager nodes must be compatible with each other.

Because the version of SDDC Manager in this design does not provide life cycle operations (patching, update, or upgrade) for the NSX Global Manager nodes, upgrade to an unsupported version cannot be prevented.

The administrator must establish and follow an operational practice by using a runbook or automated process to ensure a fully supported and compliant bill of materials prior to any upgrade operation.
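The automated process mentioned in VCF-WLD-NSX-LCM-FED-002 and VCF-WLD-NSX-LCM-FED-003 could include a pre-upgrade compatibility check. The following Python sketch shows one possible shape of such a check. The version labels and the compatibility matrix are placeholders invented for illustration and are not real NSX-T Data Center releases. In practice, you would populate the matrix from the published interoperability information for your bill of materials.

```python
# Placeholder matrix: Global Manager version -> compatible Local Manager versions.
# These labels are invented for illustration and are NOT real product versions.
COMPATIBLE = {
    "gm-v1": {"lm-v1"},
    "gm-v2": {"lm-v1", "lm-v2"},
}


def incompatible_local_managers(gm_version: str, lm_versions: dict[str, str]) -> list[str]:
    """Return the workload domains whose Local Manager version is not
    listed as compatible with the given Global Manager version."""
    allowed = COMPATIBLE.get(gm_version, set())
    return sorted(name for name, version in lm_versions.items() if version not in allowed)


def upgrade_allowed(gm_version: str, lm_versions: dict[str, str]) -> bool:
    """Return True only if every Local Manager is compatible."""
    return not incompatible_local_managers(gm_version, lm_versions)


# Hypothetical VI workload domains and their Local Manager versions.
local_managers = {"wld01": "lm-v1", "wld02": "lm-v2"}

if __name__ == "__main__":
    print(incompatible_local_managers("gm-v1", local_managers))
    print(upgrade_allowed("gm-v2", local_managers))
```

A Global Manager version that is missing from the matrix is treated as incompatible with everything, so an unknown combination blocks the upgrade rather than allowing it.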

Routing Design for a Single VMware Cloud Foundation Instance

Table 31. Design Decisions on the High Availability Mode of Tier-0 Gateways

Decision ID

Design Decision

Design Justification

Design Implication

VCF-WLD-NSX-SDN-001

Deploy an active-active Tier-0 gateway.

Supports ECMP north-south routing on all Edge nodes in the NSX Edge cluster.

Active-active Tier-0 gateways cannot provide stateful services such as NAT.

Table 32. Design Decisions on Edge Uplink Configuration for North-South Routing

Decision ID

Design Decision

Design Justification

Design Implication

VCF-WLD-NSX-SDN-002

To enable ECMP between the Tier-0 gateway and the Layer 3 devices (ToR switches or upstream devices), create two VLANs.

The ToR switches or upstream Layer 3 devices have an SVI on one of the two VLANs and each NSX Edge node in the cluster has an interface on each VLAN.

Supports multiple equal-cost routes on the Tier-0 gateway and provides more resiliency and better bandwidth use in the network.

Additional VLANs are required.

VCF-WLD-NSX-SDN-003

Assign a named teaming policy to the VLAN segments that connect to the Layer 3 device pair.

Pins the VLAN traffic on each segment to its target Edge node interface. From there the traffic is directed to the host physical NIC that is connected to the target top of rack switch.

None.

VCF-WLD-NSX-SDN-004

Create a VLAN transport zone for edge uplink traffic.

Enables the configuration of VLAN segments on the N-VDS in the edge nodes.

Additional VLAN transport zones are required if the edge nodes are not connected to the same top of rack switch pair.
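
The uplink layout in Table 32 can be illustrated by enumerating the resulting peerings. The following is a minimal sketch under the assumptions stated in the table: each ToR switch has an SVI on one of the two uplink VLANs, and each NSX Edge node has an interface on both VLANs. The node, VLAN, and switch names are hypothetical.

```python
def uplink_peerings(edge_nodes, vlan_to_tor):
    """Return one (edge node, uplink VLAN, ToR switch) tuple for every
    peering between an edge node and a ToR switch."""
    return [
        (edge, vlan, tor)
        for edge in edge_nodes
        for vlan, tor in sorted(vlan_to_tor.items())
    ]


def paths_per_edge(peerings):
    """Count the uplink paths that are available to each edge node."""
    counts = {}
    for edge, _vlan, _tor in peerings:
        counts[edge] = counts.get(edge, 0) + 1
    return counts


# Hypothetical names: two edge nodes, two uplink VLANs, one ToR per VLAN.
peerings = uplink_peerings(
    ["edge-1", "edge-2"],
    {"uplink-vlan-1": "tor-a", "uplink-vlan-2": "tor-b"},
)
for peering in peerings:
    print(peering)
print(paths_per_edge(peerings))
```

With two edge nodes and two uplink VLANs, the sketch yields four peerings, two for each edge node, which is what makes equal-cost paths available on the Tier-0 gateway.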

Table 33. Design Decisions on Dynamic Routing

Decision ID

Design Decision

Design Justification

Design Implication

VCF-WLD-NSX-SDN-005

Use BGP as the dynamic routing protocol.

  • Enables dynamic routing by using NSX-T Data Center.

  • BGP offers increased scale and flexibility.

  • BGP is a proven protocol that is designed for peering between networks under independent administrative control, in this case the data center networks and the NSX-T Data Center SDN.

  • SDDC architectures with multiple availability zones or multiple VMware Cloud Foundation Instances require BGP.

In environments where BGP cannot be used, you must configure and manage static routes.

VCF-WLD-NSX-SDN-006

Configure the BGP Keep Alive Timer to 4 and Hold Down Timer to 12 between the top of rack switches and the Tier-0 gateway.

These timers must be aligned with the data center fabric design of your organization.

Provides a balance between failure detection between the top of rack switches and the Tier-0 gateway and overburdening the top of rack switches with keep-alive traffic.

By using longer timers to detect if a router is not responding, the data about such a router remains in the routing table longer. As a result, the active router continues to send traffic to a router that is down.

VCF-WLD-NSX-SDN-007

Do not enable Graceful Restart between BGP neighbors.

Avoids loss of traffic.

On the Tier-0 gateway, BGP peers from all the gateways are always active. On a failover, the Graceful Restart capability increases the time a remote neighbor takes to select an alternate Tier-0 gateway. As a result, BFD-based convergence is delayed.

None.

VCF-WLD-NSX-SDN-008

Enable helper mode for Graceful Restart mode between BGP neighbors.

Avoids loss of traffic.

During a router restart, helper mode works with the graceful restart capability of upstream routers to maintain the forwarding table, which in turn forwards packets to a down neighbor even after the BGP timers have expired, causing loss of traffic.

None.

VCF-WLD-NSX-SDN-009

Enable Inter-SR iBGP routing.

In the event that an edge node has all of its northbound eBGP sessions down, north-south traffic will continue to flow by routing traffic to a different edge node.

None.
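
The timer values in VCF-WLD-NSX-SDN-006 follow the common BGP convention of a hold time that is three times the keep-alive interval. The following is a minimal sketch of a sanity check for a timer pair. The function name is hypothetical. The rule that a nonzero hold time must be at least 3 seconds comes from the BGP specification (RFC 4271).

```python
def check_bgp_timers(keepalive_s, hold_s):
    """Return a list of problems found in a keep-alive and hold timer pair.

    An empty list means that the pair passes both checks.
    """
    if hold_s == 0:
        # A hold time of zero disables keep-alive messages entirely.
        return []
    problems = []
    if hold_s < 3:
        problems.append("hold time must be 0 or at least 3 seconds")
    if keepalive_s * 3 > hold_s:
        problems.append("hold time is shorter than three keep-alive intervals")
    return problems


# The values from the design decision: keep-alive 4 seconds, hold 12 seconds.
print(check_bgp_timers(4, 12))
```

Without BFD, a silent neighbor failure is detected only when the hold timer expires, so with these values detection can take up to 12 seconds.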

Table 34. Design Decisions on the Tier-1 Gateway Configuration

Decision ID

Design Decision

Design Justification

Design Implication

VCF-WLD-NSX-SDN-010

Deploy a Tier-1 gateway and connect it to the Tier-0 gateway.

Creates a two-tier routing architecture.

Abstracts the NSX logical components which interact with the physical data center from the logical components which provide SDN services.

A Tier-1 gateway can only be connected to a single Tier-0 gateway.

In cases where multiple Tier-0 gateways are required, you must create multiple Tier-1 gateways.

VCF-WLD-NSX-SDN-011

Deploy a Tier-1 gateway to the NSX Edge cluster.

Enables stateful services, such as load balancers and NAT, for SDDC management components.

Because a Tier-1 gateway always works in active-standby mode, the gateway supports stateful services.

None.

VCF-WLD-NSX-SDN-012

Deploy a Tier-1 gateway in non-preemptive failover mode.

Ensures that after a failed NSX Edge transport node is back online, it does not take over the gateway services, which would cause a short service outage.

None.

VCF-WLD-NSX-SDN-013

Enable standby relocation of the Tier-1 gateway.

Ensures that if an edge failure occurs, a standby Tier-1 gateway is created on another edge node.

None.
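
The effect of the non-preemptive failover mode in VCF-WLD-NSX-SDN-012 can be illustrated with a small simulation. The following is a simplified model and not the NSX implementation. The function name, the node names, and the event format are hypothetical.

```python
def simulate_failover(nodes, events, preemptive=False):
    """Replay fail and recover events for an active-standby gateway.

    nodes is ordered by preference, with the preferred node first.
    Returns the active node and the number of times that the active
    role moved. Each move stands for a short service interruption.
    """
    up = set(nodes)
    active = nodes[0]
    switchovers = 0
    for kind, node in events:
        previous = active
        if kind == "fail":
            up.discard(node)
            if node == active:
                survivors = [n for n in nodes if n in up]
                active = survivors[0] if survivors else None
        elif kind == "recover":
            up.add(node)
            if active is None:
                active = node
            elif preemptive and nodes.index(node) < nodes.index(active):
                # In preemptive mode, the preferred node takes the role back.
                active = node
        if active != previous:
            switchovers += 1
    return active, switchovers


events = [("fail", "edge-1"), ("recover", "edge-1")]
print(simulate_failover(["edge-1", "edge-2"], events, preemptive=False))
print(simulate_failover(["edge-1", "edge-2"], events, preemptive=True))
```

In this model, the non-preemptive mode moves the active role once, whereas the preemptive mode moves it a second time when the failed node returns.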

Table 35. Design Decisions on North-South Routing for Multiple Availability Zones

Decision ID

Design Decision

Design Justification

Design Implication

VCF-WLD-NSX-SDN-014

Extend the uplink VLANs to the top of rack switches so that the VLANs are stretched between both availability zones.

Because the NSX Edge nodes will fail over between the availability zones, ensures uplink connectivity to the top of rack switches in both availability zones regardless of the zone the NSX Edge nodes are presently in.

You must configure a stretched Layer 2 network between the availability zones by using physical network infrastructure.

VCF-WLD-NSX-SDN-015

Provide this SVI configuration on the top of rack switches or upstream Layer 3 devices.

  • In the second availability zone, configure the top of rack switches or upstream Layer 3 devices with an SVI on each of the two uplink VLANs.

  • Make the top of rack switch SVI in both availability zones part of a common stretched Layer 2 network between the availability zones.

Enables the communication of the NSX Edge nodes to the top of rack switches in both availability zones over the same uplink VLANs.

You must configure a stretched Layer 2 network between the availability zones by using the physical network infrastructure.

VCF-WLD-NSX-SDN-016

Provide this VLAN configuration.

  • Use two VLANs to enable ECMP between the Tier-0 gateway and the Layer 3 devices (top of rack switches or upstream devices).

  • The ToR switches or upstream Layer 3 devices have an SVI on one of the two VLANs and each NSX Edge node has an interface on each VLAN.

Supports multiple equal-cost routes on the Tier-0 gateway, and provides more resiliency and better bandwidth use in the network.

Extra VLANs are required.

Requires stretching uplink VLANs between availability zones.

VCF-WLD-NSX-SDN-017

Create an IP prefix list that permits access to route advertisement by any network instead of using the default IP prefix list.

Used in a route map to prepend a path to one or more autonomous systems (AS-path prepend) for BGP neighbors in the second availability zone.

You must manually create an IP prefix list that is identical to the default one.

VCF-WLD-NSX-SDN-018

Create a route map-out that contains the custom IP prefix list and an AS-path prepend value set to the Tier-0 local AS added twice.

  • Used for configuring neighbor relationships with the Layer 3 devices in the second availability zone.

  • Ensures that all ingress traffic passes through Availability Zone 1.

You must manually create the route map.

The two NSX Edge nodes will route north-south traffic through the second availability zone only if the connection to their BGP neighbors in the first availability zone is lost, for example, if the top of rack switch pair or the availability zone fails.

VCF-WLD-NSX-SDN-019

Create an IP prefix list that permits access to route advertisement by network 0.0.0.0/0 instead of using the default IP prefix list.

Used in a route map to configure local-preference on the learned default route for BGP neighbors in the second availability zone.

You must manually create an IP prefix list that is identical to the default one.

VCF-WLD-NSX-SDN-020

Apply a route map-in that contains the IP prefix list for the default route 0.0.0.0/0 and assign a lower local-preference, for example, 80, to the learned default route and a lower local-preference, for example, 90, to any other learned routes.

  • Used for configuring neighbor relationships with the Layer 3 devices in the second availability zone.

  • Ensures that all egress traffic passes through the first availability zone.

You must manually create the route map.

The two NSX Edge nodes will route north-south traffic through the second availability zone only if the connection to their BGP neighbors in the first availability zone is lost, for example, if the top of rack switch pair or the availability zone fails.

VCF-WLD-NSX-SDN-021

Configure the neighbors of the second availability zone to use the route maps as In and Out filters respectively.

Makes the path in and out of the second availability zone less preferred because the advertised AS path is longer and the local preference of the learned routes is lower. As a result, all traffic passes through the first zone.

The two NSX Edge nodes will route north-south traffic through the second availability zone only if the connection to their BGP neighbors in the first availability zone is lost, for example, if the top of rack switch pair or the availability zone fails.
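
The combined effect of the route maps in Table 35 can be illustrated with a simplified path selection model. The following is a minimal sketch and not the full BGP decision process: it compares only local preference and AS path length. The AS numbers and the default local preference of 100 for the first availability zone are assumptions for illustration. The value 80 for the default route learned in the second availability zone comes from VCF-WLD-NSX-SDN-020.

```python
def prepend(as_path, local_as, count):
    """Return the AS path with the local AS added count times in front."""
    return [local_as] * count + list(as_path)


def best_path(paths):
    """Prefer the highest local preference, then the shortest AS path."""
    return max(paths, key=lambda p: (p["local_pref"], -len(p["as_path"])))


TIER0_AS = 65003     # hypothetical private AS of the Tier-0 gateway
UPSTREAM_AS = 65001  # hypothetical private AS of the Layer 3 devices

# Egress: default routes as seen by the Tier-0 gateway.
egress = [
    {"via": "az1", "local_pref": 100, "as_path": [UPSTREAM_AS]},
    {"via": "az2", "local_pref": 80, "as_path": [UPSTREAM_AS]},
]

# Ingress: Tier-0 routes as seen upstream. The route map-out adds the
# Tier-0 local AS twice on the advertisements sent to the second zone.
ingress = [
    {"via": "az1", "local_pref": 100, "as_path": [TIER0_AS]},
    {"via": "az2", "local_pref": 100, "as_path": prepend([TIER0_AS], TIER0_AS, 2)},
]

print(best_path(egress)["via"], best_path(ingress)["via"])
# If the neighbors in the first zone are lost, only the second zone remains.
print(best_path([p for p in egress if p["via"] != "az1"])["via"])
```

In this model, both directions select the first availability zone while its neighbors are reachable, and fall back to the second availability zone only when they are not.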

Routing Design for Multiple VMware Cloud Foundation Instances

Table 36. Design Decisions on the Tier-0 Gateway Configuration for Multiple VMware Cloud Foundation Instances

Decision ID

Design Decision

Design Justification

Design Implication

VCF-WLD-NSX-SDN-FED-001

Extend the VI workload domain active-active Tier-0 gateway to the second VMware Cloud Foundation instance.

  • Supports ECMP north-south routing on all nodes in the NSX Edge cluster.

  • Enables support for cross-region Tier-1 gateways and cross-region network segments.

Active-active Tier-0 gateways cannot provide stateful services such as NAT.

VCF-WLD-NSX-SDN-FED-002

Set the Tier-0 gateway as primary for all VMware Cloud Foundation instances.

  • In NSX Federation, a Tier-0 gateway allows egress traffic from connected Tier-1 gateways only in its primary locations.

  • Local ingress and egress traffic is controlled independently at the Tier-1 gateway level. No segments are provisioned directly to the Tier-0 gateway.

  • In a VI workload domain, this architecture improves flexibility for unique use cases.

  • A mixture of network spans (isolated to a region or spanning multiple regions) is enabled without requiring additional Tier-0 gateways and hence edge nodes.

  • If a failure in a VMware Cloud Foundation instance occurs, the local-instance networking in the other instances remains available without manual intervention.

None.

Table 37. Design Decisions on Dynamic Routing for Multiple VMware Cloud Foundation Instances

Decision ID

Design Decision

Design Justification

Design Implication

VCF-WLD-NSX-SDN-FED-003

From the global Tier-0 gateway, establish BGP neighbor peering to the ToR switches connected to the second VMware Cloud Foundation instance.

  • Enables the learning and advertising of routes in the second VMware Cloud Foundation instance.

  • Facilitates a potential automated failover of networks from the first to the second VMware Cloud Foundation instance.

None.

Overlay Design

Table 38. Design Decisions on ESXi Host Transport Nodes

Decision ID

Design Decision

Design Justification

Design Implication

VCF-WLD-NSX-SDN-022

Enable all ESXi hosts in the VI workload domain as transport nodes in NSX-T Data Center.

Enables distributed routing, logical segments, and distributed firewall.

None.

VCF-WLD-NSX-SDN-023

Configure each ESXi host as a transport node without using transport node profiles.

  • Enables the participation of ESXi hosts and the virtual machines on them in NSX overlay and VLAN networks.

  • Transport node profiles can only be applied at the cluster level. Because in an environment with multiple availability zones each availability zone is connected to a different set of VLANs, you cannot use a transport node profile.

You must configure each transport node with an uplink profile individually.

Table 39. Design Decision on Host TEP Addressing for NSX-T Data Center

Decision ID

Design Decision

Design Justification

Design Implication

VCF-WLD-NSX-SDN-024

Use DHCP to assign IP addresses to the host TEP interfaces.

Required for deployments where a cluster spans Layer 3 network domains such as multiple availability zones and clusters in the VI workload domain that span Layer 3 domains.

A DHCP server is required for the host overlay VLANs.

Table 40. Design Decision on Virtual Switches for NSX-T Data Center

Decision ID

Design Decision

Design Justification

Design Implication

VCF-WLD-NSX-SDN-025

Use a vSphere Distributed Switch for the shared edge and workload cluster that is enabled for NSX-T Data Center.

  • Uses the existing vSphere Distributed Switch.

  • Provides NSX logical segment capabilities to support advanced use cases.

To use features such as distributed routing, customer workloads must be connected to NSX segments.

Management occurs jointly from the vSphere Client and NSX Manager. However, you must perform all network monitoring in the NSX Manager user interface or another solution.

Table 41. Design Decisions on Geneve Overlay

Decision ID

Design Decision

Design Justification

Design Implication

VCF-WLD-NSX-SDN-026

To provide virtualized network capabilities to customer workloads, use overlay networks with NSX Edge nodes and distributed routing.

  • Creates isolated, multi-tenant broadcast domains across data center fabrics to deploy elastic, logical networks that span physical network boundaries.

  • Enables advanced deployment topologies by introducing Layer 2 abstraction from the data center networks.

Requires configuring transport networks with an MTU size of at least 1,700 bytes.
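
The MTU requirement in Table 41 can be related to the Geneve encapsulation overhead with simple arithmetic. The following is a minimal sketch that assumes an IPv4 underlay, an inner frame without a VLAN tag, and a guest MTU of 1,500 bytes. The header sizes are the standard ones: 20 bytes for IPv4, 8 bytes for UDP, 8 bytes for the Geneve base header, and 14 bytes for the inner Ethernet header. The function names are hypothetical.

```python
INNER_ETHERNET = 14  # inner Ethernet header, carried as overlay payload
GENEVE_BASE = 8      # Geneve base header without options
UDP = 8              # outer UDP header
OUTER_IPV4 = 20      # outer IPv4 header without IP options


def required_transport_mtu(inner_mtu, geneve_options=0):
    """Smallest transport MTU that carries an inner packet of inner_mtu
    bytes plus geneve_options bytes of Geneve options."""
    return (inner_mtu + INNER_ETHERNET + GENEVE_BASE + UDP
            + OUTER_IPV4 + geneve_options)


def option_headroom(transport_mtu, inner_mtu):
    """Bytes left for Geneve options at the given transport MTU."""
    return transport_mtu - required_transport_mtu(inner_mtu)


print(required_transport_mtu(1500))  # 1550
print(option_headroom(1700, 1500))   # 150
```

Under these assumptions, a transport MTU of 1,700 bytes leaves 150 bytes for Geneve options on top of a 1,500-byte guest packet.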

Table 42. Design Decision on the Transport Zone Configuration for NSX-T Data Center

Decision ID

Design Decision

Design Justification

Design Implication

VCF-WLD-NSX-SDN-027

Create a single overlay transport zone for all overlay traffic across the VI workload domain and NSX Edge nodes.

  • Ensures that overlay segments are connected to an NSX Edge node for services and north-south routing.

  • Ensures that all segments are available to all ESXi hosts and NSX Edge nodes configured as transport nodes.

None.

VCF-WLD-NSX-SDN-028

Create a single VLAN transport zone for uplink VLAN traffic that is applied only to NSX Edge nodes.

Ensures that uplink VLAN segments are configured on the NSX Edge transport nodes.

If VLAN segments are needed on hosts, you must create another VLAN transport zone for the host transport nodes only.

Table 43. Design Decisions on the Uplink Profile for ESXi Transport Nodes

Decision ID

Design Decision

Design Justification

Design Implication

VCF-WLD-NSX-SDN-029

Create an uplink profile with the load balance source teaming policy with two active uplinks for ESXi hosts.

For increased resiliency and performance, supports the concurrent use of both physical NICs on the ESXi hosts that are configured as transport nodes.

None.

Table 44. Design Decisions on Segment Replication Mode

Decision ID

Design Decision

Design Justification

Design Implication

VCF-WLD-NSX-SDN-030

Use hierarchical two-tier replication on all NSX-T overlay segments.

Hierarchical two-tier replication is more efficient by reducing the number of ESXi hosts the source ESXi host must replicate traffic to.

None.
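
The efficiency gain of hierarchical two-tier replication can be illustrated by counting the copies that the source host sends. The following is a simplified model that assumes the source host sends one copy to each host in its own TEP subnet and one copy to a single replicator host in every other TEP subnet, and that every host has a workload on the segment. The host names and subnet labels are hypothetical.

```python
def copies_from_source(source, tep_subnet, two_tier=True):
    """Return the number of copies of a broadcast, unknown unicast, or
    multicast frame that the source host sends.

    tep_subnet maps each host name to the label of its TEP subnet.
    """
    others = [host for host in tep_subnet if host != source]
    if not two_tier:
        # Head-end replication: one copy for every other host.
        return len(others)
    own = tep_subnet[source]
    local_hosts = [host for host in others if tep_subnet[host] == own]
    remote_subnets = {tep_subnet[host] for host in others} - {own}
    return len(local_hosts) + len(remote_subnets)


# Hypothetical layout: eight hosts split evenly across two TEP subnets.
layout = {f"esxi-{i}": ("subnet-a" if i <= 4 else "subnet-b")
          for i in range(1, 9)}
print(copies_from_source("esxi-1", layout, two_tier=False))  # 7
print(copies_from_source("esxi-1", layout, two_tier=True))   # 4
```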

Information Security and Access Control Design

Table 45. Design Decisions on Certificate Management in NSX Manager

Decision ID

Design Decision

Design Justification

Design Implication

VCF-WLD-NSX-SEC-001

Replace the default self-signed certificate of the NSX Manager instance for the VI workload domain with a certificate that is signed by a third-party certificate authority.

Ensures that the communication between NSX-T Data Center administrators and the NSX Manager instance is encrypted by using a trusted certificate.

Replacing the default certificates with trusted CA-signed certificates might increase the deployment preparation time because you must generate and submit certificate requests.

VCF-WLD-NSX-SEC-002

Use a SHA-2 algorithm or stronger when signing certificates.

The SHA-1 algorithm is considered less secure and has been deprecated.

Not all certificate authorities support SHA-2.

Table 46. Design Decisions on Certificate Management in NSX Global Manager

Decision ID

Design Decision

Design Justification

Design Implication

VCF-WLD-NSX-SEC-FED-001

Replace the default self-signed certificate of the NSX Global Manager instance for the VI workload domain with a certificate that is signed by a third-party certificate authority.

Ensures that the communication between NSX-T Data Center administrators and the NSX Global Manager instance is encrypted by using a trusted certificate.

Replacing the default certificates with trusted CA-signed certificates might increase the deployment preparation time because you must generate and submit certificate requests.

VCF-WLD-NSX-SEC-FED-002

Establish an operations practice to capture and update on the NSX Global Manager the thumbprint of the NSX Local Manager certificate every time the certificate is updated by using SDDC Manager.

Ensures secured connectivity between the NSX Manager instances.

Each certificate has its own unique thumbprint. The NSX Global Manager stores the unique thumbprint of the NSX Local Manager instances for enhanced security.

If an authentication failure between the NSX Global Manager and NSX Local Manager occurs, objects that are created from the NSX Global Manager will not be propagated to the SDN.

The administrator must establish and follow an operational practice by using a runbook or automated process to ensure that the thumbprint is up to date.
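
A runbook step for VCF-WLD-NSX-SEC-FED-002 can include a comparison of the stored thumbprint with the thumbprint of the current certificate. The following is a minimal sketch that uses only the Python standard library and assumes a SHA-256 thumbprint of the DER-encoded certificate. It does not retrieve the certificate from the NSX Local Manager or update the NSX Global Manager. Those steps depend on your environment and are out of scope here. The function names are hypothetical.

```python
import hashlib
import ssl


def sha256_thumbprint(pem_certificate):
    """Return the lowercase hexadecimal SHA-256 digest of the DER form
    of a PEM-encoded certificate."""
    der = ssl.PEM_cert_to_DER_cert(pem_certificate)
    return hashlib.sha256(der).hexdigest()


def normalize(thumbprint):
    """Remove colon separators and ignore case for comparison."""
    return thumbprint.replace(":", "").strip().lower()


def thumbprint_is_current(stored_thumbprint, pem_certificate):
    """Return True if the stored thumbprint matches the certificate."""
    return normalize(stored_thumbprint) == sha256_thumbprint(pem_certificate)
```

If `thumbprint_is_current` returns `False` after a certificate replacement, the thumbprint that is stored on the NSX Global Manager must be updated.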