Design of the physical data center network includes defining the network topology for connecting the physical switches and the ESXi hosts, determining switch port settings for VLANs and link aggregation, and designing routing.

A software-defined network (SDN) both integrates with and uses components of the physical data center. SDN integrates with your physical network to support east-west transit in the data center and north-south transit to and from the SDDC networks.

Several typical data center network deployment topologies exist:

  • Core-Aggregation-Access

  • Leaf-Spine

  • Hardware SDN

This design uses the leaf-spine network topology because, in a single data center deployment, it provides predictable performance, scalability, and applicability across multiple vendors. Other data center network topologies, such as core-aggregation-access, are also supported.

In an environment with multiple availability zones, Layer 2 networks must be stretched between the availability zones by the physical infrastructure. You must also provide a Layer 3 gateway that is highly available between the availability zones. The method for stretching these Layer 2 networks and providing a highly available Layer 3 gateway is vendor-specific.

In an environment with multiple availability zones or VMware Cloud Foundation instances, dynamic routing is needed so that networks can fail over ingress and egress traffic from one availability zone to another, or from a protected VMware Cloud Foundation instance to a recovery VMware Cloud Foundation instance. This design uses BGP as the dynamic routing protocol. As such, BGP must be present in your environment to facilitate the failover of networks from site to site. Because of the complexity of local ingress, local egress is generally not used. In this design, network traffic flows in and out of a primary site or VMware Cloud Foundation instance.

Switch Types and Network Connectivity

Follow the best practices for physical switches, switch connectivity, VLANs and subnets, and access port settings.

Figure 1. Host-to-ToR Connectivity

An ESXi host is connected to two ToRs over a 25-Gb connection.
Table 1. Design Components for Physical Switches in the SDDC

Design Component: Top of rack (ToR) physical switches

Configuration Best Practices:

  • Configure redundant physical switches to enhance availability.

  • Configure switch ports that connect to ESXi hosts manually as trunk ports.

  • Modify the Spanning Tree Protocol (STP) on any port that is connected to an ESXi NIC to reduce the time to transition ports to the forwarding state, for example, by using the Trunk PortFast feature on Cisco physical switches.

  • Provide DHCP or DHCP Helper capabilities on all VLANs used by host TEP VMkernel ports. This setup simplifies the configuration by using DHCP to assign IP addresses according to the IP subnet in use.

    If DHCP is not available, you can use static IP assignment. However, you will be unable to stretch a VI workload domain cluster across Layer 3 domains such as availability zones or independent racks within a single availability zone.

  • Configure jumbo frames on all switch ports, inter-switch links (ISL), and switched virtual interfaces (SVIs).

Design Component: Top of rack connectivity and network settings

Configuration Best Practices: Each ESXi host is connected redundantly to the SDDC network fabric by two 25 GbE ports, one to each ToR switch. Configure the ToR switches to provide all necessary VLANs using an 802.1Q trunk. These redundant connections use features in vSphere Distributed Switch and NSX-T Data Center to guarantee that no physical interface is overrun and that the available redundant paths are used.

VLANs and Subnets in a Single VMware Cloud Foundation Instance with a Single Availability Zone

Each ESXi host uses VLANs and corresponding subnets.

Follow these guidelines:

  • Consider the use of /24 subnets to reduce confusion and mistakes when handling IPv4 subnet configuration.

  • Use the IP address of the floating interface from a first-hop redundancy protocol such as Virtual Router Redundancy Protocol (VRRP) or Hot Standby Router Protocol (HSRP) as the gateway.

  • Use the RFC1918 IPv4 address space for these subnets and allocate one octet by VMware Cloud Foundation Instance and another octet by function.

Note:

Implement VLAN and IP subnet configuration according to the requirements of your organization.
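As an illustration of the allocation guidelines above, the following Python sketch uses the standard ipaddress module to derive one /24 subnet per function from an RFC1918 range, with one octet identifying the VMware Cloud Foundation instance and another identifying the function. The base range, instance numbering, and function names are hypothetical examples, not values prescribed by this design.

```python
import ipaddress

# Hypothetical allocation scheme: one octet selects the VMware Cloud Foundation
# instance, the next octet selects the function, and each function gets a /24.
FUNCTIONS = ["management", "vmotion", "vsan", "host-overlay", "edge-overlay"]

def instance_subnets(instance_id: int) -> dict[str, ipaddress.IPv4Network]:
    """Return one example /24 per function for the given instance."""
    second_octet = 16 + instance_id  # 172.17.0.0/16 for instance 1, and so on
    return {
        function: ipaddress.ip_network(f"172.{second_octet}.{index}.0/24")
        for index, function in enumerate(FUNCTIONS)
    }

if __name__ == "__main__":
    for function, subnet in instance_subnets(1).items():
        # Convention only: use the first host address as the VRRP or HSRP
        # floating gateway for the subnet.
        gateway = next(subnet.hosts())
        print(f"{function:13} {subnet}  gateway {gateway}")
```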

VLANs and Subnets in a Single VMware Cloud Foundation Instance with Multiple Availability Zones

You deploy NSX Edge nodes in one of the clusters in a VI workload domain, usually the first cluster. The other clusters in the VI workload domain contain only customer workloads.

In the stretched edge and workload cluster, the management, Uplink 01, Uplink 02, and edge overlay networks in each availability zone must be stretched to facilitate failover of the NSX Edge appliances between availability zones. The Layer 3 gateway for the management and Edge overlay networks must be highly available across the availability zones.

Table 2. VLANs and Subnets for a Stretched Edge and Workload Cluster

Function | Highly Available Layer 3 Gateway
Management - first availability zone | ✓ - across the first and second availability zones
vSphere vMotion - first availability zone | ✓ - first availability zone
vSAN - first availability zone | ✓ - first availability zone
Host overlay - first availability zone | ✓ - first availability zone
Uplink01 | None
Uplink02 | None
Edge overlay | ✓ - across the first and second availability zones
Management - second availability zone | ✓ - second availability zone
vSphere vMotion - second availability zone | ✓ - second availability zone
vSAN - second availability zone | ✓ - second availability zone
Host overlay - second availability zone | ✓ - second availability zone

Because other stretched clusters in the VI workload domain do not contain NSX Edge nodes, you do not need to stretch networks in these clusters.

Table 3. VLANs and Subnets for a Stretched Workload Only Cluster

Function | Highly Available Layer 3 Gateway
Management - first availability zone | ✓ - first availability zone
vSphere vMotion - first availability zone | ✓ - first availability zone
vSAN - first availability zone | ✓ - first availability zone
Host overlay - first availability zone | ✓ - first availability zone
Management - second availability zone | ✓ - second availability zone
vSphere vMotion - second availability zone | ✓ - second availability zone
vSAN - second availability zone | ✓ - second availability zone
Host overlay - second availability zone | ✓ - second availability zone

VLANs and Subnets for Multiple VMware Cloud Foundation Instances

In a deployment with multiple VMware Cloud Foundation instances, add VLANs for remote tunnel endpoint (RTEP) traffic for the NSX Edge nodes in each VMware Cloud Foundation instance. Edge RTEP VLANs carry the instance-to-instance data plane traffic. An Edge RTEP VLAN must be routed to the Edge RTEP VLANs in all other VMware Cloud Foundation instances.

Take into account the following considerations:

  • The RTEP network segment has a VLAN ID and Layer 3 range that are specific to the VMware Cloud Foundation instance.

  • In a single VMware Cloud Foundation instance with multiple availability zones, the RTEP network segment must be stretched between the zones and assigned the same VLAN ID and IP range.

Function | Highly Available Layer 3 Gateway
Edge RTEP in the first VMware Cloud Foundation instance | ✓ - across the first and second availability zones
Edge RTEP in the second VMware Cloud Foundation instance | X

Note:

Each VMware Cloud Foundation instance needs its own unique Layer 2 VLAN for the Edge RTEP network. All Edge RTEP networks must be reachable from each other.

Note:

The RTEP network is needed only for VI workload domain clusters that contain NSX Edge nodes. The RTEP network does not need to be presented to VI workload domain clusters that do not contain NSX Edge nodes.
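Because each VMware Cloud Foundation instance uses an Edge RTEP Layer 3 range that is specific to that instance, and all Edge RTEP networks must be reachable from each other, the per-instance RTEP subnets must not overlap. The following Python sketch flags overlapping allocations with the standard ipaddress module; the instance names and subnets are hypothetical examples.

```python
import ipaddress
from itertools import combinations

# Hypothetical per-instance Edge RTEP subnets; replace with your own values.
RTEP_SUBNETS = {
    "instance-a": ipaddress.ip_network("172.16.254.0/24"),
    "instance-b": ipaddress.ip_network("172.17.254.0/24"),
}

def overlapping_rtep_subnets(subnets: dict[str, ipaddress.IPv4Network]) -> list[str]:
    """Return a description of every pair of instances whose RTEP subnets overlap."""
    return [
        f"{name_a} overlaps {name_b}: {net_a} / {net_b}"
        for (name_a, net_a), (name_b, net_b) in combinations(subnets.items(), 2)
        if net_a.overlaps(net_b)
    ]

if __name__ == "__main__":
    problems = overlapping_rtep_subnets(RTEP_SUBNETS)
    # Overlapping ranges cannot be routed to each other, which breaks the
    # requirement that all Edge RTEP networks are mutually reachable.
    print("\n".join(problems) if problems else "Edge RTEP subnets do not overlap.")
```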

Physical Network Requirements

Physical requirements determine the MTU size for networks that carry overlay traffic, dynamic routing support, time synchronization through an NTP server, and forward and reverse DNS resolution.

Requirement: Use a 25 GbE (10 GbE minimum) port on each ToR switch for ESXi host uplinks. Connect each host to two ToR switches.

Comment: 25 GbE provides the required bandwidth for hyperconverged networking traffic. Connecting to two ToR switches provides redundant physical network paths to each host.

Requirement: Provide an MTU size of 1,700 bytes or greater on any network that carries Geneve overlay traffic.

Comment: Geneve packets cannot be fragmented. The MTU size must be large enough to support the extra encapsulation overhead.

Geneve is an extensible protocol, so the MTU size might increase with future capabilities. While 1,600 bytes is sufficient, an MTU size of 1,700 bytes provides more room for increasing the Geneve MTU without the need to change the physical infrastructure MTU.

This design uses an MTU size of 9,000 bytes for Geneve traffic.

Requirement: Enable BGP dynamic routing support on the upstream Layer 3 devices.

Comment: Consider the following requirements for the BGP configuration:

  • Configure each ToR switch with only one uplink VLAN. The first ToR switch, to which vmnic0 of each ESXi host is connected, must be configured only with the Uplink VLAN 1 gateway. The second ToR switch, to which vmnic1 of each ESXi host is connected, must be configured only with the Uplink VLAN 2 gateway.

  • Make sure that BGP default-originate or a similar feature is enabled on the ToR switches to inject a default route into the BGP route exchange with the Tier-0 gateway in NSX-T Data Center.

You use BGP on the upstream Layer 3 devices to establish routing adjacency with the Tier-0 service routers (SRs). NSX-T Data Center supports only the BGP routing protocol with NSX Federation.

Dynamic routing enables ECMP failover for upstream connectivity.

Requirement: BGP Autonomous System Number (ASN) allocation.

Comment: A BGP ASN must be allocated for the SDN in NSX-T Data Center. Use a private ASN according to RFC1930.
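To check an ASN against the private ranges before assigning it to the SDN, you can use a small validation helper such as the following Python sketch. The ranges are the private-use ASN ranges reserved by RFC 1930 and extended for 32-bit ASNs by RFC 6996; the candidate values are arbitrary examples.

```python
# Private ASN ranges: 64512-65534 (16-bit, RFC 1930 / RFC 6996) and
# 4200000000-4294967294 (32-bit, RFC 6996).
PRIVATE_ASN_RANGES = ((64512, 65534), (4200000000, 4294967294))

def is_private_asn(asn: int) -> bool:
    """Return True if the ASN falls inside a private-use range."""
    return any(low <= asn <= high for low, high in PRIVATE_ASN_RANGES)

if __name__ == "__main__":
    # Arbitrary example values: two private candidates and one registered ASN.
    for candidate in (65000, 4200000100, 13335):
        kind = "private" if is_private_asn(candidate) else "public or registered"
        print(f"AS{candidate}: {kind}")
```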

Physical Network Design Decisions

The design decisions you make for the physical network determine the physical layout and use of VLANs. They also include decisions on jumbo frames and on other network-related requirements such as DNS and NTP.

Table 4. Design Decisions on the Physical Network Infrastructure for NSX-T Data Center

Decision ID: VCF-WLD-NSX-PHY-001

Design Decision: Use two ToR switches for each rack.

Design Justification: Supports the use of two 10 GbE (25 GbE or greater recommended) links to each server, provides redundancy, and reduces the overall design complexity.

Design Implication: Requires two ToR switches per rack, which can increase costs.

Decision ID: VCF-WLD-NSX-PHY-002

Design Decision: Implement the following physical network architecture:

  • One 25 GbE (10 GbE minimum) port on each ToR switch for ESXi host uplinks.

  • Layer 3 device that supports BGP.

Design Justification:

  • Guarantees availability during a switch failure.

  • Provides support for BGP as the only dynamic routing protocol that is supported by NSX Federation.

Design Implication:

  • Might limit the hardware choices.

  • Requires dynamic routing protocol configuration in the physical network.

Decision ID: VCF-WLD-NSX-PHY-003

Design Decision: Do not use EtherChannel (LAG, LACP, or vPC) configuration for ESXi host uplinks.

Design Justification:

  • Simplifies the configuration of top of rack switches.

  • Teaming options available with vSphere Distributed Switch provide load balancing and failover.

  • EtherChannel implementations might have vendor-specific limitations.

Design Implication: None.

Decision ID: VCF-WLD-NSX-PHY-004

Design Decision: Use a physical network that is configured for BGP routing adjacency.

Design Justification:

  • Supports flexibility in network design for routing multi-site and multi-tenancy workloads.

  • Uses BGP as the only dynamic routing protocol that is supported by NSX-T Data Center.

  • Supports failover between ECMP Edge uplinks.

Design Implication: Requires BGP configuration in the physical network.

Access Port Network Settings

Configure additional network settings on the access ports that connect the ToR switches to the corresponding servers.

Table 5. Access Port Network Configuration

Spanning Tree Protocol (STP): Although this design does not use STP, switches usually have it configured by default. Designate the access ports as trunk PortFast.

Trunking: Configure the VLANs as members of an 802.1Q trunk. Optionally, the management VLAN can act as the native VLAN.

MTU:

  • Set the MTU for management VLANs and SVIs to 1,500 bytes.

  • Set the MTU for vSphere vMotion, vSAN, NFS, uplink, host overlay, and edge overlay VLANs and SVIs to 9,000 bytes.

DHCP Helper: Configure a DHCP helper (sometimes called a DHCP relay) on all TEP VLANs. Set the DHCP helper (relay) to point to a DHCP server by IPv4 address.

If DHCP is not available, you can use static IP assignment. However, you will be unable to stretch the cluster across availability zones.

Table 6. Design Decisions on Access Ports for NSX-T Data Center

Decision ID: VCF-WLD-NSX-PHY-005

Design Decision: Assign persistent IP configurations to each management component in the SDDC, with the exception of NSX tunnel endpoints (TEPs) that use dynamic IP allocation.

Design Justification: Ensures that endpoints have a persistent management IP address. In VMware Cloud Foundation, you assign storage (vSAN and NFS) and vSphere vMotion IP configurations by using user-defined network pools.

Design Implication: Requires precise IP address management.

Decision ID: VCF-WLD-NSX-PHY-006

Design Decision: Set the lease duration for the DHCP scope for the host overlay network to at least 7 days.

Design Justification: IP addresses of the host overlay VMkernel ports are assigned by using a DHCP server.

  • Host overlay VMkernel ports do not have an administrative endpoint. As a result, they can use DHCP for automatic IP address assignment. IP pools are an option, but the NSX-T Data Center administrator must create them. If you must change or expand the subnet, changing the DHCP scope is simpler than creating an IP pool and assigning it to the ESXi hosts.

  • DHCP simplifies the configuration of the default gateway for host overlay VMkernel ports if hosts within the same cluster are on separate Layer 2 domains.

Design Implication: Requires configuration and management of a DHCP server.

Decision ID: VCF-WLD-NSX-PHY-007

Design Decision: Use VLANs to separate physical network functions.

Design Justification:

  • Supports physical network connectivity without requiring many NICs.

  • Isolates the different network functions of the SDDC so that you can have differentiated services and prioritized traffic as needed.

Design Implication: Requires uniform configuration and presentation on all the trunks that are made available to the ESXi hosts.

Jumbo Frames

IP storage throughput can benefit from the configuration of jumbo frames. Increasing the per-frame payload from 1,500 bytes to the jumbo frame setting improves the efficiency of data transfer. You must configure jumbo frames end-to-end. Select an MTU that matches the MTU of the physical switch ports.

  • According to the purpose of the workload, determine whether to configure jumbo frames on a virtual machine. If the workload consistently transfers large amounts of network data, configure jumbo frames, if possible. In that case, confirm that both the virtual machine operating system and the virtual machine NICs support jumbo frames.

  • Using jumbo frames also improves the performance of vSphere vMotion.

  • The Geneve overlay requires an MTU value of 1,600 bytes or greater.
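To verify that jumbo frames are actually configured end to end, a common technique is to send ICMP probes with the Don't Fragment bit set and a payload sized to fill the target MTU. The following Python sketch assumes a Linux host with iputils ping (the -M do and -s options); the target addresses and the 9,000-byte MTU are examples. On ESXi hosts, an equivalent check is typically performed with vmkping.

```python
import subprocess

ICMP_OVERHEAD = 28  # 20-byte IPv4 header plus 8-byte ICMP header

def mtu_path_ok(host: str, mtu: int = 9000) -> bool:
    """Probe a host with the Don't Fragment bit set and a payload that fills the MTU.

    Uses Linux iputils ping options: -M do prohibits fragmentation, -s sets the
    payload size, -c sets the probe count. Returns True if replies are received.
    """
    payload = mtu - ICMP_OVERHEAD
    result = subprocess.run(
        ["ping", "-M", "do", "-c", "3", "-s", str(payload), host],
        capture_output=True,
    )
    return result.returncode == 0

if __name__ == "__main__":
    # Example TEP VMkernel addresses; replace with hosts on the overlay VLANs.
    for vmkernel_ip in ("192.168.130.11", "192.168.130.12"):
        status = "jumbo MTU ok" if mtu_path_ok(vmkernel_ip) else "fragmentation required or host unreachable"
        print(f"{vmkernel_ip}: {status}")
```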

Table 7. Design Decisions on Jumbo Frames for NSX-T Data Center

Decision ID: VCF-WLD-NSX-PHY-008

Design Decision: Set the MTU size to at least 1,700 bytes (9,000 bytes recommended for jumbo frames) on the physical switch ports, vSphere Distributed Switches, vSphere Distributed Switch port groups, and N-VDS switches that support the following traffic types:

  • Geneve (overlay)

  • vSAN

  • vSphere vMotion

  • NFS

Design Justification:

  • Improves traffic throughput.

  • Supports Geneve by increasing the MTU size to a minimum of 1,600 bytes.

  • Geneve is an extensible protocol. The MTU size might increase with future capabilities. While 1,600 bytes is sufficient, an MTU size of 1,700 bytes provides more room for increasing the Geneve MTU size without the need to change the MTU size of the physical infrastructure.

Design Implication: When adjusting the MTU size, you must also configure the entire network path (VMkernel ports, virtual switches, physical switches, and routers) to support the same MTU size.

Networking for a Single VMware Cloud Foundation Instance with Multiple Availability Zones

Specific requirements for the physical data center network exist for a topology with multiple availability zones. These requirements extend those for an environment with a single availability zone.
Table 8. Physical Network Requirements for Multiple Availability Zones

Component: MTU

Requirement:

  • VLANs that are stretched between availability zones must meet the same requirements as the VLANs for intra-zone connections, including the MTU size.

  • The MTU value must be consistent end to end, including components on the inter-zone networking path.

  • Set the MTU for the management VLANs and SVIs to 1,500 bytes.

  • Set the MTU for the vSphere vMotion, vSAN, NFS, uplink, host overlay, and edge overlay VLANs and SVIs to 9,000 bytes.

Component: Layer 3 gateway availability

Requirement: For VLANs that are stretched between availability zones, configure a data center-provided method, for example, VRRP or HSRP, to fail over the Layer 3 gateway between availability zones.

Component: DHCP availability

Requirement: For VLANs that are stretched between availability zones, provide high availability for the DHCP server so that a failover operation of a single availability zone does not impact DHCP availability.

Component: BGP routing

Requirement: The data center of each availability zone must be configured with an Autonomous System Number (ASN). The ASN can be unique or identical between the availability zones.

Component: Ingress and egress traffic

Requirement:

  • For VLANs that are stretched between availability zones, traffic flows in and out of a single zone. Local egress is not supported.

  • For VLANs that are not stretched between availability zones, traffic flows in and out of the zone where the VLAN is located.

  • For NSX network segments that are stretched between availability zones, traffic flows in and out of a single availability zone. Local egress is not supported.

Component: Latency

Requirement:

  • Maximum network latency between NSX Manager instances is 10 ms.

  • Maximum network latency between the NSX Manager cluster and transport nodes is 150 ms.
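A simple way to confirm the latency requirements above is to measure the average round-trip time to the relevant endpoints and compare it against the 10 ms and 150 ms limits. The following Python sketch parses the summary line of Linux iputils ping; the host names are hypothetical placeholders.

```python
import re
import subprocess

# Hypothetical endpoints paired with the latency limits from the table above.
LATENCY_CHECKS = (
    ("nsx-manager-02.example.local", 10.0),            # NSX Manager to NSX Manager
    ("esxi-transport-node-01.example.local", 150.0),   # NSX Manager cluster to transport node
)

def average_rtt_ms(host: str, count: int = 5) -> float | None:
    """Return the average round-trip time reported by Linux iputils ping."""
    result = subprocess.run(
        ["ping", "-c", str(count), host], capture_output=True, text=True
    )
    # The summary line looks like: rtt min/avg/max/mdev = 0.321/0.398/0.512/0.071 ms
    match = re.search(r"= [\d.]+/([\d.]+)/", result.stdout)
    return float(match.group(1)) if match else None

if __name__ == "__main__":
    for host, limit_ms in LATENCY_CHECKS:
        rtt = average_rtt_ms(host)
        if rtt is None:
            print(f"{host}: unreachable")
        else:
            verdict = "within limit" if rtt <= limit_ms else "exceeds limit"
            print(f"{host}: average {rtt:.1f} ms (limit {limit_ms} ms) -> {verdict}")
```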

Table 9. Design Decisions on the Physical Network for Multiple Availability Zones for NSX-T Data Center

Decision ID: VCF-WLD-NSX-PHY-009

Design Decision: Set the MTU size to at least 1,700 bytes (9,000 bytes recommended for jumbo frames) on the physical inter-availability zone networking components that are part of the networking path between availability zones for the following traffic types:

  • Geneve (overlay)

  • vSAN

  • vSphere vMotion

  • NFS

Design Justification:

  • Improves traffic throughput.

  • Geneve packets are tagged as do-not-fragment.

  • For optimal performance, provides a consistent MTU size across the environment.

  • Geneve is an extensible protocol. The MTU size might increase with future capabilities. While 1,600 bytes is sufficient, an MTU size of 1,700 bytes provides more room for increasing the Geneve MTU size without the need to change the MTU size of the physical infrastructure.

Design Implication:

  • When adjusting the MTU size, you must also configure the entire network path (VMkernel ports, virtual switches, physical switches, and routers) to support the same MTU size.

  • In deployments with multiple availability zones, the MTU must be configured on the entire network path between the zones.

Decision ID: VCF-WLD-NSX-PHY-010

Design Decision: Configure VRRP, HSRP, or another Layer 3 gateway availability method.

Design Justification: Ensures that the VLANs that are stretched between availability zones are connected to a highly available gateway if a failure of an availability zone occurs. Otherwise, a failure in the Layer 3 gateway causes disruption of traffic in the SDN setup.

Design Implication: Requires configuration of a high availability technology for the Layer 3 gateways in the data center.

Networking for VMware Cloud Foundation Instances

For a topology with multiple VMware Cloud Foundation instances, specific requirements for the networking in a data center and between data centers exist. These requirements extend those for an environment with a single availability zone and those for multiple availability zones.

Table 10. Additional Requirements for the Physical Network for a Multiple-Region SDDC

Component: MTU

Requirement:

  • The Edge RTEP VLAN must support a standard MTU or greater over the entire end-to-end data path to the RTEP VLANs in other VMware Cloud Foundation instances.

  • Set the MTU for the RTEP VLAN to 1,700 bytes or greater for best performance.

Component: BGP routing

Requirement:

  • Each VMware Cloud Foundation instance must have its own ASN.

  • Provide connectivity for BGP between all data centers.

  • Deployments without BGP are not supported.

Component: Ingress and egress traffic

Requirement:

  • For NSX virtual network segments that are not stretched between VMware Cloud Foundation instances, traffic flows in and out of the zone where the segment is located.

  • For NSX virtual network segments that are stretched between regions, traffic flows in and out of a single VMware Cloud Foundation instance or availability zone. Local egress is not supported. Failover to another VMware Cloud Foundation instance occurs through BGP route withdrawal and advertisement.

Component: Latency

Requirement: In a VMware Cloud Foundation instance:

  • Maximum network latency between NSX Manager nodes within an NSX Manager cluster must be 10 ms.

  • Maximum network latency between the NSX Manager cluster and transport nodes must be 150 ms.

Between multiple VMware Cloud Foundation instances:

  • Maximum network latency between the primary and standby NSX Global Manager clusters must be 150 ms.

  • Maximum network latency between NSX Local Manager clusters must be 150 ms.

Component: Required connectivity between VMware Cloud Foundation instances

Requirement:

  • NSX Edge node RTEP interfaces: first and second VMware Cloud Foundation instances.

  • NSX Local Manager clusters: first and second VMware Cloud Foundation instances.

  • NSX Global Manager clusters: first and second VMware Cloud Foundation instances.

  • NSX Global Manager to NSX Local Manager clusters:

    • First VMware Cloud Foundation instance to the first and second VMware Cloud Foundation instances.

    • Second VMware Cloud Foundation instance to the first and second VMware Cloud Foundation instances.

Table 11. Design Decisions on the Physical Network Infrastructure between VMware Cloud Foundation Instances

Decision ID: VCF-WLD-NSX-PHY-011

Design Decision: Set the MTU size to at least 1,500 bytes (1,700 bytes preferred, 9,000 bytes recommended for jumbo frames) on the physical inter-instance network components that are part of the network path between VMware Cloud Foundation instances for edge RTEP traffic.

Design Justification:

  • Jumbo frames are not required between regions. However, an increased MTU improves traffic throughput.

  • Increasing the RTEP MTU to 1,700 bytes minimizes fragmentation for standard-size workload packets between VMware Cloud Foundation instances.

Design Implication: When adjusting the MTU packet size, you must also configure the entire network path, that is, virtual interfaces, virtual switches, physical switches, and routers, to support the same MTU packet size.

Decision ID: VCF-WLD-NSX-PHY-012

Design Decision: Provide a connection between VMware Cloud Foundation instances that is capable of routing between each NSX Manager cluster.

Design Justification: Configuring NSX Federation requires connectivity between the NSX Global Managers, NSX Local Managers, and NSX Edge clusters.

Design Implication: Requires unique routable IP addresses for each region.

Decision ID: VCF-WLD-NSX-PHY-013

Design Decision: Ensure that latency between regions is less than 150 ms.

Design Justification: A latency below 150 ms is required for the following features:

  • Cross vCenter Server vMotion

  • NSX-T Data Center design

Design Implication: None.

Decision ID: VCF-WLD-NSX-PHY-014

Design Decision: Provide BGP routing between all VMware Cloud Foundation instances.

Design Justification: Automated failover of networks requires a dynamic routing protocol, such as BGP.

Design Implication: None.