NSX is an implementation of software-defined networking. It provides network services such as switching, routing, load balancing, firewall, and Virtual Private Networking (VPN).

NSX focuses on providing networking, security, automation, and operational simplicity for the underlying physical network. NSX is a non-disruptive solution and can be deployed on any IP network such as traditional networking models and next-generation fabric architectures, regardless of the vendor. This is accomplished by decoupling the virtual networks from their physical counterparts.

NSX Manager

The NSX Manager is the centralized network management component of VMware NSX. It implements the management and control planes for the NSX infrastructure.

NSX Manager provides the following functions:

  • The Graphical User Interface (GUI) and the RESTful API for creating, configuring, and monitoring NSX components, such as segments and gateways.

  • An aggregated system view

  • A method for monitoring and troubleshooting workloads attached to virtual networks

  • Configuration and orchestration of the following services:

    • Logical networking components, such as logical switching and routing

    • Networking and edge services

    • Security services and distributed firewall

  • A RESTful API endpoint to automate consumption. Because of this architecture, you can automate all configuration and monitoring operations using any cloud management platform, security vendor platform, or automation framework.

Some of the components of the NSX Manager are as follows:

  • NSX Management Plane Agent (MPA): Available on each ESXi host. The MPA persists the desired state of the system and communicates Non-Flow-Controlling (NFC) messages such as configuration, statistics, status, and real-time data between transport nodes and the management plane.

  • NSX Controller: Controls the virtual networks and overlay transport tunnels. The controllers are responsible for the programmatic deployment of virtual networks across the entire NSX architecture.

  • Central Control Plane (CCP): Logically separated from all data plane traffic. A failure in the control plane does not affect existing data plane operations. The controller provides configuration to other NSX Controller components such as the segments, gateways, and edge VM configuration.

  • Local Control Plane (LCP): The LCP runs on transport nodes. It is adjacent to the dataplane it controls and is connected to the CCP. The LCP is responsible for programming the forwarding entries and firewall rules of the data plane.

Virtual Distributed Switch

vSphere Distributed switch (vSphere 7 and later versions) supports NSX Distributed Port Groups. NSX and vSphere integration consolidates the use of NSX on VDS, and this model is known as a Converged Virtual Distributed Switch (C-VDS).

NSX implements each logical broadcast domain by tunneling VM-to-VM traffic and VM-to-gateway traffic using the Geneve tunnel encapsulation mechanism. The network controller has a global view of the data center and ensures that the virtual switch flow tables in the ESXi host are updated as the VMs are created, moved, or removed.

NSX implements virtual switching in Standard and Enhanced Data Path (EDP) modes. EDP provides better network performance for telco workloads. The EDP mode supports both overlay and VLAN based network segments.

Transport Zones

Transport zones determine which hosts can consume a particular network. A transport zone identifies the type of traffic, such as VLAN, overlay, and the teaming Policy. You can configure one or more transport zones. A transport zone does not represent a security boundary.

Figure 1. Transport Zones
Transport Zones

In NSX, when an ESXi host is converted into a transport node, it is attached to one or more transport zones. The switch type and mode are configured during the host provisioning or within the transport node profiles.

Note:

Only a single overlay transport zone is supported for ESXi hosts. Multiple VLAN transport zones can be configured.

Logical Switching

NSX Segments create logically abstracted segments to which the workloads can be connected. A single segment is mapped to a unique Geneve segment ID (or VLAN ID) that is distributed across the ESXi hosts (and NSX edge nodes) within a given transport zone. NSX Segments support switching in the ESXi host without the constraints of VLAN sprawl or spanning tree issues.

Gateways

NSX Gateways provide the North-South connectivity for the workloads to access external networks and the East-West connectivity between different logical networks.

A gateway is a configured partition of a traditional network hardware router. It replicates the functionality of the hardware, creating multiple routing domains in a single router. Gateways perform a subset of the tasks that are handled by the physical router. Each gateway can contain multiple routing instances and routing tables. Using gateways can be an effective way to maximize the use of routers.

  • Distributed Router: A Distributed Router (DR) spans across all ESXi nodes to which VMs are connected to the NSX gateway. The DR construct also expands to the edge nodes. Functionally, the DR is responsible for one-hop distributed routing between segments and other gateways connected to the NSX gateway.

  • Service Router: A Service Router (SR) implements stateful services such as Border Gateway Protocol (BGP) and stateful Network Address Translation (NAT). These services cannot be implemented in a distributed way. A gateway always has a DR. A gateway has SRs when it is a Tier-0 Gateway or a Tier-1 Gateway. It is configured with services such as load balancing, NAT, or Dynamic Host Configuration Protocol (DHCP).

Figure 2. Traditional NSX Routing
Traditional NSX Routing

Virtual Routing and Forwarding

A Virtual Routing and Forwarding (VRF) gateway enables multiple instances of a routing table to exist simultaneously within the same gateway. A VRF is a layer 3 equivalent of a VLAN. A Tier-1 router or network segment must be linked to a tier-0 VRF gateway. The configuration of tier-0 VRF gateway is inherited from the parent T0 configuration.

In a multi-tenant solution, VRFs allow a single Tier-0 gateway to be deployed and managed while keeping the routing tables between tenants isolated. Each VRF can peer to a different eBGP neighbor and autonomous system (AS).

Figure 3. NSX VRF Routing
NSX VRF Routing

Enhanced Data Path

NSX Enhanced Data Path (EDP) is a networking stack mode, which provides better network performance. It is primarily designed for data plane intensive workloads.

  • When EDP mode is enabled on a transport node, a combination of technologies is used. The technologies include mbuf packet representation, DPDK fastpath, polling, flow-cache, dedicated CPUs, and vmxnet3 optimizations. These technologies together significantly accelerate the network traffic for the network function workloads.

  • The NSX EDP mode virtual switches use the same emulated virtual NIC adapter, VMXNET3, as the Standard or Distributed vSphere Virtual Switch. Unlike other solutions such as SR-IOV, you can consume EDP with no loss of functionality such as vMotion.

  • The NSX implementation of EDP remains in the ESXi Hypervisor space (vmkernel). Hence the user-space security concerns associated with open-source DPDK networking stacks are not applicable to EDP.

When using EDP, CPU cores must be pre-allocated to the infrastructure dimensioning plan. The CPU cores are assigned per EDP switch instance, per host, and per NUMA. The quantity depends on the throughput required by the workloads. SR-IOV, in comparison, does not require any CPU cores in the infrastructure compute budget. However, the benefits of EDP outweigh the CPU cores saved using SR-IOV.

The following table provides a high-level comparison between the two design choices:

  • NSX EDP only: A single virtual switch is used to manage data plane and non-data plane traffic.

  • NSX & SR-IOV: NSX Standard mode switch is used for non-data plane traffic. SR-IOV is used for data plane traffic.

Note:

Uplinks that are used for EDP switching cannot also be leveraged for SR-IOV VFs.

Table 1. NSX EDP and SR-IOV Comparison

NSX EDP only

NSX + SRIOV

CPU

2-4 physical cores per CPU

Additional Core could be considered

None

Note: The assumption is that the NSX threads use the cores reserved for ESXi.

NIC

2 ports per socket

2 ports for NSX

2 ports per socket for SR-IOV

Throughput

The lowest throughput among the physical network, EDP virtual switch, or workload.

The lowest throughput among the physical network or the workload.

Programmable L2/L3 networks (SDN)

Fully programmable

Not supported

Programmable Edge

Fully programmable, high-performance Edges

Not supported

End-to-end visibility and analytics

Available

Not supported

Provisioning and resiliency

NIC teaming, vMotion, HA, DRS

Not supported

Driver Standardization

Available (VMXNET3)

Not supported

Note: Application vendor-specific drivers are required.

NIC support

VMware Hardware Compatibility List

VMware + App Vendor HCL for SR-IOV

Upgrade compatibility

Available across NSX releases

Vendor-specific for SR-IOV

Support model

VMware Carrier-Grade Support

Complex multivendor resolution

Note:

EDP is supported in conjunction with the Virtual Hyperthreading (vHT) functionality provided by vSphere 8. When configuring EDP interfaces as part of Dynamic Infrastructure Provisioning (DIP) in VMware Telco Cloud Automation. These interfaces can be consumed while configuring vHT.

The provisioning of EDP requires the assignment of lCores. These lCores must be considered for the overall resource budget.

Ethernet VPN (EVPN)

Ethernet VPN (EVPN) is a standards-based BGP control plane that enables the extension of Layer 2 and Layer 3 connectivity.

In the Route Server mode, EVPN allows workloads such as an Evolved Packet Core (EPC) that supports BGP peering and high throughput and low latency to bypass the NSX edge node and route traffic directly to the physical network.

The NSX edge resides in the control path and not the data path. The NSX edge peers to both the workload and the physical network. The data path bypasses the NSX edge node and routes directly to the physical network using VXLAN encapsulation, enabling high throughput and low latency required by this class of applications.

Figure 4. EVPN Route Server Mode
EVPN Route Server Mode
Figure 5. EVPN Inline Mode
EVPN Inline Mode
Note:

EVPN route server mode is implemented using a specific set of RFCs. These RFCs must be supported on the network underlay devices for the service to work end-to-end.

Table 2. Recommended NSX Design

Design Recommendation

Design Justification

Design Implication

Domain Applicability

Deploy a three-node NSX Manager cluster using the large-sized appliance to configure and manage all NSX-based compute clusters.

The large-sized appliance supports more than 64 ESXi hosts. The small-sized appliance is for proof of concept and the medium size supports up to 64 ESXi hosts only.

The large-sized deployment requires more resources in the vSphere management cluster.

Management domain, Compute clusters VNF, CNF, C-RAN, Near/Far Edge, and NSX Edge

Not applicable to RAN sites

Apply vSphere Distributed Resource Scheduler (DRS) anti-affinity rules to the NSX Manager/Controller cluster nodes.

Using DRS prevents Manager or Controller nodes from running on the same ESXi host, and thereby risking their high availability.

Additional configuration is required to set up anti-affinity rules and the rules must be maintained in the event of node restores.

Create a VLAN and Overlay Transport zone.

Ensures that all segments are available to all ESXi hosts and edge VMs are configured as Transport Nodes.

None

Management domain, Compute clusters VNF, CNF, C-RAN, Near/Far Edge, and NSX Edge

Not applicable to RAN sites

Configure ESXi hosts to use the vSphere Distributed Switch with EDP mode in each NSX compute cluster.

Provides a high-performance network stack for NFV workloads.

EDP mode requires more CPU resources, and compatible NICs compared to standard or ENS interrupt mode.

Use large-sized NSX Edge VMs.

The large-sized appliance provides the required performance characteristics if a failure occurs.

Large-sized Edges consume more CPU and memory resources.

Deploy at least two large-sized NSX Edge VMs in the vSphere Edge Cluster.

Creates the NSX Edge cluster to meet availability requirements.

None

Create an uplink profile with the load balance source teaming policy with two active uplinks for ESXi hosts.

For increased resiliency and performance, supports the concurrent use of two physical NICs on the ESXi hosts is supported by creating two TEPs.

None

Create a second uplink profile with the load balance source teaming policy with two active uplinks for Edge VMs.

For increased resiliency and performance, supports the concurrent use of two virtual NICs on the Edge VMs by creating two TEPs.

None

Create a Transport Node Policy with the VLAN and Overlay Transport Zones, VDS settings, and Physical NICs per vSphere Cluster.

Allows the profile to be assigned directly to the vSphere cluster and ensures consistent configuration across all ESXi hosts in the cluster.

You must create all required Transport Zones before creating the Transport Node Policy.

Create two VLANs to enable ECMP between the Tier-0 Gateway and the Layer 3 device (ToR or upstream device).

The ToR switches or the upstream Layer 3 devices have an SVI on one of the two VLANs. Each edge VM has an interface on each VLAN.

Supports multiple equal-cost routes on the Tier-0 Gateway and provides more resiliency and better bandwidth use in the network.

Extra VLANs are required.

Deploy an Active-Active Tier-0 Gateway.

Supports ECMP North-South routing on all edge VMs in the NSX Edge cluster.

Active-Active Tier-0 Gateways cannot provide services such as NAT. If you deploy a specific solution that requires stateful services on the Tier-0 Gateway, you must deploy a Tier-0 Gateway in Active-Standby mode.

Deploy a Tier-1 Gateway to the NSX Edge cluster and connect it to the Tier-0 Gateway.

Creates a two-tier routing architecture that supports load balancers and NAT.

Because Tier-1 is always Active/Standby, the creation of services such as load balancers or NAT is possible.

None

Deploy Tier-1 Gateways with Non-Preemptive setting.

Ensures that when the failed Edge Transport Node comes back online it does not move services back to itself resulting in a small service outage.

None

Replace the certificate of the NSX Manager instances with a certificate that is signed by a third-party Public Key Infrastructure.

Ensures that the communication between NSX administrators and the NSX Manager instance is encrypted by using a trusted certificate.

Replacing and managing certificates is an operational overhead.

Replace the NSX Manager cluster certificate with a certificate that is signed by a third-party Public Key Infrastructure.

Ensures that the communication between the virtual IP address of the NSX Manager cluster and NSX administrators is encrypted using a trusted certificate.

Replacing and managing certificates is an operational overhead.