Enable NetQ Receive Side Scaling (RSS) to offload vNIC requests to a physical NIC. It improves packet performance for receive-side data.

Starting with NSX 4.1.0 and ESXi 8.0, NSX supports NetQ Receive Side Scaling. When a physical NIC sends packets to a host, the Enhanced Network Stack (ENS), which runs when the host switch is configured in Enhanced Datapath mode, distributes the data across different logical cores on the host's NUMA nodes. There are two ways to configure RSS engines.

As a network admin who wants to improve the receive-side throughput, consider one of these two modes of configuring RSS:
  • RSS engine is dedicated to a single vNIC queue: A dedicated RSS engine completely offloads any request coming from a vNIC to the physical NIC. In this mode, a single RSS engine is dedicated to a single vNIC queue. It improves throughput performance because the pNIC manages the receive-side data and shares it among the available hardware queues to serve the request. The vNIC queues are co-located on the same logical core or fastpath as the pNIC queues.
  • RSS engine is shared by multiple vNIC queues: In this mode, multiple hardware queues are made available to vNIC queues. However, the vNIC handling a flow might not be aligned with the physical hardware queue that processes its data; there is no guarantee that vNIC and pNIC queues are aligned.
Note: If Default Queue Receive Side Scaling (DRSS) is enabled on the NIC card, deactivate it.

Prerequisites

  • Hosts must be running ESXi version 8 or later.
  • Ensure that the NIC card supports RSS functionality.
  • EDP NetQ RSS is supported from NSX 4.0 and ESXi version 8.0 onwards. Supported inbox drivers are Intel i40en (async driver) and Mellanox nmlx. Refer to the driver documentation to confirm whether it has an ENS-compatible RSS implementation.
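
To identify which driver a NIC uses before checking the driver documentation for an ENS-compatible RSS implementation, the following optional commands can help. They are not part of the documented prerequisites, and vmnic0 is a placeholder for your uplink name.

    esxcli network nic list
    esxcli network nic get -n vmnic0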

Procedure

  1. To enable NetQ RSS, run esxcli system module parameters set -m i40en -p "DRSS=0,0 RSS=1,0".

    Where DRSS=0,0 indicates that DRSS is deactivated on both NIC ports, and RSS=1,0 indicates that NetQ RSS is enabled on one of the NIC ports.
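
    Optionally, you can confirm the values that are now configured for the driver module. This verification is an addition to the documented procedure, not part of it.

    esxcli system module parameters list -m i40en | grep -E 'DRSS|RSS'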

  2. To unload the driver, run vmkload_mod -u i40en.
  3. To reload the driver so that the RSS setting takes effect, run vmkload_mod i40en.
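
    Optionally, verify that the module is loaded again by listing the loaded modules. This check is an addition to the documented steps.

    vmkload_mod -l | grep i40en
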
  4. Stop the device manager to trigger PCI fastconnect so that it can scan devices and associate the driver with a NIC.

    Run kill -HUP $(ps | grep mgr | awk '{print $1}').

  5. To configure multiple RSS engines to be available to serve RSS requests from vNICs, configure these parameters in the .vmx file of the VM, as illustrated in the sample snippet at the end of this step.

    ethernet.pnicfeatures = '4', which indicates that the RSS feature is requested by vNICs.

    ethernet.ctxPerDev = '3', which indicates that multiple contexts (multiple logical cores) are enabled to process each vNIC. The VMs connected to the vSphere switch are configured for multiple queues. This means that multiple logical cores of a NUMA node can process the Tx and Rx traffic coming from vNICs.

    When multiple vNICs request RSS offloading, the Enhanced Network Stack (ENS) does not offload their RSS requests to the pNIC; instead, the shared RSS engine processes their requests. For shared RSS, multiple RSS queues are available, but co-location of a vNIC queue and a pNIC queue is not guaranteed.
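
    As an illustration only, a minimal sketch of how these entries might look in a .vmx file, assuming the VM's first adapter, ethernet0 (the ethernet0 prefix and the double-quoted values are assumptions here, not part of the documented parameter names):

    ethernet0.pnicfeatures = "4"
    ethernet0.ctxPerDev = "3"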

  6. To configure a dedicated RSS engine to process requests from a vNIC, configure these parameters in the .vmx file of the VM.

    ethernet.rssoffload = True

    With the preceding configuration enabled, RSS requests from a vNIC are offloaded to the physical NIC. Only one vNIC can offload its requests to an RSS engine. In this mode, vNIC queues are aligned to the pNIC queues.
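
    As with the shared-RSS example, a sketch of how this entry might look in a .vmx file, again assuming the adapter is ethernet0 (an assumption for illustration, not part of the documented parameter name):

    ethernet0.rssoffload = "TRUE"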

  7. Verify that packet flow is distributed across the hardware queues provided by the RSS engine.

    Run the following commands.

    vsish

    get /net/pNics/vmnicX/stats

    Sample output:

    rxq0: pkts=0 bytes=0 toFill=2047 toProc=0 noBuf=0 csumErr=0
    rxq1: pkts=0 bytes=0 toFill=2047 toProc=0 noBuf=0 csumErr=0
    rxq2: pkts=0 bytes=0 toFill=2047 toProc=0 noBuf=0 csumErr=0
    rxq3: pkts=0 bytes=0 toFill=2047 toProc=0 noBuf=0 csumErr=0
    rxq4: pkts=0 bytes=0 toFill=2047 toProc=0 noBuf=0 csumErr=0
    rxq5: pkts=0 bytes=0 toFill=2047 toProc=0 noBuf=0 csumErr=0
    rxq6: pkts=0 bytes=0 toFill=2047 toProc=0 noBuf=0 csumErr=0
    rxq7: pkts=0 bytes=0 toFill=2047 toProc=0 noBuf=0 csumErr=0
    txq0: pkts=0 bytes=0 toFill=0 toProc=0 dropped=0
    txq1: pkts=0 bytes=0 toFill=0 toProc=0 dropped=0
    txq2: pkts=0 bytes=0 toFill=0 toProc=0 dropped=0
    txq3: pkts=0 bytes=0 toFill=0 toProc=0 dropped=0
    txq4: pkts=0 bytes=0 toFill=0 toProc=0 dropped=0
    txq5: pkts=0 bytes=0 toFill=0 toProc=0 dropped=0
    txq6: pkts=0 bytes=0 toFill=0 toProc=0 dropped=0
    txq7: pkts=0 bytes=0 toFill=0 toProc=0 dropped=0
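
    You can also run the same query non-interactively and filter for the receive queues. The vmnic0 name below is a placeholder for your uplink; substitute the NIC you configured.

    vsish -e get /net/pNics/vmnic0/stats | grep rxq

    When traffic is flowing and RSS is distributing it, the pkts counters increase across several rxq entries rather than on a single queue.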