RDMA over Converged Ethernet enables low-latency, lightweight, high-throughput RDMA communication over an Ethernet network. RoCE requires a network that is configured for lossless traffic at layer 2 alone or at both layer 2 and layer 3.
RDMA over Converged Ethernet (RoCE) is a network protocol that uses remote direct memory access (RDMA) to provide faster data transfer for network-intensive applications. RoCE allows direct memory transfer between hosts without involving the hosts' CPUs.
There are two versions of the RoCE protocol. RoCE v1 operates at the data link layer (layer 2). RoCE v2 operates at the network layer (layer 3). Both RoCE v1 and RoCE v2 require a lossless network configuration. RoCE v1 requires a lossless layer 2 network, and RoCE v2 requires that both layer 2 and layer 3 are configured for lossless operation.
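The difference between the two versions can be seen on the wire. The following minimal sketch assumes the commonly documented identifiers, which do not appear in the text above: Ethertype 0x8915 for RoCE v1, and UDP destination port 4791 for RoCE v2. The function name classify_roce is hypothetical, and only IPv4 with at most one 802.1Q tag is handled.

```python
import struct

ROCE_V1_ETHERTYPE = 0x8915  # assumption: Ethertype used by RoCE v1
ROCE_V2_UDP_PORT = 4791     # assumption: UDP destination port used by RoCE v2
ETHERTYPE_VLAN = 0x8100     # 802.1Q tag protocol identifier
ETHERTYPE_IPV4 = 0x0800
IPPROTO_UDP = 17


def classify_roce(frame: bytes) -> str:
    """Return 'RoCE v1', 'RoCE v2', or 'not RoCE' for a raw Ethernet frame."""
    if len(frame) < 14:
        return "not RoCE"
    offset = 12  # skip destination and source MAC addresses
    (ethertype,) = struct.unpack_from("!H", frame, offset)
    offset += 2
    if ethertype == ETHERTYPE_VLAN:
        if len(frame) < offset + 4:
            return "not RoCE"
        # Skip the 2-byte tag control field and read the inner Ethertype.
        (ethertype,) = struct.unpack_from("!H", frame, offset + 2)
        offset += 4
    if ethertype == ROCE_V1_ETHERTYPE:
        return "RoCE v1"  # layer 2 only: no IP header, not routable
    if ethertype == ETHERTYPE_IPV4 and len(frame) >= offset + 20:
        header_len = (frame[offset] & 0x0F) * 4  # IHL is in 32-bit words
        protocol = frame[offset + 9]
        udp_start = offset + header_len
        if (header_len >= 20 and protocol == IPPROTO_UDP
                and len(frame) >= udp_start + 4):
            (dst_port,) = struct.unpack_from("!H", frame, udp_start + 2)
            if dst_port == ROCE_V2_UDP_PORT:
                return "RoCE v2"  # carried over UDP/IP, so routable at layer 3
    return "not RoCE"
```

Because RoCE v2 traffic is ordinary UDP/IP from a router's point of view, lossless handling has to be preserved at layer 3 as well as at layer 2.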
Lossless Layer 2 Network
To ensure a lossless layer 2 environment, you must be able to control the traffic flows. Flow control is achieved by enabling global pause across the network or by using the Priority Flow Control (PFC) protocol defined by the Data Center Bridging (DCB) group. PFC is a layer 2 protocol that uses the class of service field of the 802.1Q VLAN tag to set individual traffic priorities. It pauses the transfer of packets to a receiver according to the individual class of service priorities. This way, a single link carries both lossless RoCE traffic and other lossy, best-effort traffic. When traffic flows become congested, important lossy traffic can be affected. To isolate different flows from one another, use RoCE in a PFC priority-enabled VLAN.
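The priority that PFC acts on is carried in the 802.1Q tag. The following minimal sketch shows how the 3-bit priority value can be read from a tagged frame. The function names and the choice of priority 3 as the lossless class are assumptions for illustration only. The actual priority depends on how the hosts and switches are configured.

```python
import struct

ETHERTYPE_VLAN = 0x8100  # TPID that identifies an 802.1Q-tagged frame

# Hypothetical policy: priority 3 is the PFC-enabled (lossless) class.
LOSSLESS_PRIORITIES = {3}


def parse_vlan_tag(frame: bytes):
    """Return (pcp, dei, vlan_id) for an 802.1Q-tagged frame, else None."""
    if len(frame) < 16:
        return None
    # The tag follows the two 6-byte MAC addresses: TPID, then tag control info.
    tpid, tci = struct.unpack_from("!HH", frame, 12)
    if tpid != ETHERTYPE_VLAN:
        return None
    pcp = tci >> 13          # top 3 bits: class of service priority (0-7)
    dei = (tci >> 12) & 0x1  # 1 bit: drop eligible indicator
    vlan_id = tci & 0x0FFF   # low 12 bits: VLAN identifier
    return pcp, dei, vlan_id


def is_lossless(frame: bytes) -> bool:
    """True if the frame is tagged with a priority in the lossless set."""
    tag = parse_vlan_tag(frame)
    return tag is not None and tag[0] in LOSSLESS_PRIORITIES
```

An untagged frame carries no priority field, which is one reason PFC-based lossless traffic is placed in a VLAN.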
Lossless Layer 3 Network
RoCE v2 requires that lossless data transfer is preserved at layer 3 routing devices. To carry layer 2 PFC lossless priorities across layer 3 routers, configure the router to map the received priority setting of a packet to the corresponding Differentiated Services Code Point (DSCP) QoS setting that operates at layer 3. The transferred RDMA packets are marked with a layer 3 DSCP, a layer 2 Priority Code Point (PCP), or both. To extract priority information from the packet, routers use either DSCP or PCP. If PCP is used, the packet must be VLAN-tagged, and the router must copy the PCP bits of the tag and forward them to the next network. If the packet is marked with DSCP, the router must keep the DSCP bits unchanged.
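The priority mapping described above can be sketched as follows. This is an illustration under stated assumptions, not a router configuration: the function names are hypothetical, and the mapping of PCP value n to the class selector code point n * 8 is only one possible choice. Real devices use whatever PCP-to-DSCP table the administrator configures.

```python
def pcp_to_dscp(pcp: int) -> int:
    """Map a 3-bit PCP value to a 6-bit DSCP value.

    Assumed mapping: PCP n becomes class selector CSn (DSCP n * 8).
    """
    if not 0 <= pcp <= 7:
        raise ValueError("PCP must be in the range 0-7")
    return pcp << 3


def get_dscp(tos: int) -> int:
    """Extract the DSCP value from the IPv4 ToS / traffic class byte."""
    return (tos >> 2) & 0x3F  # DSCP is the upper 6 bits


def set_dscp(tos: int, dscp: int) -> int:
    """Return the byte with a new DSCP value, keeping the 2 ECN bits."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP must be in the range 0-63")
    return (dscp << 2) | (tos & 0x03)


# Example: a frame received with PCP 3 is marked with the matching DSCP.
tos_out = set_dscp(0x02, pcp_to_dscp(3))
```

Keeping the low 2 bits intact matters because they carry explicit congestion notification information, which is separate from the priority marking.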
Like RoCE v1, RoCE v2 must run on a PFC priority-enabled VLAN.
For vendor-specific configuration information, refer to the official documentation of the respective device or switch vendor.