You can use a Paravirtual Remote Direct Memory Access (PVRDMA) adapter with your vSphere Bitfusion servers and clients to improve the performance of your cluster.
One of the benefits of running ML and AI workloads in vSphere Bitfusion is that the pipeline of GPU work is kept filled, which hides the network latency. Because the GPU pipeline cannot always be kept full, a network connection with a low latency of 50 microseconds or less is recommended.
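As a rough check of your connection, you can measure application-level round-trip time between a client and server VM. The following is a minimal Python sketch, assuming a simple TCP echo service is running on the server VM; the host name and port are placeholders, not Bitfusion defaults. It measures socket round-trip time, which is only an upper bound on the latency an RDMA path would see.

```python
import socket
import statistics
import time

HOST = "bitfusion-server.example.com"  # placeholder: your server VM's address
PORT = 7                               # placeholder: port of a TCP echo service
SAMPLES = 1000

with socket.create_connection((HOST, PORT)) as sock:
    # Disable Nagle's algorithm so each 1-byte probe is sent immediately.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    rtts = []
    for _ in range(SAMPLES):
        start = time.perf_counter()
        sock.sendall(b"x")
        sock.recv(1)  # assumes the far side echoes the byte back
        rtts.append((time.perf_counter() - start) * 1_000_000)  # microseconds

print(f"median RTT: {statistics.median(rtts):.1f} us")
print(f"p99 RTT:    {sorted(rtts)[int(len(rtts) * 0.99)]:.1f} us")
```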
Remote Direct Memory Access (RDMA) allows direct memory access from the memory of one computer to the memory of another computer without involving the operating system or the CPU. The memory transfer is offloaded to an RDMA-capable Host Channel Adapter (HCA). Large maximum transmission units (MTUs), for example 9000 bytes per frame, are commonly used in RDMA networking. Together, the direct memory access and the large frame sizes lower network overhead and latency, and improve the performance of vSphere Bitfusion.
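To confirm that jumbo frames are configured inside a Linux guest, you can read the interface MTU from sysfs. This is a minimal sketch for Linux only; the interface name is a placeholder for your RDMA-facing NIC.

```python
from pathlib import Path

IFNAME = "ens192"  # placeholder: name of the RDMA-facing interface in the guest

# Linux exposes each interface's MTU under /sys/class/net/<ifname>/mtu.
mtu = int(Path(f"/sys/class/net/{IFNAME}/mtu").read_text())
print(f"{IFNAME} MTU: {mtu}")
if mtu < 9000:
    print("Jumbo frames are not configured; per-packet overhead will be higher.")
```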
Paravirtual Remote Direct Memory Access (PVRDMA) allows RDMA between virtual machines (VMs) over a distributed network without dedicating an entire physical adapter to a single VM through DirectPath I/O. PVRDMA network adapters provide remote direct memory access in a virtual environment, whether the VMs reside on the same physical host or on different hosts in the same network. If you are not using DirectPath I/O and physical RDMA-capable adapters and switches are available, it is recommended that you use PVRDMA instead of VMXNET3.
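To verify that a PVRDMA device is visible inside a Linux guest, you can list the RDMA devices that the kernel registers under sysfs. This is a minimal sketch, assuming a Linux guest with the vmw_pvrdma driver loaded, where a device such as vmw_pvrdma0 typically appears; exact device names can vary.

```python
from pathlib import Path

# Linux registers RDMA devices under /sys/class/infiniband.
rdma_root = Path("/sys/class/infiniband")
devices = sorted(p.name for p in rdma_root.iterdir()) if rdma_root.is_dir() else []

if devices:
    for name in devices:
        print(f"RDMA device found: {name}")
else:
    print("No RDMA devices visible; check the PVRDMA adapter and the guest driver.")
```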
For more information, see Remote Direct Memory Access for Virtual Machines in the vSphere Networking documentation.