VMware vSphere Bitfusion virtualizes hardware accelerators such as graphics processing units (GPUs) to provide a pool of shared, network-accessible resources that support artificial intelligence (AI) and machine learning (ML) workloads.

vSphere Bitfusion Architecture

vSphere Bitfusion has a client-server architecture. The product allows multiple client virtual machines (VMs) running AI and ML applications to share access to remote GPUs on virtual machines running vSphere Bitfusion server software. You run the applications on the vSphere Bitfusion client machines, while the GPUs that provide acceleration are installed on the vSphere Bitfusion server machines and are reached across a network. The applications can open files, allocate memory, and call CUDA as if operating on a machine with local GPUs.

The following figure shows an example of a small vSphere Bitfusion cluster: a set of vSphere Bitfusion server and client machines and a vCenter Server instance on a switched network. A minimal vSphere Bitfusion cluster consists of one client, one server, and one vCenter Server instance. You can create larger clusters with multiple clients and multiple servers.

Figure 1. Example of a small vSphere Bitfusion cluster
Example of a small vSphere Bitfusion cluster with one primary server, two secondary or subsequent servers, and two clients, all connected to the same vCenter Server instance.
Note: Before using VMware vSphere Bitfusion, you must deploy a vSphere Bitfusion server, and install and activate vSphere Bitfusion software on a client machine. For more information, see Overview of the vSphere Bitfusion Installation Process in the VMware vSphere Bitfusion Installation Guide.

vSphere Bitfusion Functionality

When you start an AI or ML application on a vSphere Bitfusion client, vSphere Bitfusion intercepts the CUDA calls of the application, including the data and data pointers that the calls carry. The vSphere Bitfusion server does not require a connection to the storage where the application data is kept. It requires a connection only to the vSphere Bitfusion client. The client transfers the data, together with the CUDA calls, to the server. The vSphere Bitfusion server processes the calls and returns the results to the client.
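The call flow described above can be illustrated with a small, purely conceptual sketch. It is not the vSphere Bitfusion implementation or API: the class names (`FakeGpuServer`, `InterposedClient`), the call names (`malloc`, `memcpy_to_device`, `scale`, `memcpy_to_host`), and the in-process "server" are all hypothetical stand-ins, and no real network or GPU is involved. The point is only that the application code is written as if the accelerator were local, while every call is intercepted and forwarded.

```python
# Conceptual sketch only: not the vSphere Bitfusion implementation or API.
# Every class, method, and call name below is hypothetical.


class FakeGpuServer:
    """Stands in for a remote server that owns the GPU and executes calls."""

    def __init__(self):
        self.buffers = {}      # handle -> list of floats (pretend device memory)
        self.next_handle = 1

    def handle(self, call, *args):
        # Dispatch a forwarded call by name and return its result.
        return getattr(self, "_" + call)(*args)

    def _malloc(self, n):
        handle = self.next_handle
        self.next_handle += 1
        self.buffers[handle] = [0.0] * n
        return handle

    def _memcpy_to_device(self, handle, data):
        self.buffers[handle] = list(data)

    def _scale(self, handle, factor):
        # Stands in for a kernel launch on the server-side GPU.
        self.buffers[handle] = [x * factor for x in self.buffers[handle]]

    def _memcpy_to_host(self, handle):
        return list(self.buffers[handle])


class InterposedClient:
    """Exposes the calls the application expects and forwards each one."""

    def __init__(self, server):
        self.server = server
        self.forwarded = []    # log of intercepted call names

    def __getattr__(self, call):
        # Only reached for names that are not real attributes,
        # that is, for the accelerator calls made by the application.
        def forward(*args):
            self.forwarded.append(call)
            # A real system would send the call and its data over the network.
            return self.server.handle(call, *args)
        return forward


def application(gpu):
    """Application code, written as if the GPU were local."""
    buf = gpu.malloc(3)
    gpu.memcpy_to_device(buf, [1.0, 2.0, 3.0])
    gpu.scale(buf, 2.0)
    return gpu.memcpy_to_host(buf)


client = InterposedClient(FakeGpuServer())
result = application(client)
print(result)
print(client.forwarded)
```

Note that the application function never refers to the server. In this sketch, only the object passed in as `gpu` decides whether calls run locally or are forwarded, which mirrors the idea that the application does not need to change.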

When you start AI and ML applications in vSphere Bitfusion, the applications can share GPU resources in the following ways.

  • You can dynamically allocate and access GPU resources from vSphere Bitfusion servers.

    Applications can share GPU resources that are not installed on local machines. You can run each application in a suitable environment on a configured vSphere Bitfusion client, such as a VM, bare-metal machine, or container. Applications consume GPU acceleration services from a pool of vSphere Bitfusion servers across a network, and consume the resources only for the length of time that an application or session runs. GPUs return to the pool when applications or sessions complete.

  • You can access partitions of GPU resources for concurrent sharing with other applications.

    The memory of a physical GPU can be divided into fractions of an arbitrary size. Each fraction can be allocated to a different application at the same time. vSphere Bitfusion performs sharing with an interposition technology. vSphere Bitfusion intercepts API calls that normally address a local accelerator on a PCIe host bus and sends the API calls and related data across a network. vSphere Bitfusion provides sharing services for AI and ML applications, and supports the CUDA API to target NVIDIA GPUs.
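The two sharing modes above, allocation from a pool for the length of a session and concurrent use of memory fractions, can be modeled with a short bookkeeping sketch. This is an illustration under stated assumptions, not how vSphere Bitfusion schedules GPUs: the `GpuPool` class, its `session` method, the first-fit placement, and the GPU name `gpu0` are all hypothetical.

```python
# Bookkeeping sketch only: the GpuPool class, the first-fit policy, and the
# GPU names are hypothetical and do not reflect vSphere Bitfusion internals.
from contextlib import contextmanager
from fractions import Fraction


class GpuPool:
    """Tracks the free memory fraction of each GPU in a shared pool."""

    def __init__(self, gpu_ids):
        # Fraction(1) means the whole GPU memory is free.
        self.free = {gpu_id: Fraction(1) for gpu_id in gpu_ids}

    @contextmanager
    def session(self, fraction):
        """Reserve a memory fraction of one GPU for the length of a session."""
        fraction = Fraction(fraction)
        if not 0 < fraction <= 1:
            raise ValueError("fraction must be greater than 0 and at most 1")
        # First fit: pick the first GPU with enough free memory.
        gpu_id = next(
            (g for g, free in self.free.items() if free >= fraction), None
        )
        if gpu_id is None:
            raise RuntimeError("no GPU in the pool has enough free memory")
        self.free[gpu_id] -= fraction
        try:
            yield gpu_id
        finally:
            # The reserved fraction returns to the pool when the session ends.
            self.free[gpu_id] += fraction


pool = GpuPool(["gpu0"])

# Two sessions share one physical GPU at the same time.
with pool.session(Fraction(1, 2)) as first, pool.session(Fraction(1, 4)) as second:
    shared_gpus = (first, second)
    free_during = pool.free["gpu0"]

free_after = pool.free["gpu0"]
print(shared_gpus, free_during, free_after)
```

The sketch uses `fractions.Fraction` rather than floating-point values so that repeated reservation and release never drifts away from exactly one whole GPU.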

vSphere Bitfusion Components

vSphere Bitfusion Server
The vSphere Bitfusion server runs as a VMware appliance, which is a preconfigured virtual machine (VM) with prepackaged software and services, on an ESXi host with locally installed GPUs. The server requires access to the local GPUs, usually through VMware vSphere® DirectPath I/O™.
vSphere Bitfusion Client
The vSphere Bitfusion client runs on the VMs that run the AI and ML applications.
vSphere Bitfusion Plug-In
The vSphere Bitfusion servers register a vSphere Bitfusion Plug-in with VMware vCenter Server. The plug-in provides monitoring and management of vSphere Bitfusion clients and servers.
vSphere Bitfusion Cluster
A vSphere Bitfusion cluster is the set of all vSphere Bitfusion servers and clients in a vCenter Server instance.
vSphere Bitfusion Linux User Group
During the installation process of a vSphere Bitfusion client, the client creates a vSphere Bitfusion Linux user group, bitfusion. Only the members of the group can use vSphere Bitfusion. Certain configuration files are set up with appropriate permissions and the members of the group inherit appropriate limits to work effectively with vSphere Bitfusion.
vSphere Client
The vSphere Client lets you connect to vCenter Server instances by using a Web browser, so that you can manage your vSphere infrastructure. You access the vSphere Bitfusion Plug-in through the vSphere Client.
Command-Line Interface (CLI)
You can manage vSphere Bitfusion servers and clients by using command-line interface (CLI) commands.
vCenter Server
vCenter Server is the server management software that provides a centralized platform for controlling your vSphere environment.
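
Because only members of the bitfusion Linux user group can use vSphere Bitfusion on a client, it can be useful to check membership before troubleshooting further. The following is a minimal sketch that uses only the Python standard library on a Linux client. The helper name `user_in_group` is hypothetical and is not part of vSphere Bitfusion; the group name `bitfusion` is the one described above.

```python
# Minimal sketch for a Linux client. The helper name is hypothetical;
# only the group name "bitfusion" comes from the product description.
import grp
import pwd


def user_in_group(user, group="bitfusion"):
    """Return True if the user belongs to the group, otherwise False."""
    try:
        group_entry = grp.getgrnam(group)
        user_entry = pwd.getpwnam(user)
    except KeyError:
        # The group or the user does not exist on this machine.
        return False
    # Membership can come from the supplementary member list of the group
    # or from the primary group ID of the user.
    return user in group_entry.gr_mem or user_entry.pw_gid == group_entry.gr_gid
```

For example, calling `user_in_group("alice")` for a hypothetical user `alice` returns `False` on a machine where the bitfusion group has not been created or where `alice` is not a member.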

vSphere Bitfusion FAQ

To understand more about vSphere Bitfusion, see the frequently asked questions (FAQ) section in The Cloud Platform Tech Zone.