You can run an application in the entire memory of a GPU or only in a dedicated partition of that memory. vSphere Bitfusion can allocate a GPU, run an application, and deallocate the GPU with a single CLI command, or you can use individual commands to perform the same tasks.
How to run applications
The vSphere Bitfusion client can run artificial intelligence (AI) and machine learning (ML) applications on remote shared GPUs.
You use the run command to allocate GPUs for a single application. By default, the application runs in the entire memory resource of the GPUs. All GPUs that are requested by using the run command must be allocated from a single vSphere Bitfusion server, and the server must list the GPUs as separate devices with different PCIe addresses. The run command performs the following tasks:
- Allocates GPUs from the shared pool.
- Starts an application in an environment that can access the GPUs when the application makes CUDA calls.
- Deallocates the GPUs when the application closes.
You specify the number of GPUs with the -n num_gpus argument. To distinguish vSphere Bitfusion arguments from the application and its arguments, you use a double-hyphen separator or place the application within quotes.
bitfusion run -n num_gpus other switches -- applications and arguments
bitfusion run -n num_gpus other switches "applications and arguments"
For example, the AI application, asimov_i.py, requires two arguments: the number of GPUs and a batch size.
- When the application expects 1 GPU, run bitfusion run -n 1 -- python asimov_i.py --num_gpus=1 --batchsz=64
- When the application expects 2 GPUs, run bitfusion run -n 2 -- python asimov_i.py --num_gpus=2 --batchsz=64
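Following the quoted-form syntax shown above, the first example can equivalently be written without the double-hyphen separator:
bitfusion run -n 1 "python asimov_i.py --num_gpus=1 --batchsz=64"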
How to run applications with partial GPUs
You can run your application in a dedicated partition of a GPU's memory, while other applications use the remaining memory of the GPU.
The GPU partitioning arguments are optional arguments of the run command. You use them to run your application in a partition of the GPU memory.
- The GPU partitioning process is dynamic. When you start a run command with a partial argument, vSphere Bitfusion allocates a partition before the application runs and deallocates the partition afterwards.
- The applications that are sharing GPUs concurrently are isolated from each other by using separate client processes, network streams, server processes, and memory partitions.
- vSphere Bitfusion partitions only the memory of the GPU, not the compute resource. An application is strictly confined to its assigned memory partition, but it can access the complete compute resource if needed. When applications require the same compute cells, they compete for compute resources; otherwise, they run concurrently.
You can specify the partition size as a fraction of the total GPU memory or in MB.
Partitioning GPU memory by fraction
bitfusion run -n num_gpus -p gpu_fraction -- applications and arguments
Partitioning GPU memory by size in MB
bitfusion run -n num_gpus -m MBs_per_gpu -- applications and arguments
GPU partitioning examples
Multiple concurrent applications might use a GPU's computational capacity more efficiently than a single application. There are several ways you can partition the memory of your GPUs.
If you are running inference applications with smaller model sizes or small batches of work, such as a small number of images, you can run the applications concurrently on partitioned GPUs.
You can perform empirical testing to understand the memory size an application requires. Some applications expand to use all available memory, but they might not achieve better performance beyond a certain threshold.
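One way to sketch such testing is to run the application with successively smaller partitions, by using the -m argument described above, until it no longer fits in the partition. The sizes and batch size below are illustrative:
bitfusion run -n 1 -m 8192 -- python asimov_i.py --num_gpus=1 --batchsz=32
bitfusion run -n 1 -m 5461 -- python asimov_i.py --num_gpus=1 --batchsz=32
bitfusion run -n 1 -m 4096 -- python asimov_i.py --num_gpus=1 --batchsz=32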
The following examples presume that you know the acceptable memory requirements for different batch sizes.
- When you expect that an application with a batch size of 64 requires no more than 66% of GPU memory, run bitfusion run -n 1 -p 0.66 -- python asimov_i.py --num_gpus=1 --batchsz=64
- When you expect that an application with a batch size of 32 requires no more than 5461 MB of GPU memory, run bitfusion run -n 1 -m 5461 -- python asimov_i.py --num_gpus=1 --batchsz=32
When you request multiple GPUs, the same amount of memory is allocated on each GPU. The fraction size must be based on the GPU with the smallest amount of memory.
In the following example, the -p argument requests 33% of the memory of each of the two requested GPUs. The GPUs must physically reside on the same server. If the GPUs are 16 GB devices, or if the smallest GPU is a 16 GB device, approximately 5461 MB is allocated on each GPU. When no other applications are running, asimov_i.py can access the full compute power of the two GPUs.
Run bitfusion run -n 2 -p 0.33 -- python asimov_i.py --num_gpus=2 --batchsz=64
You can run multiple applications from a single client on the same GPU concurrently.
- bitfusion run -n 1 -p 0.66 -- python asimov_i.py --num_gpus=1 --batchsz=64 &
- bitfusion run -n 1 -p 0.33 -- python asimov_i.py --num_gpus=1 --batchsz=32 &
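Because each command ends with the & control operator, both applications run in the background and share the GPU concurrently. In a POSIX shell, you can block until both applications complete, as in the following sketch:
bitfusion run -n 1 -p 0.66 -- python asimov_i.py --num_gpus=1 --batchsz=64 &
bitfusion run -n 1 -p 0.33 -- python asimov_i.py --num_gpus=1 --batchsz=32 &
wait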
NVIDIA system management interface (nvidia-smi)
You can run the NVIDIA System Management Interface (nvidia-smi) monitoring application, for example, to check your GPU partition size or to verify the resources that are available on a vSphere Bitfusion server. Typically, the application is provided on the server when you install the NVIDIA driver.
Applications that run on vSphere Bitfusion clients do not require the NVIDIA driver, but they might require the nvidia-smi application, for example, to determine the capabilities of a GPU or its memory size. To support such operations, starting with vSphere Bitfusion 3.0, the nvidia-smi application is provided on all vSphere Bitfusion clients. vSphere Bitfusion copies the application from the server to the client.
The output of the nvidia-smi application displays the requested partition value of 1024 MiB.
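For example, a command along the following lines produces output like the one below. The exact invocation is illustrative; the -m 1024 argument requests a 1024 MB partition as described earlier in this section.
bitfusion run -n 1 -m 1024 nvidia-smi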
Requested resources:
Server List: 172.16.31.241:56001
Client idle timeout: 0 min

Wed Sep 23 15:21:17 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.100      Driver Version: 440.64.00    CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla T4            Off  | 00000000:13:00.0 Off |                    0 |
| N/A   36C    P8     9W /  70W |      0MiB /  1024MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
How to run applications with reserved GPUs
You can allocate GPUs and run multiple applications on the same GPUs.
While the run command allocates GPUs, runs applications, and deallocates GPUs as a single operation, vSphere Bitfusion has three individual commands to perform the same tasks:
- To allocate GPUs, run the request_gpus command.
- To start applications, run the client command.
- To deallocate the GPUs, run the release_gpus command.
By using the individual commands, you can use the same GPUs for multiple applications and have greater control when you are integrating vSphere Bitfusion into other tools and workflows, such as the scheduling software SLURM.
Applications that you start with the client command run on the GPUs that are allocated by the preceding request_gpus command.
To understand the use of the individual commands, see the following example workflow, which uses the AI application asimov_i.py.
- To allocate GPUs on which you can start multiple applications sequentially, run bitfusion request_gpus -n 1 -m 5461.
Requested resources:
Server List: 172.16.31.241:56001
Client idle timeout: 0 min
- To start an application with the client command, run bitfusion client nvidia-smi.
Wed Sep 23 15:26:02 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.100      Driver Version: 440.64.00    CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla T4            Off  | 00000000:13:00.0 Off |                    0 |
| N/A   36C    P8    10W /  70W |      0MiB /  5461MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
- To start another application sequentially on the same GPUs, run bitfusion client -- python asimov_i.py --num_gpus=1 --batchsz=64.
- To deallocate the GPUs, run bitfusion release_gpus.
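Taken together, the example workflow consists of the following command sequence, identical to the steps above but shown as one script:
bitfusion request_gpus -n 1 -m 5461
bitfusion client nvidia-smi
bitfusion client -- python asimov_i.py --num_gpus=1 --batchsz=64
bitfusion release_gpus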
How to run applications on specific GPUs and servers
Starting with vSphere Bitfusion 4.0, you can use CLI command arguments to filter the GPUs in your resource pool and start your applications on a specific set of GPUs.
You can use the --filter argument with the run, request_gpus, and list_gpus commands to run the commands with a specific set of GPUs or servers. You can also combine filters to list servers and GPUs that satisfy multiple conditions. For each data type, you must use an appropriate operator, such as <, >, >=, <=, =, or !=.
Filter | Data Type | Description |
---|---|---|
device.index | Comma-separated integer list | The system index of a GPU. For example, --filter "device.index=1" or --filter "device.index=0,1,2,3". To see the indices of your GPUs, run the nvidia-smi command. |
device.name | String | The model name of a GPU device. For example, --filter "device.name=NVIDIA Tesla T4". |
device.memory | Integer | The physical memory size of a GPU device in MB. For example, enter --filter "device.memory=16384" for a GPU device with 16 GB of memory. |
device.capability | Version | The NVIDIA device CUDA compute capability. The CUDA compute capability is a mechanism that NVIDIA uses with the CUDA API to specify the features that your GPUs support. The value must be entered in X or X.Y format. For example, --filter "device.capability=8.0". For more information, see the NVIDIA CUDA GPUs documentation. |
server.addr | String | The IP address of a vSphere Bitfusion server. For example, --filter "server.addr=10.202.8.209". |
server.hostname | String | The hostname of a vSphere Bitfusion server. For example, --filter "server.hostname=bitfusion-server-4.0.0-13". |
server.has-rdma | Boolean | Whether the vSphere Bitfusion server uses an RDMA network connection. For example, --filter "server.has-rdma=true". |
server.cuda-version | Version | The CUDA version that is installed on a vSphere Bitfusion server. The value must be entered in X, X.Y, or X.Y.Z format. For example, --filter "server.cuda-version=11.3". |
server.driver-version | Version | The NVIDIA driver version that is installed on a vSphere Bitfusion server. The value must be entered in X, X.Y, or X.Y.Z format. For example, --filter "server.driver-version=460.73". |
For example, to list all GPUs with more than 16 GB (16384 MB) of memory, run the bitfusion list_gpus --filter "device.memory>16384" command.
To run a workload on GPU devices with the Ampere GPU microarchitecture only, run the bitfusion run -n 1 --filter "device.capability=8.0" -- workload command. Similarly, to run the workload on GPU devices with the Volta GPU microarchitecture only, run the bitfusion run -n 1 --filter "device.capability=7.0" -- workload command.
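The syntax for combining filters is not shown above. As an illustrative sketch, assuming that multiple conditions are combined by repeating the --filter argument, a command might look like the following:
bitfusion list_gpus --filter "device.memory>=16384" --filter "server.has-rdma=true"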
How to run applications with an alternative list of vSphere Bitfusion servers
You can create an alternative vSphere Bitfusion server list and use it with the run, request_gpus, and list_gpus commands to start your applications on specific GPUs. The alternative list is a subset of the primary list of GPU servers that vSphere Bitfusion maintains in the ~/.bitfusion/servers.conf file.
vSphere Bitfusion supports IPv4 addresses only.
Procedure
- Use a servers.conf file or create a server list.
  - To use an alternative servers.conf file of vSphere Bitfusion servers, run bitfusion run --servers value (or -s value), where the value argument is the file path to a servers.conf file.
  - To create an alternative list of vSphere Bitfusion servers, run bitfusion run --server_list value (or -l value), where the value argument is a server list in the "ip_address:port;ip_address:port" format.
    Note: Enclose the list within quotes, because a semicolon is used as a separator when you list multiple addresses, and the command-line interpreter can otherwise parse the list as multiple commands.
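As an illustrative sketch of the -l argument, the following command reuses the server address shown in the nvidia-smi output earlier in this section; the second address is a placeholder:
bitfusion run -n 1 -l "172.16.31.241:56001;172.16.31.242:56001" -- python asimov_i.py --num_gpus=1 --batchsz=64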
How to change the waiting time for available GPUs
When you start an application, by default, a vSphere Bitfusion client waits for up to 30 minutes for enough GPUs to be available.
To modify the default time that a vSphere Bitfusion client waits for available GPUs, add the --timeout value (or -t value) argument to the run and request_gpus commands. The value argument accepts the following formats:
Value | Waiting Time |
---|---|
10 | 10 seconds |
10s | 10 seconds |
10m | 10 minutes |
10h | 10 hours |
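For example, to make the client wait up to 10 minutes for an available GPU, you can run a command such as the following; the application and its arguments are reused from the earlier examples:
bitfusion run -n 1 -t 10m -- python asimov_i.py --num_gpus=1 --batchsz=64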