You can allocate a number of GPUs and run multiple applications on the same GPUs.
While the run command allocates GPUs, runs an application, and deallocates the GPUs in a single step, vSphere Bitfusion also provides three individual commands that perform these tasks separately. By using the individual commands, you can use the same GPUs for multiple applications and have greater control when you integrate vSphere Bitfusion into other tools and workflows, such as the SLURM scheduling software.
- To allocate GPUs, run request_gpus.
- To start applications in an environment that can access the GPUs when the application makes CUDA calls, run client.
- To deallocate the GPUs, run release_gpus.
Note: The request_gpus command creates a file and environment variables that can be forwarded to other tools. The tools can run the client command with the same allocation configuration.
The arguments of the run command are split between the request_gpus and client commands.
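The split can be sketched as follows. This is a minimal illustration, not product code: the bf wrapper and the DRY_RUN variable are hypothetical helpers that print each command instead of running it, and the sketch assumes that run accepts the same -n and -m allocation options that the workflow in this topic passes to request_gpus. The application command line is taken from that workflow.

```shell
#!/usr/bin/env bash
# bf and DRY_RUN are hypothetical helpers, not part of vSphere Bitfusion.
# With DRY_RUN=1 (the default) each command is printed instead of run,
# so the sketch can be tried on a machine without GPUs.
bf() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "bitfusion $*"
  else
    bitfusion "$@"
  fi
}

# One-step form: allocation options and the application in one command.
# Assumption: run takes the same -n and -m options as request_gpus.
one_step() {
  bf run -n 1 -m 5461 -- python asimov_i.py --num_gpus=1 --batchsz=64
}

# Three-step form: the allocation options (-n, -m) go to request_gpus,
# and the application command line goes to client.
three_step() {
  bf request_gpus -n 1 -m 5461
  bf client -- python asimov_i.py --num_gpus=1 --batchsz=64
  bf release_gpus
}
```

Calling one_step or three_step in the default dry-run mode prints the commands, which makes it easy to compare where each argument ends up before running anything against real GPUs.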
To understand how the individual commands are used, see the following example workflow, which uses the AI application asimov_i.py.
- To allocate GPUs for multiple sequential applications, run bitfusion request_gpus -n 1 -m 5461.
    Requested resources:
    Server List: 172.16.31.241:56001
    Client idle timeout: 0 min
- To start an application on the allocated GPUs, run bitfusion client nvidia-smi.
    Wed Sep 23 15:26:02 2020
    +-----------------------------------------------------------------------------+
    | NVIDIA-SMI 440.100      Driver Version: 440.64.00    CUDA Version: 10.2     |
    |-------------------------------+----------------------+----------------------+
    | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
    |===============================+======================+======================|
    |   0  Tesla T4            Off  | 00000000:13:00.0 Off |                    0 |
    | N/A   36C    P8    10W /  70W |      0MiB /  5461MiB |      0%      Default |
    +-------------------------------+----------------------+----------------------+

    +-----------------------------------------------------------------------------+
    | Processes:                                                       GPU Memory |
    |  GPU       PID   Type   Process name                             Usage      |
    |=============================================================================|
    |  No running processes found                                                 |
    +-----------------------------------------------------------------------------+
- To start another application on the same GPUs, run bitfusion client -- python asimov_i.py --num_gpus=1 --batchsz=64.
- To deallocate the GPUs, run bitfusion release_gpus.
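When the workflow above is scripted, it may be worth making sure that the GPUs are released even if one of the applications fails. The following is a minimal sketch under stated assumptions: the bf wrapper, the DRY_RUN variable, and the shared_gpu_session function are hypothetical names introduced for illustration, and only the bitfusion command lines themselves come from the steps above.

```shell
#!/usr/bin/env bash
# bf and DRY_RUN are hypothetical helpers, not part of vSphere Bitfusion.
# With DRY_RUN=1 (the default) each command is printed instead of run.
bf() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "bitfusion $*"
  else
    bitfusion "$@"
  fi
}

# Run two applications in sequence on one allocation and always release it.
shared_gpu_session() {
  local status=0

  # Stop early if the allocation itself fails: there is nothing to release.
  bf request_gpus -n 1 -m 5461 || return 1

  # Both client calls reuse the allocation made above. The second
  # application is skipped if the first one fails.
  bf client nvidia-smi || status=$?
  if [ "$status" -eq 0 ]; then
    bf client -- python asimov_i.py --num_gpus=1 --batchsz=64 || status=$?
  fi

  # Release the GPUs whether or not the applications succeeded,
  # then report the first application failure (0 if none).
  bf release_gpus
  return "$status"
}
```

Running shared_gpu_session in the default dry-run mode only prints the four commands. Setting DRY_RUN=0 before the call runs them against the real bitfusion client.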