You can allocate a number of GPUs and run multiple applications on the same GPUs.

While the run command allocates GPUs, runs an application, and deallocates the GPUs in a single step, vSphere Bitfusion also provides three individual commands that perform the same tasks separately. By using the individual commands, you can use the same GPUs for multiple applications and have greater control when you integrate vSphere Bitfusion into other tools and workflows, such as the SLURM scheduling software.

  • To allocate GPUs, run request_gpus.
  • To start applications in an environment that can access the GPUs when the application makes CUDA calls, run client.
  • To deallocate the GPUs, run release_gpus.
    Note: The request_gpus command creates a file and environment variables that can be forwarded to other tools. The tools can run the client command with the same allocation configuration.

The arguments of the run command are split between the request_gpus and client commands.

To understand how to use the individual commands, see the following example workflow, which uses the AI application asimov_i.py.

  1. To allocate GPUs for running multiple applications sequentially, run bitfusion request_gpus -n 1 -m 5461.
    Requested resources:
    Server List: 172.16.31.241:56001
    Client idle timeout: 0 min
  2. To start an application on the allocated GPUs, run bitfusion client nvidia-smi.
    Wed Sep 23 15:26:02 2020
    +-----------------------------------------------------------------------------+
    | NVIDIA-SMI 440.100      Driver Version: 440.64.00    CUDA Version: 10.2     |
    |-------------------------------+----------------------+----------------------+
    | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
    |===============================+======================+======================|
    |   0  Tesla T4            Off  | 00000000:13:00.0 Off |                    0 |
    | N/A   36C    P8    10W /  70W |      0MiB /  5461MiB |      0%      Default |
    +-------------------------------+----------------------+----------------------+
    
    +-----------------------------------------------------------------------------+
    | Processes:                                                       GPU Memory |
    |  GPU       PID   Type   Process name                             Usage      |
    |=============================================================================|
    |  No running processes found                                                 |
    +-----------------------------------------------------------------------------+
    
  3. To start another application by running the client command, run bitfusion client -- python asimov_i.py --num_gpus=1 --batchsz=64.
  4. To deallocate the GPUs, run bitfusion release_gpus.
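
When the workflow above is driven from another tool, it can help to build the command lines in one place and to make sure that the GPUs are released even if an application fails. The following Python sketch is one possible way to do that and is not part of vSphere Bitfusion. The helper names build_workflow and run_workflow are hypothetical. The sketch uses only the commands and options shown in the steps above, and it assumes that separate bitfusion invocations made by the same user in the same working directory find the allocation in the same way as in the interactive session above. It also assumes that the -- separator, shown above for asimov_i.py, can precede any application command. The default is a dry run that only prints the commands.

```python
import subprocess
from typing import List, Sequence


def build_workflow(num_gpus: int, memory_mib: int,
                   apps: Sequence[Sequence[str]]) -> List[List[str]]:
    """Return the command lines for one allocate/run/release cycle."""
    # Allocation: -n and -m are the options used in step 1 above.
    commands = [["bitfusion", "request_gpus",
                 "-n", str(num_gpus), "-m", str(memory_mib)]]
    for app in apps:
        # Assumption: "--" is placed before every application command,
        # as step 3 above does for asimov_i.py.
        commands.append(["bitfusion", "client", "--", *app])
    commands.append(["bitfusion", "release_gpus"])
    return commands


def run_workflow(commands: List[List[str]], dry_run: bool = True) -> None:
    """Run the commands in order; always attempt the release step."""
    request, *clients, release = commands
    if dry_run:
        # Print the commands instead of running them.
        for cmd in commands:
            print(" ".join(cmd))
        return
    subprocess.run(request, check=True)
    try:
        for cmd in clients:
            subprocess.run(cmd, check=True)
    finally:
        # Release the GPUs even if one of the applications failed.
        subprocess.run(release, check=True)


if __name__ == "__main__":
    workflow = build_workflow(
        1, 5461,
        [["nvidia-smi"],
         ["python", "asimov_i.py", "--num_gpus=1", "--batchsz=64"]],
    )
    run_workflow(workflow, dry_run=True)
```

Calling run_workflow with dry_run=False runs the commands and therefore requires a machine where the bitfusion client is installed and configured. Placing the release step in a finally block is a design choice of this sketch, so that a failed application does not leave the GPUs allocated.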