Multiple concurrent applications might use a GPU's computational capacity more efficiently than a single application. There are several ways you can partition the memory of your GPUs.

If you are running inference applications with smaller models or small batches of work, such as a small number of images, you can run the applications concurrently on partitioned GPUs.

You can perform empirical testing to determine the memory size an application requires. Some applications expand to use all available memory, but beyond a certain threshold they might not achieve better performance.

The following examples assume that you know the memory requirements of your application at different batch sizes.

  • When you expect that an application with a batch size of 64 requires 66% of GPU memory, run bitfusion run -n 1 -p 0.66 -- python --num_gpus=1 --batchsz=64
  • When you expect that an application with a batch size of 32 requires 5461 MB of GPU memory, run bitfusion run -n 1 -m 5461 -- python --num_gpus=1 --batchsz=32

When you request multiple GPUs, the same amount of memory is allocated on each GPU. You must specify the fraction size relative to the GPU with the smallest amount of memory.

In the following example, the -p argument requests 33% of the memory of each of the two requested GPUs. The GPUs must physically reside on the same server. If the GPUs are 16 GB devices, or if the smallest GPU is a 16 GB device, approximately 5461 MB is allocated on each GPU. While no other applications are running, the application can access the full compute power of both GPUs.

Run bitfusion run -n 2 -p 0.33 -- python --num_gpus=1 --batchsz=64
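The figure of approximately 5461 MB can be reproduced with a quick calculation. The following is a sketch; it assumes the smallest requested GPU is a 16 GB device, that is, 16384 MB:

```python
# Memory allocated per GPU for a fractional request (-p).
# Assumes the smallest requested GPU has 16 GB (16384 MB) of memory.
def partition_mb(fraction, smallest_gpu_mb=16 * 1024):
    return int(smallest_gpu_mb * fraction)

print(partition_mb(1 / 3))  # 5461, roughly what -p 0.33 yields on a 16 GB device
```

Requesting an exact megabyte amount with -m avoids this rounding and allocates the same size regardless of the underlying GPU model.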

You can run multiple applications from a single client on the same GPU concurrently.

For example, to start two concurrent application instances in the background, run both these commands.
  1. bitfusion run -n 1 -p 0.66 -- python --num_gpus=1 --batchsz=64 &
  2. bitfusion run -n 1 -p 0.33 -- python --num_gpus=1 --batchsz=32 &
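When several applications share one GPU as above, the requested fractions should together fit within the GPU's memory. The following sanity check is a sketch; the fits_on_gpu helper is hypothetical and not part of the Bitfusion CLI or API:

```python
# Check that concurrent fractional partitions fit on a single GPU.
# Hypothetical helper, not a Bitfusion API.
def fits_on_gpu(fractions):
    return sum(fractions) <= 1.0

print(fits_on_gpu([0.66, 0.33]))  # True: the two partitions use 99% of memory
```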

NVIDIA System Management Interface (nvidia-smi)

You can run the NVIDIA System Management Interface (nvidia-smi) monitoring application, for example, to check your GPU partition size or to verify the resources available on a vSphere Bitfusion server. Typically, the application is installed on the server together with the NVIDIA driver.

Applications that run on vSphere Bitfusion clients do not require the NVIDIA driver, but might require the nvidia-smi application, for example, to determine the capabilities of a GPU or its memory size. To support such operations, starting with vSphere Bitfusion 3.0, the nvidia-smi application is available on all vSphere Bitfusion clients. vSphere Bitfusion copies the application from the server to the client.

For example, to request a 1024 MB partition on a GPU, run bitfusion run -n 1 -m 1024 -- nvidia-smi.

The output of the nvidia-smi application displays the requested partition size of 1024 MiB.

Requested resources:
Server List:
Client idle timeout: 0 min
Wed Sep 23 15:21:17 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.100      Driver Version: 440.64.00    CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla T4            Off  | 00000000:13:00.0 Off |                    0 |
| N/A   36C    P8     9W /  70W |      0MiB /  1024MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
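To verify the partition size in a script, you can parse the Memory-Usage column of the nvidia-smi output. The following is a minimal sketch, assuming the output has been captured as text:

```python
import re

# Parse the "used / total" memory figures from an nvidia-smi output line.
def parse_memory_usage(line):
    m = re.search(r"(\d+)MiB\s*/\s*(\d+)MiB", line)
    if m is None:
        return None
    return int(m.group(1)), int(m.group(2))

# Sample line taken from the output above.
line = "| N/A   36C    P8     9W /  70W |      0MiB /  1024MiB |      0%      Default |"
print(parse_memory_usage(line))  # (0, 1024)
```

Here the total of 1024 MiB confirms that the -m 1024 request took effect.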