To verify that the installation of vSphere Bitfusion is successful, you can test vSphere Bitfusion by running an example ML workload. Since vSphere Bitfusion 4.5, you can start a script from a vSphere Bitfusion client that automates the process of installing associated software for vSphere Bitfusion and a GPU benchmark.

After your setup of vSphere Bitfusion is complete, you need additional dependencies before you can run a machine-learning (ML) application, such as TensorFlow or PyTorch, in vSphere Bitfusion. First, you must install NVIDIA CUDA, NVIDIA cuDNN, and Linux operating system dependencies. Then you can install an ML application and run the GPU benchmarks to verify that the vSphere Bitfusion environment is working and to test the overall performance of vSphere Bitfusion. The vSphere Bitfusion client includes a script that automates all required installation steps and minimizes the manual effort. The script can be used only on the Ubuntu Linux 20.04 operating system and runs TensorFlow GPU benchmarks.

Alternatively, if you have a different operating system or require a deeper understanding, you can manually perform the installation of the additional dependencies and GPU benchmarks. The manual steps present you with additional options to verify your vSphere Bitfusion installation, such as running PyTorch tests on Red Hat and CentOS operating systems. For more information, see the vSphere Bitfusion Example Guide.

Install vSphere Bitfusion dependencies and ML benchmarks by using a script

To verify that your vSphere Bitfusion environment is working and to check its performance, you can use the client_vm_starter.sh script to install additional dependencies for vSphere Bitfusion and run TensorFlow benchmarks.

In the following procedure, the client_vm_starter.sh script installs NVIDIA CUDA, NVIDIA cuDNN, TensorFlow 2.6, TensorFlow benchmarks, and additional dependencies. For more options, see Script command reference.

Prerequisites

  • Verify that you have installed a vSphere Bitfusion server.
  • Verify that you have installed the supported NVIDIA driver on the vSphere Bitfusion server.
  • Verify that you have installed and activated a vSphere Bitfusion client.
  • Verify that your vSphere Bitfusion client runs on the Ubuntu Linux 20.04 operating system.
  • Verify that you have root privileges on your Ubuntu operating system.
  • Verify that you have at least 20 GB of free space on your vSphere Bitfusion client.
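
Some of the client-side prerequisites can be checked from the terminal before you start. The following is a minimal sketch, not part of vSphere Bitfusion: the function names is_ubuntu_2004, has_min_free_gb, has_root_access, and check are hypothetical helpers, the 20 GB threshold is treated as 20 GiB, and the free-space check looks at the file system that holds your home directory. The sketch does not verify the server-side prerequisites.

```shell
#!/usr/bin/env bash
# Sketch of pre-flight checks for the client-side prerequisites.
# All function names are hypothetical helpers, not vSphere Bitfusion commands.

# Succeeds if the given os-release file describes Ubuntu 20.04.
is_ubuntu_2004() {
    local file="${1:-/etc/os-release}"
    [ -r "$file" ] || return 1
    # Source the file in a subshell so that its variables do not leak.
    ( . "$file" && [ "${ID:-}" = "ubuntu" ] && [ "${VERSION_ID:-}" = "20.04" ] )
}

# Succeeds if the file system that holds DIR has at least MIN_GB GiB free.
has_min_free_gb() {
    local dir="$1" min_gb="$2" free_kb
    # With -P, df prints one line per file system; column 4 is the free space in KiB.
    free_kb=$(df -Pk "$dir" | awk 'NR==2 {print $4}')
    [ "$free_kb" -ge $((min_gb * 1024 * 1024)) ]
}

# Succeeds if the user is root or if a sudo binary exists.
# This does not prove that the user is allowed to use sudo.
has_root_access() {
    [ "$(id -u)" -eq 0 ] || command -v sudo >/dev/null 2>&1
}

# Run one check and print a one-line verdict.
check() {
    local label="$1"
    shift
    if "$@"; then echo "OK   $label"; else echo "FAIL $label"; fi
}

check "Ubuntu Linux 20.04" is_ubuntu_2004
check "at least 20 GB free in the home directory" has_min_free_gb "$HOME" 20
check "root or sudo available" has_root_access
```

Each line of output starts with OK or FAIL, so a failed prerequisite is visible before any packages are installed.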

Procedure

  1. In the terminal of a vSphere Bitfusion client, create a bitfusion folder by running the mkdir ~/bitfusion command.
  2. To navigate to the bitfusion folder, run the cd ~/bitfusion/ command.
  3. To download the client_vm_starter.sh script, run the sudo wget https://packages.vmware.com/bitfusion/scripts/client_vm_starter.sh command.
  4. To use the script, run the sudo ./client_vm_starter.sh -p install_cuda_deps command.
  5. To use the TensorFlow tf_cnn_benchmarks.py benchmark script, run the following command.
    bitfusion run -n 1 -- python3 \
    ./benchmarks/scripts/tf_cnn_benchmarks/tf_cnn_benchmarks.py \
    --batch_size=64 \
    --model=resnet50 \
    --num_gpus=1 \
    --num_batches=100
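
The five steps above can also be collected into one script. The following is a minimal sketch under stated assumptions: DRY_RUN, run, and setup_and_benchmark are hypothetical names introduced for illustration, and the chmod step is an assumption that is not part of the procedure (a file downloaded with wget is usually not executable). By default the sketch only prints the commands. Set DRY_RUN=0 to run them.

```shell
#!/usr/bin/env bash
# Sketch: steps 1-5 of the procedure as one script.
# DRY_RUN, run, and setup_and_benchmark are hypothetical helpers.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

setup_and_benchmark() {
    local workdir="${HOME}/bitfusion"
    # mkdir -p is used instead of mkdir so that a rerun does not fail.
    run mkdir -p "$workdir" &&
    run cd "$workdir" &&
    run sudo wget https://packages.vmware.com/bitfusion/scripts/client_vm_starter.sh &&
    # Assumption: the downloaded file might lack the execute bit.
    run sudo chmod +x client_vm_starter.sh &&
    run sudo ./client_vm_starter.sh -p install_cuda_deps &&
    run bitfusion run -n 1 -- python3 \
        ./benchmarks/scripts/tf_cnn_benchmarks/tf_cnn_benchmarks.py \
        --batch_size=64 --model=resnet50 --num_gpus=1 --num_batches=100
}

setup_and_benchmark
```

Review the printed commands first, and then rerun the script with DRY_RUN=0 when they match what you expect.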

Results

You can now run TensorFlow benchmarks through vSphere Bitfusion by using shared GPUs from a remote server. This result verifies that your vSphere Bitfusion deployment is successful. To compare the performance, you can also run the GPU benchmark script without using vSphere Bitfusion.
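
If you save the output of both runs to log files, you can compare the two results with a few lines of shell. The following is a minimal sketch, assuming that each log contains a summary line of the form "total images/sec: <number>". The function names images_per_sec and relative_throughput and the log file names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: compare the throughput of two saved benchmark logs.
# Assumes each log has a summary line such as "total images/sec: <number>".

# Print the last "total images/sec" value found in a log file.
images_per_sec() {
    awk -F': *' '/total images\/sec/ {value = $2} END {if (value != "") print value}' "$1"
}

# Print the throughput of the first log as a percentage of the second log.
relative_throughput() {
    local remote_rate local_rate
    remote_rate=$(images_per_sec "$1")
    local_rate=$(images_per_sec "$2")
    awk -v r="$remote_rate" -v l="$local_rate" \
        'BEGIN { if (l > 0) printf "%.1f\n", 100 * r / l; else exit 1 }'
}

# Example usage with hypothetical log file names:
#   bitfusion run -n 1 -- python3 <benchmark command> | tee bitfusion.log
#   python3 <benchmark command> | tee local.log
#   relative_throughput bitfusion.log local.log
```

A value close to 100 means that the run through vSphere Bitfusion reached nearly the same throughput as the run without it.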

What to do next

The benchmarks support many models and parameters to help you explore a large space within the machine learning discipline. For more information, see Starting Applications in vSphere Bitfusion in Using VMware vSphere Bitfusion.

Script command reference

The following section lists all parameters and options that you can use with the client_vm_starter.sh install script.

Requirements

To run the script, verify that the requirements are satisfied. See Install vSphere Bitfusion dependencies and ML benchmarks by using a script.

Parameters and Options

Parameter | Parameter Description | Available Option | Option Description
-p install_bundle | Install the vSphere Bitfusion client, NVIDIA CUDA, NVIDIA cuDNN, TensorFlow, and TensorFlow benchmarks. | -d | Install Docker service and NVIDIA container toolkit.
-p install_cuda_deps | Install NVIDIA CUDA, NVIDIA cuDNN, TensorFlow, and TensorFlow benchmarks. | -d | Install Docker service and NVIDIA container toolkit.
-p list_clients | List the vSphere Bitfusion client versions that are available in the official vSphere Bitfusion repository. | -b X.Y.Z | Install a specific version of vSphere Bitfusion. For example, -b 4.0.1.
-p install_client | Install the vSphere Bitfusion client. | |
-p install_docker | Install Docker service and NVIDIA container toolkit. | |
-p remove_client | Remove the vSphere Bitfusion client. | |
-p remove_bundle | Remove the vSphere Bitfusion client, NVIDIA CUDA, NVIDIA cuDNN, TensorFlow, and TensorFlow benchmarks. | |

Examples

For example, you can run the following script commands.
  • To install the vSphere Bitfusion client, NVIDIA CUDA, NVIDIA cuDNN, TensorFlow, TensorFlow benchmarks, Docker service, and NVIDIA container toolkit, run the sudo ./client_vm_starter.sh -p install_bundle -d command.
  • To install NVIDIA CUDA, NVIDIA cuDNN, TensorFlow, TensorFlow benchmarks, Docker service, and NVIDIA container toolkit, run the sudo ./client_vm_starter.sh -p install_cuda_deps -d command.
  • To install the vSphere Bitfusion 4.0.1 client, run the sudo ./client_vm_starter.sh -p install_client -b 4.0.1 command.
  • To install the Docker service and NVIDIA container toolkit, run the sudo ./client_vm_starter.sh -p install_docker command.
  • To list the available vSphere Bitfusion clients in the official repository, run the sudo ./client_vm_starter.sh -p list_clients command.
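
Because every command above differs only in the value passed to -p and in the trailing options, a small wrapper can catch typos before sudo is invoked. The following is a minimal sketch. The functions is_known_action and starter are hypothetical helpers, and the accepted values are copied from the Parameters and Options table.

```shell
#!/usr/bin/env bash
# Sketch: reject unknown -p values before calling client_vm_starter.sh.
# is_known_action and starter are hypothetical helpers.

# Succeeds if the argument is one of the -p values listed in the table.
is_known_action() {
    case "$1" in
        install_bundle|install_cuda_deps|list_clients|install_client|install_docker|remove_client|remove_bundle)
            return 0 ;;
        *)
            return 1 ;;
    esac
}

# starter ACTION [OPTIONS...] runs the script only for a listed action.
starter() {
    local action="${1:-}"
    if [ "$#" -gt 0 ]; then shift; fi
    if ! is_known_action "$action"; then
        echo "unknown action: $action" >&2
        return 2
    fi
    sudo ./client_vm_starter.sh -p "$action" "$@"
}

# Example usage, run from the folder that contains the script:
#   starter install_client -b 4.0.1
#   starter install_cuda_deps -d
```

The wrapper passes any remaining options, such as -d or -b 4.0.1, to the script unchanged and does not validate them.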