To verify that the installation of vSphere Bitfusion is successful, you can test vSphere Bitfusion by running an example ML workload. Since vSphere Bitfusion 4.5, you can start a script from a vSphere Bitfusion client that automates the process of installing associated software for vSphere Bitfusion and a GPU benchmark.
After you set up vSphere Bitfusion, you need additional dependencies before you can run a machine learning (ML) application, such as TensorFlow or PyTorch, in vSphere Bitfusion. First, you must install NVIDIA CUDA, NVIDIA cuDNN, and Linux operating system dependencies. Then you can install an ML application and run the GPU benchmarks to verify that the vSphere Bitfusion environment is working and to test the overall performance of vSphere Bitfusion. The vSphere Bitfusion client includes a script that automates all required installation steps and minimizes the manual effort. The script can be used only on the Ubuntu Linux 20.04 operating system and runs TensorFlow GPU benchmarks.
Alternatively, if you have a different operating system or require a deeper understanding, you can manually perform the installation of the additional dependencies and GPU benchmarks. The manual steps present you with additional options to verify your vSphere Bitfusion installation, such as running PyTorch tests on Red Hat and CentOS operating systems. For more information, see the vSphere Bitfusion Example Guide.
Install vSphere Bitfusion dependencies and ML benchmarks by using a script
To verify that your vSphere Bitfusion environment is working and to check its performance, you can use the client_vm_starter.sh script to install additional dependencies for vSphere Bitfusion and run TensorFlow benchmarks.
In the following procedure, the client_vm_starter.sh script installs NVIDIA CUDA, NVIDIA cuDNN, TensorFlow 2.6, TensorFlow benchmarks, and additional dependencies. For more options, see Script command reference.
Prerequisites
- Verify that you have installed a vSphere Bitfusion server.
- Verify that you have installed the supported NVIDIA driver on the vSphere Bitfusion server.
- Verify that you have installed and activated a vSphere Bitfusion client.
- Verify that your vSphere Bitfusion client runs on the Ubuntu Linux 20.04 operating system.
- Verify that you have root privileges on your Ubuntu operating system.
- Verify that you have at least 20 GB of free space on your vSphere Bitfusion client.
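Some of the client-side prerequisites above can be checked from a shell before you start. The following is a minimal sketch, not part of vSphere Bitfusion: the function names are hypothetical, and it assumes that the release information is in /etc/os-release and that the 20 GB of free space is needed on the root file system.

```shell
#!/usr/bin/env bash
# Hypothetical preflight checks; not part of vSphere Bitfusion.

# Succeed if the given os-release file (default /etc/os-release) reports Ubuntu 20.04.
is_ubuntu_2004() (
    file="${1:-/etc/os-release}"
    [ -r "$file" ] || exit 1
    . "$file"   # defines ID and VERSION_ID; the subshell keeps them out of the caller
    [ "${ID:-}" = "ubuntu" ] && [ "${VERSION_ID:-}" = "20.04" ]
)

# Succeed if the given numeric user ID (default: the current user) is root.
is_root() { [ "${1:-$(id -u)}" -eq 0 ]; }

# Succeed if the file system that holds path $1 has at least $2 GiB available.
has_free_space() (
    avail_kb=$(df -Pk "$1" | awk 'NR == 2 { print $4 }')
    [ "${avail_kb:-0}" -ge $(( $2 * 1024 * 1024 )) ]
)

# Run one check and print OK or FAIL with a label; never aborts the script.
check() {
    label="$1"; shift
    if "$@"; then echo "OK    $label"; else echo "FAIL  $label"; fi
}

check "Ubuntu Linux 20.04" is_ubuntu_2004
check "root privileges" is_root
check "20 GB free on / (assumed location)" has_free_space / 20
```

The server-side prerequisites and the client activation are not covered by this sketch and still must be verified separately.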
Procedure
Results
You can now run TensorFlow benchmarks with vSphere Bitfusion by using shared GPUs from a remote server. This result verifies that your vSphere Bitfusion deployment is successful. You can also run the GPU benchmark script without using vSphere Bitfusion and compare the performance.
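One way to compare the two runs is to save the output of each run to a file and compare the reported throughput. The following sketch assumes that each log contains a summary line of the form "total images/sec: <number>", which is how the tf_cnn_benchmarks.py script reports its result. The helper names, log file names, and benchmark flags in the comments are placeholders, not values taken from the installation script.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for comparing two saved benchmark logs.
# Assumption: each log contains a line "total images/sec: <number>".
#
# The logs might be produced like this (paths and flags are placeholders):
#   python3 tf_cnn_benchmarks.py --model=resnet50 --batch_size=64 > baseline.log
#   bitfusion run -n 1 -- python3 tf_cnn_benchmarks.py --model=resnet50 --batch_size=64 > remote.log

# Print the throughput value found in the benchmark log $1.
images_per_sec() {
    awk -F': *' '/^total images\/sec/ { value = $2 } END { if (value != "") print value }' "$1"
}

# Print throughput $1 as a percentage of throughput $2, with one decimal place.
relative_throughput() {
    awk -v remote="$1" -v baseline="$2" 'BEGIN { printf "%.1f\n", 100 * remote / baseline }'
}

# Compare the two logs only if both exist in the current directory.
if [ -r baseline.log ] && [ -r remote.log ]; then
    relative_throughput "$(images_per_sec remote.log)" "$(images_per_sec baseline.log)"
fi
```

Use the same model, batch size, and number of batches in both runs, otherwise the two numbers are not comparable.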
What to do next
The benchmarks support many models and parameters to help you explore a large space within the machine learning discipline. For more information, see Starting Applications in vSphere Bitfusion in the Using VMware vSphere Bitfusion guide.
Script command reference
The following section lists all parameters and options that you can use with the client_vm_starter.sh install script.
Requirements
To run the script, verify that the requirements are satisfied. See Install vSphere Bitfusion dependencies and ML benchmarks by using a script.
Parameters and Options
Parameter | Parameter Description | Available Option | Option Description |
---|---|---|---|
-p install_bundle | Install the vSphere Bitfusion client, NVIDIA CUDA, NVIDIA cuDNN, TensorFlow, and TensorFlow benchmarks. | -d | Install Docker service and NVIDIA container toolkit. |
-p install_cuda_deps | Install NVIDIA CUDA, NVIDIA cuDNN, TensorFlow, and TensorFlow benchmarks. | -d | Install Docker service and NVIDIA container toolkit. |
-p list_clients | List the vSphere Bitfusion client versions that are available in the official vSphere Bitfusion repository. | | |
-p install_client | Install the vSphere Bitfusion client. | -b X.Y.Z | Install a specific version of vSphere Bitfusion. For example, -b 4.0.1. |
-p install_docker | Install Docker service and NVIDIA container toolkit. | | |
-p remove_client | Remove the vSphere Bitfusion client. | | |
-p remove_bundle | Remove the vSphere Bitfusion client, NVIDIA CUDA, NVIDIA cuDNN, TensorFlow, and TensorFlow benchmarks. | | |
Examples
- To install the vSphere Bitfusion client, NVIDIA CUDA, NVIDIA cuDNN, TensorFlow, TensorFlow benchmarks, Docker service, and NVIDIA container toolkit, run the following command:
  sudo ./client_vm_starter.sh -p install_bundle -d
- To install NVIDIA CUDA, NVIDIA cuDNN, TensorFlow, TensorFlow benchmarks, Docker service, and NVIDIA container toolkit, run the following command:
  sudo ./client_vm_starter.sh -p install_cuda_deps -d
- To install the vSphere Bitfusion 4.0.1 client, run the following command:
  sudo ./client_vm_starter.sh -p install_client -b 4.0.1
- To install the Docker service and NVIDIA container toolkit, run the following command:
  sudo ./client_vm_starter.sh -p install_docker
- To list the available vSphere Bitfusion clients in the official repository, run the following command:
  sudo ./client_vm_starter.sh -p list_clients
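If you call the script from your own automation, you might want to reject unsupported parameter and option combinations before anything is installed. The following sketch is a hypothetical helper, not part of the client_vm_starter.sh script. It accepts only the parameters from this reference and the option pairings shown in the examples (-d with install_bundle and install_cuda_deps, -b with install_client), and it only prints the resulting command instead of running it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the client_vm_starter.sh command line for a
# parameter and an optional flag, or fail for combinations that are not
# shown in this reference. It does not run the script.
starter_cmd() (
    param="$1"
    [ $# -gt 0 ] && shift
    case "$param" in
        install_bundle|install_cuda_deps)
            # Optional -d adds Docker and the NVIDIA container toolkit.
            [ $# -eq 0 ] || { [ $# -eq 1 ] && [ "$1" = "-d" ]; } || exit 1 ;;
        install_client)
            # Optional -b X.Y.Z selects a specific client version.
            [ $# -eq 0 ] || { [ $# -eq 2 ] && [ "$1" = "-b" ]; } || exit 1 ;;
        list_clients|install_docker|remove_client|remove_bundle)
            [ $# -eq 0 ] || exit 1 ;;
        *)
            exit 1 ;;
    esac
    if [ $# -gt 0 ]; then
        echo "sudo ./client_vm_starter.sh -p $param $*"
    else
        echo "sudo ./client_vm_starter.sh -p $param"
    fi
)

# Example: print the full-bundle command, including Docker support.
starter_cmd install_bundle -d
```

Printing the command first keeps the helper safe to run on any machine. Pass the printed line to a shell only after you review it.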