Deploy a Deep Learning VM Directly on a vSphere Cluster in VMware Private AI Foundation with NVIDIA

To quickly test the deep learning VM templates in VMware Private AI Foundation with NVIDIA, you can deploy a deep learning VM directly on a vSphere cluster by using the vSphere Client.

Prerequisites

Verify that the following prerequisites are in place for the AI-ready infrastructure.

VMware Private AI Foundation with NVIDIA is deployed and configured. See Deploying VMware Private AI Foundation with NVIDIA.
A content library with deep learning VMs is available. See Create a Content Library with Deep Learning VM Images for VMware Private AI Foundation with NVIDIA.

Procedure

Log in to the vCenter Server instance for the VI workload domain.
From the vSphere Client home menu, select Content Libraries.
Deploy a deep learning VM from the content library.
1. Navigate to the deep learning VM image in the content library.
2. Right-click an OVF template and select New VM from This Template.
3. Follow the wizard to enter a name, select a VM folder, and select a GPU-enabled cluster in the VI workload domain.
4. Select a datastore and a network on the distributed switch for the cluster.
5. On the Customize template page, enter the custom VM properties that are required for setting up the AI functionality.
  See OVF Properties of Deep Learning VMs.
6. Click Finish.
After the deep learning VM is created, assign it an NVIDIA vGPU device in the virtual machine settings.
See Add an NVIDIA GRID vGPU to a Virtual Machine.
For a deep learning VM that is running an NVIDIA RAG workload, select the full-sized vGPU profile for time-slicing mode or a MIG profile. For example, for NVIDIA A100 40GB in vGPU time-slicing mode, select nvidia_a100-40c.
For a deep learning VM that is running an NVIDIA RAG workload, in the Advanced Parameters tab of the virtual machine settings, set the pciPassthru<vgpu-id>.cfg.enable_uvm parameter to 1, where <vgpu-id> identifies the vGPU assigned to the virtual machine. For example, if two vGPUs are assigned to the virtual machine, you set pciPassthru0.cfg.enable_uvm = 1 and pciPassthru1.cfg.enable_uvm = 1.
Power on the deep learning VM.
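
When several vGPUs are assigned, it can help to generate the list of advanced parameters from the enable_uvm step before you enter them in the vSphere Client. The following Python sketch is illustrative only. The function name uvm_advanced_parameters is hypothetical, and the sketch assumes that vGPU IDs are numbered consecutively from 0, as in the two-vGPU example in the procedure.

```python
def uvm_advanced_parameters(vgpu_count: int) -> dict[str, str]:
    """Build the enable_uvm advanced parameters for a VM with vgpu_count vGPUs.

    Assumption: vGPU IDs run consecutively from 0 (pciPassthru0, pciPassthru1, ...).
    """
    if vgpu_count < 1:
        raise ValueError("vgpu_count must be at least 1")
    # One key per assigned vGPU, each set to "1".
    return {f"pciPassthru{i}.cfg.enable_uvm": "1" for i in range(vgpu_count)}


if __name__ == "__main__":
    # Print the key/value pairs for a VM with two vGPUs.
    for key, value in uvm_advanced_parameters(2).items():
        print(f"{key} = {value}")
```

The sketch only produces the key and value pairs. You still enter them in the Advanced Parameters tab of the virtual machine settings.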

Results

The vGPU guest driver and the specified deep learning workload are installed the first time you start the deep learning VM.

You can examine the logs or open the JupyterLab instance that comes with some of the images. You can share access details with data scientists in your organization. See Deep Learning Workloads in VMware Private AI Foundation with NVIDIA.

What to do next

Connect to the deep learning VM over SSH and verify that all components are installed and running as expected.
Send access details to your data scientists.
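
One possible way to start the SSH verification is to run nvidia-smi in the guest, which reports whether the NVIDIA guest driver can see the vGPU. The following Python sketch only builds the ssh command line. The host name and user name in the example are placeholders, the function name build_driver_check is hypothetical, and the sketch assumes that nvidia-smi is on the PATH in the deep learning VM. It does not check the deep learning workload itself.

```python
import shlex


def build_driver_check(host: str, user: str) -> list[str]:
    """Return an ssh command that runs nvidia-smi on the deep learning VM.

    Assumption: nvidia-smi is on the PATH of the guest operating system.
    """
    return ["ssh", f"{user}@{host}", "nvidia-smi"]


if __name__ == "__main__":
    # Placeholder values; replace them with the address and user of your VM.
    command = build_driver_check("dl-vm.example.com", "example-user")
    # Print the command so that you can review it before running it.
    print(shlex.join(command))
```

To run the command from Python instead of printing it, you could pass the returned list to subprocess.run.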