As a cloud administrator, you can deploy a deep learning VM directly on a vSphere cluster by using the vSphere Client, giving data scientists a quick way to test the deep learning VM templates in VMware Private AI Foundation with NVIDIA.

For information about deep learning VM images in VMware Private AI Foundation with NVIDIA, see About Deep Learning VM Images in VMware Private AI Foundation with NVIDIA.

Deploying a deep learning VM with NVIDIA RAG requires a vector database, such as a PostgreSQL database with pgvector in VMware Data Services Manager. For information about deploying such a database and integrating it in a deep learning VM, see Deploy a Deep Learning VM with a RAG Workload.
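If you want to confirm that the target database is ready for the RAG workload, the following minimal Python sketch checks that the pgvector extension is available. It assumes the psycopg2 library is installed, and the host name, database name, and credentials shown are placeholders for a database provisioned through VMware Data Services Manager.

# Minimal sketch: confirm the pgvector extension in a PostgreSQL database
# provisioned through VMware Data Services Manager. All connection values
# below are placeholders; substitute your own endpoint and credentials.
import psycopg2

conn = psycopg2.connect(
    host="pgvector-db.example.com",  # hypothetical DSM-provisioned endpoint
    dbname="ragdb",                  # hypothetical database name
    user="rag_user",
    password="changeme",
)
with conn, conn.cursor() as cur:
    # Enable the extension if the database role has the required privilege.
    cur.execute("CREATE EXTENSION IF NOT EXISTS vector;")
    cur.execute("SELECT extversion FROM pg_extension WHERE extname = 'vector';")
    row = cur.fetchone()
    print("pgvector version:", row[0] if row else "not installed")
conn.close()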

Prerequisites

Verify that VMware Private AI Foundation with NVIDIA is deployed and configured. See Preparing VMware Cloud Foundation for Private AI Workload Deployment.

Procedure

  1. Log in to the vCenter Server instance for the VI workload domain.
  2. From the vSphere Client home menu, select Content Libraries.
  3. Navigate to the deep learning VM image in the content library.
  4. Right-click an OVF template and select New VM from This Template.
  5. On the Select name and folder page of the wizard that appears, enter a name and select a VM folder, select Customize this virtual machine's hardware, and click Next.
  6. Select a GPU-enabled cluster in the VI workload domain, select whether the virtual machine must be powered on after deployment is complete, and click Next.
  7. Follow the wizard to select a datastore and a network on the distributed switch for the cluster.
  8. On the Customize template page, enter the custom VM properties that are required for setting up the AI functionality and click Next.
  9. On the Customize hardware page, assign an NVIDIA vGPU device to the virtual machine as a New PCI Device and click Next.

    For a deep learning VM that is running an NVIDIA RAG, select the full-sized vGPU profile for time-slicing mode or a MIG profile. For example, for NVIDIA A100 40GB in vGPU time-slicing mode, select nvidia_a100-40c.

  10. For a deep learning VM that is running an NVIDIA RAG, in the Advanced Parameters tab of the virtual machine settings, set the pciPassthru<vgpu-id>.cfg.enable_uvm parameter to 1.

    Here, <vgpu-id> identifies the vGPU device assigned to the virtual machine. For example, if two vGPU devices are assigned to the virtual machine, you set pciPassthru0.cfg.enable_uvm = 1 and pciPassthru1.cfg.enable_uvm = 1. For a scripted alternative, see the sketch after this procedure.

  11. Review the deployment specification and click Finish.
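As an optional, scripted alternative to step 10, the following minimal pyVmomi sketch sets the enable_uvm advanced parameters on an already deployed deep learning VM. The vCenter Server address, credentials, and VM name are placeholders, and the sketch assumes two vGPU devices are assigned; adjust the pciPassthru<vgpu-id> keys to match your configuration.

# Minimal pyVmomi sketch: set pciPassthru<vgpu-id>.cfg.enable_uvm = 1 on a
# deployed deep learning VM. All names and credentials below are placeholders.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

ssl_context = ssl._create_unverified_context()  # lab use only; verify certificates in production
si = SmartConnect(host="vcenter.example.com",
                  user="administrator@vsphere.local",
                  pwd="changeme",
                  sslContext=ssl_context)
content = si.RetrieveContent()

# Locate the deep learning VM by its inventory name (placeholder).
view = content.viewManager.CreateContainerView(content.rootFolder,
                                               [vim.VirtualMachine], True)
vm = next(v for v in view.view if v.name == "dl-vm-01")

# One pciPassthru<vgpu-id>.cfg.enable_uvm entry per assigned vGPU device.
extra_config = [
    vim.option.OptionValue(key="pciPassthru0.cfg.enable_uvm", value="1"),
    vim.option.OptionValue(key="pciPassthru1.cfg.enable_uvm", value="1"),
]
vm.ReconfigVM_Task(vim.vm.ConfigSpec(extraConfig=extra_config))

Disconnect(si)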

Results

The vGPU guest driver and the specified deep learning workload are installed the first time you start the deep learning VM.

You can examine the logs or open the JupyterLab instance that comes with some of the images. You can share access details with data scientists in your organization. See Deep Learning Workloads in VMware Private AI Foundation with NVIDIA.

What to do next

  • Connect to the deep learning VM over SSH and verify that all components are installed and running as expected, as shown in the sketch after this list.
  • Send access details to your data scientists.
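The following minimal paramiko sketch covers the SSH verification. The host name, user name, and password are placeholders for the values you set during deployment; nvidia-smi confirms that the vGPU guest driver is loaded, and docker ps lists the workload containers (prepend sudo if the login user is not in the docker group).

# Minimal sketch: SSH into the deep learning VM and run basic health checks.
# Host name, user name, and password are placeholders.
import paramiko

client = paramiko.SSHClient()
client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
client.connect("dl-vm-01.example.com", username="vmware", password="changeme")

# nvidia-smi confirms the vGPU guest driver; docker ps shows the workload containers.
for command in ("nvidia-smi", "docker ps"):
    _, stdout, stderr = client.exec_command(command)
    print(f"$ {command}\n{stdout.read().decode()}{stderr.read().decode()}")

client.close()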