Configure vGPU-Based VM Classes for AI Workloads for VMware Private AI Foundation with NVIDIA

As a cloud administrator, you set the compute requirements and a vGPU profile for an NVIDIA GRID vGPU device in these VM classes. Choose the vGPU profile according to the vGPU devices configured on the ESXi hosts in the Supervisor cluster.

Note: This documentation is based on VMware Cloud Foundation 5.2.1. For information on the VMware Private AI Foundation with NVIDIA functionality in VMware Cloud Foundation 5.2, see VMware Private AI Foundation with NVIDIA Guide for VMware Cloud Foundation 5.2.

Prerequisites

Verify that VMware Private AI Foundation with NVIDIA is configured up to this step of the deployment workflow. See Preparing VMware Cloud Foundation for Private AI Workload Deployment.

Procedure

For a VMware Cloud Foundation 5.2.1 instance, log in to the vCenter Server instance for the management domain at https://<vcenter_server_fqdn>/ui.
In the vSphere Client side panel, click Private AI Foundation.
In the Private AI Foundation workflow, click the Set Up a Workload Domain section.

Create the VM classes with NVIDIA vGPUs.

The wizard in the guided deployment workflow has the same options as the analogous wizard in the Workload Management area of the vSphere Client.

For information about setting up vGPU-based VM classes for virtual machines, see Create a Custom VM Class Using the vSphere Client and Add PCI Devices to a VM Class in vSphere with Tanzu.
For information about setting up vGPU-based VM classes for TKG worker nodes, see Create a Custom VM Class with a vGPU Profile in vSphere 8 Update 2b and later and Configuring vSphere Namespaces for TKG Clusters on Supervisor.

Configure the following additional settings in the VM class dialog box according to the contents of the deep learning VM.


Use Case	VM Class Additional Settings
Deep learning VMs with NVIDIA RAG workloads	Select the full-sized vGPU profile for time-slicing mode or a MIG profile. For example, for an NVIDIA A100 40GB card in vGPU time-slicing mode, select nvidia_a100-40c. On the Virtual Hardware tab, allocate more than 16 virtual CPU cores and 64 GB of virtual memory. On the Advanced Parameters tab, set the `pciPassthru<vgpu-id>.cfg.enable_uvm` parameter to `1`, where `<vgpu-id>` identifies the vGPU assigned to the virtual machine. For example, if two vGPUs are assigned to the virtual machine, you set `pciPassthru0.cfg.enable_uvm = 1` and `pciPassthru1.cfg.enable_uvm = 1`. Important: This configuration turns off vSphere vMotion migration for the deep learning VM.
Deep learning VMs using Triton Inference Server with the TensorRT backend	On the Advanced Parameters tab, set the `pciPassthru<vgpu-id>.cfg.enable_uvm` parameter to `1`, where `<vgpu-id>` identifies the vGPU assigned to the virtual machine. For example, if two vGPUs are assigned to the virtual machine, you set `pciPassthru0.cfg.enable_uvm = 1` and `pciPassthru1.cfg.enable_uvm = 1`. Important: This configuration turns off vSphere vMotion migration for the deep learning VM.
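
The naming pattern for the `enable_uvm` advanced parameter in the table above can be sketched in Python. This is only an illustration: the helper name `uvm_advanced_parameters` is hypothetical and is not part of any VMware or NVIDIA tooling, and the sketch assumes that vGPU IDs are numbered consecutively from 0, as in the two-vGPU example. It builds the key-value pairs that you would enter on the Advanced Parameters tab and does not change any VM class.

```python
def uvm_advanced_parameters(vgpu_count: int) -> dict[str, str]:
    """Return one enable_uvm entry per vGPU (hypothetical helper).

    Assumes vGPU IDs start at 0 and increase by 1 for each vGPU
    assigned to the virtual machine.
    """
    if vgpu_count < 1:
        raise ValueError("vgpu_count must be at least 1")
    return {
        f"pciPassthru{vgpu_id}.cfg.enable_uvm": "1"
        for vgpu_id in range(vgpu_count)
    }


if __name__ == "__main__":
    # Print the entries for a VM class with two vGPUs.
    for key, value in uvm_advanced_parameters(2).items():
        print(f"{key} = {value}")
```

For a VM class with two vGPUs, the sketch yields one entry for `pciPassthru0` and one for `pciPassthru1`. Enter the resulting attribute names and values manually in the VM class dialog box.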