If ESXi hosts in your vSphere IaaS control plane environment have one or more NVIDIA GRID GPU graphics devices, you can configure VMs to use the NVIDIA GRID virtual GPU (vGPU) technology. You can also configure other PCI devices on an ESXi host to make them available to a VM in a passthrough mode.

Deploying a VM with vGPU in vSphere IaaS Control Plane

NVIDIA GRID GPU graphics devices are designed to optimize complex graphics operations and enable them to run at high performance without overloading the CPU. NVIDIA GRID vGPU provides unparalleled graphics performance, cost-effectiveness, and scalability by sharing a single physical GPU among multiple VMs as separate vGPU-enabled passthrough devices.

Considerations

The following considerations apply when you use NVIDIA vGPU:
  • Three-zone Supervisor does not support VMs with vGPU.
  • VMs with vGPU devices that are managed by VM Service are automatically powered off when an ESXi host enters maintenance mode. This might temporarily affect workloads running in the VMs. The VMs are automatically powered on after the host exits maintenance mode.
  • DRS distributes vGPU VMs in a breadth-first manner across the cluster's hosts. For more information, see DRS Placement of vGPU VMs in the vSphere Resource Management guide.

Requirements

To configure NVIDIA vGPU, follow these requirements:

  • Verify that the ESXi host is supported in the VMware Compatibility Guide, and check with the vendor to verify that the host meets power and configuration requirements.
  • Configure ESXi host graphics settings with at least one device in Shared Direct mode, as in the sketch after this list. See Configuring Host Graphics in the vSphere Resource Management documentation.
  • The content library you use for VMs with vGPU devices must include images with the boot mode set to EFI, such as CentOS.
  • Install the NVIDIA vGPU software. NVIDIA provides a vGPU software package that includes the following components:
    • The vGPU Manager that a vSphere administrator installs on the ESXi host.
    • The guest OS driver that you install in the VM after deployment.

    For more information, see the appropriate NVIDIA Virtual GPU Software documentation.
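
You can check and change the host graphics mode from the ESXi Shell. The following is a minimal sketch, assuming SSH access to the host; Shared Direct corresponds to the SharedPassthru graphics type in esxcli, and the change takes effect after the Xorg service restarts:

    # Show the current default graphics type of the host
    esxcli graphics host get

    # Set the default graphics type to Shared Direct (vGPU)
    esxcli graphics host set --default-type SharedPassthru

    # Restart the Xorg service so that the change takes effect
    /etc/init.d/xorg restart

    # List graphics devices to confirm the NVIDIA GPU is present
    esxcli graphics device list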

Add a vGPU Device to a VM Class Using the vSphere Client

Create or edit an existing VM class to add an NVIDIA GRID virtual GPU (vGPU).

Prerequisites

Required privileges:
  • Namespaces.Modify cluster-wide configuration
  • Namespaces.Modify namespace configuration
  • Virtual Machine Classes.Manage Virtual Machine Classes

Procedure

  1. Create or edit an existing VM class.
    Option: Create a new VM class
    1. From the vSphere Client home menu, select Workload Management.
    2. Click the Services tab and click Manage on the VM Service pane.
    3. On the VM Service page, click VM Classes and click Create VM Class.
    4. Follow the prompts.
    Option: Edit a VM class
    1. From the vSphere Client home menu, select Workload Management.
    2. Click the Services tab and click Manage on the VM Service pane.
    3. On the VM Service page, click VM Classes.
    4. In the existing VM class pane, click Manage and click Edit.
    5. Follow the prompts.
  2. On the Configuration page, click the Virtual Hardware tab, click Add New Device, and select PCI Device.
  3. From the list of available devices on the Device Selection page, select the NVIDIA GRID vGPU and click Select.
    The device appears on the Virtual Hardware page.
  4. Click the Advanced Parameters tab and set the parameters with the following attributes and values.
    Parameter                      Value
    pciPassthru0.cfg.enable_uvm    1
    pciPassthru1.cfg.enable_uvm    1
  5. Review your configuration and click Finish.

Results

A PCI Devices tag on the VM class pane indicates that the VM class is vGPU-enabled.

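After the VM class is associated with a namespace, DevOps engineers can verify that it is available from kubectl. A minimal sketch, assuming you are logged in to the Supervisor and that my-class is the hypothetical name of the class created above:

    # List the VM classes available in the current namespace
    kubectl get virtualmachineclasses

    # Inspect the class, including its PCI device configuration
    kubectl describe virtualmachineclass my-class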

Add a vGPU Device to a VM Class Using Data Center CLI

In addition to the vSphere Client, you can use Data Center CLI (DCLI) commands to add vGPU devices and advanced configurations.

For more information about DCLI commands, see Create and Manage VM Classes Using the Data Center CLI.

Procedure

  1. Log in to vCenter Server using the root user account and type dcli +i to use the DCLI in interactive mode.
  2. Run the following command to create a VM class.
    In the following example, the my-class VM class includes two CPUs, 2048 MB of memory, and a VirtualMachineConfigSpec with two sample vGPU profiles, mockup-vmiop-8c and mockup-vmiop. The extraConfig fields pciPassthru0.cfg.enable_uvm and pciPassthru1.cfg.enable_uvm are set to 1.
    dcli +i +show-unreleased com vmware vcenter namespacemanagement virtualmachineclasses create --id my-class --cpu-count 2 --memory-mb 2048 --config-spec '{"_typeName":"VirtualMachineConfigSpec","deviceChange":[{"_typeName":"VirtualDeviceConfigSpec","operation":"add","device":{"_typeName":"VirtualPCIPassthrough","key":20,"backing":{"_typeName":"VirtualPCIPassthroughVmiopBackingInfo","vgpu":"mockup-vmiop-8c"}}},{"_typeName":"VirtualDeviceConfigSpec","operation":"add","device":{"_typeName":"VirtualPCIPassthrough","key":20,"backing":{"_typeName":"VirtualPCIPassthroughVmiopBackingInfo","vgpu":"mockup-vmiop"}}}],"extraConfig":[{"_typeName":"OptionValue","key":"pciPassthru0.cfg.enable_uvm","value":{"_typeName":"string","_value":"1"}},{"_typeName":"OptionValue","key":"pciPassthru1.cfg.enable_uvm","value":{"_typeName":"string","_value":"1"}}]}'
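  3. (Optional) Verify the new VM class.
    A minimal sketch, assuming that the virtualmachineclasses service also exposes a get operation alongside create; if it does not, review the class in the vSphere Client instead.
    dcli +i +show-unreleased com vmware vcenter namespacemanagement virtualmachineclasses get --id my-class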
    

Install the NVIDIA Guest Driver in a VM in vSphere IaaS Control Plane

If the VM includes a PCI device configured for vGPU, after you create and boot the VM in your vSphere IaaS control plane environment, install the NVIDIA vGPU graphics driver to fully enable GPU operations.

Prerequisites

  • Deploy the VM with vGPU. Make sure that the VM YAML file references the VM class with the vGPU definition, as in the manifest sketch after this list. See Deploy a Virtual Machine in vSphere IaaS Control Plane.
  • Verify that you downloaded the vGPU software package from the NVIDIA download site, uncompressed the package, and have the guest driver component ready. For information, see the appropriate NVIDIA Virtual GPU Software documentation.
    Note: The version of the driver component must correspond to the version of the vGPU Manager that a vSphere administrator installed on the ESXi host.
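
The following is a minimal manifest sketch for this prerequisite, assuming a hypothetical namespace my-namespace, the vGPU-enabled class my-class, an EFI boot mode image named centos-stream-8, and a storage class my-storage-class; substitute the names from your environment:

    kubectl apply -f - <<EOF
    apiVersion: vmoperator.vmware.com/v1alpha1
    kind: VirtualMachine
    metadata:
      name: vgpu-vm
      namespace: my-namespace
    spec:
      className: my-class            # VM class that carries the vGPU device
      imageName: centos-stream-8     # image with boot mode set to EFI
      powerState: poweredOn
      storageClass: my-storage-class
    EOF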

Procedure

  1. Copy the NVIDIA vGPU software Linux driver package, for example NVIDIA-Linux-x86_64-version-grid.run, to the guest VM.
  2. Before attempting to run the driver installer, terminate all applications.
  3. Start the NVIDIA vGPU driver installer.
    sudo ./NVIDIA-Linux-x86_64-version-grid.run
  4. Accept the NVIDIA software license agreement and select Yes to update the X configuration settings automatically.
  5. Verify that the driver has been installed.
    For example,
    ~$ nvidia-smi
    Wed May 19 22:15:04 2021
    +-----------------------------------------------------------------------------+
    | NVIDIA-SMI 460.63       Driver Version: 460.63       CUDA Version: 11.2     |
    |-------------------------------+----------------------+----------------------+
    | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
    |                               |                      |               MIG M. |
    |===============================+======================+======================|
     |   0  GRID V100-4Q        On   | 00000000:02:00.0 Off |                  N/A |
     | N/A   N/A    P0    N/A /  N/A |    304MiB /  4096MiB |      0%      Default |
     |                               |                      |                  N/A |
     +-------------------------------+----------------------+----------------------+
    
    +-----------------------------------------------------------------------------+
    | Processes:                                                                  |
    |  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
    |        ID   ID                                                   Usage      |
    |=============================================================================|
    |  No running processes found                                                 |
    +-----------------------------------------------------------------------------+

Deploying a VM with PCI Devices in vSphere IaaS Control Plane

In addition to vGPU, you can configure other PCI devices on an ESXi host to make them available to a VM in a passthrough mode.

vSphere IaaS control plane supports Dynamic DirectPath I/O devices. Using Dynamic DirectPath I/O, the VM can directly access the physical PCI and PCIe devices connected to a host. You can use Dynamic DirectPath I/O to assign multiple PCI passthrough devices to a VM. Each passthrough device can be specified by its PCI vendor and device identifier.
Note: When configuring Dynamic DirectPath I/O for PCI passthrough devices, connect the PCI devices to the host and mark them as available for passthrough. See Enable Passthrough for a Network Device on a Host in the vSphere Networking documentation.
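
To illustrate, you can create a VM class with a Dynamic DirectPath I/O device by using DCLI in the same way as the vGPU class earlier in this topic, but with a VirtualPCIPassthroughDynamicBackingInfo backing that identifies the device by its PCI vendor and device identifier. The following is a minimal sketch; my-passthru-class is a placeholder name, and the decimal IDs 4318 (0x10DE, NVIDIA) and 7864 are example values that you replace with the IDs of your device:

    dcli +i +show-unreleased com vmware vcenter namespacemanagement virtualmachineclasses create --id my-passthru-class --cpu-count 2 --memory-mb 2048 --config-spec '{"_typeName":"VirtualMachineConfigSpec","deviceChange":[{"_typeName":"VirtualDeviceConfigSpec","operation":"add","device":{"_typeName":"VirtualPCIPassthrough","key":20,"backing":{"_typeName":"VirtualPCIPassthroughDynamicBackingInfo","allowedDevice":[{"_typeName":"VirtualPCIPassthroughAllowedDevice","vendorId":4318,"deviceId":7864}]}}}]}'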