If ESXi hosts in your vSphere IaaS control plane environment have one or more NVIDIA GRID GPU graphics devices, you can configure VMs to use the NVIDIA GRID virtual GPU (vGPU) technology. You can also configure other PCI devices on an ESXi host to make them available to a VM in a passthrough mode.

Deploying a VM with vGPU in vSphere IaaS Control Plane

NVIDIA GRID GPU graphics devices are designed to optimize complex graphics operations and enable them to run at high performance without overloading the CPU. NVIDIA GRID vGPU provides unparalleled graphics performance, cost-effectiveness, and scalability by sharing a single physical GPU among multiple VMs as separate vGPU-enabled passthrough devices.

Considerations

The following considerations apply when you use NVIDIA vGPU:
  • Three-zone Supervisor does not support VMs with vGPU.
  • VMs with vGPU devices that are managed by VM Service are automatically powered off when an ESXi host enters maintenance mode. This might temporarily affect workloads running in the VMs. The VMs are automatically powered on after the host exits maintenance mode.
  • DRS distributes vGPU VMs in a breadth-first manner across the cluster's hosts. For more information, see DRS Placement of vGPU VMs in the vSphere Resource Management guide.

Requirements

To configure NVIDIA vGPU, follow these requirements:

  • Verify that the ESXi host is supported in the VMware Compatibility Guide, and check with the vendor to verify that the host meets power and configuration requirements.
  • Configure ESXi host graphics settings with at least one device in Shared Direct mode, as in the sketch after this list. See Configuring Host Graphics in the vSphere Resource Management documentation.
  • The content library you use for VMs with vGPU devices must include images with the boot mode set to EFI, such as CentOS.
  • Install the NVIDIA vGPU software. NVIDIA provides a vGPU software package that includes the following components:
    • The vGPU Manager that a vSphere administrator installs on the ESXi host.
    • The guest OS driver that you install in the VM after deployment.

    For more information, see the appropriate NVIDIA Virtual GPU Software documentation.
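
You can check and change the host graphics mode from the ESXi Shell. The following is a minimal sketch, assuming SSH access to the host; Shared Direct corresponds to the SharedPassthru graphics type in esxcli, and the change takes effect after the Xorg service restarts:

    # Show the current default graphics type of the host
    esxcli graphics host get

    # Set the default graphics type to Shared Direct (vGPU)
    esxcli graphics host set --default-type SharedPassthru

    # Restart the Xorg service so that the change takes effect
    /etc/init.d/xorg restart

    # List graphics devices to confirm the NVIDIA GPU is present
    esxcli graphics device list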

Add a vGPU Device to a VM Class Using the vSphere Client

Create or edit an existing VM class to add an NVIDIA GRID virtual GPU (vGPU).

Prerequisites

Required privileges:
  • Namespaces.Modify cluster-wide configuration
  • Namespaces.Modify namespace configuration
  • Virtual Machine Classes.Manage Virtual Machine Classes

Procedure

  1. Create or edit an existing VM class.
    Option: Create a new VM class
    1. From the vSphere Client home menu, select Workload Management.
    2. Click the Services tab and click Manage on the VM Service pane.
    3. On the VM Service page, click VM Classes and click Create VM Class.
    4. Follow the prompts.
    Option: Edit a VM class
    1. From the vSphere Client home menu, select Workload Management.
    2. Click the Services tab and click Manage on the VM Service pane.
    3. On the VM Service page, click VM Classes.
    4. In the existing VM class pane, click Manage and click Edit.
    5. Follow the prompts.
  2. On the Configuration page, click the Virtual Hardware tab, click Add New Device, and select PCI Device.
  3. From the list of available devices on the Device Selection page, select the NVIDIA GRID vGPU and click Select.
    The device appears on the Virtual Hardware page.
  4. Click the Advanced Parameters tab and set the parameters with the following attributes and values.
    Parameter                      Value
    pciPassthru0.cfg.enable_uvm    1
    pciPassthru1.cfg.enable_uvm    1
  5. Review your configuration and click Finish.

Results

A PCI Devices tag on the VM class pane indicates that the VM class is vGPU-enabled.

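After the VM class is associated with a namespace, DevOps engineers can verify that it is available from kubectl. A minimal sketch, assuming you are logged in to the Supervisor and that my-class is the hypothetical name of the class created above:

    # List the VM classes available in the current namespace
    kubectl get virtualmachineclasses

    # Inspect the class, including its PCI device configuration
    kubectl describe virtualmachineclass my-class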

Add a vGPU Device to a VM Class Using Data Center CLI

In addition to the vSphere Client, you can use Data Center CLI (DCLI) commands to add vGPU devices and advanced configurations.

For more information about DCLI commands, see Create and Manage VM Classes Using the Data Center CLI.

Procedure

  1. Log in to vCenter Server using the root user account and type dcli +i to use the DCLI in interactive mode.
  2. Run the following command to create a VM class.
    In the following example, the my-class VM class includes two CPUs, 2048 MB of memory, and a VirtualMachineConfigSpec with two sample vGPU profiles, mockup-vmiop-8c and mockup-vmiop. The extraConfig fields pciPassthru0.cfg.enable_uvm and pciPassthru1.cfg.enable_uvm are set to 1.
    dcli +i +show-unreleased com vmware vcenter namespacemanagement virtualmachineclasses create --id my-class --cpu-count 2 --memory-mb 2048 --config-spec '{"_typeName":"VirtualMachineConfigSpec","deviceChange":[{"_typeName":"VirtualDeviceConfigSpec","operation":"add","device":{"_typeName":"VirtualPCIPassthrough","key":20,"backing":{"_typeName":"VirtualPCIPassthroughVmiopBackingInfo","vgpu":"mockup-vmiop-8c"}}},{"_typeName":"VirtualDeviceConfigSpec","operation":"add","device":{"_typeName":"VirtualPCIPassthrough","key":20,"backing":{"_typeName":"VirtualPCIPassthroughVmiopBackingInfo","vgpu":"mockup-vmiop"}}}],"extraConfig":[{"_typeName":"OptionValue","key":"pciPassthru0.cfg.enable_uvm","value":{"_typeName":"string","_value":"1"}},{"_typeName":"OptionValue","key":"pciPassthru1.cfg.enable_uvm","value":{"_typeName":"string","_value":"1"}}]}'
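  3. (Optional) Verify the new VM class.
    A minimal sketch, assuming that the virtualmachineclasses service also exposes a get operation alongside create; if it does not, review the class in the vSphere Client instead.
    dcli +i +show-unreleased com vmware vcenter namespacemanagement virtualmachineclasses get --id my-class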
    

Install the NVIDIA Guest Driver in a VM in vSphere IaaS Control Plane

If the VM includes a PCI device configured for vGPU, after you create and boot the VM in your vSphere IaaS control plane environment, install the NVIDIA vGPU graphics driver to fully enable GPU operations.

Prerequisites

  • Deploy the VM with vGPU. Make sure that the VM YAML file references the VM class with the vGPU definition, as in the manifest sketch after this list. See Deploy a Virtual Machine in vSphere IaaS Control Plane.
  • Verify that you downloaded the vGPU software package from the NVIDIA download site, uncompressed the package, and have the guest driver component ready. For information, see the appropriate NVIDIA Virtual GPU Software documentation.
    Note: The version of the driver component must correspond to the version of the vGPU Manager that a vSphere administrator installed on the ESXi host.
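
The following is a minimal manifest sketch for this prerequisite, assuming a hypothetical namespace my-namespace, the vGPU-enabled class my-class, an EFI boot mode image named centos-stream-8, and a storage class my-storage-class; substitute the names from your environment:

    kubectl apply -f - <<EOF
    apiVersion: vmoperator.vmware.com/v1alpha1
    kind: VirtualMachine
    metadata:
      name: vgpu-vm
      namespace: my-namespace
    spec:
      className: my-class            # VM class that carries the vGPU device
      imageName: centos-stream-8     # image with boot mode set to EFI
      powerState: poweredOn
      storageClass: my-storage-class
    EOF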

Procedure

  1. Copy the NVIDIA vGPU software Linux driver package, for example NVIDIA-Linux-x86_64-version-grid.run, to the guest VM.
  2. Before attempting to run the driver installer, terminate all applications.
  3. Start the NVIDIA vGPU driver installer.
    sudo ./NVIDIA-Linux-x86_64-version-grid.run
  4. Accept the NVIDIA software license agreement and select Yes to update the X configuration settings automatically.
  5. Verify that the driver has been installed.
    For example,
    ~$ nvidia-smi
    Wed May 19 22:15:04 2021
    +-----------------------------------------------------------------------------+
    | NVIDIA-SMI 460.63       Driver Version: 460.63       CUDA Version: 11.2     |
    |-------------------------------+----------------------+----------------------+
    | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
    |                               |                      |               MIG M. |
    |===============================+======================+======================|
     |   0  GRID V100-4Q        On   | 00000000:02:00.0 Off |                  N/A |
     | N/A   N/A    P0    N/A /  N/A |    304MiB /  4096MiB |      0%      Default |
     |                               |                      |                  N/A |
     +-------------------------------+----------------------+----------------------+
    
    +-----------------------------------------------------------------------------+
    | Processes:                                                                  |
    |  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
    |        ID   ID                                                   Usage      |
    |=============================================================================|
    |  No running processes found                                                 |
    +-----------------------------------------------------------------------------+

Deploying a VM with PCI Devices in vSphere IaaS Control Plane

In addition to vGPU, you can configure other PCI devices on an ESXi host to make them available to a VM in a passthrough mode.

vSphere IaaS control plane supports Dynamic DirectPath I/O devices. Using Dynamic DirectPath I/O, the VM can directly access the physical PCI and PCIe devices connected to a host. You can use Dynamic DirectPath I/O to assign multiple PCI passthrough devices to a VM. Each passthrough device can be specified by its PCI vendor and device identifier.
Note: When configuring Dynamic DirectPath I/O for PCI passthrough devices, connect the PCI devices to the host and mark them as available for passthrough. See Enable Passthrough for a Network Device on a Host in the vSphere Networking documentation.
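
To illustrate, you can create a VM class with a Dynamic DirectPath I/O device by using DCLI in the same way as the vGPU class earlier in this topic, but with a VirtualPCIPassthroughDynamicBackingInfo backing that identifies the device by its PCI vendor and device identifier. The following is a minimal sketch; my-passthru-class is a placeholder name, and the decimal IDs 4318 (0x10DE, NVIDIA) and 7864 are example values that you replace with the IDs of your device:

    dcli +i +show-unreleased com vmware vcenter namespacemanagement virtualmachineclasses create --id my-passthru-class --cpu-count 2 --memory-mb 2048 --config-spec '{"_typeName":"VirtualMachineConfigSpec","deviceChange":[{"_typeName":"VirtualDeviceConfigSpec","operation":"add","device":{"_typeName":"VirtualPCIPassthrough","key":20,"backing":{"_typeName":"VirtualPCIPassthroughDynamicBackingInfo","allowedDevice":[{"_typeName":"VirtualPCIPassthroughAllowedDevice","vendorId":4318,"deviceId":7864}]}}}]}'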