You can configure an instance with multiple vGPU profiles or create instances with different vGPU profiles in the same Nova Compute node.

Support for configuring multiple vGPU profiles is limited. The following limitations apply to vSphere and VMware Integrated OpenStack:
  • You cannot set profile_fb_size_kb to a large integer, such as 2097152, in the Nova Compute CR.
  • You cannot power on virtual machines with different vGPU profiles on the same ESXi host at the same time. If there is only one graphics card in the ESXi host, you must power off the virtual machine with vgpu-profileA before you run a virtual machine with vgpu-profileB.
  • You cannot power on an instance with two different vGPU profiles.
  • To power on an instance with two identical vGPU profiles, you must have at least two unused GPU graphics cards, and the graphics card supports this only for a specific vGPU profile. For example, with the NVIDIA Tesla T4 graphics card, you can only create an instance with two grid_t4-16q profiles. Other vGPU profiles are not supported.
  • An ESXi host configured with one graphics card that has 16 GB of graphics RAM can run at most eight instances with the grid_t4-2q profile, or four instances with the grid_t4-4q profile. The total graphics RAM in use cannot exceed the capacity of the card.
  • VMware Integrated OpenStack cannot detect or specify which graphics card to use for VM creation. In some cases, you must ensure that a graphics card is available before you boot an instance with a vGPU profile that differs from those of the running instances. An administrator can use the validate_instance_vgpu and list_vms_by_device commands to get information about the devices in use on the ESXi host. To free up a graphics card, you must shut off instances from OpenStack.
  • If a default GPU profile is set in nova-compute.conf, such as gpu_profile: grid_t4-2q, you can boot an instance with either vmware:vgpu=1 or vmware:vgpu_profiles=grid_t4-2q. However, using both methods within the same Nova Compute node is not recommended.
  • Resize and migration are not supported.
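
The capacity and same-profile limitations above can be checked before you create a flavor. The following is a minimal sketch in POSIX shell, not a tool shipped with VMware Integrated OpenStack: the function names max_instances and profiles_all_same are hypothetical, and the graphics RAM sizes are passed in by hand (in GB) rather than read from the host.

```shell
#!/bin/sh
# Hypothetical pre-flight helpers; not part of VMware Integrated OpenStack.

# max_instances CARD_GB PROFILE_GB
# Prints how many instances of one profile fit on a single graphics card,
# assuming each instance reserves PROFILE_GB of the card's graphics RAM.
max_instances() {
  echo $(( $1 / $2 ))
}

# profiles_all_same "p1,p2,..."
# Returns 0 when every entry of a comma-separated vmware:vgpu_profiles
# value is identical, and 1 otherwise.
profiles_all_same() {
  pas_first=${1%%,*}
  pas_rest=$1
  while [ -n "$pas_rest" ]; do
    pas_current=${pas_rest%%,*}
    [ "$pas_current" = "$pas_first" ] || return 1
    case $pas_rest in
      *,*) pas_rest=${pas_rest#*,} ;;  # drop the entry just checked
      *)   pas_rest= ;;                # last entry reached
    esac
  done
  return 0
}

max_instances 16 2   # grid_t4-2q on a card with 16 GB of graphics RAM
max_instances 16 4   # grid_t4-4q on a card with 16 GB of graphics RAM
profiles_all_same "grid_t4-16q,grid_t4-16q" && echo "identical profiles"
profiles_all_same "grid_t4-4q,grid_t4-8a" || echo "mixed profiles: not supported"
```

The two max_instances calls reproduce the eight-instance and four-instance limits stated for a 16 GB card. The check does not verify that the card itself supports a two-profile instance.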

Procedure

  1. To list the supported vGPU profiles within the Nova Compute node, run the following command in a specific Nova Compute pod.
    nova-manage --config-file /tmp/share/nova-compute.conf --config-file /etc/nova/nova.conf graphic list_vgpus
    +--------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
    |  Host Name   |                                                                                                   Profile Names                                                                                                   |
    +--------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
    | 10.196.4.135 | [grid_t4-8q, grid_t4-8c, grid_t4-8a, grid_t4-4q, grid_t4-4c, grid_t4-4a, grid_t4-2q, grid_t4-2b4, grid_t4-2b, grid_t4-2a, grid_t4-1q, grid_t4-1b4, grid_t4-1b, grid_t4-1a, grid_t4-16q, grid_t4-16c, grid_t4-16a] |
    +--------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
    You can also run the command in VMware Integrated OpenStack manager.
    osctl exec -it <compute-pod-name> -- bash -c "export PYTHONWARNINGS=\"ignore:Unverified HTTPS request\" && nova-manage --config-file /tmp/share/nova-compute.conf --config-file /etc/nova/nova.conf --log-file /tmp/nova-manage.log graphic list_vgpus"
  2. Create VMs with different vgpu-profiles without reconfiguring Nova Compute.

    Ensure that there are two graphics cards in one ESXi host, or two ESXi hosts that each have one graphics card installed.

    1. Configure nova-compute.conf as described in NVIDIA GRID vGPU.
    2. Create a flavor and set the vmware:vgpu_profiles property.
      openstack flavor create --public --disk 20 --ram 4096 --vcpus 2 --property "vmware:vgpu_profiles=grid_t4-4q" gpu_flavor_4q
    3. Boot an instance from gpu_flavor_4q.
      openstack server create --flavor gpu_flavor_4q --image Photo3 --network net1 gpu_inst1
    4. Create a flavor and set the vmware:vgpu_profiles property to a different profile.
      openstack flavor create --public --disk 20 --ram 4096 --vcpus 2 --property "vmware:vgpu_profiles=grid_t4-8a" gpu_flavor_8a
    5. Boot an instance from gpu_flavor_8a.
      openstack server create --flavor gpu_flavor_8a --image Photo3 --network net1 gpu_inst2
  3. Create an instance with two vgpu-profiles.
    1. Configure nova-compute.conf as described in NVIDIA GRID vGPU.
    2. Create a flavor and set the vmware:vgpu_profiles property to two vGPU profiles.
      openstack flavor create --public --disk 20 --ram 4096 --vcpus 2 --property "vmware:vgpu_profiles=grid_t4-16q,grid_t4-16q" gpu_flavor_2_profiles
    3. Boot an instance from gpu_flavor_2_profiles.
      openstack server create --flavor gpu_flavor_2_profiles --image Photo3 --network net1 gpu_inst3
  4. Power off some instances to free up graphics card space.

    When there are no unused graphics cards, you must power off some instances before you can boot instances with different vGPU profiles.

    1. Validate the resource providers and fix any that do not match.
      osctl exec -it <compute-pod-name> -- bash -c "export PYTHONWARNINGS=\"ignore:Unverified HTTPS request\" && nova-manage --config-file /tmp/share/nova-compute.conf --config-file /etc/nova/nova.conf graphic validate_instance_vgpu --fix true"
      If the logs report that an instance does not match its provider, as in the first line below, some resource providers do not match the GPU use in ESXi. To fix the issue, add the --fix option. The logs then look like the following:
      instance 718bb1c4-c0ec-40a9-9ebd-c9ab97e85c9d's graphic_str NVIDIATesla T4 (host_40478:0000:03 does not match provider NVIDIATesla T4 (host_40478:0000:82
      Fix the provider for instance 718bb1c4-c0ec-40a9-9ebd-c9ab97e85c9d
      move 9b2901c2-3646-5cef-9664-81c6483d5687 to 95e5f2c3-54ad-51f9-9002-66dc8b93ce5c
      instance 9e7c44f9-129a-4264-ab40-d369e6b8dab9 matches
      instance ba155ebc-b59b-4b7a-80f8-4afbd98cd1aa matches
      instance 30c57985-56c9-43bb-96be-305eb5e4742e matches
      instance 609cbc58-8a34-4d52-962c-4acc2496994c matches
      Done
    2. List the virtual machines that use a GPU device within the Nova Compute node. You can run this command in a specific Nova Compute pod.
      nova-manage --config-file /tmp/share/nova-compute.conf --config-file /etc/nova/nova.conf graphic list_vms_by_device
      
      +--------------+----------------+--------------+-------------------------------------------------------+---------------+
      | Host Name    | Device Name    | pciId        | VM Name                                               | Profile Names |
      +--------------+----------------+--------------+-------------------------------------------------------+---------------+
      | 10.196.4.135 | NVIDIATesla T4 | 0000:03:00.0 | inst_def-3 (718bb1c4-c0ec-40a9-9ebd-c9ab97e85c9d)     | [grid_t4-2q]  |
      | 10.196.4.135 | NVIDIATesla T4 | 0000:03:00.0 | inst_def-2 (9e7c44f9-129a-4264-ab40-d369e6b8dab9)     | [grid_t4-2q]  |
      | 10.196.4.135 | NVIDIATesla T4 | 0000:03:00.0 | inst_def-5 (ba155ebc-b59b-4b7a-80f8-4afbd98cd1aa)     | [grid_t4-2q]  |
      | 10.196.4.135 | NVIDIATesla T4 | 0000:82:00.0 | yingji-t4-4q-1 (7b6769b4-015f-4d3d-ad0e-ab2236cb9343) | [grid_t4-4q]  |
      | 10.196.4.135 | NVIDIATesla T4 | 0000:82:00.0 | yingji-t4-4q-3 (8c853458-539f-4b08-b208-518c88f24e86) | [grid_t4-4q]  |
      | 10.196.4.135 | NVIDIATesla T4 | 0000:82:00.0 | yingji-t4-4q-2 (8db11b02-45ac-4837-8e82-2ebe0b84bd08) | [grid_t4-4q]  |
      +--------------+----------------+--------------+-------------------------------------------------------+---------------+
      
      For the instance with two grid_t4-16q profiles, the display looks like the following:
      +--------------+----------------+--------------+---------+----------------------------+
      |  Host Name   |  Device Name   |    pciId     | VM Name |       Profile Names        |
      +--------------+----------------+--------------+---------+----------------------------+
      | 10.196.4.135 | NVIDIATesla T4 | 0000:03:00.0 |  vm-08  | [grid_t4-16q, grid_t4-16q] |
      | 10.196.4.135 | NVIDIATesla T4 | 0000:82:00.0 |  vm-08  | [grid_t4-16q, grid_t4-16q] |
      +--------------+----------------+--------------+---------+----------------------------+
      Run the command in VMware Integrated OpenStack manager.
      osctl exec -it <compute-pod-name> -- bash -c "export PYTHONWARNINGS=\"ignore:Unverified HTTPS request\" && nova-manage --config-file /tmp/share/nova-compute.conf --config-file /etc/nova/nova.conf --log-file /tmp/nova-manage.log graphic list_vms_by_device"
    3. The output shows two graphics cards, at addresses 0000:03:00.0 and 0000:82:00.0, on host 10.196.4.135. The two vGPU profiles in use are grid_t4-2q and grid_t4-4q, so no graphics card is available for an instance with another profile. To boot an instance with another vGPU profile, you must free up a graphics card. To free the device at 0000:03:00.0 on 10.196.4.135, power off the instances inst_def-3, inst_def-2, and inst_def-5. You must use the OpenStack command in the toolbox.
      Note: Do not power off the virtual machines from the vCenter web console.
      openstack server stop inst_def-3 inst_def-2 inst_def-5
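
Picking the instances to stop can be scripted once the list_vms_by_device table is available as text. The following is a minimal sketch, assuming the table layout shown in this procedure. The vms_on_device function is a hypothetical helper and the sample variable holds a shortened copy of the table rows. The script only prints the openstack server stop command so that you can review it before running it in the toolbox.

```shell
#!/bin/sh
# vms_on_device PCI_ID
# Hypothetical helper: reads list_vms_by_device table output on stdin and
# prints the name of each VM attached to the graphics card at PCI_ID.
vms_on_device() {
  awk -F'|' -v pci="$1" '
    NF >= 6 {                          # skip border lines, which contain no "|"
      id = $4; gsub(/^ +| +$/, "", id) # pciId column, trimmed
      if (id == pci) {
        vm = $5; gsub(/^ +| +$/, "", vm)
        sub(/ \(.*$/, "", vm)          # drop the trailing "(uuid)" if present
        print vm
      }
    }'
}

# Shortened sample of the table rows from this procedure.
sample='| 10.196.4.135 | NVIDIATesla T4 | 0000:03:00.0 | inst_def-3 (718bb1c4-c0ec-40a9-9ebd-c9ab97e85c9d) | [grid_t4-2q] |
| 10.196.4.135 | NVIDIATesla T4 | 0000:82:00.0 | yingji-t4-4q-1 (7b6769b4-015f-4d3d-ad0e-ab2236cb9343) | [grid_t4-4q] |
| 10.196.4.135 | NVIDIATesla T4 | 0000:03:00.0 | inst_def-2 (9e7c44f9-129a-4264-ab40-d369e6b8dab9) | [grid_t4-2q] |'

# Print the stop command for review instead of running it.
printf '%s\n' "$sample" | vms_on_device 0000:03:00.0 | xargs echo openstack server stop
```

Instance names are not guaranteed to be unique in OpenStack, so for production use it may be safer to extract the UUID in parentheses instead of the name.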