Enable NVIDIA GRID vGPU for Instant-Clone Pools

You can configure NVIDIA GRID vGPU in ESXi hosts and in the golden image in vSphere Client.

The ESXi host assigns GPU hardware resources to virtual machines on a first-come, first-served basis as virtual machines are created. By default, the ESXi host assigns virtual machines to the physical GPU with the fewest virtual machines already assigned. This is the best performance mode. If you would rather have the ESXi host assign virtual machines to the same physical GPU until the maximum number of virtual machines is reached before placing virtual machines on the next physical GPU, you can use the GPU consolidation mode. You can configure this mode in vCenter Server for each ESXi host that has vGPU installed. For more information, see the VMware Knowledge Base (KB) article https://kb.vmware.com/s/article/55049.
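The difference between the two assignment modes can be illustrated with a small simulation. The following is a minimal sketch, not ESXi code: the function names `place_vm` and `simulate`, the policy strings, and the tie-breaking rule (lowest GPU index wins) are assumptions made for illustration only.

```python
from typing import List, Optional


def place_vm(gpu_loads: List[int], max_per_gpu: int, policy: str) -> Optional[int]:
    """Pick the physical GPU for the next VM, or return None if every GPU is full.

    gpu_loads[i] is the number of VMs already assigned to physical GPU i.
    The policy names here are hypothetical labels for the two modes.
    """
    # Only GPUs that still have room are candidates.
    candidates = [i for i, load in enumerate(gpu_loads) if load < max_per_gpu]
    if not candidates:
        return None
    if policy == "performance":
        # Best performance mode: GPU with the fewest VMs already assigned.
        # Ties go to the lowest index (an assumption of this sketch).
        return min(candidates, key=lambda i: gpu_loads[i])
    if policy == "consolidation":
        # Consolidation mode: keep filling the first GPU that has room.
        return candidates[0]
    raise ValueError(f"unknown policy: {policy}")


def simulate(num_gpus: int, max_per_gpu: int, num_vms: int, policy: str) -> List[int]:
    """Create VMs one at a time (first come, first served) and return per-GPU VM counts."""
    loads = [0] * num_gpus
    for _ in range(num_vms):
        idx = place_vm(loads, max_per_gpu, policy)
        if idx is None:
            break  # no GPU capacity left on this host
        loads[idx] += 1
    return loads


print(simulate(4, 4, 6, "performance"))    # [2, 2, 1, 1]
print(simulate(4, 4, 6, "consolidation"))  # [4, 2, 0, 0]
```

With four physical GPUs that each hold up to four virtual machines, six virtual machines are spread across all four GPUs in best performance mode, but occupy only two GPUs in consolidation mode.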

If you are only using a single vGPU profile per vSphere cluster, set the GPU assignment policy for all GPU hosts within the cluster to the best performance mode in order to maximize performance. In this case, you can also have instant-clone pools and full-clone pools that use the same vGPU profile in the same vSphere cluster.

You can have a cluster that contains both GPU-enabled hosts and non-GPU-enabled hosts.

Note: vMotion of vGPU Virtual Machines

vMotion of vGPU virtual machines is supported starting with vSphere 6.7. See the vSphere documentation for configuration details.
vSphere Distributed Resource Scheduler (DRS) in vSphere 6.7 Update 1 and later supports initial placement of vGPU VMs without load balancing support.
DRS in vSphere 6.7, and in vSphere 7.0 versions earlier than vSphere 7.0 U3f, does not automatically vMotion vGPU VMs when ESXi hosts are placed in maintenance mode. An administrator must manually initiate vMotion of the vGPU VMs before the ESXi hosts can enter maintenance mode.
DRS in vSphere 7.0 U3f and later can be configured to allow automatic vMotion when hosts are placed in maintenance mode. See https://kb.vmware.com/s/article/88271 for instructions. DRS load balancing remains unsupported for vGPU VMs.
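
The version rules in this note can be summarized as a small lookup. This is a minimal sketch that only restates the statements above. The `VSphereVersion` type, the `vgpu_capabilities` function, and the `drs_evacuation_configured` flag (standing in for the configuration described in KB 88271) are hypothetical names and not part of any VMware API.

```python
from typing import Dict, NamedTuple


class VSphereVersion(NamedTuple):
    """Hypothetical version record, for example 7.0 U3f -> (7, 0, 3, "f")."""
    major: int
    minor: int
    update: int = 0
    patch: str = ""  # patch letter; an empty string sorts before any letter


def vgpu_capabilities(version: VSphereVersion,
                      drs_evacuation_configured: bool = False) -> Dict[str, bool]:
    """Return which vGPU mobility features apply, according to the note above."""
    return {
        # vMotion of vGPU VMs: vSphere 6.7 and later.
        "vmotion": version >= VSphereVersion(6, 7),
        # DRS initial placement of vGPU VMs: vSphere 6.7 Update 1 and later.
        "drs_initial_placement": version >= VSphereVersion(6, 7, 1),
        # DRS load balancing of vGPU VMs is unsupported in every version listed.
        "drs_load_balancing": False,
        # Automatic vMotion on maintenance mode: 7.0 U3f and later, and only
        # when DRS has been configured for it (see KB 88271).
        "auto_vmotion_on_maintenance": (
            version >= VSphereVersion(7, 0, 3, "f") and drs_evacuation_configured
        ),
    }


print(vgpu_capabilities(VSphereVersion(7, 0, 2)))
print(vgpu_capabilities(VSphereVersion(7, 0, 3, "f"), drs_evacuation_configured=True))
```

Tuple comparison is used so that update numbers and patch letters are ordered field by field. Real build strings would need parsing into this form first.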

NVIDIA GRID vGPU has these potential constraints:

RDP is not supported.
The virtual machines must be hardware version 11 or later.
Horizon 8 supports creating a vGPU instant-clone pool in a cluster that contains both vGPU-enabled and non-vGPU-enabled hosts, and ignores the non-vGPU-enabled hosts when creating the pool. However, you cannot use vMotion to move an instant clone from a GPU-enabled ESXi host to an ESXi host that does not have GPU hardware configured.

To enable an instant-clone pool to use NVIDIA GRID vGPU:

Procedure

Install NVIDIA GRID vGPU in the physical ESXi hosts.
In vCenter Server hardware graphics configuration, select the Host Graphics tab, and in Edit Host Graphics Settings, select Shared Direct.
The ESXi host then uses the NVIDIA GRID card for vGPU.
Prepare a golden image with NVIDIA GRID vGPU configured, including selecting the vGPU profile you want to use.
Take a snapshot of the golden image.
In Horizon Console, when you create an instant-clone pool, select this golden image and snapshot.

Results

Horizon 8 automatically displays NVIDIA GRID vGPU in the 3D Render field. Horizon 8 also displays the vGPU profile you chose in the golden image. Instant clones inherit the settings configured in the vSphere Client for the golden image.

The vGPU profile cannot be edited from Horizon Console during the instant-clone pool creation process. To change the vGPU profile after the pool has been created, create a new image with the updated vGPU profile, take a snapshot, and then perform a push-image operation. See Patching an Instant-Clone Desktop Pool.