You can use vMotion to perform a live migration of NVIDIA vGPU-powered virtual machines without causing downtime or data loss.
In vSphere 6.7 Update 1 and later, vGPU vMotion is supported for vGPU profiles with up to 12 GB of frame buffer. The 12 GB limit applies to a single vGPU device attached to the VM, regardless of the GPU model or vGPU profile. Migrating a VM whose vGPU frame buffer exceeds this limit might exceed the 100-second limit on vSphere vMotion stun time, in which case the migration fails with a timeout.
While the migration is in progress, you cannot access the VM, desktop, or application. When the migration completes, access to the VM resumes and all applications continue from their previous state. If the migration fails, the VM remains on the source host. To preserve the application (and GPU) state when migrating a VM with a vGPU frame buffer over 12 GB, suspend the VM, cold migrate it, and resume it on a compatible destination host. For information on frame buffer size in vGPU profiles, refer to the NVIDIA Virtual GPU documentation.
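The 12 GB rule can be expressed as a small planning helper. The following Python sketch is illustrative only: the function name `plan_vgpu_migration`, the `MigrationPlan` type, and the step labels are hypothetical and are not part of any VMware or NVIDIA API. It assumes a single vGPU device per VM and only encodes the limit stated above; it does not perform a migration.

```python
from dataclasses import dataclass
from typing import Tuple

# Limit quoted in this section; treated here as a fixed constant (assumption).
MAX_LIVE_FRAME_BUFFER_GB = 12


@dataclass(frozen=True)
class MigrationPlan:
    """Hypothetical result type: whether vMotion applies, and the steps to take."""
    live: bool
    steps: Tuple[str, ...]


def plan_vgpu_migration(frame_buffer_gb: float) -> MigrationPlan:
    """Choose a migration approach for a VM with one vGPU device.

    Frame buffers up to 12 GB are candidates for live migration with vMotion.
    Larger frame buffers risk the vMotion stun-time timeout, so the plan falls
    back to suspend, cold migrate, and resume, which preserves application
    and GPU state.
    """
    if frame_buffer_gb <= 0:
        raise ValueError("frame buffer size must be positive")
    if frame_buffer_gb <= MAX_LIVE_FRAME_BUFFER_GB:
        return MigrationPlan(live=True, steps=("vmotion",))
    return MigrationPlan(live=False, steps=("suspend", "cold_migrate", "resume"))


if __name__ == "__main__":
    for size_gb in (8, 12, 16, 24):
        plan = plan_vgpu_migration(size_gb)
        print(f"{size_gb} GB: live={plan.live}, steps={plan.steps}")
```

Host and driver compatibility is deliberately left out of this sketch; the helper checks frame buffer size only.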
The expected VM stun times (the time during which the VM is inaccessible to users during vMotion) are listed in the following table. These stun times were measured over a 10 Gb network with NVIDIA Tesla P40 GPUs:
| Used vGPU Frame Buffer (GB) | VM Stun Time (sec) |
|---|---|
| 16 | 100+ (vMotion timeout) |
| 24 | 100+ (vMotion timeout) |
VMware vSphere vMotion is supported only on, and between, compatible NVIDIA GPU device models and NVIDIA GRID host driver versions, as defined and supported by NVIDIA. For compatibility information, refer to the NVIDIA Virtual GPU User Guide.
To check compatibility between NVIDIA vGPU host drivers, vSphere, and Horizon, refer to the VMware Compatibility Matrix.