You can use vMotion to perform a live migration of NVIDIA vGPU-powered virtual machines without causing data loss.
To enable vMotion for vGPU virtual machines, set the vgpu.hotmigrate.enabled advanced setting to true. For more information about how to configure the vCenter Server advanced settings, see Configure Advanced Settings.
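As a rough illustration, the current value of the setting could be read programmatically. The sketch below is not an official procedure. It assumes an object that behaves like pyVmomi's vim.option.OptionManager (for vCenter Server, typically reached through the service content's setting property) and exposes QueryOptions. The helper name is_vgpu_vmotion_enabled is hypothetical, and the handling of string values is an assumption about how the value might be returned.

```python
SETTING_KEY = "vgpu.hotmigrate.enabled"


def is_vgpu_vmotion_enabled(option_manager):
    """Return True if the advanced setting is present and set to true.

    `option_manager` is assumed to behave like pyVmomi's
    vim.option.OptionManager, i.e. QueryOptions(name) returns a list of
    objects with `key` and `value` attributes.
    """
    options = option_manager.QueryOptions(SETTING_KEY)
    for opt in options:
        if opt.key == SETTING_KEY:
            # Assumption: the value may arrive as a bool or as a string,
            # so both forms are normalized before comparison.
            return str(opt.value).strip().lower() == "true"
    # Setting not returned at all: treat as not enabled.
    return False
```

With a live connection, the argument would usually be something like the `setting` attribute of the content returned by `RetrieveContent()`. A real OptionManager may raise an error for an unknown key instead of returning an empty list, so callers might want to handle that case.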
In vSphere 6.7 Update 1 and vSphere 6.7 Update 2, when you migrate vGPU virtual machines with vMotion and vMotion stun time exceeds 100 seconds, the migration process might fail for vGPU profiles with 24 GB frame buffer size or larger. To avoid the vMotion timeout, upgrade to vSphere 6.7 Update 3 or later.
During the stun time, you are unable to access the VM, desktop, or application. Once the migration is completed, access to the VM resumes and all applications continue from their previous state. For information on frame buffer size in vGPU profiles, refer to the NVIDIA Virtual GPU documentation.
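The timeout condition described above can be expressed as a small pre-migration check. This is only a sketch of the rule as stated in this topic: the function name, parameters, and constants are hypothetical and do not correspond to any vSphere API.

```python
# Values taken from the text above; the names are illustrative only.
AFFECTED_67_UPDATES = {1, 2}   # vSphere 6.7 Update 1 and Update 2
LARGE_FRAME_BUFFER_GB = 24     # profiles with 24 GB frame buffer or larger
STUN_LIMIT_SEC = 100           # stun time above which migration might fail


def vmotion_might_time_out(update_level, frame_buffer_gb, stun_time_sec):
    """Return True if a vGPU vMotion matches the documented failure case.

    update_level    -- vSphere 6.7 update number (1 for Update 1, and so on)
    frame_buffer_gb -- frame buffer size of the vGPU profile, in GB
    stun_time_sec   -- expected or measured stun time, in seconds
    """
    return (
        update_level in AFFECTED_67_UPDATES
        and frame_buffer_gb >= LARGE_FRAME_BUFFER_GB
        and stun_time_sec > STUN_LIMIT_SEC
    )
```

For example, `vmotion_might_time_out(2, 32, 120)` evaluates to `True`, while the same profile on Update 3 evaluates to `False`.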
The expected VM stun times (the time during which the VM is inaccessible to users during vMotion) are listed in the following table. These stun times were measured over a 10 Gb network with NVIDIA Tesla V100 PCIe 32 GB GPUs:
|Used vGPU Frame Buffer (GB)|VM Stun Time (sec)|
In vSphere 6.7 Update 1 and later, DRS supports initial placement of vGPU VMs but does not load balance them.
VMware vSphere vMotion is supported only with and between compatible NVIDIA GPU device models and NVIDIA GRID host driver versions as defined and supported by NVIDIA. For compatibility information, refer to the NVIDIA Virtual GPU User Guide.
To check compatibility between NVIDIA vGPU host drivers, vSphere, and Horizon, refer to the VMware Compatibility Matrix.