Using vMotion to Migrate vGPU Virtual Machines

You can use vMotion to perform a live migration of NVIDIA vGPU-powered virtual machines without causing data loss.

To enable vMotion for vGPU virtual machines, you need to set the vgpu.hotmigrate.enabled advanced setting to true. For more information about how to configure the vCenter Server advanced settings, see Configure Advanced Settings.

In vSphere 6.7 Update 1 and vSphere 6.7 Update 2, when you migrate vGPU virtual machines with vMotion and vMotion stun time exceeds 100 seconds, the migration process might fail for vGPU profiles with 24 GB frame buffer size or larger. To avoid the vMotion timeout, upgrade to vSphere 6.7 Update 3 or later.

During the stun time, you are unable to access the VM, desktop, or application. Once the migration is completed, access to the VM resumes and all applications continue from their previous state. For information on frame buffer size in vGPU profiles, refer to the NVIDIA Virtual GPU documentation.

The following table lists the expected VM stun times, that is, the time during which the VM is inaccessible to users during vMotion. These stun times were measured over a 10 Gb/s network with NVIDIA Tesla V100 PCIe 32 GB GPUs:

Table 1. Expected Stun Times for vMotion of vGPU VMs
Used vGPU Frame Buffer (GB)	VM Stun Time (sec)
1	2.17
2	4.03
4	7.02
8	14.48
16	27.79
32	50.81

Note: The configured vGPU profile sets an upper bound on the used vGPU frame buffer. In many VDI and graphics use cases, the VM uses less vGPU frame buffer memory at any given time than the profile assigns. Treat the times in the table as worst-case stun times, which apply when the entire assigned vGPU memory is in use at the time of the migration. For example, a V100-32Q vGPU profile allocates 32 GB of vGPU frame buffer to the VM, but the VM can be using anywhere from 0 to 32 GB of frame buffer during the migration. As a result, the stun time can range from less than 1 second to 50.81 seconds.
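
To get a rough estimate for a frame buffer usage that falls between the table rows, you can interpolate between the measured values. The following Python sketch is illustrative only and is not a VMware or NVIDIA tool. The function name estimate_stun_time is hypothetical, the linear interpolation is an assumption, and the (0, 0) anchor point used below 1 GB is also an assumption rather than a measured value. The figures apply only to the tested configuration described above.

```python
from bisect import bisect_left

# Measured points from Table 1:
# (used vGPU frame buffer in GB, VM stun time in seconds).
STUN_TABLE = [
    (1, 2.17),
    (2, 4.03),
    (4, 7.02),
    (8, 14.48),
    (16, 27.79),
    (32, 50.81),
]


def estimate_stun_time(used_gb: float) -> float:
    """Estimate the stun time by linear interpolation between table rows.

    Assumption: below 1 GB the estimate interpolates toward (0, 0),
    which is not a measured data point.
    """
    max_gb = STUN_TABLE[-1][0]
    if used_gb < 0 or used_gb > max_gb:
        raise ValueError(f"used_gb must be between 0 and {max_gb}")

    # Prepend the assumed origin so that values below 1 GB are covered.
    points = [(0.0, 0.0)] + STUN_TABLE
    sizes = [size for size, _ in points]

    i = bisect_left(sizes, used_gb)
    if sizes[i] == used_gb:
        # Exact match with a table row (or the assumed origin).
        return points[i][1]

    (x0, y0), (x1, y1) = points[i - 1], points[i]
    return y0 + (y1 - y0) * (used_gb - x0) / (x1 - x0)


if __name__ == "__main__":
    for gb in (0.5, 8, 12, 32):
        print(f"{gb} GB used: about {estimate_stun_time(gb):.2f} s")
```

An estimate produced this way is only a planning aid. Actual stun times depend on the network, the GPU model, and the workload at the time of the migration.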

In vSphere 6.7 Update 1 and later, DRS supports initial placement of vGPU VMs but does not support load balancing for them.

VMware vSphere vMotion is supported only with and between compatible NVIDIA GPU device models and NVIDIA GRID host driver versions as defined and supported by NVIDIA. For compatibility information, refer to the NVIDIA Virtual GPU User Guide.

To check compatibility between NVIDIA vGPU host drivers, vSphere, and Horizon, refer to the VMware Compatibility Matrix.
