There are several scenarios in which the virtual machine of your vSphere Bitfusion server cannot start because of GPU-related issues.

Problem

When you power on the virtual machine of your vSphere Bitfusion server, the virtual machine cannot start.

Cause

Typically, this problem occurs during the installation of a new vSphere Bitfusion server, in one of the following scenarios.
  • You add multiple instances of the same GPU to the virtual machine of a vSphere Bitfusion server.
  • The total memory of the GPUs used on a vSphere Bitfusion server is more than 128 GB.
  • You use a GPU that is already assigned to another running virtual machine.

Solution

  • If you add multiple instances of the same GPU, vCenter Server adds the first GPU multiple times. You must manually update the PCI bus ID of each additional GPU to a unique value.
    1. In the vSphere Client, right-click the virtual machine of the vSphere Bitfusion server and select Edit Settings.
    2. From each PCI Device drop-down menu, select a unique ID for the GPU.
  • If the total memory of the GPUs used on a single vSphere Bitfusion server is more than 128 GB, you must change the value of the pciPassthru.64bitMMIOSizeGB property, which is the advanced virtual machine property for GPU passthrough.
    1. Calculate a correct value for the property. Count the number of PCI devices, such as GPUs, that the vSphere Bitfusion server virtual machine uses, multiply that number by the GPU memory size in GB, and round up the result to the next power of two. For example, to use GPU passthrough with two 16 GB GPU devices, the total is 2 * 16 = 32 GB, and the next power of two is 64, so use a value of 64. For a single 16 GB GPU, use a value of 32.
    2. Modify the virtual machine property.
      1. In the vSphere Client, select the virtual machine of the vSphere Bitfusion server, and power it off.
      2. With the virtual machine selected, select Actions > Edit Settings > VM Options > Advanced > Edit Configuration.
      3. Search for pciPassthru.64bitMMIOSizeGB and set a new value.
      4. Power on the virtual machine.
  • If the GPU that you are assigning to the virtual machine of a vSphere Bitfusion server is already assigned to another running virtual machine, you must select a different GPU. You can pass through a GPU to only one vSphere Bitfusion server at a time.
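
The calculation for the pciPassthru.64bitMMIOSizeGB property can be sketched as a small Python helper. This is an illustrative sketch, not a VMware tool: the function name mmio_size_gb is hypothetical, and it assumes that "the next power of two" means the smallest power of two strictly greater than the total GPU memory, which is the reading that matches both examples in this topic (two 16 GB GPUs give 64, one 16 GB GPU gives 32).

```python
def mmio_size_gb(gpu_count: int, gpu_memory_gb: int) -> int:
    """Return a candidate value for pciPassthru.64bitMMIOSizeGB.

    Assumption: the result is the smallest power of two strictly greater
    than the total GPU memory (gpu_count * gpu_memory_gb), matching the
    examples in this topic.
    """
    if gpu_count < 1 or gpu_memory_gb < 1:
        raise ValueError("gpu_count and gpu_memory_gb must be positive")

    total_gb = gpu_count * gpu_memory_gb

    # bit_length() is the number of bits needed to represent total_gb,
    # so 1 << bit_length() is the smallest power of two above total_gb.
    return 1 << total_gb.bit_length()


if __name__ == "__main__":
    # Two 16 GB GPUs and a single 16 GB GPU, as in the examples above.
    for count, size in [(2, 16), (1, 16)]:
        value = mmio_size_gb(count, size)
        print(f'{count} x {size} GB -> pciPassthru.64bitMMIOSizeGB = "{value}"')
```

You would then enter the computed value in the Edit Configuration dialog as described in the steps above. Verify the result against the requirements of your GPU model before you apply it.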