VMware vSphere Bitfusion 2.5.0 | 05 NOV 2020 | Build 10

What's in the Release Notes

The release notes cover the following topics:

About vSphere Bitfusion

VMware vSphere Bitfusion shares accelerators such as graphics processing units (GPUs) to provide a pool of shared network-accessible resources capable of supporting resource-intensive artificial intelligence (AI) and machine learning (ML) workloads. vSphere Bitfusion operates across AI frameworks, cloud sites, networks, and in environments such as virtual machines, containers, and notebooks.

What is New in vSphere Bitfusion 2.5.0

  • Support for Bare metal clients
  • Expanded health checks and usability improvements
  • Support for vSphere Bitfusion clients with version 2.0.0 and later
  • NVIDIA Driver 450
  • NVIDIA CUDA 11
  • Support for TensorFlow 2.3
  • Support for PyTorch 1.5
  • Support for TensorRT 7.1.3

System Requirements

For a list of system requirements for vSphere Bitfusion clients and servers, see the vSphere Bitfusion Installation Guide.

Open Source Components

The copyright statements and licenses applicable to the open source software components distributed in vSphere Bitfusion 2.5.0 are available at http://www.vmware.com. You can download the source files for any GPL, LGPL, or other similar licenses that require the source code or modifications to source code to be made available for the most recent available release of vSphere Bitfusion.

Resolved Issues

The resolved issues are grouped as follows.

    vSphere Bitfusion 2.0.2

    • When using Caffe, a framework for deep learning, an issue might occur

      Added support for applications that register segmentation fault (SIGSEGV) handlers.

    • When using paravirtual RDMA (PVRDMA), a health check issue might occur

      This issue is fixed in this release.

    • When using a vSphere Bitfusion client, a potential freeze or hang might occur

      This issue is fixed in this release.

    • When updating vSphere Bitfusion cluster statistics, a race condition might occur

      This issue is fixed in this release.

    vSphere Bitfusion 2.0.1

    • When using VMware vSphere version 7.0b and earlier, the license might be incorrectly detected

      This issue is fixed in this release.

    • Added support for NVIDIA drivers

      This release supports NVIDIA driver version 440.95.01.

    • Added support for multiple datacenters within a single vCenter Server instance

      This release supports multiple datacenters within the same vCenter Server instance.

    Known Issues

    The known issues are grouped as follows.

      GPU Issues

      • Virtual GPUs are not supported

        This release does not support virtual GPUs.

      • After you add multiple GPUs to a vSphere Bitfusion server virtual machine, the virtual machine cannot start

        When adding multiple GPUs to a vSphere Bitfusion server virtual machine, vCenter Server adds the first GPU multiple times. As a result, the virtual machine cannot start.

        Workaround: By using the vCenter vApp options editor, update the ID of the PCI bus for the additional GPUs with a unique value.

        1. Navigate to the virtual machine of the vSphere Bitfusion server.
        2. On the Configure tab, expand Settings and select vApp Options.
        3. Click the Edit button.
        4. From the PCI Device drop-down menu, select a unique ID for all additional GPUs.
      • When the total video memory of the GPUs used on a vSphere Bitfusion server is more than 128GB RAM, you cannot use GPU passthrough

        By default, the advanced virtual machine property for GPU passthrough, pciPassthru.64bitMMIOSizeGB, is set to 256. If the GPUs on a single vSphere Bitfusion server have a total video RAM of more than 128 GB, this configuration might cause passthrough failure.

        Workaround:

        1. Calculate a correct value for pciPassthru.64bitMMIOSizeGB. Count the number of PCI devices, such as GPUs and network cards, that a vSphere Bitfusion server virtual machine uses, multiply the number by the GPU size in GB, and round up the value to the next power of two. For example, to use GPU passthrough with two 16 GB GPU devices, set the value to 64 (2 * 16 = 32, and the next power of two is 64). For a single 16 GB GPU, use a value of 32.
        2. Modify the virtual machine property.
          1. Navigate to the virtual machine, select it, and power it off.
          2. With the virtual machine selected, select Actions > Edit Settings > VM Options > Advanced > Edit Configuration.  
          3. Search for pciPassthru.64bitMMIOSizeGB and set a new value.
          4. Power on the virtual machine.
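
The calculation in step 1 can be sketched in Python. This is a minimal illustration and not part of the product: the function name mmio_size_gb and its parameters are hypothetical, and it assumes every counted PCI device is sized like the GPU and that "next power of two" means the smallest power of two strictly greater than the product, which is what the two examples in the text imply (32 becomes 64, 16 becomes 32).

```python
def mmio_size_gb(device_count: int, gpu_size_gb: int) -> int:
    """Suggest a value for pciPassthru.64bitMMIOSizeGB.

    Hypothetical helper: multiplies the number of PCI devices by the GPU
    size in GB, then moves up to the next power of two.
    """
    if device_count < 1 or gpu_size_gb < 1:
        raise ValueError("device_count and gpu_size_gb must be positive")
    total = device_count * gpu_size_gb
    # 1 << bit_length() is the smallest power of two strictly greater
    # than total (assumption taken from the examples: 32 -> 64, 16 -> 32).
    return 1 << total.bit_length()


if __name__ == "__main__":
    print(mmio_size_gb(2, 16))  # two 16 GB GPUs
    print(mmio_size_gb(1, 16))  # a single 16 GB GPU
```

The result is only a starting point for the value entered in step 2. Confirm it against the devices actually attached to the virtual machine.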
      • A single GPU might appear multiple times in vCenter Server

        An NVIDIA T4 GPU might appear multiple times in vCenter Server.

        Workaround: In the BIOS settings of the ESXi host, enable SR-IOV support.

      • vSphere Bitfusion clients deleted from a vSphere Bitfusion cluster can still request GPUs

        After deleting a vSphere Bitfusion client version 2.0.2 and earlier by using the vSphere Bitfusion plug-in, the client can continue requesting GPUs from the vSphere Bitfusion servers.

        Workaround: Perform one of the following tasks.

        • In the virtual machine terminal, run the following commands.
          • vmtoolsd --cmd info-set guestinfo.bitfusion.client.accesstoken
          • rm ~/.bitfusion/client.yaml
        • By using the vSphere Bitfusion plug-in, revoke the token of the client.

      vSphere Bitfusion Server Issues

      • Cannot add new vSphere Bitfusion servers to a cluster if a vSphere Bitfusion server is offline

        If one vSphere Bitfusion server virtual machine in a cluster is offline, you cannot add another server to the cluster.

        Workaround: Perform one of the following tasks.

        • By using the vSphere Bitfusion plug-in, remove the server from the cluster.
        • By using the vSphere Client, set the guest OS environment variable guestinfo.bitfusion.server.cassandra-removenode on the server virtual machine.
        • In the terminal of a running vSphere Bitfusion server, run the bitfusion removenode command.
      • Cannot start a vSphere Bitfusion server virtual machine when using a GPU that is already assigned to a running vSphere Bitfusion server

        Assigning a GPU to a vSphere Bitfusion server virtual machine when the same GPU is already assigned to a running vSphere Bitfusion server prevents the new server virtual machine from starting.

      • Changing the time on a vSphere Bitfusion server might cause cluster failure

        If the server time changes or is not synchronized after a cluster is created, the cluster might fail.

        Workaround:  All vSphere Bitfusion servers in a cluster must be synchronized with the same time. Synchronize the time of all servers in the cluster and restart them.

      • Servers with different times might cause cluster failure

        When using DHCP to set up the IP addresses of a vSphere Bitfusion server and the DHCP server does not provide NTP server information, or when manually entering the IP addresses of the vSphere Bitfusion server, the cluster might fail due to the time difference between the servers. All servers must be synchronized with the same time.

        Workaround: In the configuration of the server, add the IP address of an NTP server.

      • Cannot join a vSphere Bitfusion server that is deployed by cloning a virtual machine to a cluster 

        After cloning the virtual machine of a vSphere Bitfusion server and deleting another cloned server virtual machine, you might be unable to join the newly-cloned virtual machine to the cluster.

      • After cloning a vSphere Bitfusion server virtual machine, cannot start the new virtual machine because of missing required fields

        During the clone operation of the server virtual machine in vCenter Server, none of the required fields are marked as required in the wizard. As a result, the virtual machine might not be able to start.

        Workaround: During the virtual machine clone operation, enter the following information.

        • Hostname
        • vCenter GUID
        • vCenter URL
        • vCenter User Name
        • vCenter Password (enter it twice)
      • Cannot start a cloned virtual machine after deleting the source virtual machine

        After a clone operation of a vSphere Bitfusion virtual machine, if the source virtual machine is deleted before the cloned virtual machine is powered on, the cloned virtual machine cannot start.

        Workaround: Power on the cloned virtual machine. Then, delete the source virtual machine.

      • The vSphere Bitfusion plug-in identifies activities originating from a cloned virtual machine of a vSphere Bitfusion client as originating from the source virtual machine

        After a clone operation of a vSphere Bitfusion client virtual machine, the vSphere Bitfusion plug-in identifies activities that are originating from both the source and cloned virtual machines as if only originating from the source virtual machine.

        Workaround: On the cloned vSphere Bitfusion client virtual machine, in /etc/hostname, change the hostname entry.

      Other Issues

      • After performing the first GPU request, the ID of a vSphere Bitfusion client changes

        When a client virtual machine with version 2.0.2 and earlier is enabled, the client ID appears in the vSphere Bitfusion plug-in. After the client requests GPUs for the first time, this ID changes.

      • Cannot configure the network adapters of a cloned vSphere Bitfusion server

        After a clone operation of a vSphere Bitfusion server virtual machine in vCenter Server, the configuration of the additional network adapters cannot be changed.

        Workaround: Perform one of the following tasks.

        • When creating the original virtual machine, enable the network interfaces that are needed for the cloned virtual machine.
        • By using the vCenter vApp options editor, change the values of the network settings.
          1. Navigate to the virtual machine of the vSphere Bitfusion server.
          2. On the Configure tab, expand Settings and select vApp Options.
          3. Click the Edit button.
        • Override the network adapter configuration by setting these guest OS environment variables to True or False.
          • guestinfo.bitfusion.host.net2.configure
          • guestinfo.bitfusion.host.net3.configure
          • guestinfo.bitfusion.host.net4.configure
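
As an illustration of the third option, the following Python sketch builds the key/value pairs for the three variables listed above. It only prepares the settings and does not apply them to a virtual machine. The function name net_configure_settings is hypothetical, and the assumption that the values are the literal strings True and False is taken from the wording of the workaround.

```python
from typing import Dict


def net_configure_settings(enabled: Dict[int, bool]) -> Dict[str, str]:
    """Map adapter numbers (2, 3, or 4) to guestinfo override settings.

    Hypothetical helper: returns the variable names from the workaround
    with the string values "True" or "False".
    """
    allowed = (2, 3, 4)  # only net2, net3, and net4 are named in the text
    settings: Dict[str, str] = {}
    for index, flag in sorted(enabled.items()):
        if index not in allowed:
            raise ValueError(f"unsupported adapter number: {index}")
        key = f"guestinfo.bitfusion.host.net{index}.configure"
        settings[key] = "True" if flag else "False"
    return settings


if __name__ == "__main__":
    for key, value in net_configure_settings({2: True, 3: False}).items():
        print(f"{key} = {value}")
```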
      • Cannot attach more than one network interface to a network

        You can connect only one network interface to a particular network. 

        Workaround: To connect a Bitfusion server to multiple networks, use multiple network interfaces.

      • No support for Internet Protocol version 6

        IPv6 is not supported in this release.
