VMware vSphere Bitfusion 3.5 | 11 MAY 2021 | Build 5

What's in the Release Notes

The release notes cover the following topics.

About vSphere Bitfusion

VMware vSphere Bitfusion shares accelerators such as graphics processing units (GPUs) to provide a pool of shared network-accessible resources capable of supporting resource-intensive artificial intelligence (AI) and machine learning (ML) workloads. vSphere Bitfusion operates across AI frameworks, cloud sites, networks, and in environments such as virtual machines, containers, and notebooks.

What is New in vSphere Bitfusion 3.5

  • Added support for NVIDIA CUDA 11.2.2
  • Added support for NVIDIA cuDNN 8.1.1
  • Added support for NVIDIA Collective Communications Library (NCCL) 2.8.4
  • The network performance test tools for PVRDMA, such as ib_read_bw, ib_read_lat, ib_send_bw, ib_send_lat, ib_write_bw, ib_write_lat, are now pre-installed in the vSphere Bitfusion OVA file.
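
Before running a PVRDMA benchmark, it can help to confirm that these tools resolve on the PATH of the machine you are logged in to. The following is a minimal sketch, not an official procedure: the function name missing_perftest_tools is hypothetical, and it checks only for the six tool names listed above.

```shell
# Sketch: report which of the perftest tools named in these release notes
# cannot be found on PATH. The function name is illustrative only.
missing_perftest_tools() {
    missing=""
    for tool in ib_read_bw ib_read_lat ib_send_bw ib_send_lat ib_write_bw ib_write_lat; do
        # command -v succeeds only if the shell can resolve the name
        command -v "$tool" >/dev/null 2>&1 || missing="$missing $tool"
    done
    # Print the missing tools separated by spaces; print an empty line if none are missing
    echo "${missing# }"
}

missing_perftest_tools
```

The perftest tools generally run as a pair: start a tool without a host argument on one machine, then run the same tool with the address of the first machine on the other.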

System Requirements

For a list of system requirements for vSphere Bitfusion clients and servers, see the vSphere Bitfusion Installation Guide.

Compatibility and Interoperability

For a list of versions, models, and products that are compatible with vSphere Bitfusion, see the VMware vSphere Bitfusion Compatibility and Interoperability page.

Open Source Components

The copyright statements and licenses applicable to the open source software components distributed in vSphere Bitfusion 3.5 are available at http://www.vmware.com. You can download the source files for any GPL, LGPL, or other similar licenses that require the source code or modifications to source code to be made available for the most recent available release of vSphere Bitfusion.

Resolved Issues

The resolved issues are grouped as follows.

VMware vSphere Bitfusion 3.5
  • Cannot specify GPU memory when deploying a subsequent vSphere Bitfusion server

    This issue is fixed in this release. When using the vSphere Bitfusion plug-in to install a subsequent server, specifying the total GPU memory on the Select GPUs page recommends the memory and MMIO size of the virtual machine of the vSphere Bitfusion server.

VMware vSphere Bitfusion 3.0.1
  • When using the vSphere Bitfusion plug-in to install a subsequent server, the primary network can be a standard network only

    This issue is fixed in this release. The primary network can be a distributed port group with VMXNET3 or PVRDMA adapters.

  • CUDA 11.1 sample testing might cause failure in cuModuleGetGlobal_v2 module

    This issue is fixed in this release.

VMware vSphere Bitfusion 3.0
  • Bitfusion server will respond to ping. Earlier versions blocked ping requests at the local firewall. 

    This issue is fixed in this release.

Known Issues

The known issues are grouped as follows.

GPU Issues
  • Virtual GPUs are not supported

    This release does not support NVIDIA virtual GPU software and NVIDIA GRID virtual GPU technology.

  • A single GPU might appear multiple times in vCenter Server

    An NVIDIA T4 GPU might appear multiple times in vCenter Server.

    Workaround: In the BIOS settings of the ESXi host, enable SR-IOV support.

vSphere Bitfusion Server Issues
  • The vSphere Bitfusion plug-in identifies activities originating from a cloned virtual machine of a vSphere Bitfusion client as originating from the source virtual machine

    After you clone a vSphere Bitfusion client virtual machine, the vSphere Bitfusion plug-in attributes activities from both the source and the cloned virtual machines to the source virtual machine only.

    Workaround: On the cloned vSphere Bitfusion client virtual machine, in /etc/hostname, change the hostname entry.

  • Changing the time on a vSphere Bitfusion server might cause cluster failure

    If the server time changes or is not synchronized after a cluster is created, the cluster might fail.

    Workaround: All vSphere Bitfusion servers in a cluster must be synchronized to the same time. Synchronize the time of all servers in the cluster and restart them.

  • Servers with different times might cause cluster failure

    When using DHCP to set up the IP addresses of a vSphere Bitfusion server and the DHCP server does not provide NTP server information, or when manually entering the IP addresses of the vSphere Bitfusion server, the cluster might fail due to the time difference between the servers. All servers must be synchronized with the same time.

    Workaround: In the configuration of the server, add the IP address of an NTP server.

  • Cannot join a vSphere Bitfusion server that is deployed by cloning a virtual machine to a cluster 

    After cloning the virtual machine of a vSphere Bitfusion server and deleting another cloned server virtual machine, you might be unable to join the newly cloned virtual machine to the cluster.

  • Cannot add new vSphere Bitfusion servers to a cluster if a vSphere Bitfusion server is offline

    If one vSphere Bitfusion server virtual machine in a cluster is offline, you cannot add another server to the cluster.

    Workaround: Perform one of the following tasks.

    • By using the vSphere Bitfusion plug-in, remove the server from the cluster.
    • By using the vSphere Client, set the guest OS environment variable guestinfo.bitfusion.server.cassandra-removenode on the server virtual machine.
    • In the terminal of a running vSphere Bitfusion server, run the bitfusion removenode command.
  • After cloning a vSphere Bitfusion server virtual machine or installing a subsequent vSphere Bitfusion server, the new virtual machine cannot start because of missing or incomplete required fields

    During the clone operation of the server virtual machine in vCenter Server, none of the required fields are marked as required in the wizard. During the installation operation of a subsequent vSphere Bitfusion server in vCenter Server, the fields might be specified incorrectly. As a result, the virtual machine might not be able to start.

    Workaround: Perform one of the following tasks.

    • During the clone or installation operation, verify that all fields are specified correctly.
    • After the clone or installation operation has finished, by using the vCenter vApp Options editor, change the values of the fields. For a list of all properties, see vSphere Bitfusion vApp Properties.
      1. Navigate to the virtual machine of the vSphere Bitfusion server.
      2. On the Configure tab, expand Settings and select vApp Options.
      3. Select a property from the list and click the Set Value button.
  • Cannot start a cloned virtual machine after deleting the source virtual machine

    After a clone operation of a vSphere Bitfusion virtual machine, if the source virtual machine is deleted before the cloned virtual machine is powered on, the cloned virtual machine cannot start.

    Workaround: Power on the cloned virtual machine. Then, delete the source virtual machine.
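
The two time-related issues in this section both come down to clock differences between servers in a cluster. As a rough pre-check, you could collect the epoch time of each server (for example, the output of date +%s gathered over SSH) and compare the spread. The sketch below assumes the timestamps are already collected. The function name clock_spread and the 5-second threshold are hypothetical choices, not values documented for vSphere Bitfusion.

```shell
# Sketch: print the spread (max - min), in seconds, of the epoch
# timestamps passed as arguments. Illustrative helper only.
clock_spread() {
    min=$1
    max=$1
    for t in "$@"; do
        if [ "$t" -lt "$min" ]; then min=$t; fi
        if [ "$t" -gt "$max" ]; then max=$t; fi
    done
    echo $((max - min))
}

# Example with three made-up timestamps; the threshold is an assumption
spread=$(clock_spread 1620720000 1620720002 1620719999)
if [ "$spread" -gt 5 ]; then
    echo "Clocks differ by ${spread}s; synchronize the servers first"
fi
```

Timestamps gathered over SSH include connection latency, so treat a small spread as noise rather than as proof of drift.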

Networking Issues
  • Installation procedure of a subsequent vSphere Bitfusion server might fail 

    When deploying a subsequent vSphere Bitfusion server, the primary vSphere Bitfusion server connects to the subsequent ESXi host by using HTTPS APIs. If the management network interface of your vSphere Bitfusion server and the vmx0 interface of your ESXi host are using an MTU size of 9000 bytes, but your network does not support this MTU size between the two interfaces, the HTTPS connection might be aborted and the installation procedure might fail.

    Workaround:

    1. Determine the maximum MTU size that is supported between the two interfaces.
      1. To connect to the terminal of the vSphere Bitfusion server, run ssh customer@$server_ip.
      2. Run the following shell script.
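        • target_host="<ESXi host IP or DNS name>"  # replace with your value
          size=1272

          # Grow the ping payload in 4-byte steps until a non-fragmenting ping fails
          while ping -s $size -M do -c1 $target_host >&/dev/null; do
              ((size+=4));
          done

          # Add 28 bytes of IP and ICMP headers to the last payload that succeeded
          echo "Max MTU size: $((size-4+28))"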
    2. Change the MTU size value for the vmx0 interface of your ESXi host to the maximum MTU size supported between the two interfaces.
  • Cannot configure the network adapters of a cloned vSphere Bitfusion server

    During a clone operation of a vSphere Bitfusion server virtual machine in vCenter Server, the configuration of the additional network adapters cannot be changed.

    Workaround: Perform one of the following tasks.

    • When creating the original virtual machine, enable the network interfaces that are needed for the cloned virtual machine.
    • By using the vCenter vApp Options editor, change the values of the network settings. For a list of all properties, see vSphere Bitfusion vApp Properties.
      1. Navigate to the virtual machine of the vSphere Bitfusion server.
      2. On the Configure tab, expand Settings and select vApp Options.
      3. Select a property from the list and click the Set Value button.
  • Cannot attach more than one network interface to a network

    You can connect only one network interface to a particular network. 

    Workaround: To connect a vSphere Bitfusion server to multiple networks, use multiple network interfaces.

  • No support for Internet Protocol version 6

    IPv6 is not supported in this release.
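
The +28 in the MTU script under "Installation procedure of a subsequent vSphere Bitfusion server might fail" covers the 20-byte IPv4 header (without options) and the 8-byte ICMP echo header that ping adds to the payload size given with -s. The helper functions below are an illustrative sketch of that arithmetic; their names are hypothetical.

```shell
# IPv4 header (20 bytes, no options) + ICMP echo header (8 bytes)
ICMP_OVERHEAD=28

# Convert a ping payload size (the -s value) to the MTU it requires
payload_to_mtu() { echo $(( $1 + ICMP_OVERHEAD )); }

# Convert an MTU to the largest ping payload that fits without fragmentation
mtu_to_payload() { echo $(( $1 - ICMP_OVERHEAD )); }

mtu_to_payload 9000   # payload for a jumbo-frame MTU
mtu_to_payload 1500   # payload for a standard Ethernet MTU
```

For example, a single non-fragmenting ping with the payload size printed for an MTU of 9000 indicates whether jumbo frames pass between the two interfaces, without stepping through every size.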

Other Issues
  • Restore operation of a vSphere Bitfusion cluster fails

    After restoring a vSphere Bitfusion cluster from backup, you might experience data loss and the global settings in the vSphere Bitfusion plug-in cannot be changed.

    Workaround: Restart all vSphere Bitfusion servers sequentially and wait 60 seconds after restarting each server.

  • Selecting an OVA file from a local machine might fail without a fast upload network

    When using the vSphere Bitfusion plug-in to install subsequent servers, selecting an OVA file from a local machine might fail without a fast upload network. Most browsers have a 5-minute timeout limit, and the vSphere Bitfusion OVA file size is around 740 MB.

    Workaround: Select an OVA file from a URL.

  • Cannot download vSphere Bitfusion monitoring data for a specified time period

    When you click the Download CSV button on a tab in the vSphere Bitfusion plug-in, the specified time period is ignored and the downloaded file contains 2 days of data.
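
For the OVA upload issue in this section, the figures given (a file of around 740 MB and a browser timeout of 5 minutes) imply a minimum sustained upload rate. The sketch below works through that arithmetic. It assumes decimal units (1 MB = 8 megabits) and ignores protocol overhead, and the function name min_upload_mbps is hypothetical.

```shell
# Sketch: minimum sustained upload rate, in megabits per second, needed to
# transfer a file of a given size (MB) within a given timeout (seconds).
# Rounds up to a whole number. Illustrative helper only.
min_upload_mbps() {
    size_mb=$1
    timeout_s=$2
    echo $(( (size_mb * 8 + timeout_s - 1) / timeout_s ))
}

min_upload_mbps 740 300   # about 740 MB within a 5-minute (300-second) timeout
```

With these inputs the result is roughly 20 megabits per second. If the upload link cannot sustain that rate, selecting the OVA file from a URL is the safer choice.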
