VMware vSphere Bitfusion 4.0 | 17 AUG 2021 | Build 13

VMware vSphere Bitfusion 4.0.1 | 16 SEP 2021 | Build 5

Check for additions and updates to these release notes.

What's in the Release Notes

The release notes cover the following topics.

  • About vSphere Bitfusion
  • What is New in vSphere Bitfusion 4.0
  • System Requirements
  • Compatibility and Interoperability
  • Open Source Components
  • Resolved Issues
  • Known Issues

About vSphere Bitfusion

VMware vSphere Bitfusion shares accelerators such as graphics processing units (GPUs) to provide a pool of shared network-accessible resources capable of supporting resource-intensive artificial intelligence (AI) and machine learning (ML) workloads. vSphere Bitfusion operates across AI frameworks, cloud sites, networks, and in environments such as virtual machines, containers, and notebooks.

What is New in vSphere Bitfusion 4.0

  • Added improved scheduling to run workloads on specific sets of GPUs or vSphere Bitfusion servers.
  • Added data retention policy to specify a time period to store historical vSphere Bitfusion data.
  • Added monitoring plug-ins to determine the status of vSphere Bitfusion servers.
  • Client authentication tokens can be published from vSphere Bitfusion to Kubernetes Secrets.
  • vCenter Server Dark Theme is now supported by vSphere Bitfusion.
  • Support for Ubuntu 16.04 clients is removed as of vSphere Bitfusion 4.0.0.
  • Support for vSphere Bitfusion 2.x.x clients is deprecated and will be removed in an upcoming vSphere Bitfusion release.

System Requirements

For a list of system requirements for vSphere Bitfusion clients and servers, see the vSphere Bitfusion Installation Guide.

Compatibility and Interoperability

For a list of versions, models, and products that are compatible with vSphere Bitfusion, see the VMware vSphere Bitfusion Compatibility and Interoperability page.

Open Source Components

The copyright statements and licenses applicable to the open source software components distributed in vSphere Bitfusion 3.5 are available at http://www.vmware.com. You can download the source files for any GPL, LGPL, or other similar licenses that require the source code or modifications to source code to be made available for the most recent available release of vSphere Bitfusion.

Resolved Issues 4.0.1

  • vSphere Bitfusion server might report GPU API mismatch and GPU Xid errors

    When running the bitfusion localhealth command on a vSphere Bitfusion server as bitfusion user, the server might report GPU API mismatch and GPU Xid errors. This issue is fixed in this release.

  • vSphere Bitfusion client cannot connect to the vSphere Bitfusion servers

    When a vSphere Bitfusion client is enabled by using a client authentication token and the client uses a different subnetwork than the server, the client cannot connect to the vSphere Bitfusion servers. This issue is fixed in this release.

  • When you set the global health check settings of a vSphere Bitfusion server to default settings, the operation might fail

    This issue is fixed in this release.

  • When you activate a deactivated health check of a vSphere Bitfusion server, the operation might fail

    This issue is fixed in this release.

Resolved Issues 3.5

  • Cannot specify GPU memory when deploying a subsequent vSphere Bitfusion server

    This issue is fixed in this release. When using the vSphere Bitfusion plug-in to install a subsequent server, specifying the total GPU memory on the Select GPUs page recommends the memory and MMIO size of the virtual machine of the vSphere Bitfusion server.

Resolved Issues 3.0.1

  • When using the vSphere Bitfusion plug-in to install a subsequent server, the primary network can be a standard network only

    This issue is fixed in this release. The primary network can be a distributed port group with VMXNET3 or PVRDMA adapters.

  • CUDA 11.1 sample testing might cause failure in cuModuleGetGlobal_v2 module

    This issue is fixed in this release.

Known Issues: GPU Issues

  • Virtual GPUs are not supported

    This release does not support NVIDIA virtual GPU software and NVIDIA GRID virtual GPU technology.

  • A single GPU might appear multiple times in vCenter Server

    An NVIDIA T4 GPU might appear multiple times in vCenter Server.

    Workaround: In the BIOS settings of the ESXi host, enable SR-IOV support.

Known Issues: vSphere Bitfusion Server Issues

  • The vSphere Bitfusion plug-in attributes the activities of a cloned client virtual machine to the source virtual machine

    After a clone operation of a vSphere Bitfusion client virtual machine, the vSphere Bitfusion plug-in identifies activities that are originating from both the source and cloned virtual machines as if only originating from the source virtual machine.

    Workaround: On the cloned vSphere Bitfusion client virtual machine, in /etc/hostname, change the hostname entry.

  • Changing the time on a vSphere Bitfusion server might cause cluster failure

    If the server time changes or is not synchronized after a cluster is created, the cluster might fail.

    Workaround: All vSphere Bitfusion servers in a cluster must be synchronized with the same time. Synchronize the time of all servers in the cluster and restart them.

  • Servers with different times might cause cluster failure

    When using DHCP to set up the IP addresses of a vSphere Bitfusion server and the DHCP server does not provide NTP server information, or when manually entering the IP addresses of the vSphere Bitfusion server, the cluster might fail due to the time difference between the servers. All servers must be synchronized with the same time.

    Workaround: In the configuration of the server, add the IP address of an NTP server.
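
One way to spot clock drift between servers is to compare the epoch time that each server reports. The following is a minimal sketch, not part of vSphere Bitfusion: the helper name max_skew and the sample timestamps are hypothetical, and it assumes you have already collected one timestamp per server, for example by running date +%s over SSH on each server.

```shell
#!/usr/bin/env bash
# max_skew: print the largest difference, in seconds, among the given
# epoch timestamps (hypothetical helper, not a vSphere Bitfusion command).
max_skew() {
  local min=$1 max=$1 t
  for t in "$@"; do
    (( t < min )) && min=$t
    (( t > max )) && max=$t
  done
  echo $(( max - min ))
}

# Example: timestamps gathered at roughly the same moment from three servers.
max_skew 1631779200 1631779203 1631779199   # prints 4
```

Timestamps collected one server at a time include the delay between the SSH calls, so treat a difference of a second or two as noise and only a larger difference as a sign that the servers are not synchronized.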

  • Cannot join a vSphere Bitfusion server that is deployed by cloning a virtual machine to a cluster

    After cloning the virtual machine of a vSphere Bitfusion server and deleting another cloned server virtual machine, you might be unable to join the newly cloned virtual machine to the cluster.

  • Cannot add new vSphere Bitfusion servers to a cluster if a vSphere Bitfusion server is offline

    If one vSphere Bitfusion server virtual machine in a cluster is offline, you cannot add another server to the cluster.

    Workaround: Perform one of the following tasks.

    • By using the vSphere Bitfusion plug-in, remove the server from the cluster.
    • By using the vSphere Client, set the guest OS environment variable guestinfo.bitfusion.server.cassandra-removenode on the server virtual machine.
    • In the terminal of a running vSphere Bitfusion server, run the bitfusion removenode command.
  • After cloning a vSphere Bitfusion server virtual machine or installing a subsequent vSphere Bitfusion server, the new virtual machine cannot start because of missing or incomplete required fields

    During the clone operation of the server virtual machine in vCenter Server, none of the required fields are marked as required in the wizard. During the installation operation of a subsequent vSphere Bitfusion server in vCenter Server, the fields might be specified incorrectly. As a result, the virtual machine might not be able to start.

    Workaround: Perform one of the following tasks.

    • During the clone or installation operation, verify that all fields are specified correctly.
    • After the clone or installation operation has finished, by using the vCenter vApp Options editor, change the values of the fields. For a list of all properties, see vSphere Bitfusion vApp Properties.
      1. Navigate to the virtual machine of the vSphere Bitfusion server.
      2. On the Configure tab, expand Settings and select vApp Options.
      3. Select a property from the list and click the Set Value button.
  • Cannot start a cloned virtual machine after deleting the source virtual machine

    After a clone operation of a vSphere Bitfusion virtual machine, if the source virtual machine is deleted before the cloned virtual machine is powered on, the cloned virtual machine cannot start.

    Workaround: Power on the cloned virtual machine. Then, delete the source virtual machine.

Known Issues: Networking Issues

  • Installation procedure of a subsequent vSphere Bitfusion server might fail

    When deploying a subsequent vSphere Bitfusion server, the primary vSphere Bitfusion server connects to the subsequent ESXi host by using HTTPS APIs. If the management network interface of your vSphere Bitfusion server and the vmx0 interface of your ESXi host are using an MTU size of 9000 bytes, but your network does not support this MTU size between the two interfaces, the HTTPS connection might be aborted and the installation procedure might fail.

    Workaround:

    1. Determine the maximum MTU size that is supported between the two interfaces.
      1. To connect to the terminal of the vSphere Bitfusion server, run ssh customer@$server_ip.
      2. Run the following shell script.
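          target_host="<ESXi host IP or DNS name>"
          # Start with a 1272-byte payload and grow it in 4-byte steps
          # until an unfragmented ping (-M do) no longer gets through.
          size=1272
          while ping -s "$size" -M do -c1 "$target_host" >/dev/null 2>&1; do
            ((size+=4))
          done
          # Last working payload plus 28 bytes of IP and ICMP headers.
          echo "Max MTU size: $((size-4+28))"
    2. Change the MTU size value for the vmx0 interface of your ESXi host to the maximum MTU size supported between the two interfaces.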
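
The script in the workaround grows the ping payload in 4-byte steps starting from 1272 bytes, which can take many pings on a network that supports large frames. The following is a sketch of an alternative that uses a binary search instead. The function names probe and max_payload are hypothetical, the ping flags are the same ones the workaround script uses, and the search assumes that the lower bound succeeds and that every payload below the largest working size also works.

```shell
#!/usr/bin/env bash
# probe SIZE: succeed if an unfragmented ICMP payload of SIZE bytes reaches
# the host named in $target_host (same ping flags as the workaround script).
probe() {
  ping -s "$1" -M do -c1 "$target_host" >/dev/null 2>&1
}

# max_payload LO HI: binary search for the largest payload in [LO, HI] that
# probe accepts. Assumes LO succeeds and that success is monotonic in size.
max_payload() {
  local lo=$1 hi=$2 mid
  while (( lo < hi )); do
    mid=$(( (lo + hi + 1) / 2 ))
    if probe "$mid"; then
      lo=$mid
    else
      hi=$(( mid - 1 ))
    fi
  done
  echo "$lo"
}

# Usage (placeholder address; 8972 = 9000 minus 28 bytes of IP and ICMP headers):
#   target_host=192.0.2.10
#   echo "Max MTU size: $(( $(max_payload 1272 8972) + 28 ))"
```

As in the original script, the reported MTU is the largest working payload plus 28 bytes: 20 bytes for the IPv4 header and 8 bytes for the ICMP header.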
  • Cannot configure the network adapters of a cloned vSphere Bitfusion server

    During a clone operation of a vSphere Bitfusion server virtual machine in vCenter Server, the configuration of the additional network adapters cannot be changed.

    Workaround: Perform one of the following tasks.

    • When creating the original virtual machine, enable the network interfaces that are needed for the cloned virtual machine.
    • By using the vCenter vApp options editor, change the values of the network settings. For a list of all properties, see vSphere Bitfusion vApp Properties.
      1. Navigate to the virtual machine of the vSphere Bitfusion server.
      2. On the Configure tab, expand Settings and select vApp Options.
      3. Select a property from the list and click the Set Value button.
  • Cannot attach more than one network interface to a network

    You can connect only one network interface to a particular network.

    Workaround: To connect a vSphere Bitfusion server to multiple networks, use multiple network interfaces.

  • No support for Internet Protocol version 6

    IPv6 is not supported in this release.

Known Issues: Other Issues

  • During deployment of the vSphere Bitfusion appliance, vCenter Server might report an invalid vSphere Bitfusion certificate warning

    When using the vSphere Bitfusion 4.0 appliance to install a primary vSphere Bitfusion server in vCenter Server 7.0.2, the Review Details page of the Deploy OVF Template dialog box might display a warning: Invalid certificate. The warning is spurious; the vSphere Bitfusion certificate is valid.

    Workaround: Ignore the warning and click Next to verify the OVF template details. This issue is fixed in an upcoming vCenter Server release.

  • The Match Global Defaults button on the Health logs dialog box might be deactivated

    After modifying the global health check settings for all vSphere Bitfusion servers on the Settings > Global Health Check Defaults tab and checking the health status of a vSphere Bitfusion server, the Match Global Defaults button on the Health logs dialog box might be deactivated. This is a JavaScript error.

    Workaround: Activate or deactivate a health check by clicking the toggle button and click Save.

  • Restore operation of a vSphere Bitfusion 3.0 cluster fails

    After restoring a multi-node vSphere Bitfusion cluster from backup, you might experience data loss and the global settings in the vSphere Bitfusion plug-in cannot be changed.

    Workaround: Restart all vSphere Bitfusion servers sequentially and wait 60 seconds after restarting each server.

  • Restore operation of a vSphere Bitfusion 4.0 cluster fails

    After restoring a multi-server vSphere Bitfusion 4.0 cluster from backup, you might experience an Apache Cassandra error.

    Workaround: By using the host IDs of your current primary and subsequent vSphere Bitfusion servers, deploy new server virtual machines with the vSphere Bitfusion 4.0 appliance, and restore the backup. For a detailed list of all required steps, see the server upgrade procedure in the vSphere Bitfusion Installation Guide.

  • Selecting an OVA file from a local machine might fail without a fast upload network

    When using the vSphere Bitfusion plug-in to install subsequent servers, selecting an OVA file from a local machine might fail without a fast upload network. Most browsers have a timeout limit of about 5 minutes, and the vSphere Bitfusion OVA file size is around 740 MB.

    Workaround: Select an OVA file from a URL.
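
To judge whether a local upload is likely to finish in time, you can estimate the sustained upload rate that the figures above imply. The following is a rough sketch, assuming that the whole file must upload within the timeout and that 1 MB is 1,000,000 bytes. The helper name min_upload_mbps is hypothetical.

```shell
#!/usr/bin/env bash
# min_upload_mbps SIZE_MB TIMEOUT_S: print the sustained upload rate, in
# megabits per second, needed to move SIZE_MB megabytes in TIMEOUT_S seconds.
min_upload_mbps() {
  awk -v size="$1" -v secs="$2" 'BEGIN { printf "%.1f\n", size * 8 / secs }'
}

# A 740 MB file within a 5-minute (300-second) timeout:
min_upload_mbps 740 300   # prints 19.7
```

If the upload link cannot sustain roughly 20 Mbit/s, selecting the OVA file from a URL is the safer option.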

  • Cannot download vSphere Bitfusion monitoring data for a specified time period

    When you click the Download CSV button on a tab in the vSphere Bitfusion plug-in, the specified time period is ignored and the downloaded file contains 2 days of data.

  • vSphere Bitfusion 3.5 and earlier clients that are installed on CentOS 7 and 8 might experience a library error

    vSphere Bitfusion 3.5 and earlier clients for CentOS 7 and 8 have a dependency on the libcapstone.so.3 library which is installed from the EPEL capstone RPM package. The capstone package currently contains the libcapstone.so.4 library only. After the vSphere Bitfusion client is installed, the client downloads and installs the latest package from EPEL, which contains the latest library and might cause an error message: error while loading shared libraries: libcapstone.so.3: cannot open shared object.

    Workaround: Perform one of the following tasks.

    • Update the vSphere Bitfusion servers and clients to version 4.0.0 or later.
    • Install an older version of the capstone package that contains the libcapstone.so.3 library.
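
Before choosing a workaround, you can check whether the dynamic linker on the client still finds libcapstone.so.3. The following is a minimal sketch that filters the output of ldconfig -p, which lists the shared libraries in the linker cache. The helper name has_capstone3 is hypothetical.

```shell
#!/usr/bin/env bash
# has_capstone3: read `ldconfig -p` style output on stdin and succeed only
# if libcapstone.so.3 is listed (hypothetical helper for illustration).
has_capstone3() {
  grep -q 'libcapstone\.so\.3'
}

# Usage on a CentOS client:
#   if ldconfig -p | has_capstone3; then
#     echo "libcapstone.so.3 is available"
#   else
#     echo "libcapstone.so.3 is missing"
#   fi
```

If the library is missing, apply one of the workarounds listed for this issue.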
  • Using a vSphere Bitfusion 2.5 and later license might cause an error in vCenter Server version 7.0.0 and earlier

    For vCenter Server version 7.0.0 or earlier, vSphere Bitfusion uses a string to determine the validity of the vSphere Bitfusion license. A result mismatch of the string might cause a licensing issue.

    Workaround: Upgrade vCenter Server to version 7.0.2 or later.
