VMware vSphere Bitfusion 4.5 | 23 NOV 2021 | Build 4

VMware vSphere Bitfusion 4.5.1 | 27 JAN 2022 | Build 9

VMware vSphere Bitfusion 4.5.2 | 23 JUN 2022 | Build 16

VMware vSphere Bitfusion 4.5.3 | 21 FEB 2023 | Build 4

VMware vSphere Bitfusion 4.5.4 | 09 MAY 2023 | Build 6

What is in the Release Notes

The release notes cover the following topics.

About vSphere Bitfusion

VMware vSphere Bitfusion shares accelerators such as graphics processing units (GPUs) to provide a pool of shared network-accessible resources capable of supporting resource-intensive artificial intelligence (AI) and machine learning (ML) workloads. vSphere Bitfusion operates across AI frameworks, cloud sites, networks, and in environments such as virtual machines, containers, and notebooks.

What is New in 4.5.4

  • Added a daily nodetool repair run that finds and fixes data inconsistencies in the Apache Cassandra database.

What is New in 4.5.3

  • vSphere Bitfusion is supported on VMware vSphere 8.0.

  • The expiration date of vSphere Bitfusion clients can be extended by using the vSphere Bitfusion user interface.

  • Added support for Red Hat Enterprise Linux 9.0 and later minor versions.

  • Added support for Rocky Linux 8.

  • Added support for Rocky Linux 9.

  • Added support for NVIDIA Driver 525.85.12.

  • Added support for NVIDIA CUDA 11.5 and 11.5.2.

  • Added framework support for PyTorch 1.9 and 1.10.

  • Added hardware support for NVIDIA A40 48GB PCIe.

  • Added hardware support for NVIDIA L40 48GB PCIe.

  • Added hardware support for NVIDIA A30 24GB PCIe.

  • Added hardware support for NVIDIA A10 24GB PCIe.

  • Added hardware support for NVIDIA A2 16GB PCIe.

What is New in 4.5.2

  • vSphere Bitfusion clients can display their current GPU allocation and utilization in the vSphere Bitfusion command-line interface.

  • vSphere Bitfusion clients can be labeled when you run the run or request commands, which makes the clients easier to identify in the vSphere Bitfusion user interface.

  • vSphere Bitfusion server certificates can be renewed by using the vSphere Bitfusion command-line interface.

  • vSphere Bitfusion client certificates can be renewed by using the vSphere Bitfusion user interface.

  • Added support for Ubuntu Linux 22.04.

  • Added support for SUSE Linux Enterprise Server 15.3.

  • Added support for Red Hat Enterprise Linux 7.9 and later minor versions.

  • Added support for Red Hat Enterprise Linux 8.5 and later minor versions.

  • Added support for NVIDIA Driver 470.129.06.

  • Added support for NVIDIA CUDA 11.3 and 11.4.4.

  • Added support for NVIDIA cuDNN 8.2.4.

  • Added support for PyTorch 1.2 through 1.8.

  • Added support for TensorFlow 1.15, 2.2, 2.3, 2.4, and 2.6.

  • Added support for TensorRT 7.1.3, 7.2.3, and 8.0.3.

  • Added support for PaddlePaddle 2.0.0, 2.2.2, and 2.3.0.

  • Removed support for CentOS 8.

What is New in 4.5

  • vSphere Bitfusion displays the memory utilization and core utilization of a vSphere Bitfusion cluster.

  • Support for vSphere Bitfusion 2.x.x clients is removed in vSphere Bitfusion 4.5.0.

System Requirements

For a list of system requirements for vSphere Bitfusion clients and servers, see the vSphere Bitfusion Installation Guide.

Compatibility and Interoperability

For a list of versions, models, and products that are compatible with vSphere Bitfusion, see the VMware Interoperability Matrix.

Lifecycle

For a list of supported vSphere Bitfusion versions and their lifecycle, see the VMware Product Lifecycle Matrix.

Open Source Components

The copyright statements and licenses applicable to the open source software components distributed in vSphere Bitfusion 4.5 are available at http://www.vmware.com. You can download the source files for any GPL, LGPL, or other similar licenses that require the source code or modifications to source code to be made available for the most recent available release of vSphere Bitfusion.

Resolved Issues

Resolved Issues in 4.5.4

  • When installing vSphere Bitfusion 4.5.3 and earlier on vCenter Server instances that do not have a vsphere.local domain defined, the installation operation might fail

    During the deployment of the primary vSphere Bitfusion server, a service account is created in the vsphere.local domain of the vCenter Server instance. If that domain does not exist, the installation operation might fail. This issue is fixed in this release. vSphere Bitfusion 4.5.4 creates the service account in the same domain as the specified vCenter Server user name.

  • When the virtual machine of a vSphere Bitfusion server is running out of disk space, the diskspace health check might not report a Marginal or Fatal status

    This issue is fixed in this release. vSphere Bitfusion reports a Marginal status when less than 15 GB of disk space is available and a Fatal status when less than 10 GB of disk space is available.

  • After performing an MTU check on a vSphere Bitfusion server, you might receive a warning message for the management network interface

    After running a health check, you might receive the following warning message: Check MTU Size: 10000Mbps interface net1 has low MTU: 1500 < 4096. While vSphere Bitfusion operates more efficiently with an MTU size of 9000 bytes for network interfaces that are configured for data traffic, the network interface which is used for management traffic (net1) requires an MTU size of 1500 bytes. This issue is fixed in this release.

  • After running a health check on a vSphere Bitfusion server, you might receive a warning message about network stability

    When vSphere Bitfusion detects that a network interface has dropped packets, you might receive a similar warning message: Check Network Errors/Drops: drops reported in file: /sys/class/net/net1/statistics/rx_dropped. This issue is fixed in this release.
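On servers that still run 4.5.3 or earlier, you can approximate the corrected disk space check by hand. The following is a minimal sketch, not the built-in health check: the function names are hypothetical, it measures the root filesystem by default (which volume the health check inspects is an assumption), it treats the 15 GB and 10 GB thresholds as GiB, and it needs GNU coreutils for df --output.

```shell
#!/usr/bin/env bash
# Hypothetical helpers that mirror the 4.5.4 disk space thresholds:
# Fatal below 10 GB free, Marginal below 15 GB free. "Healthy" is a
# placeholder label, not necessarily the status name the product displays.

# disk_health_status FREE_GB: map whole gigabytes of free space to a status.
disk_health_status() {
  local free_gb="$1"
  if (( free_gb < 10 )); then
    echo "Fatal"
  elif (( free_gb < 15 )); then
    echo "Marginal"
  else
    echo "Healthy"
  fi
}

# free_gib_on [PATH]: free space on the filesystem that holds PATH (default /),
# in whole GiB rounded down. Assumes GNU df, which supports --output.
free_gib_on() {
  local kib
  kib="$(df -k --output=avail "${1:-/}" | tail -n 1 | tr -dc '0-9')"
  echo $(( kib / 1024 / 1024 ))
}

# Example: disk_health_status "$(free_gib_on /)"
```

Rounding down keeps the estimate conservative near the thresholds.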

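Both values mentioned in the MTU and dropped-packet fixes can be read from the Linux kernel's per-interface files under /sys/class/net/<interface>/, as the rx_dropped path in the warning message shows. The sketch below reads them by hand. The function names are hypothetical, and its rules are assumptions drawn from these notes: net1 is the only management interface and should use 1500 bytes, and data interfaces should be at least 4096 bytes, the threshold in the example warning.

```shell
#!/usr/bin/env bash
# Hypothetical sketches of the two network checks discussed above.
# Assumptions: net1 is the management interface and must use an MTU of 1500;
# every other interface carries data traffic and should be at least 4096
# bytes (the threshold in the warning message; 9000 is preferred).

# check_mtu IFACE MTU: print a result line; return 1 when the MTU is wrong.
check_mtu() {
  local iface="$1" mtu="$2"
  if [ "$iface" = "net1" ]; then
    if (( mtu != 1500 )); then
      echo "WARN: management interface $iface has MTU $mtu, expected 1500"
      return 1
    fi
  elif (( mtu < 4096 )); then
    echo "WARN: data interface $iface has low MTU: $mtu < 4096"
    return 1
  fi
  echo "OK: $iface MTU $mtu"
}

# report_drops [NET_ROOT]: print each interface whose rx_dropped counter is
# non-zero. NET_ROOT defaults to /sys/class/net; a directory with the same
# layout can be passed instead, for example in tests.
report_drops() {
  local root="${1:-/sys/class/net}" counter iface drops
  for counter in "$root"/*/statistics/rx_dropped; do
    [ -r "$counter" ] || continue   # glob matched nothing, or unreadable
    iface="${counter#"$root"/}"     # drop the root prefix ...
    iface="${iface%%/*}"            # ... and keep the interface name
    drops="$(< "$counter")"
    if (( drops > 0 )); then
      echo "$iface rx_dropped=$drops"
    fi
  done
}

# Examples on a server:
#   check_mtu net2 "$(cat /sys/class/net/net2/mtu)"
#   report_drops; sleep 60; report_drops
```

Because rx_dropped is cumulative, a single non-zero reading does not show that drops are still occurring; compare two readings taken some time apart.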
Resolved Issues in 4.5.3

  • After the login credentials of a vCenter Server user change, the vSphere Bitfusion plug-in might not start

    During the deployment of vSphere Bitfusion, the login credentials of a vCenter Server user are required for verification of product licensing. When the credentials are later modified, the vSphere Bitfusion plug-in might not start. This issue is fixed in this release. vSphere Bitfusion uses the login credentials to generate a non-expiring service account, which is later used to authenticate the vSphere Bitfusion plug-in.

Resolved Issues in 4.5.2

  • A vCenter Server user with multiple user roles cannot access the user interface of vSphere Bitfusion

    vSphere Bitfusion verifies only the first user role that is assigned to a vCenter Server user. When a user with multiple roles assigned in vCenter Server logs in, the login attempt might fail with a 401 token error and the vSphere Bitfusion user interface might not be accessible, even though one of the user's roles has the privilege.Bitfusion.Management.label privilege. This issue is fixed in this release.

  • The network performance command of vSphere Bitfusion displays the result in GB/s

    When running the bitfusion net_perf command, the network performance is displayed in gigabytes per second (GB/s). This issue is fixed in this release and the network speeds are displayed in gigabits per second (Gb/s), which is the standard measurement unit of network performance.

  • Cannot attach more than one network interface to the same network

    Configuring a vSphere Bitfusion server with multiple network interfaces, where more than one interface is attached to the same network, may result in broken network routing tables and network interfaces with no network routes. This issue is fixed in this release.

  • The vSphere Bitfusion plug-in might display a blank page in the user interface

    When using the vSphere Bitfusion interface, you might observe a blank iframe, which is the result of a JavaScript error. This issue is fixed in this release.

  • After creating or renewing a client authentication token, the creation of a new token is not available

    After you create or renew an authentication token, you might not be able to create a new token because of a front-end issue. This issue is fixed in this release.

  • When running the vSphere Bitfusion plug-in for the first time, you might experience a session error

    After the primary vSphere Bitfusion server is deployed and you start the vSphere Bitfusion plug-in, a session error might prevent your browser from displaying the vSphere Bitfusion user interface. The issue might occur intermittently. This issue is fixed in this release.

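Because the bitfusion net_perf command reported GB/s before 4.5.2 and Gb/s afterward, the same throughput appears 8 times larger in the newer output. A small helper (the function name is hypothetical) puts older results on the same scale:

```shell
#!/usr/bin/env bash
# gbytes_to_gbits RATE: convert a rate in gigabytes per second (GB/s) to
# gigabits per second (Gb/s). One byte is eight bits, so multiply by 8.
gbytes_to_gbits() {
  awk -v rate="$1" 'BEGIN { printf "%.2f\n", rate * 8 }'
}

# Example: an older net_perf result of 1.25 GB/s corresponds to 10.00 Gb/s.
#   gbytes_to_gbits 1.25
```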
Resolved Issues in 4.5.1

  • If, during the installation of a vSphere Bitfusion server, the first network is configured to use a single distributed virtual port group (DVPG) network interface, the network is not created on the virtual machine

    This issue is fixed in this release.

  • Specifying GPU quota in the settings of vSphere Bitfusion is not working

    Specifying the GPU quota in the Global Client Default settings has no effect when you request GPUs from a vSphere Bitfusion server. This issue is fixed in this release.

  • Running vSphere Bitfusion client commands as a non-root user might result in an error message

    When running vSphere Bitfusion client commands as a non-root user, you might receive the following error message: Error: open /etc/bitfusion/tls/ca.crt: permission denied. Detail: Error: Missing credentials file. Please configure this Bitfusion client and then try again. The error message appears because the ca.crt certificate file does not belong to the vSphere Bitfusion Linux user group, bitfusion. This issue is fixed in this release.

  • When creating or editing a client authentication token, the operation might fail

    When you create or edit a token to activate a vSphere Bitfusion client on a Kubernetes pod, the operation might fail due to an API error. The API returns an error because the Kubernetes Secret of the namespace is already linked to vSphere Bitfusion, but the data is not saved in the vSphere Bitfusion database. This issue is fixed in this release.

  • Cannot download vSphere Bitfusion monitoring data for a specified time period

    When you click the Download CSV button on a tab in the vSphere Bitfusion plug-in, the specified time period is ignored and the downloaded file contains 2 days of data. This issue is solved in this release.

Resolved Issues in 4.5

  • vSphere Bitfusion might stop working after an upgrade of vCenter Server

    After an upgrade of vCenter Server to version 7.0.2, vSphere Bitfusion might display an “Invalid Bitfusion License” error message and stop working. This issue is fixed in this release.

  • vSphere Bitfusion client might disconnect from a vSphere Bitfusion server after running for a long period of time

    When the TCP Keepalive settings are not configured correctly, the connection between a vSphere Bitfusion server and client might be interrupted. This issue is fixed in this release.

  • After cloning a vSphere Bitfusion server virtual machine or installing a subsequent vSphere Bitfusion server, the new virtual machine cannot start because of missing or incomplete required fields

    During the clone operation of the server virtual machine in vCenter Server, none of the required fields are marked as required in the wizard. During the installation operation of a subsequent vSphere Bitfusion server in vCenter Server, the fields might be specified incorrectly. As a result, the virtual machine might not be able to start. This issue is fixed in vCenter Server 7.0.3.

  • Cannot join a vSphere Bitfusion server that is deployed from a cloned virtual machine to a cluster

    After cloning the virtual machine of a vSphere Bitfusion server and deleting another cloned server virtual machine, you might be unable to join the newly cloned virtual machine to the cluster. This issue is fixed in this release.

  • Cannot start a cloned virtual machine after deleting the source virtual machine

    After a clone operation of a vSphere Bitfusion virtual machine, if the source virtual machine is deleted before the cloned virtual machine is powered on, the cloned virtual machine cannot start. This issue is fixed in this release.

Known Issues

GPU Issues

  • Virtual GPUs are not supported

    This release does not support NVIDIA virtual GPU software and NVIDIA GRID virtual GPU technology.

vSphere Bitfusion Server Issues

  • When auditing a vSphere Bitfusion server, some security tools might display a vulnerability warning about insecure MAC algorithms being used for SSH

    The SSH data integrity and authenticity might be validated by using weak MAC algorithms: hmac-sha1 and three OpenSSH-specific algorithms whose names end in @openssh.com. These algorithms have been removed from the list of MAC algorithms that vSphere Bitfusion servers with version 4.5.3 and later support.

    Workaround: Upgrade your vSphere Bitfusion servers to version 4.5.3 or later.

  • Changing the time on a vSphere Bitfusion server might cause cluster failure

    If the server time changes or is not synchronized after a cluster is created, the cluster might fail.

    Workaround: All vSphere Bitfusion servers in a cluster must be synchronized with the same time. Synchronize the time of all servers in the cluster and restart them.

  • After a clone operation of a vSphere Bitfusion client virtual machine, the vSphere Bitfusion plug-in identifies activities that are originating from both the source and cloned virtual machines as if only originating from the source virtual machine

    Workaround: On the cloned vSphere Bitfusion client virtual machine, change the hostname entry in /etc/hostname.

  • Cannot add new vSphere Bitfusion servers to a cluster if a vSphere Bitfusion server is offline

    If one vSphere Bitfusion server virtual machine in a cluster is offline, you cannot add another server to the cluster.

    Workaround: Perform one of the following tasks.

    • By using the vSphere Bitfusion plug-in, remove the server from the cluster.

    • By using the vSphere Client, set the guest OS environment variable guestinfo.bitfusion.server.cassandra-removenode on the server virtual machine.

    • In the terminal of a running vSphere Bitfusion server, run the bitfusion removenode command.

  • Servers with different times might cause cluster failure

    When using DHCP to set up the IP addresses of a vSphere Bitfusion server and the DHCP server does not provide NTP server information, or when manually entering the IP addresses of the vSphere Bitfusion server, the cluster might fail due to the time difference between the servers. All servers must be synchronized with the same time.

    Workaround: By using the vCenter vApp options editor, add one or more IP addresses of NTP servers.

    1. Navigate to the virtual machine of the vSphere Bitfusion server.

    2. Power off the virtual machine.

    3. On the Configure tab, expand Settings and select vApp Options.

    4. Under Properties, select the guestinfo.bitfusion.host.net1.ntp property from the list and click the Set Value button.

    5. In the Set value dialog box, add one or more IP addresses of NTP servers.

      You can separate multiple addresses with a space character.

    6. Click OK.

    7. Power on the virtual machine.

  • vSphere Bitfusion server logs might contain warning messages about low disk space

    Apache Cassandra logs a warning if the available disk space of a vSphere Bitfusion server is less than 64 GB, for example: Only 42.645GiB free across all data volumes. Consider adding more capacity to your cluster or removing obsolete snapshots. This disk space threshold is hard-coded and cannot be changed in the configuration.

    Workaround: Increase the disk size of the vSphere Bitfusion server to 75 GB or higher.
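For the SSH MAC algorithm warning described above, you can check which MAC algorithms a server still offers by printing the effective SSH server configuration with sshd -T as root; its output includes a macs line with a comma-separated list. The sketch below filters that line. The function name is hypothetical, and treating the hmac-sha1 and umac-64 families as the weak ones is an assumption based on their SHA-1 hash and 64-bit tags, not the exact list that security tools report.

```shell
#!/usr/bin/env bash
# weak_macs: read `sshd -T` output on stdin and print every algorithm in the
# "macs" line that belongs to the hmac-sha1 or umac-64 families.
# Assumption: these families are the weak ones a scanner would flag.
weak_macs() {
  awk '$1 == "macs" {
    n = split($2, algs, ",")
    for (i = 1; i <= n; i++)
      if (algs[i] ~ /^(hmac-sha1|umac-64)/) print algs[i]
  }'
}

# Example (on the server): sudo sshd -T | weak_macs
```

Empty output means none of the assumed weak algorithms is offered.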

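Both time-related server issues above require all servers in a cluster to agree on the time. As a coarse check, you can collect each server's clock over SSH as the customer user and compare the spread. The sketch assumes SSH access to every server; the function name and the addresses in the example comment are placeholders. The result includes SSH round-trip delay, so it cannot detect small offsets, and the NTP configuration described above remains the actual fix.

```shell
#!/usr/bin/env bash
# max_clock_skew EPOCH...: print the spread, in seconds, between the earliest
# and latest of the given Unix timestamps.
max_clock_skew() {
  local min="$1" max="$1" t
  for t in "$@"; do
    if (( t < min )); then min=$t; fi
    if (( t > max )); then max=$t; fi
  done
  echo $(( max - min ))
}

# Example (placeholder addresses; replace with your servers):
#   stamps=()
#   for s in 192.0.2.11 192.0.2.12 192.0.2.13; do
#     stamps+=("$(ssh "customer@$s" date +%s)")
#   done
#   max_clock_skew "${stamps[@]}"
```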
Networking Issues

  • When using an Inception V3 module with vSphere Bitfusion on a PVRDMA network, you might experience intermittent software crashes

    The number of simultaneous network connections that are allowed on a PVRDMA network is limited by the number of vCPUs that are available on the virtual machine of your vSphere Bitfusion server or client. Occasionally, the Inception module might open more network connections than your PVRDMA network can handle, which results in a software crash.

    Workaround: None.

  • During a restart of a vSphere Bitfusion server that uses DHCP, a certificate error might be displayed

    The IP addresses of a vSphere Bitfusion server are included as Subject Alternative Name (SAN) entries in the X.509 certificate that vSphere Bitfusion uses for authorization. When a vSphere Bitfusion server that uses DHCP restarts, it might receive new IP addresses that the certificate does not list, which might cause certificate authorization to fail and prevent the server from starting.

    Workaround: Perform one of the following tasks.

    • Upgrade your vSphere Bitfusion server to version 4.5.3.

    • Define static IP addresses for the vSphere Bitfusion server.

    • Install a new authorization certificate.

  • No support for Internet Protocol version 6

    IPv6 is not supported in this release.

  • Cannot configure the network adapters of a cloned vSphere Bitfusion server

    During a clone operation of a vSphere Bitfusion server virtual machine in vCenter Server, the configuration of the additional network adapters cannot be changed.

    Workaround: Perform one of the following tasks.

    • When creating the original virtual machine, enable the network interfaces that are needed for the cloned virtual machine.

    • By using the vCenter vApp options editor, change the values of the network settings. For a list of all properties, see vSphere Bitfusion vApp Properties.

      1. Navigate to the virtual machine of the vSphere Bitfusion server.

      2. On the Configure tab, expand Settings and select vApp Options.

      3. Select a property from the list and click the Set Value button.

  • Installation procedure of a subsequent vSphere Bitfusion server might fail

    When you deploy a subsequent vSphere Bitfusion server, the primary vSphere Bitfusion server connects to the subsequent ESXi host by using HTTPS APIs. If the management network interface of your vSphere Bitfusion server and the vmk0 interface of your ESXi host use an MTU size of 9000 bytes, but your network does not support this MTU size between the two interfaces, the HTTPS connection might be aborted and the installation procedure might fail.

    Workaround:

    1. Determine the maximum MTU size that is supported between the two interfaces.

      1. To connect to the terminal of the vSphere Bitfusion server, run ssh customer@$server_ip.

      2. Run the following shell script, replacing ESXI_HOST with the IP address or DNS name of your ESXi host.

          target_host="ESXI_HOST"
          size=1272
          # Increase the ICMP payload in 4-byte steps until a ping with the
          # Don't Fragment flag set (-M do) no longer gets through.
          while ping -s "$size" -M do -c 1 "$target_host" >/dev/null 2>&1; do
            ((size += 4))
          done
          # The last payload that got through is size-4 bytes; add 28 bytes of
          # IPv4 (20) and ICMP (8) headers to get the MTU.
          echo "Max MTU size: $((size - 4 + 28))"

    2. Change the MTU size value for the vmk0 interface of your ESXi host to the maximum MTU size supported between the two interfaces.

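For the DHCP certificate issue above, you can check whether a server's current addresses still appear in its certificate. The sketch uses the standard openssl command-line tool; the -ext option needs OpenSSL 1.1.1 or later. The function names are hypothetical, the server certificate path is not given in these notes, so CERT_FILE is a placeholder, and hostname -I (available on Debian and Ubuntu) is assumed for listing the server's addresses.

```shell
#!/usr/bin/env bash
# cert_san_ips CERT_FILE: print the IP addresses listed in the certificate's
# Subject Alternative Name extension, one per line (OpenSSL 1.1.1 or later).
cert_san_ips() {
  openssl x509 -in "$1" -noout -ext subjectAltName \
    | grep -oE 'IP Address:[0-9.]+' \
    | cut -d: -f2
}

# missing_san_ips CERT_FILE IP...: print each given IP address that the
# certificate does not list; print nothing when all addresses are covered.
missing_san_ips() {
  local cert="$1" sans ip
  shift
  sans="$(cert_san_ips "$cert")"
  for ip in "$@"; do
    if ! grep -qxF "$ip" <<< "$sans"; then
      echo "$ip"
    fi
  done
}

# Example (CERT_FILE is a placeholder for the server certificate path):
#   missing_san_ips CERT_FILE $(hostname -I)
```

Any address printed by missing_san_ips is one the certificate does not cover, which matches the failure condition described above.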
Backup and Restore Issues

  • Restore operation on a vSphere Bitfusion cluster that consists of two or more servers might fail

    The restore operation might fail with an Apache Cassandra error due to an intermittent issue with the database.

    Workaround: Perform the restore operation on a vSphere Bitfusion cluster with a single server, then create the subsequent servers. For more information, see Upgrading vSphere Bitfusion.

    1. Install a new primary vSphere Bitfusion server.

      1. During the deployment process, enter the same hostname as your old primary vSphere Bitfusion server uses.

      2. In the settings of the new VM, add the same number of GPUs as your old primary vSphere Bitfusion server uses.

      3. In the advanced settings of the new VM, add a guestinfo.bitfusion.server.host-id configuration parameter. The parameter value must match the host ID of your old primary server, which is listed in the manifest.json file.

    2. Restore the backup of your old vSphere Bitfusion cluster to your new cluster.

    3. Install new subsequent vSphere Bitfusion servers.

      1. During the deployment process, enter the hostnames and host IDs that are listed in the manifest.json for the old corresponding vSphere Bitfusion servers.

      2. In the settings of the new VMs, add the same number of GPUs as the old corresponding vSphere Bitfusion servers use.

      3. In the settings of the new VMs, add a guestinfo.bitfusion.server.host-id configuration parameter. The parameter values must match the host IDs of the old corresponding servers, which are listed in the manifest.json file.

  • When restoring a backup from a vSphere Bitfusion cluster that is currently online to a new cluster, both clusters might fail

    During the restore operation, vSphere Bitfusion creates host IDs for the servers in the new cluster that are identical to the IDs of the servers that are online, which results in conflicts when both clusters communicate.

    Workaround: Take the original cluster offline, and then perform the restore operation on the new cluster.

  • Restore operation of a vSphere Bitfusion 4.5 cluster fails

    After the restore operation completes, the vSphere Bitfusion server might not start, because the vSphere Bitfusion service is not restarted.

    Workaround: Restart the vSphere Bitfusion service.

    1. Open a terminal application and run ssh customer@ip_address, where ip_address is the IP address of your vSphere Bitfusion server.

      You can obtain the server IP address from the vSphere Bitfusion plug-in.

    2. Enter the customer password that you specified during the deployment of the vSphere Bitfusion server.

    3. Restart the service by running the sudo systemctl restart bitfusion command.

  • When restoring a backup from a vSphere Bitfusion 4.0.1 and earlier cluster, the restore operation might fail

    When restoring a backup from vSphere Bitfusion 4.0.1 and earlier to a vSphere Bitfusion 4.5 and later cluster, the restore operation might fail with an error message: summary error: failed to restore one or more tables: failed to restore one or more table snapshots. This issue occurs due to an internal update of the Apache Cassandra database to version 4.0.

    Workaround: Before starting the restore operation, in the Apache Cassandra configuration, set the enable_legacy_ssl_storage_port parameter to true.

  • After a restore operation, the vSphere Bitfusion server logs might display error messages

    After a restore operation, you might observe error messages related to "snapshotting time series" in the server logs. The error messages might appear because the vSphere Bitfusion services restart after the restore operation and the previous sessions are not closed properly.

    Workaround: Ignore the error messages.

  • Restore operation of a vSphere Bitfusion 3.0 cluster fails

    After restoring a multi-node vSphere Bitfusion cluster from backup, you might experience data loss and the global settings in the vSphere Bitfusion plug-in cannot be changed.

    Workaround: Restart all vSphere Bitfusion servers sequentially and wait 60 seconds after restarting each server.
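The workaround for restoring 4.0.1 and earlier backups sets enable_legacy_ssl_storage_port in the Apache Cassandra configuration file, cassandra.yaml. The location of that file on a vSphere Bitfusion server is not given in these notes, so the sketch takes the path as an argument, and the function name is hypothetical. It changes the existing line to true and keeps its indentation, because in the stock Cassandra 4.0 file the option is nested under server_encryption_options. The change takes effect only after Cassandra restarts.

```shell
#!/usr/bin/env bash
# enable_legacy_ssl_port YAML_FILE: set enable_legacy_ssl_storage_port to true
# in a cassandra.yaml file, uncommenting the line if necessary and keeping its
# indentation. Fails if the option is not present, rather than guessing where
# to add it. Requires GNU sed for in-place editing (-i).
enable_legacy_ssl_port() {
  local yaml="$1"
  local pattern='^([[:space:]]*)#?[[:space:]]*enable_legacy_ssl_storage_port:.*$'
  if ! grep -qE "$pattern" "$yaml"; then
    echo "enable_legacy_ssl_storage_port not found in $yaml" >&2
    return 1
  fi
  sed -i -E "s/$pattern/\\1enable_legacy_ssl_storage_port: true/" "$yaml"
}

# Example (CASSANDRA_YAML is a placeholder for the file's path on the server):
#   cp CASSANDRA_YAML CASSANDRA_YAML.bak && enable_legacy_ssl_port CASSANDRA_YAML
```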

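The 3.0 restore workaround restarts servers one at a time with a 60-second pause. A minimal sketch of that loop, assuming that restarting the bitfusion service over SSH (the command used in the 4.5 restore workaround) counts as restarting a server; if your procedure reboots the virtual machine instead, replace restart_one_server. Both function names are hypothetical.

```shell
#!/usr/bin/env bash
# restart_one_server HOST: restart the vSphere Bitfusion service on one server.
# Assumption: a service restart is an acceptable "server restart" here.
restart_one_server() {
  ssh -t "customer@$1" sudo systemctl restart bitfusion
}

# restart_servers_sequentially HOST...: restart each server in order and wait
# WAIT_SECONDS (default 60) after each one; stop at the first failure.
restart_servers_sequentially() {
  local wait_seconds="${WAIT_SECONDS:-60}" host
  for host in "$@"; do
    echo "Restarting $host"
    restart_one_server "$host" || return 1
    sleep "$wait_seconds"
  done
}

# Example (placeholder addresses):
#   restart_servers_sequentially 192.0.2.11 192.0.2.12 192.0.2.13
```

Stopping at the first failure avoids restarting the remaining servers while one is in an unknown state.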
Other Issues

  • During deployment of the vSphere Bitfusion appliance, vCenter Server might report an invalid vSphere Bitfusion certificate warning

    When using the vSphere Bitfusion 4.0 appliance to install a primary vSphere Bitfusion server in vCenter Server 7.0.2 and 7.0.3, the Review Details page of the Deploy OVF Template dialog box might display a warning: Invalid certificate. The warning is invalid and the vSphere Bitfusion certificate is valid.

    Workaround: Ignore the warning and click Next to verify the OVF template details. This issue is fixed in an upcoming vCenter Server release.

  • The Match Global Defaults button on the Health logs dialog box might be deactivated

    After modifying the global health check settings for all vSphere Bitfusion servers on the Settings > Global Health Check Defaults tab and checking the health status of a vSphere Bitfusion server, the Match Global Defaults button on the Health logs dialog box might be deactivated. This is a JavaScript error.

    Workaround: Activate or deactivate a health check by clicking the toggle button and click Save.

  • Selecting an OVA file from a local machine might fail without a fast upload network

    When you use the vSphere Bitfusion plug-in to install subsequent servers, selecting an OVA file from a local machine might fail without a fast upload network. Most browsers have a timeout limit of 5 minutes, and the vSphere Bitfusion OVA file is around 740 MB.

    Workaround: Select an OVA file from a URL.

  • vSphere Bitfusion 3.5 and earlier clients that are installed on CentOS 7 and 8 might experience a library error

    vSphere Bitfusion 3.5 and earlier clients for CentOS 7 and 8 have a dependency on the libcapstone.so.3 library which is installed from the EPEL capstone RPM package. The capstone package currently contains the libcapstone.so.4 library only. After the vSphere Bitfusion client is installed, the client downloads and installs the latest package from EPEL, which contains the latest library and might cause an error message: error while loading shared libraries: libcapstone.so.3: cannot open shared object.

    Workaround: Perform one of the following tasks.

    • Update the vSphere Bitfusion servers and clients to version 4.0.0 or later.

    • Install an older version of the capstone package that contains the libcapstone.so.3 library.

  • Using a vSphere Bitfusion 2.5 or later license might cause an error in vCenter Server 7.0.0 and earlier

    For vCenter Server version 7.0.0 or earlier, vSphere Bitfusion uses a string to determine the validity of the vSphere Bitfusion license. A result mismatch of the string might cause a licensing issue.

    Workaround: Upgrade vCenter Server to version 7.0.2 or later.
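The figures in the OVA upload issue above give a rough lower bound on the upload speed that selecting a local file needs. The sketch assumes that the whole upload must finish within the 5-minute browser limit and uses decimal megabytes; protocol overhead makes the real requirement somewhat higher. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# min_upload_mbps SIZE_MB TIMEOUT_S: minimum sustained upload rate, in
# megabits per second, needed to transfer SIZE_MB megabytes within TIMEOUT_S
# seconds. Ignores protocol overhead and assumes decimal megabytes.
min_upload_mbps() {
  awk -v size="$1" -v secs="$2" 'BEGIN { printf "%.1f\n", size * 8 / secs }'
}

# Example: a 740 MB OVA file within a 5-minute (300 s) browser timeout
#   min_upload_mbps 740 300
```

For the figures in the note, 740 MB x 8 / 300 s is about 19.7 Mb/s, so uploads over slower links are likely to hit the timeout.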

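For the capstone library error above, you can check whether the library that an older client expects is present before and after changing packages. The sketch assumes a glibc-based system such as CentOS 7 or 8, where ldconfig -p prints the dynamic linker cache; the function name is hypothetical.

```shell
#!/usr/bin/env bash
# has_shared_lib SONAME: succeed when the dynamic linker cache lists SONAME,
# for example libcapstone.so.3. Assumes a glibc system where "ldconfig -p"
# prints the cache; falls back to /sbin/ldconfig if ldconfig is not on PATH.
has_shared_lib() {
  local ldconfig_bin
  ldconfig_bin="$(command -v ldconfig || echo /sbin/ldconfig)"
  "$ldconfig_bin" -p | awk -v so="$1" '$1 == so { found = 1 } END { exit !found }'
}

# Example:
#   has_shared_lib libcapstone.so.3 || echo "libcapstone.so.3 is missing"
```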