To perform troubleshooting or maintenance on a vSphere Bitfusion server, you must remove the server from the vSphere Bitfusion cluster.

When you power off a vSphere Bitfusion server for maintenance or troubleshooting, the health status of the vSphere Bitfusion cluster changes. When the cluster is not in a healthy state, you cannot add vSphere Bitfusion servers or perform a cluster backup operation. If half or more of the servers are powered off, the cluster is inoperable. If you plan to power off a server for an extended period, you can avoid these risks by removing the server from the cluster.
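
As a rough illustration of the threshold described above, the following Python sketch checks whether a cluster would remain operable after some servers are powered off. The function name and parameters are hypothetical, and the check encodes only the "half or more" rule stated here, not the actual vSphere Bitfusion health logic.

```python
def cluster_is_operable(total_servers: int, powered_off: int) -> bool:
    """Return False when half or more of the servers are powered off.

    Hypothetical helper: it models only the rule stated in this topic.
    """
    if total_servers <= 0:
        raise ValueError("total_servers must be positive")
    if not 0 <= powered_off <= total_servers:
        raise ValueError("powered_off must be between 0 and total_servers")
    # "Half or more powered off" is equivalent to 2 * powered_off >= total_servers.
    return 2 * powered_off < total_servers
```

For example, under this rule a three-server cluster tolerates one powered-off server, while a four-server cluster with two servers powered off is inoperable.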

Performing the following procedure immediately removes the server from the vSphere Bitfusion cluster. Any running applications that are using the GPUs receive an immediate GPU failure and usually return an error condition.

Prerequisites

  • Prevent new client connections to the specific server in the server settings.
  • Verify that there are no running applications on the server.

Procedure

  1. In the vSphere Client, select Menu > Bitfusion.
  2. On the Servers tab, select a server from the list.
  3. From the Actions drop-down menu, select Delete.
  4. In the confirmation dialog box, click Delete.
    The vSphere Bitfusion server is no longer listed on the Servers tab, but the delete operation can take 10 minutes or longer to complete. During this time, the Apache Cassandra database is being updated.
  5. Verify that the delete operation is completed.
    1. Open a terminal application and run ssh customer@ip_address, where ip_address is the IP address of an active vSphere Bitfusion server.
      You can obtain the vSphere Bitfusion server IP address from the vSphere Bitfusion Plug-in.
    2. Run the nodetool status command.
    3. If the deleted vSphere Bitfusion server is still displayed in the server list, run the nodetool status command again until the output no longer displays the deleted server.
  6. (Optional) Delete the server virtual machine (VM).
    Accidentally powering on the removed VM may result in the vSphere Bitfusion plug-in and cluster information being overwritten.
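
The verification in step 5 can be automated. The following Python sketch, intended to run on an active vSphere Bitfusion server, calls nodetool status repeatedly until the deleted server's address no longer appears. It assumes that Python 3 is available on that server and that the deleted server is identified in the output by its IP address. The function names, polling interval, and timeout are illustrative choices, not part of the product.

```python
import re
import subprocess
import time
from typing import Callable, Set

# Node rows in `nodetool status` output begin with a two-letter code:
# U or D (up/down) followed by N, L, J, or M (normal/leaving/joining/moving),
# then whitespace and the node address.
_NODE_ROW = re.compile(r"^[UD][NLJM]\s+(\S+)")


def listed_addresses(status_output: str) -> Set[str]:
    """Return the node addresses that appear in `nodetool status` output."""
    addresses = set()
    for line in status_output.splitlines():
        match = _NODE_ROW.match(line.strip())
        if match:
            addresses.add(match.group(1))
    return addresses


def run_nodetool_status() -> str:
    """Run `nodetool status` on the local machine and return its output."""
    result = subprocess.run(
        ["nodetool", "status"], capture_output=True, text=True, check=True
    )
    return result.stdout


def wait_until_removed(
    address: str,
    get_status: Callable[[], str] = run_nodetool_status,
    interval_s: float = 30.0,    # illustrative polling interval
    timeout_s: float = 1200.0,   # illustrative limit; the delete can take 10 minutes or longer
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until `address` is no longer listed; return False on timeout."""
    deadline = time.monotonic() + timeout_s
    while True:
        if address not in listed_addresses(get_status()):
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval_s)
```

A call such as wait_until_removed("192.0.2.10"), where the address is a placeholder for the deleted server's IP address, returns True once the server is gone from the list. The get_status and sleep parameters exist so that the polling logic can be exercised with sample output instead of a live cluster.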

Results

You have deleted the selected server from the vSphere Bitfusion cluster.

What to do next

To reuse the VM or the underlying hardware, you can perform one of the following tasks.
  • If you deleted the server from the cluster without deleting the VM, delete the /etc/bitfusion/bitfusion-manager.yaml configuration file on the VM, reactivate the VM as a vSphere Bitfusion server, restart the vSphere Bitfusion service, and power on the VM. For more information, see Activating the vSphere Bitfusion Client in the Installing VMware vSphere Bitfusion guide and How to start and stop the vSphere Bitfusion service.
  • If you deleted the server VM, you can reuse the underlying hardware as a vSphere Bitfusion server by creating a VM and deploying the vSphere Bitfusion server appliance. See How to install subsequent vSphere Bitfusion servers.