To perform troubleshooting or maintenance on a vSphere Bitfusion server, you must remove the server from the vSphere Bitfusion cluster.
When you power off a vSphere Bitfusion server for maintenance or troubleshooting, the health status of the vSphere Bitfusion cluster changes. While the cluster is not in a healthy state, you cannot add vSphere Bitfusion servers or perform a cluster backup operation. If half of the servers are powered off, the cluster is inoperable. If you plan to keep a server powered off for an extended period, you can avoid these risks by removing the server from the cluster.
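When planning maintenance, it can help to check how many servers can be powered off before the cluster reaches the inoperable threshold. The following is a minimal sketch, not part of vSphere Bitfusion: the function name `cluster_operable` is hypothetical, and it assumes that "half of the servers" means half or more.

```python
def cluster_operable(total_servers: int, powered_off: int) -> bool:
    """Return True while fewer than half of the servers are powered off.

    Assumption: the cluster becomes inoperable once half or more of its
    servers are powered off. This only models the rule stated above; it
    does not query a real cluster.
    """
    if total_servers <= 0:
        raise ValueError("total_servers must be positive")
    if not 0 <= powered_off <= total_servers:
        raise ValueError("powered_off must be between 0 and total_servers")
    # Integer comparison avoids rounding issues with odd cluster sizes.
    return 2 * powered_off < total_servers


if __name__ == "__main__":
    for total, off in [(3, 1), (4, 2), (5, 2)]:
        print(total, off, cluster_operable(total, off))
```

Under this assumption, a three-server cluster tolerates one powered-off server, while a four-server cluster with two servers powered off is treated as inoperable.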
Performing the following procedure immediately removes the server from the vSphere Bitfusion cluster. Any running applications that are using the GPUs receive an immediate GPU failure and usually return an error condition.
Prerequisites
- Prevent new client connections to the specific server in the server settings.
- Verify that there are no running applications on the server.
Procedure
- In the vSphere Client, select .
- On the Servers tab, select a server from the list.
- From the Actions drop-down menu, select Delete.
- In the confirmation dialog box, click Delete.
- Wait until the server is no longer listed on the Servers tab.
The delete operation can take 10 minutes or longer while the backing storage rebalances. Alternatively, to verify that the delete operation is finished, run the nodetool status command in the terminal of a running server.
- (Optional) Delete the server virtual machine (VM).
Accidentally powering on the removed VM may result in the vSphere Bitfusion plug-in and cluster information being overwritten.
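The nodetool status check mentioned in the procedure can be scripted. The following is a hedged sketch, not an official tool: the helper names (`read_nodetool_status`, `parse_nodetool_status`, `removal_finished`) and the sample output are illustrative, and it assumes that the removal is finished once the address of the removed server no longer appears in the node list. Node lines in nodetool status output begin with a two-letter code: U or D for up or down, followed by N, L, J, or M for normal, leaving, joining, or moving.

```python
import re
import subprocess

# A node line starts with a status letter (U/D) and a state letter (N/L/J/M),
# followed by whitespace and the node address.
NODE_LINE = re.compile(r"^([UD])([NLJM])\s+(\S+)")


def read_nodetool_status() -> str:
    """Run `nodetool status` on a running server and return its text output."""
    result = subprocess.run(
        ["nodetool", "status"], capture_output=True, text=True, check=True
    )
    return result.stdout


def parse_nodetool_status(output: str) -> dict:
    """Map each node address to its two-letter status/state code."""
    nodes = {}
    for line in output.splitlines():
        match = NODE_LINE.match(line.strip())
        if match:
            nodes[match.group(3)] = match.group(1) + match.group(2)
    return nodes


def removal_finished(output: str, removed_address: str) -> bool:
    """Assumption: removal is done when the address is no longer listed."""
    return removed_address not in parse_nodetool_status(output)


# Illustrative output only; addresses, loads, and host IDs are made up.
SAMPLE = """\
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address     Load      Tokens  Owns  Host ID   Rack
UN  192.0.2.11  1.2 MiB   256     ?     aaaa-1111 rack1
UL  192.0.2.12  900 KiB   256     ?     bbbb-2222 rack1
"""

if __name__ == "__main__":
    print(parse_nodetool_status(SAMPLE))
    print(removal_finished(SAMPLE, "192.0.2.12"))
```

In the sample, the server at 192.0.2.12 is still listed in the leaving state, so the check reports that the removal has not finished. On a real server, pass the result of `read_nodetool_status()` instead of `SAMPLE`.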
What to do next
- If you deleted the server from the cluster without deleting the VM, delete the /etc/bitfusion/bitfusion-manager.yaml configuration file on the VM, reenable the VM as a vSphere Bitfusion server, restart the vSphere Bitfusion service, and power on the VM. For more information, see Enabling the vSphere Bitfusion Client in the VMware vSphere Bitfusion Installation Guide and Start and Stop the vSphere Bitfusion Service.
- If you deleted the server VM, you can reuse the underlying hardware as a vSphere Bitfusion server by creating a VM and deploying the vSphere Bitfusion server appliance. See Add Subsequent vSphere Bitfusion Servers.