The following list provides information about the most important vSphere Bitfusion commands and their tasks. If necessary, additional CLI commands can be provided by the VMware support team.
Allocate GPUs in vSphere Bitfusion
To allocate a number of GPUs for a single application, run the bitfusion run command.
To allocate a number of GPUs and start a session in which you can run multiple applications on the same GPUs, run the bitfusion request_gpus command.
Run applications in vSphere Bitfusion
To start a single application, run the bitfusion run command.
To start multiple applications in a session started with the bitfusion request_gpus command, run the bitfusion client command.
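As a minimal sketch of the single-application case, the following Python helper builds the argument list for bitfusion run. The -n option and the -- separator are taken from the EXAMPLES section of the bitfusion help output; the helper name bitfusion_run_argv is hypothetical, and nvidia-smi is used only as an illustrative application.

```python
from typing import List, Sequence


def bitfusion_run_argv(num_gpus: int, application: Sequence[str]) -> List[str]:
    """Build the argv for `bitfusion run -n <num_gpus> -- <application>`.

    Hypothetical helper: it only assembles the command line and does not
    contact any vSphere Bitfusion server.
    """
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    if not application:
        raise ValueError("application must contain at least the program name")
    # Everything after "--" is the application and its own arguments.
    return ["bitfusion", "run", "-n", str(num_gpus), "--", *application]


if __name__ == "__main__":
    argv = bitfusion_run_argv(1, ["nvidia-smi"])
    print(" ".join(argv))
```

On a machine where the vSphere Bitfusion client is installed, the returned list could be passed to subprocess.run to start the application.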
Deallocate GPUs in vSphere Bitfusion
To deallocate GPUs in a session started with the bitfusion request_gpus command, run the bitfusion release_gpus command.
List available GPUs in vSphere Bitfusion
To verify a vSphere Bitfusion server installation and find a list of available GPUs, run the bitfusion list_gpus command.
    - server 0 [172.16.31.162:56001]: running 0 tasks
       |- GPU [0]: free memory (15109 / 15109MiB) Tesla T4 (7.5)
    - server 1 (leader) [172.16.31.156:56001]: running 0 tasks
       |- GPU [0]: free memory (15109 / 15109MiB) Tesla T4 (7.5)
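If you want to process the list_gpus output in a script, a sketch like the following could extract the servers and their GPUs. It assumes the output always follows the layout of the sample above, which is not a documented, stable format; the function name parse_list_gpus and the dictionary keys are hypothetical.

```python
import re
from typing import Dict, List

# Patterns derived only from the sample output; treat them as assumptions.
SERVER_RE = re.compile(r"^\s*-\s*server\s+(\d+)\s*(\(leader\))?\s*\[([^\]]+)\]")
GPU_RE = re.compile(
    r"\|-\s*GPU\s*\[(\d+)\]:\s*free memory\s*\((\d+)\s*/\s*(\d+)MiB\)\s*(.+?)\s*$"
)

SAMPLE_OUTPUT = """\
- server 0 [172.16.31.162:56001]: running 0 tasks
   |- GPU [0]: free memory (15109 / 15109MiB) Tesla T4 (7.5)
- server 1 (leader) [172.16.31.156:56001]: running 0 tasks
   |- GPU [0]: free memory (15109 / 15109MiB) Tesla T4 (7.5)
"""


def parse_list_gpus(text: str) -> List[Dict]:
    """Group GPU lines under the server line that precedes them."""
    servers: List[Dict] = []
    for line in text.splitlines():
        server = SERVER_RE.match(line)
        if server:
            servers.append({
                "index": int(server.group(1)),
                "leader": server.group(2) is not None,
                "address": server.group(3),
                "gpus": [],
            })
            continue
        gpu = GPU_RE.search(line)
        if gpu and servers:
            servers[-1]["gpus"].append({
                "index": int(gpu.group(1)),
                "free_mib": int(gpu.group(2)),
                "total_mib": int(gpu.group(3)),
                "name": gpu.group(4),
            })
    return servers


if __name__ == "__main__":
    for entry in parse_list_gpus(SAMPLE_OUTPUT):
        print(entry["address"], len(entry["gpus"]), "GPU(s)")
```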
Run a health check in vSphere Bitfusion
- To check the health of all vSphere Bitfusion servers and the Bitfusion client, run bitfusion health.
- To check the health of a single vSphere Bitfusion client or server, run bitfusion localhealth.
Check your vSphere Bitfusion version
To display the installed version of vSphere Bitfusion, run the bitfusion version command.
Bitfusion version: 4.0.0 release
Display GPU information in vSphere Bitfusion
To display GPU information, run the bitfusion smi command. Alternatively, to receive a similar output, you can start the nvidia-smi application with the bitfusion run command.
    +----------------------------------------------------------------------------------------+
    | 172.16.31.162:56001                                          Driver Version: 460.73.01 |
    +--------------------------------------+-------------------------+-----------------------+
    | GPU  Name              Persistence-M | Virt Mem  Alloc / All   | BusId  Vol Uncorr ECC |
    | Fan  Temp  Perf  Pwr:Usage/Cap       | Phy Mem   Used  / All   | GPU-Util   Compute M. |
    |======================================+=========================+=======================|
    |   0  Tesla T4          Enabled       | 0 MB / 15109 MB         | 00000000:13:00.0    0 |
    | 0 %  28C   P8    10W / 70W           | 3 MB / 15109 MB         | 0%         Default    |
    +--------------------------------------+-------------------------+-----------------------+
    +----------------------------------------------------------------------------------------+
    | 172.16.31.156:56001                                          Driver Version: 460.73.01 |
    +--------------------------------------+-------------------------+-----------------------+
    | GPU  Name              Persistence-M | Virt Mem  Alloc / All   | BusId  Vol Uncorr ECC |
    | Fan  Temp  Perf  Pwr:Usage/Cap       | Phy Mem   Used  / All   | GPU-Util   Compute M. |
    |======================================+=========================+=======================|
    |   0  Tesla T4          Enabled       | 0 MB / 15109 MB         | 00000000:13:00.0    0 |
    | 0 %  34C   P8    10W / 70W           | 3 MB / 15109 MB         | 0%         Default    |
    +--------------------------------------+-------------------------+-----------------------+
Test your bandwidth in vSphere Bitfusion
To test the bandwidth and latency between the vSphere Bitfusion client and servers, run the bitfusion net_perf command.
    Displayed results are calculated from round-trip measurements
    BW(1MB) = 1000/(LAT(1MB) - LAT(1B))

    [ <client>] ens160 => [10.202.8.169] net1 ( tcp)
        Single packet lat = 51 us, bw(1MB) = 1.71 GB/s
    [ <client>] ens160 => [10.202.8.185] net1 ( tcp)
        Single packet lat = 48 us, bw(1MB) = 1.09 GB/s
    [ <client>] ens160 => [10.202.8.233] net1 ( tcp)
        Single packet lat = 50 us, bw(1MB) = 0.87 GB/s
    [ <client>] ens192f0 => [10.202.8.169] net2 ( tcp)
        Single packet lat = 47 us, bw(1MB) = 2.14 GB/s
    [ <client>] ens192f0 => [10.202.8.185] net2 ( tcp)
        Single packet lat = 49 us, bw(1MB) = 1.11 GB/s
    [ <client>] ens192f0 => [10.202.8.233] net2 ( tcp)
        Single packet lat = 50 us, bw(1MB) = 1.15 GB/s
    [ <client>] vmw_pvrdma0 => [10.202.8.169] vmw_pvrdma0 (infiniband)
        Single packet lat = 19 us, bw(1MB) = 3.66 GB/s
        Single packet Write lat = 8 us, bw = 10.101 GB/s
    [ <client>] vmw_pvrdma0 => [10.202.8.185] vmw_pvrdma0 (infiniband)
        Single packet lat = 21 us, bw(1MB) = 3.45 GB/s
        Single packet Write lat = 8 us, bw = 10.5263 GB/s
    [ <client>] vmw_pvrdma0 => [10.202.8.233] vmw_pvrdma0 (infiniband)
        Single packet lat = 21 us, bw(1MB) = 3.46 GB/s
        Single packet Write lat = 8 us, bw = 10.4167 GB/s
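The formula printed at the top of the net_perf output can be worked through with a short sketch. It assumes that both latencies are in microseconds and that MB and GB are decimal units, which is inferred from the "us" and "GB/s" labels in the output rather than stated by it: moving 1 MB in 1 microsecond would correspond to 1000 GB/s, hence the constant 1000. The function names are hypothetical.

```python
def bandwidth_gb_per_s(lat_1mb_us: float, lat_1b_us: float) -> float:
    """Apply BW(1MB) = 1000 / (LAT(1MB) - LAT(1B)).

    Assumes latencies in microseconds; the result is then in GB/s.
    """
    delta_us = lat_1mb_us - lat_1b_us
    if delta_us <= 0:
        raise ValueError("LAT(1MB) must be greater than LAT(1B)")
    return 1000.0 / delta_us


def implied_transfer_time_us(bw_gb_per_s: float) -> float:
    """Invert the formula: the extra time, in microseconds, that a 1 MB
    message takes compared with a 1-byte message."""
    if bw_gb_per_s <= 0:
        raise ValueError("bandwidth must be positive")
    return 1000.0 / bw_gb_per_s


if __name__ == "__main__":
    # A 1000 us difference between the two latencies corresponds to 1 GB/s.
    print(bandwidth_gb_per_s(1051.0, 51.0))
    # A reported bw(1MB) of 1.71 GB/s implies a difference of roughly 585 us.
    print(round(implied_transfer_time_us(1.71), 1))
```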
Request help in vSphere Bitfusion
To get the full list of vSphere Bitfusion CLI commands or more information about a specific command, run the bitfusion help command.
    NAME:
       Bitfusion - Run application with VMware Bitfusion

    USAGE:
       bitfusion <command> <options> "application"
       bitfusion <command> <options> -- [application]
       bitfusion help [command]

       For more information, system requirements, and advanced usage please visit docs.bitfusion.io

    COMMANDS:
       tls-certs, TC            Manage TLS certificates used by bitfusion server. Requires root privileges.
       version, v               Display full Bitfusion version
       localhealth, LH          Run health check on current node only
       dealloc                  Deallocate license certificate. Requires root priviledges.
       crashreport              Send crash report to bitfusion
       list_gpus                List the available GPUs in a shared pool
       initdb                   Init database setup
       token                    Fetch and manipulate tokens
       register                 Register remote server as the plugin
       unregister               Unregister remote plugin
       removenode               Remove unavailable nodes
       user                     Manage bitfusion users
       help, h                  Shows a list of commands or help for one command

     Client Commands:
       client, c                Run application
       health, H                Run health check on all specified servers and current node
       request_gpus             Request GPUs from a shared pool
       release_gpus             Release GPUs back into a shared pool. Options must match a previous request_gpus command
       run                      Request GPUs from a shared pool, run a client command, then release the GPUs
       stats                    Gather stats from all servers.
       smi                      Display smi-like info for all servers.
       local                    Run a CUDA application locally
       net_perf                 Gather network performance data from all SRS servers.

     Server Commands:
       server, s                Run dispatcher service - listens for 'bitfusion client' commands
       resource_scheduler, srs  Run Bitfusion resource scheduler (SRS) on GPU server
       analytics                Run Bitfusion analytics server
       manager                  Run Bitfusion manager server

    EXAMPLES:
       $ bitfusion resource_scheduler --srs_port 50001
       $ bitfusion run -n 4 -- <application>
       $ bitfusion request_gpus -n 1 -p 0.25