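The following list describes the most important vSphere Bitfusion commands and the tasks they perform. If necessary, VMware Technical Support might provide additional CLI commands.
Allocating GPUs in vSphere Bitfusion
To allocate multiple GPUs for a single application, run the bitfusion run command.
To allocate multiple GPUs and start a session in which multiple applications run on the same GPUs, run the bitfusion request_gpus command.
Running Applications in vSphere Bitfusion
To start a single application, run the bitfusion run command.
To start multiple applications in a session that was started with the bitfusion request_gpus command, run the bitfusion client command.
Deallocating GPUs in vSphere Bitfusion
To deallocate the GPUs of a session that was started with the bitfusion request_gpus command, run the bitfusion release_gpus command.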
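The two allocation styles described above can be compared side by side. The following Python sketch only builds and prints the command lines; it does not run them. The -n option and the -- separator are taken from the EXAMPLES and USAGE sections of the bitfusion help output. The application names train.py and evaluate.py are hypothetical placeholders. The help text states that release_gpus options must match the earlier request_gpus command, but it does not say which options; this sketch passes none, which is an assumption to check with bitfusion help release_gpus.

```python
import shlex


def run_command(num_gpus, app_argv):
    """Build the argv for a one-shot run: allocate GPUs, run one application, release."""
    return ["bitfusion", "run", "-n", str(num_gpus), "--", *app_argv]


def session_commands(num_gpus, apps):
    """Build the argv lists for a session: request once, run each application, release."""
    commands = [["bitfusion", "request_gpus", "-n", str(num_gpus)]]
    for app_argv in apps:
        # Each application in the session is started with the client command.
        commands.append(["bitfusion", "client", "--", *app_argv])
    # Assumption: release_gpus is called without options here.
    commands.append(["bitfusion", "release_gpus"])
    return commands


if __name__ == "__main__":
    # train.py and evaluate.py are hypothetical application names.
    one_shot = run_command(2, ["nvidia-smi"])
    session = session_commands(2, [["python3", "train.py"], ["python3", "evaluate.py"]])
    for argv in [one_shot, *session]:
        print(shlex.join(argv))
```

Returning argv lists instead of running the commands keeps the sketch safe to try on a machine without a Bitfusion client; pass each list to subprocess.run only after confirming the options for your installed version.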
Listing Available GPUs in vSphere Bitfusion
To verify the vSphere Bitfusion server installation and display the list of available GPUs, run the bitfusion list_gpus command.
- server 0 [172.16.31.162:56001]: running 0 tasks
   |- GPU [0]: free memory (15109 / 15109MiB) Tesla T4 (7.5)
- server 1 (leader) [172.16.31.156:56001]: running 0 tasks
   |- GPU [0]: free memory (15109 / 15109MiB) Tesla T4 (7.5)
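If you want to check this listing from a script, the output can be parsed with two regular expressions. The following is a minimal Python sketch that assumes the output keeps exactly the layout of the sample above; the layout may differ between versions. The function names and dictionary keys are illustrative, and reading the trailing number in parentheses (7.5) as the CUDA compute capability is an assumption.

```python
import re

# Patterns derived from the sample output only; other versions may print differently.
SERVER_RE = re.compile(r"server (\d+)( \(leader\))? \[([^\]]+)\]: running (\d+) tasks")
GPU_RE = re.compile(r"GPU \[(\d+)\]: free memory \((\d+) / (\d+)MiB\) (.+?) \((\d+\.\d+)\)")

SAMPLE = """\
- server 0 [172.16.31.162:56001]: running 0 tasks
   |- GPU [0]: free memory (15109 / 15109MiB) Tesla T4 (7.5)
- server 1 (leader) [172.16.31.156:56001]: running 0 tasks
   |- GPU [0]: free memory (15109 / 15109MiB) Tesla T4 (7.5)
"""


def parse_list_gpus(text):
    """Return one dict per server, each with the GPUs listed under it."""
    # Collect all matches and visit them in order of appearance,
    # so that each GPU attaches to the server printed before it.
    matches = [(m.start(), "server", m) for m in SERVER_RE.finditer(text)]
    matches += [(m.start(), "gpu", m) for m in GPU_RE.finditer(text)]
    servers = []
    for _, kind, m in sorted(matches, key=lambda item: item[0]):
        if kind == "server":
            servers.append({
                "index": int(m.group(1)),
                "leader": m.group(2) is not None,
                "address": m.group(3),
                "tasks": int(m.group(4)),
                "gpus": [],
            })
        elif servers:
            servers[-1]["gpus"].append({
                "index": int(m.group(1)),
                "free_mib": int(m.group(2)),
                "total_mib": int(m.group(3)),
                "name": m.group(4),
                # Assumption: the trailing number is the CUDA compute capability.
                "capability": m.group(5),
            })
    return servers


def total_free_mib(servers):
    """Sum the free GPU memory across all servers, in MiB."""
    return sum(gpu["free_mib"] for server in servers for gpu in server["gpus"])


if __name__ == "__main__":
    parsed = parse_list_gpus(SAMPLE)
    print(len(parsed), "servers,", total_free_mib(parsed), "MiB free")
```

To use it against a live system, pass the captured standard output of bitfusion list_gpus to parse_list_gpus instead of SAMPLE.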
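Performing Health Checks in vSphere Bitfusion
You can run health checks from the command line.
- To check the health of all vSphere Bitfusion servers and the Bitfusion client, run the bitfusion health command.
- To check the health of a single vSphere Bitfusion client or server, run the bitfusion localhealth command.
Checking the vSphere Bitfusion Version
To display the installed version of vSphere Bitfusion, run the bitfusion version command.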
Bitfusion version: 4.0.0 release
Displaying GPU Information in vSphere Bitfusion
To display GPU information, run the bitfusion smi command. Alternatively, to get similar output, use the bitfusion run command to start the nvidia-smi application.
+----------------------------------------------------------------------------------------+
| 172.16.31.162:56001                                          Driver Version: 460.73.01 |
+--------------------------------------+-------------------------+-----------------------+
| GPU  Name              Persistence-M | Virt Mem Alloc / All    | BusId  Vol Uncorr ECC |
| Fan  Temp  Perf        Pwr:Usage/Cap | Phy Mem Used / All      | GPU-Util   Compute M. |
|======================================+=========================+=======================|
|   0  Tesla T4                Enabled |         0 MB / 15109 MB | 00000000:13:00.0     0 |
| 0 %  28C   P8              10W / 70W |         3 MB / 15109 MB |       0%      Default |
+--------------------------------------+-------------------------+-----------------------+
+----------------------------------------------------------------------------------------+
| 172.16.31.156:56001                                          Driver Version: 460.73.01 |
+--------------------------------------+-------------------------+-----------------------+
| GPU  Name              Persistence-M | Virt Mem Alloc / All    | BusId  Vol Uncorr ECC |
| Fan  Temp  Perf        Pwr:Usage/Cap | Phy Mem Used / All      | GPU-Util   Compute M. |
|======================================+=========================+=======================|
|   0  Tesla T4                Enabled |         0 MB / 15109 MB | 00000000:13:00.0     0 |
| 0 %  34C   P8              10W / 70W |         3 MB / 15109 MB |       0%      Default |
+--------------------------------------+-------------------------+-----------------------+
Testing Bandwidth in vSphere Bitfusion
To test the bandwidth and latency between the vSphere Bitfusion client and the servers, run the bitfusion net_perf command.
Single network interface
Displayed results are calculated from round-trip measurements
BW(1MB) = 1000/(LAT(1MB) - LAT(1B))
[ <client>] ens160 => [10.202.8.169] net1 ( tcp)
    Single packet lat = 51 us, bw(1MB) = 1.71 GB/s
[ <client>] ens160 => [10.202.8.185] net1 ( tcp)
    Single packet lat = 48 us, bw(1MB) = 1.09 GB/s
[ <client>] ens160 => [10.202.8.233] net1 ( tcp)
    Single packet lat = 50 us, bw(1MB) = 0.87 GB/s
Multiple network interfaces
Displayed results are calculated from round-trip measurements
BW(1MB) = 1000/(LAT(1MB) - LAT(1B))
[ <client>] ens160 => [10.202.8.169] net1 ( tcp)
    Single packet lat = 51 us, bw(1MB) = 1.71 GB/s
[ <client>] ens160 => [10.202.8.185] net1 ( tcp)
    Single packet lat = 48 us, bw(1MB) = 1.09 GB/s
[ <client>] ens160 => [10.202.8.233] net1 ( tcp)
    Single packet lat = 50 us, bw(1MB) = 0.87 GB/s
[ <client>] ens192f0 => [10.202.8.169] net2 ( tcp)
    Single packet lat = 47 us, bw(1MB) = 2.14 GB/s
[ <client>] ens192f0 => [10.202.8.185] net2 ( tcp)
    Single packet lat = 49 us, bw(1MB) = 1.11 GB/s
[ <client>] ens192f0 => [10.202.8.233] net2 ( tcp)
    Single packet lat = 50 us, bw(1MB) = 1.15 GB/s
[ <client>] vmw_pvrdma0 => [10.202.8.169] vmw_pvrdma0 (infiniband)
    Single packet lat = 19 us, bw(1MB) = 3.66 GB/s
    Single packet Write lat = 8 us, bw = 10.101 GB/s
[ <client>] vmw_pvrdma0 => [10.202.8.185] vmw_pvrdma0 (infiniband)
    Single packet lat = 21 us, bw(1MB) = 3.45 GB/s
    Single packet Write lat = 8 us, bw = 10.5263 GB/s
[ <client>] vmw_pvrdma0 => [10.202.8.233] vmw_pvrdma0 (infiniband)
    Single packet lat = 21 us, bw(1MB) = 3.46 GB/s
    Single packet Write lat = 8 us, bw = 10.4167 GB/s
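The formula in the output header, BW(1MB) = 1000/(LAT(1MB) - LAT(1B)), can be checked by hand. The following Python sketch works through the arithmetic under two assumptions that the output does not state: latencies are in microseconds, and units are decimal (1 MB = 10^6 bytes, 1 GB = 10^9 bytes). Under those assumptions, moving 1 MB in 1 us equals 1000 GB/s, which is where the constant 1000 comes from. The function names are illustrative, and treating "Single packet lat" as LAT(1B) is also an assumption.

```python
def bandwidth_gb_per_s(lat_1mb_us, lat_1b_us):
    """Apply BW(1MB) = 1000 / (LAT(1MB) - LAT(1B)), with latencies in microseconds."""
    delta_us = lat_1mb_us - lat_1b_us
    if delta_us <= 0:
        raise ValueError("LAT(1MB) must be greater than LAT(1B)")
    return 1000.0 / delta_us


def implied_lat_1mb_us(bw_gb_per_s, lat_1b_us):
    """Invert the formula: the 1 MB latency implied by a reported bandwidth."""
    return lat_1b_us + 1000.0 / bw_gb_per_s


if __name__ == "__main__":
    # A 585 us gap between the 1 MB and 1 B latencies gives about 1.71 GB/s.
    print(round(bandwidth_gb_per_s(636, 51), 2))
    # The first sample line reports 51 us and 1.71 GB/s; the implied LAT(1MB) in us:
    print(round(implied_lat_1mb_us(1.71, 51), 1))
```

Subtracting the one-byte latency removes the fixed per-message cost, so the result reflects transfer time for the payload alone rather than connection overhead.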
Getting Help in vSphere Bitfusion
To get the complete list of vSphere Bitfusion CLI commands or detailed information about a specific command, run the bitfusion help command.
NAME:
   Bitfusion - Run application with VMware Bitfusion

USAGE:
   bitfusion <command> <options> "application"
   bitfusion <command> <options> -- [application]
   bitfusion help [command]

   For more information, system requirements, and advanced usage please visit docs.bitfusion.io

COMMANDS:
     tls-certs, TC    Manage TLS certificates used by bitfusion server. Requires root privileges.
     version, v       Display full Bitfusion version
     localhealth, LH  Run health check on current node only
     dealloc          Deallocate license certificate. Requires root priviledges.
     crashreport      Send crash report to bitfusion
     list_gpus        List the available GPUs in a shared pool
     initdb           Init database setup
     token            Fetch and manipulate tokens
     register         Register remote server as the plugin
     unregister       Unregister remote plugin
     removenode       Remove unavailable nodes
     user             Manage bitfusion users
     help, h          Shows a list of commands or help for one command

   Client Commands:
     client, c        Run application
     health, H        Run health check on all specified servers and current node
     request_gpus     Request GPUs from a shared pool
     release_gpus     Release GPUs back into a shared pool. Options must match a previous request_gpus command
     run              Request GPUs from a shared pool, run a client command, then release the GPUs
     stats            Gather stats from all servers.
     smi              Display smi-like info for all servers.
     local            Run a CUDA application locally
     net_perf         Gather network performance data from all SRS servers.

   Server Commands:
     server, s                Run dispatcher service - listens for 'bitfusion client' commands
     resource_scheduler, srs  Run Bitfusion resource scheduler (SRS) on GPU server
     analytics                Run Bitfusion analytics server
     manager                  Run Bitfusion manager server

EXAMPLES:
   $ bitfusion resource_scheduler --srs_port 50001
   $ bitfusion run -n 4 -- <application>
   $ bitfusion request_gpus -n 1 -p 0.25