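The following list describes the most important vSphere Bitfusion commands and what they do. If necessary, the VMware support team can provide additional CLI commands.
Allocating GPUs in vSphere Bitfusion
To allocate multiple GPUs for a single application, run the bitfusion run command.
To allocate multiple GPUs and start a session in which multiple applications can run on the same GPUs, run the bitfusion request_gpus command.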
Running applications in vSphere Bitfusion
To start a single application, run the bitfusion run command.
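As a minimal sketch of the single-application form, the helper below assembles the argument list for bitfusion run, following the `bitfusion run -n 4 -- <application>` pattern shown in the CLI help text. The function name `build_run_command` is hypothetical and used only for illustration; the helper builds the command without executing it.

```python
import shlex


def build_run_command(num_gpus, application):
    """Return the argv list for `bitfusion run -n <num_gpus> -- <application>`.

    `application` is the program and its arguments as a list of strings.
    This helper is illustrative only; it is not part of the Bitfusion CLI.
    """
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    if not application:
        raise ValueError("application must contain at least the program name")
    # "-n" and the "--" separator follow the EXAMPLES section of `bitfusion help`.
    return ["bitfusion", "run", "-n", str(num_gpus), "--", *application]


if __name__ == "__main__":
    cmd = build_run_command(1, ["nvidia-smi"])
    # Print the command line instead of running it, so the sketch works
    # on machines where the bitfusion client is not installed.
    print(shlex.join(cmd))
```

On a machine with the Bitfusion client installed, the returned list can be passed to `subprocess.run(cmd, check=True)`.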
To start multiple applications in a session that was started with the bitfusion request_gpus command, run each application with the bitfusion client command.
Deallocating GPUs in vSphere Bitfusion
To deallocate the GPUs of a session that was started with the bitfusion request_gpus command, run the bitfusion release_gpus command.
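The session workflow (request, run one or more clients, release) can be sketched as follows. The `-n` and `-p` options are taken from the `bitfusion request_gpus -n 1 -p 0.25` example in the CLI help; the function names and the injectable `runner` parameter are hypothetical, added so the sequence can be exercised without a Bitfusion installation. The sketch assumes release_gpus can be called without options; the help text notes that its options must match the earlier request_gpus call, so adjust this for your deployment.

```python
import subprocess


def session_commands(num_gpus, applications, partial=None):
    """Return argv lists: one request, one client per application, one release."""
    request = ["bitfusion", "request_gpus", "-n", str(num_gpus)]
    if partial is not None:
        # "-p" as in the help example `bitfusion request_gpus -n 1 -p 0.25`.
        request += ["-p", str(partial)]
    commands = [request]
    for app in applications:
        commands.append(["bitfusion", "client", "--", *app])
    # Assumption: release_gpus without options releases the session's GPUs.
    commands.append(["bitfusion", "release_gpus"])
    return commands


def run_session(num_gpus, applications, partial=None, runner=subprocess.run):
    """Run the whole session, releasing the GPUs even if an application fails."""
    request, *clients, release = session_commands(num_gpus, applications, partial)
    runner(request, check=True)
    try:
        for client in clients:
            runner(client, check=True)
    finally:
        runner(release, check=True)
```

The `try`/`finally` block is the main design choice here: GPUs obtained with request_gpus stay allocated until they are released, so the release step should run even when a client command exits with an error.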
Listing available GPUs in vSphere Bitfusion
To verify the vSphere Bitfusion server installation and list the available GPUs, run the bitfusion list_gpus command.
 - server 0 [172.16.31.162:56001]: running 0 tasks
   |- GPU [0]: free memory (15109 / 15109MiB) Tesla T4 (7.5)
 - server 1 (leader) [172.16.31.156:56001]: running 0 tasks
   |- GPU [0]: free memory (15109 / 15109MiB) Tesla T4 (7.5)
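For scripting, the list_gpus output can be turned into structured data. The sketch below is based only on the sample output shown here and assumes one server or GPU entry per line; the regular expressions, the `parse_list_gpus` name, and the dictionary keys are illustrative, and the real output may vary between Bitfusion versions.

```python
import re

# Patterns derived from the sample output only (an assumption, not a spec).
SERVER_RE = re.compile(
    r"- server (\d+)(?: \((leader)\))? \[([\d.]+):(\d+)\]: running (\d+) tasks"
)
GPU_RE = re.compile(
    r"\|- GPU \[(\d+)\]: free memory \((\d+) / (\d+)MiB\) (.+?) \((\d+(?:\.\d+)?)\)"
)

SAMPLE = """\
 - server 0 [172.16.31.162:56001]: running 0 tasks
   |- GPU [0]: free memory (15109 / 15109MiB) Tesla T4 (7.5)
 - server 1 (leader) [172.16.31.156:56001]: running 0 tasks
   |- GPU [0]: free memory (15109 / 15109MiB) Tesla T4 (7.5)
"""


def parse_list_gpus(text):
    """Parse `bitfusion list_gpus` text into a list of server dictionaries."""
    servers = []
    for line in text.splitlines():
        server = SERVER_RE.search(line)
        if server:
            servers.append({
                "index": int(server.group(1)),
                "leader": server.group(2) is not None,
                "address": server.group(3),
                "port": int(server.group(4)),
                "tasks": int(server.group(5)),
                "gpus": [],
            })
            continue
        gpu = GPU_RE.search(line)
        if gpu and servers:
            # A GPU line belongs to the most recently seen server line.
            servers[-1]["gpus"].append({
                "index": int(gpu.group(1)),
                "free_mib": int(gpu.group(2)),
                "total_mib": int(gpu.group(3)),
                "name": gpu.group(4),
                # The trailing "(7.5)" is stored as printed, without interpretation.
                "version_tag": gpu.group(5),
            })
    return servers


if __name__ == "__main__":
    for srv in parse_list_gpus(SAMPLE):
        free = sum(g["free_mib"] for g in srv["gpus"])
        print(srv["address"], "free MiB:", free)
```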
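Running health checks in vSphere Bitfusion
You can run health checks from the command line.
- To check the health of all vSphere Bitfusion servers and the Bitfusion client, run the bitfusion health command.
- To check the health of a single vSphere Bitfusion client or server, run the bitfusion localhealth command on that node.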
Checking the vSphere Bitfusion version
To display the installed version of vSphere Bitfusion, run the bitfusion version command.
Bitfusion version: 4.0.0 release
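A script that depends on a minimum client version could parse this line before doing anything else. The following is a minimal sketch that assumes the output keeps the `Bitfusion version: X.Y.Z <label>` shape shown above; `parse_version` is a hypothetical helper name.

```python
import re


def parse_version(line):
    """Split a `bitfusion version` line into ((major, minor, patch), label)."""
    match = re.search(
        r"Bitfusion version:\s*(\d+)\.(\d+)\.(\d+)(?:\s+(\S+))?", line
    )
    if match is None:
        raise ValueError(f"unrecognized version line: {line!r}")
    numbers = tuple(int(match.group(i)) for i in (1, 2, 3))
    # The label (for example "release") is None when it is absent.
    return numbers, match.group(4)


if __name__ == "__main__":
    numbers, label = parse_version("Bitfusion version: 4.0.0 release")
    # Tuples compare element by element, so this works as a minimum-version check.
    print(numbers >= (4, 0, 0), label)
```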
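Displaying GPU information in vSphere Bitfusion
To display GPU information, run the bitfusion smi command. Alternatively, to get similar output, you can use the bitfusion run command to start the nvidia-smi application.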
+----------------------------------------------------------------------------------------+
| 172.16.31.162:56001                                          Driver Version: 460.73.01 |
+--------------------------------------+-------------------------+-----------------------+
| GPU  Name              Persistence-M | Virt Mem  Alloc / All   | BusId  Vol Uncorr ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap       | Phy Mem   Used  / All   | GPU-Util   Compute M. |
|======================================+=========================+=======================|
| 0    Tesla T4                Enabled |         0 MB / 15109 MB | 00000000:13:00.0    0 |
| 0 %  28C   P8    10W / 70W           |         3 MB / 15109 MB | 0%            Default |
+--------------------------------------+-------------------------+-----------------------+

+----------------------------------------------------------------------------------------+
| 172.16.31.156:56001                                          Driver Version: 460.73.01 |
+--------------------------------------+-------------------------+-----------------------+
| GPU  Name              Persistence-M | Virt Mem  Alloc / All   | BusId  Vol Uncorr ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap       | Phy Mem   Used  / All   | GPU-Util   Compute M. |
|======================================+=========================+=======================|
| 0    Tesla T4                Enabled |         0 MB / 15109 MB | 00000000:13:00.0    0 |
| 0 %  34C   P8    10W / 70W           |         3 MB / 15109 MB | 0%            Default |
+--------------------------------------+-------------------------+-----------------------+
Testing bandwidth in vSphere Bitfusion
To test the bandwidth and latency between the vSphere Bitfusion client and the servers, run the bitfusion net_perf command.
Single network interface
Displayed results are calculated from round-trip measurements
BW(1MB) = 1000/(LAT(1MB) - LAT(1B))
[ <client>] ens160 => [10.202.8.169] net1 ( tcp)
    Single packet lat = 51 us, bw(1MB) = 1.71 GB/s
[ <client>] ens160 => [10.202.8.185] net1 ( tcp)
    Single packet lat = 48 us, bw(1MB) = 1.09 GB/s
[ <client>] ens160 => [10.202.8.233] net1 ( tcp)
    Single packet lat = 50 us, bw(1MB) = 0.87 GB/s
Multiple network interfaces
Displayed results are calculated from round-trip measurements
BW(1MB) = 1000/(LAT(1MB) - LAT(1B))
[ <client>] ens160 => [10.202.8.169] net1 ( tcp)
    Single packet lat = 51 us, bw(1MB) = 1.71 GB/s
[ <client>] ens160 => [10.202.8.185] net1 ( tcp)
    Single packet lat = 48 us, bw(1MB) = 1.09 GB/s
[ <client>] ens160 => [10.202.8.233] net1 ( tcp)
    Single packet lat = 50 us, bw(1MB) = 0.87 GB/s
[ <client>] ens192f0 => [10.202.8.169] net2 ( tcp)
    Single packet lat = 47 us, bw(1MB) = 2.14 GB/s
[ <client>] ens192f0 => [10.202.8.185] net2 ( tcp)
    Single packet lat = 49 us, bw(1MB) = 1.11 GB/s
[ <client>] ens192f0 => [10.202.8.233] net2 ( tcp)
    Single packet lat = 50 us, bw(1MB) = 1.15 GB/s
[ <client>] vmw_pvrdma0 => [10.202.8.169] vmw_pvrdma0 (infiniband)
    Single packet lat = 19 us, bw(1MB) = 3.66 GB/s
    Single packet Write lat = 8 us, bw = 10.101 GB/s
[ <client>] vmw_pvrdma0 => [10.202.8.185] vmw_pvrdma0 (infiniband)
    Single packet lat = 21 us, bw(1MB) = 3.45 GB/s
    Single packet Write lat = 8 us, bw = 10.5263 GB/s
[ <client>] vmw_pvrdma0 => [10.202.8.233] vmw_pvrdma0 (infiniband)
    Single packet lat = 21 us, bw(1MB) = 3.46 GB/s
    Single packet Write lat = 8 us, bw = 10.4167 GB/s
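The formula printed at the top of the net_perf output can be checked by hand. The sketch below assumes that latencies are in microseconds and that the result is in GB/s, as the units in the sample output suggest. The function name and the input latencies are illustrative values, not measurements from the output above.

```python
def bandwidth_gb_per_s(lat_1mb_us, lat_1b_us):
    """Apply BW(1MB) = 1000 / (LAT(1MB) - LAT(1B)).

    Assumed units: latencies in microseconds, result in GB/s.
    Moving 10**6 bytes in d microseconds gives
    10**6 / (d * 10**-6) bytes/s = 1000 / d GB/s.
    """
    delta_us = lat_1mb_us - lat_1b_us
    if delta_us <= 0:
        raise ValueError("LAT(1MB) must be greater than LAT(1B)")
    return 1000.0 / delta_us


if __name__ == "__main__":
    # Hypothetical latencies: 51 us for a 1-byte packet, 636 us for 1 MB.
    print(round(bandwidth_gb_per_s(636, 51), 2))  # prints 1.71
```

Subtracting the single-byte latency removes the fixed per-packet cost, so the result reflects the transfer time of the payload itself rather than the connection overhead.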
Getting help in vSphere Bitfusion
To get the full list of vSphere Bitfusion CLI commands, or detailed information about a specific command, run the bitfusion help command.
NAME:
   Bitfusion - Run application with VMware Bitfusion

USAGE:
   bitfusion <command> <options> "application"
   bitfusion <command> <options> -- [application]
   bitfusion help [command]

   For more information, system requirements, and advanced usage please visit docs.bitfusion.io

COMMANDS:
     tls-certs, TC    Manage TLS certificates used by bitfusion server. Requires root privileges.
     version, v       Display full Bitfusion version
     localhealth, LH  Run health check on current node only
     dealloc          Deallocate license certificate. Requires root priviledges.
     crashreport      Send crash report to bitfusion
     list_gpus        List the available GPUs in a shared pool
     initdb           Init database setup
     token            Fetch and manipulate tokens
     register         Register remote server as the plugin
     unregister       Unregister remote plugin
     removenode       Remove unavailable nodes
     user             Manage bitfusion users
     help, h          Shows a list of commands or help for one command

   Client Commands:
     client, c        Run application
     health, H        Run health check on all specified servers and current node
     request_gpus     Request GPUs from a shared pool
     release_gpus     Release GPUs back into a shared pool. Options must match a previous request_gpus command
     run              Request GPUs from a shared pool, run a client command, then release the GPUs
     stats            Gather stats from all servers.
     smi              Display smi-like info for all servers.
     local            Run a CUDA application locally
     net_perf         Gather network performance data from all SRS servers.

   Server Commands:
     server, s                Run dispatcher service - listens for 'bitfusion client' commands
     resource_scheduler, srs  Run Bitfusion resource scheduler (SRS) on GPU server
     analytics                Run Bitfusion analytics server
     manager                  Run Bitfusion manager server

EXAMPLES:
   $ bitfusion resource_scheduler --srs_port 50001
   $ bitfusion run -n 4 -- <application>
   $ bitfusion request_gpus -n 1 -p 0.25