This section lists the vSphere Bitfusion CLI commands and the tasks that they perform. The VMware support team can provide additional CLI commands.

Allocate GPUs

To allocate GPUs for a single application, use the run command.

To allocate GPUs and start a session in which you can run multiple applications on the same GPUs, use the request_gpus command.

Start Applications That Access GPUs in the vSphere Bitfusion Environment

To start a single application, run the run command.

To start multiple applications in a session started with the request_gpus command, run the client command.

Deallocate the GPUs

To deallocate GPUs in a session started with the request_gpus command, run the release_gpus command.
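
The allocate, start, and deallocate steps above can be scripted. The following is a minimal Python sketch, not an official VMware tool. It assumes that request_gpus accepts the same -n option that the help output shows for the run command, and that the client command accepts the "-- application" form from the generic usage line. The helper names (build_run_command, build_session_commands, run_session) are hypothetical.

```python
import subprocess


def build_run_command(num_gpus, app_argv):
    """Command line for one application: allocate, run, release in one step."""
    return ["bitfusion", "run", "-n", str(num_gpus), "--", *app_argv]


def build_session_commands(num_gpus, apps):
    """Command lines for a session that shares the same GPUs across apps.

    Assumption: request_gpus takes -n like run does. The help text says that
    release_gpus options must match the earlier request_gpus call, so extra
    request options may need to be repeated on release.
    """
    request = ["bitfusion", "request_gpus", "-n", str(num_gpus)]
    clients = [["bitfusion", "client", "--", *argv] for argv in apps]
    release = ["bitfusion", "release_gpus"]
    return request, clients, release


def run_session(num_gpus, apps):
    """Run every app in one session and always release the GPUs afterwards."""
    request, clients, release = build_session_commands(num_gpus, apps)
    subprocess.run(request, check=True)
    try:
        for cmd in clients:
            subprocess.run(cmd, check=True)
    finally:
        # Release even if an application fails, so GPUs return to the pool.
        subprocess.run(release, check=True)
```

Calling run_session requires a configured vSphere Bitfusion client. The two builder functions only assemble argument lists, so they can be inspected or logged without touching the GPU pool.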

List Available GPUs

To verify a vSphere Bitfusion server installation and find a list of available GPUs, run the list_gpus command.

- server 0 [172.31.51.20:56001]: running 0 tasks
|- GPU 0: free memory 12000 MiB / 12000 MiB
|- GPU 1: free memory 12000 MiB / 12000 MiB
|- GPU 2: free memory 12000 MiB / 12000 MiB
|- GPU 3: free memory 12000 MiB / 12000 MiB
- server 1 [172.31.51.26:56003]: running 0 tasks
|- GPU 0: free memory 12000 MiB / 12000 MiB
|- GPU 1: free memory 12000 MiB / 12000 MiB
- server 2 [172.31.51.42:56003]: running 0 tasks
|- GPU 0: free memory 12000 MiB / 12000 MiB
|- GPU 1: free memory 12000 MiB / 12000 MiB
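
If a script needs to act on this listing, the text can be parsed. The following is a minimal Python sketch that assumes the output keeps the layout of the sample above. The function names parse_list_gpus and total_free_mib are hypothetical, and the format may differ between releases.

```python
import re

# Abbreviated copy of the sample list_gpus output shown above.
SAMPLE = """\
- server 0 [172.31.51.20:56001]: running 0 tasks
|- GPU 0: free memory 12000 MiB / 12000 MiB
|- GPU 1: free memory 12000 MiB / 12000 MiB
- server 1 [172.31.51.26:56003]: running 0 tasks
|- GPU 0: free memory 12000 MiB / 12000 MiB
"""

SERVER_RE = re.compile(
    r"^\s*-\s*server\s+(\d+)\s+\[([^\]]+)\]:\s+running\s+(\d+)\s+tasks"
)
GPU_RE = re.compile(
    r"^\s*\|-\s*GPU\s+(\d+):\s+free memory\s+(\d+)\s+MiB\s*/\s*(\d+)\s+MiB"
)


def parse_list_gpus(text):
    """Return one dict per server, each with a list of its GPUs."""
    servers = []
    for line in text.splitlines():
        m = SERVER_RE.match(line)
        if m:
            servers.append({
                "index": int(m.group(1)),
                "address": m.group(2),
                "tasks": int(m.group(3)),
                "gpus": [],
            })
            continue
        m = GPU_RE.match(line)
        if m and servers:
            # A GPU line belongs to the most recent server line.
            servers[-1]["gpus"].append({
                "index": int(m.group(1)),
                "free_mib": int(m.group(2)),
                "total_mib": int(m.group(3)),
            })
    return servers


def total_free_mib(servers):
    """Sum the free GPU memory across all servers."""
    return sum(g["free_mib"] for s in servers for g in s["gpus"])


servers = parse_list_gpus(SAMPLE)
print(len(servers), total_free_mib(servers))
```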

Run a Health Check

You can access the health check from the command line.
  • To check the health of all vSphere Bitfusion servers and the vSphere Bitfusion client, run bitfusion health.
  • To check the health of the vSphere Bitfusion client only, run bitfusion localhealth.

Request Help

To get the full list of vSphere Bitfusion CLI commands or more information about a specific command, run the help command.

NAME:
    bitfusion - Run application with VMware Bitfusion

USAGE:
   bitfusion <command> <options> "application"
   bitfusion <command> <options> -- [application]
   bitfusion help [command]

   For more information, system requirements, and advanced usage please visit docs.bitfusion.io

COMMANDS:
        tls-certs, TC    Manage TLS certificates used by bitfusion server.  Requires root privileges.
        version, v       Display full Bitfusion version
        localhealth, LH  Run health check on current node only
        dealloc          Deallocate license certificate.  Requires root priviledges.
        crashreport      Send crash report to bitfusion
        license          Check license status
        list_gpus        List the available GPUs in a shared pool
        initdb           Init database setup
        token            Fetch and manipulate tokens
        register         Register remote server as the plugin
        unregister       Unregister remote plugin
        removenode       Remove unavailable nodes
        user             Manage bitfusion users
        help, h          Shows a list of commands or help for one command
   Client Commands:
        client, c     Run application
        health, H     Run health check on all specified servers and current node
        request_gpus  Request GPUs from a shared pool
        release_gpus  Release GPUs back into a shared pool. Options must match a previous request_gpus command
        run           Request GPUs from a shared pool, run a client command, then release the GPUs
        stats         Gather stats from all servers.
        smi           Display smi-like info for all servers.
        local         Run a CUDA application locally
        net_perf      Gather network performance data from all SRS servers.
   Server Commands:
        server, s                Run dispatcher service - listens for 'bitfusion client' commands
        resource_scheduler, srs  Run Bitfusion resource scheduler (SRS) on GPU server
        analytics                Run Bitfusion analytics server
        manager                  Run Bitfusion manager server

EXAMPLES:
   $ sudo bitfusion init -l <license_key>

   $ bitfusion resource_scheduler --srs_port 50001

   $ bitfusion run -n 4 -- <application>

Check vSphere Bitfusion Version

To check the version of vSphere Bitfusion that is installed, run the version command.

Bitfusion version: 2.5.0 release
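
A script that depends on a minimum release can compare the reported version. The following is a minimal Python sketch that assumes the output line has the form shown above. The function name parse_version is hypothetical.

```python
import re


def parse_version(output):
    """Extract (major, minor, patch) from the version command output."""
    m = re.search(r"Bitfusion version:\s*(\d+)\.(\d+)\.(\d+)", output)
    if m is None:
        raise ValueError("no Bitfusion version found in output")
    return tuple(int(part) for part in m.groups())


version = parse_version("Bitfusion version: 2.5.0 release")
# Tuples compare element by element, so this is a numeric comparison.
print(version >= (2, 5, 0))
```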

Display GPU Information

To display GPU information, run the smi command. For similar output, you can instead start the nvidia-smi application with the run command.

+----------------------------------------------------------------------------------------+
| 172.16.31.243:56001                                          Driver Version: 440.64.00 |
+--------------------------------------+-------------------------+-----------------------+
| GPU  Name              Persistence-M | Virt Mem    Alloc / All | BusId  Vol Uncorr ECC |
| Fan  Temp  Perf        Pwr:Usage/Cap | Phy Mem     Used  / All | GPU-Util   Compute M. |
|======================================+=========================+=======================|
| 0    Tesla T4               Disabled | 0       MB / 15109   MB | 00000000:13:00.0    0 |
| 0 %   36C  P8             10W /  70W | 11      MB / 15109   MB |   0%          Default |
+--------------------------------------+-------------------------+-----------------------+
+----------------------------------------------------------------------------------------+
| 172.16.31.241:56001                                                                    |
+----------------------------------------------------------------------------------------+
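
To read memory figures from this table in a script, the two rows printed for each GPU can be paired. The following is a minimal Python sketch that assumes the table layout matches the sample above: a server address line, followed by two rows per GPU that carry the virtual and then the physical memory figures. The function name parse_smi is hypothetical.

```python
import re

# Relevant lines copied from the sample smi output shown above.
SAMPLE = """\
| 172.16.31.243:56001                                          Driver Version: 440.64.00 |
| GPU  Name              Persistence-M | Virt Mem    Alloc / All | BusId  Vol Uncorr ECC |
| 0    Tesla T4               Disabled | 0       MB / 15109   MB | 00000000:13:00.0    0 |
| 0 %   36C  P8             10W /  70W | 11      MB / 15109   MB |   0%          Default |
| 172.16.31.241:56001                                                                    |
"""

# A server block starts with "| <ip>:<port>".
HEADER_RE = re.compile(r"^\|\s*(\d{1,3}(?:\.\d{1,3}){3}:\d+)")
# Memory cells look like "11      MB / 15109   MB".
MEM_RE = re.compile(r"(\d+)\s+MB\s*/\s*(\d+)\s+MB")


def parse_smi(text):
    """Return one dict per server with per-GPU memory figures in MB."""
    servers = []
    for line in text.splitlines():
        header = HEADER_RE.match(line)
        if header:
            servers.append({"address": header.group(1), "rows": []})
            continue
        mem = MEM_RE.search(line)
        if mem and servers:
            servers[-1]["rows"].append((int(mem.group(1)), int(mem.group(2))))
    for server in servers:
        rows = server.pop("rows")
        # Rows alternate: virtual memory first, physical memory second.
        server["gpus"] = [
            {"virt_alloc_mb": virt[0], "phys_used_mb": phys[0], "total_mb": phys[1]}
            for virt, phys in zip(rows[0::2], rows[1::2])
        ]
    return servers


for server in parse_smi(SAMPLE):
    print(server["address"], server["gpus"])
```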

Test the Bandwidth

To test the bandwidth and latency between the vSphere Bitfusion client and servers, run the net_perf command.

Single network interface
Displayed results are calculated from round-trip measurements
BW(1MB) = 1000/(LAT(1MB) - LAT(1B))

[ <client>] ens160 => [10.202.8.169] net1 ( tcp) Single packet lat = 51 us, bw(1MB) = 1.71 GB/s
[ <client>] ens160 => [10.202.8.185] net1 ( tcp) Single packet lat = 48 us, bw(1MB) = 1.09 GB/s
[ <client>] ens160 => [10.202.8.233] net1 ( tcp) Single packet lat = 50 us, bw(1MB) = 0.87 GB/s

Multiple network interfaces
Displayed results are calculated from round-trip measurements
BW(1MB) = 1000/(LAT(1MB) - LAT(1B))

[ <client>] ens160 => [10.202.8.169] net1 ( tcp) Single packet lat = 51 us, bw(1MB) = 1.71 GB/s
[ <client>] ens160 => [10.202.8.185] net1 ( tcp) Single packet lat = 48 us, bw(1MB) = 1.09 GB/s
[ <client>] ens160 => [10.202.8.233] net1 ( tcp) Single packet lat = 50 us, bw(1MB) = 0.87 GB/s
[ <client>] ens192f0 => [10.202.8.169] net2 ( tcp) Single packet lat = 47 us, bw(1MB) = 2.14 GB/s
[ <client>] ens192f0 => [10.202.8.185] net2 ( tcp) Single packet lat = 49 us, bw(1MB) = 1.11 GB/s
[ <client>] ens192f0 => [10.202.8.233] net2 ( tcp) Single packet lat = 50 us, bw(1MB) = 1.15 GB/s
[ <client>] vmw_pvrdma0 => [10.202.8.169] vmw_pvrdma0 (infiniband) Single packet lat = 19 us, bw(1MB) = 3.66 GB/s Single packet Write lat = 8 us, bw = 10.101 GB/s
[ <client>] vmw_pvrdma0 => [10.202.8.185] vmw_pvrdma0 (infiniband) Single packet lat = 21 us, bw(1MB) = 3.45 GB/s Single packet Write lat = 8 us, bw = 10.5263 GB/s
[ <client>] vmw_pvrdma0 => [10.202.8.233] vmw_pvrdma0 (infiniband) Single packet lat = 21 us, bw(1MB) = 3.46 GB/s Single packet Write lat = 8 us, bw = 10.4167 GB/s
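
To compare links in a script, the result lines can be parsed and the fastest interface chosen for each server. The following is a minimal Python sketch that assumes the line format of the samples above. It also restates the printed formula under the assumption that latencies are in microseconds and bandwidth is in GB/s, which is consistent with the units in the output. The function names are hypothetical.

```python
import re

# A few lines copied from the sample net_perf output shown above.
SAMPLE = """\
[ <client>] ens160 => [10.202.8.169] net1 ( tcp) Single packet lat = 51 us, bw(1MB) = 1.71 GB/s
[ <client>] ens192f0 => [10.202.8.169] net2 ( tcp) Single packet lat = 47 us, bw(1MB) = 2.14 GB/s
[ <client>] vmw_pvrdma0 => [10.202.8.169] vmw_pvrdma0 (infiniband) Single packet lat = 19 us, bw(1MB) = 3.66 GB/s Single packet Write lat = 8 us, bw = 10.101 GB/s
[ <client>] ens160 => [10.202.8.185] net1 ( tcp) Single packet lat = 48 us, bw(1MB) = 1.09 GB/s
"""

LINE_RE = re.compile(
    r"\]\s+(?P<client_if>\S+)\s+=>\s+\[(?P<server>[^\]]+)\]\s+(?P<server_if>\S+)\s+"
    r"\(\s*(?P<transport>\w+)\)\s+Single packet lat = (?P<lat_us>\d+) us, "
    r"bw\(1MB\) = (?P<bw>[\d.]+) GB/s"
)


def parse_net_perf(text):
    """Return one dict per result line; other lines are skipped."""
    links = []
    for line in text.splitlines():
        m = LINE_RE.search(line)
        if m:
            links.append({
                "client_if": m.group("client_if"),
                "server": m.group("server"),
                "server_if": m.group("server_if"),
                "transport": m.group("transport"),
                "lat_us": int(m.group("lat_us")),
                "bw_gb_s": float(m.group("bw")),
            })
    return links


def best_link_per_server(links):
    """Map each server address to its highest-bandwidth link."""
    best = {}
    for link in links:
        current = best.get(link["server"])
        if current is None or link["bw_gb_s"] > current["bw_gb_s"]:
            best[link["server"]] = link
    return best


def bandwidth_gb_per_s(lat_1mb_us, lat_1b_us):
    """BW(1MB) = 1000 / (LAT(1MB) - LAT(1B)), latencies in microseconds."""
    delta = lat_1mb_us - lat_1b_us
    if delta <= 0:
        raise ValueError("LAT(1MB) must exceed LAT(1B)")
    return 1000.0 / delta


for server, link in best_link_per_server(parse_net_perf(SAMPLE)).items():
    print(server, link["client_if"], link["bw_gb_s"])
```

The factor 1000 follows from the units: 1 MB transferred in one microsecond is 10^6 bytes per 10^-6 seconds, or 1000 GB/s.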