As a data scientist, you can use Automation Service Broker to deploy deep learning virtual machines for AI development.

When you request an AI workstation (VM) in the Automation Service Broker catalog, you provision a GPU-enabled deep learning VM that you can configure with the desired vCPU count, vGPU profile, memory, and AI/ML NGC containers from NVIDIA.

Deploy a deep learning virtual machine to a VI workload domain

As a data scientist, you can deploy a single GPU software-defined development environment from the self-service Automation Service Broker catalog.

You can customize the GPU-enabled virtual machine with machine parameters that match your model development requirements, pre-install AI/ML frameworks such as PyTorch, TensorFlow, and CUDA to meet training and inference requirements, and specify the AI/ML packages to pull from the NVIDIA NGC registry by using a portal access key.

Procedure

  1. On the Catalog page in Automation Service Broker, locate the AI Workstation card and click Request.
  2. Select a project.
  3. Enter a name and description for your deployment.
  4. Configure the AI workstation parameters.
    Setting and sample value:
    - VM class: A100 Small - 1 vGPU (16 GB), 8 CPUs and 16 GB Memory
    - Data disk size: 32 GB
    - User password: Enter a password for the default user. You might be prompted to reset your password when you first log in.
    - SSH public key: This setting is optional.
  5. Select a software bundle to install on your workstation.
    Software bundle and description:
    - PyTorch: The PyTorch NGC Container is optimized for GPU acceleration and contains a validated set of libraries that enable and optimize GPU performance. The container also includes software for accelerating ETL (DALI, RAPIDS), training (cuDNN, NCCL), and inference (TensorRT) workloads.
    - TensorFlow: The TensorFlow NGC Container is optimized for GPU acceleration and contains a validated set of libraries that enable and optimize GPU performance. The container may also contain modifications to the TensorFlow source code to maximize performance and compatibility, and it includes software for accelerating ETL (DALI, RAPIDS), training (cuDNN, NCCL), and inference (TensorRT) workloads.
    - CUDA Samples: A collection of containers for running CUDA workloads on the GPUs, including containerized CUDA samples such as vectorAdd (demonstrates vector addition) and nbody (a gravitational n-body simulation). You can use these containers to validate the software configuration of the GPUs in the system or simply to run example workloads (see the validation sketch in the Results section below).
  6. (Optional) Enter a custom cloud-init that you want to install in addition to the cloud-init defined for the software bundle.
    VMware Aria Automation merges the cloud-init from the software bundle and the custom cloud-init.
  7. Click Submit.

Results

The deployment Overview tab contains a summary of the software that was installed, along with instructions on how to access the application, services, and the workstation VM.
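
If you selected the CUDA Samples software bundle, you can validate the GPU configuration of the deployed workstation by running one of the sample containers over SSH. The following is a minimal sketch; the image path and tag are placeholders, so look up the current path and tag for the CUDA vectorAdd sample in the NVIDIA NGC catalog before running it.

    # Confirm that the vGPU driver is loaded and the GPU is visible.
    nvidia-smi
    # Run the containerized vectorAdd sample. Replace registry-URI-path/cuda-sample-image
    # and ngc_image_tag with the values from the NVIDIA NGC catalog.
    docker run --rm --gpus all registry-URI-path/cuda-sample-image:ngc_image_tag

If the container can access the GPU, the vectorAdd sample completes with a success message.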

Add DCGM Exporter for DL workload monitoring

You can use DCGM Exporter for monitoring a deep learning workload that uses GPU capacity.

DCGM Exporter is an exporter for Prometheus that monitors GPU health and collects GPU metrics. It leverages DCGM using Go bindings to collect GPU telemetry and exposes the metrics to Prometheus through an HTTP endpoint (/metrics). DCGM Exporter can run standalone or be deployed as part of the NVIDIA GPU Operator.

Before you begin

Verify that you have successfully deployed a deep learning VM.

Procedure

  1. Log in to the deep learning VM over SSH.

    For PyTorch and TensorFlow, log in from the JupyterLab notebook.

  2. Run the DCGM Exporter container by using the following command.
    docker run -d --gpus all --cap-add SYS_ADMIN --rm -p 9400:9400 registry-URI-path/nvidia/k8s/dcgm-exporter:ngc_image_tag
    For example, to run dcgm-exporter:3.2.5-3.1.8-ubuntu22.04 from the NVIDIA NGC catalog, run the following command:
    docker run -d --gpus all --cap-add SYS_ADMIN --rm -p 9400:9400 nvcr.io/nvidia/k8s/dcgm-exporter:3.2.5-3.1.8-ubuntu22.04
    To confirm that the exporter is serving metrics, see the verification sketch after this procedure.
  3. After the DCGM Exporter installation is complete, visualize vGPU metrics in Prometheus and Grafana.
    1. Install Prometheus and Grafana.
    2. View vGPU Metrics in Prometheus.
    3. Visualize Metrics in Grafana.
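
To confirm that the DCGM Exporter container started in step 2 is serving metrics before you connect Prometheus, you can run a quick check from the deep learning VM. This minimal sketch assumes the default port mapping (9400) used in the command above.

    # Query the exporter's metrics endpoint; the response lists GPU telemetry
    # in Prometheus exposition format, for example DCGM_FI_DEV_GPU_UTIL.
    curl http://localhost:9400/metrics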

Deploy a GPU-enabled workstation with NVIDIA Triton Inference Server

As a data scientist, you can deploy a GPU-enabled workstation with NVIDIA Triton Inference Server from the self-service Automation Service Broker catalog.

NVIDIA Triton Inference Server provides a cloud and edge inferencing solution optimized for both CPUs and GPUs. Triton Inference Server supports HTTP/REST and gRPC protocols that allow remote clients to request inferencing for a variety of machine learning frameworks, including TensorFlow, PyTorch, and others. For edge deployments, Triton Inference Server is available as a shared library with a C API that allows the full functionality of Triton to be included directly in an application.
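
For example, after the workstation is deployed, a remote client can use the HTTP/REST protocol to confirm that the server is ready before sending inference requests. This is a minimal sketch that assumes Triton Inference Server is listening on its default HTTP port, 8000; replace workstation-ip with the address of your deployment.

    # Returns HTTP status 200 when the server is ready to accept inference requests.
    curl -v http://workstation-ip:8000/v2/health/ready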

The deployed workstation includes Ubuntu 22.04, an NVIDIA vGPU driver, Docker Engine, NVIDIA Container Toolkit, and NVIDIA Triton Inference Server.

Procedure

  1. On the Catalog page in Automation Service Broker, locate the Triton Inferencing Server card and click Request.
  2. Select a project.
  3. Enter a name and description for your deployment.
  4. Configure the AI workstation parameters.
    Setting and sample value:
    - VM class: A100 Small - 1 vGPU (16 GB), 8 CPUs and 16 GB Memory
      Note: VM classes with Unified Virtual Memory (UVM) support are required for running NVIDIA Triton Inference Server.
    - Data disk size: 32 GB
    - User password: Enter a password for the default user. You might be prompted to reset your password when you first log in.
    - SSH public key: This setting is optional.
  5. (Optional) Enter a custom cloud-init that you want to install in addition to the cloud-init defined for the software bundle.
    VMware Aria Automation merges the cloud-init from the software bundle and the custom cloud-init.
  6. Click Submit.