You can provision a deep learning virtual machine with a supported deep learning (DL) workload in addition to its embedded components. The DL workloads are downloaded from the NVIDIA NGC catalog and are GPU-optimized and validated by NVIDIA and VMware by Broadcom.
For an overview of the deep learning VM images, see About Deep Learning VM Images in VMware Private AI Foundation with NVIDIA.
CUDA Sample
You can use a deep learning VM running CUDA samples to explore vector addition, gravitational n-body simulation, and other examples on a VM. See the CUDA Samples page.
After the deep learning VM is launched, it runs a CUDA sample workload to test the vGPU guest driver. You can examine the test output in the /var/log/dl.log file.
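For a quick pass/fail check after the VM starts, you can inspect the end of that log. This is a minimal sketch; the exact log content depends on which CUDA sample image the VM runs (the vectoradd sample, for example, reports whether the test passed).

# Show the most recent output of the CUDA sample workload.
# With the vectoradd sample, a successful run typically reports that the test passed.
tail -n 50 /var/log/dl.log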
Component | Description |
---|---|
Container image | nvcr.io/nvidia/k8s/cuda-sample:ngc_image_tag. For example: nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda11.7.1-ubi8. For information on the CUDA Sample container images that are supported for deep learning VMs, see VMware Deep Learning VM Release Notes. |
Required inputs | To deploy a CUDA Sample workload, you must set the OVF properties for the deep learning virtual machine in the following way: |
Output | |
PyTorch
You can use a deep learning VM with a PyTorch library to explore conversational AI, NLP, and other types of AI models on a VM. See the PyTorch page.
After the deep learning VM is launched, it starts a JupyterLab instance with PyTorch packages installed and configured.
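As a quick sanity check, you can confirm from a JupyterLab terminal that PyTorch sees the vGPU. This is a minimal sketch assuming the default container setup; the reported version and device availability depend on your image and vGPU profile.

# Print the PyTorch version and whether a CUDA-capable (vGPU) device is visible.
python3 -c "import torch; print(torch.__version__, torch.cuda.is_available())"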
Component | Description |
---|---|
Container image | nvcr.io/nvidia/pytorch:ngc_image_tag. For example: nvcr.io/nvidia/pytorch:23.10-py3. For information on the PyTorch container images that are supported for deep learning VMs, see VMware Deep Learning VM Release Notes. |
Required inputs | To deploy a PyTorch workload, you must set the OVF properties for the deep learning virtual machine in the following way: |
Output | |
TensorFlow
You can use a deep learning VM with a TensorFlow library to explore conversational AI, NLP, and other types of AI models on a VM. See the TensorFlow page.
After the deep learning VM is launched, it starts a JupyterLab instance with TensorFlow packages installed and configured.
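Similarly, you can verify from a JupyterLab terminal that TensorFlow detects the vGPU. A minimal check, assuming the default container setup:

# List the GPU devices visible to TensorFlow; an empty list means the vGPU is not detected.
python3 -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"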
Component | Description |
---|---|
Container image | nvcr.io/nvidia/tensorflow:ngc_image_tag. For example: nvcr.io/nvidia/tensorflow:23.10-tf2-py3. For information on the TensorFlow container images that are supported for deep learning VMs, see VMware Deep Learning VM Release Notes. |
Required inputs | To deploy a TensorFlow workload, you must set the OVF properties for the deep learning virtual machine in the following way: |
Output | |
DCGM Exporter
You can use a deep learning VM with a Data Center GPU Manager (DCGM) Exporter to monitor the health of, and get metrics from, the GPUs used by a DL workload, by using NVIDIA DCGM, Prometheus, and Grafana.
See the DCGM Exporter page.
In a deep learning VM, you run the DCGM Exporter container together with a DL workload that performs AI operations. After the deep learning VM is started, DCGM Exporter is ready to collect vGPU metrics and export the data to another application for further monitoring and visualization. You can run the monitored DL workload as a part of the cloud-init process or from the command line after the virtual machine is started.
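To confirm that DCGM Exporter is serving metrics before you wire up Prometheus, you can query its metrics endpoint locally. A minimal check, assuming the default port 9400 that is also used later in this procedure:

# Query the DCGM Exporter metrics endpoint and show the GPU utilization samples.
curl -s http://localhost:9400/metrics | grep DCGM_FI_DEV_GPU_UTIL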
Component | Description |
---|---|
Container image | nvcr.io/nvidia/k8s/dcgm-exporter:ngc_image_tag. For example: nvcr.io/nvidia/k8s/dcgm-exporter:3.2.5-3.1.8-ubuntu22.04. For information on the DCGM Exporter container images that are supported for deep learning VMs, see VMware Deep Learning VM Release Notes. |
Required inputs | To deploy a DCGM Exporter workload, you must set the OVF properties for the deep learning virtual machine in the following way: |
Output | Next, in the deep learning VM, you run a DL workload, and visualize the data on another virtual machine by using Prometheus at http://visualization_vm_ip:9090 and Grafana at http://visualization_vm_ip:3000. |
Run a DL Workload on the Deep Learning VM
Run the DL workload you want to collect vGPU metrics for and export the data to another application for further monitoring and visualization.
- Log in to the deep learning VM as vmware over SSH.
- Add the vmware user account to the docker group by running the following command. Log out and log back in so that the new group membership takes effect.
sudo usermod -aG docker ${USER}
- Run the container for the DL workload, pulling it from the NVIDIA NGC catalog or from a local container registry.
For example, to run the tensorflow:23.10-tf2-py3 image from NVIDIA NGC, run the following command:
docker run -d -p 8888:8888 nvcr.io/nvidia/tensorflow:23.10-tf2-py3 /usr/local/bin/jupyter lab --allow-root --ip=* --port=8888 --no-browser --NotebookApp.token='' --NotebookApp.allow_origin='*' --notebook-dir=/workspace
- Start using the DL workload for AI development.
Install Prometheus and Grafana
You can visualize and monitor the vGPU metrics from the DCGM Exporter virtual machine on a virtual machine running Prometheus and Grafana.
- Create a visualization VM with Docker Community Engine installed.
- Connect to the VM over SSH and create a YAML file for Prometheus.
$ cat > prometheus.yml << EOF
global:
  scrape_interval: 15s
  external_labels:
    monitor: 'codelab-monitor'

scrape_configs:
  - job_name: 'dcgm'
    scrape_interval: 5s
    metrics_path: /metrics
    static_configs:
      - targets: ['dl_vm_with_dcgm_exporter_ip:9400']
EOF
- Create a data path.
$ mkdir grafana_data prometheus_data && chmod 777 grafana_data prometheus_data
- Create a Docker compose file to install Prometheus and Grafana.
$ cat > compose.yaml << EOF
services:
  prometheus:
    image: prom/prometheus:v2.47.2
    container_name: "prometheus0"
    restart: always
    ports:
      - "9090:9090"
    volumes:
      - "./prometheus.yml:/etc/prometheus/prometheus.yml"
      - "./prometheus_data:/prometheus"
  grafana:
    image: grafana/grafana:10.2.0-ubuntu
    container_name: "grafana0"
    ports:
      - "3000:3000"
    restart: always
    volumes:
      - "./grafana_data:/var/lib/grafana"
EOF
- Start the Prometheus and Grafana containers.
$ sudo docker compose up -d
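You can optionally confirm that both containers are running and that Prometheus is ready before moving on. A minimal check, assuming the ports from the compose file above:

# List the containers started by the compose file.
$ sudo docker compose ps

# Prometheus reports readiness on its /-/ready endpoint.
$ curl -s http://localhost:9090/-/ready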
View vGPU Metrics in Prometheus
You can access Prometheus at http://visualization-vm-ip:9090. You can view the following vGPU information in the Prometheus UI:
Information | UI Section |
---|---|
Raw vGPU metrics from the deep learning VM | To view the raw vGPU metrics from the deep learning VM, click the endpoint entry. |
Graph expressions | |
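As an example of a graph expression, you can plot GPU utilization in the Graph tab, or query it through the Prometheus HTTP API. The metric name below is a standard DCGM Exporter metric; adjust it to the metrics your exporter actually reports.

# Query average vGPU utilization through the Prometheus HTTP API.
$ curl -s 'http://visualization-vm-ip:9090/api/v1/query?query=avg(DCGM_FI_DEV_GPU_UTIL)'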
For more information on using Prometheus, see the Prometheus documentation.
Visualize Metrics in Grafana
Set Prometheus as a data source for Grafana and visualize the vGPU metrics from the deep learning VM in a dashboard.
- Access Grafana at http://visualization-vm-ip:3000 by using the default user name admin and password admin.
- Add Prometheus as the first data source, connecting to visualization-vm-ip on port 9090.
- Create a dashboard with the vGPU metrics.
For more information on configuring a dashboard using a Prometheus data source, see the Grafana documentation.
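If you prefer to script the data source instead of adding it in the Grafana UI, you can create it through the Grafana HTTP API. A minimal sketch, assuming the default admin/admin credentials and the Prometheus address used earlier:

# Register Prometheus as a Grafana data source through the HTTP API.
$ curl -s -X POST http://admin:admin@visualization-vm-ip:3000/api/datasources \
  -H 'Content-Type: application/json' \
  -d '{"name":"Prometheus","type":"prometheus","url":"http://visualization-vm-ip:9090","access":"proxy","isDefault":true}'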
Triton Inference Server
You can use a deep learning VM with a Triton Inference Server to load a model repository and receive inference requests.
See the Triton Inference Server page.
Component | Description |
---|---|
Container image | nvcr.io/nvidia/tritonserver:ngc_image_tag. For example: nvcr.io/nvidia/tritonserver:23.10-py3. For information on the Triton Inference Server container images that are supported for deep learning VMs, see VMware Deep Learning VM Release Notes. |
Required inputs | To deploy a Triton Inference Server workload, you must set the OVF properties for the deep learning virtual machine in the following way: |
Output | The model repository for the Triton Inference Server is in /home/vmware/model_repository. Initially, the model repository is empty and the initial log of the Triton Inference Server instance shows that no model is loaded. |
Create a Model Repository
To load your model for model inference, perform these steps:
- Create the model repository for your model.
See the NVIDIA Triton Inference Server Model Repository documentation.
- Copy the model repository to /home/vmware/model_repository so that the Triton Inference Server can load it.
sudo cp -r path_to_your_created_model_repository/* /home/vmware/model_repository/
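The layout of the copied repository must follow the Triton model repository conventions. The sketch below uses simple_sequence and model.savedmodel as placeholder names; the actual model name, backend, and file names depend on your model.

model_repository/
└── simple_sequence/            # model name used in inference requests
    ├── config.pbtxt            # model configuration: backend, inputs, outputs
    └── 1/                      # numeric version directory
        └── model.savedmodel/   # model artifacts; format depends on the backend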
Send Model Inference Requests
- Verify that the Triton Inference Server is healthy and ready to serve models by running this command in the deep learning VM console.
curl -v localhost:8000/v2/health/ready
- Check that your model is loaded by requesting its metadata, for example for a model named simple_sequence, by running this command on the deep learning VM.
curl -v localhost:8000/v2/models/simple_sequence
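To run actual inference rather than retrieve metadata, send a POST request to the model's infer endpoint. The tensor name, shape, datatype, and data below are placeholders; they must match the inputs declared in your model's config.pbtxt.

# Send an inference request using the KServe v2 HTTP protocol (placeholder input tensor).
curl -v -X POST localhost:8000/v2/models/simple_sequence/infer \
  -H 'Content-Type: application/json' \
  -d '{"inputs":[{"name":"INPUT0","shape":[1,16],"datatype":"INT32","data":[0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15]}]}'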
For more information on using the Triton Inference Server, see the NVIDIA Triton Inference Server Model Repository documentation.
NVIDIA RAG
You can use a deep learning VM to build Retrieval Augmented Generation (RAG) solutions with a Llama2 model.
See the AI Chatbot with Retrieval Augmented Generation documentation.
Component | Description |
---|---|
Container images and models | rag-app-text-chatbot.yaml in the NVIDIA sample RAG pipeline. For information on the NVIDIA RAG container applications that are supported for deep learning VMs, see VMware Deep Learning VM Release Notes. |
Required inputs | To deploy an NVIDIA RAG workload, you must set the OVF properties for the deep learning virtual machine in the following way: |
Output | |