The deep learning virtual machine images delivered as part of VMware Private AI Foundation with NVIDIA are preconfigured with popular ML libraries, frameworks, and toolkits, and are optimized and validated by NVIDIA and VMware for GPU acceleration in a VMware Cloud Foundation environment.
As a data scientist, you can use the deep learning VMs provisioned from these images for AI prototyping, fine-tuning, validation, and inference.
The software stack for running AI applications on top of NVIDIA GPUs is validated in advance. As a result, you can start AI development right away, without spending time installing and validating the compatibility of operating systems, software libraries, ML frameworks, toolkits, and GPU drivers.
What Does a Deep Learning VM Image Contain?
The latest deep learning virtual machine image contains the following software. For information on the component versions in each deep learning VM image release, see VMware Deep Learning VM Release Notes.
Software Component Category | Software Component
---|---
Embedded |
Can be pre-installed automatically when you start the deep learning VM for the first time |
Deep learning (DL) workloads | CUDA Sample. You can use a deep learning VM with CUDA samples running on it to explore vector addition, gravitational n-body simulation, and other examples. See the CUDA Samples page in the NVIDIA NGC catalog.
| PyTorch. You can use a deep learning VM with the PyTorch library to explore conversational AI, NLP, and other types of AI models. See the PyTorch page in the NVIDIA NGC catalog. A ready JupyterLab instance with PyTorch installed and configured is available at http://dl_vm_ip:8888. For a minimal GPU smoke test, see the PyTorch sketch after this table.
| TensorFlow. You can use a deep learning VM with the TensorFlow library to explore conversational AI, NLP, and other types of AI models. See the TensorFlow page in the NVIDIA NGC catalog. A ready JupyterLab instance with TensorFlow installed and configured is available at http://dl_vm_ip:8888. For a minimal GPU smoke test, see the TensorFlow sketch after this table.
| DCGM Exporter. You can use a deep learning VM with a Data Center GPU Manager (DCGM) exporter to monitor the health of, and get metrics from, the GPUs used by a DL workload, using NVIDIA DCGM, Prometheus, and Grafana. See the DCGM Exporter page in the NVIDIA NGC catalog. In a deep learning VM, you run the DCGM Exporter container together with a DL workload that performs AI operations. After the deep learning VM is started, DCGM Exporter is ready to collect vGPU metrics and export the data to another application for further monitoring and visualization (see the scraping sketch after this table). For information on how to use DCGM Exporter to visualize metrics with Prometheus and Grafana, see DCGM Exporter Installation and Initial Setup.
| Triton Inference Server. You can use a deep learning VM with a Triton Inference Server to load a model repository and receive inference requests (see the client sketch after this table). See the Triton Inference Server page in the NVIDIA NGC catalog. For information on how to use Triton Inference Server to serve inference requests for AI models, see Triton Inference Server Installation and Initial Setup.
| NVIDIA RAG. You can use a deep learning VM to build Retrieval-Augmented Generation (RAG) solutions with an LLM model. See the NVIDIA RAG Applications Docker Compose documentation (requires specific account permissions). A sample chatbot Web application is available at http://dl_vm_ip:3001/orgs/nvidia/models/text-qa-chatbot, where you can upload your own knowledge base.
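As a quick way to confirm that the PyTorch image sees the vGPU, you can run a minimal smoke test from the JupyterLab instance. This is a sketch only: the matrix sizes are arbitrary, and the CPU fallback covers the case where no GPU is attached to the VM.

```python
import torch

# Check that an NVIDIA vGPU is visible to PyTorch inside the deep learning VM.
if torch.cuda.is_available():
    device = torch.device("cuda")
    print(f"GPU detected: {torch.cuda.get_device_name(0)}")
else:
    device = torch.device("cpu")
    print("No GPU detected; falling back to CPU.")

# Run a small matrix multiplication on the selected device as a smoke test.
a = torch.randn(1024, 1024, device=device)
b = torch.randn(1024, 1024, device=device)
c = a @ b
print(f"Result: shape {tuple(c.shape)} on {c.device}")
```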
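The equivalent TensorFlow smoke test is sketched below. Again, the matrix sizes are arbitrary; TensorFlow places the operation on the GPU automatically when one is visible.

```python
import tensorflow as tf

# List the GPUs that TensorFlow can see inside the deep learning VM.
gpus = tf.config.list_physical_devices("GPU")
print(f"GPUs visible to TensorFlow: {gpus}")

# Run a small matrix multiplication as a smoke test; it lands on the GPU
# automatically when one is available.
a = tf.random.normal((1024, 1024))
b = tf.random.normal((1024, 1024))
c = tf.matmul(a, b)
print(f"Result shape: {c.shape}")
```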
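Because DCGM Exporter publishes metrics in the Prometheus text format, any HTTP client can scrape them. The following sketch assumes the dcgm-exporter defaults (port 9400, path /metrics); dl_vm_ip is a placeholder for your VM address, and your deployment may expose a different port.

```python
import urllib.request

# Placeholder address; port 9400 and path /metrics are the dcgm-exporter
# defaults and may differ in your deployment.
DCGM_METRICS_URL = "http://dl_vm_ip:9400/metrics"

with urllib.request.urlopen(DCGM_METRICS_URL, timeout=10) as response:
    metrics = response.read().decode("utf-8")

# Print GPU utilization samples from the Prometheus-format output.
# DCGM_FI_DEV_GPU_UTIL is a standard dcgm-exporter field name.
for line in metrics.splitlines():
    if line.startswith("DCGM_FI_DEV_GPU_UTIL"):
        print(line)
```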
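To check from a client machine that Triton Inference Server is up and that your model repository has loaded, you can use the tritonclient package (installable with pip install tritonclient[http]). A minimal sketch, assuming Triton's default HTTP port 8000; dl_vm_ip is a placeholder and my_model is a hypothetical model name.

```python
import tritonclient.http as httpclient

# dl_vm_ip is a placeholder for the VM address; 8000 is Triton's default
# HTTP port. The model name "my_model" is hypothetical.
client = httpclient.InferenceServerClient(url="dl_vm_ip:8000")

print("Server live: ", client.is_server_live())
print("Server ready:", client.is_server_ready())
print("Model ready: ", client.is_model_ready("my_model"))
```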
Deploying a Deep Learning VM
As a data scientist or MLOps engineer, you can deploy a deep learning VM on your own by using catalog items in VMware Aria Automation. Alternatively, a cloud administrator or DevOps engineer can deploy such a VM for you.