When you deploy a deep learning VM in vSphere IaaS control plane by using kubectl or directly on a vSphere cluster, you must fill in custom VM properties.
For information about deep learning VM images in VMware Private AI Foundation with NVIDIA, see About Deep Learning VM Images in VMware Private AI Foundation with NVIDIA.
OVF Properties of Deep Learning VMs
When you deploy a deep learning VM, you must fill in custom VM properties to automate the configuration of the Linux operating system, the deployment of the vGPU guest driver, and the deployment and configuration of NGC containers for the DL workloads.
The latest deep learning VM image has the following OVF properties:
Category | Parameter | Label in the vSphere Client | Description |
---|---|---|---|
Base OS Properties | instance-id | Instance ID | Required. A unique instance ID for the VM instance. An instance ID uniquely identifies an instance. When an instance ID changes, cloud-init treats the instance as a new instance and runs the cloud-init process again. |
 | hostname | Hostname | Required. The host name of the appliance. |
 | seedfrom | URL to seed instance data from | Optional. A URL to pull the value for the user-data parameter and metadata from. |
 | public-keys | SSH public key | If provided, the instance populates the default user's SSH authorized_keys with this value. |
 | user-data | Encoded user-data | A set of scripts or other metadata that is inserted into the VM at provisioning time. This property is the actual contents of the cloud-init script. This value must be base64 encoded. |
 | password | Default user's password | Required. The password for the default vmware user account. |
vGPU Driver Installation | vgpu-license | vGPU license | Required. The NVIDIA vGPU client configuration token. The token is saved in the /etc/nvidia/ClientConfigToken/client_configuration_token.tok file. |
 | nvidia-portal-api-key | NVIDIA Portal API key | Required in a connected environment. The API key you downloaded from the NVIDIA Licensing Portal. The key is required for vGPU guest driver installation. |
 | vgpu-host-driver-version | vGPU host driver version | Install this version of the vGPU guest driver directly. |
 | vgpu-url | URL for air-gapped vGPU downloads | Required in a disconnected environment. The URL to download the vGPU guest driver from. For information on the required configuration of the local Web server, see Preparing VMware Cloud Foundation for Private AI Workload Deployment. |
DL Workload Automation | registry-uri | Registry URI | Required in a disconnected environment or if you plan to use a private container registry to avoid downloading images from the Internet. The URI of a private container registry with the deep learning workload container images. Required if you are referring to a private registry in user-data or image-oneliner. |
 | registry-user | Registry username | Required if you are using a private container registry that requires basic authentication. |
 | registry-passwd | Registry password | Required if you are using a private container registry that requires basic authentication. |
 | registry-2-uri | Secondary registry URI | Required if you are using a second private container registry that is based on Docker and requires basic authentication. For example, when deploying a deep learning VM with the NVIDIA RAG DL workload pre-installed, a pgvector image is downloaded from Docker Hub. You can use the registry-2-uri, registry-2-user, and registry-2-passwd properties to pull that image from a second private registry instead. |
 | registry-2-user | Secondary registry username | Required if you are using a second private container registry. |
 | registry-2-passwd | Secondary registry password | Required if you are using a second private container registry. |
 | image-oneliner | Encoded one-line command | A one-line bash command that is run at VM provisioning. This value must be base64 encoded. You can use this property to specify the DL workload container you want to deploy, such as PyTorch or TensorFlow. See Deep Learning Workloads in VMware Private AI Foundation with NVIDIA. Caution: Avoid using both user-data and image-oneliner. |
 | docker-compose-uri | Encoded Docker compose file | Required if you need a Docker compose file to start the DL workload container. The contents of the docker-compose.yaml file that is inserted into the virtual machine at provisioning time, after the virtual machine is started with GPU enabled. This value must be base64 encoded. |
 | config-json | Encoded config.json | The contents of a configuration file for adding additional details, such as proxy server settings. See Configure a Deep Learning VM with a Proxy Server. This value must be base64 encoded. |
 | conda-environment-install | Conda Environment Install | A comma-separated list of Conda environments to be automatically installed after VM deployment is complete. Available environments include pytorch2.3_py3.12. |
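Several of these values, including user-data, image-oneliner, docker-compose-uri, and config-json, must be base64 encoded before you enter them. As a minimal sketch, assuming you saved a cloud-init script locally as cloud-init.yaml (a hypothetical file name), you can produce the encoded value for the user-data property on a Linux workstation as follows:

$ base64 -w 0 cloud-init.yaml

To review an encoded value that you already have, decode it in the reverse direction, for example with echo 'encoded_value' | base64 -d.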
Deep Learning Workloads in VMware Private AI Foundation with NVIDIA
You can provision a deep learning virtual machine with a supported deep learning (DL) workload in addition to its embedded components. The DL workloads are downloaded from the NVIDIA NGC catalog and are GPU-optimized and validated by NVIDIA and VMware by Broadcom.
For an overview of the deep learning VM images, see About Deep Learning VM Images in VMware Private AI Foundation with NVIDIA.
CUDA Sample
You can use a deep learning VM with CUDA samples to explore vector addition, gravitational n-body simulation, and other examples on a VM. See the CUDA Samples page.
After the deep learning VM is launched, it runs a CUDA sample workload to test the vGPU guest driver. You can examine the test output in the /var/log/dl.log file.
Component | Description |
---|---|
Container image | nvcr.io/nvidia/k8s/cuda-sample:ngc_image_tag. For example: nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda11.7.1-ubi8. For information on the CUDA Sample container images that are supported for deep learning VMs, see VMware Deep Learning VM Release Notes. |
Required inputs | To deploy a CUDA Sample workload, you must set the OVF properties for the deep learning virtual machine accordingly, for example by passing the CUDA Sample container image in the image-oneliner property (see the sketch after this table). |
Output | The output of the CUDA sample test of the vGPU guest driver, which you can examine in the /var/log/dl.log file. |
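One way to point the deep learning VM at the CUDA Sample container is through the image-oneliner OVF property. The following is a sketch, not the definitive set of required inputs: it base64 encodes a docker run command for the sample image shown above so that you can paste the result into the image-oneliner property.

$ echo -n 'docker run -d nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda11.7.1-ubi8' | base64 -w 0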
PyTorch
You can use a deep learning VM with a PyTorch library to explore conversational AI, NLP, and other types of AI models on a VM. See the PyTorch page.
After the deep learning VM is launched, it starts a JupyterLab instance with PyTorch packages installed and configured.
Component | Description |
---|---|
Container image | nvcr.io/nvidia/pytorch-pb24h1:ngc_image_tag. For example: nvcr.io/nvidia/pytorch-pb24h1:24.03.02-py3. For information on the PyTorch container images that are supported for deep learning VMs, see VMware Deep Learning VM Release Notes. |
Required inputs | To deploy a PyTorch workload, you must set the OVF properties for the deep learning virtual machine accordingly, for example by passing the PyTorch container image in the image-oneliner property. A manual alternative is sketched after this table. |
Output | A JupyterLab instance with PyTorch packages installed and configured. |
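If the container is not started automatically through the OVF properties, you can also start it manually from the VM console after logging in as vmware. The command below is a sketch modeled on the TensorFlow example later in this article; the image tag and the JupyterLab options are assumptions that you should adjust to your environment.

$ docker run -d --gpus all -p 8888:8888 nvcr.io/nvidia/pytorch-pb24h1:24.03.02-py3 /usr/local/bin/jupyter lab --allow-root --ip=* --port=8888 --no-browser --notebook-dir=/workspace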
TensorFlow
You can use a deep learning VM with a TensorFlow library to explore conversational AI, NLP, and other types of AI models on a VM. See the TensorFlow page.
After the deep learning VM is launched, it starts a JupyterLab instance with TensorFlow packages installed and configured.
Component | Description |
---|---|
Container image | nvcr.io/nvidia/tensorflow-pb24h1:ngc_image_tag. For example: nvcr.io/nvidia/tensorflow-pb24h1:24.03.02-tf2-py3. For information on the TensorFlow container images that are supported for deep learning VMs, see VMware Deep Learning VM Release Notes. |
Required inputs | To deploy a TensorFlow workload, you must set the OVF properties for the deep learning virtual machine accordingly, for example by passing the TensorFlow container image in the image-oneliner property. |
Output | A JupyterLab instance with TensorFlow packages installed and configured. A quick reachability check is sketched after this table. |
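Before opening JupyterLab in a browser, you can confirm that it is listening on the VM. This is a sketch only; dl_vm_ip is a placeholder for the IP address of your deep learning VM, and port 8888 matches the JupyterLab port used elsewhere in this article.

$ curl -I http://dl_vm_ip:8888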
DCGM Exporter
You can use a deep learning VM with a Data Center GPU Manager (DCGM) exporter to monitor the health of the GPUs that a DL workload uses and to collect metrics from them, by using NVIDIA DCGM, Prometheus, and Grafana.
See the DCGM Exporter page.
In a deep learning VM, you run the DCGM Exporter container together with a DL workload that performs AI operations. After the deep learning VM is started, DCGM Exporter is ready to collect vGPU metrics and export the data to another application for further monitoring and visualization. You can run the monitored DL workload as a part of the cloud-init process or from the command line after the virtual machine is started.
Component | Description |
---|---|
Container image | nvcr.io/nvidia/k8s/dcgm-exporter:ngc_image_tag. For example: nvcr.io/nvidia/k8s/dcgm-exporter:3.2.5-3.1.8-ubuntu22.04. For information on the DCGM Exporter container images that are supported for deep learning VMs, see VMware Deep Learning VM Release Notes. |
Required inputs | To deploy a DCGM Exporter workload, you must set the OVF properties for the deep learning virtual machine accordingly. A verification sketch follows this table. |
Output | Next, in the deep learning VM, you run a DL workload, and visualize the data on another virtual machine by using Prometheus at http://visualization_vm_ip:9090 and Grafana at http://visualization_vm_ip:3000. |
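To confirm that DCGM Exporter is serving metrics before you configure Prometheus, you can query its metrics endpoint from inside the deep learning VM. This is a sketch that assumes DCGM Exporter listens on port 9400, the same port used as the Prometheus scrape target later in this section.

$ curl -s http://localhost:9400/metrics | head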
Run a DL Workload on the Deep Learning VM
Run the DL workload you want to collect vGPU metrics for and export the data to another application for further monitoring and visualization.
- Log in to the deep learning VM as vmware over SSH.
- Run the container for the DL workload, pulling it from the NVIDIA NGC catalog or from a local container registry.
For example, run the following command to start the tensorflow-pb24h1:24.03.02-tf2-py3 image from NVIDIA NGC:
docker run -d --gpus all -p 8888:8888 nvcr.io/nvidia/tensorflow-pb24h1:24.03.02-tf2-py3 /usr/local/bin/jupyter lab --allow-root --ip=* --port=8888 --no-browser --NotebookApp.token="$TOKEN" --NotebookApp.allow_origin="*" --notebook-dir=/workspace
- Start using the DL workload for AI development.
Install Prometheus and Grafana
You can visualize and monitor the vGPU metrics from the DCGM Exporter virtual machine on a virtual machine running Prometheus and Grafana.
- Create a visualization VM with Docker Community Engine installed.
- Connect to the VM over SSH and create a YAML file for Prometheus.
$ cat > prometheus.yml << EOF
global:
  scrape_interval: 15s
  external_labels:
    monitor: 'codelab-monitor'

scrape_configs:
  - job_name: 'dcgm'
    scrape_interval: 5s
    metrics_path: /metrics
    static_configs:
      - targets: ['dl_vm_with_dcgm_exporter_ip:9400']
EOF
- Create a data path.
$ mkdir grafana_data prometheus_data && chmod 777 grafana_data prometheus_data
- Create a Docker compose file to install Prometheus and Grafana.
$ cat > compose.yaml << EOF
services:
  prometheus:
    image: prom/prometheus:v2.47.2
    container_name: "prometheus0"
    restart: always
    ports:
      - "9090:9090"
    volumes:
      - "./prometheus.yml:/etc/prometheus/prometheus.yml"
      - "./prometheus_data:/prometheus"
  grafana:
    image: grafana/grafana:10.2.0-ubuntu
    container_name: "grafana0"
    ports:
      - "3000:3000"
    restart: always
    volumes:
      - "./grafana_data:/var/lib/grafana"
EOF
- Start the Prometheus and Grafana containers.
$ sudo docker compose up -d
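Optionally, verify that both containers are running and that Prometheus is ready to serve queries. The /-/ready endpoint is part of the Prometheus HTTP API; this check is a sketch and is not required by the procedure.

$ sudo docker compose ps
$ curl -s http://localhost:9090/-/ready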
View vGPU Metrics in Prometheus
You can access Prometheus at http://visualization-vm-ip:9090. You can view the following vGPU information in the Prometheus UI:
Information | UI Section |
---|---|
Raw vGPU metrics from the deep learning VM | To view the raw vGPU metrics from the deep learning VM, click the endpoint entry. |
Graph expressions | To graph vGPU metrics over time, click Graph and enter an expression in the query field (see the example after this table). |
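For example, you can query a GPU utilization metric exported by DCGM Exporter, either in the expression field of the Graph view or through the Prometheus HTTP API. The metric name DCGM_FI_DEV_GPU_UTIL below is an assumption based on the default DCGM Exporter metric set; adjust it to the metrics that your exporter version publishes.

$ curl -s 'http://visualization-vm-ip:9090/api/v1/query?query=DCGM_FI_DEV_GPU_UTIL'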
For more information on using Prometheus, see the Prometheus documentation.
Visualize Metrics in Grafana
Set Prometheus as a data source for Grafana and visualize the vGPU metrics from the deep learning VM in a dashboard.
- Access Grafana at http://visualization-vm-ip:3000 by using the default user name admin and password admin.
- Add Prometheus as the first data source, connecting to visualization-vm-ip on port 9090.
- Create a dashboard with the vGPU metrics.
For more information on configuring a dashboard using a Prometheus data source, see the Grafana documentation.
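As an alternative to adding the Prometheus data source in the Grafana UI, you can provision it with a YAML file and mount that file into the Grafana container at /etc/grafana/provisioning/datasources. The sketch below uses Grafana's data source provisioning format; the file name grafana_datasource.yml and the visualization-vm-ip placeholder are assumptions, and you must also add the corresponding volume to compose.yaml.

$ cat > grafana_datasource.yml << EOF
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://visualization-vm-ip:9090
    isDefault: true
EOF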
Triton Inference Server
You can use a deep learning VM with a Triton Inference Server to load a model repository and receive inference requests.
See the Triton Inference Server page.
Component | Description |
---|---|
Container image | nvcr.io/nvidia/tritonserver-pb24h1:ngc_image_tag. For example: nvcr.io/nvidia/tritonserver-pb24h1:24.03.02-py3. For information on the Triton Inference Server container images that are supported for deep learning VMs, see VMware Deep Learning VM Release Notes. |
Required inputs | To deploy a Triton Inference Server workload, you must set the OVF properties for the deep learning virtual machine accordingly. |
Output | The model repository for the Triton Inference Server is in /home/vmware/model_repository. Initially, the model repository is empty and the initial log of the Triton Inference Server instance shows that no model is loaded. |
Create a Model Repository
To load your model for model inference, perform these steps:
- Create the model repository for your model. See the NVIDIA Triton Inference Server Model Repository documentation. The expected repository layout is sketched after these steps.
- Copy the model repository to /home/vmware/model_repository so that the Triton Inference Server can load it.
  cp -r path_to_your_created_model_repository/* /home/vmware/model_repository/
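A model repository follows the directory layout that the Triton Inference Server expects: one subdirectory per model that contains a config.pbtxt file and numbered version subdirectories with the model files. The layout below is a sketch with a hypothetical model name and model file; the exact file names depend on the backend you use.

model_repository/
  simple_sequence/
    config.pbtxt
    1/
      model.plan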
Send Model Inference Requests
- Verify that the Triton Inference Server is healthy and models are ready by running this command in the deep learning VM console.
curl -v localhost:8000/v2/health/ready
- Send a request to the model by running this command on the deep learning VM.
curl -v localhost:8000/v2/models/simple_sequence
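The exact request body for an inference call depends on the inputs that your model declares in its config.pbtxt file. As a sketch only, a generic inference request against the Triton HTTP endpoint looks like the following; the tensor name, shape, datatype, and data are hypothetical placeholders that you must replace with the values your model expects.

curl -X POST localhost:8000/v2/models/simple_sequence/infer -H 'Content-Type: application/json' -d '{"inputs": [{"name": "INPUT0", "shape": [1, 1], "datatype": "INT32", "data": [1]}]}'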
For more information on using the Triton Inference Server, see NVIDIA Triton Inference Server Model Repository documentation.
NVIDIA RAG
You can use a deep learning VM to build Retrieval Augmented Generation (RAG) solutions with a Llama2 model.
See the NVIDIA RAG Applications Docker Compose documentation (requires specific account permissions).
Component | Description |
---|---|
Container images and models | docker-compose-nim-ms.yaml and rag-app-multiturn-chatbot/docker-compose.yaml in the NVIDIA sample RAG pipeline. For information on the NVIDIA RAG container applications that are supported for deep learning VMs, see VMware Deep Learning VM Release Notes. |
Required inputs | To deploy an NVIDIA RAG workload, you must set the OVF properties for the deep learning virtual machine accordingly. |
Output | |
Assign a Static IP Address to a Deep Learning VM in VMware Private AI Foundation with NVIDIA
By default, the deep learning VM images are configured with DHCP address assignment. If you want to deploy a deep learning VM with a static IP address directly on a vSphere cluster, you must add additional code to the cloud-init section.
Procedure
Example: Assigning a Static IP Address to a CUDA Sample Workload
For an example deep learning VM with a CUDA Sample DL workload:
Deep Learning VM Element | Example Value |
---|---|
DL workload image | nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda11.7.1-ubi8 |
IP address | 10.199.118.245 |
Subnet prefix | /25 |
Gateway | 10.199.118.253 |
DNS servers | 10.142.7.1, 10.132.7.1 |
you provide the following cloud-init code:
I2Nsb3VkLWNvbmZpZwp3cml0ZV9maWxlczoKLSBwYXRoOiAvb3B0L2Rsdm0vZGxfYXBwLnNoCiAgcGVybWlzc2lvbnM6ICcwNzU1JwogIGNvbnRlbnQ6IHwKICAgICMhL2Jpbi9iYXNoCiAgICBkb2NrZXIgcnVuIC1kIG52Y3IuaW8vbnZpZGlhL2s4cy9jdWRhLXNhbXBsZTp2ZWN0b3JhZGQtY3VkYTExLjcuMS11Ymk4CgptYW5hZ2VfZXRjX2hvc3RzOiB0cnVlCiAKd3JpdGVfZmlsZXM6CiAgLSBwYXRoOiAvZXRjL25ldHBsYW4vNTAtY2xvdWQtaW5pdC55YW1sCiAgICBwZXJtaXNzaW9uczogJzA2MDAnCiAgICBjb250ZW50OiB8CiAgICAgIG5ldHdvcms6CiAgICAgICAgdmVyc2lvbjogMgogICAgICAgIHJlbmRlcmVyOiBuZXR3b3JrZAogICAgICAgIGV0aGVybmV0czoKICAgICAgICAgIGVuczMzOgogICAgICAgICAgICBkaGNwNDogZmFsc2UgIyBkaXNhYmxlIERIQ1A0CiAgICAgICAgICAgIGFkZHJlc3NlczogWzEwLjE5OS4xMTguMjQ1LzI1XSAgIyBTZXQgdGhlIHN0YXRpYyBJUCBhZGRyZXNzIGFuZCBtYXNrCiAgICAgICAgICAgIHJvdXRlczoKICAgICAgICAgICAgICAgIC0gdG86IGRlZmF1bHQKICAgICAgICAgICAgICAgICAgdmlhOiAxMC4xOTkuMTE4LjI1MyAjIENvbmZpZ3VyZSBnYXRld2F5CiAgICAgICAgICAgIG5hbWVzZXJ2ZXJzOgogICAgICAgICAgICAgIGFkZHJlc3NlczogWzEwLjE0Mi43LjEsIDEwLjEzMi43LjFdICMgUHJvdmlkZSB0aGUgRE5TIHNlcnZlciBhZGRyZXNzLiBTZXBhcmF0ZSBtdWxpdHBsZSBETlMgc2VydmVyIGFkZHJlc3NlcyB3aXRoIGNvbW1hcy4KIApydW5jbWQ6CiAgLSBuZXRwbGFuIGFwcGx5
which corresponds to the following script in plain-text format:
#cloud-config
write_files:
- path: /opt/dlvm/dl_app.sh
  permissions: '0755'
  content: |
    #!/bin/bash
    docker run -d nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda11.7.1-ubi8

manage_etc_hosts: true

write_files:
  - path: /etc/netplan/50-cloud-init.yaml
    permissions: '0600'
    content: |
      network:
        version: 2
        renderer: networkd
        ethernets:
          ens33:
            dhcp4: false # disable DHCP4
            addresses: [10.199.118.245/25]  # Set the static IP address and mask
            routes:
                - to: default
                  via: 10.199.118.253 # Configure gateway
            nameservers:
              addresses: [10.142.7.1, 10.132.7.1] # Provide the DNS server addresses. Separate multiple DNS server addresses with commas.

runcmd:
  - netplan apply
Configure a Deep Learning VM with a Proxy Server
To connect your deep learning VM to the Internet in a disconnected environment where Internet access is over a proxy server, you must provide the proxy server details in the config.json file in the virtual machine.
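For example, the proxy details typically take the form of HTTP and HTTPS proxy entries in config.json, which you then base64 encode into the config-json OVF property. The key names and placeholder values below are assumptions based on common proxy configuration conventions; verify them against your deployment documentation before use.

{
  "http_proxy": "http://proxy_server_ip:proxy_port",
  "https_proxy": "http://proxy_server_ip:proxy_port"
}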