The deep learning virtual machine images delivered as part of VMware Private AI Foundation with NVIDIA are preconfigured with popular ML libraries, frameworks, and toolkits, and are optimized and validated by NVIDIA and VMware for GPU acceleration in a VMware Cloud Foundation environment.
Data scientists can use the deep learning virtual machines provisioned from these images for AI prototyping, fine-tuning, validation, and inference.
The software stack for running AI applications on top of NVIDIA GPUs is validated in advance. As a result, you can start developing AI applications right away, without spending time installing and validating the compatibility of operating systems, software libraries, ML frameworks, toolkits, and GPU drivers.
What Does a Deep Learning VM Image Contain?
The initial deep learning virtual machine image contains the following software. For information on the component versions in each deep learning VM image release, see VMware Deep Learning VM Release Notes.
| Software Component Category | Software Component |
|---|---|
| Embedded | |
| Can be pre-installed automatically | |
Content Library for Deep Learning VM Images
Deep learning VM images are delivered as vSphere VM templates, hosted and published by VMware in a content library. You can use these images to deploy deep learning VMs by using the vSphere Client or VMware Aria Automation.
The content library with deep learning VM images for VMware Private AI Foundation with NVIDIA is available at the https://packages.vmware.com/dl-vm/lib.json URL. In a connected environment, you create a subscribed content library that is connected to this URL. In a disconnected environment, you create a local content library and upload to it the images that you download from the central URL.
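In a connected environment, you can create the subscribed content library in the vSphere Client or script it. A minimal sketch using the govc CLI, assuming govc is already configured against your vCenter Server; the datastore and library names are placeholder values:

```shell
# Sketch only: create a content library subscribed to the VMware-published
# deep learning VM image library. "datastore1" and "dl-vm-images" are
# placeholders; adjust them for your environment.
govc library.create \
  -sub=https://packages.vmware.com/dl-vm/lib.json \
  -ds=datastore1 \
  dl-vm-images
```

In a disconnected environment, you would instead create a local library (without the subscription URL) and import the downloaded OVF templates into it.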
OVF Properties of Deep Learning VMs
When you deploy a deep learning VM, you must fill in custom VM properties to automate the configuration of the Linux operating system, the deployment of the vGPU guest driver, and the deployment and configuration of NGC containers for the DL workloads.
| Category | Parameter | Description |
|---|---|---|
| Base OS Properties | instance-id | Required. A unique instance ID for the VM instance. When an instance ID changes, cloud-init treats the instance as a new instance and runs the cloud-init process again. |
| | hostname | Required. The host name of the appliance. |
| | public-keys | If provided, the instance populates the default user's SSH authorized_keys with this value. |
| | user-data | A set of scripts or other metadata that is inserted into the VM at provisioning time. This property holds the actual cloud-init script. This value must be base64 encoded. |
| | password | Required. The password for the default vmware user account. |
| vGPU Driver Installation | vgpu-license | Required. The NVIDIA vGPU client configuration token. The token is saved in the /etc/nvidia/ClientConfigToken/client_configuration_token.tok file. |
| | nvidia-portal-api-key | Required in a connected environment. The API key you downloaded from the NVIDIA Licensing Portal. The key is required for vGPU guest driver installation. |
| | vgpu-fallback-version | The version of the vGPU guest driver to fall back to if the vGPU guest driver version cannot be determined by using the entered license API key. |
| | vgpu-url | Required in a disconnected environment. The URL to download the vGPU guest driver from. |
| DL Workload Automation | registry-uri | Required in a disconnected environment, or if you plan to use a private container registry to avoid downloading images from the Internet. The URI of a private container registry with the deep learning workload container images. |
| | registry-user | Required if you are using a private container registry that requires basic authentication. |
| | registry-passwd | Required if you are using a private container registry that requires basic authentication. |
| | registry-2-uri | Required if you are using a second private container registry that is based on Docker and requires basic authentication. |
| | registry-2-user | Required if you are using a second private container registry. |
| | registry-2-passwd | Required if you are using a second private container registry. |
| | image-oneliner | A one-line bash command that is run at VM provisioning. This value must be base64 encoded. You can use this property to specify the DL workload container that you want to deploy, such as PyTorch or TensorFlow. See Deep Learning Workloads in VMware Private AI Foundation with NVIDIA. Note: If both user-data and image-oneliner are provided, the value of user-data is used. |
| | docker-compose-uri | URI of the Docker compose file. Required if you need a Docker compose file to start the DL workload container. This value must be base64 encoded. |
| | config-json | Configuration file for multiple container registry login operations when using a Docker compose file. This value must be base64 encoded. |
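Several of the properties above (user-data, image-oneliner, docker-compose-uri, config-json) expect base64-encoded values. A minimal sketch of preparing a user-data value on a Linux workstation, assuming GNU coreutils; the container image in the script is a hypothetical example, not a value from this documentation:

```shell
# Write a hypothetical cloud-init script that starts a DL workload container.
# The image tag is an assumption for illustration only.
cat > user-data.yaml <<'EOF'
#cloud-config
runcmd:
  - docker run -d --gpus all nvcr.io/nvidia/pytorch:23.10-py3
EOF

# Encode without line wrapping, as expected for an OVF property value:
base64 -w0 user-data.yaml > user-data.b64

# Round-trip to confirm the encoding decodes back to the original script:
base64 -d user-data.b64 | diff -q - user-data.yaml && echo "round-trip OK"
```

The resulting single-line content of user-data.b64 is what you paste into the user-data property when deploying the VM.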
Assign a Static IP Address to a Deep Learning VM in VMware Private AI Foundation with NVIDIA
By default, the deep learning VM images are configured with DHCP address assignment. If you want to deploy a deep learning VM with a static IP address directly on a vSphere cluster, you must add additional code to the cloud-init section.
Procedure
Example: Assigning a Static IP Address to a CUDA Sample Workload
For an example deep learning VM with a CUDA Sample DL workload:
| Deep Learning VM Element | Example Value |
|---|---|
| DL workload image | nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda11.7.1-ubi8 |
| IP address | 10.199.118.245 |
| Subnet prefix | /25 |
| Gateway | 10.199.118.253 |
| DNS servers | 10.142.7.1, 10.132.7.1 |
you provide the following cloud-init code:
I2Nsb3VkLWNvbmZpZwp3cml0ZV9maWxlczoKLSBwYXRoOiAvb3B0L2Rsdm0vZGxfYXBwLnNoCiAgcGVybWlzc2lvbnM6ICcwNzU1JwogIGNvbnRlbnQ6IHwKICAgICMhL2Jpbi9iYXNoCiAgICBkb2NrZXIgcnVuIC1kIG52Y3IuaW8vbnZpZGlhL2s4cy9jdWRhLXNhbXBsZTp2ZWN0b3JhZGQtY3VkYTExLjcuMS11Ymk4CgptYW5hZ2VfZXRjX2hvc3RzOiB0cnVlCiAKd3JpdGVfZmlsZXM6CiAgLSBwYXRoOiAvZXRjL25ldHBsYW4vNTAtY2xvdWQtaW5pdC55YW1sCiAgICBwZXJtaXNzaW9uczogJzA2MDAnCiAgICBjb250ZW50OiB8CiAgICAgIG5ldHdvcms6CiAgICAgICAgdmVyc2lvbjogMgogICAgICAgIHJlbmRlcmVyOiBuZXR3b3JrZAogICAgICAgIGV0aGVybmV0czoKICAgICAgICAgIGVuczMzOgogICAgICAgICAgICBkaGNwNDogZmFsc2UgIyBkaXNhYmxlIERIQ1A0CiAgICAgICAgICAgIGFkZHJlc3NlczogWzEwLjE5OS4xMTguMjQ1LzI1XSAgIyBTZXQgdGhlIHN0YXRpYyBJUCBhZGRyZXNzIGFuZCBtYXNrCiAgICAgICAgICAgIHJvdXRlczoKICAgICAgICAgICAgICAgIC0gdG86IGRlZmF1bHQKICAgICAgICAgICAgICAgICAgdmlhOiAxMC4xOTkuMTE4LjI1MyAjIENvbmZpZ3VyZSBnYXRld2F5CiAgICAgICAgICAgIG5hbWVzZXJ2ZXJzOgogICAgICAgICAgICAgIGFkZHJlc3NlczogWzEwLjE0Mi43LjEsIDEwLjEzMi43LjFdICMgUHJvdmlkZSB0aGUgRE5TIHNlcnZlciBhZGRyZXNzLiBTZXBhcmF0ZSBtdWxpdHBsZSBETlMgc2VydmVyIGFkZHJlc3NlcyB3aXRoIGNvbW1hcy4KIApydW5jbWQ6CiAgLSBuZXRwbGFuIGFwcGx5
which corresponds to the following script in plain-text format:
```yaml
#cloud-config
write_files:
- path: /opt/dlvm/dl_app.sh
  permissions: '0755'
  content: |
    #!/bin/bash
    docker run -d nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda11.7.1-ubi8

manage_etc_hosts: true

write_files:
  - path: /etc/netplan/50-cloud-init.yaml
    permissions: '0600'
    content: |
      network:
        version: 2
        renderer: networkd
        ethernets:
          ens33:
            dhcp4: false # disable DHCP4
            addresses: [10.199.118.245/25]  # Set the static IP address and mask
            routes:
                - to: default
                  via: 10.199.118.253 # Configure gateway
            nameservers:
              addresses: [10.142.7.1, 10.132.7.1] # Provide the DNS server address. Separate mulitple DNS server addresses with commas.

runcmd:
  - netplan apply
```
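Before pasting a long base64 string such as the one above into the OVF property, it is worth decoding it locally and confirming that it begins with the #cloud-config header. A small sketch, where the short payload stands in for the full script above:

```shell
# Build a stand-in payload; in practice, use your full cloud-init script.
payload='#cloud-config
runcmd:
  - netplan apply'

# Encode without line wrapping, as for an OVF property value:
encoded="$(printf '%s' "$payload" | base64 -w0)"

# Decode and check for the cloud-init header before deploying the VM:
decoded="$(printf '%s' "$encoded" | base64 -d)"
case "$decoded" in
  '#cloud-config'*) echo "cloud-init header present" ;;
  *) echo "missing #cloud-config header" >&2; exit 1 ;;
esac
```

If the header is missing or mangled, cloud-init silently ignores the user data, so this check catches the most common copy-paste mistake.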