The VMware Deep Learning VM images are delivered as part of VMware Private AI Foundation with NVIDIA. They are preconfigured with popular DL workloads, and are optimized and validated by NVIDIA and VMware for GPU acceleration in a VMware Cloud Foundation environment.

VMware Deep Learning VM 1.2 | 09 OCT 2024

Check for additions and updates to these release notes.

Content Library

Deep learning VM images are delivered as vSphere VM templates, hosted and published by VMware in a content library. You can use these images to deploy a deep learning VM by using the vSphere Client or VMware Aria Automation.

The content library with deep learning VM images for VMware Private AI Foundation with NVIDIA is available at https://packages.vmware.com/dl-vm/lib.json. In a connected environment, you create a subscribed content library connected to this URL. In a disconnected environment, you create a local content library and upload images that you download from the central content library.
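Before creating the subscribed content library, it can help to confirm that the descriptor URL is reachable from your network. The following Python sketch downloads the descriptor and lists its top-level keys. It makes no assumption about the descriptor schema; the helper names fetch_descriptor and parse_descriptor are illustrative and not part of any VMware tool.

```python
import json
from urllib.request import urlopen

# URL of the content library descriptor, as given in these release notes.
LIB_URL = "https://packages.vmware.com/dl-vm/lib.json"


def fetch_descriptor(url: str = LIB_URL, timeout: float = 10.0) -> bytes:
    """Download the raw descriptor. Works only in a connected environment."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read()


def parse_descriptor(raw: bytes) -> list[str]:
    """Return the sorted top-level keys of a JSON library descriptor.

    Only the top-level keys are reported because the exact schema of the
    descriptor is not described in these release notes.
    """
    doc = json.loads(raw.decode("utf-8"))
    if not isinstance(doc, dict):
        raise ValueError("expected the descriptor to be a JSON object")
    return sorted(doc)


# Example (requires network access):
#   print(parse_descriptor(fetch_descriptor()))
```

If the download fails in a connected environment, check proxy and firewall settings before troubleshooting the content library subscription itself.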

Compatibility and Upgrade

Use the latest release of VMware Deep Learning VM if supported by your environment.

Updating a running deep learning VM to a later image is not supported. You must deploy a new deep learning virtual machine by using a later deep learning VM image release.
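As a planning aid, the release-to-platform pairs listed in the Image Snapshot sections of these release notes can be kept in a small lookup table. The Python sketch below returns the newest image documented for an exact VMware Cloud Foundation version. It deliberately matches only the versions stated in these notes and does not infer compatibility with other versions; the function names are illustrative.

```python
# Release data copied from the Image Snapshot sections of these release notes:
# (Deep Learning VM version, snapshot name, VMware Cloud Foundation version)
RELEASES = [
    ("1.2", "common-container-nv-vgpu-ubuntu-2204-v20240814", "5.2.1"),
    ("1.1", "common-container-nv-vgpu-ubuntu-2204-v20240613", "5.2"),
    ("1.0.1", "common-container-nv-vgpu-ubuntu-2204-v20240419", "5.1.1"),
    ("1.0", "common-container-nv-vgpu-ubuntu-2204-v20240217", "5.1.1"),
]


def version_key(version: str) -> tuple:
    """Turn '1.0.1' into (1, 0, 1) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def latest_image_for(vcf_version: str):
    """Return the newest release tuple documented for this exact VCF version.

    Returns None when these release notes list no image for the version.
    """
    matches = [r for r in RELEASES if r[2] == vcf_version]
    if not matches:
        return None
    return max(matches, key=lambda r: version_key(r[0]))
```

For example, latest_image_for("5.1.1") selects the 1.0.1 image rather than 1.0, because both are listed for VMware Cloud Foundation 5.1.1.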

Installation

You deploy a deep learning VM image from a content library on the vCenter Server instance for the AI-ready VI workload domain. Depending on your role, you can deploy a deep learning VM in the following ways:

  • As a data scientist, MLOps engineer, or DevOps engineer

    • On a Supervisor in vSphere IaaS control plane by using VMware Aria Automation.

  • As a cloud administrator

    • Directly on a vSphere cluster.

  • As a DevOps engineer

    • On a Supervisor in vSphere IaaS control plane by using kubectl.

See Deploying a Deep Learning VM in VMware Private AI Foundation with NVIDIA.

VMware Deep Learning VM 1.2

Image Snapshot

VMware Deep Learning VM 1.2 is available for use with VMware Cloud Foundation 5.2.1.

Snapshot | Release Date | Compatible VMware Cloud Foundation Version
common-container-nv-vgpu-ubuntu-2204-v20240814 | 09 OCT 2024 | VMware Cloud Foundation 5.2.1

What's New

  • The deep learning VM image includes the Broadcom EULA and VMware Private AI Foundation with NVIDIA SPD (specific program documentation).

  • The embedded Miniconda 24.3.0 component is updated to Miniforge3 24.3.0.

  • In addition to pytorch2.3.0_py3.12, by using the Conda Environment Install OVF parameter, you can also have the pytorch1.13.1_py3.10, tf2.16.1_py3.12, and tf1.15.5_py3.7 Conda environments installed during VM deployment.

  • Private AI Services (pais) CLI version 1.0.0 for storing ML models in a central Harbor registry is now available.

  • In a connected environment, downloading the vGPU guest driver now requires only NVIDIA AI Enterprise entitlement.

  • In a connected environment, error messages that appear while downloading the vGPU guest driver are improved.
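The Conda Environment Install OVF parameter takes a comma-separated list of environment names. The Python sketch below builds such a value from the four environment names listed for image 1.2 and rejects anything else. Joining without spaces is an assumption, and the helper name conda_env_install_value is illustrative, not part of the product.

```python
# Environment names listed for Deep Learning VM 1.2 in these release notes.
SUPPORTED_ENVS_1_2 = (
    "pytorch2.3.0_py3.12",
    "pytorch1.13.1_py3.10",
    "tf2.16.1_py3.12",
    "tf1.15.5_py3.7",
)


def conda_env_install_value(envs) -> str:
    """Build a comma-separated value for the Conda Environment Install parameter.

    Duplicates are dropped while the original order is preserved. Names not
    listed for image 1.2 raise ValueError. Joining without spaces is an
    assumption; the release notes only say the list is comma-separated.
    """
    selected = []
    for name in envs:
        name = name.strip()
        if name not in SUPPORTED_ENVS_1_2:
            raise ValueError(f"unknown Conda environment for image 1.2: {name!r}")
        if name not in selected:
            selected.append(name)
    return ",".join(selected)
```

For example, conda_env_install_value(["pytorch2.3.0_py3.12", "tf2.16.1_py3.12"]) returns "pytorch2.3.0_py3.12,tf2.16.1_py3.12".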

Supported NVIDIA GPU Devices

VMware Deep Learning VM 1.2 supports the following GPUs on your ESXi hosts:

NVIDIA Component | Supported Option
NVIDIA GPUs | NVIDIA A100, NVIDIA L40S, NVIDIA H100
GPU sharing mode | Time slicing, Multi-Instance GPU

Components of VMware Deep Learning VM 1.2

This version of the deep learning virtual machine image contains the following software:

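Software Component Category | Software Component | Version
Embedded | Canonical Ubuntu | 22.04
Embedded | NVIDIA Container Toolkit | 1.15.0
Embedded | Docker Community Engine | 26.0.2
Embedded | Miniforge | 24.3.0-0 (Python 3.10)
Embedded | VMware Private AI Services (pais) CLI | 1.0.0
Can be pre-installed automatically | NVIDIA vGPU guest driver | According to the version of the NVIDIA vGPU host driver
Can be pre-installed automatically | PyTorch Conda Environment | 2.3.0 (Python 3.12), 1.13.1 (Python 3.10)
Can be pre-installed automatically | TensorFlow Conda Environment | 2.16.1 (Python 3.12), 1.15.5 (Python 3.7)
Deep learning (DL) workload from NVIDIA NGC | CUDA Sample | -
Deep learning (DL) workload from NVIDIA NGC | PyTorch | -
Deep learning (DL) workload from NVIDIA NGC | TensorFlow | -
Deep learning (DL) workload from NVIDIA NGC | DCGM Exporter | -
Deep learning (DL) workload from NVIDIA NGC | Triton Inference Server | -
Deep learning (DL) workload from NVIDIA NGC | NVIDIA RAG | -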

Resolved Issues

  • Containers deployed by using cloud-init are running as root.

  • When the deep learning VM is restarted, only log information from the most recent boot is visible in /var/log/dl.log. DL workload log information from earlier boots is overwritten.

  • Installation of Conda environments fails if the password OVF parameter is set.

VMware Deep Learning VM 1.1

Image Snapshot

VMware Deep Learning VM 1.1 is available for use with VMware Cloud Foundation 5.2.

Snapshot | Release Date | Compatible VMware Cloud Foundation Version
common-container-nv-vgpu-ubuntu-2204-v20240613 | 23 JUL 2024 | VMware Cloud Foundation 5.2

What's New

  • The deep learning VM image now contains a built-in Miniconda installation.

  • The deep learning VM image now contains a verified PyTorch Conda environment manifest.

  • You can use the Conda Environment Install OVF parameter to specify a comma-separated list of Conda environments to automatically install during VM deployment. Currently, you can install a pytorch2.3_py3.12 environment.

  • More detailed logs on the initialization script are available in /var/log/vgpu-install.log.
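When initialization does not complete as expected, the end of /var/log/vgpu-install.log is usually the first place to look. The Python sketch below reads the last lines of a log file from inside the deep learning VM. It assumes the file is plain text and readable by the current user; the helper name tail is illustrative.

```python
from collections import deque
from pathlib import Path

# Log path named in these release notes.
VGPU_INSTALL_LOG = Path("/var/log/vgpu-install.log")


def tail(path, n: int = 20) -> list:
    """Return the last n lines of a text file without trailing newlines.

    deque(maxlen=n) keeps only the most recent n lines while the file is
    streamed, so large logs are not loaded into memory at once.
    """
    with open(path, "r", encoding="utf-8", errors="replace") as fh:
        return [line.rstrip("\n") for line in deque(fh, maxlen=n)]


# Example, run inside the deep learning VM:
#   for line in tail(VGPU_INSTALL_LOG, 50):
#       print(line)
```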

Supported NVIDIA GPU Devices

VMware Deep Learning VM 1.1 supports the following GPUs on your ESXi hosts:

NVIDIA Component | Supported Option
NVIDIA GPUs | NVIDIA A100, NVIDIA L40S, NVIDIA H100
GPU sharing mode | Time slicing, Multi-Instance GPU

Components of VMware Deep Learning VM 1.1

This version of the deep learning virtual machine image contains the following software:

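Software Component Category | Software Component | Version
Embedded | Canonical Ubuntu | 22.04
Embedded | NVIDIA Container Toolkit | 1.15.0
Embedded | Docker Community Engine | 26.0.2
Embedded | Miniconda | 24.3.0-0 (Python 3.12)
Can be pre-installed automatically | NVIDIA vGPU guest driver | According to the version of the NVIDIA vGPU host driver
Can be pre-installed automatically | PyTorch Conda Environment | 2.3.0 (Python 3.12)
Deep learning (DL) workload from NVIDIA NGC | CUDA Sample | -
Deep learning (DL) workload from NVIDIA NGC | PyTorch | -
Deep learning (DL) workload from NVIDIA NGC | TensorFlow | -
Deep learning (DL) workload from NVIDIA NGC | DCGM Exporter | -
Deep learning (DL) workload from NVIDIA NGC | Triton Inference Server | -
Deep learning (DL) workload from NVIDIA NGC | NVIDIA RAG | -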

Resolved Issues

  • Earlier versions of the NVIDIA vGPU driver are not downloaded from the NVIDIA License Portal.

  • The GuestBootstrap status is shown incorrectly in certain cases.

  • NVIDIA vGPU driver download might fail because of network issues.

  • The authorized_keys SSH file, used during the image build process, is available in the ~/.ssh/ directory.

VMware Deep Learning VM 1.0.1

Image Snapshot

VMware Deep Learning VM 1.0.1 is available for use with VMware Cloud Foundation 5.1.1.

Snapshot | Release Date | Compatible VMware Cloud Foundation Version
common-container-nv-vgpu-ubuntu-2204-v20240419 | 06 MAY 2024 | VMware Cloud Foundation 5.1.1

What's New

  • The versions of the NVIDIA Container Toolkit and Docker Community Engine are updated.

  • The description of the OVF properties shown when deploying a deep learning VM by using the OVF deployment wizard is improved.

  • The download URL format for vGPU guest drivers for disconnected environments now supports directory index listings, as generated by Web servers, such as NGINX or Apache HTTP Server.

  • A link to documentation of the VMware deep learning VM appears as a "message of the day" in the Ubuntu operating system.
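To illustrate what a directory index listing looks like to a client, the Python sketch below extracts driver file links from the HTML page that a Web server such as NGINX or Apache HTTP Server generates for a directory. This is not the logic used inside the deep learning VM. The .run file suffix, the example host name, and the example file name are assumptions made for illustration only.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class _LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag in an HTML page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def driver_links(index_html: str, base_url: str, suffix: str = ".run") -> list:
    """Return absolute URLs of links in a directory index that end with suffix.

    The default suffix is an assumption about how driver files are named on
    your Web server; adjust it to match the files you actually host.
    """
    parser = _LinkCollector()
    parser.feed(index_html)
    return [urljoin(base_url, h) for h in parser.hrefs if h.endswith(suffix)]


# Hypothetical listing, in the style of an auto-generated index page.
SAMPLE_INDEX = (
    '<html><body><a href="../">../</a>'
    '<a href="NVIDIA-Linux-x86_64-535.154.05-grid.run">driver</a>'
    '<a href="notes.txt">notes.txt</a></body></html>'
)
```

Calling driver_links(SAMPLE_INDEX, "http://mirror.example/vgpu/") keeps only the link that ends in .run and resolves it against the base URL.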

Supported NVIDIA GPU Devices

VMware Deep Learning VM 1.0.1 supports the following GPUs on your ESXi hosts:

NVIDIA Component | Supported Option
NVIDIA GPUs | NVIDIA A100, NVIDIA L40S, NVIDIA H100
GPU sharing mode | Time slicing, Multi-Instance GPU

Components of VMware Deep Learning VM 1.0.1

This version of the deep learning virtual machine image contains the following software:

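Software Component Category | Software Component | Version
Embedded | Canonical Ubuntu | 22.04
Embedded | NVIDIA Container Toolkit | 1.15.0
Embedded | Docker Community Engine | 26.0.2
Can be pre-installed automatically | NVIDIA vGPU guest driver | According to the version of the NVIDIA vGPU host driver
Deep learning (DL) workload from NVIDIA NGC | CUDA Sample | -
Deep learning (DL) workload from NVIDIA NGC | PyTorch | -
Deep learning (DL) workload from NVIDIA NGC | TensorFlow | -
Deep learning (DL) workload from NVIDIA NGC | DCGM Exporter | -
Deep learning (DL) workload from NVIDIA NGC | Triton Inference Server | -
Deep learning (DL) workload from NVIDIA NGC | NVIDIA RAG | -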

Resolved Issues

  • Unable to log in to a Docker private container registry if the registry password set in the OVF properties of the deep learning VM contains special characters, such as & < > " ' .

  • The OVF properties for a secondary container registry are not processed.

  • Running apt update fails because of errors and security warnings.

  • The execution status of the get-vgpu-driver.sh script, run at VM startup, is not reflected in the guestinfo.vmservice.bootstrap.condition setting of VM Tools.
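With this fix in place, the bootstrap condition can be read from inside the guest. The Python sketch below wraps the VMware Tools command vmtoolsd --cmd "info-get <key>" to read the guestinfo.vmservice.bootstrap.condition key. The value is returned as-is because its format is not described in these release notes. The run parameter is a hypothetical hook added so that the function can be exercised without VMware Tools installed.

```python
import subprocess
from typing import Callable, Optional, Sequence

# guestinfo key named in these release notes.
BOOTSTRAP_KEY = "guestinfo.vmservice.bootstrap.condition"


def build_command(key: str = BOOTSTRAP_KEY) -> list:
    """Build the vmtoolsd command line that reads one guestinfo key."""
    return ["vmtoolsd", "--cmd", f"info-get {key}"]


def _run_subprocess(cmd: Sequence[str]) -> str:
    """Run a command and return its standard output as text."""
    return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout


def read_guestinfo(
    key: str = BOOTSTRAP_KEY,
    run: Optional[Callable[[Sequence[str]], str]] = None,
) -> str:
    """Return the raw value of a guestinfo key, stripped of surrounding whitespace.

    By default the command runs through subprocess, which requires vmtoolsd
    in the guest. Pass a custom run callable to substitute another executor.
    """
    runner = run if run is not None else _run_subprocess
    return runner(build_command(key)).strip()


# Example, run inside the deep learning VM:
#   print(read_guestinfo())
```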

VMware Deep Learning VM 1.0

Image Snapshot

VMware Deep Learning VM 1.0 is available for use with VMware Cloud Foundation 5.1.1.

Snapshot | Release Date | Compatible VMware Cloud Foundation Version
common-container-nv-vgpu-ubuntu-2204-v20240217 | 26 MAR 2024 | VMware Cloud Foundation 5.1.1

Supported NVIDIA GPU Devices

VMware Deep Learning VM 1.0 supports the following GPUs on your ESXi hosts:

NVIDIA Component | Supported Option
NVIDIA GPUs | NVIDIA A100, NVIDIA L40S, NVIDIA H100
GPU sharing mode | Time slicing, Multi-Instance GPU

Components of VMware Deep Learning VM 1.0

The initial version of the deep learning virtual machine image contains the following software:

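Software Component Category | Software Component | Version
Embedded | Canonical Ubuntu | 22.04
Embedded | NVIDIA Container Toolkit | 1.13.5
Embedded | Docker Community Engine | 25.03
Can be pre-installed automatically | NVIDIA vGPU guest driver | According to the version of the NVIDIA vGPU host driver
Deep learning (DL) workload from NVIDIA NGC | CUDA Sample | -
Deep learning (DL) workload from NVIDIA NGC | PyTorch | -
Deep learning (DL) workload from NVIDIA NGC | TensorFlow | -
Deep learning (DL) workload from NVIDIA NGC | DCGM Exporter | -
Deep learning (DL) workload from NVIDIA NGC | Triton Inference Server | -
Deep learning (DL) workload from NVIDIA NGC | NVIDIA RAG | -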

License Information

VMware Deep Learning VM releases are available under a VMware Private AI Foundation with NVIDIA license. See the VMware Private AI Foundation with NVIDIA Guide.

Documentation

See the VMware Private AI Foundation with NVIDIA Guide for an overview and how-to instructions for running deep learning VMs in a VMware Cloud Foundation environment.
