Data scientists, DevOps engineers and developers can use VMware Aria Automation to provision deep learning virtual machines on the Supervisor instance in a VI workload domain.

The workflow for deploying a deep learning VM has two parts:

  • As a cloud administrator, you add self-service catalog items for private AI to Automation Service Broker.
  • As a data scientist or DevOps engineer, you use an AI workstation catalog item to deploy a deep learning VM on a new namespace on the Supervisor.

Create AI Self-Service Catalog Items in VMware Aria Automation

As a cloud administrator, you can use the catalog setup wizard for private AI in VMware Aria Automation to quickly add catalog items for deploying deep learning virtual machines or GPU-accelerated TKG clusters in a VI workload domain.

Data scientists can use the deep learning catalog items to deploy deep learning VMs. DevOps engineers can use the catalog items to provision AI-ready TKG clusters. Every time you run it, the catalog setup wizard for private AI adds two catalog items to the Service Broker catalog: one for a deep learning virtual machine and one for a TKG cluster.

You can run the wizard again whenever you need to do any of the following:

  • Enable provisioning of AI workloads on another Supervisor.
  • Accommodate a change in your NVIDIA AI Enterprise license, including the client configuration .tok file and license server, or the download URL for the vGPU guest drivers for a disconnected environment.
  • Accommodate a deep learning VM image change.
  • Use other vGPU or non-GPU VM classes, storage policy, or container registry.
  • Create catalog items in a new project.

Prerequisites

Procedure

  1. Navigate to the VMware Aria Automation home page and click Quickstart.
  2. Run the Private AI Automation Services catalog setup wizard.

    See Add Private AI items to the Automation Service Broker catalog in the VMware Aria Automation Product Documentation.

Deploy a Deep Learning VM by Using a Self-Service Catalog in VMware Aria Automation

In VMware Private AI Foundation with NVIDIA, as a data scientist or DevOps engineer, you can deploy a deep learning VM from VMware Aria Automation by using an AI workstation self-service catalog item in Automation Service Broker.

Note: VMware Aria Automation creates a namespace every time you provision a deep learning VM.

Procedure

  • In Automation Service Broker, deploy an AI workstation catalog item on the Supervisor instance in the VI workload domain.
    See Deploy a deep learning virtual machine to a VI workload domain.
    If you plan to use DCGM Exporter with a DL workload that uses GPU capacity, you can have the DL workload installed at virtual machine startup as part of the cloud-init process, or from the command line after the virtual machine is started. To include the DL workload in the cloud-init process, in the request form of the AI workstation catalog item, add the following configuration in addition to the other details for provisioning the deep learning VM:
    1. From the Software Bundle drop-down menu, select DCGM Exporter.
    2. Select the Custom cloud-init check box and enter the instructions for running the DL workload.
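
    For example, the custom cloud-init instructions might pull and run the DL workload as a container once the VM boots. The following #cloud-config fragment is only a sketch: the registry path, image name, and training command are hypothetical placeholders, not values from this guide, and the exact cloud-init schema accepted by the request form may differ in your environment.

    #cloud-config
    # Hypothetical example: start a containerized DL workload at first boot,
    # after the vGPU guest driver and DCGM Exporter have been installed.
    runcmd:
      # Placeholder image in a private container registry
      - docker pull registry.example.com/dl/workload:latest
      # Run with GPU access; adjust flags and command to your workload
      - docker run -d --gpus all --name dl-workload registry.example.com/dl/workload:latest python train.py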

Results

The vGPU guest driver and the specified deep learning workload are installed the first time you start the deep learning VM.

You can examine the logs or open the JupyterLab instance that comes with some of the images. See Deep Learning Workloads in VMware Private AI Foundation with NVIDIA.
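
For example, after connecting to the deep learning VM over SSH, you can inspect the first-boot logs and, if you selected DCGM Exporter, query its metrics endpoint. The commands below are a sketch: the log path is the standard cloud-init location and port 9400 is the DCGM Exporter default, both of which your deployment may override.

    # Review first-boot output of the cloud-init process,
    # including vGPU driver and DL workload installation.
    tail -n 100 /var/log/cloud-init-output.log

    # Query GPU metrics from DCGM Exporter (default port 9400).
    curl http://localhost:9400/metrics | grep DCGM_FI_DEV_GPU_UTIL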

What to do next

For details on how to access the virtual machine and the JupyterLab instance on it, in Automation Service Broker, navigate to Consume > Deployments > Deployments.