As a cloud administrator, you must deploy specific software and configure the target VI workload domains so that data scientists and DevOps engineers can deploy AI workloads on top of VMware Private AI Foundation with NVIDIA.
VMware Components in VMware Private AI Foundation with NVIDIA
The functionality of the VMware Private AI Foundation with NVIDIA solution is available in VMware Cloud Foundation together with certain versions of VMware Aria Automation, VMware Aria Operations, and VMware Data Services Manager.
| VMware Cloud Foundation Version | Versions of VMware Aria Components and VMware Data Services Manager |
|---|---|
| VMware Cloud Foundation 5.2.1 | |
| VMware Cloud Foundation 5.2 | |

Note: This documentation is based on VMware Cloud Foundation 5.2.1. For information on the VMware Private AI Foundation with NVIDIA functionality in VMware Cloud Foundation 5.2, see VMware Private AI Foundation with NVIDIA Guide for VMware Cloud Foundation 5.2.
For information about the VMware Private AI Foundation with NVIDIA architecture and components, see System Architecture of VMware Private AI Foundation with NVIDIA.
Guided Deployment in the vSphere Client
Starting with VMware Cloud Foundation 5.2.1, you can fully set up the VMware Private AI Foundation with NVIDIA components by using the guided deployment UI in the vSphere Client. The guided deployment UI connects to SDDC Manager to perform the requested operations.
To open the guided deployment for VMware Private AI Foundation with NVIDIA, follow these steps:
- Log in to the management vCenter Server by using the vSphere Client at https://<management_vcenter_server>/ui as [email protected].
- In the vSphere Client side panel, click Private AI Foundation and enter your VMware Private AI Foundation with NVIDIA license.
The license key is assigned to the management vCenter Server as a solution license.
- Follow the wizard to complete the setup of VMware Private AI Foundation with NVIDIA according to the deployment workflows below.
Deployment Workflows for VMware Private AI Foundation with NVIDIA
The functionality of VMware Private AI Foundation with NVIDIA is based on a foundational set of components, with additional components required to enable the deployment of one of the following AI workload types:
- Deep learning VMs in general
- AI workloads on a GPU-accelerated TKG cluster in general
- RAG workloads as deep learning VMs or applications on GPU-accelerated TKG clusters
The deployment of a RAG workload extends the general approach for deep learning VMs and AI workloads on TKG clusters with deploying a PostgreSQL database that has the pgvector extension and configuring the application to use that database.
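As a hedged sketch of what "configuring the application with the pgvector database" typically involves (not taken from this guide): the RAG application needs the pgvector extension enabled and a table with a vector column on the database that Data Services Manager provisions. All names and the embedding dimension below are illustrative placeholders.

```shell
# Hedged sketch: typical pgvector bootstrap SQL for a RAG application.
# Table name, column names, and dimension 1024 are placeholders, not
# values prescribed by this guide.
PGVECTOR_SQL='CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE IF NOT EXISTS embeddings (
    id bigserial PRIMARY KEY,
    content text,
    embedding vector(1024)
);'
# On a machine with access to the database, run it with psql, e.g.:
#   psql "host=<dsm-database-host> dbname=ragdb user=ragadmin" -c "$PGVECTOR_SQL"
echo "$PGVECTOR_SQL"
```

The vector dimension must match the embedding model the RAG application uses; adjust it before running the SQL.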
In a disconnected environment, you must take additional steps to set up and deploy appliances and provide resources locally, so that your workloads can access them.
Connected Environment
| Task | AI Workload Deployment Use Cases | Steps |
|---|---|---|
| Review the architecture and requirements for deploying VMware Private AI Foundation with NVIDIA. | All | |
| Configure a License Service instance on the NVIDIA Licensing Portal and generate a client configuration token. | | NVIDIA License System User Guide |
| Generate an API key for access to the NVIDIA NGC catalog. | | Pulling and Running NVIDIA AI Enterprise Containers |
| Deploy a GPU-accelerated VI workload domain. | | Deploy a GPU-Accelerated VI Workload Domain for VMware Private AI Foundation with NVIDIA |
| Enable vSphere IaaS control plane (formerly known as vSphere with Tanzu). | All. Required if data scientists and DevOps engineers will deploy workloads by using self-service catalog items in VMware Aria Automation or by using the kubectl command. | Configure vSphere IaaS Control Plane for VMware Private AI Foundation with NVIDIA |
| Create a content library for deep learning VM images. | Deploy a deep learning VM | Create a Content Library with Deep Learning VM Images for VMware Private AI Foundation with NVIDIA |
| Configure vGPU-based VM classes for AI workloads. | All | Configure vGPU-Based VM Classes for AI Workloads for VMware Private AI Foundation with NVIDIA |
| Configure a vSphere namespace for AI workloads. | All | Configure a vSphere Namespace for GPU-Accelerated Workloads |
| Deploy VMware Aria Automation by using VMware Aria Suite Lifecycle in VMware Cloud Foundation mode. | All. Required if data scientists and DevOps engineers will deploy workloads by using self-service catalog items in VMware Aria Automation. | |
| Deploy VMware Aria Operations by using VMware Aria Suite Lifecycle in VMware Cloud Foundation mode. | All | Intelligent Operations Management for VMware Cloud Foundation |
| Deploy VMware Data Services Manager. | Deploy a RAG workload | Installing and Configuring VMware Data Services Manager. You deploy a VMware Data Services Manager instance in the management domain. 1. Install the Data Services Manager Consumption Operator as a Supervisor Service (see the vSphere Supervisor Services catalog). 2. Configure VMware Data Services Manager with at least one infrastructure policy (see Creating Infrastructure Policies). |
| Set up a machine that has access to the Supervisor instance and has Docker, Helm, and the Kubernetes CLI Tools for vSphere installed. | All. Required if the AI workloads will be deployed directly by using the kubectl command. | Install the Kubernetes CLI Tools for vSphere |
| Starting with VMware Cloud Foundation 5.2.1, set up a Harbor registry service in the Supervisor. | All. Required in the following cases: the AI workloads will be deployed on a Supervisor in vSphere IaaS control plane, or you plan to use a model gallery in Harbor for storing validated ML models. | Setting Up a Private Harbor Registry in VMware Private AI Foundation with NVIDIA |
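The NGC API key generated in the workflow above is used to authenticate Docker against the NVIDIA NGC registry (nvcr.io) before pulling NVIDIA AI Enterprise containers. A minimal sketch follows; the key value is a placeholder, and NGC expects the literal string `$oauthtoken` as the user name.

```shell
# Hedged sketch: Docker login to the NVIDIA NGC registry with an API key.
# The key below is a placeholder; generate a real one in the NGC portal.
NGC_API_KEY='nvapi-xxxxxxxx-placeholder'
NGC_USER='$oauthtoken'   # NGC uses this literal string, not your account name
# Uncomment on a machine that has Docker and network access to nvcr.io:
#   printf '%s' "$NGC_API_KEY" | docker login nvcr.io --username "$NGC_USER" --password-stdin
echo "would log in to nvcr.io as $NGC_USER"
```

After a successful login, `docker pull nvcr.io/...` can retrieve the NVIDIA AI Enterprise container images referenced by this workflow.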
Disconnected Environment
| Task | Related AI Workload Deployment Options | Steps |
|---|---|---|
| Review the requirements for deploying VMware Private AI Foundation with NVIDIA. | All | |
| Deploy an NVIDIA Delegated License Service (DLS) instance. | | Installing and Configuring the DLS Virtual Appliance. You can deploy the virtual appliance in the same workload domain as the AI workloads or in the management domain. 1. Register an NVIDIA DLS instance on the NVIDIA Licensing Portal, and bind and install a license server on it. 2. Generate a client configuration token. |
| Deploy a GPU-accelerated VI workload domain. | | Deploy a GPU-Accelerated VI Workload Domain for VMware Private AI Foundation with NVIDIA |
| Enable vSphere IaaS control plane (formerly known as vSphere with Tanzu). | All | Configure vSphere IaaS Control Plane for VMware Private AI Foundation with NVIDIA |
| Create a content library for deep learning VM images. | Deploy a deep learning VM | Create a Content Library with Deep Learning VM Images for VMware Private AI Foundation with NVIDIA |
| Configure a vSphere namespace for AI workloads. | All | Configure a vSphere Namespace for GPU-Accelerated Workloads |
| Set up a machine that has access to the Internet and has Docker and Helm installed, and a machine that has access to vCenter Server for the VI workload domain, the Supervisor instance, and the local container registry, with Docker, Helm, and the Kubernetes CLI Tools for vSphere installed. | All | |
| Set up a Harbor registry service in the Supervisor. | All. Required if the AI workloads will be deployed on a Supervisor in vSphere IaaS control plane. In an environment without vSphere IaaS control plane, for pulling container images on a deep learning VM running directly on a vSphere cluster, you must configure a registry from another vendor. | Setting Up a Private Harbor Registry in VMware Private AI Foundation with NVIDIA |
| Configure a content library for Tanzu Kubernetes releases (TKr) for Ubuntu. | Deploy a RAG workload on a GPU-accelerated TKG cluster. Deploy AI workloads on a GPU-accelerated TKG cluster. | Configure a Content Library with Ubuntu TKr for a Disconnected VMware Private AI Foundation with NVIDIA Environment |
| Upload the components of the NVIDIA operators to the environment. | Deploy a RAG workload on a GPU-accelerated TKG cluster. Deploy AI workloads on a GPU-accelerated TKG cluster. | Upload the NVIDIA GPU Operator Components to a Disconnected Environment |
| Provide a location to download the vGPU guest drivers from. | Deploy a deep learning VM | Upload to a local Web server the required vGPU guest driver versions, downloaded from the NVIDIA Licensing Portal, and an index in one of the following formats: an index .txt file that lists the .run or .zip files of the vGPU guest drivers, with one host driver version and guest driver download URL pair per line; or a directory index in the format generated by Web servers such as NGINX and Apache HTTP Server, where the version-specific vGPU driver files must be provided as .zip files. |
| Upload the NVIDIA NGC container images to a private container registry, such as the Harbor Registry service of the Supervisor. | All | Upload AI Container Images to a Private Harbor Registry in VMware Private AI Foundation with NVIDIA. In an environment without vSphere IaaS control plane, for pulling container images on a deep learning VM running directly on a vSphere cluster, you must configure a registry from another vendor. |
| Deploy VMware Aria Automation by using VMware Aria Suite Lifecycle in VMware Cloud Foundation mode. | All. Required if data scientists and DevOps engineers will deploy workloads by using self-service catalog items in VMware Aria Automation. | |
| Deploy VMware Aria Operations by using VMware Aria Suite Lifecycle in VMware Cloud Foundation mode. | All | Intelligent Operations Management for VMware Cloud Foundation |
| Deploy VMware Data Services Manager. | Deploy a RAG workload | Installing and Configuring VMware Data Services Manager. You can also use the guided deployment UI in the vSphere Client to deploy a VMware Data Services Manager instance in the management domain. 1. Install the Data Services Manager Consumption Operator as a Supervisor Service (see the vSphere Supervisor Services catalog). 2. Configure VMware Data Services Manager with at least one infrastructure policy (see Creating Infrastructure Policies). |
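The vGPU guest driver index .txt file described in the disconnected workflow can be sketched as follows. The host driver versions and download URLs are placeholders; the real entries come from the driver files you download from the NVIDIA Licensing Portal.

```shell
# Hedged sketch: building the index .txt that a local Web server serves so
# that deep learning VMs can locate vGPU guest drivers in a disconnected
# environment. All versions and URLs below are placeholders.
WEB_ROOT="$(mktemp -d)"   # stand-in for the Web server document root
cat > "$WEB_ROOT/index.txt" <<'EOF'
host-driver-version-1 https://webserver.example.internal/vgpu/guest-driver-1.zip
host-driver-version-2 https://webserver.example.internal/vgpu/guest-driver-2.zip
EOF
# Each line maps an ESXi host driver version to the download URL of the
# matching guest driver .run or .zip file.
cat "$WEB_ROOT/index.txt"
```

Alternatively, skip the hand-written index and let NGINX or Apache HTTP Server generate a directory index over the .zip driver files, as the table above notes.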