As a cloud administrator, you must deploy specific software and configure the target VI workload domains so that data scientists and DevOps engineers can deploy AI workloads on top of VMware Private AI Foundation with NVIDIA.

VMware Components in VMware Private AI Foundation with NVIDIA

The functionality of the VMware Private AI Foundation with NVIDIA solution is available across several software components.

  • VMware Cloud Foundation 5.1.1
  • VMware Aria Automation 8.16.2 and VMware Aria Automation 8.17
  • VMware Aria Operations 8.16 and VMware Aria Operations 8.17.1
  • VMware Data Services Manager 2.0.x

For information about the VMware Private AI Foundation with NVIDIA architecture and components, see What is VMware Private AI Foundation with NVIDIA?.

Deployment Workflows for VMware Private AI Foundation with NVIDIA

In a disconnected environment, you must take additional steps to set up and deploy appliances and provide resources locally, so that your workloads can access them.

Connected Environment
Each task below lists the related AI workload deployment options and the steps to follow.
Task: Review the requirements for deploying VMware Private AI Foundation with NVIDIA.
Deployment options:
  • Deploy a deep learning VM
  • Deploy AI workloads on a GPU-accelerated TKG cluster
  • Deploy a RAG workload
Steps: Requirements for Deploying VMware Private AI Foundation with NVIDIA
Task: Configure a License Service instance on the NVIDIA Licensing Portal and generate a client configuration token.
Deployment options:
  • Deploy a deep learning VM
  • Deploy AI workloads on a GPU-accelerated TKG cluster
  • Deploy a RAG workload
Steps: NVIDIA License System User Guide
Task: Generate an API key for access to the NVIDIA NGC catalog.
Deployment options:
  • Deploy a deep learning VM
  • Deploy AI workloads on a GPU-accelerated TKG cluster
  • Deploy a RAG workload
Steps: Pulling and Running NVIDIA AI Enterprise Containers
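The NGC API key is typically used as a registry password when pulling NVIDIA AI Enterprise containers. A hedged sketch of the usual pattern (the image path is an illustrative example from the NGC catalog; this prints the commands rather than running them, since the pull requires network access and an actual key):

```shell
# Placeholder key; export your real NGC API key in the environment instead.
NGC_API_KEY="${NGC_API_KEY:-example-api-key}"

# NGC registry logins use the literal user name $oauthtoken, with the
# API key supplied as the password.
login_cmd='docker login nvcr.io --username $oauthtoken --password-stdin'
pull_cmd="docker pull nvcr.io/nvidia/pytorch:24.03-py3"

echo "echo \"\$NGC_API_KEY\" | $login_cmd"
echo "$pull_cmd"
```

Run the two printed commands on the machine that pulls the containers.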
Task: If you plan to deploy deep learning VMs or TKG clusters directly on a Supervisor in vSphere with Tanzu, set up a machine that has access to the Supervisor instance and has Docker, Helm, and Kubernetes CLI Tools for vSphere installed.
Deployment options:
  • Deploy a deep learning VM directly by using kubectl
  • Deploy AI workloads on a GPU-accelerated TKG cluster that is provisioned by using kubectl
  • Deploy a RAG workload
    • Deploy a deep learning VM with a RAG workload by using kubectl
    • Deploy a RAG Workload on a TKG cluster
Steps: Install the Kubernetes CLI Tools for vSphere
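The machine setup above can be sanity-checked with a small script. A hedged sketch, assuming the vSphere plugin binary installed by Kubernetes CLI Tools for vSphere is named `kubectl-vsphere`:

```shell
# Report whether each required CLI tool is on PATH.
require() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "found: $1"
  else
    echo "MISSING: $1"
    return 1
  fi
}

for tool in docker helm kubectl kubectl-vsphere; do
  require "$tool" || missing=1
done
if [ -z "$missing" ]; then
  echo "all required tools are installed"
fi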
Task: Enable vSphere with Tanzu.
Deployment options:
  • Deploy a deep learning VM directly by using kubectl
  • Deploy AI workloads on a GPU-accelerated TKG cluster that is provisioned by using kubectl
  • Deploy a RAG workload
    • Deploy a deep learning VM with a RAG workload by using kubectl
    • Deploy a RAG Workload on a TKG cluster
Steps: Configure vSphere with Tanzu for VMware Private AI Foundation with NVIDIA
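Once vSphere with Tanzu is enabled, kubectl-based deployments start with a login to the Supervisor. A hedged sketch of the usual session (the Supervisor address, user name, and namespace are hypothetical; this prints the commands rather than running them):

```shell
# Hypothetical example values for this environment.
supervisor="10.0.0.10"
user="devops@vsphere.local"

# Authenticate against the Supervisor with the vSphere kubectl plugin.
login_cmd="kubectl vsphere login --server=$supervisor --vsphere-username $user"
echo "$login_cmd"

# Then switch to the vSphere Namespace created for the AI workloads.
echo "kubectl config use-context my-ai-namespace"
```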
Task: Deploy VMware Aria Automation.
Deployment options:
  • Deploy a deep learning VM directly by using a self-service catalog item
  • Deploy AI workloads on a GPU-accelerated TKG cluster that is provisioned by using a self-service catalog item
  • Deploy a RAG workload
    • Deploy a deep learning VM with a RAG workload by using a self-service catalog item
    • Deploy a RAG Workload on a TKG cluster that is provisioned by using a self-service catalog item
Steps: Set Up VMware Aria Automation for VMware Private AI Foundation with NVIDIA
Task: Deploy VMware Aria Operations.
Monitor GPU metrics at the cluster, host system, and host properties level, with the option to add these metrics to custom dashboards.
Steps: For VMware Aria Operations 8.16, follow Intelligent Operations Management for VMware Cloud Foundation.

If you want to use the extended GPU monitoring features in VMware Aria Operations 8.17.1, perform the following steps:

  1. Apply the product support packs for VMware Aria Operations 8.17.1 to VMware Aria Suite Lifecycle 8.16.

    See VMware Aria Suite Lifecycle 8.16 Product Support Pack Release Notes.

  2. Deploy VMware Aria Operations according to Intelligent Operations Management for VMware Cloud Foundation.
Task: Deploy VMware Data Services Manager.
Deployment options:
  • Deploy a RAG workload
Steps: Installing and Configuring VMware Data Services Manager

You deploy the VMware Data Services Manager instance in the VI workload domain together with the AI workloads.

To provision a PostgreSQL database with the pgvector extension by using a self-service catalog item in VMware Aria Automation, deploy VMware Data Services Manager 2.0.2.
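For a RAG workload, the pgvector extension is what enables vector similarity search in the provisioned PostgreSQL database. As a hedged illustration (the table name and embedding dimension are hypothetical; the dimension depends on the embedding model you use), the required SQL can be generated and piped into the database:

```shell
# Emit the SQL a RAG workload typically needs on the provisioned database.
pgvector_sql() {
  cat <<'SQL'
CREATE EXTENSION IF NOT EXISTS vector;
-- Hypothetical example table; 768 is just an example dimension.
CREATE TABLE IF NOT EXISTS rag_chunks (
    id        bigserial PRIMARY KEY,
    content   text,
    embedding vector(768)
);
SQL
}

# Apply against the provisioned database, for example:
#   pgvector_sql | psql "$DATABASE_URL"
pgvector_sql
```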

Disconnected Environment
Each task below lists the related AI workload deployment options and the steps to follow.
Task: Review the requirements for deploying VMware Private AI Foundation with NVIDIA.
Deployment options:
  • Deploy a deep learning VM
  • Deploy AI workloads on a GPU-accelerated TKG cluster
  • Deploy a RAG workload
Steps: Requirements for Deploying VMware Private AI Foundation with NVIDIA
Task: Deploy an NVIDIA Delegated License Service (DLS) instance.
Deployment options:
  • Deploy a deep learning VM
  • Deploy AI workloads on a GPU-accelerated TKG cluster
  • Deploy a RAG workload
Steps: Installing and Configuring the DLS Virtual Appliance

You can deploy the virtual appliance in the same workload domain as the AI workloads or in the management domain.

  1. Register an NVIDIA DLS instance on the NVIDIA Licensing Portal, and bind and install a license server on it.
  2. Generate a client configuration token.
Task: Enable vSphere with Tanzu.
Deployment options:
  • Deploy a deep learning VM directly by using kubectl
  • Deploy AI workloads on a GPU-accelerated TKG cluster that is provisioned by using kubectl
  • Deploy a RAG workload
    • Deploy a deep learning VM with a RAG workload by using kubectl
    • Deploy a RAG Workload on a TKG cluster
Steps: Configure vSphere with Tanzu for VMware Private AI Foundation with NVIDIA
Task: Set up a Harbor registry service in the Supervisor.
Deployment options:
  • Deploy a deep learning VM
    • Deploy a deep learning VM directly by using kubectl
    • Deploy a deep learning VM directly by using a self-service catalog item
  • Deploy AI workloads on a GPU-accelerated TKG cluster
  • Deploy a RAG workload
    • Deploy a deep learning VM with a RAG workload by using kubectl
    • Deploy a deep learning VM with a RAG workload by using a self-service catalog item
    • Deploy a RAG Workload on a TKG cluster
Steps: Setting Up a Private Harbor Registry in VMware Private AI Foundation with NVIDIA
Task: Provide a location from which to download the vGPU guest drivers.
Deployment options:
  • Deploy a deep learning VM
Steps: Upload the required vGPU guest driver versions and an index to a local Web server, in one of the following formats:
  • An index file with a list of the .run files of the vGPU guest drivers:
    host-driver-version-1 guest-driver-download-URL-1
    host-driver-version-2 guest-driver-download-URL-2
    host-driver-version-3 guest-driver-download-URL-3
  • A directory index in the format generated by Web servers, such as NGINX and Apache HTTP Server.
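As a hedged illustration of the first format, an index file maps each ESXi host driver version to the download URL of the matching guest driver .run file. The driver versions and internal server name below are hypothetical examples:

```
535.104.06 http://web.example.internal/vgpu/NVIDIA-Linux-x86_64-535.104.05-grid.run
550.54.16 http://web.example.internal/vgpu/NVIDIA-Linux-x86_64-550.54.15-grid.run
```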
Task: Upload the NVIDIA NGC container images to a private container registry, such as the Harbor Registry service of the Supervisor.
Deployment options:
  • Deploy a deep learning VM
    • Deploy a deep learning VM directly by using kubectl
    • Deploy a deep learning VM directly by using a self-service catalog item
  • Deploy AI workloads on a GPU-accelerated TKG cluster
  • Deploy a RAG workload
    • Deploy a deep learning VM with a RAG workload by using kubectl
    • Deploy a deep learning VM with a RAG workload by using a self-service catalog item
    • Deploy a RAG Workload on a TKG cluster
Steps: Upload AI Container Images to a Private Harbor Registry in VMware Private AI Foundation with NVIDIA
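Mirroring an image from NGC into the private registry follows a pull-tag-push pattern. A hedged sketch (the Harbor host name, project, and image are hypothetical examples; this prints the docker commands instead of executing them, since the real transfer needs registry access):

```shell
# Print the docker commands that mirror one NGC image into Harbor.
# $1 = source image on nvcr.io, $2 = Harbor registry/project prefix.
mirror_cmds() {
  src="$1"
  target="$2/${1#nvcr.io/}"   # keep the NGC path under the Harbor project
  echo "docker pull $src"
  echo "docker tag $src $target"
  echo "docker push $target"
}

mirror_cmds "nvcr.io/nvidia/pytorch:24.03-py3" "harbor.example.com/ai-images"
```

Run the printed commands from the machine that has access to both the Internet and the local registry.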
Task: Deploy VMware Aria Automation.
Deployment options:
  • Deploy a deep learning VM directly by using a self-service catalog item
  • Deploy AI workloads on a GPU-accelerated TKG cluster that is provisioned by using a self-service catalog item
  • Deploy a RAG workload
    • Deploy a deep learning VM with a RAG workload by using a self-service catalog item
    • Deploy a RAG Workload on a TKG cluster that is provisioned by using a self-service catalog item
Steps: Set Up VMware Aria Automation for VMware Private AI Foundation with NVIDIA
Task: Deploy VMware Aria Operations.
Monitor GPU metrics at the cluster, host system, and host properties level, with the option to add these metrics to custom dashboards.
Steps: For VMware Aria Operations 8.16, follow Intelligent Operations Management for VMware Cloud Foundation.

If you want to use the extended GPU monitoring features in VMware Aria Operations 8.17.1, perform the following steps:

  1. Apply the product support packs for VMware Aria Operations 8.17.1 to VMware Aria Suite Lifecycle 8.16.

    See VMware Aria Suite Lifecycle 8.16 Product Support Pack Release Notes.

  2. Deploy VMware Aria Operations according to Intelligent Operations Management for VMware Cloud Foundation.
Task: Deploy VMware Data Services Manager.
Deployment options:
  • Deploy a RAG workload
Steps: Installing and Configuring VMware Data Services Manager

You deploy the VMware Data Services Manager instance in the VI workload domain together with the AI workloads.

To provision a PostgreSQL database with the pgvector extension by using a self-service catalog item in VMware Aria Automation, deploy VMware Data Services Manager 2.0.2.

Task: Set up machines for working with the disconnected environment:
  • A machine that has access to the Internet and has Docker and Helm installed.
  • A machine that has access to vCenter Server for the VI workload domain, the Supervisor instance, and the local container registry, and has Docker, Helm, and Kubernetes CLI Tools for vSphere installed.
Deployment options:
  • Deploy a deep learning VM
  • Deploy a GPU-accelerated TKG cluster
  • Deploy a RAG workload