As a cloud administrator, you must deploy specific software and configure the target VI workload domains so that data scientists and DevOps engineers can deploy AI workloads on top of VMware Private AI Foundation with NVIDIA.

VMware Components in VMware Private AI Foundation with NVIDIA

The functionality of the VMware Private AI Foundation with NVIDIA solution is available across several software components.

  • VMware Cloud Foundation 5.2
  • VMware Aria Automation 8.18
  • VMware Aria Operations 8.18
  • VMware Data Services Manager 2.1

For information about the VMware Private AI Foundation with NVIDIA architecture and components, see System Architecture of VMware Private AI Foundation with NVIDIA.

Deployment Workflows for VMware Private AI Foundation with NVIDIA

In a disconnected environment, you must take additional steps to set up and deploy appliances and provide resources locally, so that your workloads can access them.

Connected Environment
Task | Related AI Workload Deployment Options | Steps
Review the requirements for deploying VMware Private AI Foundation with NVIDIA.
  • Deploy a deep learning VM
  • Deploy AI workloads on a GPU-accelerated TKG cluster
  • Deploy a RAG workload
Requirements for Deploying VMware Private AI Foundation with NVIDIA
Configure a License Service instance on the NVIDIA Licensing Portal and generate a client configuration token.
  • Deploy a deep learning VM
  • Deploy AI workloads on a GPU-accelerated TKG cluster
  • Deploy a RAG workload
NVIDIA License System User Guide.
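Once generated, the client configuration token must be placed on each licensed VM so that the vGPU guest driver can check out a license. A minimal sketch for a Linux VM, assuming the default token directory documented by NVIDIA (the token file name below is a hypothetical example):

```shell
# Copy the client configuration token (downloaded from the NVIDIA
# Licensing Portal) into the directory the vGPU guest driver reads,
# then restart the licensing daemon so it picks the token up.
# The token file name is an example; use the file you generated.
sudo cp client_configuration_token_05-30-2024.tok /etc/nvidia/ClientConfigToken/
sudo systemctl restart nvidia-gridd
```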
Generate an API key for access to the NVIDIA NGC catalog.
  • Deploy a deep learning VM
  • Deploy AI workloads on a GPU-accelerated TKG cluster
  • Deploy a RAG workload
Pulling and Running NVIDIA AI Enterprise Containers
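With the API key in hand, Docker on the access machine can authenticate against the NGC registry before pulling NVIDIA AI Enterprise containers. A sketch, assuming Docker is installed; the user name is the literal string `$oauthtoken`, and the key value below is a placeholder:

```shell
# Authenticate to nvcr.io with an NGC API key. The user name is the
# literal string $oauthtoken; the password is the generated API key.
export NGC_API_KEY='<paste-your-ngc-api-key>'
echo "$NGC_API_KEY" | docker login nvcr.io --username '$oauthtoken' --password-stdin
```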
Enable vSphere IaaS control plane.
  • Deploy a deep learning VM directly by using kubectl
  • Deploy AI workloads on a GPU-accelerated TKG cluster that is provisioned by using kubectl
  • Deploy a RAG workload
    • Deploy a deep learning VM with a RAG workload by using kubectl
    • Deploy a RAG Workload on a TKG cluster
Configure vSphere IaaS Control Plane for VMware Private AI Foundation with NVIDIA
If you plan to deploy deep learning VMs or TKG clusters directly on a Supervisor in vSphere IaaS control plane, set up a machine that has access to the Supervisor instance and has Docker, Helm, and the Kubernetes CLI Tools for vSphere installed.
  • Deploy a deep learning VM directly by using kubectl
  • Deploy AI workloads on a GPU-accelerated TKG cluster that is provisioned by using kubectl
  • Deploy a RAG workload
    • Deploy a deep learning VM with a RAG workload by using kubectl
    • Deploy a RAG Workload on a TKG cluster
Install the Kubernetes CLI Tools for vSphere
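After the Kubernetes CLI Tools for vSphere are installed, a session against the Supervisor is typically opened as follows (the server address, user name, and namespace are hypothetical examples):

```shell
# Log in to the Supervisor with the vSphere plugin for kubectl, then
# switch to the vSphere namespace where the AI workloads will run.
kubectl vsphere login --server=192.0.2.10 \
  --vsphere-username administrator@vsphere.local \
  --insecure-skip-tls-verify
kubectl config use-context my-ai-namespace   # namespace name is an example
```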
Create a content library for deep learning VM images.
  • Deploy a deep learning VM
Create a Content Library with Deep Learning VM Images for VMware Private AI Foundation with NVIDIA
Deploy VMware Aria Automation by using VMware Aria Suite Lifecycle in VMware Cloud Foundation mode.
  • Deploy a deep learning VM directly by using a self-service catalog item
  • Deploy AI workloads on a GPU-accelerated TKG cluster that is provisioned by using a self-service catalog item
  • Deploy a RAG workload
    • Deploy a deep learning VM with a RAG workload by using a self-service catalog item
    • Deploy a RAG Workload on a TKG cluster that is provisioned by using a self-service catalog item
  1. Private Cloud Automation for VMware Cloud Foundation
  2. Set Up VMware Aria Automation for VMware Private AI Foundation with NVIDIA
Deploy VMware Aria Operations by using VMware Aria Suite Lifecycle in VMware Cloud Foundation mode. Monitor GPU metrics at the cluster, host system, and host property levels, with the option to add these metrics to custom dashboards. Intelligent Operations Management for VMware Cloud Foundation.
Deploy VMware Data Services Manager.
  • Deploy a RAG workload
  1. Installing and Configuring VMware Data Services Manager

    You deploy a VMware Data Services Manager instance in the management domain.

  2. Create a Vector Database Catalog Item in VMware Aria Automation
Disconnected Environment
Task | Related AI Workload Deployment Options | Steps
Review the requirements for deploying VMware Private AI Foundation with NVIDIA.
  • Deploy a deep learning VM
  • Deploy AI workloads on a GPU-accelerated TKG cluster
  • Deploy a RAG workload
Requirements for Deploying VMware Private AI Foundation with NVIDIA
Deploy an NVIDIA Delegated License Service Instance.
  • Deploy a deep learning VM
  • Deploy AI workloads on a GPU-accelerated TKG cluster
  • Deploy a RAG workload
Installing and Configuring the DLS Virtual Appliance

You can deploy the virtual appliance in the same workload domain as the AI workloads or in the management domain.

  1. Register an NVIDIA DLS instance on the NVIDIA Licensing Portal, and bind and install a license server on it.
  2. Generate a client configuration token.
  • Deploy a deep learning VM
  • Deploy AI workloads on a GPU-accelerated TKG cluster
  • Deploy a RAG workload
Enable vSphere IaaS control plane (formerly known as vSphere with Tanzu).
  • Deploy a deep learning VM directly by using kubectl
  • Deploy AI workloads on a GPU-accelerated TKG cluster that is provisioned by using kubectl
  • Deploy a RAG workload
    • Deploy a deep learning VM with a RAG workload by using kubectl
    • Deploy a RAG Workload on a TKG cluster
Configure vSphere IaaS Control Plane for VMware Private AI Foundation with NVIDIA
  • Set up a machine that has access to the Internet and has Docker and Helm installed.
  • Set up a machine that has access to vCenter Server for the VI workload domain, the Supervisor instance, and the local container registry.

    The machine must have Docker, Helm, and Kubernetes CLI Tools for vSphere.

  • Deploy a deep learning VM
  • Deploy a GPU-accelerated TKG cluster
  • Deploy a RAG workload
Create a content library for deep learning VM images.
  • Deploy a deep learning VM
Create a Content Library with Deep Learning VM Images for VMware Private AI Foundation with NVIDIA
Set up a Harbor registry service in the Supervisor.
  • Deploy a deep learning VM
    • Deploy a deep learning VM directly by using kubectl
    • Deploy a deep learning VM directly by using a self-service catalog item
  • Deploy AI workloads on a GPU-accelerated TKG cluster
  • Deploy a RAG workload
    • Deploy a deep learning VM with a RAG workload by using kubectl
    • Deploy a deep learning VM with a RAG workload by using a self-service catalog item
    • Deploy a RAG Workload on a TKG cluster
Setting Up a Private Harbor Registry in VMware Private AI Foundation with NVIDIA
Upload the components of the NVIDIA operators to the environment.
  • Deploy AI workloads on a GPU-accelerated TKG cluster
  • Deploy a RAG workload
    • Deploy a deep learning VM with a RAG workload by using kubectl
    • Deploy a deep learning VM with a RAG workload by using a self-service catalog item
    • Deploy a RAG Workload on a TKG cluster
Upload the NVIDIA GPU Operator Components to a Disconnected Environment
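One way to stage the NVIDIA GPU Operator chart for a disconnected Supervisor is to pull it on the internet-connected machine and push it to the private Harbor registry as an OCI artifact. A sketch, in which the registry host, project path, and chart version are hypothetical examples:

```shell
# Pull the GPU Operator Helm chart from the NVIDIA repository, then
# push it to the private Harbor registry as an OCI artifact.
helm repo add nvidia https://helm.ngc.nvidia.com/nvidia
helm pull nvidia/gpu-operator --version v24.3.0      # version is an example
helm registry login harbor.example.com               # prompts for credentials
helm push gpu-operator-v24.3.0.tgz oci://harbor.example.com/library/charts
```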
Provide a location to download the vGPU guest drivers from.
  • Deploy a deep learning VM
Upload to a local Web server the required vGPU guest driver versions, downloaded from the NVIDIA Licensing Portal, and an index in one of the following formats:
  • An index .txt file with a list of the .run or .zip files of the vGPU guest drivers.
    host-driver-version-1 guest-driver-download-URL-1
    host-driver-version-2 guest-driver-download-URL-2
    host-driver-version-3 guest-driver-download-URL-3
  • A directory index in the format generated by Web servers, such as NGINX and Apache HTTP Server. The version-specific vGPU driver files must be provided as .zip files.
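The first index format can be produced with a simple heredoc. A sketch in which the driver versions and the Web server URL are hypothetical examples; each line pairs an ESXi host driver version with the download URL of the matching guest driver:

```shell
# Write an index .txt file mapping each ESXi host vGPU driver version
# to the download URL of the matching guest driver on the local
# Web server. Versions and URLs below are illustrative only.
cat > index.txt <<'EOF'
550.54.16 http://webserver.example.com/vgpu/NVIDIA-Linux-x86_64-550.54.15-grid.run
535.161.05 http://webserver.example.com/vgpu/NVIDIA-Linux-x86_64-535.161.08-grid.run
EOF
```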
Upload the NVIDIA NGC container images to a private container registry, such as the Harbor Registry service of the Supervisor.
  • Deploy a deep learning VM
    • Deploy a deep learning VM directly by using kubectl
    • Deploy a deep learning VM directly by using a self-service catalog item
  • Deploy AI workloads on a GPU-accelerated TKG cluster
  • Deploy a RAG workload
    • Deploy a deep learning VM with a RAG workload by using kubectl
    • Deploy a deep learning VM with a RAG workload by using a self-service catalog item
    • Deploy a RAG Workload on a TKG cluster
Upload AI Container Images to a Private Harbor Registry in VMware Private AI Foundation with NVIDIA
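Mirroring the NGC images typically amounts to a pull/tag/push loop run on the internet-connected machine. A sketch, assuming Docker is logged in to both nvcr.io and Harbor; the image list, registry host, and project name are hypothetical examples:

```shell
# Mirror NGC container images into a private Harbor project so that
# deep learning VMs and TKG clusters can pull them locally.
HARBOR_PROJECT=harbor.example.com/nvidia
for image in nvcr.io/nvidia/pytorch:24.05-py3 \
             nvcr.io/nvidia/k8s/dcgm-exporter:3.3.5-3.4.1-ubuntu22.04; do
  target="$HARBOR_PROJECT/${image#nvcr.io/nvidia/}"  # keep the NGC path suffix
  docker pull "$image"
  docker tag  "$image" "$target"
  docker push "$target"
done
```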
Deploy VMware Aria Automation by using VMware Aria Suite Lifecycle in VMware Cloud Foundation mode.
  • Deploy a deep learning VM directly by using a self-service catalog item
  • Deploy AI workloads on a GPU-accelerated TKG cluster that is provisioned by using a self-service catalog item
  • Deploy a RAG workload
    • Deploy a deep learning VM with a RAG workload by using a self-service catalog item
    • Deploy a RAG Workload on a TKG cluster that is provisioned by using a self-service catalog item
  1. Private Cloud Automation for VMware Cloud Foundation
  2. Set Up VMware Aria Automation for VMware Private AI Foundation with NVIDIA
Deploy VMware Aria Operations by using VMware Aria Suite Lifecycle in VMware Cloud Foundation mode. Monitor GPU metrics at the cluster, host system, and host property levels, with the option to add these metrics to custom dashboards. Intelligent Operations Management for VMware Cloud Foundation
Deploy VMware Data Services Manager.
  • Deploy a RAG workload
  1. Installing and Configuring VMware Data Services Manager

    You deploy a VMware Data Services Manager instance in the management domain.

  2. Create a Vector Database Catalog Item in VMware Aria Automation