As a cloud administrator, you must deploy specific software and configure the target VI workload domains so that data scientists and DevOps engineers can deploy AI workloads on top of VMware Private AI Foundation with NVIDIA.

VMware Components in VMware Private AI Foundation with NVIDIA

The functionality of the VMware Private AI Foundation with NVIDIA solution is available across several software components.

  • VMware Cloud Foundation 5.1.1
  • VMware Aria Automation 8.16.2 and VMware Aria Automation 8.17
  • VMware Aria Operations 8.16 and VMware Aria Operations 8.17.1
  • VMware Data Services Manager 2.0.x

For information about the VMware Private AI Foundation with NVIDIA architecture and components, see What is VMware Private AI Foundation with NVIDIA?.

Deployment Workflows for VMware Private AI Foundation with NVIDIA

In a disconnected environment, you must take additional steps to set up and deploy appliances and provide resources locally, so that your workloads can access them.

Connected Environment
Each task below lists the related AI workload deployment options and the steps to follow.
Task: Review the requirements for deploying VMware Private AI Foundation with NVIDIA.
Deployment options:
  • Deploy a deep learning VM
  • Deploy AI workloads on a GPU-accelerated TKG cluster
  • Deploy a RAG workload
Steps: Requirements for Deploying VMware Private AI Foundation with NVIDIA
Task: Configure a License Service instance on the NVIDIA Licensing Portal and generate a client configuration token.
Deployment options:
  • Deploy a deep learning VM
  • Deploy AI workloads on a GPU-accelerated TKG cluster
  • Deploy a RAG workload
Steps: NVIDIA License System User Guide
Task: Generate an API key for access to the NVIDIA NGC catalog.
Deployment options:
  • Deploy a deep learning VM
  • Deploy AI workloads on a GPU-accelerated TKG cluster
  • Deploy a RAG workload
Steps: Pulling and Running NVIDIA AI Enterprise Containers
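The NGC API key is typically used as a registry password when pulling NVIDIA AI Enterprise containers. A hedged sketch of the usual pattern (the image path is an illustrative example from the NGC catalog; this prints the commands rather than running them, since the pull requires network access and an actual key):

```shell
# Placeholder key; export your real NGC API key in the environment instead.
NGC_API_KEY="${NGC_API_KEY:-example-api-key}"

# NGC registry logins use the literal user name $oauthtoken, with the
# API key supplied as the password.
login_cmd='docker login nvcr.io --username $oauthtoken --password-stdin'
pull_cmd="docker pull nvcr.io/nvidia/pytorch:24.03-py3"

echo "echo \"\$NGC_API_KEY\" | $login_cmd"
echo "$pull_cmd"
```

Run the two printed commands on the machine that pulls the containers.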
Task: If you plan to deploy deep learning VMs or TKG clusters directly on a Supervisor in vSphere with Tanzu, set up a machine that has access to the Supervisor instance and has Docker, Helm, and Kubernetes CLI Tools for vSphere installed.
Deployment options:
  • Deploy a deep learning VM directly by using kubectl
  • Deploy AI workloads on a GPU-accelerated TKG cluster that is provisioned by using kubectl
  • Deploy a RAG workload
    • Deploy a deep learning VM with a RAG workload by using kubectl
    • Deploy a RAG Workload on a TKG cluster
Steps: Install the Kubernetes CLI Tools for vSphere
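The machine setup above can be sanity-checked with a small script. A hedged sketch, assuming the vSphere plugin binary installed by Kubernetes CLI Tools for vSphere is named `kubectl-vsphere`:

```shell
# Report whether each required CLI tool is on PATH.
require() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "found: $1"
  else
    echo "MISSING: $1"
    return 1
  fi
}

for tool in docker helm kubectl kubectl-vsphere; do
  require "$tool" || missing=1
done
if [ -z "$missing" ]; then
  echo "all required tools are installed"
fi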
Task: Enable vSphere with Tanzu.
Deployment options:
  • Deploy a deep learning VM directly by using kubectl
  • Deploy AI workloads on a GPU-accelerated TKG cluster that is provisioned by using kubectl
  • Deploy a RAG workload
    • Deploy a deep learning VM with a RAG workload by using kubectl
    • Deploy a RAG Workload on a TKG cluster
Steps: Configure vSphere with Tanzu for VMware Private AI Foundation with NVIDIA
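Once vSphere with Tanzu is enabled, kubectl-based deployments start with a login to the Supervisor. A hedged sketch of the usual session (the Supervisor address, user name, and namespace are hypothetical; this prints the commands rather than running them):

```shell
# Hypothetical example values for this environment.
supervisor="10.0.0.10"
user="devops@vsphere.local"

# Authenticate against the Supervisor with the vSphere kubectl plugin.
login_cmd="kubectl vsphere login --server=$supervisor --vsphere-username $user"
echo "$login_cmd"

# Then switch to the vSphere Namespace created for the AI workloads.
echo "kubectl config use-context my-ai-namespace"
```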
Task: Deploy VMware Aria Automation.
Deployment options:
  • Deploy a deep learning VM directly by using a self-service catalog item
  • Deploy AI workloads on a GPU-accelerated TKG cluster that is provisioned by using a self-service catalog item
  • Deploy a RAG workload
    • Deploy a deep learning VM with a RAG workload by using a self-service catalog item
    • Deploy a RAG Workload on a TKG cluster that is provisioned by using a self-service catalog item
Steps: Set Up VMware Aria Automation for VMware Private AI Foundation with NVIDIA
Task: Deploy VMware Aria Operations.
Monitor GPU metrics at the cluster, host system, and host properties level, with the option to add these metrics to custom dashboards.
Steps: For VMware Aria Operations 8.16, follow Intelligent Operations Management for VMware Cloud Foundation.

If you want to use the extended GPU monitoring features in VMware Aria Operations 8.17.1, perform the following steps:

  1. Apply the product support packs for VMware Aria Operations 8.17.1 to VMware Aria Suite Lifecycle 8.16.

    See VMware Aria Suite Lifecycle 8.16 Product Support Pack Release Notes.

  2. Deploy VMware Aria Operations according to Intelligent Operations Management for VMware Cloud Foundation.
Task: Deploy VMware Data Services Manager.
Deployment options:
  • Deploy a RAG workload
Steps: Installing and Configuring VMware Data Services Manager

You deploy the VMware Data Services Manager instance in the VI workload domain together with the AI workloads.

To provision a PostgreSQL database with the pgvector extension by using a self-service catalog item in VMware Aria Automation, deploy VMware Data Services Manager 2.0.2.
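For a RAG workload, the pgvector extension is what enables vector similarity search in the provisioned PostgreSQL database. As a hedged illustration (the table name and embedding dimension are hypothetical; the dimension depends on the embedding model you use), the required SQL can be generated and piped into the database:

```shell
# Emit the SQL a RAG workload typically needs on the provisioned database.
pgvector_sql() {
  cat <<'SQL'
CREATE EXTENSION IF NOT EXISTS vector;
-- Hypothetical example table; 768 is just an example dimension.
CREATE TABLE IF NOT EXISTS rag_chunks (
    id        bigserial PRIMARY KEY,
    content   text,
    embedding vector(768)
);
SQL
}

# Apply against the provisioned database, for example:
#   pgvector_sql | psql "$DATABASE_URL"
pgvector_sql
```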

Disconnected Environment
Each task below lists the related AI workload deployment options and the steps to follow.
Task: Review the requirements for deploying VMware Private AI Foundation with NVIDIA.
Deployment options:
  • Deploy a deep learning VM
  • Deploy AI workloads on a GPU-accelerated TKG cluster
  • Deploy a RAG workload
Steps: Requirements for Deploying VMware Private AI Foundation with NVIDIA
Task: Deploy an NVIDIA Delegated License Service (DLS) instance.
Deployment options:
  • Deploy a deep learning VM
  • Deploy AI workloads on a GPU-accelerated TKG cluster
  • Deploy a RAG workload
Steps: Installing and Configuring the DLS Virtual Appliance

You can deploy the virtual appliance in the same workload domain as the AI workloads or in the management domain.

  1. Register an NVIDIA DLS instance on the NVIDIA Licensing Portal, and bind and install a license server on it.
  2. Generate a client configuration token.
Task: Enable vSphere with Tanzu.
Deployment options:
  • Deploy a deep learning VM directly by using kubectl
  • Deploy AI workloads on a GPU-accelerated TKG cluster that is provisioned by using kubectl
  • Deploy a RAG workload
    • Deploy a deep learning VM with a RAG workload by using kubectl
    • Deploy a RAG Workload on a TKG cluster
Steps: Configure vSphere with Tanzu for VMware Private AI Foundation with NVIDIA
Task: Set up a Harbor registry service in the Supervisor.
Deployment options:
  • Deploy a deep learning VM
    • Deploy a deep learning VM directly by using kubectl
    • Deploy a deep learning VM directly by using a self-service catalog item
  • Deploy AI workloads on a GPU-accelerated TKG cluster
  • Deploy a RAG workload
    • Deploy a deep learning VM with a RAG workload by using kubectl
    • Deploy a deep learning VM with a RAG workload by using a self-service catalog item
    • Deploy a RAG Workload on a TKG cluster
Steps: Setting Up a Private Harbor Registry in VMware Private AI Foundation with NVIDIA
Task: Provide a location from which to download the vGPU guest drivers.
Deployment options:
  • Deploy a deep learning VM
Steps: Upload the required vGPU guest driver versions and an index to a local Web server, in one of the following formats:
  • An index file with a list of the .run files of the vGPU guest drivers:
    host-driver-version-1 guest-driver-download-URL-1
    host-driver-version-2 guest-driver-download-URL-2
    host-driver-version-3 guest-driver-download-URL-3
  • A directory index in the format generated by Web servers, such as NGINX and Apache HTTP Server.
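As a hedged illustration of the first format, an index file maps each ESXi host driver version to the download URL of the matching guest driver .run file. The driver versions and internal server name below are hypothetical examples:

```
535.104.06 http://web.example.internal/vgpu/NVIDIA-Linux-x86_64-535.104.05-grid.run
550.54.16 http://web.example.internal/vgpu/NVIDIA-Linux-x86_64-550.54.15-grid.run
```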
Task: Upload the NVIDIA NGC container images to a private container registry, such as the Harbor Registry service of the Supervisor.
Deployment options:
  • Deploy a deep learning VM
    • Deploy a deep learning VM directly by using kubectl
    • Deploy a deep learning VM directly by using a self-service catalog item
  • Deploy AI workloads on a GPU-accelerated TKG cluster
  • Deploy a RAG workload
    • Deploy a deep learning VM with a RAG workload by using kubectl
    • Deploy a deep learning VM with a RAG workload by using a self-service catalog item
    • Deploy a RAG Workload on a TKG cluster
Steps: Upload AI Container Images to a Private Harbor Registry in VMware Private AI Foundation with NVIDIA
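Mirroring an image from NGC into the private registry follows a pull-tag-push pattern. A hedged sketch (the Harbor host name, project, and image are hypothetical examples; this prints the docker commands instead of executing them, since the real transfer needs registry access):

```shell
# Print the docker commands that mirror one NGC image into Harbor.
# $1 = source image on nvcr.io, $2 = Harbor registry/project prefix.
mirror_cmds() {
  src="$1"
  target="$2/${1#nvcr.io/}"   # keep the NGC path under the Harbor project
  echo "docker pull $src"
  echo "docker tag $src $target"
  echo "docker push $target"
}

mirror_cmds "nvcr.io/nvidia/pytorch:24.03-py3" "harbor.example.com/ai-images"
```

Run the printed commands from the machine that has access to both the Internet and the local registry.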
Task: Deploy VMware Aria Automation.
Deployment options:
  • Deploy a deep learning VM directly by using a self-service catalog item
  • Deploy AI workloads on a GPU-accelerated TKG cluster that is provisioned by using a self-service catalog item
  • Deploy a RAG workload
    • Deploy a deep learning VM with a RAG workload by using a self-service catalog item
    • Deploy a RAG Workload on a TKG cluster that is provisioned by using a self-service catalog item
Steps: Set Up VMware Aria Automation for VMware Private AI Foundation with NVIDIA
Task: Deploy VMware Aria Operations.
Monitor GPU metrics at the cluster, host system, and host properties level, with the option to add these metrics to custom dashboards.
Steps: For VMware Aria Operations 8.16, follow Intelligent Operations Management for VMware Cloud Foundation.

If you want to use the extended GPU monitoring features in VMware Aria Operations 8.17.1, perform the following steps:

  1. Apply the product support packs for VMware Aria Operations 8.17.1 to VMware Aria Suite Lifecycle 8.16.

    See VMware Aria Suite Lifecycle 8.16 Product Support Pack Release Notes.

  2. Deploy VMware Aria Operations according to Intelligent Operations Management for VMware Cloud Foundation.
Task: Deploy VMware Data Services Manager.
Deployment options:
  • Deploy a RAG workload
Steps: Installing and Configuring VMware Data Services Manager

You deploy the VMware Data Services Manager instance in the VI workload domain together with the AI workloads.

To provision a PostgreSQL database with the pgvector extension by using a self-service catalog item in VMware Aria Automation, deploy VMware Data Services Manager 2.0.2.

Task: Set up machines for working with the disconnected environment:
  • A machine that has access to the Internet and has Docker and Helm installed.
  • A machine that has access to vCenter Server for the VI workload domain, the Supervisor instance, and the local container registry, and has Docker, Helm, and Kubernetes CLI Tools for vSphere installed.
Deployment options:
  • Deploy a deep learning VM
  • Deploy a GPU-accelerated TKG cluster
  • Deploy a RAG workload