As a cloud administrator, you must deploy specific software and configure the target VI workload domains so that data scientists and DevOps engineers can deploy AI workloads on top of VMware Private AI Foundation with NVIDIA.

VMware Components in VMware Private AI Foundation with NVIDIA

The functionality of the VMware Private AI Foundation with NVIDIA solution is available across several software components.

  • VMware Cloud Foundation 5.1.1
  • VMware Aria Automation 8.16.2
  • VMware Aria Operations 8.16
  • VMware Data Services Manager 2.0

For information about the VMware Private AI Foundation with NVIDIA architecture and components, see What is VMware Private AI Foundation with NVIDIA?.

VMware Private AI Foundation with NVIDIA Deployment Workflows

In a disconnected environment, you must take additional steps to set up and deploy appliances and provide resources locally, so that your workloads can access them.

Connected Environment
Task | Related AI Workload Deployment Options | Steps
Review the requirements for deploying VMware Private AI Foundation with NVIDIA.
  • Deploy a deep learning VM
  • Deploy AI workloads on a GPU-accelerated TKG cluster
  • Deploy a RAG workload
Requirements for Deploying VMware Private AI Foundation with NVIDIA
Configure a License Service instance on the NVIDIA Licensing Portal and generate a client configuration token.
  • Deploy a deep learning VM
  • Deploy AI workloads on a GPU-accelerated TKG cluster
  • Deploy a RAG workload
NVIDIA License System User Guide
Generate an API key for access to the NVIDIA NGC catalog.
  • Deploy a deep learning VM
  • Deploy AI workloads on a GPU-accelerated TKG cluster
  • Deploy a RAG workload
Pulling and Running NVIDIA AI Enterprise Containers
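With an API key generated, the NGC registry (nvcr.io) accepts the key as the password for the literal user name `$oauthtoken`. A minimal sketch follows; it prints the commands rather than running them, so you can execute them on a connected machine with `NGC_API_KEY` exported. The image name is an illustrative example, not a requirement.

```shell
# Print the docker commands for authenticating to the NVIDIA NGC registry
# and pulling a container. The pytorch image name below is only an example.
cat <<'EOF'
echo "$NGC_API_KEY" | docker login nvcr.io --username '$oauthtoken' --password-stdin
docker pull nvcr.io/nvidia/pytorch:24.03-py3
EOF
```

Note that the user name must be the literal string `$oauthtoken`, which is why it is single-quoted in the login command.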
If you plan to deploy deep learning VMs or TKG clusters directly on a Supervisor in vSphere with Tanzu, set up a machine that has access to the Supervisor instance and has Docker, Helm, and the Kubernetes CLI Tools for vSphere installed.
  • Deploy a deep learning VM directly by using kubectl
  • Deploy AI workloads on a GPU-accelerated TKG cluster that is provisioned by using kubectl
  • Deploy a RAG workload
    • Deploy a deep learning VM with a RAG workload by using kubectl
    • Deploy a RAG Workload on a TKG cluster
Install the Kubernetes CLI Tools for vSphere
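A quick way to confirm the client machine has the required tools on its PATH is a sketch like the following; `kubectl-vsphere` is the plugin binary shipped with the Kubernetes CLI Tools for vSphere, and the other names are the standard CLIs.

```shell
# Report which of the required client tools are installed on this machine.
for tool in docker helm kubectl kubectl-vsphere; do
  if command -v "$tool" >/dev/null 2>&1; then
    echo "$tool: found"
  else
    echo "$tool: missing"
  fi
done
```

After the tools are in place, you typically open a session with `kubectl vsphere login --server=<Supervisor address> --vsphere-username <user>` before deploying workloads.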
Enable vSphere with Tanzu.
  • Deploy a deep learning VM directly by using kubectl
  • Deploy AI workloads on a GPU-accelerated TKG cluster that is provisioned by using kubectl
  • Deploy a RAG workload
    • Deploy a deep learning VM with a RAG workload by using kubectl
    • Deploy a RAG Workload on a TKG cluster
Configure vSphere with Tanzu for VMware Private AI Foundation with NVIDIA
Deploy VMware Aria Automation.
  • Deploy a deep learning VM directly by using a self-service catalog item
  • Deploy AI workloads on a GPU-accelerated TKG cluster that is provisioned by using a self-service catalog item
  • Deploy a RAG workload
    • Deploy a deep learning VM with a RAG workload by using a self-service catalog item
    • Deploy a RAG Workload on a TKG cluster that is provisioned by using a self-service catalog item
Set Up VMware Aria Automation for VMware Private AI Foundation with NVIDIA
Deploy VMware Aria Operations.
Monitor GPU metrics at the cluster, host system, and host properties level, with the option to add these metrics to custom dashboards.
Intelligent Operations Management for VMware Cloud Foundation
Deploy VMware Data Services Manager.
  • Deploy a RAG workload
Installing and Configuring VMware Data Services Manager

You deploy a VMware Data Services Manager instance in the VI workload domain with the AI workloads.

Disconnected Environment
Task | Related AI Workload Deployment Options | Steps
Review the requirements for deploying VMware Private AI Foundation with NVIDIA.
  • Deploy a deep learning VM
  • Deploy AI workloads on a GPU-accelerated TKG cluster
  • Deploy a RAG workload
Requirements for Deploying VMware Private AI Foundation with NVIDIA
Deploy an NVIDIA Delegated License Service Instance.
  • Deploy a deep learning VM
  • Deploy AI workloads on a GPU-accelerated TKG cluster
  • Deploy a RAG workload
Installing and Configuring the DLS Virtual Appliance

You can deploy the virtual appliance in the same workload domain as the AI workloads or in the management domain.

  1. Register an NVIDIA DLS instance on the NVIDIA Licensing Portal, and bind and install a license server on it.
  2. Generate a client configuration token.
  • Deploy a deep learning VM
  • Deploy AI workloads on a GPU-accelerated TKG cluster
  • Deploy a RAG workload
Enable vSphere with Tanzu.
  • Deploy a deep learning VM directly by using kubectl
  • Deploy AI workloads on a GPU-accelerated TKG cluster that is provisioned by using kubectl
  • Deploy a RAG workload
    • Deploy a deep learning VM with a RAG workload by using kubectl
    • Deploy a RAG Workload on a TKG cluster
Configure vSphere with Tanzu for VMware Private AI Foundation with NVIDIA
Enable the Harbor Registry service in the Supervisor.
  • Deploy a deep learning VM
    • Deploy a deep learning VM directly by using kubectl
    • Deploy a deep learning VM directly by using a self-service catalog item
  • Deploy AI workloads on a GPU-accelerated TKG cluster
  • Deploy a RAG workload
    • Deploy a deep learning VM with a RAG workload by using kubectl
    • Deploy a deep learning VM with a RAG workload by using a self-service catalog item
    • Deploy a RAG Workload on a TKG cluster
Enable Harbor as a Supervisor Service

You can use the Harbor registry as a local image registry for container images from the NVIDIA NGC catalog.

Allocate enough storage space for hosting the NVIDIA NGC containers you plan to deploy on a deep learning VM or on a TKG cluster. Plan the storage space to accommodate at least three versions of each container.

Note: The installation of the Harbor service in the Supervisor requires an Internet connection. After you install the service, you can disconnect your environment from the Internet and start using the Harbor service as a local container registry.

If connecting to the Internet while installing the Harbor service is not an option for your organization, set up a container registry by another vendor.
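As a rough sizing sketch for the registry storage, you can multiply the number of distinct containers by the number of retained versions and an average image size. Every figure below is an assumption to replace with your own inventory.

```shell
# Back-of-the-envelope Harbor capacity estimate; all numbers are assumptions.
containers=10        # distinct NGC container images you plan to host
versions=3           # keep at least three versions of each (see above)
avg_size_gb=12       # assumed average image size in GB
echo "Minimum registry capacity: $((containers * versions * avg_size_gb)) GB"
```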

Provide a location to download the vGPU guest drivers from.
  • Deploy a deep learning VM
Upload the required vGPU guest driver versions and an index file in the following format to a local Web server:
host-driver-version-1 guest-driver-download-URL-1
host-driver-version-2 guest-driver-download-URL-2
host-driver-version-3 guest-driver-download-URL-3
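As a sketch, such an index file could be produced like this; the driver versions, file names, and the Web server URL are placeholders, not real download locations.

```shell
# Generate a vGPU guest driver index file in the format shown above:
# one "host-driver-version guest-driver-download-URL" pair per line.
# All versions and URLs below are placeholders.
cat > vgpu-driver-index.txt <<'EOF'
550.54.10 http://webserver.example.com/vgpu/guest-driver-550.54.10.run
535.161.05 http://webserver.example.com/vgpu/guest-driver-535.161.05.run
EOF
echo "wrote vgpu-driver-index.txt"
```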
Upload the NVIDIA NGC container images to a private container registry, such as the Harbor Registry service of the Supervisor.
  • Deploy a deep learning VM
    • Deploy a deep learning VM directly by using kubectl
    • Deploy a deep learning VM directly by using a self-service catalog item
  • Deploy AI workloads on a GPU-accelerated TKG cluster
  • Deploy a RAG workload
    • Deploy a deep learning VM with a RAG workload by using kubectl
    • Deploy a deep learning VM with a RAG workload by using a self-service catalog item
    • Deploy a RAG Workload on a TKG cluster
Upload AI Container Images to a Private Harbor Registry in VMware Private AI Foundation with NVIDIA
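The pull, retag, and push loop for mirroring NGC images into a private registry can be sketched as follows. The Harbor host, project, and image names are all illustrative; the script only prints the docker commands, so you can review them and pipe the output to `sh` on a connected machine that is logged in to both registries.

```shell
# Print docker commands that mirror NGC images into a private Harbor project.
# Registry and image names are illustrative assumptions.
SRC_REGISTRY="nvcr.io"
DST_REGISTRY="harbor.example.com/nvidia-ngc"   # assumed Harbor project
IMAGES="nvidia/pytorch:24.03-py3 nvidia/tritonserver:24.03-py3"

for image in $IMAGES; do
  name="${image#nvidia/}"                      # strip the NGC org prefix
  echo "docker pull ${SRC_REGISTRY}/${image}"
  echo "docker tag ${SRC_REGISTRY}/${image} ${DST_REGISTRY}/${name}"
  echo "docker push ${DST_REGISTRY}/${name}"
done
```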
Deploy VMware Aria Automation.
  • Deploy a deep learning VM directly by using a self-service catalog item
  • Deploy AI workloads on a GPU-accelerated TKG cluster that is provisioned by using a self-service catalog item
  • Deploy a RAG workload
    • Deploy a deep learning VM directly by using a self-service catalog item
    • Deploy a RAG Workload on a TKG cluster that is provisioned by using a self-service catalog item
Set Up VMware Aria Automation for VMware Private AI Foundation with NVIDIA
Deploy VMware Aria Operations.
Monitor GPU metrics at the cluster, host system, and host properties level, with the option to add these metrics to custom dashboards.
Intelligent Operations Management for VMware Cloud Foundation
Deploy VMware Data Services Manager.
  • Deploy a RAG workload
Installing and Configuring VMware Data Services Manager

You deploy a VMware Data Services Manager instance for each VI workload domain. You can host these VMware Data Services Manager instances in the management domain.

  • Set up a machine that has access to the Internet and has Docker and Helm installed.
  • Set up a machine that has access to vCenter Server for the VI workload domain, the Supervisor instance, and the local container registry.

    The machine must have Docker, Helm, and Kubernetes CLI Tools for vSphere.

  • Deploy a deep learning VM
  • Deploy a GPU-accelerated TKG cluster
  • Deploy a RAG workload