In VMware Private AI Foundation with NVIDIA, as a DevOps engineer, you use the Kubernetes API to provision a TKG cluster that uses NVIDIA GPUs. In a disconnected environment, you must also set up a local Ubuntu package repository and use the Harbor Registry for the Supervisor.

Prerequisites

Verify with the cloud administrator that the prerequisites for the AI-ready infrastructure are in place, such as the vSphere namespace, the vGPU-enabled VM class, and, in a disconnected environment, the local package repository and Harbor Registry.

Procedure

  1. Provision a TKG cluster on the vSphere namespace configured by the cloud administrator.

    See Provision a TKGS Cluster for NVIDIA vGPU.
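    A cluster manifest for this step might look like the following minimal sketch. The cluster name, namespace, VM class, and storage class names are placeholders; use the values that the cloud administrator configured for your vSphere namespace, in particular the VM class that carries the NVIDIA vGPU profile.

    ```yaml
    apiVersion: run.tanzu.vmware.com/v1alpha2
    kind: TanzuKubernetesCluster
    metadata:
      name: tkg-gpu-cluster        # placeholder cluster name
      namespace: ai-namespace      # placeholder vSphere namespace
    spec:
      topology:
        controlPlane:
          replicas: 3
          vmClass: guaranteed-medium       # placeholder VM class
          storageClass: ai-storage-policy  # placeholder storage class
        nodePools:
        - name: gpu-workers
          replicas: 2
          vmClass: vgpu-class              # placeholder vGPU-enabled VM class created by the cloud administrator
          storageClass: ai-storage-policy
    ```

    You apply the manifest with kubectl while logged in to the Supervisor, and then log in to the new cluster before installing the GPU Operator.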

  2. Install the NVIDIA GPU Operator.
    helm install --wait gpu-operator ./gpu-operator-4-1 -n gpu-operator
  3. Monitor the operation.
    watch kubectl get pods -n gpu-operator
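    As an alternative to watching the pod list interactively, you can block until the GPU Operator pods are ready and then run a quick sanity check. This is a sketch that assumes your kubectl context points at the new TKG cluster; the daemonset label shown is an assumption about how the GPU Operator labels its driver pods in your deployment.

    ```shell
    # Block until all pods in the gpu-operator namespace report Ready
    kubectl wait --for=condition=Ready pods --all -n gpu-operator --timeout=600s

    # Sanity check: run nvidia-smi inside the driver daemonset pod
    # (label is an assumption; adjust to match your deployment)
    DRIVER_POD=$(kubectl get pods -n gpu-operator \
      -l app=nvidia-driver-daemonset -o jsonpath='{.items[0].metadata.name}')
    kubectl exec -n gpu-operator "$DRIVER_POD" -- nvidia-smi
    ```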

What to do next

Deploy an AI container image from the Harbor Registry to the Supervisor.
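A deployment of that kind might start from a pod manifest such as the following minimal sketch. The registry host, project, and image names are placeholders for the AI container image you pushed to your Harbor Registry.

```yaml
# Sketch of a pod that pulls an AI container image from the local Harbor Registry.
# Registry host, project, and image tag below are placeholders.
apiVersion: v1
kind: Pod
metadata:
  name: ai-sample
spec:
  restartPolicy: OnFailure
  containers:
  - name: ai-sample
    image: harbor.example.com/ai-images/ai-sample:latest
    resources:
      limits:
        nvidia.com/gpu: 1   # request one GPU scheduled by the GPU Operator's device plugin
```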