In VMware Private AI Foundation with NVIDIA, as a DevOps engineer, you use the Kubernetes API to provision a TKG cluster that uses NVIDIA GPUs. In a disconnected environment, you must additionally set up a local Ubuntu package repository and use the Harbor Registry for the Supervisor.
Prerequisites
Verify with the cloud administrator that the following prerequisites are in place for the AI-ready infrastructure.
- VMware Private AI Foundation with NVIDIA is deployed and configured. See Deploying VMware Private AI Foundation with NVIDIA.
- A content library with Ubuntu TKr images is added to the namespace for AI workloads. See Configure a Content Library with Ubuntu TKr for a Disconnected VMware Private AI Foundation with NVIDIA Environment.
- A machine that has access to the Supervisor endpoint is available.
Procedure
- Provision a TKG cluster on the vSphere namespace configured by the cloud administrator.
- Complete the TKG cluster setup.
See Installing VMware vSphere with VMware Tanzu (Air-gapped).
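The provisioning step above can be sketched as a cluster manifest that you apply to the Supervisor with kubectl. This is a minimal illustration, not a complete specification: the cluster name, vSphere namespace, VM classes, storage policy, and TKr name are placeholders that you must replace with the values configured by the cloud administrator, and the exact placement of the Ubuntu OS image annotation may vary by TKr release.

```yaml
apiVersion: run.tanzu.vmware.com/v1alpha3
kind: TanzuKubernetesCluster
metadata:
  name: tkg-gpu-cluster            # placeholder cluster name
  namespace: ai-workloads          # placeholder vSphere namespace for AI workloads
  annotations:
    # Selects the Ubuntu OS image from the TKr content library
    run.tanzu.vmware.com/resolve-os-image: os-name=ubuntu
spec:
  topology:
    controlPlane:
      replicas: 3
      vmClass: guaranteed-medium             # placeholder VM class
      storageClass: tanzu-storage-policy     # placeholder storage policy
      tkr:
        reference:
          name: v1.26.5---vmware.2-tkg.1     # placeholder Ubuntu TKr name
    nodePools:
      - name: gpu-nodepool
        replicas: 2
        vmClass: vm-class-vgpu               # placeholder VM class with an NVIDIA vGPU profile
        storageClass: tanzu-storage-policy
```

After logging in to the Supervisor with the vSphere plugin for kubectl, apply the manifest with `kubectl apply -f` against the namespace context.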
- Provide a local Ubuntu package repository and upload the container images in the NVIDIA GPU Operator package to the Harbor Registry for the Supervisor.
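Uploading the container images can be sketched as a pull-retag-push loop, run on a machine that has both NGC access and access to the Harbor Registry. The registry host, Harbor project, image list, and version below are placeholders; use the images and tag that match your NVIDIA GPU Operator package.

```shell
# Placeholders: replace with your Harbor Registry, project, and operator version.
REG=harbor.example.com/gpu-operator
VER=v23.9.0

# Pull an image from NVIDIA NGC on the connected machine,
# retag it for the private Harbor Registry, and push it.
# Repeat for each image referenced by the GPU Operator package.
docker pull nvcr.io/nvidia/gpu-operator:${VER}
docker tag nvcr.io/nvidia/gpu-operator:${VER} ${REG}/gpu-operator:${VER}
docker push ${REG}/gpu-operator:${VER}
```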
- Update the Helm chart definitions of the NVIDIA GPU Operator to use the local Ubuntu package repository and private Harbor Registry.
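The Helm chart overrides can be sketched as a values fragment that points the operator at the private registry and at the local Ubuntu package repository. The field names below follow the public NVIDIA GPU Operator chart; verify them against the chart version in your package, and replace the registry host and ConfigMap name with your own.

```yaml
# values.yaml overrides (placeholders: harbor.example.com, repo-config)
operator:
  repository: harbor.example.com/gpu-operator   # private Harbor Registry
driver:
  repository: harbor.example.com/gpu-operator
  repoConfig:
    # ConfigMap containing the sources list for the local Ubuntu package repository,
    # used by the driver container instead of the public Ubuntu archives
    configMapName: repo-config
toolkit:
  repository: harbor.example.com/gpu-operator
```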
- Provide NVIDIA license information.
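One way to provide the license information, following the public NVIDIA GPU Operator documentation, is a ConfigMap holding the NVIDIA License System client configuration token, referenced from the chart values. The namespace and file names below are illustrative; use the licensing artifacts issued for your deployment.

```shell
# Create the namespace and a ConfigMap with the NLS client configuration token.
kubectl create namespace gpu-operator
kubectl create configmap licensing-config -n gpu-operator \
  --from-file=client_configuration_token.tok
```

In the chart values, reference the ConfigMap, for example with `driver.licensingConfig.configMapName: licensing-config`.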
- Install the NVIDIA GPU Operator.
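The installation step can then be sketched as a Helm install from the locally downloaded chart, passing the overrides prepared earlier. Chart file name, namespace, and values file are placeholders.

```shell
# Install the GPU Operator from the local chart archive with the
# registry and repository overrides (placeholders shown).
helm install gpu-operator ./gpu-operator-v23.9.0.tgz \
  --namespace gpu-operator \
  --values values.yaml
```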
What to do next
Deploy an AI container image from the Harbor Registry for the Supervisor.