In VMware Private AI Foundation with NVIDIA, as a DevOps engineer, you use the Kubernetes API to provision a TKG cluster that uses NVIDIA GPUs. In a disconnected environment, you must additionally set up a local Ubuntu package repository and use the Harbor Registry for the Supervisor.
Prerequisites
Verify with the cloud administrator that the following prerequisites are in place for the AI-ready infrastructure.
- VMware Private AI Foundation with NVIDIA is configured for a disconnected environment. See Preparing VMware Cloud Foundation for Private AI Workload Deployment.
- A machine that has access to the Supervisor endpoint and to the local Helm repository that hosts the NVIDIA GPU Operator chart definitions.
Procedure
- Provision a TKG cluster on the vSphere namespace configured by the cloud administrator.
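A cluster provisioned through the Kubernetes API is described by a Cluster manifest applied to the vSphere namespace. The sketch below is a hypothetical minimal example: the cluster name, namespace, VM class names, storage policy, and Tanzu Kubernetes release version are placeholders, and you must substitute the values your cloud administrator configured, in particular a VM class associated with an NVIDIA vGPU profile for the GPU node pool.

```yaml
# Hypothetical example only: names, namespace, VM classes, storage
# policy, and TKR version are placeholders for administrator-provided values.
apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  name: tkg-gpu-cluster
  namespace: ai-workloads
spec:
  topology:
    class: tanzukubernetescluster
    version: v1.26.5---vmware.2-fips.1-tkg.1
    controlPlane:
      replicas: 3
    workers:
      machineDeployments:
        - class: node-pool
          name: gpu-nodepool
          replicas: 2
          variables:
            overrides:
              - name: vmClass
                value: vgpu-class   # VM class with the NVIDIA vGPU profile
    variables:
      - name: vmClass
        value: guaranteed-medium
      - name: storageClass
        value: vsan-default-storage-policy
```

After logging in to the Supervisor with kubectl vsphere login and switching to the vSphere namespace context, apply the manifest with kubectl apply -f.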
- Install the NVIDIA GPU Operator.
helm install --wait gpu-operator ./gpu-operator-4-1 -n gpu-operator
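In a disconnected environment, the GPU Operator images must be pulled from your local Harbor registry rather than from NVIDIA's public registry. The chart exposes repository overrides for this; the fragment below is a hypothetical values file where the registry hostname, project path, and driver version are placeholders for the images you mirrored into Harbor. Pass it to the install command with -f values.yaml.

```yaml
# Hypothetical air-gapped overrides: the registry hostname, project
# path, and driver version are placeholders for your Harbor mirror.
operator:
  repository: harbor.example.com/nvidia
driver:
  repository: harbor.example.com/nvidia
  version: "535.104.12"   # driver image tag mirrored into Harbor
toolkit:
  repository: harbor.example.com/nvidia
validator:
  repository: harbor.example.com/nvidia
```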
- Monitor the operation.
watch kubectl get pods -n gpu-operator
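Once all operator pods report Running or Completed, you can optionally confirm that GPUs are schedulable by running a pod that requests the nvidia.com/gpu resource. The manifest below is a hypothetical smoke test; the image reference assumes a CUDA sample image mirrored into your local Harbor registry.

```yaml
# Hypothetical smoke test: the image reference is a placeholder for a
# CUDA sample image mirrored into your Harbor registry.
apiVersion: v1
kind: Pod
metadata:
  name: gpu-smoke-test
spec:
  restartPolicy: OnFailure
  containers:
    - name: cuda-vectoradd
      image: harbor.example.com/nvidia/samples:vectoradd-cuda11.7.1
      resources:
        limits:
          nvidia.com/gpu: 1
```

If the pod schedules and completes, the operator stack is working; check the pod logs with kubectl logs gpu-smoke-test, then delete the pod.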
What to do next
Deploy an AI container image from the Harbor Registry to the Supervisor.