In VMware Private AI Foundation with NVIDIA, as a DevOps engineer, you use the Kubernetes API to provision a TKG cluster that uses NVIDIA GPUs. In a disconnected environment, you must additionally set up a local Ubuntu package repository and use the Harbor Registry for the Supervisor.
Prerequisites
Verify with the cloud administrator that the following prerequisites are in place for the AI-ready infrastructure.
- VMware Private AI Foundation with NVIDIA is configured for a disconnected environment. See Preparing VMware Cloud Foundation for Private AI Workload Deployment.
- A machine that has access to the Supervisor endpoint and to the local Helm repository that hosts the NVIDIA GPU Operator chart definitions.
Procedure
- Provision a TKG cluster on the vSphere namespace configured by the cloud administrator.
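A cluster provisioned through the Kubernetes API is described by a Cluster manifest applied to the vSphere namespace. The sketch below is a hypothetical minimal example: the cluster name, namespace, VM class names, storage policy, and Tanzu Kubernetes release version are placeholders, and you must substitute the values your cloud administrator configured, in particular a VM class associated with an NVIDIA vGPU profile for the GPU node pool.

```yaml
# Hypothetical example only: names, namespace, VM classes, storage
# policy, and TKR version are placeholders for administrator-provided values.
apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  name: tkg-gpu-cluster
  namespace: ai-workloads
spec:
  topology:
    class: tanzukubernetescluster
    version: v1.26.5---vmware.2-fips.1-tkg.1
    controlPlane:
      replicas: 3
    workers:
      machineDeployments:
        - class: node-pool
          name: gpu-nodepool
          replicas: 2
          variables:
            overrides:
              - name: vmClass
                value: vgpu-class   # VM class with the NVIDIA vGPU profile
    variables:
      - name: vmClass
        value: guaranteed-medium
      - name: storageClass
        value: vsan-default-storage-policy
```

After logging in to the Supervisor with kubectl vsphere login and switching to the vSphere namespace context, apply the manifest with kubectl apply -f.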
- Install the NVIDIA GPU Operator.
helm install --wait gpu-operator ./gpu-operator-4-1 -n gpu-operator
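In a disconnected environment, the GPU Operator images must be pulled from your local Harbor registry rather than from NVIDIA's public registry. The chart exposes repository overrides for this; the fragment below is a hypothetical values file where the registry hostname, project path, and driver version are placeholders for the images you mirrored into Harbor. Pass it to the install command with -f values.yaml.

```yaml
# Hypothetical air-gapped overrides: the registry hostname, project
# path, and driver version are placeholders for your Harbor mirror.
operator:
  repository: harbor.example.com/nvidia
driver:
  repository: harbor.example.com/nvidia
  version: "535.104.12"   # driver image tag mirrored into Harbor
toolkit:
  repository: harbor.example.com/nvidia
validator:
  repository: harbor.example.com/nvidia
```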
- Monitor the operation.
watch kubectl get pods -n gpu-operator
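Once all operator pods report Running or Completed, you can optionally confirm that GPUs are schedulable by running a pod that requests the nvidia.com/gpu resource. The manifest below is a hypothetical smoke test; the image reference assumes a CUDA sample image mirrored into your local Harbor registry.

```yaml
# Hypothetical smoke test: the image reference is a placeholder for a
# CUDA sample image mirrored into your Harbor registry.
apiVersion: v1
kind: Pod
metadata:
  name: gpu-smoke-test
spec:
  restartPolicy: OnFailure
  containers:
    - name: cuda-vectoradd
      image: harbor.example.com/nvidia/samples:vectoradd-cuda11.7.1
      resources:
        limits:
          nvidia.com/gpu: 1
```

If the pod schedules and completes, the operator stack is working; check the pod logs with kubectl logs gpu-smoke-test, then delete the pod.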
What to do next
Deploy an AI container image from the Harbor Registry to the Supervisor.