In VMware Private AI Foundation with NVIDIA, as a DevOps engineer, you use the Kubernetes API to provision a TKG cluster that uses NVIDIA GPUs. In a disconnected environment, you must additionally set up a local Ubuntu package repository and use the Harbor Registry for the Supervisor.
Prerequisites
Verify with the cloud administrator that the following prerequisites are in place for the AI-ready infrastructure.
- VMware Private AI Foundation with NVIDIA is deployed and configured. See Deploying VMware Private AI Foundation with NVIDIA.
- A content library with Ubuntu TKr images is added to the namespace for AI workloads. See Configure a Content Library with Ubuntu TKr for a Disconnected VMware Private AI Foundation with NVIDIA Environment.
- A machine that has access to the Supervisor endpoint is available.
Procedure
- Provision a TKG cluster on the vSphere namespace configured by the cloud administrator.
- Complete the TKG cluster setup.
See Installing VMware vSphere with VMware Tanzu (Air-gapped).
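The provisioning step above can be sketched as a cluster manifest that you apply to the Supervisor with kubectl. This is a minimal illustration, not a complete specification: the cluster name, vSphere namespace, VM classes, storage policy, and TKr name are placeholders that you must replace with the values configured by the cloud administrator, and the exact placement of the Ubuntu OS image annotation may vary by TKr release.

```yaml
apiVersion: run.tanzu.vmware.com/v1alpha3
kind: TanzuKubernetesCluster
metadata:
  name: tkg-gpu-cluster            # placeholder cluster name
  namespace: ai-workloads          # placeholder vSphere namespace for AI workloads
  annotations:
    # Selects the Ubuntu OS image from the TKr content library
    run.tanzu.vmware.com/resolve-os-image: os-name=ubuntu
spec:
  topology:
    controlPlane:
      replicas: 3
      vmClass: guaranteed-medium             # placeholder VM class
      storageClass: tanzu-storage-policy     # placeholder storage policy
      tkr:
        reference:
          name: v1.26.5---vmware.2-tkg.1     # placeholder Ubuntu TKr name
    nodePools:
      - name: gpu-nodepool
        replicas: 2
        vmClass: vm-class-vgpu               # placeholder VM class with an NVIDIA vGPU profile
        storageClass: tanzu-storage-policy
```

After logging in to the Supervisor with the vSphere plugin for kubectl, apply the manifest with `kubectl apply -f` against the namespace context.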
- Provide a local Ubuntu package repository and upload the container images in the NVIDIA GPU Operator package to the Harbor Registry for the Supervisor.
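Uploading the container images can be sketched as a pull-retag-push loop, run on a machine that has both NGC access and access to the Harbor Registry. The registry host, Harbor project, image list, and version below are placeholders; use the images and tag that match your NVIDIA GPU Operator package.

```shell
# Placeholders: replace with your Harbor Registry, project, and operator version.
REG=harbor.example.com/gpu-operator
VER=v23.9.0

# Pull an image from NVIDIA NGC on the connected machine,
# retag it for the private Harbor Registry, and push it.
# Repeat for each image referenced by the GPU Operator package.
docker pull nvcr.io/nvidia/gpu-operator:${VER}
docker tag nvcr.io/nvidia/gpu-operator:${VER} ${REG}/gpu-operator:${VER}
docker push ${REG}/gpu-operator:${VER}
```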
- Update the Helm chart definitions of the NVIDIA GPU Operator to use the local Ubuntu package repository and private Harbor Registry.
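The Helm chart overrides can be sketched as a values fragment that points the operator at the private registry and at the local Ubuntu package repository. The field names below follow the public NVIDIA GPU Operator chart; verify them against the chart version in your package, and replace the registry host and ConfigMap name with your own.

```yaml
# values.yaml overrides (placeholders: harbor.example.com, repo-config)
operator:
  repository: harbor.example.com/gpu-operator   # private Harbor Registry
driver:
  repository: harbor.example.com/gpu-operator
  repoConfig:
    # ConfigMap containing the sources list for the local Ubuntu package repository,
    # used by the driver container instead of the public Ubuntu archives
    configMapName: repo-config
toolkit:
  repository: harbor.example.com/gpu-operator
```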
- Provide NVIDIA license information.
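One way to provide the license information, following the public NVIDIA GPU Operator documentation, is a ConfigMap holding the NVIDIA License System client configuration token, referenced from the chart values. The namespace and file names below are illustrative; use the licensing artifacts issued for your deployment.

```shell
# Create the namespace and a ConfigMap with the NLS client configuration token.
kubectl create namespace gpu-operator
kubectl create configmap licensing-config -n gpu-operator \
  --from-file=client_configuration_token.tok
```

In the chart values, reference the ConfigMap, for example with `driver.licensingConfig.configMapName: licensing-config`.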
- Install the NVIDIA GPU Operator.
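The installation step can then be sketched as a Helm install from the locally downloaded chart, passing the overrides prepared earlier. Chart file name, namespace, and values file are placeholders.

```shell
# Install the GPU Operator from the local chart archive with the
# registry and repository overrides (placeholders shown).
helm install gpu-operator ./gpu-operator-v23.9.0.tgz \
  --namespace gpu-operator \
  --values values.yaml
```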
What to do next
Deploy an AI container image from the Harbor Registry for the Supervisor.