DevOps engineers and developers can use VMware Aria Automation to provision GPU-accelerated TKG clusters for hosting container AI workloads on the Supervisor instance in a VI workload domain.
The workflow for deploying a GPU-accelerated TKG cluster has two parts:
- As a cloud administrator, you add self-service catalog items for private AI to Automation Service Broker for a new namespace on the Supervisor.
- As a data scientist or DevOps engineer, you use an AI Kubernetes cluster catalog item to deploy a GPU-accelerated TKG cluster on a new namespace on the Supervisor.
Create AI Self-Service Catalog Items in VMware Aria Automation
As a cloud administrator, you can use the catalog setup wizard for private AI in VMware Aria Automation to quickly add catalog items for deploying deep learning virtual machines or GPU-accelerated TKG clusters in a VI workload domain.
Data scientists can use the deep learning catalog items to deploy deep learning VMs. DevOps engineers can use the catalog items to provision AI-ready TKG clusters.
Every time you run it, the catalog setup wizard for private AI adds two catalog items to the Service Broker catalog: one for a deep learning virtual machine and one for a TKG cluster. Run the wizard again whenever you need to do the following:
- Enable provisioning of AI workloads on another Supervisor.
- Accommodate a change in your NVIDIA AI Enterprise license, including the client configuration .tok file and license server, or the download URL for the vGPU guest drivers for a disconnected environment.
- Accommodate a deep learning VM image change.
- Use different vGPU or non-GPU VM classes, a different storage policy, or a different container registry.
- Create catalog items in a new project.
Prerequisites
- Verify that VMware Private AI Foundation with NVIDIA is available for the VI workload domain.
- Verify that the prerequisites for deploying deep learning VMs are in place.
- Create a Content Library with Deep Learning VM Images for VMware Private AI Foundation with NVIDIA.
Procedure
Provision a GPU-Accelerated TKG Cluster by Using a Self-Service Catalog in VMware Aria Automation
In VMware Private AI Foundation with NVIDIA, as a DevOps engineer, you can provision a TKG cluster accelerated with NVIDIA GPUs from VMware Aria Automation by using an AI Kubernetes Cluster self-service catalog item in Automation Service Broker. Then, you can deploy AI container images from NVIDIA NGC on the cluster.
Procedure
What to do next
- For details on how to access the TKG cluster by using kubectl, in Automation Service Broker, navigate to .
- Deploy an AI container image from the NVIDIA NGC catalog.
In a disconnected environment, you must upload the AI container images to a private container registry. See Setting Up a Private Harbor Registry in VMware Private AI Foundation with NVIDIA.
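As a sketch of the last step, after you log in to the TKG cluster with the kubectl vSphere plugin, you might deploy a GPU-accelerated container from the NVIDIA NGC catalog with a manifest similar to the following. The pod name and image tag here are illustrative examples, not values prescribed by this workflow; in a disconnected environment, the image reference would instead point to your private Harbor registry.

```yaml
# Example pod requesting one NVIDIA GPU for an NGC container image.
apiVersion: v1
kind: Pod
metadata:
  name: cuda-sample            # hypothetical workload name
spec:
  restartPolicy: OnFailure
  containers:
  - name: cuda-sample
    # Illustrative NGC image; substitute the AI container image you need,
    # or a private registry path in a disconnected environment.
    image: nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda11.7.1
    resources:
      limits:
        nvidia.com/gpu: 1     # schedules the pod on a GPU worker node
```

Applying the manifest with `kubectl apply -f` schedules the pod on a node with an available vGPU; `kubectl logs cuda-sample` then shows whether the workload ran successfully.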