As a DevOps engineer using the self-service Automation Service Broker catalog, you can provision a GPU-enabled Tanzu Kubernetes Grid RAG cluster, where worker nodes can run a reference RAG solution that uses the Llama2-13b-chat model.

The deployment contains a Supervisor namespace and a Tanzu Kubernetes Grid (TKG) cluster. The TKG cluster contains two namespaces – one for the NVIDIA GPU Operator and the other for the NVIDIA RAG LLM Operator, both of which are preinstalled on the TKG cluster. A Carvel application for each operator is deployed inside its respective namespace.

Procedure

  1. On the Catalog page in Automation Service Broker, locate the AI Kubernetes RAG Cluster card and click Request.
  2. Select a project.
  3. Enter a name and description for your deployment.
  4. Select the number of control plane nodes.
    Setting     Sample value
    Node count  1
    VM class    best-effort-2xlarge - 8 CPUs and 64 GB Memory

    The class selection defines the resources available within the virtual machine.

  5. Select the number of worker nodes.
    Setting                Sample value
    Node count             3
    VM class               best-effort-4xlarge-a100-40c - 1 vGPU (40 GB), 16 CPUs and 120 GB Memory
    Time-slicing replicas  1

    Minimum VM class specification:
    • CPU: 10 vCPUs
    • CPU RAM: 64 GB
    • GPU: 2xH100
    • GPU memory: 50 GB

    Time-slicing defines the number of replicas that a GPU is divided into, so that the GPU can be shared between multiple workloads.
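    The NVIDIA GPU Operator applies time-slicing through a ConfigMap that the device plugin reads. The following is a minimal sketch of such a configuration; the ConfigMap name, namespace, and profile key here are illustrative assumptions, not values taken from this deployment:

    ```yaml
    # Illustrative time-slicing configuration for the NVIDIA GPU Operator.
    # The ConfigMap name, namespace, and "any" profile key are assumptions
    # for this sketch and may differ in your environment.
    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: time-slicing-config
      namespace: gpu-operator
    data:
      any: |-
        version: v1
        sharing:
          timeSlicing:
            resources:
              - name: nvidia.com/gpu
                replicas: 1   # corresponds to the Time-slicing replicas setting
    ```

    With replicas set to 1, each GPU is exposed as a single schedulable resource; a higher value advertises that many shared replicas of each GPU to the scheduler.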

  6. Provide the NVIDIA AI Enterprise API key.
  7. Click Submit.