Deploy a RAG Workload on a TKG Cluster

On a TKG cluster in a Supervisor, you can deploy a RAG workload that is based on the RAG Sample Pipeline from NVIDIA and uses a pgvector PostgreSQL database managed by VMware Data Services Manager.

Prerequisites

Verify that VMware Private AI Foundation with NVIDIA is available for the VI workload domain. See Deploying VMware Private AI Foundation with NVIDIA.
Deploy a Vector Database in VMware Private AI Foundation with NVIDIA.

Procedure

Provision a GPU-accelerated TKG cluster.
See Deploying AI Workloads on TKG Clusters in VMware Private AI Foundation with NVIDIA.
Install the RAG LLM Operator.
See Install the RAG LLM Operator.
Download the manifests for the NVIDIA sample RAG pipeline.
See Sample RAG Pipeline.
Configure the sample RAG pipeline with the pgvector PostgreSQL database.
1. Edit the sample pipeline YAML file.
  See Step 4 in Sample RAG Pipeline.
2. In the YAML file, configure the sample pipeline with the pgvector PostgreSQL database by using the database's connection string.
  See Vector Database for RAG Sample Pipeline.
To provide an external IP address for the sample chat application, in the YAML file, set frontend.service.type to LoadBalancer.
Start the sample RAG pipeline.
See Sample RAG Pipeline.
To access the sample chat application, run the following command to get the application's external IP address.
```
kubectl -n rag-sample get service rag-playground
```
In a web browser, open the sample chat application at http://application_external_ip:3001/orgs/nvidia/models/text-qa-chatbot, where application_external_ip is the EXTERNAL-IP value from the command output.
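The last two steps can also be scripted. The following is a minimal sketch, not part of the VMware or NVIDIA tooling: it assumes the rag-sample namespace and rag-playground service name from the command above, and the helper function names (get_service_type, get_external_ip, chat_url) are hypothetical. It uses the standard kubectl jsonpath output option to read the service type and the first load balancer IP address, and then builds the chat application URL.

```shell
#!/usr/bin/env bash
# Sketch only. Namespace and service name are taken from the kubectl command
# in the procedure; the function names are hypothetical helpers.

NAMESPACE="rag-sample"
SERVICE="rag-playground"

# Print the service type. After setting frontend.service.type,
# this is expected to be LoadBalancer.
get_service_type() {
  kubectl -n "$NAMESPACE" get service "$SERVICE" -o jsonpath='{.spec.type}'
}

# Print the first external IP address assigned by the load balancer.
# The output is empty while the address is still pending.
get_external_ip() {
  kubectl -n "$NAMESPACE" get service "$SERVICE" \
    -o jsonpath='{.status.loadBalancer.ingress[0].ip}'
}

# Build the sample chat application URL from an IP address.
chat_url() {
  printf 'http://%s:3001/orgs/nvidia/models/text-qa-chatbot\n' "$1"
}

# Example usage against a running cluster (commented out on purpose):
#   ip="$(get_external_ip)"
#   [ -n "$ip" ] && chat_url "$ip"
```

If get_external_ip prints nothing, the load balancer has probably not assigned an address yet; running the function again after a short wait is usually enough.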