As a DevOps engineer, on a TKG cluster in a Supervisor, you can deploy a RAG workload based on the RAG Sample Pipeline from NVIDIA that uses a pgvector PostgreSQL database managed by VMware Data Services Manager.

Prerequisites

Procedure

  1. Provision a GPU-accelerated TKG cluster.
    You can use one of the following workflows.
    Provisioning workflow: By using a catalog item in VMware Aria Automation
      Steps: Deploy a GPU-accelerated Tanzu Kubernetes Grid RAG cluster.
    Provisioning workflow: By using the kubectl command
      Steps:
      1. Provision a GPU-Accelerated TKG Cluster by using the kubectl command.
      2. Install the RAG LLM Operator.

        See Install the RAG LLM Operator.

  2. If you used the kubectl command to provision the TKG cluster, install the NVIDIA RAG LLM Operator on the TKG cluster.

    See Install the RAG LLM Operator.

    If you provisioned the cluster by using the AI Kubernetes RAG Cluster catalog item in VMware Aria Automation, the NVIDIA RAG LLM Operator is installed on the TKG cluster automatically during deployment and you can skip this step.

  3. Download the manifests for the NVIDIA sample RAG pipeline.
  4. Configure the sample RAG pipeline with the pgvector PostgreSQL database.
    1. Edit the sample pipeline YAML file.
      See Step 4 in Sample RAG Pipeline.
    2. In the YAML file, configure the sample pipeline with the pgvector PostgreSQL database by using the database's connection string.
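    The exact key names depend on the RAG LLM Operator version, so the following is only a sketch of what the database settings in the pipeline YAML file might look like. All host, port, user, password, and database values are placeholders that you replace with the corresponding parts of the connection string for the pgvector PostgreSQL database that VMware Data Services Manager provisioned.

    ```yaml
    # Sketch only: key names vary by operator version.
    # Replace each placeholder with the matching part of the
    # connection string of your DSM-managed pgvector database.
    env:
      - name: POSTGRES_HOST_IP
        value: "pgvector-db.example.internal"   # database host (placeholder)
      - name: POSTGRES_PORT_NUMBER
        value: "5432"                           # database port (placeholder)
      - name: POSTGRES_USER
        value: "pgadmin"                        # database user (placeholder)
      - name: POSTGRES_PASSWORD
        value: "example-password"               # database password (placeholder)
      - name: POSTGRES_DB
        value: "ragdb"                          # database name (placeholder)
    ```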
  5. To provide an external IP address for the sample chat application, in the YAML file, set frontend.service.type to LoadBalancer.
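    In Kubernetes, the service type value is spelled LoadBalancer. The enclosing keys below are a sketch of how the frontend section of the pipeline YAML file might be structured; only the service type value itself is fixed by Kubernetes.

    ```yaml
    # Sketch: the enclosing keys depend on the manifest layout,
    # but the Kubernetes service type value is always "LoadBalancer".
    frontend:
      service:
        type: LoadBalancer   # exposes the chat application on an external IP
    ```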
  6. Start the sample RAG pipeline.
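    Starting the pipeline typically means applying the manifest you edited in step 4 and waiting for the pods to come up. The file name below is an assumption; use the name of your edited sample pipeline YAML file.

    ```shell
    # File name is an assumption; substitute the manifest you edited in step 4.
    kubectl apply -f rag-app-text-chatbot.yaml -n rag-sample

    # Watch the namespace until all pipeline pods report Running.
    kubectl get pods -n rag-sample -w
    ```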
  7. To access the sample chat application, run the following command to get the application's external IP address.
    kubectl -n rag-sample get service rag-playground
  8. In a Web browser, open the sample chat application at http://application_external_ip:3001/orgs/nvidia/models/text-qa-chatbot.
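  If you script the lookup, you can extract the external IP address directly with a JSONPath query and build the chat application URL in one step. The service name, namespace, port, and path come from the steps above; the command requires kubectl access to the TKG cluster.

  ```shell
  # Requires access to the TKG cluster; the service and namespace
  # names come from the previous step.
  EXTERNAL_IP=$(kubectl -n rag-sample get service rag-playground \
    -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
  echo "http://${EXTERNAL_IP}:3001/orgs/nvidia/models/text-qa-chatbot"
  ```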