As a DevOps engineer, you can deploy a RAG workload on a TKG cluster in a Supervisor. The workload is based on the sample multi-turn RAG application from NVIDIA and uses a pgvector PostgreSQL database managed by VMware Data Services Manager.
Procedure
- Provision a GPU-accelerated TKG cluster.
You can use one of the following workflows.
- If you are using the kubectl command, deploy the NVIDIA NIMs.
- Fetch the Helm charts with the NVIDIA NIMs.
- Deploy NVIDIA NIM LLM, NVIDIA NeMo Retriever Embedding, and NVIDIA NeMo Retriever Ranking Microservice.
- Fetch the Helm chart for the sample multi-turn chatbot.
helm fetch https://helm.ngc.nvidia.com/nvidia/aiworkflows/charts/rag-app-multiturn-chatbot-24.08.tgz --username='$oauthtoken' --password=<YOUR API KEY>
- Create a YAML with custom values for configuring the chatbot with the pgvector PostgreSQL database.
For a pgvector database with the connection string postgres://pgvector_db_admin:encoded_pgvector_db_admin_password@pgvector_db_ip_address:5432/pgvector_db_name, prepare the following app_values.yaml file.
To provide an external IP address for the sample chat application, set frontend.service.type to LoadBalancer in the YAML file.
query:
  env:
    APP_VECTORSTORE_URL: "pgvector_db_ip_address:5432"
    APP_VECTORSTORE_NAME: "pgvector"
    POSTGRES_PASSWORD: "encoded_pgvector_db_admin_password"
    POSTGRES_USER: "pgvector_db_admin"
    POSTGRES_DB: "pgvector_db_name"
    APP_EMBEDDINGS_MODELNAME: "nvidia/nv-embedqa-e5-v5"
frontend:
  service:
    type: LoadBalancer
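The mapping from the connection string to the values in app_values.yaml can be sketched in shell. This is only an illustration, assuming the string has exactly the form postgres://user:password@host:port/dbname with no query parameters and a URL-encoded password; the function name parse_pg_url and the PG_* variables are hypothetical and are not part of the Helm chart.

```shell
#!/bin/sh
# Hypothetical helper: split postgres://user:password@host:port/dbname
# into the pieces that app_values.yaml expects.
# Assumes no query string and a URL-encoded password (so it contains no raw '@' or ':').
parse_pg_url() {
  url="${1#postgres://}"     # drop the scheme
  creds="${url%%@*}"         # user:password (everything before the first '@')
  rest="${url#*@}"           # host:port/dbname
  PG_USER="${creds%%:*}"     # text before the first ':'
  PG_PASSWORD="${creds#*:}"  # text after the first ':'
  PG_HOSTPORT="${rest%%/*}"  # host:port
  PG_DB="${rest#*/}"         # database name
}

parse_pg_url "postgres://pgvector_db_admin:encoded_pgvector_db_admin_password@pgvector_db_ip_address:5432/pgvector_db_name"

# Print the entries as they would appear under query.env.
echo "APP_VECTORSTORE_URL: \"$PG_HOSTPORT\""
echo "POSTGRES_USER: \"$PG_USER\""
echo "POSTGRES_PASSWORD: \"$PG_PASSWORD\""
echo "POSTGRES_DB: \"$PG_DB\""
```

The password is copied as-is, in its encoded form, which matches the placeholder used in the values file.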
- Deploy the multi-turn chatbot in a namespace using the custom values file.
kubectl create namespace multiturn-rag
kubectl label --overwrite ns multiturn-rag pod-security.kubernetes.io/enforce=privileged
export NGC_CLI_API_KEY="<NGC-API-key>"
helm install multiturn-rag rag-app-multiturn-chatbot-24.08.tgz -n multiturn-rag --set imagePullSecret.password="$NGC_CLI_API_KEY" -f ./app_values.yaml
- To access the chatbot application, run the following command to get the application's external IP address.
kubectl -n multiturn-rag get service
- In a web browser, open the sample chat application at http://application_external_ip:3001/converse.
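The last two steps can be combined in a small shell sketch. It assumes that the load balancer reports an IP address rather than a host name, and that you pass in the name of the front-end service as it appears in the kubectl get service output; the function names external_ip and chat_url and the FRONTEND_SERVICE variable are hypothetical.

```shell
#!/bin/sh
# Print the external IP of a LoadBalancer service (empty until one is assigned).
# $1 = namespace, $2 = service name, as listed by `kubectl -n multiturn-rag get service`.
external_ip() {
  kubectl -n "$1" get service "$2" \
    -o jsonpath='{.status.loadBalancer.ingress[0].ip}'
}

# Build the chat application URL from an IP address.
# Port 3001 and the /converse path are taken from the procedure above.
chat_url() {
  printf 'http://%s:3001/converse\n' "$1"
}

# Example usage (FRONTEND_SERVICE is a placeholder for the real service name):
# chat_url "$(external_ip multiturn-rag "$FRONTEND_SERVICE")"
```

If the command prints an empty address, the load balancer has probably not assigned an IP yet; rerun it after a short wait.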