On a TKG cluster in a Supervisor, you can deploy a RAG workload based on the RAG Sample Pipeline from NVIDIA that uses a pgvector PostgreSQL database managed by VMware Data Services Manager.

Prerequisites

Procedure

  1. Provision a GPU-accelerated TKG cluster.
  2. Install the RAG LLM Operator.
  3. Download the manifests for the NVIDIA sample RAG pipeline.
  4. Configure the sample RAG pipeline with the pgvector PostgreSQL database.
    1. Edit the sample pipeline YAML file.
      See Step 4 in Sample RAG Pipeline.
    2. In the YAML file, configure the sample pipeline with the pgvector PostgreSQL database by using the database's connection string.
  5. To provide an external IP address for the sample chat application, in the YAML file, set frontend.service.type to LoadBalancer.
  6. Start the sample RAG pipeline.
  7. To access the sample chat application, run the following command to get the application's external IP address.
    kubectl -n rag-sample get service rag-playground
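  8. In a web browser, open the sample chat application at http://application_external_ip:3001/orgs/nvidia/models/text-qa-chatbot, where application_external_ip is the address shown in the EXTERNAL-IP column of the command output from the previous step.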
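
As an illustration of the last two steps, the following minimal sketch reads the external IP address of the rag-playground service and prints the chat application URL. The function names get_external_ip and chat_url are hypothetical helpers introduced here. The sketch assumes that the load balancer reports an IP address (not a host name) in the service status, and it reuses the port and path from step 8.

```shell
# Hypothetical helper: query the external IP of the sample chat application.
# Assumes kubectl is configured for the TKG cluster and that the load
# balancer publishes an IP address under .status.loadBalancer.ingress.
get_external_ip() {
  kubectl -n rag-sample get service rag-playground \
    -o jsonpath='{.status.loadBalancer.ingress[0].ip}'
}

# Hypothetical helper: build the chat application URL from an IP address.
# The port (3001) and path are taken from the procedure above.
chat_url() {
  printf 'http://%s:3001/orgs/nvidia/models/text-qa-chatbot\n' "$1"
}
```

With these functions loaded in a shell session, running chat_url "$(get_external_ip)" prints the address to open in the browser. If the command returns an empty address, the load balancer might not have assigned an IP address to the service yet.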