As a DevOps engineer, you can deploy a RAG workload on a TKG cluster in a Supervisor. The workload is based on the NVIDIA RAG sample multi-turn chatbot application and uses a pgvector PostgreSQL database managed by VMware Data Services Manager.

Prerequisites

Procedure

  1. Provision a GPU-accelerated TKG cluster.
    You can use one of the following workflows.
    By using a catalog item in VMware Aria Automation: see Deploy a RAG Cluster with a Vector Database by Using a Self-Service Catalog Item in VMware Aria Automation.
    By using the kubectl command: see Provision a GPU-Accelerated TKG Cluster by using the kubectl command.
  2. If you are using the kubectl command, deploy the NVIDIA NIMs.
    1. Fetch the Helm charts with the NVIDIA NIMs.
    2. Deploy NVIDIA NIM LLM, NVIDIA NeMo Retriever Embedding, and NVIDIA NeMo Retriever Ranking Microservice.
  3. Fetch the Helm chart for the sample multi-turn chatbot.
    helm fetch https://helm.ngc.nvidia.com/nvidia/aiworkflows/charts/rag-app-multiturn-chatbot-24.08.tgz --username='$oauthtoken' --password=<YOUR API KEY>
    
  4. Create a YAML file with custom values that configure the chatbot to use the pgvector PostgreSQL database.
    For a pgvector database with a connection string postgres://pgvector_db_admin:encoded_pgvector_db_admin_password@pgvector_db_ip_address:5432/pgvector_db_name, prepare the following app_values.yaml file.

    To expose the sample chat application on an external IP address, set frontend.service.type to LoadBalancer in the YAML file.

    query:
      env:
        APP_VECTORSTORE_URL: "pgvector_db_ip_address:5432"
        APP_VECTORSTORE_NAME: "pgvector"
        POSTGRES_PASSWORD: "encoded_pgvector_db_admin_password"
        POSTGRES_USER: "pgvector_db_admin"
        POSTGRES_DB: "pgvector_db_name"
        APP_EMBEDDINGS_MODELNAME: "nvidia/nv-embedqa-e5-v5"
    frontend:
      service:
        type: LoadBalancer
  5. Deploy the multi-turn chatbot to a dedicated namespace by using the custom values file.
    kubectl create namespace multiturn-rag
    kubectl label --overwrite ns multiturn-rag pod-security.kubernetes.io/enforce=privileged
     
    export NGC_CLI_API_KEY="<NGC-API-key>"
     
    helm install multiturn-rag rag-app-multiturn-chatbot-24.08.tgz -n multiturn-rag --set imagePullSecret.password="$NGC_CLI_API_KEY" -f ./app_values.yaml
  6. To access the chatbot application, run the following command to get the application's external IP address.
    kubectl -n multiturn-rag get service
  7. In a Web browser, open the sample chat application at http://application_external_ip:3001/converse.
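Before you deploy the NIMs and the chatbot, you can confirm that the TKG cluster from step 1 actually exposes GPUs by counting the nodes that report allocatable GPUs. This is a minimal sketch. It assumes that your kubectl context points at the TKG cluster and that GPUs are advertised under the nvidia.com/gpu resource name, as the NVIDIA GPU Operator typically does. The function name count_gpu_nodes is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print how many nodes report at least one allocatable GPU.
# Assumes the current kubectl context targets the GPU-accelerated TKG cluster and
# that GPUs are advertised as the nvidia.com/gpu resource (an assumption).
count_gpu_nodes() {
  # One "NAME GPU" row per node; nodes without GPUs show <none> in the GPU column.
  kubectl get nodes --no-headers \
    -o custom-columns='NAME:.metadata.name,GPU:.status.allocatable.nvidia\.com/gpu' |
    awk '$2 ~ /^[0-9]+$/ && $2 > 0 { n++ } END { print n + 0 }'
}
```

A result of 0 suggests that the GPU worker nodes are not ready yet or that the GPU Operator has not finished installing.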
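Before you write the custom values file in step 4, it can help to see which of the overridden keys the chart already defines, and with what defaults. This sketch relies on helm show values, which prints a chart's default values.yaml. It assumes that the chart archive fetched in step 3 is in the current directory and that its defaults use the same key names as app_values.yaml. The function name show_vectorstore_defaults is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the chart's default settings for the keys that
# app_values.yaml overrides. Defaults to the chart archive fetched in step 3.
show_vectorstore_defaults() {
  local chart="${1:-rag-app-multiturn-chatbot-24.08.tgz}"
  # helm show values prints the chart's values.yaml; keep only the matching lines.
  helm show values "$chart" |
    grep -E 'APP_VECTORSTORE_|POSTGRES_|APP_EMBEDDINGS_MODELNAME'
}
```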
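The mapping in step 4 from the pgvector connection string to the query.env values is mechanical, so you can generate app_values.yaml instead of editing it by hand. This sketch assumes a connection string of exactly the form postgres://USER:PASSWORD@HOST:PORT/DBNAME, with no query parameters. It also assumes that the password is already URL-encoded, so that it contains no literal @, :, or / characters. The function name write_app_values is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print app_values.yaml for a connection string of the form
#   postgres://USER:PASSWORD@HOST:PORT/DBNAME
# Assumes no query parameters and an already URL-encoded PASSWORD.
write_app_values() {
  local conn="$1"
  local rest="${conn#*://}"         # USER:PASSWORD@HOST:PORT/DBNAME
  local userinfo="${rest%%@*}"      # USER:PASSWORD
  local hostpart="${rest#*@}"       # HOST:PORT/DBNAME
  local user="${userinfo%%:*}"
  local password="${userinfo#*:}"
  local hostport="${hostpart%%/*}"  # HOST:PORT, used as APP_VECTORSTORE_URL
  local dbname="${hostpart#*/}"
  # Same keys as the app_values.yaml example in step 4.
  cat <<EOF
query:
  env:
    APP_VECTORSTORE_URL: "${hostport}"
    APP_VECTORSTORE_NAME: "pgvector"
    POSTGRES_PASSWORD: "${password}"
    POSTGRES_USER: "${user}"
    POSTGRES_DB: "${dbname}"
    APP_EMBEDDINGS_MODELNAME: "nvidia/nv-embedqa-e5-v5"
frontend:
  service:
    type: LoadBalancer
EOF
}
```

For example, write_app_values "postgres://pgvector_db_admin:encoded_pgvector_db_admin_password@pgvector_db_ip_address:5432/pgvector_db_name" > app_values.yaml produces the same values as the example file in step 4.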
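After the helm install command in step 5 returns, the chatbot pods may still be pulling images. This minimal sketch first checks that the multiturn-rag release exists. It then blocks until every pod in the namespace reports Ready. The 600-second default timeout and the function name wait_for_chatbot are assumptions, not values from the chart. If the chart also creates Job pods, which do not stay Ready after they complete, use a label selector instead of --all.

```shell
#!/usr/bin/env bash
# Hypothetical helper: wait for the chatbot release from step 5 to become Ready.
# The 600s default timeout is an arbitrary assumption; adjust for image pull times.
wait_for_chatbot() {
  local ns="${1:-multiturn-rag}"
  local timeout="${2:-600s}"
  # Fail fast if the Helm release is missing from the namespace.
  helm status multiturn-rag -n "$ns" >/dev/null &&
    kubectl wait --for=condition=Ready pod --all -n "$ns" --timeout="$timeout"
}
```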
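You can combine steps 6 and 7 by reading the external IP address from the service status instead of from the table output. This sketch makes three assumptions. The load balancer reports an IP address rather than a hostname. The chat UI is served on port 3001, as in step 7. Any LoadBalancer service in the multiturn-rag namespace belongs to the chatbot frontend. The function prints one URL per LoadBalancer service that already has an IP, so it should print nothing while the address is still pending. The function name chatbot_urls is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the converse URL for each LoadBalancer service
# in the namespace that already has an external IP. Port 3001 is taken from step 7.
chatbot_urls() {
  local ns="${1:-multiturn-rag}"
  kubectl -n "$ns" get service \
    -o jsonpath='{range .items[?(@.spec.type=="LoadBalancer")]}{.status.loadBalancer.ingress[0].ip}{"\n"}{end}' |
    while read -r ip; do
      # Skip services whose external IP is not assigned yet.
      if [ -n "$ip" ]; then
        printf 'http://%s:3001/converse\n' "$ip"
      fi
    done
}
```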