As a DevOps engineer, on a TKG cluster in a Supervisor, you can deploy a RAG workload based on the RAG Sample Pipeline from NVIDIA that uses a pgvector PostgreSQL database managed by VMware Data Services Manager.

Prerequisites

Procedure

  1. Provision a GPU-accelerated TKG cluster.
    You can use one of the following workflows.
    Provisioning workflow: By using a catalog item in VMware Aria Automation
      Steps: Deploy a GPU-accelerated Tanzu Kubernetes Grid RAG cluster.
    Provisioning workflow: By using the kubectl command
      Steps:
      1. Provision a GPU-Accelerated TKG Cluster by using the kubectl command.
      2. Install the RAG LLM Operator.

        See Install the RAG LLM Operator.

  2. If you used the kubectl command to provision the TKG cluster, install the NVIDIA RAG LLM Operator on the TKG cluster.

    See Install the RAG LLM Operator.

    If you provisioned the cluster by using the AI Kubernetes RAG Cluster catalog item in VMware Aria Automation, the NVIDIA RAG LLM Operator is installed on the TKG cluster automatically during deployment and you can skip this step.

  3. Download the manifests for the NVIDIA sample RAG pipeline.
  4. Configure the sample RAG pipeline with the pgvector PostgreSQL database.
    1. Edit the sample pipeline YAML file.
      See Step 4 in Sample RAG Pipeline.
    2. In the YAML file, configure the sample pipeline with the pgvector PostgreSQL database by using the database's connection string.
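    The exact key names depend on the RAG LLM Operator version, so the following is only a sketch of what the database settings in the pipeline YAML file might look like. All host, port, user, password, and database values are placeholders that you replace with the corresponding parts of the connection string for the pgvector PostgreSQL database that VMware Data Services Manager provisioned.

    ```yaml
    # Sketch only: key names vary by operator version.
    # Replace each placeholder with the matching part of the
    # connection string of your DSM-managed pgvector database.
    env:
      - name: POSTGRES_HOST_IP
        value: "pgvector-db.example.internal"   # database host (placeholder)
      - name: POSTGRES_PORT_NUMBER
        value: "5432"                           # database port (placeholder)
      - name: POSTGRES_USER
        value: "pgadmin"                        # database user (placeholder)
      - name: POSTGRES_PASSWORD
        value: "example-password"               # database password (placeholder)
      - name: POSTGRES_DB
        value: "ragdb"                          # database name (placeholder)
    ```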
  5. To provide an external IP address for the sample chat application, in the YAML file, set frontend.service.type to LoadBalancer.
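    In Kubernetes, the service type value is spelled LoadBalancer. The enclosing keys below are a sketch of how the frontend section of the pipeline YAML file might be structured; only the service type value itself is fixed by Kubernetes.

    ```yaml
    # Sketch: the enclosing keys depend on the manifest layout,
    # but the Kubernetes service type value is always "LoadBalancer".
    frontend:
      service:
        type: LoadBalancer   # exposes the chat application on an external IP
    ```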
  6. Start the sample RAG pipeline.
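    Starting the pipeline typically means applying the manifest you edited in step 4 and waiting for the pods to come up. The file name below is an assumption; use the name of your edited sample pipeline YAML file.

    ```shell
    # File name is an assumption; substitute the manifest you edited in step 4.
    kubectl apply -f rag-app-text-chatbot.yaml -n rag-sample

    # Watch the namespace until all pipeline pods report Running.
    kubectl get pods -n rag-sample -w
    ```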
  7. To access the sample chat application, run the following command to get the application's external IP address.
    kubectl -n rag-sample get service rag-playground
  8. In a Web browser, open the sample chat application at http://application_external_ip:3001/orgs/nvidia/models/text-qa-chatbot.
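  If you script the lookup, you can extract the external IP address directly with a JSONPath query and build the chat application URL in one step. The service name, namespace, port, and path come from the steps above; the command requires kubectl access to the TKG cluster.

  ```shell
  # Requires access to the TKG cluster; the service and namespace
  # names come from the previous step.
  EXTERNAL_IP=$(kubectl -n rag-sample get service rag-playground \
    -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
  echo "http://${EXTERNAL_IP}:3001/orgs/nvidia/models/text-qa-chatbot"
  ```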