This document details how to set up a Jupyter Notebook that implements the ingestion of text documents into a specialized knowledge base, which an LLM then uses to answer questions about that knowledge domain. This process is called Retrieval-Augmented Generation (RAG).
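The ingestion and retrieval flow that the notebook implements can be sketched at a high level as: split documents into overlapping chunks, embed each chunk, store the embeddings, and rank chunks by similarity to a question. The sketch below is illustrative only and uses a toy bag-of-words embedding so it runs without a model or database; the actual notebook uses a real embedding model and stores vectors in PostgreSQL with pgvector.

```python
import math
from collections import Counter

def chunk_text(text, chunk_size=40, overlap=10):
    """Split text into overlapping word-level chunks (a common RAG ingestion step)."""
    words = text.split()
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(words), step):
        chunk = " ".join(words[start:start + chunk_size])
        if chunk:
            chunks.append(chunk)
        if start + chunk_size >= len(words):
            break
    return chunks

def embed(text):
    """Toy bag-of-words embedding; a real pipeline calls an embedding model instead."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(question, index, k=1):
    """Return the k stored chunks most similar to the question embedding."""
    q = embed(question)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]
```

In the retrieval step, the top-ranked chunks are passed to the LLM as context alongside the user's question, which is what grounds the model's answer in the knowledge base.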
Prerequisites
- Access to Docker Hub registry to pull PostgreSQL pgvector database containers.
- Access to NVIDIA NGC registry to pull PyTorch containers.
- A vSphere cluster that has the following available resources:
- 16 vCPUs
- 128 GB RAM
- A GPU with at least 40 GB of memory (vGPU Time Slice or MIG)
- 100 GB of disk space
- A deep learning VM running a PyTorch container.
For information on deploying deep learning VMs, see Adding VMware Private AI Foundation with NVIDIA to Private AI Ready Infrastructure for VMware Cloud Foundation.
- (Optional) A PostgreSQL database with the pgvector extension, deployed by VMware Data Services Manager.
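For the pgvector prerequisite, the database must have the extension enabled and a table with a vector column to hold the chunk embeddings. The following SQL is an illustrative fragment, not the notebook's exact schema; the table name and the vector dimension are assumptions, and the dimension must match the embedding model you use.

```sql
-- Enable the pgvector extension (requires appropriate privileges).
CREATE EXTENSION IF NOT EXISTS vector;

-- Illustrative table for document chunks; the dimension 1024 is an
-- assumption and must match the embedding model's output size.
CREATE TABLE IF NOT EXISTS document_chunks (
    id bigserial PRIMARY KEY,
    content text NOT NULL,
    embedding vector(1024)
);

-- Nearest-neighbor search by cosine distance (pgvector's <=> operator):
-- SELECT content FROM document_chunks
-- ORDER BY embedding <=> '[...]'::vector
-- LIMIT 5;
```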
Procedure
Results
You successfully deployed and tested the core components of a RAG workload for Private AI Ready Infrastructure for VMware Cloud Foundation.
What to do next
- To learn more about the core elements of a RAG workload, explore the contents and output of each cell in the notebook.
- To learn more about different RAG approaches and their evaluation, deploy a deep learning VM with at least two A100 40 GB GPUs and follow the instructions in the README files in the other folders of the Improved RAG Starter Pack repository.