You can deploy a deep learning VM with an NVIDIA RAG workload by using a pgvector PostgreSQL database that is managed by VMware Data Services Manager.

For information about the NVIDIA RAG workload, see the documentation for NVIDIA RAG applications with Docker Compose (requires specific account permissions).

Prerequisites

Verify that a PostgreSQL database with the pgvector extension is provisioned in VMware Data Services Manager.

Procedure

  1. If you are a data scientist deploying the deep learning VM by using a catalog item in VMware Aria Automation, provide the details of the pgvector PostgreSQL database after you deploy the virtual machine.
    1. Deploy a RAG workstation with a vector database by using a self-service catalog item in VMware Aria Automation.
  2. If you are a DevOps engineer deploying the deep learning VM for a data scientist directly on the vSphere cluster or by using the kubectl command, create a cloud-init script and deploy the deep learning VM.
    1. Create a cloud-init script for NVIDIA RAG and the pgvector PostgreSQL database that you created.
      You can modify the initial version of the cloud-init script for NVIDIA RAG. For example, the following script is for NVIDIA RAG 24.08 and a pgvector PostgreSQL database with the connection details postgres://pgvector_db_admin:encoded_pgvector_db_admin_password@pgvector_db_ip_address:5432/pgvector_db_name.
      #cloud-config
      write_files:
      - path: /opt/dlvm/dl_app.sh
        permissions: '0755'
        content: |
          #!/bin/bash
          set -eu
          source /opt/dlvm/utils.sh
          trap 'error_exit "Unexpected error occurred in the DL workload"' ERR
          set_proxy "http" "https"
          
          sudo mkdir -p /opt/data/
          sudo chown vmware:vmware /opt/data
          sudo chmod -R 775 /opt/data
          cd /opt/data/
      
          cat <<EOF > /opt/data/config.json
          {
            "_comment_1": "This provides default support for RAG v24.08: llama3-8b-instruct model",
            "_comment_2": "Update llm_ms_gpu_id: specifies the GPU device ID to make available to the inference server when using multiple GPU",
            "_comment_3": "Update embedding_ms_gpu_id: specifies the GPU ID used for embedding model processing when using multiple GPU",
            "rag": {
              "org_name": "nvidia",
              "org_team_name": "aiworkflows",
              "rag_name": "ai-chatbot-docker-workflow",
              "rag_version": "24.08",
              "rag_app": "rag-app-multiturn-chatbot",
              "nim_model_profile": "auto",
              "llm_ms_gpu_id": "0",
              "embedding_ms_gpu_id": "0",
              "model_directory": "model-cache",
              "ngc_cli_version": "3.41.2"
            }
          }
          EOF
      
          CONFIG_JSON=$(cat "/opt/data/config.json")
          required_vars=("ORG_NAME" "ORG_TEAM_NAME" "RAG_NAME" "RAG_VERSION" "RAG_APP" "NIM_MODEL_PROFILE" "LLM_MS_GPU_ID" "EMBEDDING_MS_GPU_ID" "MODEL_DIRECTORY" "NGC_CLI_VERSION")
      
          # Extract rag values from /opt/data/config.json
          for index in "${!required_vars[@]}"; do
            key="${required_vars[$index]}"
            jq_query=".rag.${key,,} | select (.!=null)"
            value=$(echo "${CONFIG_JSON}" | jq -r "${jq_query}")
            if [[ -z "${value}" ]]; then 
              error_exit "${key} is required but not set."
            else
              eval ${key}=\""${value}"\"
            fi
          done
      
          # Read parameters from config-json to connect the RAG application to the DSM PGVector database
          CONFIG_JSON_BASE64=$(grep 'config-json' /opt/dlvm/ovf-env.xml | sed -n 's/.*oe:value="\([^"]*\).*/\1/p')
          CONFIG_JSON_PGVECTOR=$(echo "${CONFIG_JSON_BASE64}" | base64 -d)
          PGVECTOR_VALUE=$(echo ${CONFIG_JSON_PGVECTOR} | jq -r '.rag.pgvector')
          if [[ -n "${PGVECTOR_VALUE}" && "${PGVECTOR_VALUE}" != "null" ]]; then
            echo "Info: extract DSM PGVector parameters from config-json in XML"
            POSTGRES_USER=$(echo ${PGVECTOR_VALUE} | awk -F[:@/] '{print $4}')
            POSTGRES_PASSWORD=$(echo ${PGVECTOR_VALUE} | awk -F[:@/] '{print $5}')
            POSTGRES_HOST_IP=$(echo ${PGVECTOR_VALUE} | awk -F[:@/] '{print $6}')
            POSTGRES_PORT_NUMBER=$(echo ${PGVECTOR_VALUE} | awk -F[:@/] '{print $7}')
            POSTGRES_DB=$(echo ${PGVECTOR_VALUE} | awk -F[:@/] '{print $8}')
      
            for var in POSTGRES_USER POSTGRES_PASSWORD POSTGRES_HOST_IP POSTGRES_PORT_NUMBER POSTGRES_DB; do
              if [ -z "${!var}" ]; then
                error_exit "${var} is not set."
              fi
            done
          fi
      
          gpu_info=$(nvidia-smi -L)
          echo "Info: the detected GPU info, $gpu_info"
          if [[ ${NIM_MODEL_PROFILE} == "auto" ]]; then 
            case "${gpu_info}" in
              *A100*)
                NIM_MODEL_PROFILE="751382df4272eafc83f541f364d61b35aed9cce8c7b0c869269cea5a366cd08c"
                echo "Info: GPU type A100 detected. Setting tensorrt_llm-A100-fp16-tp1-throughput as the default NIM model profile."
                ;;
              *H100*)
                NIM_MODEL_PROFILE="cb52cbc73a6a71392094380f920a3548f27c5fcc9dab02a98dc1bcb3be9cf8d1"
                echo "Info: GPU type H100 detected. Setting tensorrt_llm-H100-fp16-tp1-throughput as the default NIM model profile."
                ;;
              *L40S*)
                NIM_MODEL_PROFILE="d8dd8af82e0035d7ca50b994d85a3740dbd84ddb4ed330e30c509e041ba79f80"
                echo "Info: GPU type L40S detected. Setting tensorrt_llm-L40S-fp16-tp1-throughput as the default NIM model profile."
                ;;
              *)
                NIM_MODEL_PROFILE="8835c31752fbc67ef658b20a9f78e056914fdef0660206d82f252d62fd96064d"
                echo "Info: No supported GPU type detected (A100, H100, L40S). Setting vllm as the default NIM model profile."
                ;;
            esac
          else
            echo "Info: using the NIM model profile provided by the user, $NIM_MODEL_PROFILE"
          fi
      
          RAG_URI="${ORG_NAME}/${ORG_TEAM_NAME}/${RAG_NAME}:${RAG_VERSION}"
          RAG_FOLDER="${RAG_NAME}_v${RAG_VERSION}"
          NGC_CLI_URL="https://api.ngc.nvidia.com/v2/resources/nvidia/ngc-apps/ngc_cli/versions/${NGC_CLI_VERSION}/files/ngccli_linux.zip"
      
          if [ ! -f .initialize ]; then
            # clean up
            rm -rf compose.env ngc* ${RAG_NAME}* ${MODEL_DIRECTORY}* .initialize
      
            # install ngc-cli
            wget --content-disposition ${NGC_CLI_URL} -O ngccli_linux.zip && unzip -q ngccli_linux.zip
            export PATH=$(pwd)/ngc-cli:${PATH}
      
            APIKEY=""
            DEFAULT_REG_URI="nvcr.io"
      
            REGISTRY_URI_PATH=$(grep registry-uri /opt/dlvm/ovf-env.xml | sed -n 's/.*oe:value="\([^"]*\).*/\1/p')
            if [[ -z "${REGISTRY_URI_PATH}" ]]; then
              REGISTRY_URI_PATH=${DEFAULT_REG_URI}
              echo "Info: registry uri was empty. Using default: ${REGISTRY_URI_PATH}"
            fi
      
            if [[ "$(grep registry-uri /opt/dlvm/ovf-env.xml | sed -n 's/.*oe:value="\([^"]*\).*/\1/p')" == *"${DEFAULT_REG_URI}"* ]]; then
              APIKEY=$(grep registry-passwd /opt/dlvm/ovf-env.xml | sed -n 's/.*oe:value="\([^"]*\).*/\1/p')
            fi
      
            if [ -z "${APIKEY}" ]; then
                error_exit "No APIKEY found"
            fi
      
            # config ngc-cli
            mkdir -p ~/.ngc
      
            cat << EOF > ~/.ngc/config
            [CURRENT]
            apikey = ${APIKEY}
            format_type = ascii
            org = ${ORG_NAME}
            team = ${ORG_TEAM_NAME}
            ace = no-ace
          EOF
            
            # Extract registry URI if path contains '/'
            if [[ ${REGISTRY_URI_PATH} == *"/"* ]]; then
              REGISTRY_URI=$(echo "${REGISTRY_URI_PATH}" | cut -d'/' -f1)
            else
              REGISTRY_URI=${REGISTRY_URI_PATH}
            fi
      
            REGISTRY_USER=$(grep registry-user /opt/dlvm/ovf-env.xml | sed -n 's/.*oe:value="\([^"]*\).*/\1/p')
      
            # Docker login if credentials are provided
            if [[ -n "${REGISTRY_USER}" && -n "${APIKEY}" ]]; then
              docker login -u ${REGISTRY_USER} -p ${APIKEY} ${REGISTRY_URI}
            else
              echo "Warning: the ${REGISTRY_URI} registry's username and password are invalid, Skipping Docker login."
            fi
      
            # DockerHub login for general components
            DOCKERHUB_URI=$(grep registry-2-uri /opt/dlvm/ovf-env.xml | sed -n 's/.*oe:value="\([^"]*\).*/\1/p')
            DOCKERHUB_USERNAME=$(grep registry-2-user /opt/dlvm/ovf-env.xml | sed -n 's/.*oe:value="\([^"]*\).*/\1/p')
            DOCKERHUB_PASSWORD=$(grep registry-2-passwd /opt/dlvm/ovf-env.xml | sed -n 's/.*oe:value="\([^"]*\).*/\1/p')
      
            DOCKERHUB_URI=${DOCKERHUB_URI:-docker.io}
            if [[ -n "${DOCKERHUB_USERNAME}" && -n "${DOCKERHUB_PASSWORD}" ]]; then
              docker login -u ${DOCKERHUB_USERNAME} -p ${DOCKERHUB_PASSWORD} ${DOCKERHUB_URI}
            else
              echo "Warning: ${DOCKERHUB_URI} not logged in"
            fi
      
            # Download RAG files
            ngc registry resource download-version ${RAG_URI}
      
            mkdir -p /opt/data/${MODEL_DIRECTORY}
      
          # Update the docker-compose YAML file to set the NIM model profile and pin the inference service to specific GPU device IDs, working around issues with GPU free/non-free status reporting
            /usr/bin/python3 -c "import yaml, json, sys; print(json.dumps(yaml.safe_load(sys.stdin.read())))" < "${RAG_FOLDER}/docker-compose-nim-ms.yaml"> docker-compose-nim-ms.json
            jq --arg profile "${NIM_MODEL_PROFILE}" \
               '.services."nemollm-inference".environment.NIM_MANIFEST_ALLOW_UNSAFE = "1" |
                .services."nemollm-inference".environment.NIM_MODEL_PROFILE = $profile |
                .services."nemollm-inference".deploy.resources.reservations.devices[0].device_ids = ["${LLM_MS_GPU_ID:-0}"] |
                del(.services."nemollm-inference".deploy.resources.reservations.devices[0].count)' docker-compose-nim-ms.json > temp.json && mv temp.json docker-compose-nim-ms.json
            /usr/bin/python3 -c "import yaml, json, sys; print(yaml.safe_dump(json.load(sys.stdin), default_flow_style=False, sort_keys=False))" < docker-compose-nim-ms.json > "${RAG_FOLDER}/docker-compose-nim-ms.yaml"
            rm -rf docker-compose-nim-ms.json
      
          # Update the docker-compose YAML files to configure PGVector as the default database
            /usr/bin/python3 -c "import yaml, json, sys; print(json.dumps(yaml.safe_load(sys.stdin.read())))" < "${RAG_FOLDER}/${RAG_APP}/docker-compose.yaml"> rag-app-multiturn-chatbot.json
            jq '.services."chain-server".environment.APP_VECTORSTORE_NAME = "pgvector" |
               .services."chain-server".environment.APP_VECTORSTORE_URL = "${POSTGRES_HOST_IP:-pgvector}:${POSTGRES_PORT_NUMBER:-5432}" |
               .services."chain-server".environment.POSTGRES_PASSWORD = "${POSTGRES_PASSWORD:-password}" |
               .services."chain-server".environment.POSTGRES_USER = "${POSTGRES_USER:-postgres}" |
               .services."chain-server".environment.POSTGRES_DB = "${POSTGRES_DB:-api}"' rag-app-multiturn-chatbot.json > temp.json && mv temp.json rag-app-multiturn-chatbot.json
            /usr/bin/python3 -c "import yaml, json, sys; print(yaml.safe_dump(json.load(sys.stdin), default_flow_style=False, sort_keys=False))" < rag-app-multiturn-chatbot.json > "${RAG_FOLDER}/${RAG_APP}/docker-compose.yaml"
            rm -rf rag-app-multiturn-chatbot.json
      
            # config compose.env
            cat << EOF > compose.env
            export MODEL_DIRECTORY="/opt/data/${MODEL_DIRECTORY}"
            export NGC_API_KEY=${APIKEY}
            export USERID=$(id -u)
            export LLM_MS_GPU_ID=${LLM_MS_GPU_ID}
            export EMBEDDING_MS_GPU_ID=${EMBEDDING_MS_GPU_ID}
          EOF
      
            if [[ -n "${PGVECTOR_VALUE}" && "${PGVECTOR_VALUE}" != "null" ]]; then 
              cat << EOF >> compose.env
              export POSTGRES_HOST_IP="${POSTGRES_HOST_IP}"
              export POSTGRES_PORT_NUMBER="${POSTGRES_PORT_NUMBER}"
              export POSTGRES_PASSWORD="${POSTGRES_PASSWORD}"
              export POSTGRES_USER="${POSTGRES_USER}"
              export POSTGRES_DB="${POSTGRES_DB}"
          EOF
            fi
          
            touch .initialize
      
            deploy_dcgm_exporter
          fi
      
          # start NGC RAG
          echo "Info: running the RAG application"
          source compose.env
          if [ -z "${PGVECTOR_VALUE}" ] || [ "${PGVECTOR_VALUE}" = "null" ]; then 
            echo "Info: running the pgvector container as the Vector Database"
            docker compose -f ${RAG_FOLDER}/${RAG_APP}/docker-compose.yaml --profile local-nim --profile pgvector up -d
          else
            echo "Info: using the provided DSM PGVector as the Vector Database"
            docker compose -f ${RAG_FOLDER}/${RAG_APP}/docker-compose.yaml --profile local-nim up -d
          fi
          
      - path: /opt/dlvm/utils.sh
        permissions: '0755'
        content: |
          #!/bin/bash
          error_exit() {
            echo "Error: $1" >&2
            vmtoolsd --cmd "info-set guestinfo.vmservice.bootstrap.condition false, DLWorkloadFailure, $1"
            exit 1
          }
      
          check_protocol() {
            local proxy_url=$1
            shift
            local supported_protocols=("$@")
            if [[ -n "${proxy_url}" ]]; then
              local protocol=$(echo "${proxy_url}" | awk -F '://' '{if (NF > 1) print $1; else print ""}')
              if [ -z "$protocol" ]; then
                echo "No specific protocol provided. Skipping protocol check."
                return 0
              fi
              local protocol_included=false
              for var in "${supported_protocols[@]}"; do
                if [[ "${protocol}" == "${var}" ]]; then
                  protocol_included=true
                  break
                fi
              done
              if [[ "${protocol_included}" == false ]]; then
                error_exit "Unsupported protocol: ${protocol}. Supported protocols are: ${supported_protocols[*]}"
              fi
            fi
          }
      
          # $@: list of supported protocols
          set_proxy() {
            local supported_protocols=("$@")
      
            CONFIG_JSON_BASE64=$(grep 'config-json' /opt/dlvm/ovf-env.xml | sed -n 's/.*oe:value="\([^"]*\).*/\1/p')
            CONFIG_JSON=$(echo ${CONFIG_JSON_BASE64} | base64 --decode)
      
            HTTP_PROXY_URL=$(echo "${CONFIG_JSON}" | jq -r '.http_proxy // empty')
            HTTPS_PROXY_URL=$(echo "${CONFIG_JSON}" | jq -r '.https_proxy // empty')
            if [[ $? -ne 0 || (-z "${HTTP_PROXY_URL}" && -z "${HTTPS_PROXY_URL}") ]]; then
              echo "Info: The config-json was parsed, but no proxy settings were found."
              return 0
            fi
      
            check_protocol "${HTTP_PROXY_URL}" "${supported_protocols[@]}"
            check_protocol "${HTTPS_PROXY_URL}" "${supported_protocols[@]}"
      
            if ! grep -q 'http_proxy' /etc/environment; then
              sudo bash -c 'echo "export http_proxy=${HTTP_PROXY_URL}
              export https_proxy=${HTTPS_PROXY_URL}
              export HTTP_PROXY=${HTTP_PROXY_URL}
              export HTTPS_PROXY=${HTTPS_PROXY_URL}
              export no_proxy=localhost,127.0.0.1" >> /etc/environment'
              source /etc/environment
            fi
            
            # Configure Docker to use a proxy
            sudo mkdir -p /etc/systemd/system/docker.service.d
            sudo bash -c 'echo "[Service]
            Environment=\"HTTP_PROXY=${HTTP_PROXY_URL}\"
            Environment=\"HTTPS_PROXY=${HTTPS_PROXY_URL}\"
            Environment=\"NO_PROXY=localhost,127.0.0.1\"" > /etc/systemd/system/docker.service.d/proxy.conf'
            sudo systemctl daemon-reload
            sudo systemctl restart docker
      
            echo "Info: docker and system environment are now configured to use the proxy settings"
          }
      
          deploy_dcgm_exporter() {
            CONFIG_JSON_BASE64=$(grep 'config-json' /opt/dlvm/ovf-env.xml | sed -n 's/.*oe:value="\([^"]*\).*/\1/p')
            CONFIG_JSON=$(echo ${CONFIG_JSON_BASE64} | base64 --decode)
            DCGM_EXPORT_PUBLIC=$(echo "${CONFIG_JSON}" | jq -r '.export_dcgm_to_public // empty')
      
            DCGM_EXPORTER_IMAGE="$REGISTRY_URI_PATH/nvidia/k8s/dcgm-exporter"
            DCGM_EXPORTER_VERSION="3.2.5-3.1.8-ubuntu22.04"
            if [ -z "${DCGM_EXPORT_PUBLIC}" ] || [ "${DCGM_EXPORT_PUBLIC}" != "true" ]; then
              echo "Info: launching DCGM Exporter to collect vGPU metrics, listening only on localhost (127.0.0.1:9400)"
              docker run -d --gpus all --cap-add SYS_ADMIN -p 127.0.0.1:9400:9400 $DCGM_EXPORTER_IMAGE:$DCGM_EXPORTER_VERSION
            else
              echo "Info: launching DCGM Exporter to collect vGPU metrics, exposed on all network interfaces (0.0.0.0:9400)"
              docker run -d --gpus all --cap-add SYS_ADMIN -p 9400:9400 $DCGM_EXPORTER_IMAGE:$DCGM_EXPORTER_VERSION
            fi
          }
    2. Encode the cloud-init script in base64 format.
      Use a base64 encoding tool, such as https://decode64base.com/, to generate the encoded version of your cloud-init script.
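      Alternatively, on a Linux machine you can generate the encoded value with the base64 command line tool. A minimal sketch, assuming the script is saved as cloud-init.yaml (a placeholder file name):

      # Encode without line wrapping so that the value fits into a single OVF property.
      base64 -w 0 cloud-init.yaml > cloud-init.b64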
    3. Create a configuration file in JSON format that specifies the details of the pgvector database.
      {"rag": {"pgvector": "postgresql://pgvector_db_admin:encoded_pgvector_db_admin_password@pgvector_db_ip_address:5432/pgvector_db_name"}}

      If you must configure a proxy server for internet access, add the http_proxy and https_proxy properties at the top level of this JSON configuration file, as in the example below.
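      For example, a complete config-json file that also routes traffic through a proxy server might look as follows. The proxy URL is a placeholder for your environment; the pgvector connection string uses the same placeholders as above:

      {
        "rag": {
          "pgvector": "postgresql://pgvector_db_admin:encoded_pgvector_db_admin_password@pgvector_db_ip_address:5432/pgvector_db_name"
        },
        "http_proxy": "http://proxy.example.com:3128",
        "https_proxy": "http://proxy.example.com:3128"
      }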

    4. Deploy the deep learning VM, passing the base64-encoded value of the cloud-init script to the user-data OVF input parameter and the base64-encoded value of the JSON configuration file to the config-json parameter.
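      For example, if you deploy the VM directly on the vSphere cluster with the govc CLI, the flow might look like the following sketch. The OVA path and file names are placeholders, and the sketch assumes that the image defines the user-data and config-json OVF properties described above; adapt the generated spec to your environment.

      # Encode the cloud-init script and the JSON configuration file.
      CLOUD_INIT_B64=$(base64 -w 0 cloud-init.yaml)
      CONFIG_JSON_B64=$(base64 -w 0 config.json)

      # Generate a deployment spec for the image and fill in the two OVF properties.
      govc import.spec dl-vm.ova > spec.json
      jq --arg ud "${CLOUD_INIT_B64}" --arg cj "${CONFIG_JSON_B64}" \
        '.PropertyMapping |= map(
           if .Key == "user-data" then .Value = $ud
           elif .Key == "config-json" then .Value = $cj
           else . end)' spec.json > spec-filled.json

      # Deploy the deep learning VM with the completed spec.
      govc import.ova -options=spec-filled.json dl-vm.ova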