You can deploy a deep learning VM with an NVIDIA RAG workload by using a pgvector PostgreSQL database managed by VMware Data Services Manager.

For more information about the NVIDIA RAG workload, see the NVIDIA RAG Applications Docker Compose documentation (requires specific account permissions).

Prerequisites

Procedure

  1. If, as a data scientist, you are deploying the deep learning VM by using a catalog item in VMware Aria Automation, provide the pgvector PostgreSQL database details after you deploy the virtual machine.
    1. Deploy a RAG workstation in VMware Aria Automation.
    2. Navigate to Consume > Deployments > Deployments and locate the deep learning VM deployment.
    3. In the VM Workstation section, note the details for connecting to the virtual machine over SSH.
    4. Log in to the deep learning VM over SSH by using the credentials available in Automation Service Broker.
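      For example, assuming the VM has the IP address 10.0.0.100 and the account shown in Automation Service Broker is vmware (both values are hypothetical placeholders):
      ssh vmware@10.0.0.100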
    5. Add the following pgvector variables to the /opt/data/compose.env file:
      POSTGRES_HOST_IP=pgvector_db_ip_address
      POSTGRES_PORT_NUMBER=5432
      POSTGRES_DB=pgvector_db_name
      POSTGRES_USER=pgvector_db_admin
      POSTGRES_PASSWORD=encoded_pgvector_db_admin_password
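      Before restarting the containers, you can optionally verify that the database accepts connections. A minimal check, assuming the psql client is installed on the VM and using the same placeholder connection details:
      psql "postgres://pgvector_db_admin:encoded_pgvector_db_admin_password@pgvector_db_ip_address:5432/pgvector_db_name" -c "SELECT extname FROM pg_extension WHERE extname = 'vector';"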
    6. Restart the NVIDIA RAG multi-container application by running the following commands.
      For example, for NVIDIA RAG 24.03:
      cd /opt/data
      docker compose -f rag-docker-compose_v24.03/rag-app-text-chatbot.yaml down
      docker compose -f rag-docker-compose_v24.03/docker-compose-vectordb.yaml down
      docker compose -f rag-docker-compose_v24.03/docker-compose-vectordb.yaml up -d
      source compose.env; docker compose -f rag-docker-compose_v24.03/rag-app-text-chatbot.yaml up -d
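      You can then confirm that all containers are running again, for example:
      docker ps --format "table {{.Names}}\t{{.Status}}"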
  2. If, as a DevOps engineer, you are deploying the deep learning VM for a data scientist directly on the vSphere cluster or by using the kubectl command, create a cloud-init script and deploy the deep learning VM.
    1. Create a cloud-init script for NVIDIA RAG and the pgvector PostgreSQL database that you created.
      You can modify the initial version of the cloud-init script for NVIDIA RAG. For example, for NVIDIA RAG 24.03 and a pgvector PostgreSQL database with the connection details postgres://pgvector_db_admin:encoded_pgvector_db_admin_password@pgvector_db_ip_address:5432/pgvector_db_name:
      #cloud-config
      write_files:
      - path: /opt/dlvm/dl_app.sh
        permissions: '0755'
        content: |
          #!/bin/bash
          set -eu
          source /opt/dlvm/utils.sh
          trap 'error_exit "Unexpected error occurred in the DL workload"' ERR
          set_proxy "http" "https"
      
          cat <<EOF > /opt/dlvm/config.json
          {
            "_comment": "This provides default support for RAG: TensorRT inference, llama2-13b model, and H100x2 GPU",
            "rag": {
              "org_name": "cocfwga8jq2c",
              "org_team_name": "no-team",
              "rag_repo_name": "nvidia/paif",
              "llm_repo_name": "nvidia/nim",
              "embed_repo_name": "nvidia/nemo-retriever",
              "rag_name": "rag-docker-compose",
              "rag_version": "24.03",
              "embed_name": "nv-embed-qa",
              "embed_type": "NV-Embed-QA",
              "embed_version": "4",
              "inference_type": "trt",
              "llm_name": "llama2-13b-chat",
              "llm_version": "h100x2_fp16_24.02",
              "num_gpu": "2",
              "hf_token": "huggingface token to pull llm model, update when using vllm inference",
              "hf_repo": "huggingface llm model repository, update when using vllm inference"
            }
          }
          EOF
          CONFIG_JSON=$(cat "/opt/dlvm/config.json")
          INFERENCE_TYPE=$(echo "${CONFIG_JSON}" | jq -r '.rag.inference_type')
          if [ "${INFERENCE_TYPE}" = "trt" ]; then
            required_vars=("ORG_NAME" "ORG_TEAM_NAME" "RAG_REPO_NAME" "LLM_REPO_NAME" "EMBED_REPO_NAME" "RAG_NAME" "RAG_VERSION" "EMBED_NAME" "EMBED_TYPE" "EMBED_VERSION" "LLM_NAME" "LLM_VERSION" "NUM_GPU")
          elif [ "${INFERENCE_TYPE}" = "vllm" ]; then
            required_vars=("ORG_NAME" "ORG_TEAM_NAME" "RAG_REPO_NAME" "LLM_REPO_NAME" "EMBED_REPO_NAME" "RAG_NAME" "RAG_VERSION" "EMBED_NAME" "EMBED_TYPE" "EMBED_VERSION" "LLM_NAME" "NUM_GPU" "HF_TOKEN" "HF_REPO")
          else
            error_exit "Inference type '${INFERENCE_TYPE}' is not recognized. No action will be taken."
          fi
          for index in "${!required_vars[@]}"; do
            key="${required_vars[$index]}"
            jq_query=".rag.${key,,} | select (.!=null)"
            value=$(echo "${CONFIG_JSON}" | jq -r "${jq_query}")
            if [[ -z "${value}" ]]; then 
              error_exit "${key} is required but not set."
            else
              eval ${key}=\""${value}"\"
            fi
          done
      
          RAG_URI="${RAG_REPO_NAME}/${RAG_NAME}:${RAG_VERSION}"
          EMBED_MODEL_URI="${EMBED_REPO_NAME}/${EMBED_NAME}:${EMBED_VERSION}"
      
          NGC_CLI_VERSION="3.41.2"
          NGC_CLI_URL="https://api.ngc.nvidia.com/v2/resources/nvidia/ngc-apps/ngc_cli/versions/${NGC_CLI_VERSION}/files/ngccli_linux.zip"
      
          mkdir -p /opt/data
          cd /opt/data
      
          if [ ! -f .file_downloaded ]; then
            # clean up
            rm -rf compose.env ${RAG_NAME}* ${LLM_NAME}* ngc* ${EMBED_NAME}* *.json .file_downloaded
      
            # install ngc-cli
            wget --content-disposition ${NGC_CLI_URL} -O ngccli_linux.zip && unzip ngccli_linux.zip
            export PATH=`pwd`/ngc-cli:${PATH}
      
            APIKEY=""
            REG_URI="nvcr.io"
      
            if [[ "$(grep registry-uri /opt/dlvm/ovf-env.xml | sed -n 's/.*oe:value="\([^"]*\).*/\1/p')" == *"${REG_URI}"* ]]; then
              APIKEY=$(grep registry-passwd /opt/dlvm/ovf-env.xml | sed -n 's/.*oe:value="\([^"]*\).*/\1/p')
            fi
      
            if [ -z "${APIKEY}" ]; then
                error_exit "No APIKEY found"
            fi
      
            # config ngc-cli
            mkdir -p ~/.ngc
      
            cat << EOF > ~/.ngc/config
            [CURRENT]
            apikey = ${APIKEY}
            format_type = ascii
            org = ${ORG_NAME}
            team = ${ORG_TEAM_NAME}
            ace = no-ace
          EOF
      
            # ngc docker login
            docker login nvcr.io -u \$oauthtoken -p ${APIKEY}
      
            # dockerhub login for general components, e.g. minio
            DOCKERHUB_URI=$(grep registry-2-uri /opt/dlvm/ovf-env.xml | sed -n 's/.*oe:value="\([^"]*\).*/\1/p')
            DOCKERHUB_USERNAME=$(grep registry-2-user /opt/dlvm/ovf-env.xml | sed -n 's/.*oe:value="\([^"]*\).*/\1/p')
            DOCKERHUB_PASSWORD=$(grep registry-2-passwd /opt/dlvm/ovf-env.xml | sed -n 's/.*oe:value="\([^"]*\).*/\1/p')
      
            if [[ -n "${DOCKERHUB_USERNAME}" && -n "${DOCKERHUB_PASSWORD}" ]]; then
              docker login -u ${DOCKERHUB_USERNAME} -p ${DOCKERHUB_PASSWORD}
            else
              echo "Warning: DockerHub not login"
            fi
      
            # get RAG files
            ngc registry resource download-version ${RAG_URI}
      
            # get llm model
            if [ "${INFERENCE_TYPE}" = "trt" ]; then
              LLM_MODEL_URI="${LLM_REPO_NAME}/${LLM_NAME}:${LLM_VERSION}"
              ngc registry model download-version ${LLM_MODEL_URI}
              chmod -R o+rX ${LLM_NAME}_v${LLM_VERSION}
              LLM_MODEL_FOLDER="/opt/data/${LLM_NAME}_v${LLM_VERSION}"
            elif [ "${INFERENCE_TYPE}" = "vllm" ]; then
              pip install huggingface_hub
              huggingface-cli login --token ${HF_TOKEN}
              huggingface-cli download --resume-download ${HF_REPO}/${LLM_NAME} --local-dir ${LLM_NAME} --local-dir-use-symlinks False
              LLM_MODEL_FOLDER="/opt/data/${LLM_NAME}"
              cat << EOF > ${LLM_MODEL_FOLDER}/model_config.yaml 
              engine:
                model: /model-store
                enforce_eager: false
                max_context_len_to_capture: 8192
                max_num_seqs: 256
                dtype: float16
                tensor_parallel_size: ${NUM_GPU}
                gpu_memory_utilization: 0.8
          EOF
              chmod -R o+rX ${LLM_MODEL_FOLDER}
              python3 -c "import yaml, json, sys; print(json.dumps(yaml.safe_load(sys.stdin.read())))" < "${RAG_NAME}_v${RAG_VERSION}/rag-app-text-chatbot.yaml"> rag-app-text-chatbot.json
              jq '.services."nemollm-inference".image = "nvcr.io/nvidia/nim/nim_llm:24.02-day0" |
                  .services."nemollm-inference".command = "nim_vllm --model_name ${MODEL_NAME} --model_config /model-store/model_config.yaml" |
                  .services."nemollm-inference".ports += ["8000:8000"] |
                  .services."nemollm-inference".expose += ["8000"]' rag-app-text-chatbot.json > temp.json && mv temp.json rag-app-text-chatbot.json
              python3 -c "import yaml, json, sys; print(yaml.safe_dump(json.load(sys.stdin), default_flow_style=False, sort_keys=False))" < rag-app-text-chatbot.json > "${RAG_NAME}_v${RAG_VERSION}/rag-app-text-chatbot.yaml"
            fi
      
            # get embedding models
            ngc registry model download-version ${EMBED_MODEL_URI}
            chmod -R o+rX ${EMBED_NAME}_v${EMBED_VERSION}
      
            # config compose.env
            cat << EOF > compose.env
            export MODEL_DIRECTORY="${LLM_MODEL_FOLDER}"
            export MODEL_NAME=${LLM_NAME}
            export NUM_GPU=${NUM_GPU}
            export APP_CONFIG_FILE=/dev/null
            export EMBEDDING_MODEL_DIRECTORY="/opt/data/${EMBED_NAME}_v${EMBED_VERSION}"
            export EMBEDDING_MODEL_NAME=${EMBED_TYPE}
            export EMBEDDING_MODEL_CKPT_NAME="${EMBED_TYPE}-${EMBED_VERSION}.nemo"
            export POSTGRES_HOST_IP=pgvector_db_ip_address
            export POSTGRES_PORT_NUMBER=5432
            export POSTGRES_DB=pgvector_db_name
            export POSTGRES_USER=pgvector_db_admin
            export POSTGRES_PASSWORD=encoded_pgvector_db_admin_password
          EOF
      
            touch .file_downloaded
          fi
      
          # start NGC RAG
          docker compose -f ${RAG_NAME}_v${RAG_VERSION}/docker-compose-vectordb.yaml up -d pgvector
          source compose.env; docker compose -f ${RAG_NAME}_v${RAG_VERSION}/rag-app-text-chatbot.yaml up -d
      
      - path: /opt/dlvm/utils.sh
        permissions: '0755'
        content: |
          #!/bin/bash
          error_exit() {
            echo "Error: $1" >&2
            vmtoolsd --cmd "info-set guestinfo.vmservice.bootstrap.condition false, DLWorkloadFailure, $1"
            exit 1
          }
      
          check_protocol() {
            local proxy_url=$1
            shift
            local supported_protocols=("$@")
            if [[ -n "${proxy_url}" ]]; then
              local protocol=$(echo "${proxy_url}" | awk -F '://' '{if (NF > 1) print $1; else print ""}')
              if [ -z "$protocol" ]; then
                echo "No specific protocol provided. Skipping protocol check."
                return 0
              fi
              local protocol_included=false
              for var in "${supported_protocols[@]}"; do
                if [[ "${protocol}" == "${var}" ]]; then
                  protocol_included=true
                  break
                fi
              done
              if [[ "${protocol_included}" == false ]]; then
                error_exit "Unsupported protocol: ${protocol}. Supported protocols are: ${supported_protocols[*]}"
              fi
            fi
          }
      
          # $@: list of supported protocols
          set_proxy() {
            local supported_protocols=("$@")
      
            CONFIG_JSON_BASE64=$(grep 'config-json' /opt/dlvm/ovf-env.xml | sed -n 's/.*oe:value="\([^"]*\).*/\1/p')
            CONFIG_JSON=$(echo ${CONFIG_JSON_BASE64} | base64 --decode)
      
            HTTP_PROXY_URL=$(echo "${CONFIG_JSON}" | jq -r '.http_proxy // empty')
            HTTPS_PROXY_URL=$(echo "${CONFIG_JSON}" | jq -r '.https_proxy // empty')
            if [[ $? -ne 0 || (-z "${HTTP_PROXY_URL}" && -z "${HTTPS_PROXY_URL}") ]]; then
              echo "Info: The config-json was parsed, but no proxy settings were found."
              return 0
            fi
      
            check_protocol "${HTTP_PROXY_URL}" "${supported_protocols[@]}"
            check_protocol "${HTTPS_PROXY_URL}" "${supported_protocols[@]}"
      
            if ! grep -q 'http_proxy' /etc/environment; then
              echo "export http_proxy=${HTTP_PROXY_URL}
              export https_proxy=${HTTPS_PROXY_URL}
              export HTTP_PROXY=${HTTP_PROXY_URL}
              export HTTPS_PROXY=${HTTPS_PROXY_URL}
              export no_proxy=localhost,127.0.0.1" >> /etc/environment
              source /etc/environment
            fi
            
            # Configure Docker to use a proxy
            mkdir -p /etc/systemd/system/docker.service.d
            echo "[Service]
            Environment=\"HTTP_PROXY=${HTTP_PROXY_URL}\"
            Environment=\"HTTPS_PROXY=${HTTPS_PROXY_URL}\"
            Environment=\"NO_PROXY=localhost,127.0.0.1\"" > /etc/systemd/system/docker.service.d/proxy.conf
            systemctl daemon-reload
            systemctl restart docker
      
            echo "Info: docker and system environment are now configured to use the proxy settings"
          }
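      Before encoding the script, you can optionally validate its syntax on a Linux host. A minimal check, assuming cloud-init 22.2 or later and the hypothetical file name cloud-init.yaml:
      cloud-init schema --config-file cloud-init.yaml --annotate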
    2. Encode the cloud-init script in base64 format.
      Use a base64 encoding tool, such as https://decode64base.com/, to generate the encoded version of your cloud-init script.
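      Alternatively, you can encode the script locally. A minimal example, assuming GNU coreutils on a Linux machine and the hypothetical file name cloud-init.yaml:
      base64 -w 0 cloud-init.yaml > cloud-init.b64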
    3. Deploy the deep learning VM, passing the base64 value of the cloud-init script to the user-data input parameter.
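      For a kubectl-based deployment, the following is a minimal sketch of the virtual machine specification, assuming the VM Operator v1alpha1 API; the namespace, VM class, image, and storage class names are hypothetical placeholders, and user-data carries the base64 string from the previous step:
      apiVersion: vmoperator.vmware.com/v1alpha1
      kind: VirtualMachine
      metadata:
        name: rag-dl-vm
        namespace: example-namespace
      spec:
        className: example-vm-class
        imageName: example-dlvm-image
        storageClass: example-storage-class
        vmMetadata:
          transport: OvfEnv
          configMapName: rag-dl-vm-metadata
      ---
      apiVersion: v1
      kind: ConfigMap
      metadata:
        name: rag-dl-vm-metadata
        namespace: example-namespace
      data:
        user-data: "base64_encoded_cloud_init_script"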