After you deploy a deep learning VM in VMware Private AI Foundation with NVIDIA, downloading the specified DL workload to the virtual machine fails with error log messages indicating invalid authentication credentials.

Problem

If you are installing a DL workload container image, such as Triton Inference Server, TensorFlow, or PyTorch, the /var/log/dl.log file contains the following message:

Unable to find image 'nvcr.io/nvidia/tritonserver-pb24h1:24.03.02-py3' locally
docker: Error response from daemon: unauthorized: <html>
<head><title>401 Authorization Required</title></head>
<body>

For NVIDIA RAG, the /var/log/dl.log file contains the following message:

Error: Invalid apikey
chmod: cannot access 'llama2-13b-chat_vh100x2_fp16_24.02': No such file or directory
 
Error: Invalid apikey
chmod: cannot access 'nv-embed-qa_v4': No such file or directory
stat /opt/data/rag-docker-compose_v24.03/docker-compose-vectordb.yaml: no such file or directory
stat /opt/data/rag-docker-compose_v24.03/rag-app-text-chatbot.yaml: no such file or directory

Cause

Authentication to the nvcr.io container registry has failed. As a result, the DL workload image cannot be downloaded to the virtual machine.

Solution

  • Verify the login credentials for the nvcr.io registry that you passed as OVF parameters or entered in the catalog setup wizard for Private AI in VMware Aria Automation. To test the credentials manually, see the sketch after this list.

    • Registry: nvcr.io
    • Registry user account: $oauthtoken
    • Registry password: NGC portal API key
  • Verify that the NVIDIA NGC portal API key has the permissions to access the required resources and that the key has not expired.
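For example, you can test the registry credentials directly on the virtual machine by logging in to nvcr.io with the Docker CLI. This is a minimal sketch; it assumes Docker is available on the VM and that your NGC portal API key is stored in the hypothetical file /tmp/ngc_api_key:

# Log in to nvcr.io using the NGC API key as the password.
# The user name is the literal string $oauthtoken, so quote it
# to prevent shell expansion.
cat /tmp/ngc_api_key | docker login nvcr.io --username '$oauthtoken' --password-stdin

# If the login succeeds, retry pulling the DL workload image, for example:
docker pull nvcr.io/nvidia/tritonserver-pb24h1:24.03.02-py3

If the login fails with an authorization error, generate a new API key in the NVIDIA NGC portal and update the credentials in the deployment parameters.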