Starting from VMware Private AI Foundation with NVIDIA for VMware Cloud Foundation 5.2.1, as an MLOps engineer, you can distribute ML models across deep learning VMs and TKG clusters by using a central Harbor container registry.

Using a central model gallery for your ML models has the following benefits:

  • Distribute models within your organization that you validated on a deep learning VM. Models on the Internet could contain malicious code or could be tuned for malicious behavior.
  • Distribute models for continuous delivery across organizations, or promote them between platforms or environments.
  • Maintain model integrity in your controlled environment.
  • By using the Harbor project access capabilities, you can restrict access to sensitive data that is used to train and tune a model.
  • Store metadata in the Open Container Initiative (OCI) format describing the contents and dependencies of your ML models. In this way, you can identify the platforms that can run a target model.

You can use Harbor to store models from the NVIDIA NGC catalog, Hugging Face, and other ML model catalogs.

What Is a Model Gallery?

A model gallery in VMware Private AI Foundation with NVIDIA is a Harbor project with the following configuration:

Table 1. ML Model Gallery in Harbor
Model Gallery Entity | Harbor or OCI Entity | Requirements
Model gallery | Project with configured user access | Must have a unique name in the Harbor registry.
Model | OCI repository | Must have a unique name in the project according to the OCI format.
Revision | OCI artifact | Immutable manifest that is identified by a content digest. If the same model data is pushed multiple times, only one revision is stored. If the model data changes, each push operation creates a new revision. You can tag a model revision; the latest tag is supported but, unlike in a container ecosystem, it is not the default option when pulling a model.
File | OCI layer and blob | -

For example, Harbor stores a Llama model revision in a structure similar to the following sketch.
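
The project, repository, file names, and digest below are hypothetical and only illustrate how the entities from Table 1 map onto a pushed model; the exact values depend on your Harbor project and on the model you push.

  my-harbor-repo-mycompany.com/dev-models    <- model gallery (Harbor project)
    meta/llama-example                       <- model (OCI repository)
      sha256:3f9a... (tag: approved)         <- revision (OCI artifact, identified by its content digest)
        config.json                          <- files (OCI layers and blobs)
        tokenizer.json
        model.safetensors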



Pushing to and Pulling Models from a Model Gallery

In VMware Private AI Foundation with NVIDIA, you can use a deep learning VM to validate and fine-tune models downloaded from a public container registry or from a container registry in your organization. You then use the pais command-line utility on the VM to push models to, pull models from, and copy models across model galleries in Harbor.

Updating the pais Command Line Utility in Deep Learning VMs

You can download new versions and hot fixes of the pais CLI from the Broadcom Support portal. To update the pais CLI on a deep learning VM, perform the following steps.

  1. Download the pais binary file from the VMware Private AI Services CLI download group on the Broadcom Support portal to the current user directory on the deep learning VM.
  2. Set execution permissions on the pais file.
    chmod 755 ./pais
  3. Verify that the version of the binary file is the expected one.
    ./pais --version
  4. Back up the active pais version in the /usr/local/bin directory.
    sudo mv /usr/local/bin/pais /usr/local/bin/pais-backup
  5. Replace the active version of the pais binary file with the downloaded one.
    sudo mv ./pais /usr/local/bin/pais
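  6. Optionally, verify that the active pais binary now reports the new version. This assumes that /usr/local/bin is on the PATH of the deep learning VM.
    pais --version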

Before You Begin Storing ML Models in VMware Private AI Foundation with NVIDIA

Storing ML models in Harbor requires the use of deep learning VMs that you can deploy by using VMware Aria Automation or request from your cloud administrator or DevOps engineer.

Example Workflow: Validate and Upload an ML Model to a Model Gallery

As an MLOps engineer, you validate onboarded ML models against the security, privacy, and technical requirements of your organization. You can then upload the models to a dedicated model gallery for use by AI application developers, or by MLOps engineers for automated CI/CD-based deployment of model runtimes.

Procedure

  1. Deploy a deep learning VM with a Triton Inference Server and open an SSH connection to it as the vmware user.

    You can use one of the following workflows. As an MLOps engineer, you can deploy a deep learning VM directly from VMware Aria Automation. Otherwise, you request the VM deployment from your cloud administrator or DevOps engineer.

    Deployment Workflow | Required User Role | Description
    Deploy by using a self-service catalog item in VMware Aria Automation. | MLOps engineer | See Deploy a Deep Learning VM with NVIDIA Triton Inference Server by Using a Self-Service Catalog Item in VMware Aria Automation.
    Deploy directly on a vSphere cluster. | Cloud administrator | See Deploy a Deep Learning VM Directly on a vSphere Cluster in VMware Private AI Foundation with NVIDIA.
    Deploy by using the kubectl command. | DevOps engineer | See Deploy a Deep Learning VM by Using the kubectl Command in VMware Private AI Foundation with NVIDIA.
    If the model is hosted on Hugging Face, you can install the huggingface-cli command-line utility as part of the cloud-init script and use it to download open-weights models hosted on the Hugging Face Hub. Use the --local-dir flag to download the model without symbolic links so that the pais CLI can process the model, as in the example below.
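    For example, to download the BAAI bge-small-en-v1.5 model that is used later in this workflow. The Hugging Face repository ID shown here is an assumption; check the model page for the exact ID.
      huggingface-cli download BAAI/bge-small-en-v1.5 --local-dir ./baai/bge-small-en-v1.5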
  2. Add the issuer certificate of the Harbor registry to the certificate trust store on the deep learning VM.
    1. Request the CA certificate from the Harbor registry administrator.
    2. Upload the certificate to the virtual machine, for example, by using a secure copy protocol (scp) client on your workstation.
      For example:
      scp infra/my-harbor-issuing-ca.crt [email protected]:
    3. Copy the certificate to the /usr/local/share/ca-certificates directory and add it to the trust store.
      For example:
      sudo cp my-harbor-issuing-ca.crt /usr/local/share/ca-certificates/
      sudo update-ca-certificates
      
    4. For the Docker daemon to pick up the updated trust store, restart the docker service.
      sudo systemctl restart docker
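    Optionally, you can confirm that the VM now trusts the registry certificate by calling the standard registry API endpoint with curl. The URL below reuses the example registry name from this workflow; an HTTP 401 response without credentials is expected, while a TLS certificate error indicates that the CA is still not trusted.
      curl -I https://my-harbor-repo-mycompany.com/v2/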
  3. Download the model you plan to validate into the deep learning VM from the NVIDIA NGC catalog, Hugging Face, or another model hub.
  4. Perform runtime tests in the deep learning VM as a sandbox.
    1. Validate the integrity of the model by checking the hash code of the model files, as in the example after this step.
    2. Scan the model files for malware.
    3. Scan the model for serialization attacks.
    4. Validate that inference works as expected by using the Triton Inference Server.
    5. Evaluate the model's performance and safety for your business use case.
      For example, you can examine inference requests for malicious behavior and perform hands-on functional tests.
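    For example, for the integrity check in substep 1, you might compare the checksum of each downloaded file against the value published by the model provider. The file name below is illustrative.
      sha256sum ./baai/bge-small-en-v1.5/model.safetensors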
  5. Log in to the Harbor registry by using docker login.
    docker login -u my_harbor_user_name my-harbor-repo-mycompany.com
  6. From the directory that contains the model files, push the model to the Harbor project for the target organization.
    For example, to upload a bge-small-en-v1.5 model from Beijing Academy of Artificial Intelligence (BAAI) to a store for AI application developers:
    cd ./baai/bge-small-en-v1.5
    pais models push --modelName baai/bge-small-en-v1.5 --modelStore my-harbor-repo-mycompany.com/dev-models --tag approved
    

    Pushing again with the same modelName creates a new revision of the model, which you can tag according to your preference. Each revision is assigned a unique digest to maintain integrity.

  7. List the models available in the model gallery and verify that your model was uploaded.
    pais models list --modelStore my-harbor-repo-mycompany.com/dev-models

What to do next

By using the pais command-line utility in deep learning VM instances, copy models between model galleries to make them available across organizations or functional areas.
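
For example, the following is a minimal sketch of such a copy, assuming a second model gallery (Harbor project) named prod-models in the same registry and push access to it. It uses only the pull and push operations shown earlier; run the push from the directory that contains the pulled model files.

  pais models pull --modelName baai/bge-small-en-v1.5 --modelStore my-harbor-repo-mycompany.com/dev-models --tag approved
  pais models push --modelName baai/bge-small-en-v1.5 --modelStore my-harbor-repo-mycompany.com/prod-models --tag approved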

Example Workflow: Deploy an ML Model to Run Inference

After an ML model is validated and uploaded to the model gallery in your Harbor registry, an MLOps engineer can run that model for inference on a deep learning VM running a Triton Inference Server from the NVIDIA NGC catalog.

You pull ML models from Harbor in a similar way when deploying a RAG deep learning VM using the default or your own knowledge base. See Deploy a Deep Learning VM with a RAG Workload.

Procedure

  1. Deploy a deep learning VM with a Triton Inference Server and open an SSH connection to it as the vmware user.

    You can use one of the following workflows. As an MLOps engineer, you can deploy a deep learning VM directly from VMware Aria Automation. Otherwise, you request the VM deployment from your cloud administrator or DevOps engineer.

    Deployment Workflow | Required User Role | Description
    Deploy by using a self-service catalog item in VMware Aria Automation. | MLOps engineer | See Deploy a Deep Learning VM with NVIDIA Triton Inference Server by Using a Self-Service Catalog Item in VMware Aria Automation.
    Deploy directly on a vSphere cluster. | Cloud administrator | See Deploy a Deep Learning VM Directly on a vSphere Cluster in VMware Private AI Foundation with NVIDIA.
    Deploy by using the kubectl command. | DevOps engineer | See Deploy a Deep Learning VM by Using the kubectl Command in VMware Private AI Foundation with NVIDIA.
    If the model is hosted on Hugging Face, you can install the huggingface-cli command-line utility as part of the cloud-init script and use it to download open-weights models hosted on the Hugging Face Hub. Use the --local-dir flag to download the model without symbolic links so that the pais CLI can process the model.
  2. Add the issuer certificate of the Harbor registry to the certificate trust store on the deep learning VM.
    1. Request the CA certificate from the Harbor registry administrator.
    2. Upload the certificate to the virtual machine, for example, by using a secure copy protocol (scp) client on your workstation.
      For example:
      scp infra/my-harbor-issuing-ca.crt [email protected]:
    3. Copy the certificate to the /usr/local/share/ca-certificates directory and add it to the trust store.
      For example:
      sudo cp my-harbor-issuing-ca.crt /usr/local/share/ca-certificates/
      sudo update-ca-certificates
      
    4. For the Docker daemon to pick up the updated trust store, restart the docker service.
      sudo systemctl restart docker
  3. Log in to the Harbor registry by using docker login.
    docker login -u my_harbor_user_name my-harbor-repo-mycompany.com
  4. Pull the model you plan to run inference on.
    pais models pull --modelName baai/bge-small-en-v1.5 --modelStore my-harbor-repo-mycompany.com/dev-models --tag approved
    
  5. Create a model repository for the Triton Inference Server and start sending model inference requests, as in the sketch below.
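    The following is a minimal sketch of this step, assuming that Triton serves its HTTP endpoint on the default port 8000 on the deep learning VM. The repository path, model name, and version directory are illustrative, and the exact layout and configuration depend on the model backend.
      # create the version directory expected by Triton and copy the pulled model files into it
      mkdir -p ./model_repository/bge-small-en-v1.5/1
      # add a config.pbtxt under ./model_repository/bge-small-en-v1.5/ if the backend requires one
      # verify that the server reports readiness before sending inference requests
      curl -s localhost:8000/v2/health/ready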