If your cloud administrator has set up Private AI Automation Services in VMware Aria Automation, you can request AI workloads using the Automation Service Broker catalog.

Private AI Automation Services support two catalog items in Automation Service Broker that users with the respective permissions can access and request.

  • AI Workstation – a GPU-enabled virtual machine that can be configured with the desired vCPU, vGPU, and memory resources, and with AI/ML software from NVIDIA.
  • AI Kubernetes Cluster – a GPU-enabled Tanzu Kubernetes cluster that can be configured with an NVIDIA GPU operator.
Important: The Private AI Automation Services offering is available for VMware Aria Automation 8.16.2.

Before you begin

  • Verify that Private AI Automation Services are configured for your project and that you have permissions to request AI catalog items.

Remember that the values used here are samples. The values in your account depend on your environment.

Deploy a deep learning virtual machine to a VI workload domain

As a data scientist, you can deploy a single GPU software-defined development environment from the self-service Automation Service Broker catalog. You can customize the GPU-enabled virtual machine with machine parameters that meet your model development requirements, specify the AI/ML software configuration that meets your training and inference requirements, and pull AI/ML packages from the NVIDIA NGC registry by using a portal access key.

Procedure

  1. Click the Consume tab in Automation Service Broker.
  2. Click Catalog.
    The catalog items that appear are based on the project you selected. If you didn't select a project, all catalog items that are available to you appear in the catalog.
  3. Locate the AI workstation card and click Request.
  4. Select a project.
  5. Enter a name and description for your deployment.
  6. Configure the AI workstation parameters.
    Setting         Sample value
    VM class        A100 Small - 1 vGPU (16 GB), 8 CPUs and 16 GB Memory
    Data disk size  8 GB
    User password   Enter a password for the default user. You might be prompted to reset your password when you first log in.
    SSH public key  This setting is optional.
  7. Select a software bundle to install on your workstation. A sample container check follows this procedure.
    Software bundle Description
    PyTorch The PyTorch NGC Container is optimized for GPU acceleration, and contains a validated set of libraries that enable and optimize GPU performance. This container also contains software for accelerating ETL (DALI, RAPIDS), training (cuDNN, NCCL), and inference (TensorRT) workloads.
    TensorFlow The TensorFlow NGC Container is optimized for GPU acceleration, and contains a validated set of libraries that enable and optimize GPU performance. This container might also contain modifications to the TensorFlow source code to maximize performance and compatibility. It also contains software for accelerating ETL (DALI, RAPIDS), training (cuDNN, NCCL), and inference (TensorRT) workloads.
    CUDA Samples This is a collection of containers for running CUDA workloads on the GPUs. The collection includes containerized CUDA samples, for example vectorAdd (which demonstrates vector addition) and nbody (a gravitational n-body simulation). You can use these containers to validate the software configuration of the GPUs in the system or simply to run example workloads.
    DCGM Exporter NVIDIA Data Center GPU Manager (DCGM) is a suite of tools for managing and monitoring NVIDIA data center GPUs in cluster environments. A monitoring stack usually consists of a collector, a time-series database to store metrics, and a visualization layer. DCGM Exporter is an exporter for Prometheus that monitors the health of GPUs and collects metrics from them.
    Triton Inference Server Triton Inference Server provides a cloud and edge inferencing solution optimized for both CPUs and GPUs. Triton supports HTTP/REST and GRPC protocols that allow remote clients to request inferencing for any model managed by the server. For edge deployments, Triton is available as a shared library with a C API that allows the full functionality of Triton to be included directly in an application.
    Generative AI Workflow - RAG This reference solution demonstrates how to find business value in generative AI by augmenting an existing foundational LLM to fit your business use case. This is done by using retrieval-augmented generation (RAG), which retrieves facts from an enterprise knowledge base that contains your company's business data. A reference solution for a RAG-based AI chatbot, including code for developers, is available in NVIDIA's Generative AI Examples GitHub repository. Pay special attention to the ways in which you can augment an LLM with your domain-specific business data to create AI applications that are agile and responsive to new developments.
  8. Enter a custom cloud-init configuration that you want to apply in addition to the cloud-init defined for the software bundle. A sample custom cloud-init follows this procedure.
    VMware Aria Automation merges the cloud-init from the software bundle and the custom cloud-init.
  9. Provide your NVIDIA NGC portal access key.
  10. Click Submit.
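
The software bundles in step 7 are delivered as NGC containers. The following is a minimal sketch of how you might confirm that the deployed workstation can see the vGPU from the PyTorch bundle. The container tag is an assumption; replace it with a tag that is available in your NGC registry.

    # Run nvidia-smi inside the PyTorch NGC container to confirm that the vGPU is visible.
    # The tag 24.03-py3 is illustrative; use a tag that exists in your registry.
    docker run --rm --gpus all nvcr.io/nvidia/pytorch:24.03-py3 nvidia-smi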
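
The following is a minimal sketch of the kind of custom cloud-init you might enter in step 8. The packages and paths are illustrative assumptions, not part of any software bundle.

    #cloud-config
    # Hypothetical custom cloud-init. VMware Aria Automation merges it with the
    # cloud-init defined for the selected software bundle.
    packages:
      - git
      - jq
    write_files:
      - path: /opt/dl/notebooks/README.md
        content: |
          Place project notebooks in this directory.
    runcmd:
      - mkdir -p /opt/dl/datasets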

Deploy an AI-enabled Tanzu Kubernetes cluster

As a DevOps engineer, you can request a GPU-enabled Tanzu Kubernetes cluster, where worker nodes can run AI/ML workloads.

The TKG cluster contains an NVIDIA GPU operator, which is a Kubernetes operator that is responsible for setting up the correct NVIDIA driver for the NVIDIA GPU hardware on the TKG cluster nodes. The deployed cluster is ready to use for AI/ML workloads without requiring additional GPU-related setup.

Procedure

  1. Locate the AI Kubernetes Cluster card and click Request.
  2. Select a project.
  3. Enter a name and description for your deployment.
  4. Select the number of control plane nodes.
    Setting     Sample value
    Node count  1
    VM class    cpu-only-medium - 8 CPUs and 16 GB Memory

    The class selection defines the resources available within the virtual machine.

  5. Select the number of worker nodes.
    Setting     Sample value
    Node count  3
    VM class    a100-medium - 4 vGPU (64 GB), 16 CPUs and 32 GB Memory
  6. Click Submit.

Results

The deployment contains a supervisor namespace, a TKG cluster with three worker nodes, multiple resources inside the TKG cluster, and a Carvel application that deploys the GPU operator application.
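
The following is a minimal sketch of how you might verify the GPU operator after you connect to the TKG cluster with kubectl. The gpu-operator namespace name is an assumption and can differ in your environment.

    # List the pods deployed by the GPU operator application
    # (the gpu-operator namespace is an assumption; adjust it for your environment).
    kubectl get pods -n gpu-operator

    # Confirm that the worker nodes advertise allocatable nvidia.com/gpu resources.
    kubectl describe nodes | grep nvidia.com/gpu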

Monitor your Private AI deployments

You use the Deployments page to manage your deployments and their associated resources. From this page, you can make changes to deployments, troubleshoot failed deployments, update resources, and destroy unused deployments.

To manage your deployments, select Consume > Deployments > Deployments.

For more information, see How do I manage my Automation Service Broker deployments.