This is documentation for GenAI on VMware Tanzu Platform for Cloud Foundry. You can download the GenAI on Tanzu Platform tile from Broadcom Support.
Caution: GenAI on VMware Tanzu Platform for Cloud Foundry (GenAI on Tanzu Platform) is currently in beta and is intended for evaluation and test purposes only. Do not use it in a production environment. GenAI on Tanzu Platform was previously called GenAI for Tanzu Application Service.
GenAI on Tanzu Platform allows you to use large language models (LLMs) in your applications. LLMs are a powerful new way to process natural language and are frequently used in chatbots. The LLMs are hosted on a Tanzu Application Service VM, so their lifecycle can be managed through Tanzu Operations Manager.
The following table provides version and version-support information about GenAI on Tanzu Platform.
| Element | Details |
|---|---|
| Version | v0.8 |
| Release date | September 27, 2024 |
| Compatible Tanzu Application Service versions | v2.13 or later |
| Compatible VMware Tanzu Operations Manager versions | v3.0.27 or later |
| IaaS support | vSphere, AWS, Azure, and GCP |
The following are the software, hardware, and other requirements for GenAI on Tanzu Platform:
Supported infrastructures: vSphere, AWS, Azure, and GCP. In addition to the hardware requirements for Tanzu Application Service and Tanzu Operations Manager, compute nodes with attached GPUs are required. For a list of compatible GPUs, see the VMware Compatibility Guide.

- vSphere: Hosts that support GPU passthrough with NVIDIA GPUs.
- AWS: Access to EC2 instances with NVIDIA GPUs. An IAM policy must be set up to allow heavy stemcell creation; for a sample policy, see https://github.com/cloudfoundry/bosh-aws-cpi-release/blob/master/docs/iam-policy.json#L26-L34.
- GCP: Access to instances with NVIDIA GPUs.
- Azure: Access to instances with NVIDIA GPUs.
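For AWS, the extra permissions for heavy stemcell creation are granted through a standard IAM policy document. The fragment below only illustrates the shape of such a policy; the statement ID and action names shown are placeholders, and the authoritative action list is in the sample policy linked above.

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "HeavyStemcellCreation",
      "Effect": "Allow",
      "Action": [
        "ec2:CreateSnapshot",
        "ec2:RegisterImage",
        "ec2:DeregisterImage"
      ],
      "Resource": "*"
    }
  ]
}
```

You can attach a policy like this to the IAM user or role that BOSH uses for the AWS CPI, alongside its existing permissions.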
While the latest NVIDIA A100 chips are difficult to procure, the GenAI on VMware Tanzu Platform for Cloud Foundry tile works well as a proof of concept on Tesla T4 chips, which are widely available.
While modern CPUs are fast, running an LLM on a CPU can take up to two minutes per query, versus seconds on a Tesla T4 chip, or milliseconds on the latest NVIDIA A100s.
VMware would like to partner with your organization to help you get up and running. To sign up for our beta program, see VMware Tanzu AI for Tanzu Application Service.
Cloud Foundry applications communicate with the AI tile through a controller VM, which brokers access to the LLM. The pgvector database (a PostgreSQL extension for vector similarity search) enables natural language processing.
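A Cloud Foundry application discovers a bound service's endpoint and credentials through the VCAP_SERVICES environment variable, which the platform injects as JSON. The sketch below shows one way to read those credentials; the service label "genai" and the credential field names are assumptions for illustration only, so run `cf env <app>` to see the actual structure the tile binds.

```python
import json
import os


def llm_credentials(service_label="genai"):
    """Return the credentials of the first bound service under `service_label`.

    Cloud Foundry injects bound-service credentials into the
    VCAP_SERVICES environment variable as a JSON object keyed by
    service label. The label "genai" is an assumed example; the
    credential keys (for example an API endpoint and key) depend
    on the tile version.
    """
    vcap = json.loads(os.environ["VCAP_SERVICES"])
    return vcap[service_label][0]["credentials"]
```

An application would then point its LLM client, or plain HTTP calls, at the endpoint found in the returned credentials dictionary.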
Running LLMs on virtual machines on vSphere with NVIDIA AI Enterprise, using NVIDIA vGPUs and NVIDIA AI software, delivers from 94% to 105% of bare-metal performance, measured as queries served per second in MLPerf Inference v3.0 benchmarks.
This tile adds material functionality to the NVIDIA SDK and exercises the distribution requirements described in the EULA. If NVIDIA GPUs are used, NVIDIA drivers and CUDA must be licensed separately.