This is documentation for GenAI on VMware Tanzu Platform for Cloud Foundry. You can download the GenAI on Tanzu Platform tile from Broadcom Support.
Caution: GenAI on VMware Tanzu Platform for Cloud Foundry (GenAI on Tanzu Platform) is currently in beta and is intended for evaluation and test purposes only. Do not use it in a production environment. GenAI on Tanzu Platform was previously called GenAI for Tanzu Application Service.
GenAI on Tanzu Platform allows you to use large language models (LLMs) in your applications. LLMs are a powerful new way to process natural language and are frequently used in chatbots. The LLMs are hosted on a Tanzu Application Service VM, so their lifecycle can be managed through Tanzu Operations Manager.
The following table provides version and version-support information about GenAI on Tanzu Platform.
| Element | Details |
|---|---|
| Version | v0.8 |
| Release date | September 27, 2024 |
| Compatible Tanzu Application Service versions | v2.13 or later |
| Compatible VMware Tanzu Operations Manager versions | v3.0.27 or later |
| IaaS support | vSphere, AWS, Azure, and GCP |
The following are the software, hardware, and other requirements for GenAI on Tanzu Platform:
Supported infrastructures: vSphere, AWS, Azure, and GCP. In addition to the hardware requirements for Tanzu Application Service and Tanzu Operations Manager, compute nodes with attached GPUs are required. For a list of compatible GPUs, see the VMware Compatibility Guide.

- vSphere: Hosts that support GPU passthrough with NVIDIA GPUs.
- AWS: Access to EC2 instances with NVIDIA GPUs. An IAM policy must be set up to allow heavy stemcell creation; for a sample policy, see https://github.com/cloudfoundry/bosh-aws-cpi-release/blob/master/docs/iam-policy.json#L26-L34.
- GCP: Access to instances with NVIDIA GPUs.
- Azure: Access to instances with NVIDIA GPUs.
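For AWS, the extra permissions for heavy stemcell creation are granted through a standard IAM policy document. The fragment below only illustrates the shape of such a policy; the statement ID and action names shown are placeholders, and the authoritative action list is in the sample policy linked above.

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "HeavyStemcellCreation",
      "Effect": "Allow",
      "Action": [
        "ec2:CreateSnapshot",
        "ec2:RegisterImage",
        "ec2:DeregisterImage"
      ],
      "Resource": "*"
    }
  ]
}
```

You can attach a policy like this to the IAM user or role that BOSH uses for the AWS CPI, alongside its existing permissions.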
While the latest NVIDIA A100 chips are difficult to procure, the GenAI on VMware Tanzu Platform for Cloud Foundry tile works well as a proof of concept on Tesla T4 chips, which are widely available.
While modern CPUs are fast, running an LLM on a CPU can take up to two minutes per query, versus seconds on a Tesla T4 chip, or milliseconds on the latest NVIDIA A100s.
VMware would like to partner with your organization to help you get up and running. To sign up for our beta program, see VMware Tanzu AI for Tanzu Application Service.
Cloud Foundry applications communicate with the AI tile through a controller VM, which brokers access to the LLM. The pgvector database (a PostgreSQL extension for vector similarity search) enables natural language processing.
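A Cloud Foundry application discovers a bound service's endpoint and credentials through the VCAP_SERVICES environment variable, which the platform injects as JSON. The sketch below shows one way to read those credentials; the service label "genai" and the credential field names are assumptions for illustration only, so run `cf env <app>` to see the actual structure the tile binds.

```python
import json
import os


def llm_credentials(service_label="genai"):
    """Return the credentials of the first bound service under `service_label`.

    Cloud Foundry injects bound-service credentials into the
    VCAP_SERVICES environment variable as a JSON object keyed by
    service label. The label "genai" is an assumed example; the
    credential keys (for example an API endpoint and key) depend
    on the tile version.
    """
    vcap = json.loads(os.environ["VCAP_SERVICES"])
    return vcap[service_label][0]["credentials"]
```

An application would then point its LLM client, or plain HTTP calls, at the endpoint found in the returned credentials dictionary.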
Running LLMs on virtual machines on vSphere with NVIDIA AI Enterprise, using NVIDIA vGPUs and NVIDIA AI software, delivers from 94% to 105% of bare-metal performance, measured as queries served per second in MLPerf Inference v3.0 benchmarks.
This tile adds material functionality to the NVIDIA SDK and exercises the distribution requirements described in the EULA. If NVIDIA GPUs are used, NVIDIA drivers and CUDA must be licensed separately.