This page explains how to install GenAI on VMware Tanzu Platform for Cloud Foundry with an existing Tanzu Application Service deployment on VMware vSphere, so that apps hosted on Tanzu Application Service can use AI models running on NVIDIA vGPUs.
Caution: GenAI on Tanzu Platform is currently in beta and is intended for evaluation and test purposes only. Do not use in a production environment. This document applies to GenAI on Tanzu Platform v0.0.118 and may not apply to later versions.
A Tanzu Application Service environment with:
There are three options for the vGPU driver that's installed on your ESXi host, which GenAI on Tanzu Platform includes when it creates vGPU-enabled VMs. Each option has pros and cons.
For example, an NVAIE host driver bundle is named NVD-AIE-800_550.54.16-1OEM.800.1.0.20613240_23471877.zip, and the matching Linux guest driver is nvidia-linux-grid-550_550.54.15_amd64.deb.
To install an NVIDIA NVAIE or NVIDIA vGPU driver on your ESXi host:
Validate NVIDIA support for your ESXi version, vCenter version, and vGPU card by checking the support matrix for your vGPU driver:
Download the appropriate version of the NVAIE or NVIDIA vGPU software from the NVIDIA Licensing Portal.
Extract the downloaded zip file and find the guest driver.
Upload the guest driver to a location that GenAI on Tanzu Platform can reach when it creates VMs. This can be either a web server running on your premises or a static-file app hosted on Tanzu Application Service itself.
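One lightweight option, shown here as a minimal sketch (the port and the placeholder file are hypothetical; in a real setup the directory would hold the downloaded .deb), is to serve the driver directory with Python's built-in HTTP server and confirm the URL is reachable:

```shell
# Sketch: serve the guest driver over HTTP so GenAI-created VMs can fetch it.
mkdir -p drivers
: > drivers/nvidia-linux-grid-550_550.54.15_amd64.deb   # placeholder so the sketch runs standalone
cd drivers
python3 -m http.server 8765 >/dev/null 2>&1 &
SERVER_PID=$!
cd ..
sleep 1
# Confirm the driver URL that will go into the tile is reachable:
STATUS=$(curl -s -o /dev/null -w '%{http_code}' http://127.0.0.1:8765/nvidia-linux-grid-550_550.54.15_amd64.deb)
echo "HTTP $STATUS"
kill "$SERVER_PID"
```

Any web server works; the only requirement is that the BOSH-deployed VMs can reach the URL at creation time.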
Install the NVIDIA GPU vSphere Installation Bundle (VIB) on your ESXi Host, as described in Installing and configuring the NVIDIA VIB on ESXi in the Broadcom Support knowledge base.
Note: It is critical that the guest driver and the ESXi host driver (VIB) come from the same NVIDIA software release.
Follow the NVIDIA License System Quick Start Guide to set up an NVIDIA license server, either in the cloud or on-premises:
Download the GenAI on Tanzu Platform tile from Broadcom Support. Upload the tile to your Tanzu Operations Manager and add it for installation, then follow the steps below for each tile configuration tab:
Configure the tile to use the AZ where your GPUs are located. This can be an existing AZ within your deployment or a new one added with ESXi hosts that have GPUs installed. VMs deployed by the GenAI tile are new workloads and have no correlation to Diego or isolation segments.
Most of the GenAI on Tanzu Platform tile configuration is in this tab. See the following section for how to populate each field.
The default model populated in the GenAI on Tanzu Platform tile is “lmsys/vicuna-7b-v1.5”. This is simply a string reference to a model hosted on Hugging Face; the model itself does not ship with the tile because of its size. You can set this field to any FastChat-compatible model on Hugging Face. Ensure that whatever model you select runs on your hardware. For more information, see Model Support in the FastChat repository.
This field controls which model names are advertised by the OpenAI-compatible API server hosted on the FastChat controller. The default configuration is “text-embedding-ada-002,gpt-3.5-turbo,text-davinci-003,vicuna-7b-v1.5”, which works if you keep vicuna as your model. If you swap models, you may need to update this field to match their interfaces. Valid model names for this field are configured by the platform operator.
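The field is a plain comma-separated list, and a client request succeeds only if it names one of the advertised entries. A small illustrative check using the default list (the requested name here is just an example):

```shell
# Illustrative: the tile field is a comma-separated list of advertised names;
# a client's requested model must appear in it exactly.
ADVERTISED="text-embedding-ada-002,gpt-3.5-turbo,text-davinci-003,vicuna-7b-v1.5"
REQUESTED="vicuna-7b-v1.5"
case ",$ADVERTISED," in
  *",$REQUESTED,"*) MATCH=yes ;;
  *)                MATCH=no  ;;
esac
echo "advertised: $MATCH"
```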
You can add additional models that spin up on additional VMs under the “worker additional model” line. The following use case adds the codellama model to run code completion workloads as well. Valid model names for this field are configured by the platform operator.
This is a drop-down menu for selecting the type of vGPU profile used within the tile. This example uses an NVIDIA Tesla T4 with the NVIDIA NVAIE vGPU drivers, so we select the “grid_t4-16c” vGPU profile, the compute-optimized profile. (In NVIDIA profile names, t4 identifies the card, 16 is the framebuffer size in GB, and the c suffix denotes a compute profile.)
This checkbox enables quantization of the model, which can help fit models onto smaller graphics cards such as the Tesla T4. Quantization somewhat reduces the quality and performance of the model, so evaluate whether it is needed for your model and card combination. This checkbox is selected by default.
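To see why quantization matters on a 16 GB card, here is a back-of-envelope estimate of the VRAM needed just for the weights of a 7B-parameter model (illustrative only; it ignores the KV cache and activation memory):

```shell
# Weights-only VRAM estimate: parameter count x bytes per parameter.
# fp16 stores 2 bytes/param; 8-bit quantization stores 1 byte/param.
FP16=$(awk 'BEGIN { printf "%.1f", 7e9 * 2 / 1073741824 }')
INT8=$(awk 'BEGIN { printf "%.1f", 7e9 * 1 / 1073741824 }')
echo "7B weights: fp16 ~${FP16} GiB, int8 ~${INT8} GiB (a T4 has 16 GiB)"
```

With fp16 weights alone near 13 GiB, the KV cache and activations can push a 7B model past a T4's 16 GiB, which is why this checkbox is selected by default.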
Check this box if you plan to use NVIDIA vGPU software (either NVIDIA vGPU or NVIDIA NVAIE vGPU).
Use the following fields to provide the information required to install the NVIDIA vGPU drivers and to ensure that BOSH-deployed VMs can communicate with the appropriate NVIDIA license server.
Important: Fill in these fields only if you are using NVIDIA vGPU.
This is the address of the NVIDIA license server. In this example, we are using an NVIDIA NLS appliance running on-premises rather than the NVIDIA CLS cloud service.
Get this token from your NVIDIA license server. It is required to register BOSH-deployed VMs and activate the NVIDIA GPU drivers.
In this example, the guest driver is hosted on a TrueNAS file server using the WebDAV protocol. The example string is “http://webdav:[email protected]:8080/webserver/nvidia-linux-grid-550_550.54.15_amd64.deb”. You can host this file on any web server running on your premises. If you do not have a file server to use, you can even host the file on Tanzu Application Service itself by using the Staticfile buildpack.
For security, provide the SHA-256 hash value of the NVIDIA driver file. For example, on macOS, run the following command and paste in the output:
$ shasum -a 256 ./nvidia-linux-grid-550_550.54.15_amd64.deb
cad102a736f6b0b62b7e75d874afb12d84a3cc35dc6f653269439b3033b358e6
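On Linux, the equivalent command is sha256sum. A self-contained sanity check of the verify-before-paste step (the placeholder file and its known hash are illustrative; substitute your driver .deb and the hash you entered in the tile):

```shell
# Verify a file's SHA-256 against an expected value before pasting it into
# the tile. A placeholder file with a known hash keeps the sketch runnable.
printf 'hello' > sample.bin
EXPECTED=2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824
ACTUAL=$(sha256sum sample.bin | awk '{print $1}')   # macOS: shasum -a 256 sample.bin
if [ "$ACTUAL" = "$EXPECTED" ]; then RESULT=ok; else RESULT=mismatch; fi
echo "$RESULT"
```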
Confirm the AZ where your GPUs are located.
Some of these models can be quite large, so assign the appropriate storage size so your LLM workers have enough space to host the models they will run.
These fields are unlikely to apply to an installation on vSphere. If you do have custom VM extensions, you can add those here.