This topic describes the changes in GenAI on VMware Tanzu Platform for Cloud Foundry.
Note
GenAI on VMware Tanzu Platform for Cloud Foundry (GenAI on Tanzu Platform for short) was initially called GenAI for Tanzu Application Service.
v0.6.0
Release Date: August 13, 2024
Beta Release
Sixth beta release of GenAI on Tanzu Platform.
Caution
GenAI on Tanzu Platform is currently in beta and is intended for evaluation and test purposes only. Do not use in a production environment.
Features
- Introduces the concept of Model Capabilities.
- Operators now have the option to select from a range of model capabilities when configuring models.
- Selected capabilities are surfaced in the CF marketplace through plan descriptions, as well as in the binding credentials block.
- Their purpose is to aid discoverability and to help application libraries such as java-cfenv auto-configure connections to models.
- Supported capabilities are `chat`, `embedding`, `image_generation`, `vision`, `audio_transcription`, `audio_speech`, `code`, and `tools`.
- Adds initial support for logging and metrics.
- Adds support for floating Stemcell versions.
- This allows the tile to pick the most recent `ubuntu-jammy` stemcell available in Tanzu Operations Manager.
- This reduces friction during the installation process as operators no longer have to upload a specific Stemcell version.
- Improves the robustness of the service by using BOSH DNS for worker VM host names.
- Previously, worker VMs were addressed and routed to by using individual BOSH DNS queries that incorporated the instance number, for example, `q-s4-i${INDEX}...`.
- Now, a more general BOSH DNS query is used, `q-s4...`, which returns the IPs of all instances in the instance group rather than the IP of a specific instance.
- Load balancing is handled through BOSH DNS rather than by individual entries in the proxy's `config.yml`.
- (vSphere) Adds the ability to configure `vmx_opts` for vGPUs.
- New docs:
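On the consuming side, the capability metadata in the binding credentials can be used to select a suitable model at application startup. The sketch below is illustrative only: the exact credential field names used here (`model_capabilities` and `model_name`) and the payload shape are assumptions, so check them against your actual binding credentials block.

```python
import json

def find_models_with_capability(vcap_services_json: str, capability: str):
    """Return binding credentials for services that advertise a capability.

    Field names ('model_capabilities', 'model_name') are assumptions for
    illustration; verify against your actual binding credentials block.
    """
    services = json.loads(vcap_services_json)
    matches = []
    for bindings in services.values():
        for binding in bindings:
            creds = binding.get("credentials", {})
            if capability in creds.get("model_capabilities", []):
                matches.append(creds)
    return matches

# Example VCAP_SERVICES payload (shape is illustrative only).
example = json.dumps({
    "genai": [
        {"credentials": {"model_name": "llama2",
                         "model_capabilities": ["chat", "tools"]}},
        {"credentials": {"model_name": "nomic-embed",
                         "model_capabilities": ["embedding"]}},
    ]
})

chat_models = find_models_with_capability(example, "chat")
```

A library such as java-cfenv would do the equivalent lookup automatically when auto-configuring a connection.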
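The group-wide BOSH DNS behavior described above means a single lookup returns one address per worker instance. As a rough illustration of how a client could consume such a name, the snippet below resolves a host to its full set of IPv4 addresses and picks one. The BOSH DNS query string in the comment is illustrative, and `localhost` is used as a stand-in so the example runs outside a BOSH deployment.

```python
import random
import socket

def resolve_all_ips(hostname: str):
    """Resolve a hostname to the full set of IPv4 addresses behind it.

    With a group-wide BOSH DNS query (e.g. a 'q-s4...' style name), a single
    lookup like this returns one IP per instance in the instance group.
    """
    infos = socket.getaddrinfo(hostname, None,
                               family=socket.AF_INET,
                               type=socket.SOCK_STREAM)
    return sorted({info[4][0] for info in infos})

def pick_backend(hostname: str) -> str:
    """Naive client-side balancing: pick a random IP from the resolved set."""
    return random.choice(resolve_all_ips(hostname))

# 'localhost' stands in for a BOSH DNS group query here.
ips = resolve_all_ips("localhost")
```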
Notable component version updates
- Bump Ollama from `0.3.1` to `0.3.4`
- Bump genai-boshrelease from `0.6.6` to `0.7.0`
Bug fixes
- The improvements to the use of BOSH DNS (described in the features above) fix a known bug where changing the AZ for a deployed model led to deployment errors.
Known Limitations
- No known limitations at time of release.
CVEs
- No known exploitable CVEs at time of release.
v0.5.0
Release Date: August 01, 2024
Beta Release
Fifth beta release of GenAI on Tanzu Platform.
Caution
GenAI on Tanzu Platform is currently in beta and is intended for evaluation and test purposes only. Do not use in a production environment.
Features
- Vastly improved the UX around service offerings and plans in the CF Marketplace.
- Previously, GenAI on Tanzu Platform only supported a single offering (`genai`) with a single plan (`shared`).
- This was limiting in a number of ways:
- It made it hard for developers to discover which models were offered as part of the plan.
- It made it hard for operators to understand which models were actually being used.
- It meant there was an “all-or-nothing” approach to model access.
- Now, the UX has moved away from the single `shared` plan and towards a “one-plan-per-model” approach.
- A separate plan is created for each configured model.
- The plan name is equal to the model name, which enables better discoverability of the available models.
- Operators can use `cf enable-service-access` and `cf disable-service-access` for more fine-grained control over which users (orgs) have access to which models.
- Building on top of the new UX, the model name is now returned as part of the binding credentials (under the `model_name` key).
- This makes it easier to programmatically consume and connect to models through the binding.
- Introduced the concept of Model Aliases.
- Previously, GenAI on Tanzu Platform required operators to provide a “Public name” and a “Model name” when configuring models.
- This was confusing and also restrictive (some providers required the Public name to equal the Model name).
- Model Aliases improve this situation.
- Now, operators must always provide the model name and can optionally add one or more aliases for the model.
- The proxy correctly routes requests using either the model name itself or any of its known aliases.
- Added support for pulling models from a configured HTTP endpoint.
- Previously, if using the vLLM model provider, models were pulled down from HuggingFace.
- This meant that worker VMs had to have outbound Internet access during installation.
- This is no longer the case: you can now provide a “Model URL” that points to an internally accessible location, and the model is pulled down from there instead.
- The same applies for the Ollama model provider.
- Extended the Model Config form to group provider-specific configuration by model provider type.
- Added GPU Utilization and Quantization configuration fields for the vLLM model provider.
- Improved integration with VMware Private AI.
- There is now a separate configuration form specific to VMware Private AI.
- VMware Private AI must expose an OpenAI API compatible endpoint at this time.
- Added the ability to configure PostgreSQL connection.
- Previously, GenAI on Tanzu Platform relied on an internally deployed PostgreSQL instance.
- Now, operators can configure their own PostgreSQL connection.
- Improved the Release Tests errand.
- It now first checks the health of all configured models before attempting to consume (bind to) only the first configured model.
- Removed ipex-worker.
- The Ollama model provider is now the recommended way for running models on CPUs.
- New docs:
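The alias behavior described above amounts to a lookup table that maps every model name and alias back to one canonical model. A minimal sketch of such routing, using made-up model names, might look like:

```python
def build_routing_table(models):
    """Map every model name and alias to its canonical model name.

    Mirrors the routing behavior described above: a request may name either
    the model itself or any of its known aliases. Model data is illustrative.
    """
    table = {}
    for model in models:
        table[model["name"]] = model["name"]
        for alias in model.get("aliases", []):
            table[alias] = model["name"]
    return table

# Hypothetical operator configuration: one model name plus optional aliases.
models = [
    {"name": "llama2:7b", "aliases": ["llama2", "default-chat"]},
    {"name": "nomic-embed-text", "aliases": ["default-embed"]},
]
routes = build_routing_table(models)

def resolve_model(requested: str) -> str:
    """Resolve a requested model name or alias to the canonical model name."""
    try:
        return routes[requested]
    except KeyError:
        raise ValueError(f"unknown model or alias: {requested}")
```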
Notable component version updates
- Bump Ollama from `0.2.6` to `0.3.1`
- Bump vLLM from `0.5.2` to `0.5.3.post1`
- Bump stemcell from `1.222` to `1.506`
- Bump genai-boshrelease from `0.5.2` to `0.6.6`
Bug fixes
- Fixed a bug in the vllm-worker post-start script that caused the post-start not to fail upon timeout.
Known Limitations
- There is no upgrade path between v0.4.0 and v0.5.0. You must uninstall older versions of the tile before installing v0.5.0.
CVEs
- No known exploitable CVEs at time of release.
v0.4.0
Release Date: July 23, 2024
Beta Release
Fourth beta release of GenAI on Tanzu Platform.
Caution
GenAI on Tanzu Platform is currently in beta and is intended for evaluation and test purposes only. Do not use in a production environment.
Features
- Added the ability to integrate with any OpenAI API compatible endpoint through the new “Integrations” configuration form.
- Added support for deploying models with Ollama.
- Updated the service offering and plan name:
- Previously: `genai-shared/shared-ai-plan`
- Now: `genai/shared`
- Note that the `shared` plan will be deprecated in the next release, where we intend to move to a “one plan per model” design.
- Model names are now included in the plan description in the CF marketplace.
- Updated the name of the “On-Prem Model Config” configuration form to simply “Model Config”.
- Removed default model from the collection.
- This, along with the new “Integrations” form, allows for installations of the tile that do not actually deploy any models at all and instead simply integrate with existing OpenAI API compatible endpoints.
- New docs:
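To illustrate what “OpenAI API compatible” means for the new Integrations form, the sketch below builds a standard chat-completions request against such an endpoint. The base URL, API key, and model name are placeholders you would take from your own integration or binding; the URL path and payload follow the common OpenAI chat-completions convention.

```python
import json
import urllib.request

def build_chat_request(base_url: str, api_key: str, model: str, prompt: str):
    """Construct a chat-completions request for an OpenAI API compatible endpoint.

    The '/v1/chat/completions' path and the payload shape follow the OpenAI
    convention; base_url, api_key, and model are placeholders here.
    """
    url = base_url.rstrip("/") + "/v1/chat/completions"
    payload = {"model": model,
               "messages": [{"role": "user", "content": prompt}]}
    headers = {"Authorization": f"Bearer {api_key}",
               "Content-Type": "application/json"}
    return urllib.request.Request(url, data=json.dumps(payload).encode(),
                                  headers=headers, method="POST")

# Placeholder endpoint and credentials for illustration only.
req = build_chat_request("https://genai.example.com", "test-key",
                         "llama2", "Hello")
```

Sending the request (for example with `urllib.request.urlopen(req)`) would return an OpenAI-style JSON response from any compatible endpoint.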
Known Limitations
- There is no upgrade path between v0.3.0 and v0.4.0. You must uninstall older versions of the tile before installing v0.4.0.
- Worker VMs must have Internet access in order to be able to pull down the models.
- This can be configured by clicking on the Infrastructure Config tab, then selecting the Allow outbound Internet access from service instances checkbox.
- Access to models is currently shared across all service instances. It is not currently possible to offer a subset of models to a subset of orgs and spaces.
- No ability to configure connection to PostgreSQL for the proxy.
CVEs
- No known exploitable CVEs at time of release.
v0.3.0
Release Date: July 11, 2024
Beta Release
Third beta release of GenAI on Tanzu Platform.
Caution
GenAI on Tanzu Platform is currently in beta and is intended for evaluation and test purposes only. Do not use in a production environment.
Features
- Implemented a new and improved AI API proxy layer built on LiteLLM, replacing the previous FastChat-based topology.
- Improved the design and usability of the Tile configuration forms.
- Decluttered the main “LLM Worker” form by splitting out groups of related configuration into new and separate tabs.
- Added the ability to configure your own custom VM Types.
- Users are no longer limited to only using the small range of VM Types / GPUs that are included with the Tile.
Resolved Issues
- No longer logging HuggingFace access tokens in plain text.
Known Limitations
- There is no upgrade path between v0.2.0 and v0.3.0. You must uninstall older versions of the tile before installing v0.3.0.
- Worker VMs must have Internet access in order to be able to pull down the models.
- This can be configured by clicking on the Infrastructure Config tab, then selecting the Allow outbound Internet access from service instances checkbox.
- Access to models is currently shared across all service instances. It is not currently possible to offer a subset of models to a subset of orgs and spaces.
- No ability to configure connection to PostgreSQL for the proxy.
CVEs
- No known exploitable CVEs at time of release.
v0.2.0
Release Date: June 10, 2024
Beta Release
Second beta release of GenAI on Tanzu Platform.
Caution
GenAI on Tanzu Platform is currently in beta and is intended for evaluation and test purposes only. Do not use in a production environment.
Features
- Updated the name of the product from “GenAI for Tanzu Application Service” to “GenAI on Tanzu Platform for Cloud Foundry” (subsequently renamed “GenAI on Tanzu Platform”).
- Added ability to specify a HuggingFace access token, which allows you to pull private models or models requiring licenses, for example, Llama2.
- Added additional GPU types to the drop-down menu, including A100.
- Reduced the size of the BOSH release, allowing for quicker download and upload times.
Resolved Issues
- Fixed an issue in which API keys in bindings were lost upon resource changes to the controller VM, for example, stemcell upgrade.
Known Limitations
- Worker VMs must have access to HuggingFace in order to be able to pull down the models.
- If specified, the HuggingFace access token is not redacted in the BOSH logs (appears in plain text).
- Access to models is currently shared across all service instances. It is not currently possible to offer a subset of models to a subset of orgs and spaces.
CVEs
- No known exploitable CVEs at time of release.
v0.1
Release Date: February 5, 2024
Beta Release
Initial beta release of GenAI on Tanzu Platform v0.1.
Features
- Platform operators can pick and choose from a range of LLM models to offer through integration with HuggingFace.
- Application developers can access an OpenAI-compatible API server to interact with the models on offer.
- Highly performant services can be achieved through support of a wide range of hardware and software configurations, including support for:
- A variety of modern GPUs (such as T4 and V100).
- vGPU (vSphere only).
- vLLM (dependent on model).
- Embeddings.
- In addition, each model can be scaled independently (vertically and/or horizontally), allowing platform operators to fine-tune hardware resources according to demand.
- Seamless integration with Tanzu Application Service for both platform operators and application developers.
- Platform operators can configure and manage the service through Tanzu Operations Manager.
- A Gen AI Service Broker allows application developers to create GenAI service instances and bind them to their apps by using the same tooling and workflows as any other services on the platform.
Known Limitations
- Worker VMs must have access to HuggingFace in order to be able to pull down the models.
- Access to models is currently shared across all service instances. It is not currently possible to offer a subset of models to a subset of orgs and spaces.
CVEs
- No known exploitable CVEs at time of release.
Caution
GenAI on Tanzu Platform is currently in beta and is intended for evaluation and test purposes only. Do not use in a production environment.