You can deploy artificial intelligence and machine learning workloads on clusters provisioned by Tanzu Kubernetes Grid. Deploying these workloads requires initial setup by service providers, and configuration by organization administrators and tenant users during the cluster creation workflow.

To prepare the VMware Cloud Director environment to provision clusters that can handle artificial intelligence and machine learning workloads, service providers must create a vGPU policy and add that policy to an organization VDC. After service providers complete these steps, tenant users can deploy artificial intelligence and machine learning workloads to their Tanzu Kubernetes Grid clusters. To create Tanzu Kubernetes Grid clusters with vGPU functionality, see Create a VMware Tanzu Kubernetes Grid Cluster.
Note: vGPU support is limited to Tanzu Kubernetes Grid 1.5.
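
As an illustration of what a tenant workload on a vGPU-enabled cluster might look like, the following Python sketch builds a minimal Kubernetes Pod manifest that requests one GPU. It assumes that an NVIDIA device plugin (or the NVIDIA GPU Operator) is running on the cluster and advertises the `nvidia.com/gpu` resource, which this document does not state. The function name `gpu_pod_manifest`, the pod name, and the container image are hypothetical placeholders.

```python
import json


def gpu_pod_manifest(name: str, image: str, gpus: int = 1) -> dict:
    """Build a minimal Pod manifest whose single container requests GPUs.

    Assumption: the cluster exposes GPUs under the extended resource
    name "nvidia.com/gpu" (for example, through the NVIDIA device plugin).
    """
    if gpus < 1:
        raise ValueError("gpus must be at least 1")
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "restartPolicy": "Never",
            "containers": [
                {
                    "name": name,
                    "image": image,
                    # Extended resources such as GPUs are requested under limits.
                    "resources": {"limits": {"nvidia.com/gpu": gpus}},
                }
            ],
        },
    }


if __name__ == "__main__":
    # Hypothetical pod name and image; replace with your own workload.
    manifest = gpu_pod_manifest("gpu-smoke-test", "example.registry/cuda-sample:latest")
    print(json.dumps(manifest, indent=2))
```

Because kubectl accepts JSON as well as YAML, the printed manifest could be piped to `kubectl apply -f -` against the target cluster.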

BIOS Firmware Limitations

VMware Cloud Director Container Service Extension Tanzu Kubernetes Grid templates are built with BIOS firmware, and this firmware configuration cannot be changed. With BIOS firmware, BAR1 memory cannot exceed 256 MB, so NVIDIA GRID cards with more than 256 MB of BAR1 memory require EFI firmware. For more information on firmware limitations, refer to VMware vSphere: NVIDIA Virtual GPU Software Documentation.
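
The 256 MB rule above can be expressed as a simple check. The following Python sketch is only an illustration: the function name `required_firmware` is hypothetical, and the BAR1 sizes in the example are arbitrary values, not the specifications of any real card. Obtain the actual BAR1 size from the NVIDIA documentation for your card.

```python
BIOS_BAR1_LIMIT_MB = 256  # BIOS firmware limit stated in this section


def required_firmware(bar1_mb: int) -> str:
    """Return "bios" if the card fits the BIOS BAR1 limit, otherwise "efi"."""
    if bar1_mb <= 0:
        raise ValueError("bar1_mb must be a positive number of megabytes")
    return "efi" if bar1_mb > BIOS_BAR1_LIMIT_MB else "bios"


if __name__ == "__main__":
    # Arbitrary illustrative sizes, not real card specifications.
    print(required_firmware(256))  # bios
    print(required_firmware(512))  # efi
```

A result of "efi" means that the default templates cannot be used with that card and that a custom image with EFI firmware is needed.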

Create a Custom Image with EFI Firmware

To overcome the BIOS firmware limitations that exist on Tanzu Kubernetes Grid templates, service providers can create a custom image with EFI firmware in vSphere. For instructions, refer to the Configuring vGPU on Tanzu Kubernetes Grid Clusters to allow Artificial Intelligence and Machine Learning Workloads section in Using VMware Cloud Director Container Service Extension as a Service Provider.