As a cloud administrator, you can use your VMware Cloud Foundation stack to manage GPU-enabled infrastructure and AI/ML workload domains. In VMware Aria Automation, you can set up and provide GPU-enabled deep learning virtual machines (DL VMs) and Tanzu Kubernetes Grid (TKG) clusters as catalog items that data scientists and DevOps teams in your organization can request from the self-service Automation Service Broker catalog.

What is VMware Private AI Foundation?

VMware Private AI Foundation with NVIDIA provides a platform for provisioning AI workloads on VMware Cloud Foundation with NVIDIA GPUs. In addition, running AI workloads based on NVIDIA GPU Cloud (NGC) containers is specifically validated by VMware by Broadcom. To learn more, see What is VMware Private AI Foundation with NVIDIA.

Private AI Automation Services is the collective name for all VMware Private AI Foundation features that are available in VMware Aria Automation.

Important: The Private AI Automation Services offering is available for VMware Aria Automation 8.16.2.

To get started with Private AI Automation Services, you run the Catalog Setup Wizard in VMware Aria Automation. The wizard helps you connect VMware Private AI Foundation to VMware Aria Automation.

How does the Catalog Setup Wizard work?

Important: The Catalog Setup Wizard is not enabled by default. Contact VMware by Broadcom Professional Services to activate the wizard for your organization.

Using the Catalog Setup Wizard, you complete the following tasks:
  1. Add a vCenter cloud account. Cloud accounts are the credentials that are used to collect data from and deploy resources to your vCenter instance.
  2. Add an NVIDIA license.
  3. Select content to add to the Automation Service Broker catalog.
  4. Create a project. The project links your users with cloud account regions, so that they can deploy cloud templates with networks and storage resources to your vCenter instance.
After you run the Catalog Setup Wizard for the first time, two catalog items are created in the Automation Service Broker catalog and become available for users in your organization to deploy:
  • AI Workstation – a GPU-enabled virtual machine that can be configured with the desired vCPU, vGPU, memory, and AI/ML software from NVIDIA.
  • AI Kubernetes Cluster – a GPU-enabled Tanzu Kubernetes cluster that can be configured with the NVIDIA GPU Operator.

You can run the wizard again if you need to change any of the settings that you provided, such as licensing details, or if you want to create AI catalog items for other projects. Each time you run the wizard, two new catalog items are created in addition to any previously created items.

Before you begin

  • Verify that you are running VMware Aria Automation 8.16.2.
  • Verify that you are running VMware Cloud Foundation 5.1.1, which includes vCenter 8.0 Update 2b.
  • Verify that you have a vCenter cloud account in VMware Aria Automation.
  • Verify that you have an NVIDIA GPU Cloud Enterprise organization with a premium cloud service subscription.
  • Verify that you have a GPU-enabled Supervisor cluster configured through workload management.
  • Configure VMware Aria Automation for VMware Private AI Foundation with NVIDIA. See Set Up VMware Aria Automation for VMware Private AI Foundation with NVIDIA.
  • Complete the VMware Cloud Foundation Quickstart before running the Catalog Setup Wizard. Your SDDC and Supervisor clusters must be registered with VMware Aria Automation. See How do you get started with VMware Aria Automation using the VMware Cloud Foundation Quickstart.
  • Verify that you have generated the licensing .tok file from the NVIDIA licensing server and that you have your NVIDIA NGC Portal API key. The NVIDIA NGC Portal API key is used to download and install vGPU drivers.
  • Configure Single Sign-On (SSO) for Cloud Consumption Interface (CCI). See Setting Up Single Sign-On for CCI.
  • Verify that you are subscribed to the content library at https://packages.vmware.com/dl-vm/lib.json.
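
The version prerequisites above can be sketched as a small pre-flight check. This is a minimal Python sketch, not part of VMware Aria Automation. The function names and the `reported` dictionary are hypothetical, and the sketch assumes that you look up each product version yourself and enter it by hand. It also assumes that an exact match is required, because the prerequisites name specific versions.

```python
from typing import Dict, List, Tuple

# Versions named in the prerequisites above.
REQUIRED_VERSIONS: Dict[str, str] = {
    "VMware Aria Automation": "8.16.2",
    "VMware Cloud Foundation": "5.1.1",
}


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn a dotted version string such as '8.16.2' into (8, 16, 2)."""
    return tuple(int(part) for part in version.strip().split("."))


def check_versions(reported: Dict[str, str]) -> List[str]:
    """Return a list of problems; an empty list means every version matches.

    `reported` is a hypothetical mapping of product name to the version
    that you read from your own environment.
    """
    problems = []
    for product, required in REQUIRED_VERSIONS.items():
        found = reported.get(product)
        if found is None:
            problems.append(f"{product}: version not provided")
        elif parse_version(found) != parse_version(required):
            problems.append(f"{product}: found {found}, expected {required}")
    return problems


if __name__ == "__main__":
    # Sample values only; replace them with the versions in your environment.
    for problem in check_versions({
        "VMware Aria Automation": "8.16.2",
        "VMware Cloud Foundation": "5.1.0",
    }):
        print(problem)
```

Comparing parsed integer tuples instead of raw strings avoids false mismatches caused by stray whitespace around a version that was copied by hand.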

Procedure

  1. After you install VMware Aria Automation and log in for the first time, click Launch Quickstart.

    Console with the Launch Quickstart tile.

  2. On the Private AI Automation Services card, click Start.
  3. Select the cloud account that you want to provide access to.

    Step 1 of the Catalog Setup Wizard is to select a cloud account.

    Remember that all values here are use case samples. Your account values depend on your environment.

    1. Select a vCenter cloud account.
    2. Select a GPU-enabled supervisor.
    3. Enter a region name.

      Consider using a descriptive name for your region that helps your users distinguish GPU-enabled regions from other available regions.

      A region is automatically selected if the supervisor is already configured with a region.

    4. Click Next.
  4. Provide information about your NVIDIA license server.

    Step 2 of the Catalog Setup Wizard is to add a license.

    1. Select the NVIDIA licensing server type.
      • A Cloud License Service (CLS) instance is hosted on the NVIDIA Licensing Portal.
      • A Delegated License Service (DLS) instance is hosted on-premises and is accessed from a private network. If you select this server type, you must also provide the location of the server.
      For more information, see the NVIDIA License System documentation.
    2. Copy and paste the contents of the license file.
      The NVIDIA Licensing Portal API key is used to evaluate if a user has the right entitlement to download the NVIDIA vGPU drivers. The API key must be a UUID.
      Note: The API key that you generate from the NVIDIA Licensing Portal is not the same as the NVAIE API Key.
    3. Click Next.
  5. Configure the catalog items.

    Step 3 of the Catalog Setup Wizard is to configure catalog items.

    1. Select the VM image you want to use to create the workstation VM.
    2. Select the VM classes you want to make available to your catalog users.
      You must add at least one GPU-capable and one non-GPU-capable VM class.
      • GPU-capable VM classes are used for the deep learning VM and for the worker nodes of the Kubernetes cluster. When the catalog item is deployed, the Kubernetes cluster is created with the selected VM classes.
      • Non-GPU-capable VM classes are required for the nodes that run the Kubernetes control plane.
    3. Select the storage class to apply to the virtual machines.
    4. Specify the container registry from which you want to pull NVIDIA GPU Cloud resources.

      If you select a self-hosted registry, the catalog items require additional manual configuration after you complete the wizard. Contact VMware by Broadcom Professional Services.

    5. Click Next.
  6. Configure access to the catalog items by creating a project and assigning users.

    Step 4 of the Catalog Setup Wizard is to configure user access to the catalog items.

    Projects are used to manage people, assigned resources, cloud templates, and deployments.

    1. Enter a name and description for the project.

      The project name must contain only lowercase alphanumeric characters or hyphens (-).

    2. To make the catalog items available to others, add an Administrator and Members.

      Administrators have more permissions than members. For more information, see What are the VMware Aria Automation user roles.

    3. Click Next.
  7. Verify your configuration on the Summary page.

    Consider saving the details of your configuration before you click Finish and the wizard runs.

  8. Click Finish.
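
The input rules stated in the procedure (the API key must be a UUID, at least one GPU-capable and one non-GPU-capable VM class must be selected, and the project name must contain only lowercase alphanumeric characters or hyphens) can be checked before you open the wizard. The following is a minimal Python sketch under stated assumptions. It does not call any VMware or NVIDIA API, all function names are hypothetical, and the UUID check assumes the canonical hyphenated 8-4-4-4-12 form, which the procedure does not specify.

```python
import re
import uuid
from typing import Iterable, Tuple

# Lowercase letters, digits, and hyphens only, per the project-name rule.
PROJECT_NAME_PATTERN = re.compile(r"[a-z0-9-]+")


def is_valid_project_name(name: str) -> bool:
    """Return True if the whole name uses only the allowed characters."""
    return PROJECT_NAME_PATTERN.fullmatch(name) is not None


def is_uuid(value: str) -> bool:
    """Return True if value is a UUID in canonical hyphenated form.

    Assumption: uuid.UUID() alone also accepts other spellings (for
    example, 32 hex digits without hyphens), so the parsed value is
    compared with the input to accept only the canonical form.
    """
    try:
        parsed = uuid.UUID(value)
    except ValueError:
        return False
    return str(parsed) == value.lower()


def has_required_vm_classes(selected: Iterable[Tuple[str, bool]]) -> bool:
    """Check for at least one GPU-capable and one non-GPU-capable class.

    `selected` is a hypothetical list of (class name, is_gpu_capable) pairs.
    """
    flags = [gpu_capable for _, gpu_capable in selected]
    return any(flags) and not all(flags)


if __name__ == "__main__":
    # Sample values only; they do not refer to real keys or VM classes.
    print(is_valid_project_name("private-ai-team1"))
    print(is_uuid("123e4567-e89b-12d3-a456-426614174000"))
    print(has_required_vm_classes([("gpu-class", True), ("cpu-class", False)]))
```

Checks like these only catch formatting mistakes early. The wizard remains the authority on whether a value is accepted.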

Results

The AI Workstation and the AI Kubernetes Cluster catalog items are created in the Automation Service Broker catalog and users in your organization can now deploy them.

View of the Service Broker Catalog page with the two Private AI Foundation catalog items.

What to do next

Troubleshooting

  • If the Catalog Setup Wizard fails, run the wizard again for a different project.