The VM service in the Supervisor in vSphere with Tanzu enables data scientists and DevOps engineers to deploy and run deep learning VMs by using the Kubernetes API.
As a data scientist or DevOps engineer, you use kubectl to deploy a deep learning VM on the namespace configured by the cloud administrator.
Prerequisites
Verify with the cloud administrator that the following prerequisites are in place for the AI-ready infrastructure.
- VMware Private AI Foundation with NVIDIA is deployed and configured. See Deploying VMware Private AI Foundation with NVIDIA.
- A content library with deep learning VMs is added to the namespace for AI workloads. See Create a Content Library with Deep Learning VM Images for VMware Private AI Foundation with NVIDIA.
Procedure
- Log in to the Supervisor control plane.
kubectl vsphere login --server=SUPERVISOR-CONTROL-PLANE-IP-ADDRESS-or-FQDN --vsphere-username USERNAME
- Verify that all required VM resources, such as VM classes and VM images, are in place in the namespace.
See View VM Resources Available on a Namespace in vSphere with Tanzu.
- Prepare the YAML file for the deep learning VM.
Use the vm-operator-api, setting the OVF properties as a ConfigMap object.
For example, you can create a YAML specification example-dl-vm.yaml for an example deep learning VM running PyTorch.
apiVersion: vmoperator.vmware.com/v1alpha1
kind: VirtualMachine
metadata:
  name: example-dl-vm
  namespace: vpaif-ns
  labels:
    app: example-dl-app
spec:
  className: gpu-a30
  imageName: vmi-xxxxxxxxxxxxx
  powerState: poweredOn
  storageClass: tanzu-storage-policy
  vmMetadata:
    configMapName: example-dl-vm-config
    transport: OvfEnv
apiVersion: v1
kind: ConfigMap
metadata:
  name: example-dl-vm-config
  namespace: vpaif-ns
data:
  user-data: I2Nsb3VkLWNvbmZpZwogICAgd3JpdGVfZmlsZXM6CiAgICAtIHBhdGg6IC9vcHQvZGx2bS9kbF9hcHAuc2gKICAgICAgcGVybWlzc2lvbnM6ICcwNzU1JwogICAgICBjb250ZW50OiB8CiAgICAgICAgIyEvYmluL2Jhc2gKICAgICAgICBkb2NrZXIgcnVuIC1kIC1wIDg4ODg6ODg4OCBudmNyLmlvL252aWRpYS9weXRvcmNoOjIzLjEwLXB5MyAvdXNyL2xvY2FsL2Jpbi9qdXB5dGVyIGxhYiAtLWFsbG93LXJvb3QgLS1pcD0qIC0tcG9ydD04ODg4IC0tbm8tYnJvd3NlciAtLU5vdGVib29rQXBwLnRva2VuPScnIC0tTm90ZWJvb2tBcHAuYWxsb3dfb3JpZ2luPScqJyAtLW5vdGVib29rLWRpcj0vd29ya3NwYWNl
  vgpu-license: NVIDIA-client-configuration-token
  nvidia-portal-api-key: API-key-from-NVIDIA-licensing-portal
  password: password-for-vmware-user
Note: user-data is the base64-encoded value of the following cloud-init code:

#cloud-config
write_files:
- path: /opt/dlvm/dl_app.sh
  permissions: '0755'
  content: |
    #!/bin/bash
    docker run -d -p 8888:8888 nvcr.io/nvidia/pytorch:23.10-py3 /usr/local/bin/jupyter lab --allow-root --ip=* --port=8888 --no-browser --NotebookApp.token='' --NotebookApp.allow_origin='*' --notebook-dir=/workspace
apiVersion: vmoperator.vmware.com/v1alpha1
kind: VirtualMachineService
metadata:
  name: example-dl-vm
  namespace: vpaif-ns
spec:
  ports:
  - name: ssh
    port: 22
    protocol: TCP
    targetPort: 22
  - name: jupyterlab
    port: 8888
    protocol: TCP
    targetPort: 8888
  selector:
    app: example-dl-app
  type: LoadBalancer
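As noted above, the user-data key of the ConfigMap carries the cloud-init script in base64 form. A minimal Python sketch of how you might produce that value for your own script (illustrative only; any whitespace change in the script changes the encoded output):

```python
import base64

# The cloud-init script to deliver to the deep learning VM.
# The exact text, including indentation, is what the VM will receive.
cloud_init = """#cloud-config
write_files:
- path: /opt/dlvm/dl_app.sh
  permissions: '0755'
  content: |
    #!/bin/bash
    docker run -d -p 8888:8888 nvcr.io/nvidia/pytorch:23.10-py3 /usr/local/bin/jupyter lab --allow-root --ip=* --port=8888 --no-browser --NotebookApp.token='' --NotebookApp.allow_origin='*' --notebook-dir=/workspace
"""

# Encode for the user-data key of the ConfigMap.
user_data = base64.b64encode(cloud_init.encode()).decode()
print(user_data)

# Decoding the value back reproduces the original script exactly.
assert base64.b64decode(user_data).decode() == cloud_init
```

The same round trip is a quick way to inspect what an existing user-data value actually contains before deploying the VM.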
- Switch to the context of the vSphere namespace created by the cloud administrator.
For example, for a namespace called example-dl-vm-namespace:
kubectl config use-context example-dl-vm-namespace
- Deploy the deep learning VM.
kubectl apply -f example-dl-vm.yaml
- Verify that the VM has been created by running these commands.
kubectl get vm -n example-dl-vm-namespace
kubectl describe virtualmachine example-dl-vm
- Ping the IP address of the virtual machine assigned by the requested networking service.
ping IP_address_returned_by_kubectl_describe
To get the public address and the ports for access to the deep learning VM, get the details about the load balancer service that has been created.
kubectl get services
NAME            TYPE           CLUSTER-IP              EXTERNAL-IP           PORT(S)                       AGE
example-dl-vm   LoadBalancer   <internal-ip-address>   <public-IP-address>   22:30473/TCP,8888:32180/TCP   9m40s
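The PORT(S) column pairs each service port with the node port it is exposed on (service-port:node-port/protocol). A small helper, purely for illustration and not part of any kubectl tooling, shows how to read that column:

```python
def parse_ports(ports_field: str) -> dict[int, int]:
    """Map each service port to its node port.

    ports_field is the PORT(S) column of `kubectl get services`,
    e.g. "22:30473/TCP,8888:32180/TCP".
    """
    mapping = {}
    for entry in ports_field.split(","):
        ports, _, _proto = entry.partition("/")   # drop the /TCP suffix
        service_port, _, node_port = ports.partition(":")
        mapping[int(service_port)] = int(node_port)
    return mapping

print(parse_ports("22:30473/TCP,8888:32180/TCP"))
# → {22: 30473, 8888: 32180}
```

For a LoadBalancer service you connect to the service ports themselves (22 for SSH, 8888 for JupyterLab) on the EXTERNAL-IP address; the node ports are used internally by the load balancer.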
The vGPU guest driver and the specified DL workload are installed the first time you start the deep learning VM.
You can examine the logs or open the JupyterLab notebook that comes with some of the images. See Deep Learning Workloads in VMware Private AI Foundation with NVIDIA.
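Because the driver and workload install on first boot, JupyterLab may take several minutes to become reachable. A minimal readiness check, assuming the external IP from the load balancer service and the 8888 port used in the example above (the helper names are hypothetical):

```python
import urllib.error
import urllib.request

def jupyter_url(external_ip: str, port: int = 8888) -> str:
    # Port 8888 matches the jupyterlab port of the example
    # VirtualMachineService; adjust if your service differs.
    return f"http://{external_ip}:{port}"

def jupyter_ready(external_ip: str, timeout: float = 5.0) -> bool:
    """Return True once JupyterLab answers at the load balancer address."""
    try:
        with urllib.request.urlopen(jupyter_url(external_ip), timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False
```

You could poll jupyter_ready in a loop after kubectl apply and open the returned URL in a browser once it reports True.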