Refer to this delta topic if you are using the NVIDIA Delegated Licensing Server (DLS) for your NVIDIA AI Enterprise account.
Cluster Operator Addendum for Deploying AI/ML Workloads on TKGS Clusters
NVIDIA provides a new NVIDIA Licensing Server (NLS) system called DLS, which stands for Delegated Licensing Server. For more information, refer to the NVIDIA documentation.
If you are using DLS for your NVAIE account, the steps for preparing for and deploying the NVAIE GPU Operator differ from those documented in Cluster Operator Workflow for Deploying AI/ML Workloads on TKGS Clusters. Specifically, Steps 9 and 10 are modified as follows.
Operator Step 9: Prepare to Install the NVAIE GPU Operator
- Create a Secret.
kubectl create secret docker-registry registry-secret \
  --docker-server=<users private NGC registry name> \
  --docker-username='$oauthtoken' \
  --docker-password=ZmJj…………Ri \
  --docker-email=<user-email-address> \
  -n gpu-operator-resources
Note: The password is the user API Key that was previously created on the NVIDIA GPU Cloud (NGC) Portal.
- Get a Client Token from the DLS Server.
A user who wants to use a vGPU license must obtain a token, called a "Client Token", from the DLS license server. The mechanism for doing this is described in the NVIDIA documentation.
- Create a ConfigMap object in the TKGS cluster using the Client Token.
Save the Client Token to a file at <path>/client_configuration_token.tok.
Then, run the following command:
kubectl delete configmap licensing-config -n gpu-operator-resources; > gridd.conf
kubectl create configmap licensing-config \
  -n gpu-operator-resources --from-file=./gridd.conf --from-file=./client_configuration_token.tok
Note: The gridd.conf file used by the DLS is empty. However, both "--from-file" parameters are required.
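The ConfigMap preparation above can be preceded by a small pre-flight check on the local files. The following is a minimal sketch, not part of the documented procedure: the function name prepare_licensing_files is hypothetical, and it assumes the Client Token has already been downloaded to the directory passed as the first argument. It only validates and prepares the local files; the kubectl commands themselves are unchanged.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prepares the two local files that the
# licensing-config ConfigMap is built from.
# Argument 1 (optional): directory holding client_configuration_token.tok
# (defaults to the current directory).

prepare_licensing_files() {
  local dir="${1:-.}"
  local token="$dir/client_configuration_token.tok"

  # The Client Token must exist and be non-empty.
  if [ ! -s "$token" ]; then
    echo "missing or empty: $token" >&2
    return 1
  fi

  # DLS uses an empty gridd.conf; create it (or truncate an existing one)
  # next to the token, as the "> gridd.conf" redirection does.
  : > "$dir/gridd.conf"
  echo "ready: $dir/gridd.conf $token"
}
```

For example, calling prepare_licensing_files <path> before changing into <path> and running the kubectl commands catches a missing token file before the existing ConfigMap is deleted.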
Operator Step 10: Install the NVAIE GPU Operator
- Install the NVAIE GPU Operator in the TKGS cluster.
- Install Helm by referring to the Helm documentation.
- Add the gpu-operator Helm repository.
helm repo add nvidia https://nvidia.github.io/gpu-operator
- Install the GPU Operator using Helm.
export PRIVATE_REGISTRY="<user’s private registry name>"
export OS_TAG=ubuntu20.04
export VERSION=470.63.01
export VGPU_DRIVER_VERSION=470.63.01-grid
export NGC_API_KEY=Zm……………Ri   # The user's NGC API Key
export REGISTRY_SECRET_NAME=registry-secret
helm show chart .
kubectl delete crd clusterpolicies.nvidia.com
helm install gpu-operator . -n gpu-operator-resources \
  --set psp.enabled=true \
  --set driver.licensingConfig.configMapName=licensing-config \
  --set operator.defaultRuntime=containerd \
  --set driver.imagePullSecrets={$REGISTRY_SECRET_NAME} \
  --set driver.version=$VERSION \
  --set driver.repository=$PRIVATE_REGISTRY \
  --set driver.licensingConfig.nlsEnabled=true
- Verify that DLS has worked.
From within an NVIDIA Driver DaemonSet pod that was deployed by the GPU Operator, execute the nvidia-smi command to verify that DLS is working.
First, run the following command to get into the pod and bring up a shell session (replace the pod name with the name of a driver pod in your cluster):
kubectl exec -it nvidia-driver-daemonset-cvxx6 -c nvidia-driver-ctr -n gpu-operator-resources -- bash
Now you can run the command to verify the DLS setup.
nvidia-smi
If DLS is set up correctly, the output of this command should include "Licensed".
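The verification above can also be scripted so that the pod name does not have to be copied by hand. The following is a minimal sketch under stated assumptions, not a documented procedure: the function names are hypothetical, the label selector app=nvidia-driver-daemonset is assumed to match the driver pods created by the GPU Operator, and the check assumes that nvidia-smi -q prints a "License Status" line for vGPU-licensed products. Adjust both if your environment differs.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for checking the license state reported by nvidia-smi.

# Reads nvidia-smi output on stdin and succeeds only if a line reports
# "License Status : Licensed ..." (assumed format of `nvidia-smi -q`).
# An "Unlicensed" status does not match this pattern.
is_licensed() {
  grep -Eq 'License Status[[:space:]]*:[[:space:]]*Licensed'
}

# Looks up the first driver pod and runs nvidia-smi -q inside it.
# Assumption: driver pods carry the label app=nvidia-driver-daemonset.
check_dls_license() {
  local ns="gpu-operator-resources" pod
  pod=$(kubectl get pods -n "$ns" -l app=nvidia-driver-daemonset \
          -o jsonpath='{.items[0].metadata.name}') || return 1
  kubectl exec "$pod" -c nvidia-driver-ctr -n "$ns" -- nvidia-smi -q | is_licensed
}
```

Running check_dls_license && echo "DLS license acquired" against the TKGS cluster context gives a pass/fail result that can be used in automation.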