After you implement the Private AI Ready Infrastructure for VMware Cloud Foundation validated solution and VMware Private AI Foundation with NVIDIA, you perform common operations on the environment, such as examining the operational state of the NVIDIA AI Enterprise (NVAIE) components that the implementation added.
Verify the operational state of the NVAIE Kubernetes Operators and ESXi host components by checking their state and health status.
Validate that the NVAIE components function properly and are ready for GPU-enabled workloads that run on VMware Cloud Foundation.
Prerequisites
Install the vSphere kubectl plug-in to connect to the Supervisor as a vCenter Single Sign-On user. See Download and Install the Kubernetes CLI Tools for vSphere.
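As an illustration of the login step, the following minimal Python sketch assembles the `kubectl vsphere login` command line. The helper name `supervisor_login_command`, the server address, and the user name are hypothetical placeholders, not values from this solution.

```python
def supervisor_login_command(server: str, username: str) -> list[str]:
    """Build the argument list for logging in to a Supervisor with the vSphere plug-in."""
    if not server or not username:
        raise ValueError("server and username are both required")
    return [
        "kubectl", "vsphere", "login",
        "--server", server,
        "--vsphere-username", username,
    ]


# Placeholder values (assumptions): substitute your Supervisor address and SSO user.
cmd = supervisor_login_command("supervisor.example.com", "administrator@vsphere.local")
print(" ".join(cmd))
```

Passing `cmd` to `subprocess.run(cmd, check=True)` would start the login, and the plug-in then prompts for the password interactively.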
Verify the Status of the ESXi Host Components for Private AI Ready Infrastructure for VMware Cloud Foundation
Verify the operational state of the ESXi host by checking its state and health status.
Expected Outcomes |
---|
|
Procedure
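One possible way to script part of this check, offered as a sketch and not as the documented procedure: capture the output of `esxcli software vib list` on the ESXi host and filter it for NVIDIA packages. The helper name `find_nvidia_vibs`, the vendor strings it matches, and the sample rows are assumptions for illustration; the names and versions on your hosts will differ.

```python
def find_nvidia_vibs(vib_list_output: str) -> list[tuple[str, str]]:
    """Return (name, version) for rows that look like NVIDIA VIBs."""
    matches = []
    for line in vib_list_output.splitlines():
        fields = line.split()
        # Skip blank lines, short lines, and the dashed separator row.
        if len(fields) < 3 or set(fields[0]) == {"-"}:
            continue
        name, version, vendor = fields[0], fields[1], fields[2]
        # Assumption: NVIDIA packages carry "nvd" or "nvidia" in the name or vendor.
        if (
            "nvd" in name.lower()
            or "nvidia" in name.lower()
            or vendor.lower() in ("nvd", "nvidia")
        ):
            matches.append((name, version))
    return matches


# Made-up sample text in the general shape of `esxcli software vib list` output.
SAMPLE_VIB_LIST = """\
Name                     Version                          Vendor  Acceptance Level  Install Date
-----------------------  -------------------------------  ------  ----------------  ------------
example-nic-driver       1.0.0-1vmw.800.1.0.00000000      VMW     VMwareCertified   2024-01-01
NVD-AIE_example_driver   000.00.00-1OEM.800.1.0.00000000  NVD     VMwareAccepted    2024-01-01
"""

nvidia_vibs = find_nvidia_vibs(SAMPLE_VIB_LIST)
print(nvidia_vibs)
```

An empty result would suggest that no NVIDIA host driver is installed. When the driver is present, running `nvidia-smi` on the host is a common follow-up check that the GPUs are visible.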
Verify the Status of the GPU Operator for Private AI Ready Infrastructure for VMware Cloud Foundation
Verify the operational state of the NVIDIA GPU Operator by checking its state and health status.
Expected Outcomes |
---|
|
Procedure
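As a hedged sketch of how such a check could be automated, the following Python reads the JSON from `kubectl get pods -n <namespace> -o json` and lists pods that are neither Ready nor completed. The namespace `gpu-operator` is an assumption based on the common NVIDIA default, and the function names and sample pod names are hypothetical.

```python
import json
import subprocess


def fetch_pods(namespace: str) -> str:
    """Return the pod list of a namespace as JSON text (requires a logged-in kubectl)."""
    result = subprocess.run(
        ["kubectl", "get", "pods", "-n", namespace, "-o", "json"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def unhealthy_pods(pod_list_json: str) -> list[str]:
    """Return names of pods that are not Ready, ignoring pods that ran to completion."""
    bad = []
    for pod in json.loads(pod_list_json).get("items", []):
        status = pod.get("status", {})
        # One-shot pods (for example, validators) finish in phase Succeeded.
        if status.get("phase") == "Succeeded":
            continue
        ready = any(
            c.get("type") == "Ready" and c.get("status") == "True"
            for c in status.get("conditions", [])
        )
        if not ready:
            bad.append(pod["metadata"]["name"])
    return bad


# Made-up sample data; real input would come from fetch_pods("gpu-operator").
SAMPLE_PODS = json.dumps({"items": [
    {"metadata": {"name": "example-driver-abcde"},
     "status": {"phase": "Running",
                "conditions": [{"type": "Ready", "status": "True"}]}},
    {"metadata": {"name": "example-validator-fghij"},
     "status": {"phase": "Succeeded",
                "conditions": [{"type": "Ready", "status": "False"}]}},
    {"metadata": {"name": "example-device-plugin-klmno"},
     "status": {"phase": "Pending", "conditions": []}},
]})

not_ready = unhealthy_pods(SAMPLE_PODS)
print(not_ready)
```

An empty list indicates that every pod in the namespace is either Ready or has completed. Any names returned are candidates for `kubectl describe pod` and `kubectl logs`.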
What to do next
Troubleshooting Tips |
---|
|
Verify the Status of the Network Operator for Private AI Ready Infrastructure for VMware Cloud Foundation
Verify the operational state of the NVIDIA Network Operator by checking its state and health status.
Expected Outcomes |
---|
|
Procedure
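A complementary sketch, again under stated assumptions: because operator components typically run as DaemonSets, comparing `status.numberReady` with `status.desiredNumberScheduled` from `kubectl get daemonsets -n <namespace> -o json` shows whether every targeted node runs a ready pod. The namespace `nvidia-network-operator` is assumed from the usual NVIDIA default, and the function and DaemonSet names are hypothetical.

```python
import json


def lagging_daemonsets(daemonset_list_json: str) -> dict[str, tuple[int, int]]:
    """Map DaemonSet name to (ready, desired) where fewer pods are ready than desired."""
    lagging = {}
    for ds in json.loads(daemonset_list_json).get("items", []):
        status = ds.get("status", {})
        desired = status.get("desiredNumberScheduled", 0)
        ready = status.get("numberReady", 0)
        if ready < desired:
            lagging[ds["metadata"]["name"]] = (ready, desired)
    return lagging


# Made-up sample data; real input would be the output of
# `kubectl get daemonsets -n nvidia-network-operator -o json` (namespace is an assumption).
SAMPLE_DAEMONSETS = json.dumps({"items": [
    {"metadata": {"name": "example-ds-a"},
     "status": {"desiredNumberScheduled": 3, "numberReady": 3}},
    {"metadata": {"name": "example-ds-b"},
     "status": {"desiredNumberScheduled": 3, "numberReady": 1}},
]})

behind = lagging_daemonsets(SAMPLE_DAEMONSETS)
print(behind)
```

An empty dictionary means that each DaemonSet reports as many ready pods as nodes it targets. Entries in the result point to the DaemonSets whose pods deserve a closer look.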
What to do next
Troubleshooting Tips |
---|
|