This topic describes techniques for checking the health of the Supervisor with respect to TKG components.
Check the State of Supervisor Pods
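Supervisor pods run TKG infrastructure components. To check their state, list all pods in the Supervisor with kubectl get pods -A, or filter the list for pods that are running:
kubectl get pods -A | grep "Running"
Use grep -v "Running" instead to return pods that are not Running. This inverted filter also returns the header row and any pods in the Completed state, such as finished job pods.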
Example result:
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system coredns-855c5b4cfd-8w4hp 1/1 Running 0 27d
kube-system coredns-855c5b4cfd-bx2hk 1/1 Running 0 27d
kube-system coredns-855c5b4cfd-rrb5n 1/1 Running 0 27d
kube-system docker-registry-423f01b9b30c727e9c237a0031999b14 1/1 Running 0 27d
kube-system docker-registry-423f568f75dcb48725b0d768b7e4bdf5 1/1 Running 0 27d
kube-system docker-registry-423f930ca2413d96beef34526c2e61b4 1/1 Running 0 27d
kube-system etcd-423f01b9b30c727e9c237a0031999b14 1/1 Running 1 (27d ago) 27d
kube-system etcd-423f568f75dcb48725b0d768b7e4bdf5 1/1 Running 1 (27d ago) 27d
kube-system etcd-423f930ca2413d96beef34526c2e61b4 1/1 Running 1 (27d ago) 27d
kube-system kube-apiserver-423f01b9b30c727e9c237a0031999b14 1/1 Running 1 (27d ago) 27d
kube-system kube-apiserver-423f568f75dcb48725b0d768b7e4bdf5 1/1 Running 1 (27d ago) 27d
kube-system kube-apiserver-423f930ca2413d96beef34526c2e61b4 1/1 Running 1 (27d ago) 27d
kube-system kube-controller-manager-423f01b9b30c727e9c237a0031999b14 1/1 Running 0 27d
kube-system kube-controller-manager-423f568f75dcb48725b0d768b7e4bdf5 1/1 Running 0 27d
kube-system kube-controller-manager-423f930ca2413d96beef34526c2e61b4 1/1 Running 0 27d
kube-system kube-proxy-8h499 1/1 Running 0 27d
kube-system kube-proxy-bm7qt 1/1 Running 0 27d
kube-system kube-proxy-dnmq2 1/1 Running 0 27d
kube-system kube-scheduler-423f01b9b30c727e9c237a0031999b14 2/2 Running 13 (25d ago) 27d
kube-system kube-scheduler-423f568f75dcb48725b0d768b7e4bdf5 2/2 Running 0 27d
kube-system kube-scheduler-423f930ca2413d96beef34526c2e61b4 2/2 Running 0 27d
kube-system kubectl-plugin-vsphere-423f01b9b30c727e9c237a0031999b14 1/1 Running 3 (27d ago) 27d
kube-system kubectl-plugin-vsphere-423f568f75dcb48725b0d768b7e4bdf5 1/1 Running 3 (27d ago) 27d
kube-system kubectl-plugin-vsphere-423f930ca2413d96beef34526c2e61b4 1/1 Running 3 (27d ago) 27d
kube-system wcp-authproxy-423f01b9b30c727e9c237a0031999b14 1/1 Running 0 27d
kube-system wcp-authproxy-423f568f75dcb48725b0d768b7e4bdf5 1/1 Running 0 27d
kube-system wcp-authproxy-423f930ca2413d96beef34526c2e61b4 1/1 Running 0 27d
kube-system wcp-fip-423f01b9b30c727e9c237a0031999b14 1/1 Running 0 27d
kube-system wcp-fip-423f568f75dcb48725b0d768b7e4bdf5 1/1 Running 0 27d
kube-system wcp-fip-423f930ca2413d96beef34526c2e61b4 1/1 Running 0 27d
svc-tmc-c63 agent-updater-69f6598bcd-zrkwq 1/1 Running 0 27d
svc-tmc-c63 agentupdater-workload-27696934--1-vz5sg 0/1 Completed 0 35s
svc-tmc-c63 cluster-health-extension-68948f657-4gpcd 1/1 Running 0 27d
svc-tmc-c63 extension-manager-f8886bfb7-vdsm9 1/1 Running 0 27d
svc-tmc-c63 extension-updater-79b4787cf6-bwssn 1/1 Running 0 27d
svc-tmc-c63 intent-agent-66576db5bd-lj2gk 1/1 Running 0 5d6h
svc-tmc-c63 sync-agent-f9c68cc58-6zddj 1/1 Running 0 6d
svc-tmc-c63 tmc-agent-installer-27696934--1-jgwvw 0/1 Completed 0 35s
svc-tmc-c63 tmc-auto-attach-6488b9cd8b-xdfzz 1/1 Running 0 18h
svc-tmc-c63 vsphere-resource-retriever-58985c99cb-68h6v 1/1 Running 0 18h
vmware-system-appplatform-operator-system vmware-system-appplatform-operator-mgr-0 1/1 Running 0 27d
vmware-system-appplatform-operator-system vmware-system-psp-operator-mgr-587f66646d-xxvmr 1/1 Running 0 27d
vmware-system-capw capi-controller-manager-766c6fc449-4qqvf 2/2 Running 423 (26d ago) 27d
vmware-system-capw capi-controller-manager-766c6fc449-bcpdq 2/2 Running 410 (26d ago) 27d
vmware-system-capw capi-controller-manager-766c6fc449-rnznx 2/2 Running 0 26d
vmware-system-capw capi-kubeadm-bootstrap-controller-manager-58fd767b49-585f2 2/2 Running 402 (25d ago) 27d
vmware-system-capw capi-kubeadm-bootstrap-controller-manager-58fd767b49-96q6m 2/2 Running 398 (25d ago) 27d
vmware-system-capw capi-kubeadm-bootstrap-controller-manager-58fd767b49-nssgq 2/2 Running 407 (25d ago) 27d
vmware-system-capw capi-kubeadm-control-plane-controller-manager-559df997b-762jr 2/2 Running 193 (26d ago) 27d
vmware-system-capw capi-kubeadm-control-plane-controller-manager-559df997b-bb42s 2/2 Running 189 (26d ago) 27d
vmware-system-capw capi-kubeadm-control-plane-controller-manager-559df997b-wxhqv 2/2 Running 199 (26d ago) 27d
vmware-system-capw capw-controller-manager-6dd47d75b-6ncxk 2/2 Running 400 (25d ago) 27d
vmware-system-capw capw-controller-manager-6dd47d75b-k2ph4 2/2 Running 399 (25d ago) 27d
vmware-system-capw capw-controller-manager-6dd47d75b-np9sg 2/2 Running 403 (25d ago) 27d
vmware-system-capw capw-webhook-5484757c7-2pkbt 2/2 Running 0 27d
vmware-system-capw capw-webhook-5484757c7-fkt7z 2/2 Running 0 27d
vmware-system-capw capw-webhook-5484757c7-r85kw 2/2 Running 0 27d
vmware-system-cert-manager cert-manager-6ccbcfcd57-lppgn 1/1 Running 1 (27d ago) 27d
vmware-system-cert-manager cert-manager-cainjector-796f7b74db-5qvgn 1/1 Running 3 (27d ago) 27d
vmware-system-cert-manager cert-manager-webhook-586948846f-b584m 1/1 Running 0 27d
vmware-system-csi vsphere-csi-controller-6d8cfd75cd-66zbj 6/6 Running 0 27d
vmware-system-csi vsphere-csi-controller-6d8cfd75cd-b4nhz 6/6 Running 1 (27d ago) 27d
vmware-system-csi vsphere-csi-controller-6d8cfd75cd-v6hlf 6/6 Running 0 27d
vmware-system-kubeimage image-controller-ff79fb5fc-kd6ts 1/1 Running 0 27d
vmware-system-license-operator vmware-system-license-operator-controller-manager-7d555768bnxjb 1/1 Running 0 25d
vmware-system-license-operator vmware-system-license-operator-controller-manager-7d555768j2sb8 1/1 Running 0 25d
vmware-system-license-operator vmware-system-license-operator-controller-manager-7d555768w7v77 1/1 Running 0 25d
vmware-system-logging fluentbit-p24gk 1/1 Running 0 27d
vmware-system-logging fluentbit-rj2t8 1/1 Running 0 27d
vmware-system-logging fluentbit-xx2lk 1/1 Running 0 27d
vmware-system-nsop vmware-system-nsop-controller-manager-65b8445959-66msw 1/1 Running 0 27d
vmware-system-nsop vmware-system-nsop-controller-manager-65b8445959-nm6xh 1/1 Running 0 27d
vmware-system-nsop vmware-system-nsop-controller-manager-65b8445959-sv5w7 1/1 Running 0 27d
vmware-system-nsx nsx-ncp-6f989c9c67-vb4x6 1/1 Running 5 (27d ago) 27d
vmware-system-registry vmware-registry-controller-manager-7f49485b9-72kh7 2/2 Running 0 27d
vmware-system-tkg masterproxy-tkgs-plugin-8npzx 1/1 Running 0 27d
vmware-system-tkg masterproxy-tkgs-plugin-bjtsz 1/1 Running 0 27d
vmware-system-tkg masterproxy-tkgs-plugin-v92gt 1/1 Running 0 27d
vmware-system-tkg tkgs-plugin-server-5fc4c985c7-bz8jh 1/1 Running 0 27d
vmware-system-tkg tkgs-plugin-server-5fc4c985c7-r9wj5 1/1 Running 0 27d
vmware-system-tkg tkgs-plugin-server-5fc4c985c7-sdr55 1/1 Running 0 27d
vmware-system-tkg vmware-system-tkg-controller-manager-7ffcc55df5-dqkkm 2/2 Running 0 25d
vmware-system-tkg vmware-system-tkg-controller-manager-7ffcc55df5-hkvx9 2/2 Running 0 25d
vmware-system-tkg vmware-system-tkg-controller-manager-7ffcc55df5-txxrf 2/2 Running 0 25d
vmware-system-tkg vmware-system-tkg-state-metrics-5bbb6d668c-7c5vt 2/2 Running 238 (26d ago) 27d
vmware-system-tkg vmware-system-tkg-state-metrics-5bbb6d668c-c87zs 2/2 Running 237 (26d ago) 27d
vmware-system-tkg vmware-system-tkg-state-metrics-5bbb6d668c-wc46p 2/2 Running 237 (26d ago) 27d
vmware-system-tkg vmware-system-tkg-webhook-567f9fd68c-425xs 2/2 Running 0 25d
vmware-system-tkg vmware-system-tkg-webhook-567f9fd68c-97d6z 2/2 Running 0 25d
vmware-system-tkg vmware-system-tkg-webhook-567f9fd68c-dnkgt 2/2 Running 0 25d
vmware-system-ucs upgrade-compatibility-service-5745846d58-tpk67 1/1 Running 0 27d
vmware-system-ucs upgrade-compatibility-service-5745846d58-twxkt 1/1 Running 0 27d
vmware-system-ucs upgrade-compatibility-service-5745846d58-wzl8x 1/1 Running 0 27d
vmware-system-vmop vmware-system-vmop-controller-manager-c8499b9df-5h6f9 2/2 Running 0 27d
vmware-system-vmop vmware-system-vmop-controller-manager-c8499b9df-6wgr7 2/2 Running 0 27d
vmware-system-vmop vmware-system-vmop-controller-manager-c8499b9df-tvbg6 2/2 Running 0 27d
vmware-system-vmop vmware-system-vmop-hostvalidator-8498cc5f4d-vqhnk 1/1 Running 0 27d
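A pod can report Running while its containers restart repeatedly; in the example result, several vmware-system-capw controller pods show hundreds of restarts. The following minimal sketch lists pods whose RESTARTS value exceeds a threshold. It assumes the default column layout of kubectl get pods -A. The function name high_restart_pods and the default threshold of 10 are hypothetical choices, not part of any VMware tooling.

```shell
# Minimal sketch (hypothetical helper, not VMware tooling).
# Reads `kubectl get pods -A` output on stdin and prints
# "NAMESPACE NAME RESTARTS" for every pod whose restart count is above
# the threshold given as $1 (default 10, an arbitrary assumption).
# RESTARTS is the fifth column; a trailing "(26d ago)" annotation becomes
# separate awk fields, so $5 still holds only the number.
high_restart_pods() {
  local threshold="${1:-10}"
  awk -v t="$threshold" '
    NR == 1 && $1 == "NAMESPACE" { next }   # skip the header row
    ($5 + 0) > (t + 0) { print $1, $2, $5 }
  '
}
```

Usage: kubectl get pods -A | high_restart_pods 100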
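To investigate a pod that is not running as expected, describe it and review its container states and events:
kubectl describe pod <POD Name> -n <Namespace>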
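When the pod list is long, a filter can pick out the pods worth describing. The following sketch is one way to do it, assuming the default kubectl get pods -A column layout: it flags a pod when its STATUS is neither Running nor Completed, or when it is Running but not all of its containers are ready (READY such as 1/2), and prints the matching kubectl describe pod command. The helper name describe_unhealthy_pods is hypothetical. It prints the commands instead of running them so that you can review them first.

```shell
# Minimal sketch (hypothetical helper, not VMware tooling).
# Reads `kubectl get pods -A` output on stdin and prints a
# `kubectl describe pod` command for each pod that looks unhealthy.
# Assumes the default column layout: NAMESPACE NAME READY STATUS RESTARTS AGE.
describe_unhealthy_pods() {
  awk '
    NR == 1 && $1 == "NAMESPACE" { next }   # skip the header row
    $4 == "Completed" { next }              # finished job pods are expected
    {
      split($3, r, "/")                     # READY column, e.g. "1/2"
      if ($4 != "Running" || r[1] != r[2])
        printf "kubectl describe pod %s -n %s\n", $2, $1
    }
  '
}
```

Usage: kubectl get pods -A | describe_unhealthy_pods. The -A flag matters here, because without it kubectl omits the NAMESPACE column and the field positions shift.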
Check the State of Supervisor Resources
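Check the state of the Supervisor resources that TKG uses. These commands query the current namespace context; add -n <namespace> to query a specific vSphere Namespace.
kubectl get tkc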
kubectl get cluster-api
kubectl get virtualmachines,virtualmachineservices,virtualmachinesetresourcepolicies
kubectl get virtualmachineimages
kubectl get persistentvolumeclaims,cnsnodevmattachment,cnsvolumemetadatas
kubectl get service,lb,lbm,vnet,vnetif,nsxerrors,nsxnetworkinterfaces
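To save every namespaced resource in a vSphere Namespace to a file for later review, run the following command. The --verbs=list option restricts the query to resource types that can be listed.
kubectl api-resources --verbs=list --namespaced -o name | paste -d',' -s | xargs kubectl get -n <namespace> > resources_in_namespace.txt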
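To keep the output for each resource type separate, for example to compare a single type across namespaces, the following sketch writes each namespaced, listable resource type to its own file and captures any error message for that type alongside it. The function name dump_namespace_resources, the default output directory resources_in_namespace, and the optional KUBECTL variable (defaults to kubectl) are hypothetical conveniences.

```shell
# Minimal sketch (hypothetical helper, not VMware tooling).
# $1 = namespace, $2 = output directory (default: resources_in_namespace).
# KUBECTL optionally overrides the kubectl binary (assumption for testing).
dump_namespace_resources() {
  local ns="$1" outdir="${2:-resources_in_namespace}" kc="${KUBECTL:-kubectl}"
  mkdir -p "$outdir" || return 1
  # Restrict to types that support the list verb, which kubectl get needs.
  "$kc" api-resources --verbs=list --namespaced -o name |
    while read -r res; do
      # Names such as virtualmachines.vmoperator.vmware.com contain no "/",
      # so they are safe to use as file names.
      "$kc" get "$res" -n "$ns" --ignore-not-found > "$outdir/$res.txt" 2>&1
    done
}
```

Usage: dump_namespace_resources <namespace>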
Verify that Cluster API Deployments Are Present
Check that the Cluster API deployments in the vmware-system-capw namespace are present and that all of their replicas are ready:
kubectl -n vmware-system-capw get deployments.apps
Example result:
NAME READY UP-TO-DATE AVAILABLE AGE
capi-controller-manager 2/2 2 2 18h
capi-kubeadm-bootstrap-controller-manager 2/2 2 2 18h
capi-kubeadm-control-plane-controller-manager 2/2 2 2 18h
capv-controller-manager 2/2 2 2 10h
capw-controller-manager 2/2 2 2 18h
capw-webhook 2/2 2 2 18h
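This check can be scripted. The following sketch reads the deployment list on stdin and reports any expected deployment that is missing or not fully ready (READY such as 1/2 or 0/2), returning a non-zero exit status if it reports anything. The expected names are copied from the example result and are an assumption; the set of deployments may differ between Supervisor versions. The function name check_capi_deployments is hypothetical.

```shell
# Minimal sketch (hypothetical helper, not VMware tooling).
# Reads `kubectl -n vmware-system-capw get deployments.apps` output on stdin
# (columns: NAME READY UP-TO-DATE AVAILABLE AGE), reports expected
# deployments that are missing or not fully ready, and exits non-zero if
# it reports anything. The expected names are copied from the example
# result and may differ between Supervisor versions.
check_capi_deployments() {
  awk '
    BEGIN {
      list = "capi-controller-manager capi-kubeadm-bootstrap-controller-manager"
      list = list " capi-kubeadm-control-plane-controller-manager capv-controller-manager"
      list = list " capw-controller-manager capw-webhook"
      n = split(list, want, " ")
    }
    $1 == "NAME" { next }                   # skip the header row
    {
      split($2, r, "/")                     # READY column, e.g. "2/2"
      ready[$1] = (r[1] == r[2] && r[1] > 0)
    }
    END {
      bad = 0
      for (i = 1; i <= n; i++) {
        if (!(want[i] in ready))  { print "missing: " want[i]; bad = 1 }
        else if (!ready[want[i]]) { print "not ready: " want[i]; bad = 1 }
      }
      exit bad
    }
  '
}
```

Usage: kubectl -n vmware-system-capw get deployments.apps | check_capi_deployments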
Check Support Bundle Files
The commands/ folder in the support bundle contains journalctl logs that describe what happened during the WCP (Workload Control Plane) start-up process. For TKG issues, also review the following files in the support bundle:
kubectl_describe_virtualmachine.txt
kubectl_describe_tanzukubernetescluster.txt
kubectl_describe_kubeadmconfig.txt
kubectl-describe-pod_kube-system.txt
kubectl-describe-pod_vmware-system-capw.txt
kubectl-describe-pod_vmware-system-tkg.txt
kubectl-describe-pod_vmware-system-ucs.txt
kubectl-describe-pod_vmware-system-vmop.txt
kubectl_describe_cluster_resource_virtualmachineimages.txt
docker_images.txt
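Events in kubectl describe output are typed Normal or Warning, so Warning events are a quick place to start. The following sketch searches the describe files listed above in an extracted support bundle for lines containing Warning. It uses find because the exact location of these files inside the bundle is not given here. The bundle directory argument and the function name bundle_warnings are hypothetical, and matches can include lines that mention Warning outside an Events section.

```shell
# Minimal sketch (hypothetical helper, not VMware tooling).
# $1 is the directory where the support bundle was extracted. Prints
# file:line:text for every line containing "Warning" in the kubectl
# describe files named in this topic. Returns non-zero if nothing matches.
bundle_warnings() {
  local bundle_dir="$1"
  find "$bundle_dir" -type f \
    \( -name 'kubectl_describe_*.txt' -o -name 'kubectl-describe-pod_*.txt' \) \
    -exec grep -H -n 'Warning' {} +
}
```

Usage: bundle_warnings <path to extracted support bundle>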
Check the Health of a TKG Cluster
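To check the health of a TKG cluster, log in to the cluster and verify that all of its nodes are Ready:
kubectl get nodes -o wide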
Example result:
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
tkgs-cluster-13-control-plane-dpmjj Ready control-plane,master 12d v1.22.9+vmware.1 10.244.0.25 <none> VMware Photon OS/Linux 4.19.225-3.ph3 containerd://1.5.11
tkgs-cluster-13-control-plane-nb5r6 Ready control-plane,master 12d v1.22.9+vmware.1 10.244.0.18 <none> VMware Photon OS/Linux 4.19.225-3.ph3 containerd://1.5.11
tkgs-cluster-13-control-plane-zpcgs Ready control-plane,master 12d v1.22.9+vmware.1 10.244.0.26 <none> VMware Photon OS/Linux 4.19.225-3.ph3 containerd://1.5.11
tkgs-cluster-13-worker-nodepool-a1-gq458-9d6458d6f-c7t8c Ready <none> 12d v1.22.9+vmware.1 10.244.0.24 <none> VMware Photon OS/Linux 4.19.225-3.ph3 containerd://1.5.11
tkgs-cluster-13-worker-nodepool-a1-gq458-9d6458d6f-slzvn Ready <none> 12d v1.22.9+vmware.1 10.244.0.19 <none> VMware Photon OS/Linux 4.19.225-3.ph3 containerd://1.5.11
tkgs-cluster-13-worker-nodepool-a1-gq458-9d6458d6f-vzrsd Ready <none> 12d v1.22.9+vmware.1 10.244.0.22 <none> VMware Photon OS/Linux 4.19.225-3.ph3 containerd://1.5.11
tkgs-cluster-13-worker-nodepool-a2-tw99z-7b547b7f85-k5h4s Ready <none> 12d v1.22.9+vmware.1 10.244.0.20 <none> VMware Photon OS/Linux 4.19.225-3.ph3 containerd://1.5.11
tkgs-cluster-13-worker-nodepool-a2-tw99z-7b547b7f85-lkmdx Ready <none> 12d v1.22.9+vmware.1 10.244.0.21 <none> VMware Photon OS/Linux 4.19.225-3.ph3 containerd://1.5.11
tkgs-cluster-13-worker-nodepool-a2-tw99z-7b547b7f85-qwv98 Ready <none> 12d v1.22.9+vmware.1 10.244.0.23 <none> VMware Photon OS/Linux 4.19.225-3.ph3 containerd://1.5.11
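In a healthy cluster every node reports Ready, as in the example result. The following sketch prints any node whose STATUS is something else, such as NotReady or Ready,SchedulingDisabled, and returns a non-zero exit status if it finds one. It assumes STATUS is the second column, which holds for both kubectl get nodes and kubectl get nodes -o wide. The function name not_ready_nodes is hypothetical.

```shell
# Minimal sketch (hypothetical helper, not VMware tooling).
# Reads `kubectl get nodes` (optionally `-o wide`) output on stdin and
# prints "NAME STATUS" for every node whose STATUS is not exactly "Ready".
not_ready_nodes() {
  awk '
    $1 == "NAME" { next }                   # skip the header row
    $2 != "Ready" { print $1, $2; found = 1 }
    END { exit found }                      # 1 if any node was reported
  '
}
```

Usage: kubectl get nodes -o wide | not_ready_nodes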
Verify that all pods in the TKG cluster are running:
kubectl get pods -A
Example result:
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system antrea-agent-58hv7 2/2 Running 0 12d
kube-system antrea-agent-6x897 2/2 Running 0 12d
kube-system antrea-agent-7d99k 2/2 Running 0 12d
kube-system antrea-agent-b7vdv 2/2 Running 0 12d
kube-system antrea-agent-dhdlg 2/2 Running 0 12d
kube-system antrea-agent-mj4wx 2/2 Running 0 12d
kube-system antrea-agent-v7vtv 2/2 Running 0 12d
kube-system antrea-agent-x49gz 2/2 Running 1 (12d ago) 12d
kube-system antrea-agent-z2gth 2/2 Running 0 12d
kube-system antrea-controller-bb59f5fbf-t6cm9 1/1 Running 0 12d
kube-system antrea-resource-init-65b586c9db-2cbxx 1/1 Running 0 12d
kube-system coredns-5f64c4fff8-2gsqn 1/1 Running 0 12d
kube-system coredns-5f64c4fff8-hvkg9 1/1 Running 0 12d
kube-system docker-registry-tkgs-cluster-13-control-plane-dpmjj 1/1 Running 0 12d
kube-system docker-registry-tkgs-cluster-13-control-plane-nb5r6 1/1 Running 0 12d
kube-system docker-registry-tkgs-cluster-13-control-plane-zpcgs 1/1 Running 0 12d
kube-system docker-registry-tkgs-cluster-13-worker-nodepool-a1-gq458-9d6458d6f-c7t8c 1/1 Running 0 12d
kube-system docker-registry-tkgs-cluster-13-worker-nodepool-a1-gq458-9d6458d6f-slzvn 1/1 Running 0 12d
kube-system docker-registry-tkgs-cluster-13-worker-nodepool-a1-gq458-9d6458d6f-vzrsd 1/1 Running 0 12d
kube-system docker-registry-tkgs-cluster-13-worker-nodepool-a2-tw99z-7b547b7f85-k5h4s 1/1 Running 0 12d
kube-system docker-registry-tkgs-cluster-13-worker-nodepool-a2-tw99z-7b547b7f85-lkmdx 1/1 Running 0 12d
kube-system docker-registry-tkgs-cluster-13-worker-nodepool-a2-tw99z-7b547b7f85-qwv98 1/1 Running 0 12d
kube-system etcd-tkgs-cluster-13-control-plane-dpmjj 1/1 Running 0 12d
kube-system etcd-tkgs-cluster-13-control-plane-nb5r6 1/1 Running 0 12d
kube-system etcd-tkgs-cluster-13-control-plane-zpcgs 1/1 Running 0 12d
kube-system kube-apiserver-tkgs-cluster-13-control-plane-dpmjj 1/1 Running 0 12d
kube-system kube-apiserver-tkgs-cluster-13-control-plane-nb5r6 1/1 Running 0 12d
kube-system kube-apiserver-tkgs-cluster-13-control-plane-zpcgs 1/1 Running 0 12d
kube-system kube-controller-manager-tkgs-cluster-13-control-plane-dpmjj 1/1 Running 0 12d
kube-system kube-controller-manager-tkgs-cluster-13-control-plane-nb5r6 1/1 Running 1 (12d ago) 12d
kube-system kube-controller-manager-tkgs-cluster-13-control-plane-zpcgs 1/1 Running 0 12d
kube-system kube-proxy-4kp57 1/1 Running 0 12d
kube-system kube-proxy-5q8pw 1/1 Running 0 12d
kube-system kube-proxy-5th6p 1/1 Running 0 12d
kube-system kube-proxy-8m6mx 1/1 Running 0 12d
kube-system kube-proxy-dn5lp 1/1 Running 0 12d
kube-system kube-proxy-qgmcg 1/1 Running 0 12d
kube-system kube-proxy-vbq27 1/1 Running 0 12d
kube-system kube-proxy-xhnws 1/1 Running 0 12d
kube-system kube-proxy-zgfvn 1/1 Running 0 12d
kube-system kube-scheduler-tkgs-cluster-13-control-plane-dpmjj 1/1 Running 0 12d
kube-system kube-scheduler-tkgs-cluster-13-control-plane-nb5r6 1/1 Running 1 (12d ago) 12d
kube-system kube-scheduler-tkgs-cluster-13-control-plane-zpcgs 1/1 Running 0 12d
kube-system metrics-server-774bc4dc99-qp7tb 1/1 Running 0 12d
vmware-system-auth guest-cluster-auth-svc-6m6cd 1/1 Running 0 12d
vmware-system-auth guest-cluster-auth-svc-h44xf 1/1 Running 0 12d
vmware-system-auth guest-cluster-auth-svc-l968n 1/1 Running 0 12d
vmware-system-cloud-provider guest-cluster-cloud-provider-5f87d5d7d8-rmd78 1/1 Running 1 (12d ago) 12d
vmware-system-csi vsphere-csi-controller-7d858778bd-h7zhg 6/6 Running 4 (12d ago) 12d
vmware-system-csi vsphere-csi-controller-7d858778bd-rkl98 6/6 Running 0 12d
vmware-system-csi vsphere-csi-controller-7d858778bd-snmk7 6/6 Running 0 12d
vmware-system-csi vsphere-csi-node-22fnt 3/3 Running 1 (12d ago) 12d
vmware-system-csi vsphere-csi-node-5jtbr 3/3 Running 0 12d
vmware-system-csi vsphere-csi-node-87lz6 3/3 Running 0 12d
vmware-system-csi vsphere-csi-node-gp9sf 3/3 Running 0 12d
vmware-system-csi vsphere-csi-node-k2psv 3/3 Running 0 12d
vmware-system-csi vsphere-csi-node-mg8bw 3/3 Running 0 12d
vmware-system-csi vsphere-csi-node-pctmv 3/3 Running 0 12d
vmware-system-csi vsphere-csi-node-sslrl 3/3 Running 1 (12d ago) 12d
vmware-system-csi vsphere-csi-node-zbqbq 3/3 Running 0 12d
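From the Supervisor, in the vSphere Namespace where the cluster is provisioned, check the status of the TanzuKubernetesCluster resource and review its conditions and events:
kubectl get tkc <clustername>
kubectl describe tkc <clustername>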
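To script this check, the following sketch reads only the Ready condition of the cluster through kubectl JSONPath output. It assumes that the TanzuKubernetesCluster API version in your environment reports status.conditions with a condition of type Ready; if it does not, fall back to kubectl describe tkc. The function name tkc_ready and the optional KUBECTL variable (defaults to kubectl) are hypothetical.

```shell
# Minimal sketch (hypothetical helper, not VMware tooling).
# $1 = TKC name, $2 = vSphere Namespace. Prints the Ready condition status
# and succeeds only when it is "True". Assumes the TKC reports
# status.conditions with type Ready (older API versions may not).
# KUBECTL optionally overrides the kubectl binary (assumption for testing).
tkc_ready() {
  local name="$1" ns="$2" kc="${KUBECTL:-kubectl}" status
  status=$("$kc" get tkc "$name" -n "$ns" \
    -o jsonpath='{.status.conditions[?(@.type=="Ready")].status}') || return 2
  echo "${name}: Ready=${status:-unknown}"
  [ "$status" = "True" ]
}
```

Usage: tkc_ready <clustername> <namespace>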
Check TKG Controller Manager Health
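Check the TKG controller manager deployment in the Supervisor. In the YAML output, the status section shows how many replicas are ready and available and lists the deployment conditions.
kubectl get deployments -n vmware-system-tkg vmware-system-tkg-controller-manager -o yaml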
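If you only need the replica counts rather than the full YAML, the following sketch compares .status.readyReplicas with .spec.replicas for a deployment through JSONPath. Because readyReplicas is omitted from the status when no replica is ready, an empty value is treated as 0. The function name deployment_ready and the optional KUBECTL variable (defaults to kubectl) are hypothetical.

```shell
# Minimal sketch (hypothetical helper, not VMware tooling).
# $1 = namespace, $2 = deployment name. Prints ready/desired replicas and
# succeeds only when they match.
# KUBECTL optionally overrides the kubectl binary (assumption for testing).
deployment_ready() {
  local ns="$1" name="$2" kc="${KUBECTL:-kubectl}" counts ready desired
  counts=$("$kc" get deployment "$name" -n "$ns" \
    -o jsonpath='{.status.readyReplicas}/{.spec.replicas}') || return 2
  ready=${counts%%/*}        # text before the "/"
  desired=${counts#*/}       # text after the "/"
  # readyReplicas is omitted from the status when no replica is ready,
  # so an empty value is treated as 0.
  ready=${ready:-0}
  echo "${ns}/${name}: ${ready}/${desired} replicas ready"
  [ "$ready" = "$desired" ]
}
```

Usage: deployment_ready vmware-system-tkg vmware-system-tkg-controller-manager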
Check VM Operator Health
Check that the VM Operator pods are running:
kubectl get pods -n vmware-system-vmop
Example result:
NAME READY STATUS RESTARTS AGE
vmware-system-vmop-controller-manager-c8499b9df-5h6f9 2/2 Running 0 27d
vmware-system-vmop-controller-manager-c8499b9df-6wgr7 2/2 Running 0 27d
vmware-system-vmop-controller-manager-c8499b9df-tvbg6 2/2 Running 0 27d
vmware-system-vmop-hostvalidator-8498cc5f4d-vqhnk 1/1 Running 0 27d
VM Operator creates the VirtualNetworkInterface and verifies its status. If a node VM does not get an IP address, this is the first area to check: verify whether virtual machine creation completed this phase.
VM Operator is also responsible for reconciling the VirtualMachineService and updating its status. If the Kubernetes API of a TKG cluster is not accessible through its external IP address, check the VM Operator log.
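To check the VM Operator log, run kubectl logs against the manager container of a VM Operator controller pod. (The kubectl logs command operates on a container. Each controller pod contains a manager container whose logs you can check.)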
kubectl logs -f vmware-system-vmop-controller-manager-c8499b9df-5h6f9 -n vmware-system-vmop manager
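To check the manager container of every VM Operator controller pod at once, the following sketch loops over the pods in vmware-system-vmop whose names contain controller-manager and prints recent log lines that mention error. The name filter, the one-hour --since window, and the case-insensitive error pattern are assumptions to adjust as needed. The function name vmop_manager_errors and the optional KUBECTL variable (defaults to kubectl) are hypothetical.

```shell
# Minimal sketch (hypothetical helper, not VMware tooling).
# For each VM Operator controller-manager pod, prints a "== pod/<name>"
# header followed by manager-container log lines from the last hour that
# contain "error" in any letter case.
# KUBECTL optionally overrides the kubectl binary (assumption for testing).
vmop_manager_errors() {
  local ns="vmware-system-vmop" kc="${KUBECTL:-kubectl}"
  "$kc" get pods -n "$ns" -o name | grep 'controller-manager' |
    while read -r pod; do
      echo "== ${pod}"
      # --since limits output to recent entries; grep -i matches Error, ERROR, etc.
      "$kc" logs "$pod" -n "$ns" -c manager --since=1h | grep -i 'error'
    done
}
```

Usage: run vmop_manager_errors with kubectl pointed at the Supervisor.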