Pull Logs to Troubleshoot TKG Clusters on Supervisor

Refer to this topic to pull various logs for troubleshooting TKG clusters on Supervisor, including a Supervisor support bundle, the Workload Management log, and CAPI, CAPV, VM Operator, and TKG Controller Manager logs.

Collect a Support Bundle for Supervisor

To troubleshoot TKG cluster errors, you can export the Supervisor logs. Typically, these logs are reviewed in consultation with VMware Support.

Log in to your vSphere IaaS control plane environment using the vSphere Client.
Select Menu > Workload Management.
Select the Supervisor tab.
Select the target Supervisor instance.
Select Export Logs.

Once you have collected the support bundle, refer to the following KB article: Uploading diagnostic information for VMware through the Secure FTP portal: http://kb.vmware.com/kb/2069559. See also Gathering Logs for vSphere with Tanzu.

Collect a Support Bundle for a TKG Cluster

You can use the TKC Support Bundler utility to collect TKG cluster log files and troubleshoot problems.

To obtain and use the TKC Support Bundler utility, refer to the article Gathering Logs for vSphere with Tanzu at the VMware Support Knowledge Base.

To collect logs from Windows nodes, provision the Windows nodes with a built-in administrative account when building Windows node images. For information on how to build the Windows image with a custom answer file, see the Provision Administrative Account for Log Collection documentation.

Tail the Workload Management Log File

Tailing the Workload Control Plane (WCP) log file can help you troubleshoot Supervisor and TKG cluster errors.

Establish an SSH connection to the vCenter Server Appliance.
Log in as the root user.

Run the command shell to enter the Bash shell.

You see the following:

Shell access is granted to root
root@localhost [ ~ ]#

Run the following command to tail the WCP log file.
```
tail -f /var/log/vmware/wcp/wcpsvc.log
```
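When the live tail is too noisy, it can help to look only at recent error lines. The following is a minimal sketch, not part of the product tooling: wcp_recent_errors is a hypothetical helper name, and the default line count of 20 is an arbitrary assumption. The log path is the one shown above.

```shell
# Hypothetical helper: print the most recent lines that mention "error"
# (case-insensitive) from the WCP log file.
#   $1 = log file (defaults to the WCP log path used in this topic)
#   $2 = maximum number of matching lines to print (assumed default: 20)
wcp_recent_errors() {
  logfile="${1:-/var/log/vmware/wcp/wcpsvc.log}"
  count="${2:-20}"
  grep -i 'error' "$logfile" | tail -n "$count"
}
```

For example, running wcp_recent_errors with no arguments on the vCenter Server Appliance prints up to 20 matching lines from wcpsvc.log.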

Gather TKG-specific Logs from Supervisor

Supervisor runs several Kubernetes pods that provide infrastructure to TKG 2.0. To list the corresponding deployments, run the following command.

kubectl -n vmware-system-capw get deployments.apps
NAME                                            READY   UP-TO-DATE   AVAILABLE   AGE
capi-controller-manager                         2/2     2            2           18h
capi-kubeadm-bootstrap-controller-manager       2/2     2            2           18h
capi-kubeadm-control-plane-controller-manager   2/2     2            2           18h
capv-controller-manager                         2/2     2            2           10h
capw-controller-manager                         2/2     2            2           18h
capw-webhook                                    2/2     2            2           18h

The infrastructure pods belong to deployments that run multiple replicas. Determine which replica is the leader and check its logs for the most recent activity. A non-leader usually stops logging after a message about attempting to acquire the lease.
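One rough way to apply this heuristic is to look at the final log line of each replica. The sketch below assumes that a standby replica's last line mentions acquiring a lease, as described above. The exact message text can vary by version, so treat the output as a hint rather than a definitive answer. guess_leader is a hypothetical helper name.

```shell
# Hypothetical helper: guess which replicas are leaders from their last log line.
#   $1 = namespace, $2 = substring of the pod name (for example, capi-controller-manager)
guess_leader() {
  ns="$1"
  pattern="$2"
  kubectl -n "$ns" get pods -o name | grep -- "$pattern" | while read -r pod; do
    # Fetch only the final log line of the "manager" container.
    last=$(kubectl -n "$ns" logs -c manager --tail=1 "$pod" </dev/null)
    case "$last" in
      *acquire*lease*) echo "$pod: probably standby" ;;
      *)               echo "$pod: possibly leader" ;;
    esac
  done
}
```

For example, guess_leader vmware-system-capw capi-controller-manager prints one line per matching replica.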

You will need to log in to Supervisor and use the Supervisor vSphere Namespace to check these pods.

Retrieving logs with a label selector may not work, so you may have to supply the full pod name, including the random string appended to the end of it. Piping the output to grep 'error' or grep -i 'error' is often a useful start, for example: kubectl logs <args> | grep error.
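To avoid typing the random suffix by hand, you can look up the full pod names first and then filter each pod's log. This is a minimal sketch: errors_for is a hypothetical helper name, and the container name manager is taken from the commands in this topic.

```shell
# Hypothetical helper: print error lines from every pod whose name matches.
#   $1 = namespace, $2 = substring of the pod name
errors_for() {
  ns="$1"
  pattern="$2"
  # "get pods -o name" prints entries such as pod/<name>-<id>.
  for pod in $(kubectl -n "$ns" get pods -o name | grep -- "$pattern"); do
    echo "== $pod"
    # Case-insensitive match; "|| true" keeps going when a pod has no error lines.
    kubectl -n "$ns" logs -c manager "$pod" | grep -i 'error' || true
  done
}
```

For example, errors_for vmware-system-vmop controller-manager prints a header for each matching pod followed by its error lines.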

CAPI logs

Cluster API provider:

kubectl logs -n vmware-system-capw -c manager vmware-system-capw-capi-controller-manager-<id>

CAPV logs

Cluster API for vSphere provider:

kubectl logs -n vmware-system-capv -c manager vmware-system-capw-v1alpha3-vmware-system-capv-v1alpha3-controller-manager-<id>

VM Operator logs

VM Operator:

kubectl logs -n vmware-system-vmop -c manager vmware-system-vmop-controller-manager-<id>

TKG Controller Manager logs

TKG Controller Manager (GCM):

kubectl logs -n vmware-system-tkg -c manager vmware-system-tkg-controller-manager-<id>
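If you want to capture all of these logs in one pass, for example to attach them to a support request, the commands above can be wrapped in a loop. The sketch below is illustrative only: collect_tkg_logs and the default output directory ./tkg-logs are hypothetical, and the script assumes that every relevant pod has controller-manager in its name and a container called manager, as in the commands above.

```shell
# Hypothetical helper: save controller-manager logs from each namespace to files.
#   $1 = output directory (assumed default: ./tkg-logs)
collect_tkg_logs() {
  outdir="${1:-./tkg-logs}"
  mkdir -p "$outdir" || return 1
  for ns in vmware-system-capw vmware-system-capv vmware-system-vmop vmware-system-tkg; do
    for pod in $(kubectl -n "$ns" get pods -o name | grep controller-manager); do
      # $pod looks like pod/<name>; strip the pod/ prefix for the file name.
      kubectl -n "$ns" logs -c manager "$pod" > "$outdir/$ns-${pod#pod/}.log"
    done
  done
}
```

Each replica gets its own file, so you can still compare the leader and standby logs afterward.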