Troubleshooting Clusters with the diagnostics Plugin

This topic explains how to use the Tanzu CLI to install a diagnostics plugin and then use its tanzu diagnostics commands to diagnose unstable or unresponsive clusters in Tanzu Kubernetes Grid (TKG) with a standalone management cluster.

The diagnostics plugin is built on Crash Diagnostics (Crashd). For information about using Crashd directly to diagnose workload clusters, see Troubleshooting Workload Clusters with Crash Diagnostics in the TKG v2.4 documentation.

Overview: Crashd

Crashd is an open source project that makes it easy to troubleshoot problems with Kubernetes clusters.

Crashd uses a script file written in Starlark, a Python-like language, that interacts with your management, workload, or bootstrap clusters to collect infrastructure and cluster information.

Crashd collects the output of the commands that the script runs and adds it to a tar file, which it saves locally for further analysis.
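As a minimal sketch of that further analysis, the commands below list and extract a tar archive with standard tar options. The archive name diagnostics.tar and its contents are hypothetical stand-ins created by the sketch itself; the actual file name and layout produced by Crashd may differ.

```shell
# Stand-in for a diagnostics archive. The file name and contents are
# hypothetical, created here only so the commands below can run anywhere.
workdir="$(mktemp -d)"
mkdir -p "$workdir/src/node-1"
echo "sample log line" > "$workdir/src/node-1/kubelet.log"
tar -cf "$workdir/diagnostics.tar" -C "$workdir/src" node-1

# List the archive contents without extracting anything.
tar -tf "$workdir/diagnostics.tar"

# Extract into a separate directory for closer inspection.
mkdir -p "$workdir/extracted"
tar -xf "$workdir/diagnostics.tar" -C "$workdir/extracted"
```

Listing first is a cheap way to confirm that the archive contains the nodes and logs you expect before unpacking it.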

The Tanzu CLI diagnostics plugin includes an embedded copy of Crashd and default diagnostics script files for collecting data from all cluster types.

Install the `diagnostics` Plugin

To install the Tanzu CLI diagnostics plugin, run:

tanzu plugin install diagnostics

Collect Diagnostics from Workload Clusters

To use the diagnostics plugin to collect workload cluster diagnostics, run:

tanzu diagnostics cluster get --name WORKLOAD-CLUSTER-NAME

Collect Diagnostics from Bootstrap Clusters

To use the diagnostics plugin to collect bootstrap cluster diagnostics, run:

tanzu diagnostics bootstrap-cluster get --name BOOTSTRAP-CLUSTER-NAME

Collect Diagnostics from Management Clusters

To collect diagnostics from a management cluster, pass a list of target node IP addresses to the --node-ips option. One way to do this is to embed a kubectl query piped through awk, which skips the header row and prints the sixth column (INTERNAL-IP) of each node:

tanzu diagnostics management-cluster get --node-ips $(kubectl get node -o wide | awk 'NR>1 {print $6}')
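To see what the awk filter in the command above extracts, the following sketch runs it against a small sample of `kubectl get node -o wide` style output. The helper name extract_node_ips and the node names and addresses are made up for illustration, and the sample is trimmed to the first eight columns.

```shell
# Hypothetical helper: reads `kubectl get node -o wide` style output on stdin
# and prints the sixth column (INTERNAL-IP) of every row after the header.
extract_node_ips() {
  awk 'NR>1 {print $6}'
}

# Illustrative sample output. Node names and addresses are invented.
sample_nodes='NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE
mgmt-cp-1 Ready control-plane 10d v1.27.5 10.0.0.11 <none> Ubuntu
mgmt-md-1 Ready <none> 10d v1.27.5 10.0.0.21 <none> Ubuntu'

# Prints one internal IP address per line.
printf '%s\n' "$sample_nodes" | extract_node_ips
```

Running the kubectl and awk pipeline on its own before embedding it in the tanzu command is a quick way to confirm that it returns the node addresses you intend to target.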