Check TKG Cluster Health Using Kubectl

When the TKG Controller provisions a workload cluster, it reports several status conditions. You can use these conditions to get direct insight into key aspects of cluster health.

About Cluster Health Conditions

A provisioned TKG cluster has several moving parts. Independent but related controllers operate these parts and work together to build and maintain a set of Kubernetes nodes. The TanzuKubernetesCluster and Cluster objects report status conditions that give you fine-grained information about cluster and machine health.

Check Cluster Health

To check the health of a TKG cluster:

Run the command `kubectl describe cluster`.

If the `Ready` condition has the status `True`, both the cluster infrastructure and the cluster control plane are ready. For example:

```
Status:
  Conditions:
    Last Transition Time:     2020-11-24T21:37:32Z
    Status:                   True
    Type:                     Ready
    Last Transition Time:     2020-11-24T21:37:32Z
    Status:                   True
    Type:                     ControlPlaneReady
    Last Transition Time:     2020-11-24T21:31:34Z
    Status:                   True
    Type:                     InfrastructureReady
```
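To check only the summary `Ready` condition without reading the full `describe` output, you can query it directly. The bash functions below are a minimal sketch. They assume that the Cluster object exposes its conditions under `.status.conditions` with lowercase `type` and `status` keys, as Cluster API objects do. The function names and the 10-minute default timeout are hypothetical. The sketch relies only on kubectl's standard JSONPath filter syntax and the `kubectl wait --for=condition=...` command.

```shell
# Sketch only: the function names and the default timeout are hypothetical.
# Assumes conditions live under .status.conditions with lowercase keys.

# Print the status (True, False, or Unknown) of the Ready condition.
ready_status() {
  kubectl get cluster "$1" -n "$2" \
    -o jsonpath='{.status.conditions[?(@.type=="Ready")].status}'
}

# Succeed (exit 0) only when the Ready condition is True.
cluster_is_ready() {
  [ "$(ready_status "$1" "$2")" = "True" ]
}

# Block until Ready is True, or fail after the timeout (default 10m).
wait_for_ready() {
  kubectl wait --for=condition=Ready "cluster/$1" -n "$2" --timeout="${3:-10m}"
}
```

For example, with a hypothetical cluster `tkg-cluster-1` in namespace `tkg-ns`, `cluster_is_ready tkg-cluster-1 tkg-ns && echo ready` prints `ready` only when the summary condition is `True`. `wait_for_ready` suits scripts that must pause until provisioning completes.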

If a cluster condition is `False`, the cluster is not ready, and the condition's `Message` field describes what is wrong. In the following example, the `Ready` status is `False` because the infrastructure is not ready:

```
Status:
  Conditions:
    Last Transition Time:     2020-11-24T21:37:32Z
    Status:                   False
    Type:                     Ready
    Last Transition Time:     2020-11-24T21:37:32Z
    Status:                   True
    Type:                     ControlPlaneReady
    Last Transition Time:     2020-11-24T21:31:34Z
    Status:                   False
    Type:                     InfrastructureReady
```
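When a cluster is not ready, it helps to list only the conditions that are not `True`. The following bash sketch uses kubectl's JSONPath `range` support to print one tab-separated `type`, `status`, `reason` line per condition. It then filters that output with `awk`. It assumes the same lowercase `.status.conditions` layout as Cluster API objects. The function names are hypothetical.

```shell
# Sketch only: function names are hypothetical.

# Print one "type<TAB>status<TAB>reason" line per condition on the Cluster.
list_conditions() {
  kubectl get cluster "$1" -n "$2" \
    -o jsonpath='{range .status.conditions[*]}{.type}{"\t"}{.status}{"\t"}{.reason}{"\n"}{end}'
}

# Keep only conditions whose status is not True (that is, False or Unknown).
not_true() {
  awk -F'\t' '$2 != "True"'
}
```

For the example above, `list_conditions tkg-cluster-1 tkg-ns | not_true` (with a hypothetical cluster name and namespace) would show the `Ready` and `InfrastructureReady` lines and hide `ControlPlaneReady`.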

If the cluster is not ready because `InfrastructureReady` is `False`, run the following command to find out what is wrong with the cluster infrastructure:
```
kubectl describe vspherecluster
```
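If a namespace holds several clusters, you may want to describe only the infrastructure object that belongs to one of them. The bash sketch below assumes that the Cluster object points to that object through `.spec.infrastructureRef`, as Cluster API does. It reads the `kind` and `name` from that field and passes them to `kubectl describe` in `TYPE/NAME` form. The function names are hypothetical.

```shell
# Sketch only: function names are hypothetical; relies on the Cluster API
# .spec.infrastructureRef field to locate the infrastructure object.

# Print "<Kind>/<name>" of the infrastructure object referenced by the Cluster.
infra_ref() {
  kubectl get cluster "$1" -n "$2" \
    -o jsonpath='{.spec.infrastructureRef.kind}/{.spec.infrastructureRef.name}'
}

# Describe that infrastructure object (for example, a VSphereCluster).
describe_infra() {
  kubectl describe "$(infra_ref "$1" "$2")" -n "$2"
}
```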

List of Cluster Health Conditions

The table lists and defines the available health conditions for a TKG cluster.

| Condition | Description |
| --- | --- |
| `Ready` | Summarizes the operational state of a Cluster API object. |
| `Deleting` | The status is not `True` because the underlying object is currently being deleted. |
| `DeletionFailed` | The status is not `True` because the underlying object encountered problems during deletion. This is a warning because the reconciler will retry deletion. |
| `Deleted` | The status is not `True` because the underlying object was deleted. |
| `InfrastructureReady` | Reports a summary of the current status of the infrastructure object defined for this cluster. |
| `WaitingForInfrastructure` | Reported when a cluster is waiting for the underlying infrastructure to be available. NOTE: This condition is used as a fallback when the infrastructure is not reporting a ready state. |
| `ControlPlaneReady` | Reported when the cluster control plane is ready. |
| `WaitingForControlPlane` | Reported when a cluster is waiting for the control plane to be available. NOTE: This condition is used as a fallback when the control plane is not reporting a ready state. |

Condition Fields

Each condition can contain the following fields.

| Field | Description |
| --- | --- |
| `Type` | The type of condition, for example `ControlPlaneReady`. The `Ready` condition summarizes all the other conditions. |
| `Status` | The status of the condition: `True`, `False`, or `Unknown`. |
| `Severity` | Classification of the `Reason`. `Info` means reconciliation is in progress. `Warning` means something might be wrong and the operation will be retried. `Error` means an error occurred and manual action is required to resolve it. |
| `Reason` | Explains why the status is `False`. It can be a waiting-for-ready reason or a failure reason. It is usually set only when the status is `False`. |
| `Message` | Human-readable information that explains the `Reason`. |
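Because `Severity` separates conditions that will resolve on their own (`Info`, `Warning`) from those that need manual action (`Error`), a script can surface only the latter. The bash sketch below prints every condition's fields as tab-separated lines. It then keeps the `Error` entries and prints their type, message, and reason. It assumes lowercase `type`, `status`, `severity`, `reason`, and `message` keys under `.status.conditions`, as in Cluster API. The function names are hypothetical.

```shell
# Sketch only: function names are hypothetical.

# Print type, status, severity, reason, and message for each condition,
# separated by tabs, one condition per line.
condition_fields() {
  kubectl get cluster "$1" -n "$2" \
    -o jsonpath='{range .status.conditions[*]}{.type}{"\t"}{.status}{"\t"}{.severity}{"\t"}{.reason}{"\t"}{.message}{"\n"}{end}'
}

# Keep only conditions with Severity Error, formatted as "Type: Message (Reason)".
needs_action() {
  awk -F'\t' '$3 == "Error" { printf "%s: %s (%s)\n", $1, $5, $4 }'
}
```

Conditions whose status is `True` normally carry no severity, so `condition_fields tkg-cluster-1 tkg-ns | needs_action` (with hypothetical names) prints nothing for a healthy cluster.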