When the TKG Controller provisions a workload cluster, several status conditions are reported that you can use to get direct insight into key aspects of cluster health.

About Cluster Health Conditions

A provisioned TKG cluster comprises several moving parts, all operated by independent but related controllers, working together to build and maintain a set of Kubernetes nodes. The TanzuKubernetesCluster and Cluster objects provide status conditions that give you fine-grained information about cluster and machine health.

Check Cluster Health

To check the health of a TKG cluster:
  1. Run the command kubectl describe cluster.
    If the Ready condition is True, both the cluster infrastructure and the cluster control plane are ready. For example:
    Status:
      Conditions:
        Last Transition Time:     2020-11-24T21:37:32Z
        Status:                   True
        Type:                     Ready
        Last Transition Time:     2020-11-24T21:37:32Z
        Status:                   True
        Type:                     ControlPlaneReady
        Last Transition Time:     2020-11-24T21:31:34Z
        Status:                   True
        Type:                     InfrastructureReady
    If a cluster condition is False, the cluster is not ready, and a message field describes what is wrong. For example, here the Ready status is False because the infrastructure is not ready:
    Status:
      Conditions:
        Last Transition Time:     2020-11-24T21:37:32Z
        Status:                   False
        Type:                     Ready
        Last Transition Time:     2020-11-24T21:37:32Z
        Status:                   True
        Type:                     ControlPlaneReady
        Last Transition Time:     2020-11-24T21:31:34Z
        Status:                   False
        Type:                     InfrastructureReady
  2. If the cluster is not ready, run the following command to determine what is wrong with the cluster infrastructure:
    kubectl describe vspherecluster
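
The readiness check in the steps above can also be scripted. The following is a minimal sketch, not an official tool. It assumes the cluster object has been exported as JSON (for example with kubectl get cluster NAME -o json) and that conditions appear under status.conditions with the lowercase field names type and status. The function names and the SAMPLE data are hypothetical and only mirror the not-ready example shown in step 1.

```python
import json


def load_conditions(cluster_json):
    """Return the list under status.conditions, or an empty list if absent."""
    cluster = json.loads(cluster_json)
    return cluster.get("status", {}).get("conditions", [])


def is_cluster_ready(cluster_json):
    """True only if a condition of type Ready is present with status "True"."""
    return any(
        c.get("type") == "Ready" and c.get("status") == "True"
        for c in load_conditions(cluster_json)
    )


def not_ready_conditions(cluster_json):
    """Return every condition whose status is False or Unknown."""
    return [c for c in load_conditions(cluster_json) if c.get("status") != "True"]


# Hypothetical sample that mirrors the not-ready example in step 1.
SAMPLE = json.dumps({
    "status": {
        "conditions": [
            {"type": "Ready", "status": "False",
             "lastTransitionTime": "2020-11-24T21:37:32Z"},
            {"type": "ControlPlaneReady", "status": "True",
             "lastTransitionTime": "2020-11-24T21:37:32Z"},
            {"type": "InfrastructureReady", "status": "False",
             "lastTransitionTime": "2020-11-24T21:31:34Z"},
        ]
    }
})

if __name__ == "__main__":
    print("ready:", is_cluster_ready(SAMPLE))
    for cond in not_ready_conditions(SAMPLE):
        print("not ready:", cond["type"])
```

If InfrastructureReady appears in the not-ready list, that corresponds to the case where step 2 applies.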

List of Cluster Health Conditions

The table lists and defines the available health conditions for a TKG cluster.

Condition                 Description
Ready                     Summarizes the operational state of a Cluster API object.
Deleting                  The Status is not True because the underlying object is currently being deleted.
DeletionFailed            The Status is not True because the underlying object encountered problems during deletion. This is a warning because the reconciler will retry deletion.
Deleted                   The Status is not True because the underlying object was deleted.
InfrastructureReady       Reports a summary of the current status of the infrastructure object defined for this cluster.
WaitingForInfrastructure  Reported when a cluster is waiting for the underlying infrastructure to be available. NOTE: This condition is used as a fallback when the infrastructure is not reporting a ready state.
ControlPlaneReady         Reported when the cluster control plane is ready.
WaitingForControlPlane    Reported when a cluster is waiting for the control plane to be available. NOTE: This condition is used as a fallback when the control plane is not reporting a ready state.

Condition Fields

Each condition may contain several fields.
Type      Describes the type of condition, for example ControlPlaneReady. The Ready condition is a summary of all the other conditions.
Status    Describes the status of the type. States can be True, False, or Unknown.
Severity  Classification of the Reason. Info means that reconciliation is in progress. Warning means that something might be wrong and the reconciler will retry. Error means that an error occurred and manual action is required to resolve it.
Reason    Provides a reason why the status is False. It can be a waiting reason (the object is not yet ready) or a failure reason. It is usually set only when the status is False.
Message   Human-readable information that explains the Reason.
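
The way these fields combine can be illustrated with a short sketch. This is an assumption-laden example rather than part of the product: it assumes each condition is available as a dictionary with the lowercase keys type, status, severity, reason, and message. The triage function, the SEVERITY_HINTS table, and the EXAMPLE condition (including its reason and message text) are hypothetical.

```python
# Suggested next step for each Severity value, paraphrasing the field
# descriptions above.
SEVERITY_HINTS = {
    "Info": "reconciliation is in progress; wait and check again",
    "Warning": "something might be wrong; the reconciler will retry",
    "Error": "an error occurred; manual action is required",
}


def triage(condition):
    """Summarize one condition and suggest a next step based on Severity."""
    ctype = condition.get("type", "<unknown type>")
    status = condition.get("status", "Unknown")
    if status == "True":
        return f"{ctype}: OK"
    severity = condition.get("severity")
    hint = SEVERITY_HINTS.get(
        severity, "no severity reported; inspect Reason and Message"
    )
    reason = condition.get("reason", "<no reason>")
    message = condition.get("message", "<no message>")
    return (
        f"{ctype}: status={status} severity={severity} "
        f"reason={reason} message={message!r} -> {hint}"
    )


# Hypothetical condition; the reason and message are placeholders.
EXAMPLE = {
    "type": "InfrastructureReady",
    "status": "False",
    "severity": "Error",
    "reason": "ExampleReason",
    "message": "example message",
}

if __name__ == "__main__":
    print(triage(EXAMPLE))
    print(triage({"type": "ControlPlaneReady", "status": "True"}))
```

Keying the hint on Severity rather than on Reason keeps the sketch independent of the specific reason strings a given controller emits.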