This topic explains how to use the Tanzu Command Line Interface (CLI) to create, update, retrieve, and delete MachineHealthCheck objects for workload clusters created by Tanzu Kubernetes Grid.
For more information, see tanzu cluster machinehealthcheck in the Tanzu CLI Command Reference.
Note: To support machine health checks for both control plane and worker nodes, Tanzu CLI v1.6 and later replace the tanzu cluster machinehealthcheck set/get/delete commands with the tanzu cluster machinehealthcheck control-plane set/get/delete and tanzu cluster machinehealthcheck node set/get/delete commands. The tanzu cluster machinehealthcheck set/get/delete commands are deprecated and will be removed in a future release.
MachineHealthCheck
MachineHealthCheck is a controller that provides health monitoring and auto-repair for machines. It is automatically enabled in all management and workload clusters, for both control plane and worker nodes. If the controller is enabled when you deploy a cluster, Tanzu Kubernetes Grid creates two default MachineHealthCheck objects in the cluster, one for the control plane nodes and one for the worker nodes. These objects are created in the same namespace as the cluster.
If you deactivate the controller when you create a workload cluster, you can re-enable it by using the commands documented in Create or Update a MachineHealthCheck Object. You can also use these commands to update existing MachineHealthCheck objects.
Create or Update a MachineHealthCheck Object
To create a default MachineHealthCheck object:
For the control plane of a cluster, run:
tanzu cluster machinehealthcheck control-plane set CLUSTER-NAME --mhc-name MHC-NAME
For the worker nodes of a cluster, run:
tanzu cluster machinehealthcheck node set CLUSTER-NAME --mhc-name MHC-NAME
Where:
CLUSTER-NAME is the name of the target cluster.
MHC-NAME is a name you choose for the MachineHealthCheck object. If not specified, the name is set to CLUSTER-NAME. If you run both of these commands, you must specify --mhc-name, because both objects would otherwise default to the same name.
You can also use the above commands to create customized MachineHealthCheck objects or to update existing MachineHealthCheck objects. To customize or update a MachineHealthCheck object, you can specify one or more of the following flags:
--match-labels: This option filters machines by label keys and values. You can specify one or more label constraints. The MachineHealthCheck object is applied to all machines that satisfy the specified constraints. Format the key-value pairs as follows:
tanzu cluster machinehealthcheck control-plane set CLUSTER-NAME --mhc-name MHC-NAME --match-labels "key1:value1,key2:value2"
tanzu cluster machinehealthcheck node set CLUSTER-NAME --mhc-name MHC-NAME --match-labels "key1:value1,key2:value2"
--node-startup-timeout: This option controls the amount of time that the MachineHealthCheck controller waits for a machine to join the cluster before considering the machine unhealthy. For example, the commands below set the --node-startup-timeout option to 21m:
tanzu cluster machinehealthcheck control-plane set my-cluster --mhc-name my-control-plane-mhc --node-startup-timeout 21m
tanzu cluster machinehealthcheck node set my-cluster --mhc-name my-worker-mhc --node-startup-timeout 21m
If a machine fails to join the cluster within the specified amount of time, the MachineHealthCheck controller recreates the machine.
--unhealthy-conditions: This option can set the Ready, MemoryPressure, DiskPressure, PIDPressure, and NetworkUnavailable conditions. The MachineHealthCheck controller uses the conditions that you set to monitor the health of your control plane and worker nodes. To set the status of a condition, use True, False, or Unknown. For example:
tanzu cluster machinehealthcheck control-plane set my-cluster --mhc-name my-control-plane-mhc --unhealthy-conditions "Ready:False:5m,Ready:Unknown:5m"
tanzu cluster machinehealthcheck node set my-cluster --mhc-name my-worker-mhc --unhealthy-conditions "Ready:False:5m,Ready:Unknown:5m"
The example above sets two entries for the Ready condition, False:5m and Unknown:5m. If the Ready condition of a machine remains in the Unknown or False status for longer than 5m, the MachineHealthCheck controller considers the machine unhealthy and recreates it.
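Because the flags above can be combined, and because running both commands requires distinct --mhc-name values, it can help to compose the two invocations in one place. The following is a minimal sketch rather than an official script: mhc_set and DRY_RUN are hypothetical names, and my-cluster, the object names, and the timeout values are placeholders taken from the examples above. With DRY_RUN=1 (the default in this sketch) the commands are printed for review and not executed.

```shell
# Sketch: create or update both MachineHealthCheck objects with distinct names.
# mhc_set and DRY_RUN are hypothetical; with DRY_RUN=1 (the default here) the
# commands are printed for review instead of being executed.
mhc_set() {
  role="$1"; cluster="$2"; name="$3"
  shift 3
  # The role must match one of the two CLI subcommands.
  case "$role" in
    control-plane|node) ;;
    *) echo "role must be control-plane or node, got: $role" >&2; return 1 ;;
  esac
  # Rebuild the argument list as the full command, keeping any extra flags.
  set -- tanzu cluster machinehealthcheck "$role" set "$cluster" --mhc-name "$name" "$@"
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

CLUSTER=my-cluster
CONDITIONS="Ready:False:5m,Ready:Unknown:5m"

mhc_set control-plane "$CLUSTER" my-control-plane-mhc \
  --node-startup-timeout 21m --unhealthy-conditions "$CONDITIONS"
mhc_set node "$CLUSTER" my-worker-mhc \
  --node-startup-timeout 21m --unhealthy-conditions "$CONDITIONS"
```

To run the commands for real, set DRY_RUN=0 in the environment after checking the printed output against your cluster.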
Retrieve a MachineHealthCheck Object
To retrieve a MachineHealthCheck object:
For the control plane of the target cluster, run:
tanzu cluster machinehealthcheck control-plane get CLUSTER-NAME --mhc-name MHC-NAME
You can omit the --mhc-name flag if the object was created with the default name.
For the worker nodes of the target cluster, run:
tanzu cluster machinehealthcheck node get CLUSTER-NAME --mhc-name MHC-NAME
You can omit the --mhc-name flag if the object was created with the default name.
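Because --mhc-name is optional when the object has the default name, a wrapper can add the flag only when a name is supplied. The sketch below is illustrative only: mhc_get_cmd is a hypothetical helper, and my-cluster and my-worker-mhc are placeholder names. It prints the command instead of running it.

```shell
# Sketch: build a "get" command, adding --mhc-name only for non-default names.
# mhc_get_cmd is a hypothetical helper; it prints the command for review.
mhc_get_cmd() {
  role="$1"      # "control-plane" or "node"
  cluster="$2"   # target cluster name
  name="${3:-}"  # optional MachineHealthCheck object name
  cmd="tanzu cluster machinehealthcheck $role get $cluster"
  if [ -n "$name" ]; then
    cmd="$cmd --mhc-name $name"
  fi
  printf '%s\n' "$cmd"
}

# Object created with the default name: no --mhc-name flag.
mhc_get_cmd control-plane my-cluster
# Object created with a custom name: the flag is included.
mhc_get_cmd node my-cluster my-worker-mhc
```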
Delete a MachineHealthCheck Object
To delete a MachineHealthCheck object:
For the control plane of the target cluster, run:
tanzu cluster machinehealthcheck control-plane delete CLUSTER-NAME --mhc-name MHC-NAME
You can omit the --mhc-name flag if the object was created with the default name.
For the worker nodes of the target cluster, run:
tanzu cluster machinehealthcheck node delete CLUSTER-NAME --mhc-name MHC-NAME
You can omit the --mhc-name flag if the object was created with the default name.
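Before deleting an object, you might want to confirm that it exists and inspect it. The sketch below chains the get and delete commands from this topic. It rests on an assumption that this topic does not confirm: that the get command exits with a non-zero status when the object does not exist. mhc_delete_checked and TANZU_BIN are hypothetical names, and the cluster and object names in the example are placeholders.

```shell
# Sketch: show a MachineHealthCheck object, then delete it only if "get" succeeds.
# Assumption (not confirmed by this topic): "get" exits non-zero when the
# object does not exist. TANZU_BIN is a hypothetical override for the CLI binary.
TANZU_BIN="${TANZU_BIN:-tanzu}"

mhc_delete_checked() {
  role="$1"      # "control-plane" or "node"
  cluster="$2"   # target cluster name
  name="$3"      # MachineHealthCheck object name
  if "$TANZU_BIN" cluster machinehealthcheck "$role" get "$cluster" --mhc-name "$name"; then
    "$TANZU_BIN" cluster machinehealthcheck "$role" delete "$cluster" --mhc-name "$name"
  else
    echo "No MachineHealthCheck named $name found for $cluster; nothing deleted." >&2
    return 1
  fi
}

# Example call, commented out so that loading the sketch has no side effects:
# mhc_delete_checked node my-cluster my-worker-mhc
```

Retrieving the object first keeps the delete step from running against a mistyped name, at the cost of one extra CLI call.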