This topic explains how to use the Tanzu Command Line Interface (CLI) to create, update, retrieve, and delete MachineHealthCheck objects for control plane and worker nodes.
For more information, see tanzu cluster machinehealthcheck in the Tanzu CLI Command Reference.
MachineHealthCheck
MachineHealthCheck is a controller that provides health monitoring and auto-repair for machines. It is automatically enabled in all management and workload clusters, for both control plane and worker nodes. If the controller is enabled when you deploy a class-based cluster with a single machine deployment or a legacy cluster, Tanzu Kubernetes Grid creates two default MachineHealthCheck objects in the cluster, one for the control plane nodes and one for the worker nodes. For class-based clusters with multiple machine deployments, Tanzu Kubernetes Grid creates one MachineHealthCheck object for the control plane and one for each machine deployment. These objects are created in the same namespace as the cluster.
If you deactivate the controller, you can re-enable it by using the commands documented in Create or Update a MachineHealthCheck Object below. You can also use these commands to update existing MachineHealthCheck objects.
Create or Update a MachineHealthCheck Object
Follow the steps below to create or update MachineHealthCheck objects for your clusters.
Class-based clusters:
To create the default MachineHealthCheck object for the control plane of a class-based cluster:
tanzu cluster machinehealthcheck control-plane set CLUSTER-NAME
To create the default MachineHealthCheck object for the worker nodes of a class-based cluster:
If the cluster has a single machine deployment, run:
tanzu cluster machinehealthcheck node set CLUSTER-NAME
If the cluster has multiple machine deployments, run the following command for each machine deployment. This will create the default MachineHealthCheck object for each machine deployment.
tanzu cluster machinehealthcheck node set CLUSTER-NAME --machine-deployment MACHINE-DEPLOYMENT-NAME
Where:
CLUSTER-NAME is the name of the target cluster.
MACHINE-DEPLOYMENT-NAME is the name of the machine deployment. For example, md-0. To retrieve the machine deployment name, run kubectl get cluster CLUSTER-NAME -o yaml and then locate spec.topology.workers.machineDeployments.name in the output.
Legacy clusters:
To create the default MachineHealthCheck object for the control plane of a legacy cluster:
tanzu cluster machinehealthcheck control-plane set CLUSTER-NAME --mhc-name MHC-NAME
To create the default MachineHealthCheck object for the worker nodes of a legacy cluster:
tanzu cluster machinehealthcheck node set CLUSTER-NAME --mhc-name MHC-NAME
Where:
CLUSTER-NAME is the name of the target cluster.
MHC-NAME is a name you choose for the MachineHealthCheck object. If not specified, the name is set to CLUSTER-NAME. If you are running both of these commands, specifying --mhc-name is required. The --mhc-name flag is ignored for class-based clusters.
You can also use the above commands to create customized MachineHealthCheck objects or to update existing MachineHealthCheck objects. To customize or update a MachineHealthCheck object, you can specify one or more of the flags below.
Note: These examples assume that you are customizing or updating your MachineHealthCheck settings for a class-based cluster with a single machine deployment. When customizing or updating the MachineHealthCheck object for the worker nodes of a class-based cluster with multiple machine deployments, you must specify the --machine-deployment flag. For legacy clusters, specify --mhc-name as described above.
--match-labels: This option filters machines by label keys and values. You can specify one or more label constraints. The MachineHealthCheck object is applied to all machines that satisfy the specified constraints. Format the key-value pairs as follows:
tanzu cluster machinehealthcheck control-plane set CLUSTER-NAME --match-labels "key1:value1,key2:value2"
tanzu cluster machinehealthcheck node set CLUSTER-NAME --match-labels "key1:value1,key2:value2"
--max-unhealthy: If the number of unhealthy machines exceeds the value you set using this flag, the MachineHealthCheck controller does not perform remediation. The --max-unhealthy setting defaults to 100%. You can specify either an absolute number or a percentage for this flag. For example:
tanzu cluster machinehealthcheck control-plane set CLUSTER-NAME --max-unhealthy "60%"
tanzu cluster machinehealthcheck node set CLUSTER-NAME --max-unhealthy "60%"
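To illustrate how the --max-unhealthy threshold gates remediation, the following shell sketch resolves an absolute or percentage value to a machine count and compares it with the number of unhealthy machines. The functions max_unhealthy_count and remediation_allowed are hypothetical helpers, not part of the Tanzu CLI, and the sketch assumes that a percentage rounds down to a whole number of machines. Verify the rounding behavior in your own environment before relying on it.

```shell
#!/bin/sh
# Hypothetical helper: resolve a --max-unhealthy value to a machine count.
# Arguments: TOTAL-MACHINES LIMIT (LIMIT is an absolute number or "N%").
# Assumption: percentages round down to a whole number of machines.
max_unhealthy_count() {
  total="$1"
  limit="$2"
  case "$limit" in
    *%)
      pct="${limit%?}"                   # strip the trailing percent sign
      echo $(( total * pct / 100 ))      # integer division rounds down
      ;;
    *)
      echo "$limit"
      ;;
  esac
}

# Hypothetical helper: succeed while the unhealthy count does not exceed the limit.
# Arguments: TOTAL-MACHINES UNHEALTHY-MACHINES LIMIT.
remediation_allowed() {
  allowed=$(max_unhealthy_count "$1" "$3")
  [ "$2" -le "$allowed" ]
}

# With 5 machines and a 60% limit, up to 3 unhealthy machines are remediated.
remediation_allowed 5 3 "60%" && echo "3 of 5 unhealthy: remediation proceeds"
remediation_allowed 5 4 "60%" || echo "4 of 5 unhealthy: remediation is skipped"
```

Under these assumptions, the default of 100% never blocks remediation, because the unhealthy count cannot exceed the total number of machines.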
--node-startup-timeout: This option controls the amount of time that the MachineHealthCheck controller waits for a machine to join the cluster before considering the machine unhealthy. For example, the commands below set the --node-startup-timeout option to 21m:
tanzu cluster machinehealthcheck control-plane set my-cluster --node-startup-timeout 21m
tanzu cluster machinehealthcheck node set my-cluster --node-startup-timeout 21m
If a machine fails to join the cluster within the specified amount of time, the MachineHealthCheck controller recreates the machine.
--unhealthy-conditions: This option can set the Ready, MemoryPressure, DiskPressure, PIDPressure, and NetworkUnavailable conditions. The MachineHealthCheck controller uses the conditions that you set to monitor the health of your control plane and worker nodes. To set the status of a condition, use True, False, or Unknown. For example:
tanzu cluster machinehealthcheck control-plane set my-cluster --unhealthy-conditions "Ready:False:5m,Ready:Unknown:5m"
tanzu cluster machinehealthcheck node set my-cluster --unhealthy-conditions "Ready:False:5m,Ready:Unknown:5m"
The example above sets two entries for the Ready condition, False:5m and Unknown:5m. If the Ready condition of a machine remains in the False or Unknown status for longer than 5m, the MachineHealthCheck controller considers the machine unhealthy and recreates it.
Retrieve a MachineHealthCheck Object
Follow the steps below to retrieve MachineHealthCheck objects for your clusters. The --mhc-name flag is ignored for class-based clusters.
To retrieve the MachineHealthCheck object for the control plane of the target cluster, run:
tanzu cluster machinehealthcheck control-plane get CLUSTER-NAME --mhc-name MHC-NAME
Omit the --mhc-name flag if the object was created with the default name or if you are targeting a class-based cluster.
To retrieve the MachineHealthCheck object for the worker nodes of the target cluster, run:
tanzu cluster machinehealthcheck node get CLUSTER-NAME --mhc-name MHC-NAME
Omit the --mhc-name flag if the object was created with the default name or if you are targeting a class-based cluster.
Delete a MachineHealthCheck Object
Follow the steps below to delete MachineHealthCheck objects for your clusters.
Class-based clusters:
To delete the MachineHealthCheck object for the control plane of a class-based cluster:
tanzu cluster machinehealthcheck control-plane delete CLUSTER-NAME
To delete the MachineHealthCheck object or objects for the worker nodes of a class-based cluster:
If the cluster has a single machine deployment, run:
tanzu cluster machinehealthcheck node delete CLUSTER-NAME
If the cluster has multiple machine deployments, run the following command for each machine deployment:
tanzu cluster machinehealthcheck node delete CLUSTER-NAME --machine-deployment MACHINE-DEPLOYMENT-NAME
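For a cluster with several machine deployments, repeating the command by hand is error-prone. The following shell sketch prints one command per machine deployment so that you can review the commands before running them. The function mhc_commands_for_deployments is a hypothetical helper, and the names my-cluster and md-1 are placeholders. The commented kubectl line is an assumption based on the spec.topology.workers.machineDeployments.name field mentioned earlier in this topic, and may need a namespace flag in your environment.

```shell
#!/bin/sh
# Hypothetical helper: print one `node ACTION` command per machine deployment.
# Arguments: ACTION (for example, set or delete), CLUSTER-NAME, then one or
# more machine deployment names.
mhc_commands_for_deployments() {
  action="$1"
  cluster="$2"
  shift 2
  for md in "$@"; do
    echo "tanzu cluster machinehealthcheck node $action $cluster --machine-deployment $md"
  done
}

# Assumption: the names could be collected from the Cluster object, for example:
#   kubectl get cluster my-cluster \
#     -o jsonpath='{.spec.topology.workers.machineDeployments[*].name}'

# Dry run with placeholder names; review the output, then pipe it to sh to execute.
mhc_commands_for_deployments delete my-cluster md-0 md-1
```

The same helper can print the corresponding node set commands by passing set as the first argument.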
Legacy clusters:
To delete the MachineHealthCheck object for the control plane of a legacy cluster:
tanzu cluster machinehealthcheck control-plane delete CLUSTER-NAME --mhc-name MHC-NAME
Omit the --mhc-name flag if the object was created with the default name.
To delete the MachineHealthCheck object for the worker nodes of a legacy cluster, run:
tanzu cluster machinehealthcheck node delete CLUSTER-NAME --mhc-name MHC-NAME
Omit the --mhc-name flag if the object was created with the default name.