This topic describes how to use the Tanzu Kubernetes Grid CLI to create, update, retrieve, and delete MachineHealthCheck objects for Tanzu Kubernetes clusters.
MachineHealthCheck
MachineHealthCheck is a controller that provides node health monitoring and node auto-repair for Tanzu Kubernetes clusters.
This controller is enabled in the global Tanzu Kubernetes Grid configuration by default, for all Tanzu Kubernetes clusters. You can override your global Tanzu Kubernetes Grid configuration for individual Tanzu Kubernetes clusters in two ways:
Configure the default MachineHealthCheck in either the Tanzu Kubernetes Grid installer interface or the .tkg/config.yaml file. Each Tanzu Kubernetes cluster that you deploy with your management cluster inherits this configuration by default. For more information, see Deploying and Managing Management Clusters.
Use the Tanzu Kubernetes Grid CLI to create, update, retrieve, and delete MachineHealthCheck objects for individual Tanzu Kubernetes clusters. See the sections below.
When MachineHealthCheck is enabled in a Tanzu Kubernetes cluster, it runs in the same namespace as the cluster.
Create or Update a MachineHealthCheck
To create a MachineHealthCheck with the default configuration, run the following command:
tkg set machinehealthcheck CLUSTER-NAME
Where CLUSTER-NAME is the name of the Tanzu Kubernetes cluster you want to monitor.
You can also use this command to create MachineHealthCheck objects with custom configuration options or to update existing MachineHealthCheck objects. To set custom configuration options for a MachineHealthCheck, run the tkg set machinehealthcheck command with one or more of the following options:
--mhc-name: By default, when you run tkg set machinehealthcheck CLUSTER-NAME, the command sets the name of the MachineHealthCheck to CLUSTER-NAME. Specify the --mhc-name option if you want to set a different name. For example:
tkg set machinehealthcheck my-cluster --mhc-name my-mhc
--match-labels: This option filters machines by label keys and values. You can specify one or more label constraints. The MachineHealthCheck is applied to all machines that satisfy these constraints. Use the syntax below:
tkg set machinehealthcheck my-cluster --match-labels "key1:value1,key2:value2"
For example:
tkg set machinehealthcheck my-cluster --match-labels "node-pool:my-cluster-worker-pool"
--node-startup-timeout: This option controls the amount of time that the MachineHealthCheck waits for a machine to join the cluster before considering the machine unhealthy. For example, the command below sets the --node-startup-timeout option to 10m:
tkg set machinehealthcheck my-cluster --node-startup-timeout 10m
If a machine fails to join the cluster within this amount of time, the MachineHealthCheck recreates the machine.
--unhealthy-conditions: This option specifies which node conditions the MachineHealthCheck uses to determine whether a node is unhealthy. You can set the Ready, MemoryPressure, DiskPressure, PIDPressure, and NetworkUnavailable conditions. For each condition, set the status to True, False, or Unknown. For example:
tkg set machinehealthcheck my-cluster --unhealthy-conditions "Ready:False:5m,Ready:Unknown:5m"
In the example above, if the status of the Ready node condition remains Unknown or False for longer than 5m, the MachineHealthCheck considers the machine unhealthy and recreates it.
Retrieve a MachineHealthCheck
To retrieve a MachineHealthCheck object, run the following command:
tkg get machinehealthcheck CLUSTER-NAME
If you assigned a non-default name to the object, specify the --mhc-name flag.
Delete a MachineHealthCheck
To delete a MachineHealthCheck object, run the following command:
tkg delete machinehealthcheck CLUSTER-NAME
If you assigned a non-default name to the object, specify the --mhc-name flag.
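The retrieve and delete commands take the same optional --mhc-name flag, which the short sketch below illustrates. The helper name mhc_command and the names my-cluster and my-mhc are placeholders chosen for illustration. The script prints the resulting commands without running them.

```shell
#!/bin/sh
# Sketch only: mhc_command is a hypothetical helper, not part of the tkg CLI.

# Prints a get or delete command, adding --mhc-name only when a custom name is given.
# $1: get or delete, $2: cluster name, $3 (optional): custom MachineHealthCheck name
mhc_command() {
  case "$1" in
    get|delete) ;;
    *) echo "unsupported action: $1" >&2; return 1 ;;
  esac
  if [ -n "${3:-}" ]; then
    printf 'tkg %s machinehealthcheck %s --mhc-name %s\n' "$1" "$2" "$3"
  else
    printf 'tkg %s machinehealthcheck %s\n' "$1" "$2"
  fi
}

mhc_command get my-cluster my-mhc
mhc_command delete my-cluster my-mhc
```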