MachineHealthCheck is a controller that provides health monitoring and auto-repair facilities for machines. It is automatically activated in all management and workload clusters, for both control plane and worker nodes. A MachineHealthCheck resource specifies the conditions under which machines in a cluster are considered unhealthy. Machines that meet these conditions are automatically remediated.
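To illustrate what such a resource contains, the following Python sketch assembles a MachineHealthCheck manifest as a plain dictionary and prints it as JSON. The helper name `build_machine_health_check`, the cluster name `my-cluster`, the namespace, and the timeout values are hypothetical. The field names (`clusterName`, `selector`, `unhealthyConditions`, `nodeStartupTimeout`) are assumed to follow the upstream Cluster API `cluster.x-k8s.io/v1beta1` MachineHealthCheck schema, which may differ from what your cluster version or UI generates.

```python
import json
from typing import Dict, List, Optional


def build_machine_health_check(
    name: str,
    cluster_name: str,
    unhealthy_conditions: List[Dict[str, str]],
    node_startup_timeout: Optional[str] = None,
    namespace: str = "default",
) -> dict:
    """Return a MachineHealthCheck manifest as a dict (hypothetical helper)."""
    spec = {
        "clusterName": cluster_name,
        # Assumption: select every Machine in the cluster via its cluster-name label.
        "selector": {"matchLabels": {"cluster.x-k8s.io/cluster-name": cluster_name}},
        # Each entry is a node condition type, a status, and a timeout.
        "unhealthyConditions": unhealthy_conditions,
    }
    if node_startup_timeout is not None:
        # Optional: how long to wait for a machine to join before treating it as unhealthy.
        spec["nodeStartupTimeout"] = node_startup_timeout
    return {
        "apiVersion": "cluster.x-k8s.io/v1beta1",
        "kind": "MachineHealthCheck",
        "metadata": {"name": name, "namespace": namespace},
        "spec": spec,
    }


# Example values only; choose timeouts that suit your workloads.
mhc = build_machine_health_check(
    name="my-cluster-mhc",
    cluster_name="my-cluster",
    node_startup_timeout="20m",
    unhealthy_conditions=[
        {"type": "Ready", "status": "Unknown", "timeout": "5m"},
        {"type": "Ready", "status": "False", "timeout": "12m"},
    ],
)
print(json.dumps(mhc, indent=2))
```

A label selector is a design choice here: selecting on the cluster-name label covers every machine in the cluster, while a narrower selector would limit health checking to a subset of machines.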

  • Configure Machine Health Check: Select the check box to configure MachineHealthCheck.
  • Node Start Up Timeout: (Optional) This option controls the amount of time that the MachineHealthCheck controller waits for a machine to join the cluster before considering the machine unhealthy.
  • Node Unhealthy Conditions: The node conditions that the MachineHealthCheck controller uses to monitor the health of your control plane and worker nodes. Each condition consists of a type, a status, and a timeout:
      • Type: Ready, MemoryPressure, DiskPressure, PIDPressure, or NetworkUnavailable.
      • Status: True, False, or Unknown.
      • Timeout: How long a node must report the condition before the machine is considered unhealthy and remediated. Long timeouts can result in long periods of downtime for a workload on an unhealthy machine.
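The timeout rules above can be sketched as a simple decision function. This is a simplified model, not the actual Cluster API controller code. The names `UnhealthyCondition`, `NodeCondition`, and `needs_remediation` are hypothetical. The sketch assumes that the startup clock starts at machine creation and that passing `None` for the startup timeout disables that check.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class UnhealthyCondition:
    type: str          # e.g. "Ready", "MemoryPressure"
    status: str        # "True", "False", or "Unknown"
    timeout: timedelta


@dataclass
class NodeCondition:
    type: str
    status: str
    last_transition_time: datetime  # when the node entered this status


def needs_remediation(
    machine_created: datetime,
    node_conditions: Optional[List[NodeCondition]],  # None: node has not joined yet
    unhealthy_conditions: List[UnhealthyCondition],
    node_startup_timeout: Optional[timedelta],
    now: datetime,
) -> bool:
    """Simplified check: is this machine unhealthy under the configured rules?"""
    if node_conditions is None:
        # Node Start Up Timeout: the machine never joined within the allowed time.
        return node_startup_timeout is not None and now - machine_created > node_startup_timeout
    for rule in unhealthy_conditions:
        for cond in node_conditions:
            # A rule matches when type and status match and have persisted past the timeout.
            if cond.type == rule.type and cond.status == rule.status:
                if now - cond.last_transition_time > rule.timeout:
                    return True
    return False


if __name__ == "__main__":
    start = datetime(2024, 1, 1, 12, 0)
    rules = [UnhealthyCondition("Ready", "Unknown", timedelta(minutes=5))]
    conditions = [NodeCondition("Ready", "Unknown", start)]
    print(needs_remediation(start, conditions, rules, timedelta(minutes=20),
                            start + timedelta(minutes=6)))  # prints True
```

The sketch shows the trade-off behind the timeout setting: a shorter timeout flags a stuck node sooner, while a longer one tolerates brief blips at the cost of more downtime before remediation.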