When you configure machine health checks on your management cluster, Cluster API detects unhealthy machines in the specified workload cluster and remediates them. You must configure health checks separately for each workload cluster.
- If the feature is not configured on a cluster and a node running StatefulSet pods becomes unresponsive, the StatefulSet pods remain stuck in the Terminating phase and are not rescheduled to a different node.
- If the feature is configured on the cluster, Cluster API ignores the non-terminated pods and re-creates the failed machine. The StatefulSet pods then start on the new machine.
The following code is an example of a MachineHealthCheck API object.
    apiVersion: cluster.x-k8s.io/v1beta1
    kind: MachineHealthCheck
    metadata:
      name: capi-quickstart-node-unhealthy-5m
    spec:
      # clusterName is required to associate this MachineHealthCheck with a particular cluster
      clusterName: capi-quickstart
      # (Optional) maxUnhealthy prevents further remediation if the cluster is already partially unhealthy
      maxUnhealthy: 40%
      # (Optional) nodeStartupTimeout determines how long a MachineHealthCheck should wait for
      # a Node to join the cluster, before considering a Machine unhealthy.
      # Defaults to 10 minutes if not specified.
      # Set to 0 to disable the node startup timeout.
      # Disabling this timeout will prevent a Machine from being considered unhealthy when
      # the Node it created has not yet registered with the cluster. This can be useful when
      # Nodes take a long time to start up or when you only want condition based checks for
      # Machine health.
      nodeStartupTimeout: 10m
      # selector is used to determine which Machines should be health checked
      selector:
        matchLabels:
          nodepool: nodepool-0
      # Conditions to check on Nodes for matched Machines, if any condition is matched for the duration of its timeout, the Machine is considered unhealthy
      unhealthyConditions:
        - type: Ready
          status: Unknown
          timeout: 300s
        - type: Ready
          status: "False"
          timeout: 300s
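To make the effect of `unhealthyConditions` and `maxUnhealthy` in the manifest above more concrete, the following Python sketch models the decision in a simplified way. It is an illustration only, not Cluster API's implementation: the class names, the `seconds_in_state` field, and both functions are hypothetical, and the rounding of the percentage (down) and the "at most" comparison are assumptions you should verify against The Cluster API Book.

```python
from dataclasses import dataclass


@dataclass
class UnhealthyCondition:
    """One entry of spec.unhealthyConditions (hypothetical model)."""
    type: str
    status: str
    timeout_s: int


@dataclass
class NodeCondition:
    """A Node condition plus how long it has held its current status.

    seconds_in_state is a hypothetical field used only for this sketch.
    """
    type: str
    status: str
    seconds_in_state: int


# Mirrors the two unhealthyConditions entries in the example manifest (300s each).
UNHEALTHY_CONDITIONS = [
    UnhealthyCondition("Ready", "Unknown", 300),
    UnhealthyCondition("Ready", "False", 300),
]


def is_unhealthy(node_conditions, rules=UNHEALTHY_CONDITIONS):
    """Return True if any rule matches a Node condition for at least its timeout."""
    for rule in rules:
        for cond in node_conditions:
            if (
                cond.type == rule.type
                and cond.status == rule.status
                and cond.seconds_in_state >= rule.timeout_s
            ):
                return True
    return False


def remediation_allowed(total_machines, unhealthy_machines, max_unhealthy_percent=40):
    """Return True if the unhealthy count is within the maxUnhealthy budget.

    Assumption: the percentage is applied to the number of selected machines
    and rounded down, and remediation proceeds while the unhealthy count is
    at most that limit.
    """
    limit = total_machines * max_unhealthy_percent // 100
    return unhealthy_machines <= limit
```

Under these assumptions, a node whose `Ready` condition has been `Unknown` for 301 seconds counts as unhealthy, while one that has been `False` for only 120 seconds does not. With five selected machines and `maxUnhealthy: 40%`, the sketch permits remediation while at most two machines are unhealthy and blocks it once three are.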
For more information about Machine Health Check, see Healthchecking in The Cluster API Book.