This topic describes how to configure MachineHealthCheck for TKG Service clusters provisioned using the v1beta1 API.
MachineHealthCheck for v1beta1 Clusters
MachineHealthCheck is a Kubernetes Cluster API resource that defines conditions for remediating unhealthy machines. In Kubernetes, a machine is a custom resource that can run kubelet. In the vSphere IaaS control plane, a Kubernetes machine resource is backed by a vSphere virtual machine. For more information, refer to the upstream Cluster API documentation.
When you provision a cluster using the TKG Service, the system creates default MachineHealthCheck objects: one for the control plane and one for each machine deployment. Starting with vSphere 8 Update 3, machine health checks are configurable for v1beta1 clusters. Supported settings include the following:
- maxUnhealthy
- nodeStartupTimeout
- unhealthyConditions
- unhealthyRange
Field | Value | Description |
---|---|---|
maxUnhealthy | string: absolute number or percentage | Remediation is not performed when the number of unhealthy machines exceeds this value. |
nodeStartupTimeout | string: duration, for example 4h0m0s | Any machine being created that takes longer than this duration to join the cluster is considered failed and is remediated. |
unhealthyConditions | array [] of unhealthyConditions types, each with type, status, and timeout. Available condition types: […] Available condition status: […] | List of conditions that determine whether a node is considered unhealthy. A node is unhealthy if any one of the conditions is met. |
unhealthyRange | string in the form [min-max], for example [3-5] | Further remediation is allowed only if the number of machines selected by "selector" as not healthy is within the range of unhealthyRange. Takes precedence over maxUnhealthy. |
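The interaction between maxUnhealthy and unhealthyRange can be sketched in Python. This is a minimal model of the upstream Cluster API rule, not a TKG Service API: the number of unhealthy machines is the expected count minus the healthy count, a percentage maxUnhealthy is scaled against the expected count and rounded down, and unhealthyRange, when set, replaces the maxUnhealthy check entirely. The function names and the 100% default for maxUnhealthy are assumptions made for illustration.

```python
import re
from typing import Optional, Union

# Hypothetical parser for the unhealthyRange format, for example "[3-5]".
_RANGE_RE = re.compile(r"^\[(\d+)-(\d+)\]$")


def resolve_max_unhealthy(max_unhealthy: Union[int, str], expected: int) -> int:
    """Convert maxUnhealthy (absolute number or percentage) into a machine count."""
    if isinstance(max_unhealthy, int):
        return max_unhealthy
    if max_unhealthy.endswith("%"):
        # Scale the percentage against the expected machine count, rounding down.
        return int(max_unhealthy[:-1]) * expected // 100
    return int(max_unhealthy)


def remediation_allowed(
    expected: int,
    healthy: int,
    max_unhealthy: Union[int, str] = "100%",  # assumed default, as in upstream Cluster API
    unhealthy_range: Optional[str] = None,
) -> bool:
    """Return True if remediation may proceed for the machines one check selects."""
    unhealthy = expected - healthy
    if unhealthy_range is not None:
        # unhealthyRange takes precedence over maxUnhealthy.
        match = _RANGE_RE.match(unhealthy_range)
        if match is None:
            raise ValueError(f"invalid unhealthyRange: {unhealthy_range!r}")
        low, high = int(match.group(1)), int(match.group(2))
        if low > high:
            raise ValueError(f"invalid unhealthyRange: {unhealthy_range!r}")
        return low <= unhealthy <= high
    return unhealthy <= resolve_max_unhealthy(max_unhealthy, expected)


if __name__ == "__main__":
    # Five machines, three healthy: two are unhealthy.
    print(remediation_allowed(5, 3, "40%"))  # True: 40% of 5 rounds down to 2
    print(remediation_allowed(5, 3, "33%"))  # False: 33% of 5 rounds down to 1
    print(remediation_allowed(5, 3, unhealthy_range="[3-5]"))  # False: 2 is below 3
```

Rounding down means a small percentage on a small pool can resolve to zero, which blocks remediation as soon as any machine is unhealthy; an absolute number avoids that ambiguity.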
MachineHealthCheck Example
The following example configures machineHealthCheck for the control plane and for a given machineDeployment.
```yaml
...
topology:
  class: tanzukubernetescluster
  version: v1.28.8---vmware.1-fips.1-tkg.2
  controlPlane:
    machineHealthCheck:
      enable: true
      maxUnhealthy: 100%
      nodeStartupTimeout: 4h0m0s
      unhealthyConditions:
      - status: Unknown
        timeout: 5m0s
        type: Ready
      - status: "False"
        timeout: 12m0s
        type: Ready
  ...
  workers:
    machineDeployments:
    - class: node-pool
      failureDomain: np1
      machineHealthCheck:
        enable: true
        maxUnhealthy: 100%
        nodeStartupTimeout: 4h0m0s
        unhealthyConditions:
        - status: Unknown
          timeout: 5m0s
          type: Ready
        - status: "False"
          timeout: 12m0s
          type: Ready
```
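The example combines two unhealthyConditions entries with OR semantics: a node is treated as unhealthy if its Ready condition has been Unknown for 5 minutes or False for 12 minutes. The following Python sketch models that evaluation, together with nodeStartupTimeout, under simplifying assumptions: durations are limited to the h, m, and s units used in this topic, and the class and function names are hypothetical, not part of Cluster API or the TKG Service.

```python
import re
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List

# Simplified parser for durations such as "4h0m0s", "12m0s", or "1h58m".
# Only the h, m, and s units are handled; this is not a full Go duration parser.
_DURATION_RE = re.compile(r"^(?:(\d+)h)?(?:(\d+)m)?(?:(\d+)s)?$")


def parse_duration(text: str) -> timedelta:
    match = _DURATION_RE.match(text)
    if match is None or not any(match.groups()):
        raise ValueError(f"unsupported duration: {text!r}")
    hours, minutes, seconds = (int(g) if g else 0 for g in match.groups())
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


@dataclass
class UnhealthyCondition:
    """One entry of unhealthyConditions (hypothetical model)."""
    type: str
    status: str
    timeout: str


@dataclass
class NodeCondition:
    """A node condition as reported for the node (simplified)."""
    type: str
    status: str
    last_transition_time: datetime


def node_is_unhealthy(conditions: List[NodeCondition],
                      rules: List[UnhealthyCondition],
                      now: datetime) -> bool:
    """Rules are OR-ed: one matching condition held for at least its timeout is enough."""
    for rule in rules:
        limit = parse_duration(rule.timeout)
        for cond in conditions:
            if (cond.type == rule.type and cond.status == rule.status
                    and now - cond.last_transition_time >= limit):
                return True
    return False


def startup_timed_out(created_at: datetime, node_joined: bool,
                      node_startup_timeout: str, now: datetime) -> bool:
    """A machine whose node has not joined within nodeStartupTimeout counts as failed."""
    return not node_joined and now - created_at > parse_duration(node_startup_timeout)


# The two rules from the example YAML.
EXAMPLE_RULES = [
    UnhealthyCondition(type="Ready", status="Unknown", timeout="5m0s"),
    UnhealthyCondition(type="Ready", status="False", timeout="12m0s"),
]

if __name__ == "__main__":
    now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
    ten_min_ago = now - timedelta(minutes=10)
    # Ready has been False for 10 minutes: below the 12m0s timeout, so still healthy.
    print(node_is_unhealthy([NodeCondition("Ready", "False", ten_min_ago)], EXAMPLE_RULES, now))
    # Ready has been Unknown for 10 minutes: past the 5m0s timeout, so unhealthy.
    print(node_is_unhealthy([NodeCondition("Ready", "Unknown", ten_min_ago)], EXAMPLE_RULES, now))
```

The shorter timeout on Unknown reflects a common choice: a node that stops reporting at all is usually remediated sooner than one that reports itself as not ready.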
Patch MachineHealthCheck Using Kubectl
To change the MachineHealthCheck for a v1beta1 cluster after it has been provisioned, use the kubectl patch method. Complete the following steps to update the MachineHealthCheck for an existing cluster.
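- Get the machineDeployment from the cluster resource definition.
kubectl get cluster <cluster-name> -n <cluster-namespace> -o yaml
In the section spec.topology.workers.machineDeployments, you should see the value identifying each machineDeployment. The <index> in the following commands is the zero-based position of the machineDeployment in that list.
- Delete the worker node MachineHealthCheck by setting enable to false.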
kubectl patch cluster <cluster-name> -n <cluster-namespace> --type json -p='[{"op": "replace", "path": "/spec/topology/workers/machineDeployments/<index>/machineHealthCheck", "value":{"enable":false}}]'
- Delete the control plane MachineHealthCheck.
kubectl patch cluster <cluster-name> -n <cluster-namespace> --type json -p='[{"op": "replace", "path": "/spec/topology/controlPlane/machineHealthCheck", "value":{"enable":false}}]'
- Create or update the control plane MachineHealthCheck with the desired settings.
kubectl patch cluster <cluster-name> -n <cluster-namespace> --type json -p='[{"op": "replace", "path": "/spec/topology/controlPlane/machineHealthCheck", "value":{"enable":true,"nodeStartupTimeout":"1h58m","unhealthyConditions":[{"status":"Unknown","timeout":"5m10s","type":"Unknown"},{"status":"Unknown","timeout":"5m0s","type":"Ready"}],"maxUnhealthy":"100%"}}]'
- Create or update the worker node MachineHealthCheck with the desired settings.
kubectl patch cluster <cluster-name> -n <cluster-namespace> --type json -p='[{"op": "replace", "path": "/spec/topology/workers/machineDeployments/<index>/machineHealthCheck", "value":{"enable":true,"nodeStartupTimeout":"1h58m","unhealthyConditions":[{"status":"Unknown","timeout":"5m10s","type":"Unknown"},{"status":"Unknown","timeout":"5m0s","type":"Ready"}],"maxUnhealthy":"100%"}}]'
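The patch commands in these steps differ only in the target path and the value. As a hypothetical helper, not part of kubectl or the TKG Service, the following Python sketch finds the zero-based index of a named machineDeployment in the parsed output of kubectl get cluster <cluster-name> -n <cluster-namespace> -o json and builds the JSON Patch string to pass to -p. It assumes each entry in machineDeployments has a name field.

```python
import json
from typing import Any, Dict, Optional


def machine_deployment_index(cluster: Dict[str, Any], name: str) -> int:
    """Return the zero-based position of the named machineDeployment.

    cluster is the parsed output of `kubectl get cluster ... -o json`; each entry
    under spec.topology.workers.machineDeployments is assumed to carry a "name".
    """
    deployments = cluster["spec"]["topology"]["workers"]["machineDeployments"]
    for index, deployment in enumerate(deployments):
        if deployment.get("name") == name:
            return index
    raise KeyError(f"machineDeployment {name!r} not found")


def mhc_patch(value: Dict[str, Any], index: Optional[int] = None) -> str:
    """Build the -p argument for `kubectl patch cluster ... --type json`.

    With index=None the control plane check is targeted; otherwise the check of
    the machineDeployment at that index. A JSON Patch document is always an array.
    """
    if index is None:
        path = "/spec/topology/controlPlane/machineHealthCheck"
    else:
        path = f"/spec/topology/workers/machineDeployments/{index}/machineHealthCheck"
    return json.dumps([{"op": "replace", "path": path, "value": value}])


if __name__ == "__main__":
    cluster = {"spec": {"topology": {"workers": {"machineDeployments": [
        {"class": "node-pool", "name": "node-pool-1"},
        {"class": "node-pool", "name": "node-pool-2"},
    ]}}}}
    index = machine_deployment_index(cluster, "node-pool-2")
    # Disable (delete) the worker MachineHealthCheck for node-pool-2.
    print(mhc_patch({"enable": False}, index))
```

The printed string can be wrapped in single quotes and passed to kubectl patch cluster <cluster-name> -n <cluster-namespace> --type json -p. Generating the array with json.dumps avoids hand-quoting mistakes, such as passing a single operation object instead of an array.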
Configure MachineHealthCheck Using Tanzu CLI
You can use the Tanzu CLI to configure MachineHealthCheck for a v1beta1 cluster.
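To set the node startup timeout for the control plane and then view the control plane MachineHealthCheck:
tanzu cluster mhc control-plane set <cluster-name> --node-startup-timeout 2h7m10s
tanzu cluster mhc control-plane get <cluster-name>
To set the node startup timeout for the worker nodes of a machine deployment and then view that MachineHealthCheck:
tanzu cluster mhc node set <cluster-name> --machine-deployment node-pool-1 --node-startup-timeout 1h59m0s
tanzu cluster mhc node get <cluster-name> -m <cluster-name>-node-pool-1-nr7r5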
Besides get and set, the Tanzu CLI supports the delete operation. For example, to delete the control plane MachineHealthCheck:
tanzu cluster mhc control-plane delete <cluster-name>
To delete the MachineHealthCheck for a machine deployment, use the following command:
tanzu cluster mhc node delete <cluster-name> --machine-deployment <machine-deployment-name>