You can configure, activate, and deactivate the Node Health Check parameters in Tanzu Kubernetes Grid clusters through the Kubernetes Container Clusters UI plug-in.
Node Health Check has two parts:
- Detection
- Remediation
Node Failure Detection
VMware Cloud Director Container Service Extension 4.1 and later can detect when a node in a Tanzu Kubernetes Grid cluster becomes unhealthy. When a node is unhealthy, the cluster information page in the Kubernetes Container Clusters UI plug-in shows the available and desired node counts, and the failure appears in the Events section of the same page.
A node can become unhealthy for reasons such as the following:
- Network outages
- Power interruptions
- Slow node performance due to high memory, CPU, or disk utilization
- Node startup failure
- Failure to join the cluster
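To make the available and desired node counts concrete, the following is a minimal sketch of how they could be derived from node readiness. It is an illustration only, not how the UI plug-in computes them. It assumes each node is a plain dictionary carrying a simplified copy of the Kubernetes `Ready` condition, whose status is `"True"`, `"False"`, or `"Unknown"`. The function names and the dictionary shape are hypothetical.

```python
from typing import Iterable, Mapping


def ready_status(node: Mapping) -> str:
    """Return the status of the node's Ready condition.

    Assumes a simplified, hypothetical node shape:
    {"name": "...", "conditions": [{"type": "Ready", "status": "True"}]}
    """
    for condition in node.get("conditions", []):
        if condition.get("type") == "Ready":
            return condition.get("status", "Unknown")
    # No Ready condition reported, for example a node that never joined.
    return "Unknown"


def node_counts(nodes: Iterable[Mapping], desired: int) -> dict:
    """Count nodes whose Ready condition is "True" against the desired count."""
    available = sum(1 for node in nodes if ready_status(node) == "True")
    return {"available": available, "desired": desired}


if __name__ == "__main__":
    sample_nodes = [
        {"name": "worker-1", "conditions": [{"type": "Ready", "status": "True"}]},
        {"name": "worker-2", "conditions": [{"type": "Ready", "status": "Unknown"}]},
        {"name": "worker-3", "conditions": []},
    ]
    print(node_counts(sample_nodes, desired=3))
```

In this sketch, a gap between the available and desired counts is the signal that at least one node is unhealthy.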
Node Remediation
When Node Health Check is activated, unhealthy nodes are remediated according to the following parameters.
Node Health Check Parameter | Default Value | Description |
---|---|---|
Max Unhealthy Nodes | 100% | Remediation is suspended when the percentage of unhealthy nodes exceeds this value. At the default value of 100%, the cluster is always remediated. At a value of 0%, the cluster is never remediated. |
Node Startup Timeout | 900 seconds | If a node does not start within this time, it is considered unhealthy and is remediated. Service providers should set this parameter to at least twice the time it takes to create and bootstrap a VM in their VMware Cloud Director environment. |
Node Status "Not Ready" Timeout | 300 seconds | If a newly joined node cannot host workloads for longer than this timeout, it is considered unhealthy and is remediated. |
Node Status "Unknown" Timeout | 300 seconds | If a healthy node is unreachable for longer than this timeout, it is considered unhealthy and is remediated. |
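The way the parameters in the table interact can be sketched in a few lines of Python. This is a simplified model of the rules as the table describes them, not the product's implementation. The class names, field names, and the `"NotStarted"` status label are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class HealthCheckParams:
    # Defaults taken from the table above.
    max_unhealthy_percent: float = 100.0
    node_startup_timeout_s: int = 900
    not_ready_timeout_s: int = 300
    unknown_timeout_s: int = 300


@dataclass
class NodeState:
    name: str
    status: str             # "Ready", "NotReady", "Unknown", or "NotStarted" (hypothetical labels)
    seconds_in_status: int  # how long the node has been in that status


def is_unhealthy(node: NodeState, params: HealthCheckParams) -> bool:
    """Apply the timeout that matches the node's current status."""
    timeouts = {
        "NotStarted": params.node_startup_timeout_s,
        "NotReady": params.not_ready_timeout_s,
        "Unknown": params.unknown_timeout_s,
    }
    timeout = timeouts.get(node.status)
    return timeout is not None and node.seconds_in_status > timeout


def nodes_to_remediate(nodes: List[NodeState], params: HealthCheckParams) -> List[str]:
    """Return names of nodes to remediate, or an empty list if remediation is suspended."""
    if not nodes:
        return []
    unhealthy = [node for node in nodes if is_unhealthy(node, params)]
    unhealthy_percent = 100.0 * len(unhealthy) / len(nodes)
    if unhealthy_percent > params.max_unhealthy_percent:
        return []  # too many unhealthy nodes: remediation is suspended
    return [node.name for node in unhealthy]


def recommended_startup_timeout(vm_create_and_bootstrap_s: int) -> int:
    """At least twice the time needed to create and bootstrap a VM."""
    return 2 * vm_create_and_bootstrap_s
```

With the default of 100%, the unhealthy percentage can never exceed the threshold, so unhealthy nodes are always returned for remediation. With 0%, a single unhealthy node already exceeds the threshold, so nothing is ever remediated. As an example of the startup recommendation, an environment where a VM takes 450 seconds to create and bootstrap would call for a Node Startup Timeout of at least 900 seconds.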
Activate or Deactivate Node Health Check in a VMware Cloud Director Container Service Extension 4.0.x Cluster
Tenant users can also activate or deactivate Node Health Check on clusters that were created in VMware Cloud Director Container Service Extension 4.0.x.
The following steps outline how tenant users can perform this action:
- Log in to the VMware Cloud Director portal, and from the top navigation bar, select .
- Click the cluster name, and on the cluster information page, click Settings.
- Activate or deactivate the Node Health Check toggle, and click Save.