In this topic, you can find the steps to remove a worker node that has gone down and then scale the cluster out again with a replacement node.

Procedure

  1. Use the Ansible playbook kubespray/remove-node.yml to remove the node.
    Perform the following steps to remove or scale down the node; a filled-in example of the command follows the sample output below.
    1. cd /root/k8s-installer/
    2. export ANSIBLE_CONFIG=/root/k8s-installer/scripts/ansible/ansible.cfg
    3. ansible-playbook -i inventory/<ClusterName>/hosts.yml scripts/ansible/kubespray/remove-node.yml -u <SSH-username> --become -e @scripts/ansible/internal_vars.yml -e @scripts/ansible/vars.yml --extra-vars "node=node3,node4,node6"
    4. Verify the status of the scaled-down cluster from the control node.
    kubectl get nodes
    The removed nodes must no longer be listed.
    Example:
    [tco@node1 ~]$ kubectl get nodes -o wide
    NAME    STATUS   ROLES           AGE   VERSION   INTERNAL-IP     EXTERNAL-IP   OS-IMAGE                  KERNEL-VERSION                  CONTAINER-RUNTIME
    node1   Ready    control-plane   16h   v1.26.5   10.180.13.70    <none>        Oracle Linux Server 8.7   5.15.0-3.60.5.1.el8uek.x86_64   containerd://1.7.1
    node2   Ready    <none>          16h   v1.26.5   10.180.13.144   <none>        Oracle Linux Server 8.7   5.15.0-3.60.5.1.el8uek.x86_64   containerd://1.7.1
    node5   Ready    <none>          16h   v1.26.5   10.180.13.133   <none>        Oracle Linux Server 8.7   5.15.0-3.60.5.1.el8uek.x86_64   containerd://1.7.1
    [tco@node1 ~]$
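    For illustration only, assuming a cluster named mycluster, SSH user tco, and a single failed worker node node6 (all placeholder values, substitute your own), the remove-node invocation would look like:
      cd /root/k8s-installer/
      export ANSIBLE_CONFIG=/root/k8s-installer/scripts/ansible/ansible.cfg
      ansible-playbook -i inventory/mycluster/hosts.yml scripts/ansible/kubespray/remove-node.yml -u tco --become -e @scripts/ansible/internal_vars.yml -e @scripts/ansible/vars.yml --extra-vars "node=node6"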
  2. Prepare for deployment inside the Deployment Container.
    1. cd /root/k8s-installer/
    2. export ANSIBLE_CONFIG=/root/k8s-installer/scripts/ansible/ansible.cfg
    3. Add or edit the new VM IPs inside the Deployment Container.
      Edit the /root/k8s-installer/scripts/ansible/vars.yml file used for deployment and add the new node IPs (see the filled-in sketch after this step).
      # The list of IP addresses of your Control Nodes and Worker Nodes. This should be a YAML list.

      control_plane_ips: # The list of control plane IP addresses of your VMs. This should be a YAML list.
        - <IP1>
        - <IP2>
      worker_node_ips: # The list of worker node IP addresses of your VMs. This should be a YAML list.
        - <IP3>
        - <New IP>
    4. ansible-playbook scripts/ansible/prepare.yml -e @scripts/ansible/vars.yml --become
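    For reference, this is a sketch of how the lists might look for the example cluster shown above, reusing the node IPs from the sample kubectl output (adjust to your environment):
      control_plane_ips:
        - 10.180.13.70
      worker_node_ips:
        - 10.180.13.144
        - 10.180.13.133
        - 10.180.13.149   # new worker node (node6)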
  3. Use the Ansible playbook scale_k8s.yml to scale the cluster out again.
    Perform the following steps to add the node back to the cluster.
    1. cd /root/k8s-installer/
    2. export ANSIBLE_CONFIG=/root/k8s-installer/scripts/ansible/ansible.cfg
    3. ansible-playbook -i inventory/<ClusterName>/hosts.yml scripts/ansible/scale_k8s.yml -u <SSH-username> --become -e @scripts/ansible/internal_vars.yml -e @scripts/ansible/vars.yml --limit="localhost,node6"
    4. Verify the status of the scaled-out cluster from the control node.
      kubectl get nodes

      The new node must be listed.

      Example:
      [tco@node1 ~]$ kubectl get nodes -o wide
      NAME    STATUS   ROLES           AGE   VERSION   INTERNAL-IP     EXTERNAL-IP   OS-IMAGE                  KERNEL-VERSION                  CONTAINER-RUNTIME
      node1   Ready    control-plane   15h   v1.26.5   10.180.13.70    <none>        Oracle Linux Server 8.7   5.15.0-3.60.5.1.el8uek.x86_64   containerd://1.7.1
      node2   Ready    <none>          15h   v1.26.5   10.180.13.144   <none>        Oracle Linux Server 8.7   5.15.0-3.60.5.1.el8uek.x86_64   containerd://1.7.1
      node5   Ready    <none>          15h   v1.26.5   10.180.13.133   <none>        Oracle Linux Server 8.7   5.15.0-3.60.5.1.el8uek.x86_64   containerd://1.7.1
      node6   Ready    <none>          73m   v1.26.5   10.180.13.149   <none>        Oracle Linux Server 8.7   5.15.0-3.60.5.1.el8uek.x86_64   containerd://1.7.1
      [tco@node1 ~]$
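    Optionally, as an additional check that is not part of this procedure, you can confirm that workloads can be scheduled on the re-added node, for example:
      kubectl get pods -A -o wide --field-selector spec.nodeName=node6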
    Note:
    • If a node is unreachable, follow the Option B worst-case scenario steps.
    • If a node is reachable but unrecoverable at the OS level, follow the Option B worst-case scenario steps.
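    A quick, informal way to decide whether a node falls under these cases (not part of this procedure) is to compare its Kubernetes status with SSH reachability, for example:
      kubectl get nodes node6                  # NotReady indicates the node is not healthy in the cluster
      ssh <SSH-username>@<node-IP> hostname    # a connection failure indicates the node is unreachable at the OS or network level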