The Update CA tool hangs on the Ansible playbook that updates the node CA certificate when the update-all command is run. The hung process must be terminated manually before the command can be run again.

Problem

The Update CA tool hangs on the Ansible playbook that updates the node CA certificate when the update-all command is run.

The log output can stall for a few minutes. For example:

TASK [test curl] ********************************************************************************************************************************* 
changed: [172.16.68.129] 
changed: [172.16.68.231] 
changed: [172.16.69.68] 
TASK [test pull tkr-compatibility] *************************************************************************************************************** 
changed: [172.16.68.231] 
changed: [172.16.69.68]

In this example, the operation hangs on the "test pull tkr-compatibility" task for 172.16.68.129, the only node that has not yet reported a result for that task.
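When many nodes are involved, the stalled node can be harder to spot by eye. The following is a minimal sketch, not part of the Update CA tool, that compares the hosts reported under the last two TASK headers and prints those that have not yet reported in the last one. It assumes the output has been saved to a file and uses the "ok:" / "changed:" line format shown above; the function name find_stalled_hosts and the log file name in the usage comment are hypothetical.

```shell
# Print hosts that reported a result in the previous task
# but have not (yet) reported one in the last task of the log.
find_stalled_hosts() {
    awk '
        /^TASK \[/ {
            # A new task starts: hosts seen so far become the "previous" set.
            split("", prev)
            for (h in cur) prev[h] = 1
            split("", cur)
            next
        }
        # Result lines look like: changed: [172.16.68.129]
        /^(ok|changed): \[/ {
            h = $2
            sub(/^\[/, "", h)
            sub(/\]$/, "", h)
            cur[h] = 1
        }
        END {
            for (h in prev) if (!(h in cur)) print h
        }
    '
}

# Usage (hypothetical log file name):
#   find_stalled_hosts < update-all.log
```

A host printed by this sketch is only a candidate. It may simply be slow, so allow the output a few minutes before treating the node as hung.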

Cause

A node was disconnected while the node update tasks were running. This may be caused by the redeployment of a control plane node.

Solution

  1. List the ansible processes and kill the ssh process that is connected to the unresponsive node (PID 754314 for 172.16.68.129 in this example).
    [root@hxu-tcacp-2 ~]# ps -ef | grep ansible 
    root 753978 753971 15 07:28 pts/0 00:00:30 /usr/bin/python3 /usr/bin/ansible-playbook -i /root/update-ca/ansible/hosts /root/update-ca/ansible/update_node_ca.yml 
    root 754043 1 0 07:28 ? 00:00:00 ssh: /root/.ansible/cp/d2d0af91b5 [mux] 
    root 754309 753978 0 07:29 pts/0 00:00:00 /usr/bin/python3 /usr/bin/ansible-playbook -i /root/update-ca/ansible/hosts /root/update-ca/ansible/update_node_ca.yml 
    root 754314 754309 0 07:29 pts/0 00:00:00 ssh -C -o ControlMaster=auto -o ControlPersist=60s -o StrictHostKeyChecking=no -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o User="capv" -o ConnectTimeout=60 -o ControlPath=/root/.ansible/cp/d2d0af91b5 172.16.68.129 /bin/sh -c 'sudo -H -S -n -u root /bin/sh -c '"'"'echo BECOME-SUCCESS-eyjcipkrfnnbjzybxqencwxtdgulalmp ; /usr/bin/python'"'"' && sleep 0' 
    root 756460 755805 0 07:32 pts/1 00:00:00 grep ansible 
    
    [root@hxu-tcacp-2 ~]# kill 754314 

    After the process is killed, the script continues to run, but it reports errors at the end.

  2. Run the update-all command again.
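The process lookup in step 1 can be sketched as a small shell helper. This is an illustration only: it assumes the `ps -ef` column layout shown in the transcript above (PID in the second column, command starting in the eighth), and the function name find_ssh_pids_for_host is hypothetical. It prints the PIDs of ssh client processes that have the given node address as an argument, and skips the "ssh: ... [mux]" control master line.

```shell
# Read `ps -ef` output on stdin and print the PIDs of ssh client
# processes whose argument list contains the given host address.
find_ssh_pids_for_host() {
    host="$1"
    awk -v host="$host" '
        # ps -ef columns: UID PID PPID C STIME TTY TIME CMD...
        # Match "ssh" or "/path/to/ssh", but not the "ssh:" mux line.
        $8 ~ /(^|\/)ssh$/ {
            for (i = 9; i <= NF; i++) {
                if ($i == host) { print $2; break }
            }
        }
    '
}

# Usage: review the printed PIDs first, then kill the relevant one.
#   ps -ef | find_ssh_pids_for_host 172.16.68.129
#   kill <PID>
```

The helper only prints PIDs and does not kill anything itself, so the result can be checked against the full `ps -ef` listing before a process is terminated.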