Recovering a Cluster Node

Restoring an Automation Orchestrator node can cause issues with the Kubernetes service.

To recover a problematic node in your Automation Orchestrator cluster, you must locate the node, remove it from the cluster, and then add it to the cluster again.

Procedure

Identify the primary node of your Automation Orchestrator cluster.
1. Log in to the Automation Orchestrator Appliance command line of one of your nodes over SSH as root.
2. Find the node with the primary role by running the kubectl -n prelude exec postgres-0 command.
```
kubectl -n prelude exec postgres-0 -- chpst -u postgres repmgr cluster show --terse --compact
```
3. From the command output, retrieve the name of the pod that runs the primary database node.
  In most cases, the name of the pod is postgres-0.postgres.prelude.svc.cluster.local.
4. Find the FQDN of the primary node by running the kubectl -n prelude get pods -o wide command.
```
kubectl -n prelude get pods -o wide
```
5. In the output, find the database pod with the name you retrieved and note the FQDN of the node it runs on, shown in the NODE column.
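The substeps above can be scripted. The following is a minimal sketch, not a VMware-provided tool: the function names are hypothetical, and it assumes that repmgr cluster show prints a "|"-separated table whose first four columns are ID, Name, Role, and Status, that the running primary's Status starts with "*", and that the pod name is the first label of the repmgr node name. Instead of reading the NODE column of kubectl get pods -o wide, it asks Kubernetes for the pod's spec.nodeName field through kubectl's jsonpath output.

```shell
# Hypothetical helpers for identifying the primary database node.
# Assumptions: repmgr prints a "|"-separated table with ID, Name, Role,
# and Status as its first four columns, and the running primary's Status
# starts with "*".

# Read "repmgr cluster show" output on stdin and print the Name of the
# node whose Role is "primary" and whose Status starts with "*".
primary_node_name() {
  awk -F'|' '
    {
      name = $2; role = $3; status = $4
      gsub(/^[ \t]+|[ \t]+$/, "", name)
      gsub(/^[ \t]+|[ \t]+$/, "", role)
      gsub(/^[ \t]+|[ \t]+$/, "", status)
      if (role == "primary" && status ~ /^\*/) { print name; exit }
    }'
}

# Strip everything after the first dot, for example
# postgres-0.postgres.prelude.svc.cluster.local -> postgres-0.
pod_name_from_dns() {
  printf '%s\n' "${1%%.*}"
}

# Run as root on an appliance node: print the node that hosts the primary pod.
find_primary_fqdn() {
  name=$(kubectl -n prelude exec postgres-0 -- \
    chpst -u postgres repmgr cluster show --terse --compact | primary_node_name)
  if [ -z "$name" ]; then
    echo "no running primary found in repmgr output" >&2
    return 1
  fi
  pod=$(pod_name_from_dns "$name")
  # spec.nodeName is the Kubernetes node the pod is scheduled on.
  kubectl -n prelude get pod "$pod" -o jsonpath='{.spec.nodeName}'
  echo
}
```

To use it, source the file on an appliance node and call find_primary_fqdn. If the repmgr output format differs from the assumed layout, the function prints an error instead of guessing, and you can fall back to reading the tables manually.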
Locate the problematic node by running the kubectl -n prelude get node command.
The problematic node has a NotReady status.
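To spot the NotReady node without scanning the table by eye, the command output can be filtered. This is a minimal sketch with hypothetical function names; it assumes the default kubectl get node columns, where STATUS is the second column, and uses kubectl's --no-headers flag to drop the header row.

```shell
# Hypothetical helper: read "kubectl get node --no-headers" output on stdin
# and print the name of every node whose STATUS (second column) starts with
# NotReady, which also covers values such as NotReady,SchedulingDisabled.
not_ready_nodes() {
  awk '$2 ~ /^NotReady/ { print $1 }'
}

# Live query, run as root on any appliance node. Nodes are not namespaced,
# so -n prelude is accepted but has no effect here.
list_not_ready_nodes() {
  kubectl -n prelude get node --no-headers | not_ready_nodes
}
```

An empty result means that no node currently reports NotReady.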
Log in to the Automation Orchestrator Appliance command line of the primary node over SSH as root.
Remove the problematic node from the cluster by running the vracli cluster remove <NODE-FQDN> command, where <NODE-FQDN> is the FQDN of the problematic node.
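Because vracli cluster remove takes a node out of the cluster, it can be worth confirming the target before running it. The following sketch is an illustrative precaution rather than part of the documented procedure: the helper name remove_if_not_ready is hypothetical, and it only calls vracli cluster remove when kubectl reports the node's STATUS as NotReady.

```shell
# Hypothetical guard: run on the primary node as root. Remove the given
# node with vracli only if kubectl reports its STATUS as NotReady.
remove_if_not_ready() {
  fqdn=$1
  status=$(kubectl -n prelude get node "$fqdn" --no-headers | awk '{ print $2 }')
  case "$status" in
    NotReady*)
      vracli cluster remove "$fqdn"
      ;;
    *)
      echo "not removing $fqdn: status is '${status:-unknown}'" >&2
      return 1
      ;;
  esac
}
```

For example, remove_if_not_ready <NODE-FQDN>, with <NODE-FQDN> replaced by the FQDN of the problematic node. A healthy or unknown node is left in place and the function returns a non-zero status.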
Log in to the Automation Orchestrator Appliance command line of the problematic node over SSH as root.
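Add the node to the cluster again by running the vracli cluster join <MASTER-DB-NODE-FQDN> command, where <MASTER-DB-NODE-FQDN> is the FQDN of the primary node you identified at the start of this procedure.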
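After the join completes, you might confirm that every node reports Ready again. The following polling loop is a hypothetical helper, not part of the documented procedure; the attempt count and interval are arbitrary defaults, and it treats any STATUS that does not start with Ready as still pending.

```shell
# Hypothetical helper: poll "kubectl get node" until every node's STATUS
# starts with Ready, giving up after a number of attempts.
# Arguments (illustrative defaults): attempts=30, interval in seconds=20.
wait_all_ready() {
  attempts=${1:-30}
  interval=${2:-20}
  pending="(no successful kubectl query yet)"
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # Only trust the result if kubectl succeeded and returned at least one node.
    if out=$(kubectl -n prelude get node --no-headers) && [ -n "$out" ]; then
      pending=$(printf '%s\n' "$out" | awk '$2 !~ /^Ready/ { print $1 }')
      if [ -z "$pending" ]; then
        echo "all nodes Ready"
        return 0
      fi
    fi
    i=$((i + 1))
    sleep "$interval"
  done
  echo "nodes still not Ready: $pending" >&2
  return 1
}
```

Run it as root on any cluster node, for example wait_all_ready 30 20 to wait up to roughly ten minutes.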