Restoring an Automation Orchestrator node can cause issues with the Kubernetes service.
To recover a problematic node in your Automation Orchestrator cluster, you must locate the node, remove it from the cluster, and then add it to the cluster again.
Procedure
- Identify the primary node of your Automation Orchestrator cluster.
- Log in to the Automation Orchestrator Appliance command line of one of your nodes over SSH as root.
- Find the node with the primary role by running the kubectl -n prelude exec postgres-0 command.
kubectl -n prelude exec postgres-0 -- chpst -u postgres repmgr cluster show --terse --compact
- Retrieve the name of the pod in which the primary node is located.
In most cases, the name of the pod is postgres-0.postgres.prelude.svc.cluster.local.
- Find the FQDN address of the primary node by running the kubectl -n prelude get pods command.
kubectl -n prelude get pods -o wide
- Find the database pod with the name you retrieved and get the FQDN address for the corresponding node.
- Locate the problematic node by running the kubectl -n prelude get node command.
The problematic node has a NotReady status.
- Log in to the Automation Orchestrator Appliance command line of the primary node over SSH as root.
- Remove the problematic node from the cluster by running the vracli cluster remove <NODE-FQDN> command, where <NODE-FQDN> is the FQDN of the problematic node.
- Log in to the Automation Orchestrator Appliance command line of the problematic node over SSH as root.
- Add the node to the cluster again by running the vracli cluster join <MASTER-DB-NODE-FQDN> command, where <MASTER-DB-NODE-FQDN> is the FQDN of the primary node.
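
The two lookup steps in the procedure can be sketched as small shell filters that pick the relevant value out of the command output. This is a minimal sketch, not part of the product: the function names not_ready_nodes and primary_pod_name are hypothetical, and primary_pod_name assumes that repmgr prints a pipe-separated table whose second column is the node name and whose third column is the role.

```shell
#!/bin/sh
# Hypothetical helper functions; they only filter text read from stdin.

# Print the name of every node whose STATUS column contains NotReady.
# Expects the output of `kubectl -n prelude get node` on stdin; the header
# row is skipped because its second column is the word STATUS.
not_ready_nodes() {
  awk '$2 ~ /NotReady/ { print $1 }'
}

# Print the Name column of the row whose Role column is "primary".
# Assumption: the table layout is ID | Name | Role | ... (pipe-separated).
primary_pod_name() {
  awk -F'|' '
    {
      role = $3; gsub(/^[ \t]+|[ \t]+$/, "", role)   # trim padding
      name = $2; gsub(/^[ \t]+|[ \t]+$/, "", name)
    }
    role == "primary" { print name }
  '
}

# Example use on an appliance node, with the commands from the procedure:
#   kubectl -n prelude get node | not_ready_nodes
#   kubectl -n prelude exec postgres-0 -- chpst -u postgres \
#     repmgr cluster show --terse --compact | primary_pod_name
```

The functions do not change the cluster. The vracli cluster remove and vracli cluster join commands are still run manually on the nodes named in the procedure.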