Automation Orchestrator High Availability

To increase the availability of the Automation Orchestrator services, start multiple Automation Orchestrator server instances in a cluster with a shared database. Automation Orchestrator works as a single instance until it is configured to work as part of a cluster.

Multiple Automation Orchestrator server instances with identical server and plug-in configurations work together in a cluster and share one database.

All Automation Orchestrator server instances communicate with each other by exchanging heartbeats. Each heartbeat is a timestamp that the node writes to the shared database of the cluster at a fixed interval. Network problems, an unresponsive database server, or overload might cause an Automation Orchestrator cluster node to stop responding. If an active Automation Orchestrator server instance fails to send a heartbeat within the failover timeout period, it is considered non-responsive. The failover timeout equals the heartbeat interval multiplied by the number of failover heartbeats. This timeout defines when a node is treated as unreliable, and you can customize it according to the available resources and the production load.
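The failover timeout arithmetic can be illustrated with a minimal Python sketch. The function names, the 5000 ms heartbeat interval, and the count of 10 failover heartbeats are illustrative assumptions, not values or APIs taken from Automation Orchestrator. Use the settings configured for your own cluster.

```python
from datetime import datetime, timedelta


def failover_timeout(heartbeat_interval_ms: int, failover_heartbeats: int) -> timedelta:
    """Failover timeout = heartbeat interval x number of failover heartbeats."""
    return timedelta(milliseconds=heartbeat_interval_ms * failover_heartbeats)


def is_non_responsive(last_heartbeat: datetime, now: datetime, timeout: timedelta) -> bool:
    """A node is treated as non-responsive when its most recent heartbeat
    timestamp is older than the failover timeout."""
    return now - last_heartbeat > timeout


# Hypothetical settings: a heartbeat every 5000 ms, 10 failover heartbeats.
timeout = failover_timeout(5000, 10)
print(timeout.total_seconds())  # 50.0

# Fixed reference time so the example is deterministic.
now = datetime(2024, 1, 1, 12, 0, 0)
print(is_non_responsive(now - timedelta(seconds=20), now, timeout))  # False
print(is_non_responsive(now - timedelta(seconds=60), now, timeout))  # True
```

With these assumed values, a node whose last heartbeat is more than 50 seconds old would be considered non-responsive. Increasing either setting makes the cluster more tolerant of short database or network delays, at the cost of slower failover.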

An Automation Orchestrator node enters standby mode when it loses its connection to the database, and it remains in this mode until the database connection is restored. The other nodes in the cluster take over the active work by resuming all interrupted workflows from their last unfinished items, such as scriptable tasks or workflow invocations.

You can monitor the state of your Automation Orchestrator cluster from the System tab of the Automation Orchestrator Client dashboard. To configure the cluster heartbeat, the number of failover heartbeats, and the number of active nodes, navigate to the Orchestrator Cluster Management page of the Automation Orchestrator Control Center.

For information about scalability maximums, go to Automation Orchestrator system requirements.