Table 1 describes the actions that the Failover Manager takes in various failure scenarios.

Table 1. Failure scenarios and Failover Manager actions

Scenario

Failover Manager action

Domain Manager fails

Failover Manager initiates an automatic failover to use the corresponding Standby for the failed component.

Host where the Domain Manager is running fails

Failover Manager initiates an automatic failover to use the corresponding Standby for the failed component.

SAM with Notification Cache Publishing Enabled fails, or any one of its services (Tomcat, Rabbit MQ, or ElasticSearch) fails

The Failover Manager treats any single component failure as a collective failure of all components and initiates automatic failover to use the corresponding Standby components.

Broker fails

If the host where the Broker resides is running and the Broker is down, Failover Manager initiates an automatic failover to promote the Standby Broker as Active and attempts to restart the failed Active Broker.

If the attempt to restart the failed Broker is successful, then both Active and Standby Brokers maintain the same list of active Managers. The Failover Manager registers the promoted Active Broker with the Active Managers.

Host where the Broker is running fails or the Broker cannot be restarted

If the host where the Broker resides is down or the Broker is down, Failover Manager initiates an automatic failover to promote the Standby Broker as Active and attempts to restart the failed Active Broker.

If the attempt to restart the failed Broker is not successful:

  • The newly-promoted Active Broker maintains the list of active Managers. The Failover Manager registers the promoted Active Broker with the Active Managers.

Domain Manager or Broker is running but is not responding or is intermittently responding

No failover is initiated. The Failover Manager sends an email message to indicate that the component is not responding.

Intermittent network communication failure with the host where a component is running

No failover is initiated. The Failover Manager attempts to send an email message to indicate this issue.

Communication between the Active and Standby location is lost

No failover is initiated. The Failover Manager attempts to send an email message to indicate this issue.

Trap Exploder stops

Failover Manager attempts to restart the Trap Exploder. If two restart attempts fail, the Failover Manager fails over to the Standby Trap Exploder: it registers the Standby Trap Exploder with the Broker, un-suspends trap forwarding in it, and changes its role to Active.

Trap Exploder Failover provides more information.

The site fails (all components in Location A fail)

Failover Manager switches the components, including the Broker, to the Standby location.
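The scenarios in Table 1 can be read as a lookup from failure scenario to action. The following Python sketch is illustrative only: the `Scenario` and `Action` names, the `action_for` function, and the `handle_trap_exploder_stop` helper are hypothetical and are not part of the Failover Manager product. The sketch only restates the table, including the rule that the Trap Exploder is failed over after two unsuccessful restart attempts.

```python
from enum import Enum, auto
from typing import Callable


class Scenario(Enum):
    """Failure scenarios from Table 1 (names are hypothetical)."""
    DOMAIN_MANAGER_FAILS = auto()
    DOMAIN_MANAGER_HOST_FAILS = auto()
    SAM_OR_SERVICE_FAILS = auto()
    BROKER_FAILS = auto()
    BROKER_HOST_FAILS = auto()
    COMPONENT_NOT_RESPONDING = auto()
    INTERMITTENT_NETWORK_FAILURE = auto()
    INTER_SITE_LINK_LOST = auto()
    TRAP_EXPLODER_STOPS = auto()
    SITE_FAILS = auto()


class Action(Enum):
    """Failover Manager responses (names are hypothetical)."""
    FAILOVER = auto()                     # switch to the Standby component(s)
    PROMOTE_STANDBY_AND_RESTART = auto()  # promote Standby Broker, try to restart the failed one
    RESTART_THEN_FAILOVER = auto()        # restart first, fail over if restarts fail
    EMAIL_ONLY = auto()                   # no failover, email notification only


ACTIONS = {
    Scenario.DOMAIN_MANAGER_FAILS: Action.FAILOVER,
    Scenario.DOMAIN_MANAGER_HOST_FAILS: Action.FAILOVER,
    Scenario.SAM_OR_SERVICE_FAILS: Action.FAILOVER,
    Scenario.BROKER_FAILS: Action.PROMOTE_STANDBY_AND_RESTART,
    Scenario.BROKER_HOST_FAILS: Action.PROMOTE_STANDBY_AND_RESTART,
    Scenario.COMPONENT_NOT_RESPONDING: Action.EMAIL_ONLY,
    Scenario.INTERMITTENT_NETWORK_FAILURE: Action.EMAIL_ONLY,
    Scenario.INTER_SITE_LINK_LOST: Action.EMAIL_ONLY,
    Scenario.TRAP_EXPLODER_STOPS: Action.RESTART_THEN_FAILOVER,
    Scenario.SITE_FAILS: Action.FAILOVER,
}


def action_for(scenario: Scenario) -> Action:
    """Return the action that Table 1 lists for a scenario."""
    return ACTIONS[scenario]


def handle_trap_exploder_stop(try_restart: Callable[[], bool],
                              max_attempts: int = 2) -> str:
    """Try to restart the Trap Exploder; fail over if every attempt fails.

    try_restart is a hypothetical callback that returns True on success.
    """
    for _ in range(max_attempts):
        if try_restart():
            return "restarted"
    return "failover"
```

For example, `action_for(Scenario.INTER_SITE_LINK_LOST)` returns `Action.EMAIL_ONLY`, which reflects that a lost connection between locations never triggers a failover by itself.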

In the case of a failover:

  • Before completing a failover, Failover Manager checks the state of the corresponding Standby component. If the Standby component is not operational for any reason, the failover is not performed.

    If the Standby component is operational, the failover continues. When the failover is completed, the Failover Manager automatically changes the configuration of the solution so that the corresponding Standby component becomes the new Active component. The Broker registration is changed to list the location of the new Active component.

    If the Failover Manager cannot reach the Broker or the Broker is down at the time of a Domain Manager failover, the Domain Manager failover is not initiated. Instead, the Failover Manager first performs a Broker failover. Once the Broker failover is completed, Failover Manager continues with the Domain Manager failover.

    All clients, including Global Consoles and other components, automatically reconnect to the new Active component. If you open another Global Console, you must establish a new console connection by specifying the new Active Broker value in the Attach Manager dialog box.

    When the failed component is restarted, it automatically resumes the Standby role. Also, in the case of SAM and Adapter Platform, it is automatically configured to listen to the Active component.

  • If the Failover Manager fails, no failover is initiated.

    If the Failover Manager monitoring process exits for some reason, you are notified and you must restart Failover Manager manually.

  • If a network component fails and causes the failure of multiple components, all affected components are switched to the Standby, provided that communications between locations to the Failover Manager and to the Broker are operating correctly.

  • The Failover Manager generates email to notify administrators of a failure situation.

  • The Failover Manager periodically synchronizes configuration files from an Active component to the Standby counterpart. Passwordless communication is required for this automatic action. The topic "Set up passwordless communication between Active and Standby hosts" provides instructions for configuring passwordless communication and the SSH connection.

    Additionally:

  • In a warm failover for IP Manager and Server Manager, only the Active Domain Manager monitors the network.

  • Host configurations for Active and Standby components must be the same, because the Active and Standby roles switch from one host to the other when a failover occurs.
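The ordering rules for a Domain Manager failover described above (check the Standby first, fail over the Broker first if it is down, then update the Broker registration) can be sketched as follows. This is a minimal illustration under stated assumptions, not the product's implementation: the `Component` and `Broker` classes, the `promote` and `fail_over_domain_manager` functions, and all field names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Component:
    """Hypothetical model of a managed component."""
    name: str
    location: str
    role: str                 # "Active" or "Standby"
    operational: bool = True


@dataclass
class Broker(Component):
    # Maps a component pair name to the location of its Active member.
    registrations: Dict[str, str] = field(default_factory=dict)


def promote(standby: Component, steps: List[str]) -> bool:
    """Promote a Standby to Active only if it is operational."""
    if not standby.operational:
        steps.append(f"skip: Standby {standby.name} is not operational")
        return False
    standby.role = "Active"
    steps.append(f"promote {standby.name}")
    return True


def fail_over_domain_manager(pair: str,
                             standby_dm: Component,
                             active_broker: Broker,
                             standby_broker: Broker) -> List[str]:
    """Return the ordered steps taken for a Domain Manager failover."""
    steps: List[str] = []
    broker = active_broker
    if not active_broker.operational:
        # The Broker is down, so the Broker failover is performed first.
        if not promote(standby_broker, steps):
            return steps
        broker = standby_broker
    if promote(standby_dm, steps):
        # The Broker registration now lists the new Active location.
        broker.registrations[pair] = standby_dm.location
        steps.append(f"register {pair} at {standby_dm.location}")
    return steps
```

As a usage example, calling `fail_over_domain_manager` with an Active Broker whose `operational` field is `False` promotes the Standby Broker before the Standby Domain Manager. Calling it with a non-operational Standby Domain Manager records only the skipped failover.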