With server redundancy, each pool has a primary server and some number of secondaries. The primaries and secondaries are assigned on a per-pool basis and are generally spread out for load balancing, so a single client with multiple pools may have primary queues in more than one server.
The primary server pushes events to clients and the secondaries maintain queue backups. If the primary server fails, one of the secondaries becomes primary to provide uninterrupted event messaging.
For example, if there are six servers running and
subscription-redundancy is set to two, one server is the primary, two servers are secondary, and the remaining three do not actively participate in HA for the client. If the primary server fails, the system assigns one of the secondaries as the new primary and attempts to add another server to the secondary pool to retain the initial redundancy level. If no new secondary server is found, then the redundancy level is not satisfied but the failover procedure completes successfully. As soon as another secondary is available, it is added.
When high availability is enabled:
In stage 1 of this figure, the primary sends an event message to the client and a synchronization message to its secondary. By stage 2, the secondary and client have updated their queue and message tracking information. If the primary failed at stage two, the secondary would start sending event messages from its queue beginning with message A10. The client would discard the resend of message A10 and then process subsequent messages as usual.
By default, the primary server sends queue synchronization messages to the secondaries every second. You can change this interval with the
gfsh alter runtime command
Set the interval for queue synchronization messages as follows:
gfsh>alter runtime --message-sync-interval=2
<!-- Set sync interval to 2 seconds --> <cache ... message-sync-interval="2" />
cache = CacheFactory.create(); cache.setMessageSyncInterval(2);
The ideal setting for this interval depends in large part on your application behavior. These are the benefits of shorter and longer interval settings:
Usually, all event messages are removed from secondary subscription queues based on the primary’s synchronization messages. Occasionally, however, some messages are orphaned in the secondary queues. For example, if a primary fails in the middle of sending a synchronization message to its secondaries, some secondaries might receive the message and some might not. If the failover goes to a secondary that did receive the message, the system will have secondary queues holding messages that are no longer in the primary queue. The new primary will never synchronize on these messages, leaving them orphaned in the secondary queues.
To make sure these messages are eventually removed, the secondaries expire all messages that have been enqueued longer than the time indicated by the servers’
Set the time-to-live as follows:
<!-- Set message ttl to 5 minutes --> <cache-server port="41414" message-time-to-live="300" />
Cache cache = ...; CacheServer cacheServer = cache.addCacheServer(); cacheServer.setPort(41414); cacheServer.setMessageTimeToLive(200); cacheServer.start();