This topic explains how to configure VMware Tanzu GemFire to handle network partitioning.
The system uses a combination of member coordinators and system members, designated as lead members, to detect and resolve network partitioning problems.
Network partition detection works in all environments. Using multiple locators mitigates the effect of network partitioning. See Configuring Peer-to-Peer Discovery.
Network partition detection is enabled by default. The default setting in the gemfire.properties file is:
enable-network-partition-detection=true
Processes that do not have network partition detection enabled are not eligible to be the lead member, so their failure will not trigger declaration of a network partition.
All system members should have the same setting for enable-network-partition-detection. If they do not, the system throws a GemFireConfigException upon startup.
The property enable-network-partition-detection must be true if you are using either partitioned or persistent regions. If you create a persistent region and enable-network-partition-detection is set to false, you will receive the following warning message:
Creating persistent region {0} but enable-network-partition-detection is set to false.
Running with network partition detection deactivated can lead to an unrecoverable system in the event of a network split.
Configure regions you want to protect from network partitioning with a scope setting of DISTRIBUTED_ACK or GLOBAL. Do not use DISTRIBUTED_NO_ACK scope. This prevents operations from being performed throughout the cluster before a network partition is detected.
Note: Tanzu GemFire issues an alert if it detects DISTRIBUTED_NO_ACK regions when network partition detection is enabled:
Region {0} is being created with scope {1} but enable-network-partition-detection is enabled in the distributed system.
This can lead to cache inconsistencies if there is a network failure.
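As a minimal sketch of the scope guidance above, a server cache.xml might declare a replicated region with distributed-ack scope. The region name orders is hypothetical, and the schema declarations on the cache element are omitted for brevity, so treat this as a fragment rather than a complete file.

```xml
<cache>
  <!-- "orders" is a hypothetical region name used only for illustration -->
  <region name="orders">
    <!-- distributed-ack waits for acknowledgment from other members;
         distributed-no-ack is the scope to avoid when network partition
         detection is enabled -->
    <region-attributes scope="distributed-ack" data-policy="replicate"/>
  </region>
</cache>
```

Setting scope="global" instead would satisfy the same guidance.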
These other configuration parameters affect or interact with network partitioning detection. Check whether they are appropriate for your installation and modify as needed.
Failure detection is initiated if a member's ack-wait-threshold (default is 15 seconds) and ack-severe-alert-threshold (15 seconds) properties elapse before receiving a response to a message. If you modify the ack-wait-threshold configuration value, you should modify ack-severe-alert-threshold to match the other configuration value.
If the system has clients connecting to it, the clients' cache.xml pool read-timeout should be set to at least three times the member-timeout setting in the server's gemfire.properties file. The default pool read-timeout setting is 10000 milliseconds.
You can adjust the default weights of members by specifying the system property gemfire.member-weight upon startup. For example, if you have some VMs that host a needed service, you could assign them a higher weight upon startup.
By default, members that are forced out of the cluster by a network partition event will automatically restart and attempt to reconnect. Data members will attempt to reinitialize the cache. See Handling Forced Cache Disconnection Using Auto-reconnect.
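The client read-timeout guidance above can be sketched as a client cache.xml fragment. It assumes the servers run with member-timeout=5000 (milliseconds) in gemfire.properties, which gives a minimum read-timeout of 3 x 5000 = 15000 milliseconds. The pool name, locator host, and port are hypothetical placeholders, and schema declarations are omitted.

```xml
<client-cache>
  <!-- "serverPool" and the locator address are hypothetical placeholders -->
  <!-- read-timeout is in milliseconds: 15000 = 3 x an assumed member-timeout of 5000 -->
  <pool name="serverPool" read-timeout="15000">
    <locator host="locator.example.com" port="10334"/>
  </pool>
</client-cache>
```

Under that assumption, the default read-timeout of 10000 milliseconds falls short of the three-times guideline, so it is worth checking this value whenever member-timeout is changed on the servers.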