In systems with distributed-ack regions, a sudden large number of distributed-no-ack operations can cause distributed-ack operations to take a long time to complete.
The distributed-no-ack operations can come from anywhere. They may be updates to distributed-no-ack regions, or they may be other distributed-no-ack operations, such as destroys, performed on any region in the cache, including the distributed-ack regions.
The main reasons why a large number of distributed-no-ack messages may delay distributed-ack operations are:
- For any single socket connection, all operations are transmitted serially. If other operations are buffered for transmission when a distributed-ack is sent, the distributed-ack operation must wait to get to the front of the line before being transmitted. The operation’s calling process is also left waiting.
- The distributed-no-ack messages are buffered by their threads before transmission. If many messages are buffered and then sent to the socket at once, the line for transmission might be very long.
You can take these steps to reduce the impact of this problem:
1. If you are using TCP, check whether socket conservation is enabled for your members. It is enabled by setting conserve-sockets to true. If enabled, each application’s threads share sockets unless you override the setting at the thread level. Work with your application programmers to see whether you can deactivate sharing entirely, or at least for the threads that perform distributed-ack operations. These include operations on distributed-ack regions and also netSearches performed on regions of any distributed scope. (Note: netSearch is only performed on regions with a data-policy of empty, normal, or preloaded.) If you give each thread that performs distributed-ack operations its own socket, you effectively let it move to the front of the line, ahead of the distributed-no-ack operations that are being performed by other threads. The thread-level override is done by calling the DistributedSystem.setThreadsSocketPolicy(false) method.
2. Reduce your buffer sizes to slow down the distributed-no-ack operations, so that the distributed-ack operations can be sent in a more timely manner.
   - If you are using UDP (you either have multicast-enabled regions or have set disable-tcp to true in gemfire.properties), consider reducing the byteAllowance of mcast-flow-control to something smaller than the default of 3.5 megabytes.
   - If you are using TCP/IP, reduce the socket-buffer-size in gemfire.properties.
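The queueing effect behind the first step can be illustrated with a small toy model in plain Java. This is only a sketch and not GemFire code: the class SocketQueueSketch, the method sendsBeforeAck, and the use of in-memory queues to stand in for sockets are all assumptions made for illustration. It counts how many messages a FIFO line transmits before a distributed-ack message when that message either shares a socket with buffered distributed-no-ack messages or has a socket of its own.

```java
import java.util.ArrayDeque;
import java.util.Queue;

/** Toy model of FIFO transmission lines; hypothetical, not GemFire code. */
final class SocketQueueSketch {

    /**
     * Counts how many messages are transmitted before the distributed-ack
     * message when bufferedNoAck no-ack messages are already waiting.
     *
     * @param bufferedNoAck number of no-ack messages flushed to the shared socket
     * @param sharedSocket  true if the ack thread shares that socket
     */
    static int sendsBeforeAck(int bufferedNoAck, boolean sharedSocket) {
        Queue<String> sharedQueue = new ArrayDeque<>();
        Queue<String> dedicatedQueue = new ArrayDeque<>();

        // The no-ack threads flush their buffered messages to the shared socket.
        for (int i = 0; i < bufferedNoAck; i++) {
            sharedQueue.add("no-ack");
        }

        // The ack message joins either the shared line or its own, empty line.
        Queue<String> ackQueue = sharedSocket ? sharedQueue : dedicatedQueue;
        ackQueue.add("ack");

        // Each line is drained strictly in FIFO order.
        int sent = 0;
        while (!"ack".equals(ackQueue.remove())) {
            sent++;
        }
        return sent;
    }

    public static void main(String[] args) {
        System.out.println("shared socket:    " + sendsBeforeAck(1000, true));
        System.out.println("dedicated socket: " + sendsBeforeAck(1000, false));
    }
}
```

With 1000 buffered messages, the model reports 1000 transmissions ahead of the distributed-ack message on the shared line and 0 on the dedicated line. In a real member, the dedicated line corresponds to the thread-level override described in the first step; actual delays also depend on message sizes and buffer settings, which this model ignores.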