By default, VMware GemFire uses the JVM heap. Heap memory is the part of memory allocated to the JVM in which all class instances and arrays are allocated. Space allocated to the heap is reclaimed through garbage collection (GC), an automatic memory management process.
The GemFire resource manager works with your JVM’s tenured garbage collector to control heap use and protect your member from hangs and crashes due to memory overload.
The GemFire resource manager prevents the cache from consuming too much memory by evicting old data. If the garbage collector is unable to keep up, the resource manager refuses additions to the cache until the collector has freed an adequate amount of memory.
The resource manager has two threshold settings, each expressed as a percentage of the total tenured heap. Both are deactivated by default.
Eviction Threshold. Above this, the manager orders evictions for all regions with eviction-attributes set to lru-heap-percentage. This prompts dedicated background evictions, independent of any application threads, and it also tells all application threads adding data to the regions to evict at least as much data as they add. The JVM garbage collector removes the evicted data, reducing heap use. The evictions continue until the manager determines that heap use is again below the eviction threshold.
The resource manager enforces eviction thresholds only on regions whose LRU eviction policies are based on heap percentage. Regions whose eviction policies are based on entry count or memory size use other mechanisms to manage evictions. See Eviction for more detail regarding eviction policies.
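The evict-at-least-as-much-as-you-add behavior described above can be sketched in plain Java. This is an illustrative model, not GemFire code: the class name, the LRU map, and the byte sizes are all hypothetical, and the dedicated background evictors are not modeled.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative model (NOT GemFire code): once heap use passes the eviction
// threshold, a thread that adds data also evicts, in LRU order, at least
// as many bytes as it added.
class HeapLruSketch {
    private final LinkedHashMap<String, byte[]> region = new LinkedHashMap<>();
    private final long heapCapacity;       // simulated tenured heap, in bytes
    private final double evictionFraction; // e.g. 0.60 for a 60% threshold
    private long used;

    HeapLruSketch(long heapCapacity, double evictionFraction) {
        this.heapCapacity = heapCapacity;
        this.evictionFraction = evictionFraction;
    }

    void put(String key, byte[] value) {
        region.put(key, value);
        used += value.length;
        long evicted = 0;
        Iterator<Map.Entry<String, byte[]>> it = region.entrySet().iterator();
        // Above the threshold, evict at least as much as this put added;
        // background evictors (not modeled here) continue until the manager
        // sees heap use back below the threshold.
        while (used > heapCapacity * evictionFraction
                && evicted < value.length && it.hasNext()) {
            Map.Entry<String, byte[]> oldest = it.next();
            if (oldest.getKey().equals(key)) continue; // never evict the new entry
            evicted += oldest.getValue().length;
            used -= oldest.getValue().length;
            it.remove();
        }
    }

    long used() { return used; }
    int size() { return region.size(); }
}
```

With a 1000-byte capacity and a 60% threshold, repeated 100-byte puts settle at 600 bytes of simulated heap use: each put above the threshold evicts the oldest entry to pay for itself.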
Critical Threshold. Above this, all activity that might add data to the cache is refused. This threshold is set above the eviction threshold and is intended to allow the eviction and GC work to catch up. This JVM, all other JVMs in the distributed system, and all clients to the system receive a LowMemoryException for operations that would add to this critical member's heap consumption. Activities that fetch or reduce data are allowed. For a list of refused operations, see the Javadocs for the ResourceManager class.
The critical threshold is enforced on all regions, regardless of LRU eviction policy, though it can be set to zero to deactivate its effect.
When heap use passes the eviction threshold in either direction, the manager logs an info-level message.
When heap use exceeds the critical threshold, the manager logs an error-level message. Avoid exceeding the critical threshold. Once identified as critical, the GemFire member becomes a read-only member that refuses cache updates for all of its regions, including incoming distributed updates.
For more information, see org.apache.geode.cache.control.ResourceManager in the online API documentation.
When the manager kicks off evictions:
See also Memory Requirements for Cached Data.
The most important factor affecting garbage collection performance is total available memory.
The specific heap size you should set depends greatly on the nature of your live data and the rate and nature of the operations on your cache.
In general, you should allocate sufficient space to store all the data in the cache, with additional space recommended to handle bursts and the failures of other nodes. If you lack the capacity to store all the data in the cache, you may run into issues such as GC thrashing and excessive eviction.
To maintain low latency for CMS heaps, we recommend that you initially set the heap size to at least 2 times, and up to 3 times, the size of your live data set.
To achieve the 1ms maximum pause time promise of ZGC, you should initially set the heap size to at least 2 times, and up to 3 times, the size of your live data set.
Two key factors affect required heap size with ZGC:
Java Object Storage: With small heap sizes (less than 32GB), the amount of memory required to store Java objects increases with ZGC, typically around 40%. The actual increase is highly dependent on the nature of the Java objects.
ZGC Headroom: To maintain low latency, ZGC requires additional “headroom,” that is, unused heap beyond what is required to store the application’s Java objects. The amount of headroom required depends largely on the memory allocation rate, which in turn depends largely on the operations being performed on the cache. Heap sizes should be at least 2 to 3 times the memory usage of the cache itself, which consists of your data plus the GemFire data structures. Memory-intensive operations such as queries can increase this further. The actual headroom required is highly dependent on the rate and nature of the operations on the cache.
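The 2x-3x sizing guidance above, together with the roughly 40% ZGC object-storage increase on small heaps, can be turned into rough arithmetic. The helper below is an illustrative sketch, not a GemFire API: the class and method names are invented, and the 40% factor is only a typical value, since the actual overhead depends heavily on the nature of the objects and the operations on the cache.

```java
// Rough heap-sizing arithmetic from the guidance above (illustrative only).
class HeapSizing {
    static final long GB = 1024L * 1024 * 1024;

    // Returns {low, high} recommended heap in bytes. zgcSmallHeap means
    // "using ZGC with a heap under 32 GB", where Java object storage may
    // grow by roughly 40% (workload-dependent estimate).
    static long[] recommendedHeap(long liveDataBytes, boolean zgcSmallHeap) {
        long adjusted = zgcSmallHeap ? liveDataBytes * 14 / 10 : liveDataBytes;
        return new long[] { adjusted * 2, adjusted * 3 };
    }
}
```

For example, a 10 GB live data set without the ZGC small-heap overhead maps to a recommended heap of roughly 20 GB to 30 GB.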
Use the -Xmx JVM option to set each member's heap to the required size.
Set -XX:SoftMaxHeapSize slightly below the maximum expected heap consumption to protect the system from allocation stalls and similar problems. If you will use GemFire eviction, see Tuning the JVM's Garbage Collection Parameters below for additional guidance on tuning this setting.
Resource manager behavior is closely tied to the triggering of Garbage Collection (GC) activities, the use of concurrent garbage collectors in the JVM, and the number of parallel GC threads used for concurrency.
The recommendations provided here for using the manager assume you have a solid understanding of your Java VM’s heap management and garbage collection service.
The resource manager is available for use in any GemFire member, but you may not want to activate it everywhere. For some members it might be better to occasionally restart after a hang or an OutOfMemoryError crash than to evict data and/or refuse distributed caching activities. Also, members that do not risk running past their memory limits gain nothing from the resource manager's overhead. Cache servers are often configured to use the manager because they generally host more data and have more data activity than other members, requiring greater responsiveness in data cleanup and collection.
For the members where you want to activate the resource manager:
The configuration terms used here are cache.xml elements and attributes, but you can also configure through gfsh and the org.apache.geode.cache.control.ResourceManager API.
Set the member's initial and maximum heap sizes to the same value. With gfsh, set the start server options --initial-heap and --max-heap to the same value.
Set the critical-heap-percentage threshold. This should be as close to 100 as possible while still low enough that the manager's response can prevent the member from hanging or getting an OutOfMemoryError. The threshold is zero (no threshold) by default. Note: When you set this threshold, it also enables a query monitoring feature that prevents most out-of-memory exceptions when executing queries or creating indexes. See Monitoring Queries for Low Memory.
Set the eviction-heap-percentage threshold to a value lower than the critical threshold. This should be as high as possible while still low enough to prevent your member from reaching the critical threshold. The threshold is zero (no threshold) by default.
Set the eviction-attributes for the regions you want the manager to evict from to lru-heap-percentage. See Eviction. The regions you configure for eviction should have enough data activity for the evictions to be useful and should contain data your application can afford to delete or offload to disk.
gfsh>start server --name=server1 --initial-heap=30m --max-heap=30m \
  --critical-heap-percentage=80 --eviction-heap-percentage=60
<cache>
  <region refid="REPLICATE_HEAP_LRU" />
  ...
  <resource-manager critical-heap-percentage="80" eviction-heap-percentage="60"/>
</cache>
The resource-manager specification must appear after the region declarations in your cache.xml file.
Because VMware GemFire is specifically designed to manipulate data held in memory, you can optimize your application’s performance by tuning the way VMware GemFire uses the JVM heap.
See your JVM documentation for all JVM-specific settings that can be used to improve garbage collection (GC) response. The best configuration can vary depending on the use case and the JVM garbage collector used.
If you are using concurrent mark-sweep (CMS) garbage collection with VMware GemFire, use the following settings to improve performance:
Set the initial and maximum heap switches, -Xms and -Xmx, to the same values. The gfsh start server options --initial-heap and --max-heap accomplish the same purpose, with the added value of providing resource manager defaults such as eviction threshold and critical threshold.
If you will use GemFire eviction, configure the CMS collector to initiate collection when heap use is at least 10% lower than your setting for the resource manager eviction-heap-percentage. You want the collector to be working when GemFire is evicting, or the evictions will not result in more free memory. For example, if eviction-heap-percentage is set to 65, set your garbage collection to start when heap use is no higher than 55%.
| JVM | CMS switch flag | CMS initiation (begin at heap % N) |
|---|---|---|
| HotSpot | -XX:+UseConcMarkSweepGC | -XX:CMSInitiatingOccupancyFraction=N |
If you start the server with the gfsh start server command, pass these settings with the --J switch, for example:
The following is an example of setting Hotspot JVM for an application:
$ java -Xms30m -Xmx30m -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=60 app.MyApplication
Note: Do not use the -XX:+UseStringCache JVM configuration property when starting up servers. This JVM option can cause issues with data corruption and compatibility.
$ gfsh start server --name=app.MyApplication --initial-heap=30m --max-heap=30m \
  --J=-XX:+UseConcMarkSweepGC --J=-XX:CMSInitiatingOccupancyFraction=60
Although the garbage-first (G1) garbage collector works effectively with VMware GemFire, issues can arise in some cases due to the differences between CMS and G1. For example, G1 by design does not define a fixed maximum tenured heap size, so when this value is requested from the garbage collector, it reports the total maximum heap size. This impacts VMware GemFire, because the resource manager uses the maximum tenured heap size to calculate the byte values of the eviction and critical percentages. Extensive testing is recommended before using the G1 garbage collector. See your JVM documentation for all JVM-specific settings that can be used to improve garbage collection (GC) response.
The size of objects stored in a region must also be taken into account. If the primary heap objects you allocate are larger than 50 percent of the G1 region size (so-called "humongous" objects), the JVM can report that it is out of heap memory when it has used only 50 percent of the heap. The default G1 region size is 1 MB; it can be increased up to 32 MB (with values that are always a power of 2) by using the --J=-XX:G1HeapRegionSize=VALUE JVM parameter. If you are using large objects and want to use G1GC without increasing its heap region size (or if your values are larger than 16 MB), you could configure your VMware GemFire regions to store the large values off-heap. However, even if you do that, the large off-heap values will allocate large temporary heap values that G1GC will treat as "humongous" allocations, even though they are short-lived. Consider using CMS if most of your values will result in "humongous" allocations.
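The humongous-object rule described above can be illustrated with simple arithmetic. The helper below is hypothetical, not a JVM or GemFire API; it only encodes the "at least half a region" cutoff and the 1 MB to 32 MB power-of-two region sizes from the text.

```java
// Illustration of the G1 rule described above (hypothetical helper): an
// allocation of at least half the G1 region size is treated as "humongous".
class G1Humongous {
    static boolean isHumongous(long objectBytes, long regionBytes) {
        return objectBytes >= regionBytes / 2;
    }

    // Smallest power-of-two region size (1..32 MB) that keeps an object of
    // this size below the humongous cutoff, or -1 if none does (objects
    // larger than 16 MB are humongous even at the 32 MB maximum region size).
    static int regionSizeMbFor(long objectBytes) {
        for (int mb = 1; mb <= 32; mb *= 2) {
            if (!isHumongous(objectBytes, mb * 1024L * 1024)) {
                return mb;
            }
        }
        return -1;
    }
}
```

For example, a 600 KB value is humongous with the default 1 MB region (cutoff 512 KB), but fits normally once the region size is raised to 2 MB; a 17 MB value exceeds the 16 MB cutoff of even the largest 32 MB region.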
Note: G1's behavior is incompatible with heap LRU eviction. If you will use heap LRU eviction in your cache, you should not use G1.
If you are using the Z garbage collector (ZGC) with VMware GemFire, use the following settings to improve performance:
Set the initial and maximum heap switches, -Xms and -Xmx, to the same values. Alternatively, use the gfsh start server options --initial-heap and --max-heap to set these values.
If you will use GemFire eviction, configure ZGC to initiate collection when heap use is at least 5% lower than your setting for the resource manager eviction-heap-percentage. The collector should be working when GemFire is evicting, or the evictions will not result in more free memory. For example, if --eviction-heap-percentage is set to 65, set -XX:SoftMaxHeapSize to a value no higher than 60% of the maximum heap size.
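Combining these ZGC recommendations, a member launch might look like the following sketch. The application class and the 30g heap are illustrative placeholders; -XX:SoftMaxHeapSize=18g is 60% of the 30g maximum, per the example above.

```shell
$ java -XX:+UseZGC -Xms30g -Xmx30g -XX:SoftMaxHeapSize=18g app.MyApplication
```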
Note that with a heap size less than 32 GB, the cache’s heap usage may be up to 80% larger with ZGC than with CMS or G1. The magnitude of this impact is highly dependent on the nature of your cached data. For heap sizes of 32 GB or larger, ZGC does not cause this additional overhead.
In tuning the resource manager, your central focus should be keeping the member below the critical threshold. The critical threshold is provided to avoid member hangs and crashes, but because of its exception-throwing behavior for distributed updates, the time spent in critical negatively impacts the entire distributed system. To stay below critical, tune so that the GemFire eviction and the JVM’s GC respond adequately when the eviction threshold is reached.
Use the statistics provided by your JVM to make sure your memory and GC settings are sufficient for your needs.
The GemFire ResourceManagerStats statistics provide information about memory use, the manager's thresholds, and eviction activities.
If your application spikes above the critical threshold on a regular basis, try lowering the eviction threshold. If the application never goes near critical, you might raise the eviction threshold to gain more usable memory without the overhead of unneeded evictions or GC cycles.
The settings that will work well for your system depend on a number of factors, including these:
The size of the data objects you store in the cache: Very large data objects can be evicted and garbage collected relatively quickly. The same amount of space in use by many small objects takes more processing effort to clear and might require lower thresholds to allow eviction and GC activities to keep up.
Application behavior: Applications that quickly put a lot of data into the cache can more easily overrun the eviction and GC capabilities. Applications that operate more slowly give eviction and GC efforts more time to keep up, possibly allowing you to set your thresholds higher than in a more volatile system.
Your choice of JVM: Each JVM has its own GC behavior, which affects how efficiently the collector can operate, how quickly it kicks in when needed, and other factors.
In this sample statistics chart in VSD, the manager’s evictions and the JVM’s GC efforts are good enough to keep heap use very close to the eviction threshold. The eviction threshold could be increased to a setting closer to the critical threshold, allowing the member to keep more data in tenured memory without the risk of overwhelming the JVM. This chart also shows the blocks of times when the manager was running cache evictions.
In this next chart, it looks like the manager’s evictions are kicking in at the right time, but the CMS garbage collector is not starting soon enough to keep memory use in check. It might be that it is not configured to start as soon as it should. It should be started just before the eviction threshold is reached. Or there might be some other issue with the garbage collection service.
These examples set the critical threshold to 85 percent of the tenured heap and the eviction threshold to 75 percent. The region bigDataStore is configured to participate in the resource manager's eviction activities.
gfsh>start server --name=server1 --initial-heap=30m --max-heap=30m \
  --critical-heap-percentage=85 --eviction-heap-percentage=75
gfsh>create region --name=bigDataStore --type=PARTITION_HEAP_LRU
<cache>
  <region name="bigDataStore" refid="PARTITION_HEAP_LRU"/>
  ...
  <resource-manager critical-heap-percentage="85" eviction-heap-percentage="75"/>
</cache>
The resource-manager specification must appear after the region declarations in your cache.xml file.
Cache cache = CacheFactory.create();
ResourceManager rm = cache.getResourceManager();
rm.setCriticalHeapPercentage(85);
rm.setEvictionHeapPercentage(75);
RegionFactory rf = cache.createRegionFactory(RegionShortcut.PARTITION_HEAP_LRU);
Region region = rf.create("bigDataStore");
This is one possible scenario for the configuration used in the examples:
lru-heap-percentage action destroy is suitable.
OutOfMemoryException errors. Testing has shown that leaving 15% headroom above the critical threshold when adding data to the region gives 99.5% uptime with no OutOfMemoryException errors when configured with the CMS garbage collector.