In a cluster with minimal contention from concurrent threads reading or updating data on the members, you can use rebalancing to dynamically increase or decrease your data and processing capacity.
Rebalancing is a member operation. It affects all partitioned regions defined by the member, regardless of whether the member hosts data for the regions. The rebalancing operation performs two tasks:
1. If the configured partitioned region redundancy is not satisfied, rebalancing does what it can to recover redundancy. See Configure High Availability for a Partitioned Region.
2. Rebalancing moves the partitioned region data buckets between host members as needed to establish the most fair balance of data and behavior across the cluster.
For efficiency, when starting multiple members, trigger the rebalance a single time, after you have added all members.
Note: If you have transactions running in your system, be careful in planning your rebalancing operations. Rebalancing may move data between members, which could cause a running transaction to fail with a TransactionDataRebalancedException. Fixed custom partitioning prevents rebalancing altogether. All other data partitioning strategies allow rebalancing and can result in this exception unless you run your transactions and your rebalancing operations at different times.
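If transactions and rebalancing can overlap, one defensive pattern is to roll back and retry the transaction when this exception is thrown. The following sketch is illustrative rather than taken from this documentation; it assumes an existing cache and region, and the key and value are placeholders:

CacheTransactionManager txMgr = cache.getCacheTransactionManager();
boolean committed = false;
while (!committed) {
  txMgr.begin();
  try {
    region.put("customer1", "updatedValue");
    txMgr.commit();
    committed = true;
  } catch (TransactionDataRebalancedException e) {
    // A rebalance moved the data mid-transaction; roll back and retry
    if (txMgr.exists()) {
      txMgr.rollback();
    }
  }
}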
Kick off a rebalance using one of the following:
gfsh command. First, start a gfsh prompt and connect to the cluster. Then type the following command:
gfsh>rebalance
Optionally, you can specify regions to include in or exclude from the rebalance, specify a time-out for the rebalance operation, or simulate a rebalance operation. Type help rebalance or see rebalance for more information.
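For example, assuming regions named /region-a and /region-b exist (the names here are placeholders), a command along these lines rebalances only those regions and waits up to 600 seconds for the results:

gfsh>rebalance --include-region=/region-a,/region-b --time-out=600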
API call:
ResourceManager manager = cache.getResourceManager();
RebalanceOperation op = manager.createRebalanceFactory().start();
// Block until the rebalance is complete, then get the results
RebalanceResults results = op.getResults();
// These are some of the details we can get about the run from the API
System.out.println("Took " + results.getTotalTime() + " milliseconds\n");
System.out.println("Transferred " + results.getTotalBucketTransferBytes() + " bytes\n");
You can also simulate a rebalance through the API, to determine whether it is worth running:
ResourceManager manager = cache.getResourceManager();
RebalanceOperation op = manager.createRebalanceFactory().simulate();
RebalanceResults results = op.getResults();
System.out.println("Rebalance would transfer " + results.getTotalBucketTransferBytes() +" bytes ");
System.out.println(" and create " + results.getTotalBucketCreatesCompleted() + " buckets.\n");
The rebalancing operation runs asynchronously.
By default, rebalancing is performed on one partitioned region at a time. For regions that have colocated data, the rebalancing works on the regions as a group, maintaining the data colocation between the regions.
You can optionally rebalance multiple regions in parallel by setting the gemfire.resource.manager.threads system property. Setting this property to a value greater than 1 enables Tanzu GemFire to rebalance multiple regions in parallel, any time a rebalance operation is initiated using the API.
You can optionally tell the system to move multiple buckets in parallel during a rebalance operation by setting the system property gemfire.MAX_PARALLEL_REBALANCE_OPERATIONS to a value greater than its default of 1.
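Both settings are JVM system properties, so one way to apply them is at member startup. For example, when starting a server with gfsh (the server name and values here are placeholders):

gfsh>start server --name=server1 --J=-Dgemfire.resource.manager.threads=2 --J=-Dgemfire.MAX_PARALLEL_REBALANCE_OPERATIONS=4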
You can continue to use your partitioned regions normally while rebalancing is in progress. Read operations, write operations, and function executions continue while data is moving. If a function is executing on a local data set, you may see a performance degradation if that data moves to another host during function execution. Future function invocations are routed to the correct member.
Tanzu GemFire tries to ensure that each member has the same percentage of its available space used for each partitioned region. The percentage is configured in the partition-attributes local-max-memory setting.
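For reference, local-max-memory can also be set through the API when the region is created. A minimal sketch, with the region name and the 512 MB figure chosen purely for illustration:

PartitionAttributesFactory<String, Object> paf = new PartitionAttributesFactory<>();
paf.setLocalMaxMemory(512); // megabytes this member may use for the region
Region<String, Object> region = cache
    .createRegionFactory(RegionShortcut.PARTITION)
    .setPartitionAttributes(paf.create())
    .create("region-a");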
Partitioned region rebalancing does not allow the local-max-memory setting to be exceeded unless LRU eviction is enabled with overflow to disk.
You typically want to trigger rebalancing when capacity is increased or reduced through member startup, shutdown, or failure.
You may also need to rebalance when your data is hashed unevenly. Uneven hashing can occur if your keys do not have a hashCode method that ensures uniform distribution, or if you use a PartitionResolver to colocate your partitioned region data (see Colocate Data from Different Partitioned Regions). In either case, some buckets may receive more data than others. Rebalancing can even out the load between data stores by placing fewer buckets on members that host large buckets.
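As an illustration of how a resolver can skew bucket sizes, the following sketch (hypothetical, not from this documentation) routes every entry by a customer id embedded in the key, so a single very active customer concentrates its data in one bucket:

public class CustomerResolver implements PartitionResolver<String, Object> {
  // Route all entries for the same customer to the same bucket
  public Object getRoutingObject(EntryOperation<String, Object> opDetails) {
    String key = opDetails.getKey();           // e.g. "customer42|order17"
    return key.substring(0, key.indexOf('|')); // hash only the customer id
  }

  public String getName() {
    return "CustomerResolver";
  }

  public void close() {}
}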
If you use redundancy for high availability and you have configured a region not to automatically recover redundancy after a loss, rebalancing solely to restore the lost redundancy is unnecessary. Instead, trigger the restore redundancy operation. See Restoring Redundancy in Partitioned Regions.
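For example, from gfsh (assuming a version that supports the command):

gfsh>restore redundancy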
You can simulate the rebalance operation before moving any actual data by executing the rebalance command with the following option:
gfsh>rebalance --simulate
Note: If you are using heap_lru for data eviction, you may notice a difference between your simulated results and your actual rebalancing results. This discrepancy can occur because the VM starts to evict entries after you execute the simulation; when you then perform an actual rebalance operation, the operation makes different decisions based on the newer heap size.
The experimental automated rebalance feature triggers a rebalance operation based on a time schedule.