Rebalancing Partitioned Region Data

In a cluster with minimal contention to the concurrent threads reading or updating from the members, you can use rebalancing to dynamically increase or decrease your data and processing capacity.

Rebalancing is a member operation. It affects all partitioned regions defined by the member, regardless of whether the member hosts data for the regions. The rebalancing operation performs two tasks:

If the configured partition region redundancy is not satisfied, rebalancing does what it can to recover redundancy. See Configure High Availability for a Partitioned Region.
Rebalancing moves the partitioned region data buckets between host members as needed to establish the most fair balance of data and behavior across the cluster.

For efficiency, when starting multiple members, trigger the rebalance a single time, after you have added all members.

Note
If you have transactions running in your system, be careful in planning your rebalancing operations. Rebalancing may move data between members, which could cause a running transaction to fail with a TransactionDataRebalancedException. Fixed custom partitioning prevents rebalancing altogether. All other data partitioning strategies allow rebalancing and can result in this exception unless you run your transactions and your rebalancing operations at different times.

Kick off a rebalance using one of the following:

gfsh command. First, starting a gfsh prompt and connect to the cluster. Then type the following command:
```
gfsh>rebalance
```
Optionally, you can specify regions to include or exclude from rebalancing, specify a time-out for the rebalance operation or just simulate a rebalance operation. Type help rebalance or see rebalance for more information.

API call:

ResourceManager manager = cache.getResourceManager(); 
RebalanceOperation op = manager.createRebalanceFactory().start(); 
//Wait until the rebalance is complete and then get the results
RebalanceResults results = op.getResults(); 
//These are some of the details we can get about the run from the API
System.out.println("Took " + results.getTotalTime() + " milliseconds\n"); 
System.out.println("Transfered " + results.getTotalBucketTransferBytes()+ "bytes\n");

You can also just simulate a rebalance through the API, to see if it’s worth it to run:

ResourceManager manager = cache.getResourceManager(); 
RebalanceOperation op = manager.createRebalanceFactory().simulate(); 
RebalanceResults results = op.getResults(); 
System.out.println("Rebalance would transfer " + results.getTotalBucketTransferBytes() +" bytes "); 
System.out.println(" and create " + results.getTotalBucketCreatesCompleted() + " buckets.\n");

How Partitioned Region Rebalancing Works

The rebalancing operation runs asynchronously.

By default, rebalancing is performed on one partitioned region at a time. For regions that have colocated data, the rebalancing works on the regions as a group, maintaining the data colocation between the regions.

You can optionally rebalance multiple regions in parallel by setting the gemfire.resource.manager.threads system property. Setting this property to a value greater than 1 enables Tanzu GemFire to rebalance multiple regions in parallel, any time a rebalance operation is initiated using the API.

You can optionally tell the system to move multiple buckets in parallel during a rebalance operation by setting the system property gemfire.MAX_PARALLEL_REBALANCE_OPERATIONS to a value greater than its default of 1.

You can continue to use your partitioned regions normally while rebalancing is in progress. Read operations, write operations, and function executions continue while data is moving. If a function is executing on a local data set, you may see a performance degradation if that data moves to another host during function execution. Future function invocations are routed to the correct member.

Tanzu GemFire tries to ensure that each member has the same percentage of its available space used for each partitioned region. The percentage is configured in the partition-attributes local-max-memory setting.

Partitioned region rebalancing:

Does not allow the local-max-memory setting to be exceeded unless LRU eviction is enabled with overflow to disk.
Places multiple copies of the same bucket on different host IP addresses whenever possible.
Resets entry time to live and idle time statistics during bucket migration.
Replaces offline members.

When to Rebalance a Partitioned Region

You typically want to trigger rebalancing when capacity is increased or reduced through member startup, shut down or failure.

You may also need to rebalance when you have uneven hashing of data. Uneven hashing can occur if your keys do not have a hash code method, which ensures uniform distribution, or if you use a PartitionResolver to colocate your partitioned region data (see Colocate Data from Different Partitioned Regions). In either case, some buckets may receive more data than others. Rebalancing can be used to even out the load between data stores by putting fewer buckets on members that are hosting large buckets.

Rebalancing solely for the purpose of restoring lost redundancy, when redundancy is being used for high availability and the region has been configured to not automatically recover redundancy after a loss, is not necessary. Instead, the restore redundancy operation should be triggered. See Restoring Redundancy in Partitioned Regions.

How to Simulate Region Rebalancing

You can simulate the rebalance operation before moving any actual data around by executing the rebalance command with the following option:

gfsh>rebalance --simulate

Note
If you are using heap_lru for data eviction, you may notice a difference between your simulated results and your actual rebalancing results. This discrepancy can be due to the VM starting to evict entries after you execute the simulation. Then when you perform an actual rebalance operation, the operation will make different decisions based on the newer heap size.

Automated Rebalancing

The experimental automated rebalance feature triggers a rebalance operation based on a time schedule.