Restoring redundancy creates new redundant copies of buckets on members hosting the region and by default reassigns which members host the primary buckets to give better load balancing. It does not move buckets from one member to another. The reassignment of primary hosts can be prevented using the appropriate flags, as described below. See Configure High Availability for a Partitioned Region for further detail on redundancy.
For efficiency, when starting multiple members, trigger the restore redundancy operation only once, after all members have been added.
Initiate a restore redundancy operation using one of the following:
gfsh command. First, start a gfsh prompt and connect to the cluster. Then type the following command:
gfsh>restore redundancy
Optionally, you can specify regions to include or exclude from restoring redundancy, and prevent the operation from reassigning which members host primary copies. Type help restore redundancy or see restore redundancy for more information.
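For example, assuming the option names documented for the restore redundancy command (verify them with help restore redundancy), a combined invocation might look like the following, with placeholder region names:

gfsh>restore redundancy --include-region=/region1 --exclude-region=/region2 --dont-reassign-primaries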
API call:
// Import paths shown here follow Apache Geode's public API; adjust them if
// your distribution packages these types differently.
import java.util.concurrent.CompletableFuture;

import org.apache.geode.cache.control.ResourceManager;
import org.apache.geode.management.runtime.RestoreRedundancyResults;

ResourceManager manager = cache.getResourceManager();
CompletableFuture<RestoreRedundancyResults> future = manager.createRestoreRedundancyOperation()
    .includeRegions(regionsToInclude)
    .excludeRegions(regionsToExclude)
    .shouldReassignPrimaries(false)
    .start();

// Block until the operation completes and get the results
RestoreRedundancyResults results = future.get();

// These are some of the details we can get about the run from the API
System.out.println("Restore redundancy operation status is " + results.getStatus());
System.out.println("Results for each included region: " + results.getMessage());
System.out.println("Number of regions with no redundant copies: "
    + results.getZeroRedundancyRegionResults().size());
System.out.println("Results for region " + regionName + ": "
    + results.getRegionResult(regionName).getMessage());
If you have startup-recovery-delay=-1 configured for your partitioned region, you must perform a restore redundancy operation on your region after restarting any members in your cluster in order to recover any lost redundancy. If you have startup-recovery-delay set to a low number, you may need to allow extra time for the region to recover redundancy.
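For reference, startup-recovery-delay is a partition attribute of the region. The following is a minimal sketch of configuring it through the Java API (the region name and redundant-copies value are illustrative):

import org.apache.geode.cache.PartitionAttributesFactory;
import org.apache.geode.cache.RegionFactory;
import org.apache.geode.cache.RegionShortcut;

// -1 disables automatic redundancy recovery when a member starts up, so a
// manual restore redundancy operation is required after member restarts.
PartitionAttributesFactory<String, String> paf = new PartitionAttributesFactory<>();
paf.setRedundantCopies(1);        // illustrative value
paf.setStartupRecoveryDelay(-1);

RegionFactory<String, String> regionFactory = cache.createRegionFactory(RegionShortcut.PARTITION);
regionFactory.setPartitionAttributes(paf.create());
regionFactory.create("ordersRegion");  // hypothetical region name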