This topic describes what happens during VMware Tanzu GemFire startup and shutdown and provides procedures for those operations.
When you start a member with a persistent region, the data is retrieved from disk stores to recreate the member’s persistent region. If the member does not hold all of the most recent data for the region, then other members have the data, and region creation blocks, waiting for those other members. A partitioned region with colocated entries also blocks on startup, waiting for the entries of the colocated region to be available. A persistent gateway sender is treated the same as a colocated region, so it can also block region creation.
With a log level of info or below, the system provides messaging about the wait. Here, the disk store for server2 has the most recent data for the region, and server1 is waiting for server2.
Region /people has potentially stale data.
It is waiting for another member to recover the latest data.
My persistent id:
DiskStore ID: 6893751ee74d4fbd-b4780d844e6d5ce7
Name: server1
Location: /192.0.2.0:/home/dsmith/server1/.
Members with potentially new data:
[
DiskStore ID: 160d415538c44ab0-9f7d97bae0a2f8de
Name: server2
Location: /192.0.2.0:/home/dsmith/server2/.
]
Use the `gfsh show missing-disk-stores` command to see all disk stores
that are being waited on by other members.
When the most recent data is available, the system updates the region, logs a message, and continues the startup.
[info 2010/04/09 10:52:13.010 PDT CacheRunner <main> tid=0x1]
Done waiting for the remote data to be available.
If the member’s disk store has data for a region that is never created, the data remains in the disk store.
Each member’s persistent regions load and go online as quickly as possible, not waiting unnecessarily for other members to complete. For performance reasons, these actions occur asynchronously:
To start a system with disk stores:
Start all members with persisted data first and at the same time. Exactly how you do this depends on your members. Make sure to start members that host colocated regions, as well as persistent gateway senders.
While they are initializing their regions, the members determine which have the most recent region data, and initialize their regions with the most recent data.
For replicated regions, where you define persistence only in some of the region’s host members, start the persistent replicate members prior to the non-persistent replicate members to make sure the data is recovered from disk.
This is an example bash script for starting members in parallel. The script waits for the startup to finish. It exits with an error status if one of the jobs fails.
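#!/bin/bash
# Start each member in the background so that the members initialize in parallel.
ssh servera "cd /my/directory; gfsh start server --name=servera" &
ssh serverb "cd /my/directory; gfsh start server --name=serverb" &
# Wait for every background job and keep the first non-zero exit status.
STATUS=0
for job in $(jobs -p)
do
  echo "$job"
  wait "$job"
  JOB_STATUS=$?
  test $STATUS -eq 0 && STATUS=$JOB_STATUS
done
exit $STATUS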
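When only some hosts of a replicated region are persistent, the parallel start can be split into two phases so that the persistent replicate members come up first. The following is a minimal sketch under stated assumptions, not a definitive procedure: run_parallel and start_cluster are hypothetical helper names, servera and serverb are assumed to be the persistent members, and serverc stands in for a hypothetical non-persistent replicate member.

```shell
#!/bin/bash
# run_parallel (hypothetical helper): run each argument as a background
# command, wait for all of them, and return the first non-zero exit status.
run_parallel() {
  local pids="" status=0 rc cmd pid
  for cmd in "$@"; do
    sh -c "$cmd" &
    pids="$pids $!"
  done
  for pid in $pids; do
    wait "$pid" || { rc=$?; [ "$status" -eq 0 ] && status=$rc; }
  done
  return "$status"
}

# start_cluster (hypothetical helper): two-phase start.
start_cluster() {
  # Phase 1: members assumed to hold persisted data, started together.
  run_parallel \
    'ssh servera "cd /my/directory; gfsh start server --name=servera"' \
    'ssh serverb "cd /my/directory; gfsh start server --name=serverb"' || return $?
  # Phase 2: serverc is an assumed non-persistent replicate member.
  run_parallel \
    'ssh serverc "cd /my/directory; gfsh start server --name=serverc"'
}
```

Calling start_cluster runs both phases and stops after phase 1 if any persistent member fails to start.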
Respond to blocked members. When a member blocks waiting for more recent data from another member, the member waits indefinitely rather than coming online with stale data. Check for missing disk stores with the gfsh show missing-disk-stores command. See Handling Missing Disk Stores.
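To see which disk stores a blocked member is waiting on, one option is to scan that member's log for the wait message shown earlier. This is a rough sketch that assumes the log uses exactly the layout of that excerpt; waiting_disk_stores is a hypothetical helper, not a gfsh feature, and the log path in the usage note is also an assumption.

```shell
#!/bin/bash
# waiting_disk_stores (hypothetical helper): print the DiskStore IDs listed
# under "Members with potentially new data:" in the given log file.
# Assumes the message layout shown in the log excerpt earlier in this topic.
waiting_disk_stores() {
  awk '
    /Members with potentially new data:/ { in_block = 1; next }
    in_block && /^[[:space:]]*\]/        { in_block = 0 }
    in_block && /DiskStore ID:/          { print $NF }
  ' "$1" | sort -u
}
```

For example, waiting_disk_stores server1/server1.log would print one ID per line for each disk store that server1 reports as holding potentially newer data.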
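The following describes the two possibilities for starting up a replicated persistent region after a shutdown. Assume that Member A (MA) exits first, leaving persisted data on disk for RegionP. Member B (MB) continues to run operations on RegionP, which update its disk store and leave the disk store for MA in a stale condition. MB then exits, leaving the most up-to-date data on disk for RegionP. If MB restarts first, it holds the most recent data, so it initializes RegionP from its disk store without blocking. If MA restarts first, it detects that its data may be stale and blocks until MB is available.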
If more than one member hosts a persistent region or queue, the order in which the members shut down may be significant when the system restarts. The last member to exit the system or shut down has the most up-to-date data on disk. Each member knows which other system members were online at the time of exit or shutdown. This permits a member to acquire the most recent data upon subsequent startup.
For a replicated region with persistence, the last member to exit has the most recent data.
For a partitioned region, every member persists its own buckets. A shutdown using gfsh shutdown synchronizes the disk stores before exiting, so all disk stores hold the most recent data. Without an orderly shutdown, some disk stores may have more recent bucket data than others.
The best way to shut down a system is to invoke the gfsh shutdown command with all members running. All online data stores will be synchronized before shutting down, so all hold the most recent data copy. To shut down all members other than locators:
gfsh>shutdown
To shut down all members, including locators:
gfsh>shutdown --include-locators=true
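The same shutdown can be scripted by passing the commands to gfsh with its -e option. This is a sketch only: shutdown_cluster is a hypothetical helper, the locator address localhost[10334] is an assumption to replace with your own, and whether gfsh asks for confirmation in this mode should be verified in your environment. Setting DRY_RUN=1 prints the command instead of running it.

```shell
#!/bin/bash
# LOCATOR is an assumed address; override it with your own host[port].
LOCATOR="${LOCATOR:-localhost[10334]}"

# shutdown_cluster (hypothetical helper): connect to the locator and run
# the shutdown command. Pass "true" to stop locators as well.
shutdown_cluster() {
  local include_locators=${1:-false}
  set -- gfsh -e "connect --locator=${LOCATOR}" \
              -e "shutdown --include-locators=${include_locators}"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"   # show the command without executing it
  else
    "$@"
  fi
}
```

Running shutdown_cluster with no argument leaves the locators running; shutdown_cluster true stops them too.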