This topic explains how to start up and shut down your VMware Tanzu GemFire system.
Determine the proper startup and shutdown procedures, and write your startup and shutdown scripts.
Well-designed procedures for starting and stopping your system can speed startup and protect your data. The processes you need to start and stop include server and locator processes and your other Tanzu GemFire applications, including clients. The procedures you use depend in part on your system’s configuration and the dependencies between your system processes.
Use the following guidelines to create startup and shutdown procedures and scripts. Some of these instructions use gfsh.
You should follow certain order guidelines when starting your Tanzu GemFire system.
Start servers before you start their client applications. In each cluster, follow these guidelines for member startup:
If you are starting up your locators and peer members all at once, you can set the locator-wait-time property (in seconds) at process startup. This timeout allows peers to wait for the locators to finish starting up before attempting to join the cluster.
If the process cannot initially reach a locator, it sleeps for join-retry-sleep milliseconds between retries until it either connects or the number of seconds specified in locator-wait-time has elapsed. By default, locator-wait-time is set to zero, meaning that a process that cannot connect to a locator upon startup throws an exception.
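As a rough sketch of how a startup script might apply these guidelines, the shell functions below only print the gfsh commands for a locator and for a server that waits for it. The member names, the localhost[10334] locator address, and the 120-second wait are illustrative assumptions. Passing locator-wait-time as a gemfire.-prefixed Java system property through --J is also an assumption about how your members are configured; a properties file would work equally well.

```shell
#!/bin/sh
# Sketch only: prints gfsh commands instead of running them.

# Print the command that starts a locator with the given (hypothetical) name.
start_locator_cmd() {
    printf 'start locator --name=%s\n' "$1"
}

# Print the command that starts server $1, pointing it at the locator
# address $2 and letting it wait up to $3 seconds for that locator.
start_server_cmd() {
    printf 'start server --name=%s --locators=%s --J=-Dgemfire.locator-wait-time=%s\n' "$1" "$2" "$3"
}

# Locators first, then servers; clients would be started after the servers.
start_locator_cmd locator1
start_server_cmd server1 'localhost[10334]' 120
```

Printing the commands first makes it easy to review the startup order before any process is launched.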
Note: You can optionally override the default timeout period for shutting down individual processes. This override setting must be specified during member startup. See Shutting Down the System for details.
This information pertains to catastrophic loss of Tanzu GemFire disk store files. If you lose disk store files, your next startup may hang, waiting for the lost disk stores to come back online. If your system hangs at startup, use the gfsh command show missing-disk-stores to list missing disk stores and, if needed, the revoke missing-disk-store command to revoke them so your system startup can complete. You must use the Disk Store ID to revoke a disk store. These are the two commands:
gfsh>show missing-disk-stores
Disk Store ID | Host | Directory
------------------------------------ | --------- | -------------------------------------
60399215-532b-406f-b81f-9b5bd8d1b55a | excalibur | /usr/local/gemfire/deploy/disk_store1
gfsh>revoke missing-disk-store --id=60399215-532b-406f-b81f-9b5bd8d1b55a
Note: This gfsh command requires that you be connected to the cluster via a JMX Manager node.
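If a script needs to pick the Disk Store IDs out of that listing, a minimal sketch might look like the following. It assumes the table layout shown above, where each data row begins with a 36-character ID; the function names are hypothetical. It prints revoke commands for review rather than running them, since revoking a disk store should be a deliberate decision.

```shell
#!/bin/sh
# Sketch only: assumes each data row starts with a 36-character disk store ID,
# as in the sample listing above.

# Read `show missing-disk-stores` output on stdin and print one ID per line.
extract_disk_store_ids() {
    grep -oE '^[0-9a-f]{8}(-[0-9a-f]{4}){3}-[0-9a-f]{12}'
}

# Turn each ID into the matching revoke command so it can be reviewed first.
revoke_cmds() {
    extract_disk_store_ids | while read -r id; do
        printf 'revoke missing-disk-store --id=%s\n' "$id"
    done
}
```

The header and separator rows do not start with a hexadecimal ID, so they are skipped.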
Shut down your Tanzu GemFire system either by using the gfsh shutdown command or by shutting down individual members one at a time.
If you are using persistent regions (members are persisting data to disk), you should use the gfsh shutdown command to stop the running system in an orderly fashion. This command synchronizes persistent partitioned regions before shutting down, which makes the next startup of the cluster as efficient as possible.
If possible, all members should be running before you shut them down so synchronization can occur. Shut down the system using the following gfsh command:
gfsh>shutdown
By default, the shutdown command shuts down only data nodes. To shut down all nodes including locators, specify the --include-locators=true parameter. For example:
gfsh>shutdown --include-locators=true
This will shut down all locators one by one, shutting down the manager last.
To shut down all data members after a grace period, specify the --time-out option (in seconds):
gfsh>shutdown --time-out=60
To shut down all members including locators after a grace period, specify the --time-out option (in seconds):
gfsh>shutdown --include-locators=true --time-out=60
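A shutdown script could assemble these variants from two settings. The function below is a sketch under that assumption; its name and argument order are hypothetical, and it only prints the command so that it can be reviewed or handed to gfsh separately.

```shell
#!/bin/sh
# Sketch only: prints the shutdown command instead of running it.

# $1: "true" to include locators, anything else for data members only.
# $2: optional grace period in seconds; omitted from the command when empty.
shutdown_cmd() {
    cmd='shutdown'
    if [ "$1" = "true" ]; then
        cmd="$cmd --include-locators=true"
    fi
    if [ -n "${2:-}" ]; then
        cmd="$cmd --time-out=$2"
    fi
    printf '%s\n' "$cmd"
}
```

For example, calling shutdown_cmd true 60 reproduces the last command shown above.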
If you are not using persistent regions, you can shut down the cluster by shutting down each member in the reverse order of their startup. (See Starting Up Your System for the recommended order of member startup.)
Shut down the cluster members according to the type of member. For example, use the following mechanisms to shut down members:
Shut down any cache servers. To shut down a server, issue the following gfsh command:
gfsh>stop server --name=<...>
or
gfsh>stop server --dir=<server_working_dir>
Shut down any locators. To shut down a locator, issue the following gfsh command:
gfsh>stop locator --name=<...>
or
gfsh>stop locator --dir=<locator_working_dir>
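To illustrate shutting members down in the reverse of their startup order, the sketch below takes the members in startup order and prints the corresponding stop commands last-started first. The type:name argument format and the member names are hypothetical, and names are assumed to contain no spaces.

```shell
#!/bin/sh
# Sketch only: member names are hypothetical and must not contain spaces.

# Arguments are type:name pairs in startup order, e.g. locator:locator1.
# Prints the matching gfsh stop commands in reverse order.
stop_cmds_reverse() {
    reversed=''
    for member in "$@"; do
        reversed="$member $reversed"
    done
    for member in $reversed; do
        printf 'stop %s --name=%s\n' "${member%%:*}" "${member#*:}"
    done
}
```

With locators started first, the servers are stopped first and the locators last.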
Do not use the command line kill -9 to shut down a server under ordinary circumstances. Especially on systems with a small number of members, using a kill instead of a gfsh stop can cause the partition detection mechanism to place the system in an end state that will wait forever to reconnect to the terminated server, and there will be no way to restart that terminated server. If a kill command appears to be the only way to rid the system of a server, then either kill all the processes of the cluster or use kill -INT, which allows an orderly shutdown of the process.
The DISCONNECT_WAIT command line argument sets the maximum time for each individual step in the shutdown process. If any step takes longer than the specified amount, it is forced to end. Each operation is given this grace period, so the total length of time the cache member takes to shut down depends on the number of operations and the DISCONNECT_WAIT setting. During the shutdown process, Tanzu GemFire produces messages such as:
Disconnect listener still running
The DISCONNECT_WAIT default is 10000 milliseconds.
To change it, set this system property on the Java command line used for member startup. For example:
gfsh>start server --J=-DDistributionManager.DISCONNECT_WAIT=<milliseconds>
Each process can have a different DISCONNECT_WAIT setting.
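Because each shutdown operation gets its own grace period, a rough upper bound on a member's shutdown time is the number of operations multiplied by DISCONNECT_WAIT. The sketch below does that arithmetic and prints a start command with the property set. The function names and the server name are hypothetical, and the operation count is an estimate you supply, not a value GemFire reports.

```shell
#!/bin/sh
# Sketch only: the number of shutdown operations is an estimate you supply.

# Rough upper bound in milliseconds if each of $1 operations used the
# full DISCONNECT_WAIT of $2 milliseconds.
worst_case_shutdown_ms() {
    echo $(( $1 * $2 ))
}

# Print a start command that sets DISCONNECT_WAIT to $2 ms for server $1.
start_server_with_wait_cmd() {
    printf 'start server --name=%s --J=-DDistributionManager.DISCONNECT_WAIT=%s\n' "$1" "$2"
}
```

For instance, five operations at the default of 10000 milliseconds give a bound of 50000 milliseconds.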