With mirroring enabled, Greenplum Database automatically fails over to a mirror segment when a primary segment goes down. Provided one segment instance is online per portion of data, users may not realize a segment is down. If a transaction is in progress when a fault occurs, the in-progress transaction rolls back and restarts automatically on the reconfigured set of segments.
If the entire Greenplum Database system becomes nonoperational due to a segment failure (for example, if mirroring is not enabled or not enough segments are online to access all user data), users will see errors when trying to connect to a database. The errors returned to the client program may indicate the failure. For example:
ERROR: All segment databases are unavailable
On the Greenplum Database master host, the Postgres postmaster process forks a fault probe process, ftsprobe. This is sometimes called the FTS (Fault Tolerance Server) process. The postmaster process restarts the FTS if it fails.
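You can confirm that the fault probe process is running by listing processes on the master host. This is a minimal sketch; the exact process title that ps reports varies by Greenplum version.
$ ps -ef | grep '[f]tsprobe'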
The FTS runs in a loop with a sleep interval between each cycle. On each loop, the FTS probes each primary segment database by making a TCP socket connection to the segment database, using the hostname and port registered in the gp_segment_configuration table. If the connection succeeds, the segment performs a few simple checks and reports back to the FTS. The checks include executing a stat system call on critical segment directories and checking for internal faults in the segment instance. If no issues are detected, a positive reply is sent to the FTS and no action is taken for that segment database.
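You can view the hostname and port that the FTS uses for each primary segment by querying the catalog from any database in the system, for example:
$ psql -c "SELECT content, hostname, port FROM gp_segment_configuration WHERE role = 'p' ORDER BY content;"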
If the connection cannot be made, or if a reply is not received in the timeout period, then a retry is attempted for the segment database. If the configured maximum number of probe attempts fail, the FTS probes the segment's mirror to ensure that it is up, and then updates the gp_segment_configuration table, marking the primary segment "down" and setting the mirror to act as the primary. The FTS updates the gp_configuration_history table with the operations performed.
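To review the most recent fault detection and recovery operations that the FTS has recorded, you can query the history table, for example:
$ psql -c "SELECT * FROM gp_configuration_history ORDER BY time DESC LIMIT 10;"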
When a primary segment is active and its corresponding mirror is down, the primary goes into "Change Tracking Mode." In this mode, changes to the segment are recorded, so the mirror can be synchronized without performing a full copy of data from the primary to the mirror.
The gprecoverseg utility is used to bring up a mirror that is down. By default, gprecoverseg performs an incremental recovery, placing the mirror into resync mode, which starts to replay the recorded changes from the primary onto the mirror. If the incremental recovery cannot be completed, the recovery fails and gprecoverseg should be run again with the -F option to perform a full recovery. This causes the primary to copy all of the data to the mirror.
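For example, run from the master host (a sketch; the available options vary by release, so check the gprecoverseg reference documentation):
$ gprecoverseg        # incremental recovery of down mirrors (default)
$ gprecoverseg -F     # full recovery: copy all data from the primary to the mirror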
You can see the mode of each segment, "change tracking", "resync", or "in-sync", as well as its status, "up" or "down", in the gp_segment_configuration table.
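A quick way to check this from any database is shown below. In the catalog, mode and status are stored as single-letter codes, typically "c" for change tracking, "r" for resync, "s" for synchronized, and "u"/"d" for up/down; verify the codes for your release in the reference documentation.
$ psql -c "SELECT content, role, mode, status FROM gp_segment_configuration ORDER BY content, role;"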
The gp_segment_configuration table also has columns role and preferred_role. These can have values of either p for primary or m for mirror. The role column shows the segment database's current role and the preferred_role column shows the original role of the segment. In a balanced system, the role and preferred_role match for all segments. When they do not match, there may be skew resulting from the number of active primary segments on each hardware host. To rebalance the cluster and bring all the segments into their preferred role, run the gprecoverseg command with the -r option.
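For example, to list segments that are not running in their preferred role and then rebalance the cluster from the master host:
$ psql -c "SELECT content, preferred_role, role FROM gp_segment_configuration WHERE preferred_role <> role;"
$ gprecoverseg -r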
There is a set of server configuration parameters that affect FTS behavior, such as the probe interval, the probe timeout, and the number of probe retries.
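For illustration, assuming parameters such as gp_fts_probe_interval and gp_fts_probe_retries are present in your release, their current settings across the cluster can be shown with gpconfig:
$ gpconfig -s gp_fts_probe_interval
$ gpconfig -s gp_fts_probe_retries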
In addition to the fault checking performed by the FTS, a primary segment that is unable to send data to its mirror can change the status of the mirror to down. The primary queues up the data and, after gp_segment_connect_timeout seconds have passed, indicates a mirror failure, causing the mirror to be marked down and the primary to go into change tracking mode.
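A sketch of raising this timeout with gpconfig follows; the value shown is only an example, and the configuration can be reloaded without a restart using gpstop -u (some parameters require a full restart):
$ gpconfig -c gp_segment_connect_timeout -v 600
$ gpstop -u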