Best practices for regular maintenance that will ensure Greenplum Database high availability and optimal performance.
Greenplum Database includes utilities that are useful for monitoring the system.
The gp_toolkit schema contains several views that can be accessed using SQL commands to query system catalogs, log files, and the operating environment for system status information.
The gp_stats_missing view shows tables that do not have statistics and require ANALYZE to be run.
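As a minimal sketch, the view can be checked from the shell, assuming psql is on the PATH and can connect to the database. The function name list_missing_stats and the database name passed to it are placeholders for illustration, not part of Greenplum.

```shell
# Hypothetical helper (not a Greenplum utility): list the tables that
# gp_toolkit.gp_stats_missing reports as lacking statistics.
# $1 is the database name to connect to (placeholder; use your own).
list_missing_stats() {
  # -A: unaligned output, -t: rows only, -c: run a single SQL command
  psql -d "$1" -At -c 'SELECT * FROM gp_toolkit.gp_stats_missing;'
}
```

Calling list_missing_stats with a database name prints one row per table; each table listed is a candidate for ANALYZE.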
For additional information on gpcheckperf, refer to the Greenplum Database Utility Guide. For information about the gp_toolkit schema, see the Greenplum Database Reference Guide.
The gpstate utility program displays the status of the Greenplum system, including which segments are down, master and segment configuration information (hosts, data directories, etc.), the ports used by the system, and the mapping of primary segments to their corresponding mirror segments.
Run gpstate -Q to get a list of segments that are marked "down" in the master system catalog.
To get detailed status information for the Greenplum system, run gpstate -s.
The gpcheckperf utility tests baseline hardware performance for a list of hosts. The results can help identify hardware issues. It performs the following checks:
Disk I/O test: measures I/O performance by writing and reading a large file using the dd operating system command. It reports read and write rates in megabytes per second.
Memory bandwidth test: measures sustainable memory bandwidth in megabytes per second.
Network performance test: runs the gpnetbench network benchmark program (optionally netperf) to test network performance. The test is run in one of three modes: parallel pair test (-r N), serial pair test (-r n), or full-matrix test (-r M). The minimum, maximum, average, and median transfer rates are reported in megabytes per second.
To obtain valid numbers from gpcheckperf, the database system must be stopped. The numbers from gpcheckperf can be inaccurate even if the system is up and running with no query activity.
gpcheckperf requires a trusted host setup between the hosts involved in the performance test. It calls gpssh and gpscp, so these utilities must also be in your PATH. Specify the hosts to check individually (-h host1 -h host2 ...) or with -f hosts_file, where hosts_file is a text file containing a list of the hosts to check. If you have more than one subnet, create a separate host file for each subnet so that you can test the subnets separately.
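Testing subnets separately can be sketched as a loop over one host file per subnet. The function name check_subnets, the host file names, and the /tmp scratch directory are assumptions made for illustration; the -f, -r N, and -d options are the ones described in this topic.

```shell
# Hypothetical helper: run the network parallel pair test once per subnet.
# Each argument is a host file that lists the hosts of one subnet.
check_subnets() {
  for hosts_file in "$@"; do
    printf '== %s ==\n' "$hosts_file"
    # /tmp is an assumed scratch directory; substitute your own
    gpcheckperf -f "$hosts_file" -r N -d /tmp
  done
}
```

For example, check_subnets subnet_1_hosts subnet_2_hosts runs the test for each file in turn, so the results for each subnet are reported under their own header.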
By default, gpcheckperf runs the disk I/O test, the memory test, and a serial pair network performance test. With the disk I/O test, you must use the -d option to specify the file systems you want to test. The following command tests disk I/O and memory bandwidth on hosts listed in the subnet_1_hosts file:
$ gpcheckperf -f subnet_1_hosts -d /data1 -d /data2 -r ds
The -r option selects the tests to run: disk I/O (d), memory bandwidth (s), network parallel pair (N), network serial pair (n), or network full-matrix (M). Only one network mode can be selected per execution. See the Greenplum Database Reference Guide for the detailed gpcheckperf reference.
The following Linux/UNIX utilities can be used to assess host performance:
iostat allows you to monitor disk activity on segment hosts.
top displays a dynamic view of operating system processes.
vmstat displays memory usage statistics.
You can use gpssh to run utilities on multiple hosts.
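A minimal sketch of running one of these utilities across several hosts with gpssh follows. The wrapper name on_all_hosts and the host file name in the usage note are hypothetical; -f names a host file and -e echoes each command as it runs.

```shell
# Hypothetical helper: run one command on every host listed in a host file.
# $1 is the host file; the remaining arguments form the command to run.
on_all_hosts() {
  hosts_file="$1"
  shift
  gpssh -f "$hosts_file" -e "$*"
}
```

For example, on_all_hosts seg_hosts vmstat 1 3 would collect three one-second vmstat samples from each host listed in seg_hosts.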
Run gpcheckperf at install time and periodically thereafter, saving the output to compare system performance over time.
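One way to keep results comparable over time is to write each run to a dated file. The sketch below assumes the same file systems (/data1, /data2) and test selection (-r ds) as the example command in this topic; the function name save_baseline and the output directory layout are hypothetical.

```shell
# Hypothetical helper: save one gpcheckperf run to a timestamped file.
# $1 is the directory that holds saved results; $2 is the host file.
save_baseline() {
  out_dir="$1"
  hosts_file="$2"
  mkdir -p "$out_dir" || return 1
  out_file="$out_dir/gpcheckperf_$(date +%Y%m%d_%H%M%S).out"
  # Capture both stdout and stderr so that failures are recorded too
  gpcheckperf -f "$hosts_file" -d /data1 -d /data2 -r ds > "$out_file" 2>&1
  # Print the path of the saved file for the caller
  printf '%s\n' "$out_file"
}
```

Saved files can later be compared with a tool such as diff to spot changes in disk or memory throughput.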
For more information, see the gpcheckperf reference in the Greenplum Database Utility Guide.
To use netperf, it must be installed on each host you test. See the gpcheckperf reference for more information.