The Greenplum Database cost-based optimizer evaluates many strategies for executing a query and chooses the least costly method.
Like other RDBMS optimizers, the Greenplum optimizer takes into account factors such as the number of rows in tables to be joined, availability of indexes, and cardinality of column data when calculating the costs of alternative execution plans. The optimizer also accounts for the location of the data, preferring to perform as much of the work as possible on the segments and to minimize the amount of data that must be transmitted between segments to complete the query.
When a query runs slower than you expect, you can view the plan the optimizer selected as well as the cost it calculated for each step of the plan. This will help you determine which steps are consuming the most resources and then modify the query or the schema to provide the optimizer with more efficient alternatives. You use the SQL
EXPLAIN statement to view the plan for a query.
The optimizer produces plans based on statistics generated for tables. It is important to have accurate statistics to produce the best plan. See Updating Statistics with ANALYZE in this guide for information about updating statistics.
Parent topic: Greenplum Database Best Practices
EXPLAIN ANALYZE statements are useful tools to identify opportunities to improve query performance.
EXPLAIN displays the query plan and estimated costs for a query, but does not execute the query.
EXPLAIN ANALYZE executes the query in addition to displaying the query plan.
EXPLAIN ANALYZE discards any output from the
SELECT statement; however, other operations in the statement are performed (for example,
DELETE). To use
EXPLAIN ANALYZE on a DML statement without letting the command affect the data, explicitly use
EXPLAIN ANALYZE in a transaction (
BEGIN; EXPLAIN ANALYZE ...; ROLLBACK;).
EXPLAIN ANALYZE runs the statement in addition to displaying the plan with additional information as follows:
An explain plan is a report detailing the steps the Greenplum Database optimizer has determined it will follow to execute a query. The plan is a tree of nodes, read from bottom to top, with each node passing its result to the node directly above. Each node represents a step in the plan, and one line for each node identifies the operation performed in that step—for example, a scan, join, aggregation, or sort operation. The node also identifies the method used to perform the operation. The method for a scan operation, for example, may be a sequential scan or an index scan. A join operation may perform a hash join or nested loop join.
Following is an explain plan for a simple query. This query finds the number of rows in the contributions table stored at each segment.
gpadmin=# EXPLAIN SELECT gp_segment_id, count(*) FROM contributions GROUP BY gp_segment_id; QUERY PLAN -------------------------------------------------------------------------------- Gather Motion 2:1 (slice2; segments: 2) (cost=0.00..431.00 rows=2 width=12) -> GroupAggregate (cost=0.00..431.00 rows=1 width=12) Group By: gp_segment_id -> Sort (cost=0.00..431.00 rows=1 width=12) Sort Key: gp_segment_id -> Redistribute Motion 2:2 (slice1; segments: 2) (cost=0.00..431.00 rows=1 width=12) Hash Key: gp_segment_id -> Result (cost=0.00..431.00 rows=1 width=12) -> GroupAggregate (cost=0.00..431.00 rows=1 width=12) Group By: gp_segment_id -> Sort (cost=0.00..431.00 rows=7 width=4) Sort Key: gp_segment_id -> Table Scan on table1 (cost=0.00..431.00 rows=7 width=4) Optimizer status: PQO version 2.56.0 (14 rows)
This plan has eight nodes – Table Scan, Sort, GroupAggregate, Result, Redistribute Motion, Sort, GroupAggregate, and finally Gather Motion. Each node contains three cost estimates: cost (in sequential page reads), the number of rows, and the width of the rows.
The cost is a two-part estimate. A cost of 1.0 is equal to one sequential disk page read. The first part of the estimate is the start-up cost, which is the cost of getting the first row. The second estimate is the total cost, the cost of getting all of the rows.
The rows estimate is the number of rows output by the plan node. The number may be lower than the actual number of rows processed or scanned by the plan node, reflecting the estimated selectivity of
WHERE clause conditions. The total cost assumes that all rows will be retrieved, which may not always be the case (for example, if you use a
The width estimate is the total width, in bytes, of all the columns output by the plan node.
The cost estimates in a node include the costs of all its child nodes, so the top-most node of the plan, usually a Gather Motion, has the estimated total execution costs for the plan. This is this number that the query planner seeks to minimize.
Scan operators scan through rows in a table to find a set of rows. There are different scan operators for different types of storage. They include the following:
Join operators include the following:
enable_nestloopserver configuration parameter to OFF (default) to favor Hash Join.
enable_mergejoinsystem configuration parameter to ON.
Some query plan nodes specify motion operations. Motion operations move rows between segments when required to process the query. The node identifies the method used to perform the motion operation. Motion operators include the following:
Other operators that occur in query plans include the following:
This topic describes Greenplum Database features and programming practices that can be used to enhance system performance in some situations.
To analyze query plans, first identify the plan nodes where the estimated cost to perform the operation is very high. Determine if the estimated number of rows and cost seems reasonable relative to the number of rows for the operation performed.
If using partitioning, validate that partition elimination is achieved. To achieve partition elimination the query predicate (
WHERE clause) must be the same as the partitioning criteria. Also, the
WHERE clause must not contain an explicit value and cannot contain a subquery.
Review the execution order of the query plan tree. Review the estimated number of rows. You want the execution order to build on the smaller tables or hash join result and probe with larger tables. Optimally, the largest table is used for the final join or probe to reduce the number of rows being passed up the tree to the topmost plan nodes. If the analysis reveals that the order of execution builds and/or probes is not optimal ensure that database statistics are up to date. Running
ANALYZE will likely address this and produce an optimal query plan.
Look for evidence of computational skew. Computational skew occurs during query execution when execution of operators such as Hash Aggregate and Hash Join cause uneven execution on the segments. More CPU and memory are used on some segments than others, resulting in less than optimal execution. The cause could be joins, sorts, or aggregations on columns that have low cardinality or non-uniform distributions. You can detect computational skew in the output of the
EXPLAIN ANALYZE statement for a query. Each node includes a count of the maximum rows processed by any one segment and the average rows processed by all segments. If the maximum row count is much higher than the average, at least one segment has performed much more work than the others and computational skew should be suspected for that operator.
Identify plan nodes where a Sort or Aggregate operation is performed. Hidden inside an Aggregate operation is a Sort. If the Sort or Aggregate operation involves a large number of rows, there is an opportunity to improve query performance. A HashAggregate operation is preferred over Sort and Aggregate operations when a large number of rows are required to be sorted. Usually a Sort operation is chosen by the optimizer due to the SQL construct; that is, due to the way the SQL is written. Most Sort operations can be replaced with a HashAggregate if the query is rewritten. To favor a HashAggregate operation over a Sort and Aggregate operation ensure that the
enable_groupagg server configuration parameter is set to
When an explain plan shows a broadcast motion with a large number of rows, you should attempt to eliminate the broadcast motion. One way to do this is to use the
gp_segments_for_planner server configuration parameter to increase the cost estimate of the motion so that alternatives are favored. The
gp_segments_for_planner variable tells the query planner how many primary segments to use in its calculations. The default value is zero, which tells the planner to use the actual number of primary segments in estimates. Increasing the number of primary segments increases the cost of the motion, thereby favoring a redistribute motion over a broadcast motion. For example, setting
gp_segments_for_planner = 100000 tells the planner that there are 100,000 segments. Conversely, to influence the optimizer to broadcast a table and not redistribute it, set
gp_segments_for_planner to a low number, for example 2.
Greenplum Database aggregation extensions to the
GROUP BY clause can perform some common calculations in the database more efficiently than in application or procedure code:
GROUP BY ROLLUP(col1, col2, col3)
GROUP BY CUBE(col1, col2, col3)
GROUP BY GROUPING SETS((col1, col2), (col1, col3))
ROLLUP grouping creates aggregate subtotals that roll up from the most detailed level to a grand total, following a list of grouping columns (or expressions).
ROLLUP takes an ordered list of grouping columns, calculates the standard aggregate values specified in the
GROUP BY clause, then creates progressively higher-level subtotals, moving from right to left through the list. Finally, it creates a grand total.
CUBE grouping creates subtotals for all of the possible combinations of the given list of grouping columns (or expressions). In multidimensional analysis terms,
CUBE generates all the subtotals that could be calculated for a data cube with the specified dimensions.
You can selectively specify the set of groups that you want to create using a
GROUPING SETS expression. This allows precise specification across multiple dimensions without computing a whole
Refer to the Greenplum Database Reference Guide for details of these clauses.
Window functions apply an aggregation or ranking function over partitions of the result set—for example,
sum(population) over (partition by city). Window functions are powerful and, because they do all of the work in the database, they have performance advantages over front-end tools that produce similar results by retrieving detail rows from the database and reprocessing them.
row_number()window function produces row numbers for the rows in a partition, for example,
row_number() over (order by id).