The Greenplum Database cost-based optimizer evaluates many strategies for running a query and chooses the least costly method.
Like other RDBMS optimizers, the Greenplum optimizer takes into account factors such as the number of rows in tables to be joined, availability of indexes, and cardinality of column data when calculating the costs of alternative execution plans. The optimizer also accounts for the location of the data, preferring to perform as much of the work as possible on the segments and to minimize the amount of data that must be transmitted between segments to complete the query.
When a query runs slower than you expect, you can view the plan the optimizer selected as well as the cost it calculated for each step of the plan. This will help you determine which steps are consuming the most resources and then modify the query or the schema to provide the optimizer with more efficient alternatives. You use the SQL EXPLAIN statement to view the plan for a query.
The optimizer produces plans based on statistics generated for tables. It is important to have accurate statistics to produce the best plan. See Updating Statistics with ANALYZE in this guide for information about updating statistics.
The EXPLAIN and EXPLAIN ANALYZE statements are useful tools to identify opportunities to improve query performance. EXPLAIN displays the query plan and estimated costs for a query, but does not run the query. EXPLAIN ANALYZE runs the query in addition to displaying the query plan. EXPLAIN ANALYZE discards any output from the SELECT statement; however, other operations in the statement are performed (for example, INSERT, UPDATE, or DELETE). To use EXPLAIN ANALYZE on a DML statement without letting the command affect the data, explicitly use EXPLAIN ANALYZE in a transaction (BEGIN; EXPLAIN ANALYZE ...; ROLLBACK;).
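For example, a minimal sketch using the contributions table from the example below; the amount and id columns are assumed here for illustration only:

BEGIN;
-- amount and id are hypothetical columns of contributions
EXPLAIN ANALYZE UPDATE contributions SET amount = amount * 1.1 WHERE id < 100;
ROLLBACK;  -- the UPDATE is rolled back, so no data is changed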
EXPLAIN ANALYZE runs the statement in addition to displaying the plan, annotating each plan node with actual execution statistics, including the elapsed time and the maximum and average number of rows processed by the segment workers.
An explain plan is a report detailing the steps the Greenplum Database optimizer has determined it will follow to run a query. The plan is a tree of nodes, read from bottom to top, with each node passing its result to the node directly above. Each node represents a step in the plan, and one line for each node identifies the operation performed in that step—for example, a scan, join, aggregation, or sort operation. The node also identifies the method used to perform the operation. The method for a scan operation, for example, may be a sequential scan or an index scan. A join operation may perform a hash join or nested loop join.
Following is an explain plan for a simple query. This query finds the number of rows in the contributions table stored at each segment.
gpadmin=# EXPLAIN SELECT gp_segment_id, count(*)
          FROM contributions
          GROUP BY gp_segment_id;
                                   QUERY PLAN
--------------------------------------------------------------------------------
 Gather Motion 2:1  (slice2; segments: 2)  (cost=0.00..431.00 rows=2 width=12)
   ->  GroupAggregate  (cost=0.00..431.00 rows=1 width=12)
         Group By: gp_segment_id
         ->  Sort  (cost=0.00..431.00 rows=1 width=12)
               Sort Key: gp_segment_id
               ->  Redistribute Motion 2:2  (slice1; segments: 2)  (cost=0.00..431.00 rows=1 width=12)
                     Hash Key: gp_segment_id
                     ->  Result  (cost=0.00..431.00 rows=1 width=12)
                           ->  GroupAggregate  (cost=0.00..431.00 rows=1 width=12)
                                 Group By: gp_segment_id
                                 ->  Sort  (cost=0.00..431.00 rows=7 width=4)
                                       Sort Key: gp_segment_id
                                       ->  Seq Scan on contributions  (cost=0.00..431.00 rows=7 width=4)
 Optimizer: GPORCA
(14 rows)
This plan has eight nodes – Seq Scan, Sort, GroupAggregate, Result, Redistribute Motion, Sort, GroupAggregate, and finally Gather Motion. Each node contains three cost estimates: cost (in sequential page reads), the number of rows, and the width of the rows.
The cost is a two-part estimate. A cost of 1.0 is equal to one sequential disk page read. The first part of the estimate is the start-up cost, which is the cost of getting the first row. The second estimate is the total cost, the cost of getting all of the rows.
The rows estimate is the number of rows output by the plan node. The number may be lower than the actual number of rows processed or scanned by the plan node, reflecting the estimated selectivity of WHERE clause conditions. The total cost assumes that all rows will be retrieved, which may not always be the case (for example, if you use a LIMIT clause).
The width estimate is the total width, in bytes, of all the columns output by the plan node.
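For example, the top node in the plan above, Gather Motion 2:1 (cost=0.00..431.00 rows=2 width=12), has a start-up cost of 0.00, a total cost of 431.00 sequential page reads, an estimated two output rows (one per segment), and an estimated row width of 12 bytes.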
The cost estimates in a node include the costs of all its child nodes, so the top-most node of the plan, usually a Gather Motion, has the estimated total execution costs for the plan. This is the number that the query planner seeks to minimize.
Scan operators scan through rows in a table to find a set of rows. There are different scan operators for different types of storage; a sequential scan, for example, reads every row in the table, while an index scan uses an index to locate specific rows.
Join operators include the following:
- Hash Join – builds a hash table from the smaller dataset, using the join columns as the hash key, then scans the larger dataset and probes the hash table for rows with matching keys. Hash joins are typically the fastest join method in Greenplum Database.
- Nested Loop – iterates over the rows of one dataset, scanning the other dataset for matches on each iteration. Nested loop joins can be expensive for large tables. Set the enable_nestloop server configuration parameter to OFF (default) to favor Hash Join.
- Merge Join – sorts both datasets and merges them together. Merge joins are fast for pre-sorted data. To favor Merge Join, set the enable_mergejoin system configuration parameter to ON.

Some query plan nodes specify motion operations. Motion operations move rows between segments when required to process the query. The node identifies the method used to perform the motion operation. Motion operators include the following:
- Broadcast Motion – every segment sends its rows to all other segments so that each segment has a complete local copy of the data.
- Redistribute Motion – each segment rehashes the rows and sends them to the appropriate segments according to the hash key.
- Gather Motion – result rows from all segments are assembled into a single stream.
Other operators that occur in query plans include sorts, aggregations, appends, limits, and filters; a Filter operator, for example, selects rows using the conditions of a WHERE clause.

This topic describes Greenplum Database features and programming practices that can be used to enhance system performance in some situations.
To analyze query plans, first identify the plan nodes where the estimated cost to perform the operation is very high. Determine whether the estimated number of rows and the estimated cost seem reasonable relative to the number of rows for the operation performed.
If using partitioning, validate that partition elimination is achieved. To achieve partition elimination, the query predicate (WHERE clause) must be the same as the partitioning criteria. Also, the WHERE clause must contain an explicit value and cannot contain a subquery.
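For example, a minimal sketch assuming a hypothetical sales table that is range-partitioned on a sale_date column; the explicit date in the predicate matches the partitioning criteria, so unneeded partitions are eliminated from the scan:

-- sales and sale_date are hypothetical; check the plan for partition elimination
EXPLAIN SELECT count(*)
FROM sales
WHERE sale_date = '2023-06-01';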
Review the execution order of the query plan tree. Review the estimated number of rows. You want the execution order to build on the smaller tables or hash join result and probe with larger tables. Optimally, the largest table is used for the final join or probe to reduce the number of rows being passed up the tree to the topmost plan nodes. If the analysis reveals that the order of execution builds and probes is not optimal, ensure that database statistics are up to date. Running ANALYZE will likely address this and produce an optimal query plan.
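For example, to refresh statistics for the contributions table used in the earlier example:

ANALYZE contributions;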
Look for evidence of computational skew. Computational skew occurs during query execution when execution of operators such as Hash Aggregate and Hash Join cause uneven execution on the segments. More CPU and memory are used on some segments than others, resulting in less than optimal execution. The cause could be joins, sorts, or aggregations on columns that have low cardinality or non-uniform distributions. You can detect computational skew in the output of the EXPLAIN ANALYZE statement for a query. Each node includes a count of the maximum rows processed by any one segment and the average rows processed by all segments. If the maximum row count is much higher than the average, at least one segment has performed much more work than the others and computational skew should be suspected for that operator.
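For example, a node statistics line of the following form (illustrative numbers; the exact format varies by Greenplum version) shows one segment processing nearly twice the average row count, which suggests skew in that operator:

Rows out:  Avg 1000000.0 rows x 2 workers.  Max 1900000 rows (seg0)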
Identify plan nodes where a Sort or Aggregate operation is performed. Hidden inside an Aggregate operation is a Sort. If the Sort or Aggregate operation involves a large number of rows, there is an opportunity to improve query performance. A HashAggregate operation is preferred over Sort and Aggregate operations when a large number of rows are required to be sorted. Usually a Sort operation is chosen by the optimizer due to the SQL construct; that is, due to the way the SQL is written. Most Sort operations can be replaced with a HashAggregate if the query is rewritten. To favor a HashAggregate operation over a Sort and Aggregate operation, ensure that the enable_groupagg server configuration parameter is set to ON.
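For example, to check the current setting and enable it for the session:

SHOW enable_groupagg;
SET enable_groupagg = ON;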
When an explain plan shows a broadcast motion with a large number of rows, you should attempt to eliminate the broadcast motion. One way to do this is to use the gp_segments_for_planner server configuration parameter to increase the cost estimate of the motion so that alternatives are favored. The gp_segments_for_planner variable tells the query planner how many primary segments to use in its calculations. The default value is zero, which tells the planner to use the actual number of primary segments in estimates. Increasing the number of primary segments increases the cost of the motion, thereby favoring a redistribute motion over a broadcast motion. For example, setting gp_segments_for_planner = 100000 tells the planner that there are 100,000 segments. Conversely, to influence the optimizer to broadcast a table and not redistribute it, set gp_segments_for_planner to a low number, for example 2.
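For example, a minimal sketch that inflates the segment count for the current session and then restores the default:

SET gp_segments_for_planner = 100000;  -- planner assumes 100,000 segments
-- re-run EXPLAIN on the affected query and inspect the motion nodes
RESET gp_segments_for_planner;         -- restore the default (0 = actual segment count)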
Greenplum Database aggregation extensions to the GROUP BY clause can perform some common calculations in the database more efficiently than in application or procedure code:
GROUP BY ROLLUP(col1, col2, col3)
GROUP BY CUBE(col1, col2, col3)
GROUP BY GROUPING SETS((col1, col2), (col1, col3))
A ROLLUP grouping creates aggregate subtotals that roll up from the most detailed level to a grand total, following a list of grouping columns (or expressions). ROLLUP takes an ordered list of grouping columns, calculates the standard aggregate values specified in the GROUP BY clause, then creates progressively higher-level subtotals, moving from right to left through the list. Finally, it creates a grand total.
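For example, a minimal sketch assuming a hypothetical sales table with region, product, and amount columns:

-- subtotals per (region, product), per region, and a grand total
SELECT region, product, sum(amount) AS total_sales
FROM sales
GROUP BY ROLLUP(region, product);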
A CUBE grouping creates subtotals for all of the possible combinations of the given list of grouping columns (or expressions). In multidimensional analysis terms, CUBE generates all the subtotals that could be calculated for a data cube with the specified dimensions.
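Using the same hypothetical sales table:

-- subtotals for every combination: (region, product), (region), (product), and the grand total
SELECT region, product, sum(amount) AS total_sales
FROM sales
GROUP BY CUBE(region, product);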
You can selectively specify the set of groups that you want to create using a GROUPING SETS expression. This allows precise specification across multiple dimensions without computing a whole ROLLUP or CUBE.
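Again with the hypothetical sales table, this computes only the per-region and per-product subtotals, skipping the other combinations:

SELECT region, product, sum(amount) AS total_sales
FROM sales
GROUP BY GROUPING SETS((region), (product));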
Refer to the Greenplum Database Reference Guide for details of these clauses.
Window functions apply an aggregation or ranking function over partitions of the result set, for example, sum(population) over (partition by city). Window functions are powerful and, because they do all of the work in the database, they have performance advantages over front-end tools that produce similar results by retrieving detail rows from the database and reprocessing them.
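For example, a minimal sketch assuming a hypothetical districts table in which each row holds a city, a district, and the district's population:

-- each district row is returned together with the total population of its city
SELECT city, district, population,
       sum(population) OVER (PARTITION BY city) AS city_population
FROM districts;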
The row_number() window function produces row numbers for the rows in a partition, for example, row_number() over (order by id).
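Continuing with the hypothetical districts table, this numbers the districts within each city by descending population:

SELECT city, district,
       row_number() OVER (PARTITION BY city ORDER BY population DESC) AS rank_in_city
FROM districts;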