Query Plan Execution

The Greenplum Database legacy and GPORCA query optimizers generate execution plans that produce the results requested by the query. A plan is a sequence of operators, such as table scans, joins, sorts, aggregates, and data motions.

When you select a query on the Command Center Query Monitor view, a Query Details view presents a graphical representation of the execution plan.

You can switch between the graphical and texual representations of the query execution plan by selecting the Plan & Progress tab or the Texual Plan tab. In the textual format, each plan node is flagged with an arrow (->). In the graphical view, the nodes are represented by boxes that fill as the plan executes.

A query execution plan executes from the bottom up. Each node in the plan performs an operation and passes results up to the next node in the plan.

The Optimizer status: line on the Textual Plan tab reports whether the explain plan was generated using the GPORCA optimizer or the legacy query optimizer.

Slices and Gangs

Segments can work on portions of a query in parallel, each segment executing operators independently on their local data. When the plan requires exchanging data between segments, a data motion operator coordinates the data transfer between segments. The plan is divided into "slices" where these data motions occur.

A data motion node in a textual query plan identifies the slice and the number of segments participating in the motion.

Example:

->  Broadcast Motion 4:4  (slice2; segments: 4)  (cost=0.00..867.15 rows=10000 width=30)

In a broadcast motion, each segment broadcasts all of its rows for a table over the network so that every segment has a complete copy of the table. In this example, the broadcast motion marks the completion of slice2 with four segments sending and four segments receiving.

Each segment has one or more backend processes working on a slice. Backend processes working on the same slice are called a "gang".

Operators

Operators are processes that take as input database tables or the output from other operators, and perform some action to produce a transformed output.

Scan Operators

Init plan

A query that runs before the main query is optimized to find the partitions to scan.

Sequential scan

The optimizer may choose a sequential table scan if there is no index on the condition column or if most rows are expected to satisify the condition. Because each segment scans an equal portion of the data in parallel with other segments, a table scan in Greenplum Database is very efficient. A query on a partitioned table may be able to eliminate partitions to make the scan even faster.

Append-only scan

Scans rows in a row-oriented, append-optimized table.

Append-only columnar scan

Scans rows in a column-oriented, append-optimized table.

Dynamic table scan

Scans selected partitions in a partitioned table.

Function scan

A Function Scan node selects the partitions to scan. The function can be one of the following:

gp_partition_expansion - chooses all nodes
gp_partition_selection - chooses a partition with an equality expression
gp_partition_inversion - chooses partitions with a range expression

Index scan

Scans a B-tree index on a table to find rows. The rows are then retrieved from disk.

Bitmap index scan

A Bitmap Index Scan is an index scan optimized by storing rows in a bitmap instead of retrieving them from the table immediately. When the scan is complete, rows in the bitmap are retrieved with a Bitmap Heap Scan operation.

BitmapAnd and BitmapOr

Generates a new bitmap by running logical AND or OR on multiple bitmaps.

Bitmap heap scan

Retrieves rows from heap storage using a bitmap generated by a Bitmap index scan or BitmapAnd or BitmapOr operation.

Nested loop with inner sequential scan join

For each row in the first table, the operator tests every row in the second table with a sequential scan.

One table must be broadcast so that every segment can compare all rows of one table with the rows it has from the other table. This is expensive and is best used only for small tables.

Nested loop with inner index scan

For each row in the first table, the operator searches an index on the second table.

One table must be broadcast so that every segment can compare all rows of one table with the rows it has from the other table.

Append

Concatenates data sets. For example, combines rows scanned from multiple partitions.

Filter

Selects rows using a WHERE clause.

Limit

Limits the number of rows returned.

Materialize

Saves results from a subselect so that it is not necessary to process the inner table for every row in the outer table.

Join Operators

Hash join

Creates a hash table on the join key of the smaller table. Scans the larger table and looks up matching rows in the hash table. Hash join is very fast. The hash table is held in memory, so a hash join can use a lot of memory, depending on the size of the smaller table.

Sort merge join

The tables to be joined are sorted on the join attribute and then scanned in parallel to find the matching values. This is a good join method for tables that are too large to use a hash join.

Product join

Joins every qualifying row in the first table with every qualifying row in the second table. This type of join can be very expensive if spill files must be used.

Sort and Aggregate Operators

Sort

Sorts rows to prepare for operations such as an aggregation or merge join.

Group by

Groups rows by one or more columns.

Group / hash aggregate

Aggregates rows using a hash.

Motion Operators

Broadcast motion

Every segment sends its own local data to all other segment instances so that every segment instance has a complete local copy of the table.

Redistribution motion

Sends data from one table to another segment so that matching rows are located together, enabling a local join.

Gather motion

All segments send rows to the master where they are gathered into a single result set.

DML Operators

Assert

Performs constraints checking.

Split

Used for update operations.