For a partitioned table, GPORCA uses statistics of the table root partition to generate query plans. These statistics are used for determining the join order, for splitting and joining aggregate nodes, and for costing the query steps. In contrast, the Postgres-based planner uses the statistics of each leaf partition.
If you run queries on partitioned tables, you should collect statistics on the root partition and periodically update those statistics to ensure that GPORCA can generate optimal query plans. If the root partition statistics are not up-to-date or do not exist, GPORCA still performs dynamic partition elimination for queries against the table. However, the query plan might not be optimal.
Parent topic: About GPORCA
By default, running the ANALYZE
command on a root partitioned table samples the leaf partition data in the table, and stores the statistics for the root partition. ANALYZE
collects statistics on the root and leaf partitions, including HyperLogLog (HLL) statistics on the leaf partitions. ANALYZE ROOTPARTITION
collects statistics only on the root partition. The server configuration parameter optimizer_analyze_root_partition controls whether the ROOTPARTITION
keyword is required to collect root statistics for a root partitioned table. See the ANALYZE command for information about collecting statistics on partitioned tables.
Keep in mind that ANALYZE
always scans the entire table before updating the root partition statistics. If your table is very large, this operation can take a significant amount of time. ANALYZE ROOTPARTITION
also uses an ACCESS SHARE
lock that prevents certain operations, such as TRUNCATE
and VACUUM
operations, during runtime. For these reasons, you should schedule ANALYZE
operations periodically, or when there are significant changes to leaf partition data.
Follow these best practices for running ANALYZE
or ANALYZE ROOTPARTITION
on partitioned tables in your system:
ANALYZE <root_partition_table_name>
on a new partitioned table after adding initial data. Run ANALYZE <leaf_partition_table_name>
on a new leaf partition or a leaf partition where data has changed. By default, running the command on a leaf partition updates the root partition statistics if the other leaf partitions have statistics.EXPLAIN
plans against the table, or after significant changes to leaf partition data. For example, if you add a new leaf partition at some point after generating root partition statistics, consider running ANALYZE
or ANALYZE ROOTPARTITION
to update root partition statistics with the new tuples inserted from the new leaf partition.ANALYZE
or ANALYZE ROOTPARTITION
only weekly, or at some interval longer than daily.ANALYZE
with no arguments, because doing so runs the command on all database tables including partitioned tables. With large databases, these global ANALYZE
operations are difficult to monitor, and it can be difficult to predict the time needed for completion.ANALYZE <table_name>
or ANALYZE ROOTPARTITION <table_name>
operations in parallel to speed the operation of statistics collection, if your I/O throughput can support the load.analyzedb
to update table statistics. Using analyzedb
ensures that tables that were previously analyzed are not re-analyzed if no modifications were made to the leaf partition.Although creating and maintaining root partition statistics is crucial for GPORCA query performance with partitioned tables, maintaining leaf partition statistics is also important. If GPORCA cannot generate a plan for a query against a partitioned table, then the Postgres-based planner is used and leaf partition statistics are needed to produce the optimal plan for that query.
GPORCA itself also uses leaf partition statistics for any queries that access leaf partitions directly, instead of using the root partition with predicates to eliminate partitions. For example, if you know which partitions hold necessary tuples for a query, you can directly query the leaf partition itself; in this case GPORCA uses the leaf partition statistics.
If you do not intend to run queries on partitioned tables with GPORCA (setting the server configuration parameter optimizer to off
), then you can deactivate the automatic collection of statistics on the root partition of the partitioned table. The server configuration parameter optimizer_analyze_root_partition controls whether the ROOTPARTITION
keyword is required to collect root statistics for a root partitioned table. The default setting for the parameter is on
, the ANALYZE
command can collect root partition statistics without the ROOTPARTITION
keyword. You can deactivate automatic collection of root partition statistics by setting the parameter to off
. When the value is off
, you must run ANALZYE ROOTPARTITION
to collect root partition statistics.
Log into the Greenplum Database coordinator host as gpadmin
, the Greenplum Database administrator.
Set the values of the server configuration parameters. These Greenplum Database gpconfig
utility commands sets the value of the parameters to off
:
$ gpconfig -c optimizer_analyze_root_partition -v off --coordinatoronly
Restart Greenplum Database. This Greenplum Database gpstop
utility command reloads the postgresql.conf
files of the coordinator and segments without shutting down Greenplum Database.
gpstop -u