This document contains release information about VMware Greenplum Database 7.x releases. For previous versions of the release notes for Greenplum Database, go to VMware Greenplum Database Documentation. For information about Greenplum Database end of life, see VMware Greenplum Database end of life policy.
VMware Greenplum 7 software is available for download from the VMware Greenplum page on VMware Tanzu Network.
VMware Greenplum 7 is based on the open source Greenplum Database project code.
Release Date: 2023-09-28
Greenplum Database introduces substantial improvements to resource group-based resource management, such as support for Linux Control Groups v2, simplified memory management, and support for disk I/O limits per resource group. Refer to About Changes to Resource Groups for more information about what has changed in resource groups.
Index-only scans can answer queries from an index alone without accessing the table’s heap, which significantly improves query performance. In addition, covering indexes allow you to add additional columns to an index using the INCLUDE
clause, in order to make the use of index-only scans more effective. See Understanding Index-Only Scans and Covering Indexes for more details.
Unique indexes, unique constraints, and primary keys are now supported on append-optimized tables.
Fast ANALYZE
improves the speed of ANALYZE
for append-optimized tables. You do not need to enable fast ANALYZE
, this is the default and only behaviour when you analyze an append-optimized table.
Greenplum Database no longer rewrites the table when a column is added to a table (ALTER TABLE ... ADD COLUMN ...
).
PostgreSQL declarative table partitioning syntaxis now supported. See About Changes to Table Partitioning in Greenplum 7 for more details.
Multi-column most-common-value (MCV) extended statistics compute the correlation ratio and number of distinct values to generate better plans for queries that test several non-uniformly-distributed columns. This feature is helpful in estimating query memory usage and when combining the statistics from individual columns. See CREATE STATISTICS for more details.
UPSERT
operations turn INSERT
operations that would violate constraints into an UPDATE
, or ignore them. See INSERT for more details.
BRIN indexes (Block Range INdexes) use much less space in disk compared to a standard b-tree index for very large tables whose columns have some natural correlation with their physical location within the table.
Row-leval security allows database administrators to set security policies that filter which rows particular users are permitted to update or view. Refer to About Configuring Row-Level Security Policies for more information.
Just-in-Time (JIT) compilation allows you compile otherwise interpreted queries into compiled code at run time, which provides a performance improvement for long running CPU bound queries, like analytical queries.
Hash indexes are supported with the Postgres-based planner and GPORCA.
The built-in Full Text Search functionality provides data types, functions, operators, index types, and configurations for querying natural language documents. You may also search for phrases (multiple adjacent words) that appear next to each other in a specific order, or with a specified distance between the words.
Progress reporting for Monitoring Long-Running Operations during the execution of the commands ANALYZE
, CLUSTER
, CREATE INDEX
, VACUUM
, COPY
and BASE_BACKUP
.
The SQL/JSON path language is now supported.
Summary Views aggregate statistics across the Greenplum cluster which display the metrics reported by their corresponding gp_ view
.
Automatic Vacuum is now enabled by default for all databases, which automatically performs VACUUM
and ANALYZE
operations against all catalog tables, as well as runs ANALYZE
for all users tables in those databases.
Generated Columns are table columns whose content is computed from other expressions, including references to other columns in the same table, remove the need to use the INSERT
or UPDATE
commands.
Table Access Method (AM) allows you to dynamically alter the storage characteristics of an already populated table, as well as the storage options. See ALTER TABLE and CREATE ACCESS METHOD for more details.
The new schema object PROCEDURE
allows you to store procedures with transaction management. The CREATE PROCEDURE command provides functionality to execute commands like COMMIT
or ROLLBACK
inside the procedural code.
VMware Greenplum Streaming Server (GPSS) version 1.10.1 is bundled, which includes support for VMware Greenplum 7, as well as changes and bug fixes. Refer to the Greenplum Streaming Server Documentation for more information about this release and for upgrade instructions.
Greenplum package utility, gppkg v2
, allows you to install Greenplum Database extensions even when the database is not running.
Greenplum Database 7 introduces the pgvector module, which provides vector similarity search capabilities for Greenplum Database that enable searching, storing, and querying machine language-generated embeddings at large scale.
Greenplum Database 7 includes the following changes. For a comprehensive look at the key changes between Greenplum 6 and Greenplum 7, see https://docs.vmware.com/en/VMware-Greenplum/7/greenplum-database/install_guide-changes-6-7-landing-page.html.
Greenplum 7 internally converts external tables to foreign tables. Refer to About Changes to External Tables in Greenplum 7 for more information.
NoteThis release of
gpsupport
does not support log collection for VMware Greenplum Command Center. Thus, the-with-gpcc
option to thegpsupport gp_log_collector
tool is not supported in this release.
The gpfdist parallel file distribution utility now supports multi-threaded data compression and transmission.
The gpfdist parallel file distribution utility includes these SSL-related features and changes:
--ssl_verify_peer <boolean>
that you can specify to enable or disable gpfdist
SSL certificate authentication.off
, gpfdist
no longer requires that the certificate authority file be present on the Greenplum Database segments.The gpscp
utility has been renamed gpsync and now takes a new -a
option, which causes gpsync
to sync source and target directories in archival mode.
The pg_resetxlog
utility has been renamed to pg_resetwal
.
The command pg_dump
includes options to include or exclude leaf tables of partitioned tables.
The pg_tables
command no longer includes external tables in its output.
VMware Greenplum includes a new extension, postgresml
, which provides several new user-defined functions that allow you to use tens of thousands of pre-trained open source AI/machine_learning models in VMware Greenplum.
Functionality from the gp_array_agg
module in VMware Greenplum 6.x is now directly included in the Greenplum 7.x catalog; you do not need to install a separate module. Also, with version 7.x you can use both anynonarray
and anyarray
as input types (compared to only anynonarray
in Greenplum 6.x).
gp_toolkit
is an extension now.
The PostGIS extension supports PostGIS version 3.3.
The parallel retrieve cursor functionality available in the gp_parallel_retrieve_cursor
module in Greenplum 6 is now built-in. See Retrieving Query Results with a Parallel Retrieve Cursor.
The pgvector
extension was updated to version 0.5.0, which adds a new hnsw
index type, adds parallel index builds for the ivfflat
index type, adds the l1_distance
function and sum
aggregate, and adds element-wise multiplication for vectors. This version also improves performance for distance operations, and more. See pgvector.
The CREATE TABLE ... (LIKE ... INCLUDING <keyword> )
command now supports AM
, ENCODING
, and RELOPT
keywords to copy the access method, column encoding, and/or relation options from the source/original table.
Order-agnostic aggregates can now be designated as safe for execution on replicated slices by specifying the REPSAFE = true
parameter to the CREATE AGGREGATE command.
You can now use the CLUSTER
command on append-optimized tables over B-tree indexes.
The SELECT
SQL command now supports the SKIP LOCKED
option.
The new command IMPORT FOREIGN SCHEMA
provides support for importing a complete schema from an external database.
The ALTER <object> DEPENDS ON EXTENSION
command allows a database object to be marked as depending on an extension. The object will be dropped automatically when the extension is dropped, without needing to specify CASCADE
.
The ALTER DEFAULT PRIVILEGES command allows you to set and revoke default permissions on schemas.
You may drop multiple functions, operators, and aggregates with a single DROP
command.
The command CREATE SEQUENCE AS
allows you to create a sequence matching an integer data type. This simplifies the creation of sequences matching the range of base columns.
Some DDL commands now accept the current user (CURRENT_USER
) or the session user (SESSION_USER
) in place of a specific user name.
You must specify FUNCTION
instead of PROCEDURE in
CREATE OPERATOR` as the referenced object must be a function and not a procedure. Greenplum Database accepts the old syntax for compatibility.
The commands CREATE SERVER
, CREATE MATERIALIZED VIEW
, CREATE USER MAPPING
, and CREATE COLLATION
now accept the IF NOT EXISTS
clause.
The new commands ALTER ROUTINE
and DROP ROUTINE
allow you to alter or drop of all routine-like objects, including procedures, functions, and aggregates.
The new command ALTER INDEX ATTACH PARTITION
associates an existing index on a partition with a matching index template for its partitioned table.
The command ALTER INDEX
can set statistics-gathering targets for expression indexes.
The command CREATE AGGREGATE
now has the option OR REPLACE
.
The command CREATE AGGREGATE
now has the option FINALFUNC
which specifies the behavior of the aggregate’s finalization function; this is helpful for optimizing user-defined aggregate functions and allowing them to be specified as window functions.
The CREATE/ALTER USER ... PASSWORD
commands no longer support the UNENCRYPTED
option.
The command ALTER TABLE SET DISTRIBUTED BY
may now be used for external tables. However, you must ensure that the contents of the external/foreign tables satisfies the DISTRIBUTED BY
rules.
The CREATE TABLE
command now supports specifying a table access method with the new USING <access method>
clause.
The new command CREATE ACCESS METHOD
command allows you to create new table types. This enables the development of new table access methods, which can optimize storage for different use cases. The existing heap
access method remains the default.
You can now dynamically update the access method for a table with the ALTER TABLE
command, using the new clause SET ACCESS METHOD <access_method>
.
You can now dynamically update the following storage parameters for a table using the ALTER TABLE
command: appendoptimized
, blocksize
, orientation
, compresstype
, compresslevel
, and checksum
.
Common table expressions (CTE) now support automatic (but overridable) inlining. CTEs are automatically inlined if they have no side-effects, are not recursive, and are referenced only once in the query. You can prevent inlining by specifying MATERIALIZED
, or force inlining for multiple-referenced CTEs by specifying NOT MATERIALIZED
. In previous Greenplum releases, CTEs were never inlined and were always evaluated before the rest of the query.
VACUUM
operations now clean up any dead ranges from BRIN indexes on append-optimized tables.
VACUUM
can now identify pages containing only already frozen tuples in the table’s visibility map and skips these pages, hence reducing the cost of maintaining large tables which contain mostly unchanging data. The new VACUUM
parameter DISABLE_PAGE_SKIPPING
forces VACUUM
to run against all frozen pages in case the contents of the visibility map are suspect, which should happen only if there is a hardware or software issue causing database corruption.
The new option SKIP_LOCKED
allows VACUUM
and ANALYZE
to skip relations that cannot lock immediately due to conflicting locks.
The new option INDEX_CLEANUP
allows VACUUM
to skip index cleanup. Setting the option to false
will make VACUUM
run as quickly as possible, for example, to avoid imminent transaction ID wraparound.
VACUUM
can now avoid unnecessary heap table truncation attempts that require taking an exclusive table lock even when no truncation is possible. This enhancement avoids unnecessary query cancellations on the standby servers.
The new configuration parameter vacuum_cleanup_index_scale_factor
helps minimize unnecessary index scans during VACUUM
.
The new table and partition storage parameter vacuum_index_cleanup
lets you control whether, for a given table ot partition, VACUUM
attempts to remove index entries pointing to dead tuples.
Summary Views aggregate statistics across the Greenplum cluster which display the metrics reported by their corresponding gp_ view
.
Progress reporting system views for Monitoring Long-Running Operations during the execution of the commands ANALYZE
, CLUSTER
, CREATE INDEX
, VACUUM
, COPY
and BASE_BACKUP
.
Progress reporting for Monitoring Long-Running Operations during the execution of the commands ANALYZE
, CLUSTER
, CREATE INDEX
, VACUUM
, COPY
and BASE_BACKUP
.
New function gp_toolkit.get_column_size(oid)
and views gp_toolkit.gp_column_size
, and gp_toolkit.gp_column_size_summary
allow you to view column size and compression ratio for a given AO/AOCO table.
The pg_stat_*
and pg_statio_*
system views now provide information for append-optimized tables and their auxiliary tables.
VMware Greenplum now includes a system catalog table called pg_sequence
, which contains information about sequences. Note that some information about sequences, such as the name and the schema, is stored in the pg_class
system table.
The new gp_
system views are cluster-wide views that display from every primary segment the information reported by its corresponding pg_
system view.
The new catalog views pg_stat_wal and pg_stat_slru display WAL information and track simple least-recently-used (SLRU) caches.
The pg_backend_memory_contexts system view and supporting administration functions report memory contexts and usage for arbitrary backends. See Viewing and Logging Per-Process Memory Usage Information for more information.
Greenplum Database 7 removes the following system catalog tables and views:
pg_partition_columns
pg_partition_encoding
pg_partition_rule
pg_partition_template
pg_stat_partition_operations
In the gp_configuration_history
catalog table, the desc
column has been renamed to description
.
gp_read_error_log()
is enhanced to detect division by zero, JSON mapping, and unsupported unicode errors encountered during foreign scans of external tables. This feature is not supported for a single segment Greenplum Database cluster when the Greenplum Query Optimizer (GPORCA) is enabled.
The functions array_position()
and array_positions()
are now included.
Window functions now support all framing options shown in the SQL:2011 standard, including RANGE
distance PRECEDING/FOLLOWING
, GROUPS
mode, and frame exclusion options enhancement.
The new function gp_toolkit.__gp_aoblkdir(regclass)
helps you obtain each block directory entry for a given AO/AOCO table that had or has an index.
The pattern matching behaviour of the substring()
function is changed. In cases where the pattern can be matched in more than one way, the initial sub-pattern is now treated as matching the least possible amount of text rather than the greatest. For example, a pattern such as %#"aa*#"%
now selects the first group of a
’s from the input, not the last group.
The Greenplum Query Optimizer (GPORCA) supports index-only scans on append-optimized and append-optimized, column-oriented tables.
GPORCA supports backwards index scans.
GPORCA adds support for new server configuration parameters optimizer_enable_dynamicindexonlyscan and optimizer_enable_push_join_below_union_all.
GPORCA does not support planning or executing queries on multi-level partitioned tables.
GPORCA partially supports index-only scans and covering indexes. Refer to the GPORCA Limitations
Unsupported SQL Query Features topic for a list of unsupported features in this area.
GPORCA now supports Dynamic Partition Elimination (DPE) for right joins
GPORCA now supports planning queries that involve foreign tables. Queries on foreign tables and queries on partitioned tables that include a foreign table or external table leaf partition can now be planned by GPORCA.
GPORCA now supports the CUBE
grouping set result set.
GPORCA now supports planning and running queries that you specify with multiple grouping sets.
Altering a column type for AO/CO tables requires only rewriting the column files for the specified column instead of the whole table.
VACUUM
can now run against all auxiliary tables of an append-optimized table with the option AO_AUX_ONLY
.
The catalog table pg_attribute_encoding
now includes a new column filenum
that helps improve efficiency when altering column type for AO/CO tables.
You may now fetch a subset of columns when using the command COPY TO
from a AOCO table.
The table pg_appendonly
no longer records append-only storage options, they are now only listed under pg_class.reloptions
, which significantly reduces the size of pg_appendonly
catalog table.
You may now dynamically update an AOCO table’s column encodings, using the ALTER TABLE
command.
You many now alter a heap table with a unique index to an append-optimized table with a unique index.
When the encoding of a table column changes (ALTER TABLE ... ALTER COLUMN ... SET ENCODING
), Greenplum Database rewrites only the column data, it no longer rewrites the table.
Greenplum Database optimizes the use of snapshots when using immutable functions. It avoids taking a distributed snapshot and uses the local snapshot, resulting in improved performance for OLTP.
The sorting speed of varchar
, text,
and numeric
fields via “abbreviated” keys has been improved.
Greenplum partitions the shared hash table freelist to reduce contention on multi-CPU-socket servers.
There are new performance improvements when using atomic operations, rather than a spinlock, to protect an LWLock’s wait queue.
Greenplum reduces the WAL overhead when building a GiST, GIN, or SP-GiST index; less space on disk is now required for these WAL records and the data replays faster during crash recovery or point-in-time recovery.
You may optionally use the ICU library for collation support.
Partitioned tables now support indexes.
The default size of the sequence cache is changed from 1
(no cache) to 20
to increase the performance of insert operations on tables that are defined with a serial
data type using a sequence value.
VMware Greenplum Database 7.0 removes these features:
createlang
and droplang
utilities.analyzedb
option --skip_root_stats
.gpsys1
utility.gpperfmon
data collection agents, database, and views.ARRAY_NAME
variable.CREATEUSER/NOCREATEUSER
options from CREATE ROLE
and allied commands.gp_percentile_agg
extension.gp_quicklz_fallback
server configuration parameter to ensure backward compatibility.VMware Greenplum Database 7 deprecates the following features:
gpreload
utility is replaced by ALTER TABLE ... REPACK BY
.VMware Greenplum 7.x has these limitations:
pgbouncer
, GreenplumPython, Spark Connector, Apache Nifi Connector, GemFire Connector, Command Center and metrics collector, WLM, cluster recovery, Greenplum upgrade, Greenplum cloud offerings. Linked documentation may still refer to utilities and extensions that are not yet included in this release.