VMware Greenplum 7.x Release Notes

This document contains release information about VMware Greenplum Database 7.x releases. For previous versions of the release notes for Greenplum Database, go to VMware Greenplum Database Documentation. For information about Greenplum Database end of life, see VMware Greenplum Database end of life policy.

VMware Greenplum 7 software is available for download from the VMware Greenplum page on VMware Tanzu Network.

VMware Greenplum 7 is based on the open source Greenplum Database project code.

Release 7.0

Release Date: 2023-09-28

Key New Features

  • Greenplum Database introduces substantial improvements to resource group-based resource management, such as support for Linux Control Groups v2, simplified memory management, and support for disk I/O limits per resource group. Refer to About Changes to Resource Groups for more information about what has changed in resource groups.

  • Index-only scans can answer queries from an index alone without accessing the table’s heap, which significantly improves query performance. In addition, covering indexes allow you to add additional columns to an index using the INCLUDE clause, in order to make the use of index-only scans more effective. See Understanding Index-Only Scans and Covering Indexes for more details.

  • Unique indexes, unique constraints, and primary keys are now supported on append-optimized tables.

  • Fast ANALYZE improves the speed of ANALYZE for append-optimized tables. You do not need to enable fast ANALYZE, this is the default and only behaviour when you analyze an append-optimized table.

  • Greenplum Database no longer rewrites the table when a column is added to a table (ALTER TABLE ... ADD COLUMN ...).

  • PostgreSQL declarative table partitioning syntaxis now supported. See About Changes to Table Partitioning in Greenplum 7 for more details.

  • Multi-column most-common-value (MCV) extended statistics compute the correlation ratio and number of distinct values to generate better plans for queries that test several non-uniformly-distributed columns. This feature is helpful in estimating query memory usage and when combining the statistics from individual columns. See CREATE STATISTICS for more details.

  • UPSERT operations turn INSERT operations that would violate constraints into an UPDATE, or ignore them. See INSERT for more details.

  • BRIN indexes (Block Range INdexes) use much less space in disk compared to a standard b-tree index for very large tables whose columns have some natural correlation with their physical location within the table.

  • Row-leval security allows database administrators to set security policies that filter which rows particular users are permitted to update or view. Refer to About Configuring Row-Level Security Policies for more information.

  • Just-in-Time (JIT) compilation allows you compile otherwise interpreted queries into compiled code at run time, which provides a performance improvement for long running CPU bound queries, like analytical queries.

  • Hash indexes are supported with the Postgres-based planner and GPORCA.

  • The built-in Full Text Search functionality provides data types, functions, operators, index types, and configurations for querying natural language documents. You may also search for phrases (multiple adjacent words) that appear next to each other in a specific order, or with a specified distance between the words.

  • Progress reporting for Monitoring Long-Running Operations during the execution of the commands ANALYZE, CLUSTER, CREATE INDEX, VACUUM, COPY and BASE_BACKUP.

  • The SQL/JSON path language is now supported.

  • Summary Views aggregate statistics across the Greenplum cluster which display the metrics reported by their corresponding gp_ view.

  • Automatic Vacuum is now enabled by default for all databases, which automatically performs VACUUM and ANALYZE operations against all catalog tables, as well as runs ANALYZE for all users tables in those databases.

  • Generated Columns are table columns whose content is computed from other expressions, including references to other columns in the same table, remove the need to use the INSERT or UPDATE commands.

  • Table Access Method (AM) allows you to dynamically alter the storage characteristics of an already populated table, as well as the storage options. See ALTER TABLE and CREATE ACCESS METHOD for more details.

  • The new schema object PROCEDURE allows you to store procedures with transaction management. The CREATE PROCEDURE command provides functionality to execute commands like COMMIT or ROLLBACK inside the procedural code.

  • VMware Greenplum Streaming Server (GPSS) version 1.10.1 is bundled, which includes support for VMware Greenplum 7, as well as changes and bug fixes. Refer to the Greenplum Streaming Server Documentation for more information about this release and for upgrade instructions.

  • Greenplum package utility, gppkg v2, allows you to install Greenplum Database extensions even when the database is not running.

  • Greenplum Database 7 introduces the pgvector module, which provides vector similarity search capabilities for Greenplum Database that enable searching, storing, and querying machine language-generated embeddings at large scale.

Changed Features

Greenplum Database 7 includes the following changes. For a comprehensive look at the key changes between Greenplum 6 and Greenplum 7, see https://docs.vmware.com/en/VMware-Greenplum/7/greenplum-database/install_guide-changes-6-7-landing-page.html.

External Tables

Greenplum 7 internally converts external tables to foreign tables. Refer to About Changes to External Tables in Greenplum 7 for more information.

Database Utilities

Note

This release of gpsupport does not support log collection for VMware Greenplum Command Center. Thus, the -with-gpcc option to the gpsupport gp_log_collector tool is not supported in this release.

  • The gpfdist parallel file distribution utility now supports multi-threaded data compression and transmission.

  • The gpfdist parallel file distribution utility includes these SSL-related features and changes:

    • A new option named --ssl_verify_peer <boolean> that you can specify to enable or disable gpfdist SSL certificate authentication.
    • When the verify_gpfdists_cert server configuration parameter is set to off, gpfdist no longer requires that the certificate authority file be present on the Greenplum Database segments.
  • The gpscp utility has been renamed gpsync and now takes a new -a option, which causes gpsync to sync source and target directories in archival mode.

  • The pg_resetxlog utility has been renamed to pg_resetwal.

  • The command pg_dump includes options to include or exclude leaf tables of partitioned tables.

  • The pg_tables command no longer includes external tables in its output.

Additional Supplied Modules and Extensions

  • VMware Greenplum includes a new extension, postgresml, which provides several new user-defined functions that allow you to use tens of thousands of pre-trained open source AI/machine_learning models in VMware Greenplum.

  • Functionality from the gp_array_agg module in VMware Greenplum 6.x is now directly included in the Greenplum 7.x catalog; you do not need to install a separate module. Also, with version 7.x you can use both anynonarray and anyarray as input types (compared to only anynonarray in Greenplum 6.x).

  • gp_toolkit is an extension now.

  • The PostGIS extension supports PostGIS version 3.3.

  • The parallel retrieve cursor functionality available in the gp_parallel_retrieve_cursor module in Greenplum 6 is now built-in. See Retrieving Query Results with a Parallel Retrieve Cursor.

  • The pgvector extension was updated to version 0.5.0, which adds a new hnsw index type, adds parallel index builds for the ivfflat index type, adds the l1_distance function and sum aggregate, and adds element-wise multiplication for vectors. This version also improves performance for distance operations, and more. See pgvector.

SQL Commands

  • The CREATE TABLE ... (LIKE ... INCLUDING <keyword> ) command now supports AM, ENCODING, and RELOPT keywords to copy the access method, column encoding, and/or relation options from the source/original table.

  • Order-agnostic aggregates can now be designated as safe for execution on replicated slices by specifying the REPSAFE = true parameter to the CREATE AGGREGATE command.

  • You can now use the CLUSTER command on append-optimized tables over B-tree indexes.

  • The SELECT SQL command now supports the SKIP LOCKED option.

  • The new command IMPORT FOREIGN SCHEMA provides support for importing a complete schema from an external database.

  • The ALTER <object> DEPENDS ON EXTENSION command allows a database object to be marked as depending on an extension. The object will be dropped automatically when the extension is dropped, without needing to specify CASCADE.

  • The ALTER DEFAULT PRIVILEGES command allows you to set and revoke default permissions on schemas.

  • You may drop multiple functions, operators, and aggregates with a single DROP command.

  • The command CREATE SEQUENCE AS allows you to create a sequence matching an integer data type. This simplifies the creation of sequences matching the range of base columns.

  • Some DDL commands now accept the current user (CURRENT_USER) or the session user (SESSION_USER) in place of a specific user name.

  • You must specify FUNCTION instead of PROCEDURE inCREATE OPERATOR` as the referenced object must be a function and not a procedure. Greenplum Database accepts the old syntax for compatibility.

  • The commands CREATE SERVER, CREATE MATERIALIZED VIEW, CREATE USER MAPPING, and CREATE COLLATION now accept the IF NOT EXISTS clause.

  • The new commands ALTER ROUTINE and DROP ROUTINE allow you to alter or drop of all routine-like objects, including procedures, functions, and aggregates.

  • The new command ALTER INDEX ATTACH PARTITION associates an existing index on a partition with a matching index template for its partitioned table.

  • The command ALTER INDEX can set statistics-gathering targets for expression indexes.

  • The command CREATE AGGREGATE now has the option OR REPLACE.

  • The command CREATE AGGREGATE now has the option FINALFUNC which specifies the behavior of the aggregate’s finalization function; this is helpful for optimizing user-defined aggregate functions and allowing them to be specified as window functions.

  • The CREATE/ALTER USER ... PASSWORD commands no longer support the UNENCRYPTED option.

  • The command ALTER TABLE SET DISTRIBUTED BY may now be used for external tables. However, you must ensure that the contents of the external/foreign tables satisfies the DISTRIBUTED BY rules.

  • The CREATE TABLE command now supports specifying a table access method with the new USING <access method> clause.

  • The new command CREATE ACCESS METHOD command allows you to create new table types. This enables the development of new table access methods, which can optimize storage for different use cases. The existing heap access method remains the default.

  • You can now dynamically update the access method for a table with the ALTER TABLE command, using the new clause SET ACCESS METHOD <access_method>.

  • You can now dynamically update the following storage parameters for a table using the ALTER TABLE command: appendoptimized, blocksize, orientation, compresstype, compresslevel, and checksum.

  • Common table expressions (CTE) now support automatic (but overridable) inlining. CTEs are automatically inlined if they have no side-effects, are not recursive, and are referenced only once in the query. You can prevent inlining by specifying MATERIALIZED, or force inlining for multiple-referenced CTEs by specifying NOT MATERIALIZED. In previous Greenplum releases, CTEs were never inlined and were always evaluated before the rest of the query.

Database Maintenance

  • VACUUM operations now clean up any dead ranges from BRIN indexes on append-optimized tables.

  • VACUUM can now identify pages containing only already frozen tuples in the table’s visibility map and skips these pages, hence reducing the cost of maintaining large tables which contain mostly unchanging data. The new VACUUM parameter DISABLE_PAGE_SKIPPING forces VACUUM to run against all frozen pages in case the contents of the visibility map are suspect, which should happen only if there is a hardware or software issue causing database corruption.

  • The new option SKIP_LOCKED allows VACUUM and ANALYZE to skip relations that cannot lock immediately due to conflicting locks.

  • The new option INDEX_CLEANUP allows VACUUM to skip index cleanup. Setting the option to false will make VACUUM run as quickly as possible, for example, to avoid imminent transaction ID wraparound.

  • VACUUM can now avoid unnecessary heap table truncation attempts that require taking an exclusive table lock even when no truncation is possible. This enhancement avoids unnecessary query cancellations on the standby servers.

  • The new configuration parameter vacuum_cleanup_index_scale_factor helps minimize unnecessary index scans during VACUUM.

  • The new table and partition storage parameter vacuum_index_cleanup lets you control whether, for a given table ot partition, VACUUM attempts to remove index entries pointing to dead tuples.

System Functions, Catalog Tables and Views

  • Summary Views aggregate statistics across the Greenplum cluster which display the metrics reported by their corresponding gp_ view.

  • Progress reporting system views for Monitoring Long-Running Operations during the execution of the commands ANALYZE, CLUSTER, CREATE INDEX, VACUUM, COPY and BASE_BACKUP.

  • Progress reporting for Monitoring Long-Running Operations during the execution of the commands ANALYZE, CLUSTER, CREATE INDEX, VACUUM, COPY and BASE_BACKUP.

  • New function gp_toolkit.get_column_size(oid) and views gp_toolkit.gp_column_size, and gp_toolkit.gp_column_size_summary allow you to view column size and compression ratio for a given AO/AOCO table.

  • The pg_stat_* and pg_statio_* system views now provide information for append-optimized tables and their auxiliary tables.

  • VMware Greenplum now includes a system catalog table called pg_sequence, which contains information about sequences. Note that some information about sequences, such as the name and the schema, is stored in the pg_class system table.

  • The new gp_ system views are cluster-wide views that display from every primary segment the information reported by its corresponding pg_ system view.

  • The new catalog views pg_stat_wal and pg_stat_slru display WAL information and track simple least-recently-used (SLRU) caches.

  • The pg_backend_memory_contexts system view and supporting administration functions report memory contexts and usage for arbitrary backends. See Viewing and Logging Per-Process Memory Usage Information for more information.

  • Greenplum Database 7 removes the following system catalog tables and views:

    • pg_partition_columns
    • pg_partition_encoding
    • pg_partition_rule
    • pg_partition_template
    • pg_stat_partition_operations
  • In the gp_configuration_history catalog table, the desc column has been renamed to description.

  • gp_read_error_log() is enhanced to detect division by zero, JSON mapping, and unsupported unicode errors encountered during foreign scans of external tables. This feature is not supported for a single segment Greenplum Database cluster when the Greenplum Query Optimizer (GPORCA) is enabled.

  • The functions array_position() and array_positions() are now included.

  • Window functions now support all framing options shown in the SQL:2011 standard, including RANGE distance PRECEDING/FOLLOWING, GROUPS mode, and frame exclusion options enhancement.

  • The new function gp_toolkit.__gp_aoblkdir(regclass) helps you obtain each block directory entry for a given AO/AOCO table that had or has an index.

  • The pattern matching behaviour of the substring() function is changed. In cases where the pattern can be matched in more than one way, the initial sub-pattern is now treated as matching the least possible amount of text rather than the greatest. For example, a pattern such as %#"aa*#"% now selects the first group of a’s from the input, not the last group.

Greenplum Query Optimizer (GPORCA)

  • The Greenplum Query Optimizer (GPORCA) supports index-only scans on append-optimized and append-optimized, column-oriented tables.

  • GPORCA supports backwards index scans.

  • GPORCA adds support for new server configuration parameters optimizer_enable_dynamicindexonlyscan and optimizer_enable_push_join_below_union_all.

  • GPORCA does not support planning or executing queries on multi-level partitioned tables.

  • GPORCA partially supports index-only scans and covering indexes. Refer to the GPORCA Limitations Unsupported SQL Query Features topic for a list of unsupported features in this area.

  • GPORCA now supports Dynamic Partition Elimination (DPE) for right joins

  • GPORCA now supports planning queries that involve foreign tables. Queries on foreign tables and queries on partitioned tables that include a foreign table or external table leaf partition can now be planned by GPORCA.

  • GPORCA now supports the CUBE grouping set result set.

  • GPORCA now supports planning and running queries that you specify with multiple grouping sets.

Append-Optimized Tables

  • Altering a column type for AO/CO tables requires only rewriting the column files for the specified column instead of the whole table.

  • VACUUM can now run against all auxiliary tables of an append-optimized table with the option AO_AUX_ONLY.

  • The catalog table pg_attribute_encoding now includes a new column filenum that helps improve efficiency when altering column type for AO/CO tables.

  • You may now fetch a subset of columns when using the command COPY TO from a AOCO table.

  • The table pg_appendonly no longer records append-only storage options, they are now only listed under pg_class.reloptions, which significantly reduces the size of pg_appendonly catalog table.

  • You may now dynamically update an AOCO table’s column encodings, using the ALTER TABLE command.

  • You many now alter a heap table with a unique index to an append-optimized table with a unique index.

Performance

  • When the encoding of a table column changes (ALTER TABLE ... ALTER COLUMN ... SET ENCODING), Greenplum Database rewrites only the column data, it no longer rewrites the table.

  • Greenplum Database optimizes the use of snapshots when using immutable functions. It avoids taking a distributed snapshot and uses the local snapshot, resulting in improved performance for OLTP.

  • The sorting speed of varchar, text, and numeric fields via “abbreviated” keys has been improved.

  • Greenplum partitions the shared hash table freelist to reduce contention on multi-CPU-socket servers.

  • There are new performance improvements when using atomic operations, rather than a spinlock, to protect an LWLock’s wait queue.

  • Greenplum reduces the WAL overhead when building a GiST, GIN, or SP-GiST index; less space on disk is now required for these WAL records and the data replays faster during crash recovery or point-in-time recovery.

  • You may optionally use the ICU library for collation support.

  • Partitioned tables now support indexes.

  • The default size of the sequence cache is changed from 1 (no cache) to 20 to increase the performance of insert operations on tables that are defined with a serial data type using a sequence value.

Removed Features

VMware Greenplum Database 7.0 removes these features:

  • The previously-deprecated createlang and droplang utilities.
  • The Greenplum R Client (GreenplumR).
  • Greenplum MapReduce.
  • The PL/Container 3.0 Beta extension.
  • The analyzedb option --skip_root_stats.
  • The gpsys1 utility.
  • The gpperfmon data collection agents, database, and views.
  • The ARRAY_NAME variable.
  • The CREATEUSER/NOCREATEUSER options from CREATE ROLE and allied commands.
  • The gp_percentile_agg extension.
  • VMware Greenplum Database 7 removes support for the QuickLZ compression algorithm. Use the new gp_quicklz_fallback server configuration parameter to ensure backward compatibility.

Deprecated Features

VMware Greenplum Database 7 deprecates the following features:

  • gpreload utility is replaced by ALTER TABLE ... REPACK BY.

Known Issues and Limitations

VMware Greenplum 7.x has these limitations:

  • You currently cannot upgrade from a previous major version of Greenplum to Greenplum 7.
  • You cannot use resource groups to manage and limit the total CPU and memory resources for a PL/Container runtime. Container instances are limited only by system resources, and the containers may consume resources at the expense of the Greenplum Database server. Future releases of Greenplum 7 may restore functionality to manage PL/Container resources using resource groups.
  • Some VMware Greenplum utilities and extensions are not yet included or supported with this release, including: pgbouncer, GreenplumPython, Spark Connector, Apache Nifi Connector, GemFire Connector, Command Center and metrics collector, WLM, cluster recovery, Greenplum upgrade, Greenplum cloud offerings. Linked documentation may still refer to utilities and extensions that are not yet included in this release.
check-circle-line exclamation-circle-line close-line
Scroll to top icon