VMware Greenplum 7.x Release Notes

This document contains release information about VMware Greenplum 7.x releases. For previous versions of the release notes for VMware Greenplum, go to VMware Greenplum Documentation. For information about VMware Greenplum end of life, see the VMware Greenplum end of life policy.

VMware Greenplum 7 software is available for download from the VMware Greenplum page on Broadcom Support Portal.

Note

For more information about download prerequisites, troubleshooting, and instructions, see Download Broadcom products and software.

Release 7.3

Release 7.3.1

Release Date: 2024-08-22

VMware Greenplum 7.3.1 is a minor release that resolves several issues.

Resolved Issues

N/A
Resolves an issue with VACUUM of append-optimized tables that might cause incorrect SELECT results or PANIC errors.
35572832
Resolves a PANIC error related to sort execution.
35573083
Resolves a PANIC error while moving a query to a different resource group.
N/A
Resolves a relcache leak caused when ANALYZE was interrupted.
N/A
Resolves an issue in sampling AO_ROW tables that might result in frequent ANALYZE failures.
35582363
Resolves an issue with the orphaned_toast_tables_query regex by making the schema optional. Also, gpcheckcat.distpolicy.sql is now regenerated on each run of the gpcheckcat mix_distribution_policy test.

Release 7.3.0

Release Date: 2024-08-02

VMware Greenplum 7.3.0 is a minor release that includes new and changed features and resolves several issues.

New Features

  • VMware Greenplum 7.3.0 introduces the Greenplum Automated Machine Learning Agent (gpMLBot), a command-line interface to assist users in utilizing Apache MADlib and PostgresML for automated data processing, hyperparameter optimization, and model management.

  • VMware Greenplum 7.3.0 introduces a feature for counting CPU cores in gp_toolkit. This includes one UDF (__gp_get_num_logical_cores) and three views (gp_toolkit.gp_num_physical_cores, gp_toolkit.gp_num_physical_cores_segments, and gp_toolkit.gp_num_physical_cores_per_host). This feature, designed for Linux, allows superusers to count physical cores across the Greenplum cluster, including segment hosts and individual hosts.
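
As a minimal sketch, the core-count views named above could be queried as follows. This assumes a superuser session on a Linux-based cluster running 7.3.0 or later. The column layout of each view is not described in these notes, so the sketch selects all columns rather than guessing at names, and the per-view comments are inferred from the view names only.

```sql
-- Assumption: run as a superuser on a Linux-based Greenplum 7.3.0+ cluster.

-- Physical cores across the cluster (inferred from the view name).
SELECT * FROM gp_toolkit.gp_num_physical_cores;

-- Physical cores on segment hosts (inferred from the view name).
SELECT * FROM gp_toolkit.gp_num_physical_cores_segments;

-- Physical cores broken down by host (inferred from the view name).
SELECT * FROM gp_toolkit.gp_num_physical_cores_per_host;
```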

Enhancements

  • VMware Greenplum 7.3.0 now supports automatic creation of dependent extensions for orafce_ext. Running the CREATE EXTENSION orafce_ext CASCADE; command automatically creates the dependent extensions as declared in the control file.

  • VMware Greenplum 7.3.0 now supports logical decoding, allowing Greenplum Database to be used as a source database in the Greenplum Change Data Capture solution.

  • VMware Greenplum 7.3.0 now supports the hostname parameter in the input configuration files for the following utilities:

    • gpaddmirrors
    • gpmovemirrors
  • VMware Greenplum 7.3.0 improves UPDATE performance for gpexpand.status_detail by:

    • Using coordinator-only distribution.
    • Creating a Btree index on the table.
    • Reducing the overall number of connections for updating gpexpand.status_detail to match the number of parallel jobs.
  • VMware Greenplum 7.3.0 optimizes the management of idle processes to improve performance.

  • VMware Greenplum 7.3.0 now supports the use of analyzedb on tables containing newline or comma characters in their names.

  • VMware Greenplum 7.3.0 now supports viewing tables that failed to redistribute and tables that have been dropped:

    • Failed Redistribution: Tables that failed to redistribute are tracked in the gpexpand status tables and views.
    • Dropped Tables: Tables dropped since the setup phase are tracked in the gpexpand.expansion_progress view.
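
The following sketch illustrates two of the enhancements above: creating orafce_ext together with its dependencies, and inspecting expansion progress. It assumes that orafce_ext is installed on the cluster and that a gpexpand expansion has been initialized in the current database, so that the gpexpand schema exists. Column names are not listed in these notes, so all columns are selected.

```sql
-- Creates orafce_ext and, through CASCADE, the dependent extensions
-- declared in its control file.
CREATE EXTENSION orafce_ext CASCADE;

-- Assumption: an expansion is in progress, so the gpexpand schema exists.
-- Tables dropped since the setup phase are tracked in this view.
SELECT * FROM gpexpand.expansion_progress;

-- Per-table redistribution status, including tables that failed to redistribute.
SELECT * FROM gpexpand.status_detail;
```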

Updated Libraries

  • The Greenplum package utility gppkg has been updated to version 2.2.1.

Resolved Issues

Server

35507450
Resolves an issue where selecting from a non-empty, zero-column append-optimized table could cause a segmentation fault.
N/A
Resolves a query result error that occurred when using bitmap and other indexes simultaneously.
N/A
Resolves an issue where gp_interconnect_address_type=wildcard did not function correctly.
N/A
Resolves an issue where the planner generated incorrect plans for DQA using aggfilter.
N/A
Resolves an issue with waitGxids when handling large gxid values.
N/A
Resolves a file leak issue caused by the initplan function.
N/A
Resolves an issue where DistributedTransactionId to TransactionId conversion caused incorrect values in the UDF gp_distributed_xid() and the views gp_distributed_log and gp_distributed_xacts.
N/A
Resolves an issue where active_statements settings had incorrect values in pg_resqueue.
N/A
Resolves an issue where the TRUNCATE command generated the error "attempted to update invisible tuple".
N/A
Resolves an issue where ALTER DATABASE/ROLE ... SET gp_default_storage_options was not consistent across segments.
N/A
Resolves an issue where an error occurred when executing an EXCEPT operation over a non-hashable column.

Query Processing

35258690
Resolves an issue where Orca did not handle Chinese characters properly when the database LC_COLLATE and LC_CTYPE were set to C.
35507324
Resolves an issue where Orca crashed when calling percentile_cont and percentile_disc on pass-by-ref datatypes, such as numeric and interval.
N/A
Resolves an issue where Orca's libpq interface for VIEW returned inaccurate resorigtbl metadata.
N/A
Resolves an issue where the H3 extension version function incorrectly displayed the version as unreleased instead of the correct released version.

Release 7.2

Release 7.2.0

Release Date: 2024-06-20

VMware Greenplum 7.2.0 is a minor release that includes new and changed features and resolves several issues.

New and Changed Features

Enhancements

  • GPORCA now supports a number of features previously only supported in the Postgres-based Planner:

    • Prepared statements
    • Functions containing query parameters
    • DISTINCT-qualified window aggregates
    • Full hash joins
  • With the 7.2.0 release, mirrorless Greenplum architectures no longer use the HA service to provide high availability of Greenplum primary segments in place of the FTS probe used in mirrored architectures. The HA service required that Greenplum state be controlled as root for multiple services, which caused contention between normal cluster utilities such as gpstart/gpstop and the HA service. This issue and other usability issues are resolved by the new Postmaster service, which entirely replaces the original HA service. For more information about using the Postmaster service, see Installing the Greenplum High Availability Service.

  • VMware Greenplum 7.2.0 generates less WAL for COPY FROM on heap tables when executed in the same transaction as CREATE TABLE.

  • VMware Greenplum 7.2.0 enhances gpexpand performance in segment cleanup.

  • VMware Greenplum 7.2.0 improves pg_basebackup, pg_rewind, and rsync logging to stdout and retains recovery progress files in the log directory after successful recovery.

  • VMware Greenplum 7.2.0 sets simple progress tracking as the default for gpexpand and adds a --detailed-progress option for detailed progress tracking.

  • The gpcheckcat utility now includes a new test -- mix_distribution_policy -- which checks for tables created with legacy and non-legacy hash operations.

  • The gpsupport gp_log_collector tool now supports gathering logs for VMware Greenplum Disaster Recovery, via the new -with-gpdr-primary and -with-gpdr-recovery options.

  • VMware Greenplum now supports plan hints for: Scan, Row Estimation, Join Order and Join Types.

  • VMware Greenplum now supports index scans for append-optimized tables; previously, only in-memory bitmap scans were supported.

    • VMware Greenplum 7.2.0 introduces two new server configuration parameters related to index scans of append-optimized tables:
      • gp_cpu_decompress_cost allows a user to fine-tune the cost of decompression during index scans of append-optimized tables.
      • gp_enable_ao_indexscan enables index scans on append-optimized tables.
  • VMware Greenplum 7.2.0 introduces a new server configuration parameter, gp_appendonly_compaction_segfile_limit. This parameter sets the minimum number of segment files required for inserts before the next compaction.

  • VMware Greenplum 7.2.0 re-introduces the following server configuration parameters:

    • gp_max_partition_level caps the number of levels of a partition hierarchy that can be created using classic syntax.
    • gp_resgroup_print_operator_memory_limits allows printing, in EXPLAIN output, the memory limits for operators assigned by the resource group's memory management.
  • VMware Greenplum 7.2.0 now supports OFFSET/LIMIT pushdown for foreign tables with data distributed across multiple remote servers when mpp_execute = 'all segments' is set.

  • The ADD COLUMN command for append-optimized column-oriented tables no longer needs to write default values for the full column.

    Caution

    If your database contains such tables, you may not be able to downgrade from future releases to VMware Greenplum 7.1 or earlier releases.

  • VMware Greenplum 7.2.0 now supports pg_attribute_encoding catalog search using syscache.
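
As a rough illustration of the append-optimized index scan support described above, the sketch below creates a small append-optimized table, indexes it, and enables the new parameter for the session. The table and index names are hypothetical, and whether the optimizer actually chooses an index scan depends on statistics, costing, and the active optimizer, so the EXPLAIN output is not predicted here.

```sql
-- Hypothetical append-optimized, row-oriented table.
CREATE TABLE sales_ao (id int, amount numeric)
    USING ao_row
    DISTRIBUTED BY (id);

INSERT INTO sales_ao
SELECT g, g * 1.5 FROM generate_series(1, 100000) AS g;

CREATE INDEX sales_ao_id_idx ON sales_ao (id);
ANALYZE sales_ao;

-- Enable index scans on append-optimized tables for this session.
SET gp_enable_ao_indexscan = on;

-- Inspect the current decompression cost setting before tuning it.
SHOW gp_cpu_decompress_cost;

-- Check which scan type the optimizer picks for a selective predicate.
EXPLAIN SELECT amount FROM sales_ao WHERE id = 42;
```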

New Extensions/Modules

  • VMware Greenplum 7.2.0 introduces the pg_cron module, which provides a cron-based job scheduler that runs inside the database.

  • VMware Greenplum 7.2.0 introduces the 3DCityDb module, which enables spatial data processing.

  • VMware Greenplum 7.2.0 introduces the H3 module, which provides hexagonal hierarchical geospatial indexing.

  • VMware Greenplum 7.2.0 supports Run-length encoding (RLE) compression with the Zstandard algorithm, or zstd, for column-oriented tables.

  • The gpstate -e command now displays an additional field called "Startup recovery remaining bytes". This field reports the number of bytes of startup WAL archive recovery remaining for the mirror segment that is undergoing recovery before the segment is marked as "up" in the gp_segment_configuration table.

  • VMware Greenplum 7.2.0 introduces a new extension, orafce_ext, which provides Oracle Compatibility SQL functions for manipulating RAW datatypes.
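
To give a sense of how the pg_cron scheduler is typically used, here is a short sketch based on the upstream pg_cron API (cron.schedule, cron.unschedule, and the cron.job table). It assumes the module has already been loaded and configured for the target database; the Greenplum packaging may impose additional setup steps that are not covered here. The schedule and command are illustrative only.

```sql
-- Assumption: pg_cron is preloaded and configured for this database.
CREATE EXTENSION IF NOT EXISTS pg_cron;

-- Schedule a VACUUM every day at 03:00 using standard cron syntax.
-- The call returns the id of the new job.
SELECT cron.schedule('0 3 * * *', 'VACUUM');

-- List the scheduled jobs.
SELECT jobid, schedule, command FROM cron.job;

-- Remove a job by id; replace 1 with the jobid returned by cron.schedule.
SELECT cron.unschedule(1);
```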

Updated Libraries

  • The pgvector module has been updated to version 0.7.0. Refer to pgvector for module and upgrade information.

  • The Python version for PL/Container and PostgresML has been updated from 3.9 to 3.11.

Changes

  • The resource group parameter MEMORY_LIMIT has been renamed to MEMORY_QUOTA.

  • The log_checkpoints server configuration parameter is now set to on by default.

  • In order to use VMware Greenplum Text with VMware Greenplum v7.2.0 and higher, you must set the default Python 3 version to 3.9 or higher.
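
The renamed resource group parameter can be sketched as follows. This assumes that resource group-based resource management is enabled; the group name and all values are hypothetical, and the other attributes shown (CONCURRENCY, CPU_MAX_PERCENT) follow the Greenplum 7 CREATE RESOURCE GROUP syntax rather than anything stated in these notes.

```sql
-- From 7.2.0 onward, use MEMORY_QUOTA where MEMORY_LIMIT was used before.
CREATE RESOURCE GROUP rg_reports WITH (
    CONCURRENCY = 5,
    CPU_MAX_PERCENT = 20,
    MEMORY_QUOTA = 500
);

-- Adjust the quota later with the new attribute name.
ALTER RESOURCE GROUP rg_reports SET MEMORY_QUOTA 1000;

-- Confirm the new default for checkpoint logging.
SHOW log_checkpoints;
```

Scripts that still pass MEMORY_LIMIT to CREATE RESOURCE GROUP or ALTER RESOURCE GROUP may need to be updated before upgrading to 7.2.0.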

Resolved Issues

Server

378957
Resolves an issue where the server didn't clean up the subprogram launched for COPY when used with PROGRAM on transaction abort, which caused child programs to stay connected and potentially hang downstream programs such as gprestore with --on-error-continue.
382342
Resolves an issue where segment processes may crash when running ANALYZE on ao_column tables.
383031
VMware Greenplum now reports intermediate dtx protocol errors when debug_print_full_dtm is on.
380617
Resolves an issue where queries creating query executors on the coordinator host triggered authentication errors if PGHOST/PGHOSTADDR were set in coordinator host environment.
383615
Resolves an issue that causes index creation errors following data restoration.
35213005
Resolves an issue where the server crashes when adding mirrors by using gpinitsystem.
35298442
Resolves an issue where DELETE and non-split UPDATE operations on heap tables did not obey the wait_for_replication_threshold GUC.
N/A
Resolves an issue with the gp_toolkit.gp_move_orphaned_files function where files were not moved correctly on the segment host. This fix also allows the gp_check_orphaned_files view and function to run with idle sessions, making it more convenient to use.
N/A
Resolves an issue where, when BRIN indexes were involved, partial scans of append-optimized, column-oriented tables produced incorrect results if the scanned columns' data types differed.
N/A
Resolves an issue where tables using an enum distribution key could not be restored by gprestore.
N/A
Resolves an issue where, when restoring from VMware Greenplum 6 with triggers to VMware Greenplum 7, the triggers would cause pg_restore to fail.
N/A
Resolves an issue where setting the deadlock_timeout GUC only on the coordinator does not propagate the new value to all segments.
N/A
Resolves an issue where a memory leak occurs during the merging of leaf partition statistics in the ANALYZE process.
N/A
Resolves CVE-2024-0985: PostgreSQL non-owner REFRESH MATERIALIZED VIEW CONCURRENTLY executes arbitrary SQL.
N/A
Resolves an error that occurs when changing the access method to ao_column for a table with nine or more columns.
N/A
Resolves an issue where UPDATE statements against append-optimized tables with unique indexes could error out with "attempted to update invisible tuple".
N/A
Resolves an issue where gpstart failed to start a mirror when its primary was marked as down in the configuration.
N/A
Resolves an issue where pg_get_expr might generate incorrect syntax for heap tables.
N/A
Resolves an issue where subprograms inadvertently inherit the parent backend's file descriptors.

Query Processing

N/A
Resolves an issue where GPORCA fell back to the planner when a CTE contained an outer reference.
N/A
Resolves an issue where hash aggregates always spill when their expected memory exceeds work_mem, leading to poor performance.
N/A
Resolves an issue where Orca crashed when using the WITH ORDINALITY clause.

Cluster Management

377933
Resolves an issue where the gpcheckperf utility was reporting an error when time included a comma.
371805
Resolves an issue where gpssh fails in environments that require the TERM variable to be set.

Data Flow

N/A
Resolves an issue where VMware Greenplum erroneously reported errors when users accessed external tables whose LOCATION attribute included the | character as a delimiter for multiple location URIs.
N/A
Resolves an issue where the options column of the pg_exttable view was missing some information about log errors.

Release 7.1

Release 7.1.0

Release Date: 2024-02-09

VMware Greenplum 7.1.0 is a minor release that includes new and changed features and resolves several issues.

New and Changed Features

VMware Greenplum 7.1.0 includes these new and changed features:

  • The pgvector module was updated to version 0.5.1. Refer to pgvector for module and upgrade information.
  • The ip4r module was updated to version 2.4.2. See ip4r.
  • VMware Greenplum 7.1.0 introduces the tablefunc module, which provides various examples of functions that return tables.
  • VMware Greenplum includes a new extension, pg_buffercache, which gives users access to five views to obtain cluster-wide shared buffer metrics: gp_buffercache, gp_buffercache_summary, gp_buffercache_usage_counts, gp_buffercache_summary_aggregated, and gp_buffercache_usage_counts_aggregated.
  • VMware Greenplum 7.1.0 adds the gp_move_orphaned_files user-defined function (UDF) to the gp_toolkit administrative schema, which moves orphaned files found by the gp_check_orphaned_files view into a file system location that you specify.
  • The gp_check_orphaned_files view in the gp_toolkit schema contains a new column, filepath, which prints the relative/absolute path of the orphaned file.
  • The Greenplum package utility, gppkg, introduces a new option to specify the name of the package to migrate to another minor version of VMware Greenplum, instead of migrating all packages.
  • The gp_toolkit administrative schema now includes some objects to aid in partition maintenance: a new view -- gp_partitions, and several new user-defined functions, including: pg_partition_rank(), pg_partition_range_from(), pg_partition_range_to(), pg_partition_bound_value(), pg_partition_isdefault(), pg_partition_lowest_child(), and pg_partition_highest_child(). See The gp_toolkit Administrative Schema topic for details.
  • VMware Greenplum introduces a new utility -- pg_filedump -- which allows you to read formatted content of VMware Greenplum data files, including table, index and control files.
  • Query optimization has been fine-tuned to enhance performance for queries containing multiple DQA (Distinct Qualified Aggregate) and standard aggregates. This refinement leads to substantial I/O savings, resulting in improved processing speed. This optimization may not be applicable for certain specialized queries, such as scenarios in which there are multiple columns from different DQA sources within a standard aggregate, or when filters are present within the DQA.
  • The new gp_postmaster_address_family server configuration parameter tells a node which type of IP address to use when initializing a cluster.
  • Greenplum's Data Science Package for Python now includes the catboost library, a high-performance open source library for gradient boosting on decision trees.
  • VMware Greenplum now supports differential segment recovery when using input configuration files (gprecoverseg -i). In addition, you may now prepend an I, D, or F to an entry in the recover_config_file you pass to gprecoverseg -i to indicate the type of segment recovery.
  • EXPLAIN ANALYZE now shows buffer usage and I/O timings when using the BUFFERS keyword.
  • The gpstate utility now tracks data synchronization for a differential recovery with the -e option.
  • VMware Greenplum now supports the TABLESAMPLE clause for append-optimized tables, in addition to heap tables. Both BERNOULLI and SYSTEM sampling methods are now supported.
  • VMware Greenplum now supports the SYSTEM_ROWS and SYSTEM_TIME sampling methods for all tables, made available through the new tsm_system_rows and tsm_system_time modules, respectively.
  • The gppkg utility option -f now helps remove packages which have incomplete or missing files.
  • The PgBouncer connection pooler 1.21.0 is now distributed with VMware Greenplum 7.1.0, which includes support for encrypted LDAP passwords. Refer to Using the PgBouncer Connection Pooler for more details.
  • The new gprecoverseg option max-rate allows you to limit the maximum transfer bandwidth rate for a full segment recovery.
  • The gpmovemirrors utility has a new disk space check, so the utility will fail if the target host does not have enough space to accommodate the new mirrors.
  • Autovacuum now drops any orphaned temporary tables not dropped by the backends they were created on.
  • You may manually configure the location of your VMware Greenplum logs with the server configuration parameter log_directory. The gpsupport utility also supports collecting the logs from the directory set by this server configuration parameter.
  • The system view gp_stat_progress_dtx_recovery displays the progress of the Distributed Transaction (DTX) Recovery process, which may be useful to monitor the status of a coordinator recovery after a crash.
  • The new gp_autostats_lock_wait server configuration parameter allows you to control whether ANALYZE commands triggered by automatic statistics collection will block if they cannot acquire the table lock.
  • The new optimizer_enable_right_outer_join server configuration parameter allows you to control whether GPORCA generates right outer joins. In situations in which you are observing poor performance related to right outer joins you may choose to suppress their use.
  • VMware Greenplum 7.1 now supports the VMware Greenplum Virtual Appliance. The virtual machine appliance contains everything you need for easy deployment of VMware Greenplum on vSphere. See VMware Greenplum on vSphere for more details.
  • The PostgresML extension now includes the pgml.train and pgml.predict functions for supervised learning.
  • You may configure one or more hosts outside your Greenplum cluster to use as a remote container host for your PL/Container workload, reducing the computing overload of the Greenplum hosts. See Configuring a Remote PL/Container for more details.
  • You can now use resource groups to manage and limit the total CPU resources for a PL/Container runtime. See PL/Container Resource Management for more details.
  • You can now download a VMware Greenplum 7 PL/Container image for R from Broadcom Support Portal under the specific Greenplum release.
  • The VACUUM command now includes the SKIP_DATABASE_STATS and ONLY_DATABASE_STATS clauses.
  • The output of the pg_config command now includes the Greenplum version.
  • This release introduces a performance improvement that reduces per-tuple CPU and memory overhead for writes to tables having BRIN indexes.
  • Users may now change the checkpoint_timeout server configuration parameter value in order to control how frequently VMware Greenplum performs automatic checkpoints.
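
Several of the SQL-level features in the list above can be tried together. The sketch below uses a hypothetical append-optimized table to exercise the sampling methods, the BUFFERS option of EXPLAIN ANALYZE, and the new VACUUM clauses. It assumes the tsm_system_rows module is available for installation. Sample sizes are random, so no result counts are shown.

```sql
-- Hypothetical append-optimized table with some data.
CREATE TABLE events_ao (id int, payload text)
    USING ao_row
    DISTRIBUTED BY (id);

INSERT INTO events_ao
SELECT g, 'row ' || g FROM generate_series(1, 100000) AS g;

-- BERNOULLI and SYSTEM sampling on an append-optimized table (about 10%).
SELECT count(*) FROM events_ao TABLESAMPLE BERNOULLI (10);
SELECT count(*) FROM events_ao TABLESAMPLE SYSTEM (10);

-- SYSTEM_ROWS is provided by the tsm_system_rows module.
CREATE EXTENSION IF NOT EXISTS tsm_system_rows;
SELECT * FROM events_ao TABLESAMPLE SYSTEM_ROWS (100);

-- Show buffer usage alongside the executed plan.
EXPLAIN (ANALYZE, BUFFERS) SELECT count(*) FROM events_ao;

-- Skip the database-wide statistics update for a single-table vacuum,
-- then perform only that update in a separate step.
VACUUM (SKIP_DATABASE_STATS) events_ao;
VACUUM (ONLY_DATABASE_STATS);
```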

Resolved Issues

Server

16081
Resolves an issue where pg_ctl stop -m fast on the Greenplum master hung indefinitely.
33011
Resolves an issue where intermittent network issues caused the error "failed to acquire resources on one or more segments". Greenplum now retries gang creation by default for non-recovery failures.
358819
Resolves an issue where recursive CTE (Common Table Expression) errored out due to the creation of an incorrect or non-executable plan.
367725
Resolves an issue where plpython user-defined functions (UDFs) crashed when an interconnect error occurred.
15652
Resolves an issue where successive VACUUM commands on a table in which the data had not been modified could produce distorted pg_class.reltuples estimates.
N/A
Resolves an issue where the gp_toolkit extension was created under the public schema, while its objects were created under the gp_toolkit schema; as a result, users had to drop the public schema in order to drop the extension.
N/A
Resolves an issue where serializing tuples needed to read a field before setting its value.
N/A
Resolves an issue caused by moving memory context from an accounting node to another.
N/A
Resolves an issue where the ic-proxy process did not provide an error message when the peer listener failed. It now displays the error ERROR: SetupInterconnect: We are in IC_PROXY mode, but IC-Proxy Listener failed, please check.
N/A
Resolves an issue where changing the access method from append-optimized row-oriented to column-oriented could fail when the row-oriented table contained pg_attribute_encoding entries.
N/A
Resolves an issue that caused queries using append-optimized tables and bitmap indexes to run for a very long time with 100% CPU consumption.
N/A
Resolves an issue where append-optimized tables had an array type when changing the access method from heap to append-optimized. Also, when changing from append-optimized to heap, the array type is now automatically created.
N/A
Resolves an issue where a query hung at shareinput_writer_waitdone. The fix also includes a new server configuration parameter debug_shareinput_xslice, which allows you to print cross-slice input scan information to the server log.

Query Processing

369259
Resolves out-of-memory issues with the new gp_max_system_slices server configuration parameter. This parameter allows you to limit the maximum number of slices a query can use, preventing queries that might provoke memory issues from running in the first place.

Cluster Management

N/A
Resolves an issue where the gpstop utility errored out when there was an empty PID file in the gpstop.lock directory.

Data Flow

16564
Resolves an issue where gpfdist --help displayed incorrect information for ssl_verify_peer.
16769
Resolves an issue where SET DISTRIBUTED REPLICATED was enabled for the ALTER EXTERNAL TABLE command. This clause has now been disabled for the command.

Additional Supplied Modules and Extensions

15093
Resolves an issue where pg_stat_statements generated the message: WARNING: unrecognized node type.
N/A
Resolves an issue where Python 3.9 packages failed to upgrade using gppkg.
N/A
Resolves an issue where gppkg sync generated the error can not truncate directory due to a change in the directory file type.

Release 7.0

Release 7.0.0

Release Date: 2023-09-28

Key New Features

  • VMware Greenplum introduces substantial improvements to resource group-based resource management, such as support for Linux Control Groups v2, simplified memory management, and support for disk I/O limits per resource group. Refer to About Changes to Resource Groups for more information about what has changed in resource groups.

  • Index-only scans can answer queries from an index alone without accessing the table's heap, which significantly improves query performance. In addition, covering indexes allow you to add additional columns to an index using the INCLUDE clause, in order to make the use of index-only scans more effective. See Understanding Index-Only Scans and Covering Indexes for more details.

  • Unique indexes, unique constraints, and primary keys are now supported on append-optimized tables.

  • Fast ANALYZE improves the speed of ANALYZE for append-optimized tables. You do not need to enable fast ANALYZE; it is the default and only behavior when you analyze an append-optimized table.

  • VMware Greenplum no longer rewrites the table when a column is added to a table (ALTER TABLE ... ADD COLUMN ...).

  • PostgreSQL declarative table partitioning syntax is now supported. See About Changes to Table Partitioning in Greenplum 7 for more details.

  • Multi-column most-common-value (MCV) extended statistics compute the correlation ratio and number of distinct values to generate better plans for queries that test several non-uniformly-distributed columns. This feature is helpful in estimating query memory usage and when combining the statistics from individual columns. See CREATE STATISTICS for more details.

  • UPSERT operations turn INSERT operations that would violate constraints into an UPDATE, or ignore them. See INSERT for more details.

    Note

    This operation is available for heap tables only.

  • BRIN indexes (Block Range INdexes) use much less disk space than a standard B-tree index for very large tables whose columns have some natural correlation with their physical location within the table.

  • Row-level security allows database administrators to set security policies that filter which rows particular users are permitted to update or view. Refer to About Configuring Row-Level Security Policies for more information.

  • Just-in-Time (JIT) compilation allows you to compile otherwise interpreted queries into compiled code at run time, which provides a performance improvement for long-running, CPU-bound queries, such as analytical queries.

  • Hash indexes are supported with the Postgres-based planner and GPORCA.

  • The built-in Full Text Search functionality provides data types, functions, operators, index types, and configurations for querying natural language documents. You may also search for phrases (multiple adjacent words) that appear next to each other in a specific order, or with a specified distance between the words.

  • Progress reporting is now available during the execution of the commands ANALYZE, CLUSTER, CREATE INDEX, VACUUM, COPY, and BASE_BACKUP. See Monitoring Long-Running Operations for more details.

  • The SQL/JSON path language is now supported.

  • Summary Views aggregate, across the Greenplum cluster, the metrics reported by their corresponding gp_ views.

  • Automatic Vacuum is now enabled by default for all databases. It automatically performs VACUUM and ANALYZE operations against all catalog tables, and runs ANALYZE for all user tables in those databases.

  • Generated Columns are table columns whose content is computed from other expressions, including references to other columns in the same table, which removes the need to set their values with the INSERT or UPDATE commands.

  • Table Access Method (AM) allows you to dynamically alter the storage characteristics of an already populated table, as well as the storage options. See ALTER TABLE and CREATE ACCESS METHOD for more details.

  • The new schema object PROCEDURE allows you to store procedures with transaction management. The CREATE PROCEDURE command provides functionality to execute commands like COMMIT or ROLLBACK inside the procedural code.

  • VMware Greenplum Streaming Server (GPSS) version 1.10.1 is bundled, which includes support for VMware Greenplum 7, as well as changes and bug fixes. Refer to the Greenplum Streaming Server Documentation for more information about this release and for upgrade instructions.

  • Greenplum package utility, gppkg v2, allows you to install VMware Greenplum extensions even when the database is not running.

  • VMware Greenplum 7 introduces the pgvector module, which provides vector similarity search capabilities for VMware Greenplum that enable searching, storing, and querying machine learning-generated embeddings at large scale.
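
A few of the key features listed above can be combined in one small sketch: a generated column, an UPSERT on a heap table, a covering index, a BRIN index, a procedure with transaction control, and a SQL/JSON path query. All table, index, and procedure names are hypothetical, and the statements follow standard PostgreSQL syntax; this is an illustration under those assumptions, not a reference for Greenplum-specific restrictions.

```sql
-- Heap table (UPSERT is available for heap tables only).
CREATE TABLE inventory (
    sku        int PRIMARY KEY,
    qty        int NOT NULL,
    unit_price numeric(10,2) NOT NULL,
    -- Generated column: computed from other columns in the same row.
    total      numeric GENERATED ALWAYS AS (qty * unit_price) STORED
) DISTRIBUTED BY (sku);

-- UPSERT: insert a row, or add to the quantity if the sku already exists.
INSERT INTO inventory (sku, qty, unit_price)
VALUES (1, 5, 2.50)
ON CONFLICT (sku) DO UPDATE
    SET qty = inventory.qty + EXCLUDED.qty;

-- Covering index: INCLUDE adds a non-key column so that queries on qty
-- that also read unit_price may be answered by an index-only scan.
CREATE INDEX inventory_qty_idx ON inventory (qty) INCLUDE (unit_price);

-- BRIN index on a column assumed to correlate with physical row order.
CREATE INDEX inventory_sku_brin ON inventory USING brin (sku);

-- Procedure with transaction management inside the procedural code.
CREATE PROCEDURE restock(p_sku int, p_qty int)
LANGUAGE plpgsql
AS $$
BEGIN
    UPDATE inventory SET qty = qty + p_qty WHERE sku = p_sku;
    COMMIT;
END;
$$;

-- CALL must run outside an explicit transaction block for COMMIT to work.
CALL restock(1, 10);

-- SQL/JSON path: select array elements greater than 1.
SELECT jsonb_path_query('{"a": [1, 2, 3]}', '$.a[*] ? (@ > 1)');
```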

Changed Features

VMware Greenplum 7 includes the following changes. For a comprehensive look at the key changes between Greenplum 6 and Greenplum 7, see https://docs.vmware.com/en/VMware-Greenplum/7/greenplum-database/install_guide-changes-6-7-landing-page.html.

External Tables

Greenplum 7 internally converts external tables to foreign tables. Refer to About Changes to External Tables in Greenplum 7 for more information.

Database Utilities

Note

This release of gpsupport does not support log collection for VMware Greenplum Command Center. Thus, the -with-gpcc option to the gpsupport gp_log_collector tool is not supported in this release.

  • The gpfdist parallel file distribution utility now supports multi-threaded data compression and transmission.

  • The gpfdist parallel file distribution utility includes these SSL-related features and changes:

    • A new option named --ssl_verify_peer <boolean> that you can specify to enable or disable gpfdist SSL certificate authentication.
    • When the verify_gpfdists_cert server configuration parameter is set to off, gpfdist no longer requires that the certificate authority file be present on the VMware Greenplum segments.
  • The gpscp utility has been renamed gpsync and now takes a new -a option, which causes gpsync to sync source and target directories in archival mode.

  • The pg_resetxlog utility has been renamed to pg_resetwal.

  • The command pg_dump includes options to include or exclude leaf tables of partitioned tables.

  • The pg_tables system view no longer includes external tables in its output.

Additional Supplied Modules and Extensions

  • VMware Greenplum includes a new extension, postgresml, which provides several new user-defined functions that allow you to use tens of thousands of pre-trained open source AI/machine learning models in VMware Greenplum.

  • Functionality from the gp_array_agg module in VMware Greenplum 6.x is now directly included in the Greenplum 7.x catalog; you do not need to install a separate module. Also, with version 7.x you can use both anynonarray and anyarray as input types (compared to only anynonarray in Greenplum 6.x).

  • gp_toolkit is an extension now.

  • The PostGIS extension supports PostGIS version 3.3.

  • The parallel retrieve cursor functionality available in the gp_parallel_retrieve_cursor module in Greenplum 6 is now built-in. See Retrieving Query Results with a Parallel Retrieve Cursor.

  • The pgvector extension was updated to version 0.5.0, which adds a new hnsw index type, adds parallel index builds for the ivfflat index type, adds the l1_distance function and sum aggregate, and adds element-wise multiplication for vectors. This version also improves performance for distance operations, and more. See pgvector.
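
The pgvector 0.5.0 additions mentioned above can be sketched with the upstream pgvector API. The table name and vectors are hypothetical, and the extension name vector is taken from upstream pgvector; the sketch assumes the module is installed on the cluster.

```sql
CREATE EXTENSION IF NOT EXISTS vector;

-- Hypothetical table holding three-dimensional embeddings.
CREATE TABLE items (id int, embedding vector(3)) DISTRIBUTED BY (id);

INSERT INTO items VALUES
    (1, '[1,2,3]'),
    (2, '[4,5,6]'),
    (3, '[1,1,1]');

-- hnsw index type using Euclidean (L2) distance.
CREATE INDEX items_embedding_hnsw
    ON items USING hnsw (embedding vector_l2_ops);

-- Nearest neighbours by L2 distance.
SELECT id
FROM items
ORDER BY embedding <-> '[3,1,2]'::vector
LIMIT 2;

-- l1_distance function (taxicab distance).
SELECT id, l1_distance(embedding, '[1,1,1]'::vector) AS l1
FROM items;

-- Element-wise multiplication and the sum aggregate for vectors.
SELECT id, embedding * '[2,2,2]'::vector AS scaled FROM items;
SELECT sum(embedding) AS total FROM items;
```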

SQL Commands

  • The CREATE TABLE ... (LIKE ... INCLUDING <keyword> ) command now supports AM, ENCODING, and RELOPT keywords to copy the access method, column encoding, and/or relation options from the source/original table.

  • Order-agnostic aggregates can now be designated as safe for execution on replicated slices by specifying the REPSAFE = true parameter to the CREATE AGGREGATE command.

  • You can now use the CLUSTER command on append-optimized tables over B-tree indexes.

  • The SELECT SQL command now supports the SKIP LOCKED option.

  • The new command IMPORT FOREIGN SCHEMA provides support for importing a complete schema from an external database.

  • The ALTER <object> DEPENDS ON EXTENSION command allows a database object to be marked as depending on an extension. The object will be dropped automatically when the extension is dropped, without needing to specify CASCADE.

  • The ALTER DEFAULT PRIVILEGES command allows you to set and revoke default permissions on schemas.

  • You may drop multiple functions, operators, and aggregates with a single DROP command.

  • The command CREATE SEQUENCE AS allows you to create a sequence matching an integer data type. This simplifies the creation of sequences matching the range of base columns.

  • Some DDL commands now accept the current user (CURRENT_USER) or the session user (SESSION_USER) in place of a specific user name.

  • Specify FUNCTION instead of PROCEDURE in CREATE OPERATOR, because the referenced object must be a function and not a procedure. VMware Greenplum still accepts the old syntax for compatibility.

  • The commands CREATE SERVER, CREATE MATERIALIZED VIEW, CREATE USER MAPPING, and CREATE COLLATION now accept the IF NOT EXISTS clause.

  • The new commands ALTER ROUTINE and DROP ROUTINE allow you to alter or drop all routine-like objects, including procedures, functions, and aggregates.

  • The new command ALTER INDEX ATTACH PARTITION associates an existing index on a partition with a matching index template for its partitioned table.

  • The command ALTER INDEX can set statistics-gathering targets for expression indexes.

  • The command CREATE AGGREGATE now has the option OR REPLACE.

  • The command CREATE AGGREGATE now has the option FINALFUNC_MODIFY, which specifies the behavior of the aggregate's finalization function; this is helpful for optimizing user-defined aggregate functions and allowing them to be used as window functions.

  • The CREATE/ALTER USER ... PASSWORD commands no longer support the UNENCRYPTED option.

  • The command ALTER TABLE SET DISTRIBUTED BY may now be used for external tables. However, you must ensure that the contents of the external/foreign tables satisfy the DISTRIBUTED BY rules.

  • The CREATE TABLE command now supports specifying a table access method with the new USING <access method> clause.

  • The new command CREATE ACCESS METHOD allows you to create new table types. This enables the development of new table access methods, which can optimize storage for different use cases. The existing heap access method remains the default.

  • You can now dynamically update the access method for a table with the ALTER TABLE command, using the new clause SET ACCESS METHOD <access_method>.

  • You can now dynamically update the following storage parameters for a table using the ALTER TABLE command: appendoptimized, blocksize, orientation, compresstype, compresslevel, and checksum.

  • Common table expressions (CTE) now support automatic (but overridable) inlining. CTEs are automatically inlined if they have no side-effects, are not recursive, and are referenced only once in the query. You can prevent inlining by specifying MATERIALIZED, or force inlining for multiple-referenced CTEs by specifying NOT MATERIALIZED. In previous Greenplum releases, CTEs were never inlined and were always evaluated before the rest of the query.
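
The CTE inlining rule described above can be sketched as a small decision function. This is only an illustration of the stated rule, not Greenplum's planner code: the Cte class and its field names are hypothetical, and the choice to let recursion or side effects override NOT MATERIALIZED is an assumption carried over from PostgreSQL's documented behavior.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Cte:
    """Hypothetical description of one WITH-clause entry (not a Greenplum structure)."""
    has_side_effects: bool      # for example, a data-modifying statement
    is_recursive: bool          # declared with WITH RECURSIVE
    reference_count: int        # how often the parent query refers to it
    hint: Optional[str] = None  # "MATERIALIZED", "NOT MATERIALIZED", or None


def is_inlined(cte: Cte) -> bool:
    """Return True if the CTE would be folded into the parent query."""
    if cte.hint == "MATERIALIZED":
        return False  # explicit request to evaluate the CTE separately
    if cte.has_side_effects or cte.is_recursive:
        return False  # assumption: these are never inlined, whatever the hint
    if cte.hint == "NOT MATERIALIZED":
        return True   # forces inlining even when referenced several times
    return cte.reference_count == 1  # automatic rule: single reference only


if __name__ == "__main__":
    print(is_inlined(Cte(False, False, 1)))                      # single reference
    print(is_inlined(Cte(False, False, 2)))                      # two references
    print(is_inlined(Cte(False, False, 2, "NOT MATERIALIZED")))  # forced inlining
```

In practice, the hint matters mostly for CTEs referenced more than once, where the default is to evaluate the CTE a single time and reuse the result.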

Database Maintenance
  • VACUUM operations now clean up any dead ranges from BRIN indexes on append-optimized tables.

  • VACUUM can now use the table's visibility map to identify pages that contain only already-frozen tuples and skip them, reducing the cost of maintaining large tables that contain mostly unchanging data. The new VACUUM parameter DISABLE_PAGE_SKIPPING forces VACUUM to run against all pages, including frozen ones, in case the contents of the visibility map are suspect, which should happen only if a hardware or software issue is causing database corruption.

  • The new option SKIP_LOCKED allows VACUUM and ANALYZE to skip relations that cannot be locked immediately due to conflicting locks.

  • The new option INDEX_CLEANUP allows VACUUM to skip index cleanup. Setting the option to false will make VACUUM run as quickly as possible, for example, to avoid imminent transaction ID wraparound.

  • VACUUM can now avoid unnecessary heap table truncation attempts that require taking an exclusive table lock even when no truncation is possible. This enhancement avoids unnecessary query cancellations on the standby servers.

  • The new configuration parameter vacuum_cleanup_index_scale_factor helps minimize unnecessary index scans during VACUUM.

  • The new table and partition storage parameter vacuum_index_cleanup lets you control whether, for a given table or partition, VACUUM attempts to remove index entries pointing to dead tuples.
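
As a rough illustration of the page-skipping behavior described above, the following toy model treats the visibility map as a list of per-page "all frozen" flags. The function name and data layout are hypothetical, and the real VACUUM applies further heuristics that are not modeled here.

```python
from typing import List


def pages_to_scan(all_frozen: List[bool], disable_page_skipping: bool = False) -> List[int]:
    """Return the page numbers a simplified VACUUM would visit.

    all_frozen[i] is True when the (hypothetical) visibility map marks
    page i as containing only frozen tuples.
    """
    if disable_page_skipping:
        # DISABLE_PAGE_SKIPPING: do not trust the map, visit every page.
        return list(range(len(all_frozen)))
    # Default: skip pages that the map marks as all frozen.
    return [page for page, frozen in enumerate(all_frozen) if not frozen]


if __name__ == "__main__":
    visibility_map = [True, True, False, True, False]
    print(pages_to_scan(visibility_map))
    print(pages_to_scan(visibility_map, disable_page_skipping=True))
```

For a table whose data rarely changes, most flags are set, so the default mode visits only a small fraction of the pages.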

System Catalog Tables, Views, and Functions
  • Summary views aggregate statistics across the Greenplum cluster, displaying the metrics reported by their corresponding gp_ views.

  • New progress reporting system views support Monitoring Long-Running Operations during the execution of the commands ANALYZE, CLUSTER, CREATE INDEX, VACUUM, COPY, and BASE_BACKUP.

  • The new function gp_toolkit.get_column_size(oid) and the new views gp_toolkit.gp_column_size and gp_toolkit.gp_column_size_summary allow you to view column size and compression ratio for a given AO/AOCO table.

  • The pg_stat_* and pg_statio_* system views now provide information for append-optimized tables and their auxiliary tables.

  • The new catalog function pg_stat_get_backend_subxact() allows you to check all the sub-transactions in a specified backend.

  • The new catalog function gp_get_suboverflowed_backends() allows you to check all backends with overflowed sub-transactions.

  • VMware Greenplum now includes a system catalog table called pg_sequence, which contains information about sequences. Note that some information about sequences, such as the name and the schema, is stored in the pg_class system table.

  • The new gp_ system views are cluster-wide views that display from every primary segment the information reported by its corresponding pg_ system view.

  • The new catalog views pg_stat_wal and pg_stat_slru display WAL information and track simple least-recently-used (SLRU) caches.

  • The pg_backend_memory_contexts system view and supporting administration functions report memory contexts and usage for arbitrary backends. See Viewing and Logging Per-Process Memory Usage Information for more information.

  • VMware Greenplum 7 removes the following system catalog tables and views:

    • pg_partition_columns
    • pg_partition_encoding
    • pg_partition_rule
    • pg_partition_template
    • pg_stat_partition_operations
  • In the gp_configuration_history catalog table, the desc column has been renamed to description.

  • gp_read_error_log() is enhanced to detect division by zero, JSON mapping, and unsupported Unicode errors encountered during foreign scans of external tables. This feature is not supported for a single-segment VMware Greenplum cluster when the Greenplum Query Optimizer (GPORCA) is enabled.

  • The functions array_position() and array_positions() are now included.

  • Window functions now support all framing options defined in the SQL:2011 standard, including RANGE distance PRECEDING/FOLLOWING, GROUPS mode, and frame exclusion options.

  • The new function gp_toolkit.__gp_aoblkdir(regclass) helps you obtain each block directory entry for a given AO/AOCO table that had or has an index.

  • The pattern-matching behavior of the substring() function has changed. In cases where the pattern can be matched in more than one way, the initial sub-pattern is now treated as matching the least possible amount of text rather than the greatest. For example, a pattern such as %#"aa*#"% now selects the first group of a's from the input, not the last group.
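
The substring() change described above can be approximated with Python's re module. This is an emulation only: it hand-translates the pattern %#"aa*#"% into a regular expression (% becomes .* or .*?), it does not use Greenplum's regular expression engine, and the helper name and sample input are hypothetical.

```python
import re
from typing import Optional


def extract(text: str, lazy_edges: bool) -> Optional[str]:
    """Emulate the pattern %#"aa*#"% against text.

    lazy_edges=True  models the new rule: the parts outside the #" markers
                     match as little text as possible.
    lazy_edges=False models the earlier greedy treatment of those parts.
    """
    edge = ".*?" if lazy_edges else ".*"
    match = re.fullmatch(f"{edge}(aa*){edge}", text, flags=re.DOTALL)
    return match.group(1) if match else None


if __name__ == "__main__":
    sample = "xaayaaaz"  # two groups of a's: "aa" and "aaa"
    print(extract(sample, lazy_edges=True))   # aa (the first group)
    print(extract(sample, lazy_edges=False))  # a  (taken from the last group)
```

With Python's backtracking engine, the greedy form captures only the final a of the last group; the point of the comparison is that the lazy form anchors the result to the first group.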

Greenplum Query Optimizer (GPORCA)
  • The Greenplum Query Optimizer (GPORCA) supports index-only scans on append-optimized and append-optimized, column-oriented tables.

  • GPORCA supports backwards index scans.

  • GPORCA adds support for new server configuration parameters optimizer_enable_dynamicindexonlyscan and optimizer_enable_push_join_below_union_all.

  • GPORCA does not support planning or executing queries on multi-level partitioned tables.

  • GPORCA partially supports index-only scans and covering indexes. Refer to the GPORCA Limitations Unsupported SQL Query Features topic for a list of unsupported features in this area.

  • GPORCA now supports Dynamic Partition Elimination (DPE) for right joins.

  • GPORCA now supports planning queries that involve foreign tables. Queries on foreign tables and queries on partitioned tables that include a foreign table or external table leaf partition can now be planned by GPORCA.

  • GPORCA now supports queries that use the CUBE grouping set.

  • GPORCA now supports planning and running queries that you specify with multiple grouping sets.

Append-Optimized Tables
  • Altering a column type for AO/CO tables requires only rewriting the column files for the specified column instead of the whole table.

  • VACUUM can now run against all auxiliary tables of an append-optimized table with the option AO_AUX_ONLY.

  • The catalog table pg_attribute_encoding now includes a new column filenum that helps improve efficiency when altering column type for AO/CO tables.

  • You may now fetch a subset of columns when using the command COPY TO from an AOCO table.

  • The table pg_appendonly no longer records append-only storage options; they are now listed only under pg_class.reloptions, which significantly reduces the size of the pg_appendonly catalog table.

  • You may now dynamically update an AOCO table's column encodings, using the ALTER TABLE command.

  • You may now alter a heap table with a unique index to an append-optimized table with a unique index.

Performance
  • When the encoding of a table column changes (ALTER TABLE ... ALTER COLUMN ... SET ENCODING), VMware Greenplum rewrites only the column data; it no longer rewrites the whole table.

  • VMware Greenplum optimizes the use of snapshots when using immutable functions. It avoids taking a distributed snapshot and uses the local snapshot, resulting in improved performance for OLTP.

  • The sorting speed of varchar, text, and numeric fields via "abbreviated" keys has been improved.

  • Greenplum partitions the shared hash table freelist to reduce contention on multi-CPU-socket servers.

  • Greenplum now uses atomic operations, rather than a spinlock, to protect an LWLock's wait queue, which improves performance.

  • Greenplum reduces the WAL overhead when building a GiST, GIN, or SP-GiST index; less space on disk is now required for these WAL records and the data replays faster during crash recovery or point-in-time recovery.

  • You may optionally use the ICU library for collation support.

  • Partitioned tables now support indexes.

  • The default size of the sequence cache is changed from 1 (no cache) to 20 to increase the performance of insert operations on tables that are defined with a serial data type using a sequence value.
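
The effect of the larger default sequence cache can be estimated with simple arithmetic. The sketch below assumes a single session that reserves a full block of cache values each time it runs out, and that reserved but unused values are discarded when the session ends (the usual PostgreSQL sequence behavior, assumed here to carry over); the function names are hypothetical.

```python
def sequence_fetches(rows: int, cache: int) -> int:
    """Number of times a session must reserve values to insert `rows` rows."""
    if rows < 0 or cache < 1:
        raise ValueError("rows must be >= 0 and cache must be >= 1")
    return -(-rows // cache)  # integer ceiling of rows / cache


def discarded_values(rows: int, cache: int) -> int:
    """Reserved values left unused when the session ends (gaps in the sequence)."""
    return sequence_fetches(rows, cache) * cache - rows


if __name__ == "__main__":
    for cache in (1, 20):
        print(cache, sequence_fetches(1005, cache), discarded_values(1005, cache))
```

For 1005 inserted rows, a cache of 1 needs 1005 reservations and leaves no gap, while a cache of 20 needs 51 reservations and leaves 15 unused values. The trade-off is fewer reservations in exchange for possible gaps in the generated values.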

Removed Features

VMware Greenplum 7.0 removes these features:

  • The previously-deprecated createlang and droplang utilities.
  • The Greenplum R Client (GreenplumR).
  • Greenplum MapReduce.
  • The PL/Container 3.0 Beta extension.
  • The analyzedb option --skip_root_stats.
  • The gpsys1 utility.
  • The gpperfmon data collection agents, database, and views.
  • The ARRAY_NAME variable.
  • The CREATEUSER/NOCREATEUSER options from CREATE ROLE and allied commands.
  • The gp_percentile_agg extension.
  • Support for the QuickLZ compression algorithm. Use the new gp_quicklz_fallback server configuration parameter to ensure backward compatibility.

Deprecated Features

VMware Greenplum 7 deprecates the following features:

  • The gpreload utility is replaced by ALTER TABLE ... REPACK BY.

Known Issues and Limitations

VMware Greenplum 7.x has these limitations:

  • You currently cannot upgrade from a previous major version of Greenplum to Greenplum 7.

  • The UPSERT operation is not supported for append-optimized (AO/CO) tables.

  • You cannot use resource groups to manage and limit the total CPU and memory resources for a PL/Container runtime. Container instances are limited only by system resources, and the containers may consume resources at the expense of the VMware Greenplum server. Future releases of Greenplum 7 may restore functionality to manage PL/Container resources using resource groups.

  • Some VMware Greenplum utilities and extensions are not yet included or supported with this release, including: GreenplumPython, Spark Connector, Apache NiFi Connector, GemFire Connector, Command Center and metrics collector, WLM, cluster recovery, Greenplum upgrade, and Greenplum cloud offerings. Linked documentation may still refer to utilities and extensions that are not yet included in this release.

  • When using the VMware Greenplum PostGIS extension, converting XML-like structures such as GML, KML, or MARC21 to a Geometry type can trigger the following error, even with valid inputs: "invalid KML/GML/MARC21 representation". As a workaround, use one of the two libxml2 versions listed below and then restart the database:

    • For EL8: libxml2-2.9.7-16.el8_8.1.x86_64
    • For EL9: libxml2-2.9.13-3.el9_2.1.x86_64