This document contains release information for Tanzu Greenplum Text 3.x.
Tanzu Greenplum Text runs on Red Hat Enterprise Linux 5.2, 6.x, and 7.x.
Tanzu Greenplum Text runs with Greenplum Database version 4.3.6 or higher, Greenplum Database 5, or Greenplum Database 6.
Release Date: December 17, 2021
Tanzu Greenplum Text 3.8.1 is a maintenance release that introduces a change and resolves issues.
Tanzu Greenplum Text 3.8.1 bundles version 2.16.0 of the log4j2 library to mitigate CVE-2021-44228 and CVE-2021-45046.
Upgraded the log4j2 library to version 2.16.0.
Release Date: November 17, 2021
Tanzu Greenplum Text 3.8.0 is a minor release that introduces new and changed features and resolves issues.
Tanzu Greenplum Text can use a temporary directory that you specify when it extracts or processes intermediate files during install, deploy, upgrade, downgrade, expand, and recover operations. You can set the -t <temp-dir> option on these commands when disk space or permissions issues prevent the use of /tmp for this purpose.
Tanzu Greenplum Text now supports highlighting keywords in the results of range and wildcard searches.
Tanzu Greenplum Text no longer requires gptext.enable_terms() to highlight a field. Greenplum Text can now highlight fields that meet any of the following configuration conditions:
stored=true
storeOffsetsWithPositions=true and termVectors=true
termVectors=true, termPositions=true, and termOffsets=true. This configuration is equivalent to invoking gptext.enable_terms() on the field.
Tanzu Greenplum Text introduces a new disk-efficient highlighting function, gptext.highlight_instant_content(). You can use this function to create an index without highlighting, and only highlight search results.
Tanzu Greenplum Text can now index documents that reside on S3-compatible storage such as MinIO.
Resolved an issue where Tanzu Greenplum Text returned the error value "nnn" is out of range for type integer when it tried to create an index and the oid of the source table was larger than the int4 maximum.
Release Date: June 22, 2021
Tanzu Greenplum Text 3.7.0 is a minor release that introduces new features, changes features, and resolves issues.
Tanzu Greenplum Text 3.7.0 introduces the ability to downgrade the Tanzu Greenplum Text version via the installer. For details, see Downgrading to Version 3.6.0. In addition, it introduces a new utility, gptext-downgrade, that continues the downgrade process after errors are resolved. For details, see the gptext-downgrade reference page.
The gptext-state utility can now display the status of all Solr nodes, with the new nodes parameter. For details, see the gptext-state reference page.
The gptext-state utility now returns two new statistics:
docs_capacity: reports how close a user is to reaching the maximum number of stored documents in an index
max_supported_docs: reports the maximum number of documents a user can store in an index
For details, see the gptext-state reference page.
The new gptext-auth status command reports whether password authentication is enabled. For details, see the gptext-auth reference page.
The gptext-upgrade utility no longer includes the -f | --file parameter because Tanzu Greenplum Text does not rely on the user to provide the upgrade file location.
(176715390) Resolved an issue where the gptext-auth change-password command was prompting for password information that had already been provided.
(174164033) Resolved an issue where Tanzu Greenplum Text replica recovery could result in data loss.
(31047) Resolved an issue where gptext-recover -f | --force could not recover a Solr node when GPTEXT_CUSTOM_CONFIG_DIR was defined during installation.
GPText 3.6.0 is a minor release.
GPText 3.6.0 introduces user password authentication for the SolrCloud User Interface (UI). Users can now enable password authentication during a new installation, or use the new gptext-auth utility to enable authentication for an existing cluster. For details, see Enabling User Authentication.
The gptext-rebalance utility now includes the leader option that rebalances the leaders across the shards of an index. For details, see Rebalancing Replicas and Replica Leaders.
During the GPText installation process users can now set the SolrCloud nodes' timezone. The timezone setting is reflected in the Solr log timestamps. For details, see Set Installation Parameters in the installation guide.
GPText 3.6.0 enables you to start or stop one single node or a group of SolrCloud cluster nodes. For details, see Starting or Stopping SolrCloud Nodes.
This release includes enhancements that enable future releases to downgrade to GPText version 3.6.0.
Set formdataUploadLimitInKB to 10240000 KB to avoid hitting limits when transferring records between nodes.
GPText 3.5.0 is a minor release.
This release introduces the gptext-shard utility, which enables users to increase or delete a group of shards in an index. The number of shards in the group depends on the number of segments in the cluster. You can apply multiple shard increases, up to a recommended limit of 500 total shards per cluster. For more information, see Altering shard number per index.
Improves the efficiency of fetching field values, and significantly reduces the response time of Solr.
In configurations with the compositeId router, the GPText query option distrib.singlePass now defaults to true. This option reduces data transfer times when querying large amounts of data across the Solr nodes.
GPText 3.4.5 is a maintenance release. It contains the following fixes and changes:
Fixed an issue where a hostname such as mdw.prod.vmware.com would be trimmed to mdw. This caused the GPText installation to fail in environments where the DNS server expected hostnames with the Fully Qualified Domain Name (FQDN).
GPText 3.4.4 is a maintenance release. It contains the following fixes and changes:
A performance problem was introduced when the Solr default router was changed to compositeId in GPText 3.1.0. To address this problem, GPText 3.4.4 restores the Solr default router to implicit. To use the compositeId router, you must specify a non-zero value for gptext.idx_num_shards.
timestamp without timezone and timestamp with timezone.
GPText 3.4.3 is a maintenance release. It contains the following fixes:
The gptext-rebalance node command, a Beta feature, is now deprecated. Use the gptext-rebalance index command for similar results.
GPText now by default keeps up to 100 Solr logs on each Solr node.
Fixed an issue with upgrading GPText.
GPText 3.4.2 is a maintenance release. It contains the following fixes.
Support for Greenplum Database 6.5.0 and later. GPText 3.4.2 is not compatible with Greenplum Database versions earlier than 6.5.0 due to an ABI compatibility issue.
Fixes that allow GPText to support symbolic links.
GPText 3.4.1 is a maintenance release. It contains fixes for these issues.
gptext-migrator failed to install the GPText shared library after upgrading Greenplum Database.
The gptext-state utility could take a long time to return results if a replica was in recovery mode. This issue is resolved.
Solr now uses log4j 2.11. The log4j configuration file name is now log4j2.xml instead of log4j.properties.
The GPText installer will display an error message and exit if the operating system version and Greenplum Database version are not compatible with the installer version.
Setting the value of the GPTXTHOME environment variable to a symlink to the GPText installation directory caused GPText to fail in some cases. Upgrading GPText failed because the GPTXTHOME environment variable was inconsistent with the value in the gptxtenvs.conf file. This issue has been fixed. GPText now supports creating a symbolic link to the installation directory.
Upgraded Apache Solr to version 7.4.0.
(Beta) The new gptext-rebalance node command rebalances the GPText cluster by relocating replicas to new nodes. Use this command after you add hosts or nodes to the cluster with gptext-expand. See gptext-rebalance in the Utility Reference for help using this utility. Note: gptext-rebalance node may fail because of a Solr bug. See Known Issues. You can use the gptext-rebalance index command to work around this issue.
The gptext-expand command has a new -b (--binary-only) option that copies only the GPText installation directories to the new hosts without starting Solr nodes on the new hosts. Using this option allows you to verify GPText operation with the expanded cluster. The -b option can only be used with the -H option, and the new hosts must be Greenplum Database hosts.
The gptext-start and gptext-stop utilities now check for zombie Solr processes and report them if found. The gptext-stop utility verifies that all Solr processes are closed.
Improved ZooKeeper stability.
When indexing external documents in a directory using gptext.index_external_dir(), if one document failed to be added to the index, other documents in the same directory could fail. This is fixed. GPText now sends a request to Solr to get the files to be indexed and then indexes them individually with the gptext.index_external() function.
A GPText query could time out when ZooKeeper was under heavy load, leaving the ZooKeeper connection handle in an invalid state in the Greenplum Database session. The query would fail with an error message invalid zhandle state, and it was necessary to start a new session to continue using GPText. Now, after a ZooKeeper timeout, GPText retries the query ten times in the following five minutes before the query fails. The retry attempts are not visible to users, but they are logged. See also "Improved ZooKeeper stability" in the New Features and Enhancements section.
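The retry behavior described above can be pictured with a minimal sketch. This is not GPText's implementation: the function name run_with_retries, the ZooKeeperTimeout exception, and the even spacing of the ten retries across the five-minute window are all assumptions made for illustration.

```python
import logging
import time

log = logging.getLogger("gptext_retry_sketch")


class ZooKeeperTimeout(Exception):
    """Hypothetical stand-in for a ZooKeeper timeout error."""


def run_with_retries(query, retries=10, window_seconds=300, sleep=time.sleep):
    """Run query(); after a timeout, retry up to `retries` times.

    Assumption: retries are spaced evenly over the window. The release
    notes only state ten retries in the following five minutes.
    """
    delay = window_seconds / retries
    try:
        return query()
    except ZooKeeperTimeout as first_error:
        last_error = first_error
    for attempt in range(1, retries + 1):
        sleep(delay)
        # Retries are logged rather than shown to the caller.
        log.warning("retry %d of %d after ZooKeeper timeout", attempt, retries)
        try:
            return query()
        except ZooKeeperTimeout as err:
            last_error = err
    # All retries are exhausted, so the query finally fails.
    raise last_error
```

The sleep parameter is injectable only so the sketch can be exercised without waiting five minutes.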
With a very large index, the number of documents could exceed the maximum value of the integer data type, causing the gptext.index_size() function to return an "integer out of range" error. This has been fixed. The function now returns a bigint type.
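To make the limits concrete, here is a small sketch of the range check involved. The helper name fits and the sample document count are hypothetical; the limits are the standard ranges of 32-bit and 64-bit signed integers.

```python
INT4_MAX = 2**31 - 1  # 2147483647, largest 32-bit signed integer
INT8_MAX = 2**63 - 1  # largest 64-bit signed integer (bigint)


def fits(value, max_value):
    """Return True if value fits in a signed integer type with this maximum."""
    return -max_value - 1 <= value <= max_value


doc_count = 3_000_000_000  # hypothetical count for a very large index
print(fits(doc_count, INT4_MAX))  # False
print(fits(doc_count, INT8_MAX))  # True
```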
GPText 3.3.1 can be installed on a Greenplum Database 6 system with Java 8.
A GPText binary distribution has been added to Tanzu Network for Red Hat 7/CentOS 7 with Greenplum Database 6.
Note: The "Greenplum Text 3.3.1 for RHEL 7" distribution is for Greenplum Database 6.x only. Download the RHEL 6 distribution if you are installing GPText into a Greenplum Database 5.x system.
Using GPText with Greenplum Database 6 differs from using it with earlier Greenplum Database releases in the following ways:
The custom_variable_classes server configuration parameter has been removed in Greenplum Database 6. With earlier Greenplum Database versions, it was necessary to add 'gptext' to this parameter in order to set GPText configuration parameters. Greenplum Database 6 allows you to set configuration parameters in a database session without declaring a variable class.
In Greenplum Database 4 and 5, the default output format for the binary data type bytea is the PostgreSQL escape format, a sequence of ASCII characters with escape sequences where bytes cannot be represented with ASCII. In Greenplum Database 6, the default output format is the hex format, which represents each byte with hexadecimal digits. In Greenplum Database 5, the hex output format can be specified by setting the bytea_output configuration parameter to hex. To produce the same output in Greenplum Database 4, 5, and 6, you can set the bytea_output configuration parameter to escape.
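The difference between the two bytea output formats can be sketched as follows. The functions bytea_hex and bytea_escape are hypothetical helpers written for illustration, not part of GPText or Greenplum Database; they follow the PostgreSQL output rules, where the escape format prints printable ASCII as-is, doubles the backslash, and writes other bytes as three-digit octal escapes.

```python
def bytea_hex(data: bytes) -> str:
    """Render bytes the way the hex output format does: \\x plus hex digits."""
    return "\\x" + data.hex()


def bytea_escape(data: bytes) -> str:
    """Render bytes the way the escape output format does."""
    out = []
    for b in data:
        if b == 0x5C:              # backslash is doubled
            out.append("\\\\")
        elif 32 <= b <= 126:       # printable ASCII is kept as-is
            out.append(chr(b))
        else:                      # everything else becomes \nnn (octal)
            out.append("\\%03o" % b)
    return "".join(out)


sample = b"ab\x00\\"
print(bytea_hex(sample))     # \x6162005c
print(bytea_escape(sample))  # ab\000\\
```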
A new optional installation parameter, GPTEXT_CUSTOM_CONFIG_DIR, can be set in the gptext_install_config file to specify a directory to store custom configuration files.
By default, GPText saves custom configuration files under the $GPTEXTHOME/share/
directory on each Solr host, for example $GPTEXTHOME/share/external_
.
To specify a different directory to store external configuration files, before you run the GPText installer, uncomment the GPTEXT_CUSTOM_CONFIG_DIR parameter in the gptext_install_config file and specify the full path to the directory. For example:
GPTEXT_CUSTOM_CONFIG_DIR="/home/gpadmin/config_dir"
The gpadmin user must have the OS permissions required to create the directory.
If the parameter is set, the GPText installer will create the custom configuration directory on every Solr host. Configuration files you upload using the gptext-external upload command will be stored under this directory on every Solr host to allow Solr to access the external document source from every host. For example, if the GPTEXT_CUSTOM_CONFIG_DIR parameter is set to /home/gpadmin/config_dir when you install GPText, an s3 configuration with the name s3_conf will be saved in the directory /home/gpadmin/config_dir/external_source/s3/s3_conf on each host.
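The path in the example above can be reproduced with a short sketch. The function external_config_path is hypothetical, and the layout it assumes (custom directory, then external_source, then source type, then configuration name) is generalized from the single s3 example in these notes.

```python
import posixpath


def external_config_path(custom_config_dir, source_type, config_name):
    """Build the storage path for an uploaded external configuration.

    Assumption: the layout follows the s3 example in the release notes.
    """
    return posixpath.join(
        custom_config_dir, "external_source", source_type, config_name
    )


print(external_config_path("/home/gpadmin/config_dir", "s3", "s3_conf"))
# /home/gpadmin/config_dir/external_source/s3/s3_conf
```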
GPText 3.2.0 enables lemmatizing terms in GPText indexes. You can define Solr analysis chains that include the Apache OpenNLP parts-of-speech filter and the new GPText WordNetLemmatizer filter, which replaces terms with the root form of the term. The WordNetLemmatizer filter uses a lexical database from the Princeton University WordNet® project to determine the root form.
GPText now saves configuration files gptext.conf, gptxtenvs.conf, and zookeeper.conf only in the Greenplum Database master and standby master directories. The gptext.conf file is no longer saved in each segment data directory.
By default, GPText creates one Solr index shard for each Greenplum Database primary segment. You can now specify a smaller number of shards by setting the gptext.idx_num_shards parameter to the number of shards you want before you create the index. This works for both regular GPText indexes and external indexes.
In GPText 3.2.0, when gptext.idx_num_shards is set to the default (0), GPText configures the index to use the Solr implicit router, with one shard per Greenplum Database segment. When the gptext.idx_num_shards parameter is changed to the number of shards desired, GPText creates the index using the Solr compositeId router to route documents to shards. The compositeId router does not support duplicate IDs, so if you set the if_check_id_uniqueness argument to false when you call the gptext.create_index() function, the implicit router is used and the index will have one shard per Greenplum Database segment. Note: The Solr default router is restored to implicit in GPText versions 3.4.4 and higher to address performance issues.
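The router selection rules described for GPText 3.2.0 can be summarized in a small decision function. This is only a sketch of the documented behavior, not GPText code; the function name choose_router and its return shape are hypothetical.

```python
def choose_router(idx_num_shards=0, if_check_id_uniqueness=True):
    """Return (router, shard_count) following the GPText 3.2.0 description.

    A shard_count of None stands for one shard per Greenplum Database segment.
    """
    # Default shard setting, or duplicate IDs allowed: implicit router.
    if idx_num_shards == 0 or not if_check_id_uniqueness:
        return ("implicit", None)
    # An explicit shard count with unique IDs: compositeId router.
    return ("compositeId", idx_num_shards)


print(choose_router())                                  # ('implicit', None)
print(choose_router(idx_num_shards=4))                  # ('compositeId', 4)
print(choose_router(4, if_check_id_uniqueness=False))   # ('implicit', None)
```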
The content_id column is removed from the output of the gptext.index_status() and gptext.index_summary() functions, since Greenplum Database segments are not always associated with a single index shard.
See Specifying the Number of Shards for more information about this feature.
When using the -f (--force) option, the gptext-recover utility now verifies that there are no indexes in a red state before proceeding. If any index is down, the utility exits.
Apache ZooKeeper included with GPText 3.2.0 has been upgraded to version 3.4.11. This ZooKeeper release includes bug fixes that resolve an inconsistent cluster issue with GPText (MPP-29742).
The new gptext.list_field_types() function lists the field types defined in the managed-schema configuration file for an index.
The new gptext.get_field_type() function displays the index and query analyzer chains for a field type in JSON format.
The new gptext.analyzer() function shows the index or query analyzer chain output for a given field type and input text. This function is useful for testing and debugging analyzer chains interactively without modifying the index.
GPText includes OpenNLP libraries and analyzer classes to classify indexed terms' parts-of-speech (POS), and to recognize named entities, such as the names of persons, locations, and organizations (NER). GPText saves NER terms in the field's terms vector, prepended with a code to identify the type of entity recognized. This allows searching documents by entity type.
The new gptext.ner_terms() function lists NER-tagged terms for documents that match a query.
GPText includes the OpenNLP models for the English language. You can download models for other languages from the OpenNLP web site and use them with GPText.
The first argument of the gptext.terms() function, an anytable data type, has been made optional.
Fixed an error where the gptext.partition_status() function displayed partition information for an index after it was dropped.
GPText 3.1.0 includes Apache Solr 7.3. See the following release documents for information about the Solr 7.3 release.
Following are GPText changes and Solr usage notes related to the Solr 7.3 upgrade.
GPText server-side components are rebuilt and tested with the new Solr JAR files.
The managed-schema, solrconfig.xml, and other collection configuration files are updated.
The top-level <highlighting> element in solrconfig.xml is now officially deprecated in favor of the equivalent <searchComponent> syntax. This element has been out of use in default Solr installations for several releases already.
The legacyCloud parameter now defaults to false. If an entry for a replica does not exist in state.json, that replica will not be registered. This may affect users who bring up replicas and expect them to be registered automatically as part of a shard. It is possible to revert to the old behavior by setting the property legacyCloud=true in the cluster properties by running the following command in the GPText installation directory:
$ ./server/scripts/cloud-scripts/zkcli.sh -zkhost 127.0.0.1:2181 -cmd clusterprop -name legacyCloud -val true
With earlier Solr releases, if you drop an index while a Solr node with a replica of the index is down, when the down node comes back on-line, the index comes back and cannot be deleted. Solr 7 fixes this bug. The GPText workaround for this bug is removed.
PointFields are default numeric types. Solr has implemented *PointField types across the board, to replace Trie* based numeric fields. All Trie* fields are now considered deprecated, and will be removed in Solr 8. If you are using Trie* fields in your schema, you should consider moving to PointFields as soon as feasible. Changing to the new PointField types will require you to re-index your data.
The following spatial-related fields have been deprecated:
LatLonType
GeoHashField
FieldType
SpatialTermQueryPrefixTreeFieldType
Use one of these field types instead:
LatLonPointSpatialField
SpatialRecursivePrefixTreeField
RptWithGeometrySpatialField
To improve parameter consistency in the Collections API, the parameter names fromNode for the MOVEREPLICA command and source and target for the REPLACENODE command have been deprecated and replaced with sourceNode and targetNode. The old names will continue to work for backwards compatibility, but they will be removed in Solr 8.
The replica core name has changed from <collection_name>_shard#_replica# to <collection_name>_shard#_replica_<node_type>#. For example, demo.wikipedia.articles_shard0_replica1 becomes demo.wikipedia.articles_shard0_replica_n1.
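If you have scripts that refer to replica core names, the naming change can be illustrated with a short sketch. The function new_core_name is hypothetical, and the default node type n is taken from the example above; other node type codes are not covered by these notes.

```python
import re


def new_core_name(old_name, node_type="n"):
    """Convert an old-style replica core name to the Solr 7 style.

    Assumption: the old name ends in _replica followed by digits.
    Names that do not match are returned unchanged.
    """
    return re.sub(r"_replica(\d+)$", rf"_replica_{node_type}\g<1>", old_name)


print(new_core_name("demo.wikipedia.articles_shard0_replica1"))
# demo.wikipedia.articles_shard0_replica_n1
```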
GPText 3.0.0 allows adding documents stored in Amazon Web Services S3 buckets to a GPText external index. This enhancement includes changes to enable uploading AWS credentials to ZooKeeper and support for the s3 document source type for the gptext.external_login(), gptext.external_logout(), gptext.index_external(), and gptext.index_external_dir() GPText functions.
The gptext-state utility with the --index (-i) option now includes the date and time the GPText index was last modified.
See the Apache Jira for known issues in Apache Solr.
Following are known issues in GPText. Workarounds are provided when available.
[171486503] Solr 7.4 uses Log4j version 2.11. In Log4j 2.11 the configuration file name changed from log4j.properties to log4j2.xml, but the file name was not changed in GPText 3.4.0. Due to this issue, no new lines are added to solr.log.
This issue is fixed in GPText 3.4.1.
Workaround:
Upgrading to GPText 3.4.1 fixes the issue. If you are unable to upgrade to 3.4.1, you can follow these steps to manually fix the issue in your GPText 3.4.0 system:
Download the new log4j2.xml configuration file from https://raw.githubusercontent.com/apache/lucene-solr/master/solr/server/resources/log4j2.xml.
Copy the log4j2.xml file to every Solr data directory, for example /data/gptext/solr0/.
Update the startup parameters file solr.in.sh in every Solr data directory.
Change this line:
LOG4J_PROPS=/data/gptext/solr0/log4j.properties
to:
LOG4J_PROPS=/data/gptext/solr0/log4j2.xml
Restart GPText.
$ gptext-start -r
You will find the following files in the Solr log directory, for example /data/gptext/solr0/logs:
Verify that messages are now written to the solr.log file by executing a GPText operation such as gptext-state.
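When there are many Solr data directories, the solr.in.sh edit in the steps above could be scripted. The following is a minimal sketch that works on the file contents as a string; the function name update_log4j_props is hypothetical, and reading and writing each solr.in.sh file is left to the caller.

```python
def update_log4j_props(solr_in_sh_text):
    """Point the LOG4J_PROPS line at log4j2.xml instead of log4j.properties.

    Only lines that start with LOG4J_PROPS= and end with log4j.properties
    are changed; all other lines are passed through untouched.
    """
    old_name = "log4j.properties"
    lines = []
    for line in solr_in_sh_text.splitlines():
        if line.startswith("LOG4J_PROPS=") and line.endswith(old_name):
            line = line[: -len(old_name)] + "log4j2.xml"
        lines.append(line)
    return "\n".join(lines) + "\n"


before = "LOG4J_PROPS=/data/gptext/solr0/log4j.properties\n"
print(update_log4j_props(before), end="")
# LOG4J_PROPS=/data/gptext/solr0/log4j2.xml
```

Back up each solr.in.sh before rewriting it, and restart GPText afterward as described in the steps above.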
(Fixed in GPText 3.4.1) When you upgrade Greenplum Database and then migrate your existing GPText installation to the new Greenplum Database installation, the gptext-migrator utility in some cases fails to install the GPText UDF library to the new Greenplum Database $GPHOME/lib/postgresql directory. gptext-migrator outputs the message
[INFO]:-UDF libraries are installed in $GPTXTHOME/lib, don't need to migrate.
The message is correct only if you installed GPText binaries to a shared drive following the Optional Two-Part GPText Installation method.
Workaround:
Create a host file containing a list of all Greenplum Database hosts.
Make sure the gpadmin user has write permission in the $GPHOME/lib/postgresql directory of the new Greenplum Database installation directory on every Greenplum Database host.
Use the gpscp utility to copy the GPText UDF library from the old Greenplum Database installation to the new Greenplum Database installation.
$ gpscp -f hostfile /usr/local/greenplum-db-<old-version>/lib/postgresql/gptext*.so \
=:/usr/local/greenplum-db-<new-version>/lib/postgresql/
See Upgrading GPText for more information.
The gptext-rebalance node command (beta) may fail with a message ERROR: Utilize Node failed due to a Solr bug. (See SOLR 13240.) You can use the gptext-rebalance index command to work around the issue. NOTE: The gptext-rebalance node command is deprecated.
Solr does not return all fields when the fl Solr search option contains a wildcard that matches field names. For example, given a table with columns contenta and contentb, specifying fl=contenta,contentb,sum(1,1) correctly returns three fields. Specifying fl=cont*,sum(1,1) correctly returns contenta and contentb, but omits the pseudo-field sum(1,1).
Specifying a wildcard to match all fields (fl=*,sum(1,1)) also omits the pseudo-field.
If Solr fails to load an index because of a configuration file error, and then the index is dropped without first correcting the configuration file error, the index cannot be recreated until GPText is restarted. This can happen if you edit managed-schema or solrconfig.xml and introduce an XML syntax error or a typo in configuration values.
Workaround:
Use the gptext-config utility to edit the file and fix the error. Dropping the index without first correcting the error is not recommended.
Run gptext-start -r to restart GPText.
When there is a large number of Solr cores, Solr Cloud can fail to restart successfully, with error messages indicating failure to elect leaders for shards. This is a known Solr issue; see https://issues.apache.org/jira/browse/SOLR-5990 in the Apache Solr Jira for an example. Because of this issue, it is recommended to avoid designing GPText applications that create large numbers of indexes, shards, and replicas. The number of cores you can create before you observe this behavior is hardware dependent, so you should test to determine your system's limits. You can create and successfully operate a larger number of indexes than can be restarted successfully later, so be sure to test restarting GPText to determine a practical limit.
In Greenplum Database versions before Greenplum Database 6, if the custom_variable_classes Greenplum Database server configuration parameter does not include the value "gptext", attempting to set a GPText configuration parameter returns an error message, for example:
mydb-# set gptext.replication_factor = 4;
WARNING: Please logon again to make GUC setting take effect. (GucValue.h:301)
WARNING: Please logon again to make GUC setting take effect. (GucValue.h:301)
ERROR: unrecognized configuration parameter "gptext.replication_factor"
In GPText 2.0, in addition to the error message, the value of the configuration parameter persisted in ZooKeeper is zero, replacing the previous value of the parameter.
mydb-# show gptext.replication_factor;
gptext.replication_factor
----------------------------
0
Beginning with GPText 2.1, the error message is still generated; however, the value saved in ZooKeeper is the value specified in the set command, 4 in the preceding example.
To prevent the error message, before setting any GPText configuration parameters, use the gpconfig command-line utility to set the custom_variable_classes configuration parameter:
$ gpconfig -c custom_variable_classes -v 'gptext'
In Greenplum Database 6.0, the custom_variable_classes configuration parameter is removed and custom parameters can be set without errors.