Tanzu Greenplum Text administration includes security considerations, monitoring Solr index statistics, managing whether indexes in a cluster are read-only or writable, managing and monitoring ZooKeeper, and troubleshooting.

Viewing the Cluster Configuration

Tanzu Greenplum Text deploys Apache ZooKeeper and Apache Solr nodes on hosts in your VMware Greenplum network. Each node is a JVM server process listening for requests from other nodes. Use the gptext-state configs command to list the host and port for each ZooKeeper and Solr node and the memory configuration for the Solr nodes.

$ gptext-state configs
20181112:12:38:26:018080 gptext-state:mdw:gpadmin-[INFO]:-Execute GPText state ...
20181112:12:38:27:018080 gptext-state:mdw:gpadmin-[INFO]:-Check zookeeper cluster state ...
20181112:12:38:27:018080 gptext-state:mdw:gpadmin-[INFO]:-Cluster Configurations.
20181112:12:38:27:018080 gptext-state:mdw:gpadmin-[INFO]:----------------------------------------------------------
20181112:12:38:27:018080 gptext-state:mdw:gpadmin-[INFO]:-JVM Min  |  Max    		 Xms1024M  |  Xmx2048M
20181112:12:38:27:018080 gptext-state:mdw:gpadmin-[INFO]:-Node information
20181112:12:38:27:018080 gptext-state:mdw:gpadmin-[INFO]:----------------------------------
20181112:12:38:27:018080 gptext-state:mdw:gpadmin-[INFO]:-   Host   Node Name         Port    Solr Dir
20181112:12:38:27:018080 gptext-state:mdw:gpadmin-[INFO]:-   sdw1   sdw1_solr:18983   18983   /data/gptext/solr0
20181112:12:38:27:018080 gptext-state:mdw:gpadmin-[INFO]:-   sdw1   sdw1_solr:18984   18984   /data/gptext/solr1
20181112:12:38:27:018080 gptext-state:mdw:gpadmin-[INFO]:-   sdw2   sdw2_solr:18983   18983   /data/gptext/solr0
20181112:12:38:27:018080 gptext-state:mdw:gpadmin-[INFO]:-   sdw2   sdw2_solr:18984   18984   /data/gptext/solr1
20181112:12:38:27:018080 gptext-state:mdw:gpadmin-[INFO]:-Zookeeper information
20181112:12:38:27:018080 gptext-state:mdw:gpadmin-[INFO]:----------------------------------
20181112:12:38:27:018080 gptext-state:mdw:gpadmin-[INFO]:-   Host   Port   Zookeeper Dir
20181112:12:38:27:018080 gptext-state:mdw:gpadmin-[INFO]:-   mdw    2189   /data/zoo/zoo0
20181112:12:38:27:018080 gptext-state:mdw:gpadmin-[INFO]:-   sdw2   2189   /data/zoo/zoo0
20181112:12:38:27:018080 gptext-state:mdw:gpadmin-[INFO]:-   sdw1   2189   /data/zoo/zoo0
20181112:12:38:27:018080 gptext-state:mdw:gpadmin-[INFO]:-Done.

You don't need these details to use the Tanzu Greenplum Text functions and utilities, but the information can be useful for monitoring and troubleshooting the cluster. For example, you can access the Solr Admin UI by browsing to the URL http://<hostname>:<port> on any Solr node. See Using the Solr Administration Interface for information about the Solr Admin UI.

Changing Tanzu Greenplum Text Server Configuration Parameters

Tanzu Greenplum Text configuration parameters are built into Tanzu Greenplum Text with default values. You set new values for the parameters in a VMware Greenplum session using the SET command, the same way you set VMware Greenplum session parameters. When you enter the SET command, Tanzu Greenplum Text updates the value in ZooKeeper so that the change persists between database sessions.

With VMware Greenplum 4.x and 5.x, a one-time VMware Greenplum configuration change is needed so that VMware Greenplum allows you to set and display Tanzu Greenplum Text configuration parameters: you must declare a custom variable class for Tanzu Greenplum Text. Until you have performed this step, any attempt to set a Tanzu Greenplum Text parameter results in an "Unrecognized configuration parameter" error.

Note: The custom_variable_classes configuration parameter is removed in VMware Greenplum 6. You can set custom variables in a database session without error, so this step is not needed for VMware Greenplum 6.

As the gpadmin user, enter the following commands in a shell:

$ gpconfig -c custom_variable_classes -v 'gptext'
$ gpstop -u

Once this step is completed, you can view and set Tanzu Greenplum Text configuration parameters in psql.

To view Tanzu Greenplum Text configuration parameters, you must first fetch them from ZooKeeper into your VMware Greenplum session by executing the gptext.version() UDF.

=# SELECT gptext.version();
                       version
------------------------------------------------------
 Greenplum Text Analytics 3.2.0
(1 row)

Then you can use the SHOW command to display values of the parameters, for example:

=# SHOW gptext.idx_num_shards;
 gptext.idx_num_shards
-----------------------
 0
(1 row)

See VMware Tanzu Greenplum Text Configuration Parameters for a complete list of configuration parameters.

Tanzu Greenplum Text uses the current values of the configuration parameters when you create a new index, so changing a configuration parameter affects new indexes but does not affect existing indexes.

Change the values of Tanzu Greenplum Text configuration variables using the SET command in a session connected to a database that contains the Tanzu Greenplum Text schema. The following example sets values for three configuration parameters in a psql session:

=# set gptext.idx_buffer_size=10485760;
SET
=# set gptext.idx_delim='|';
SET
=# set gptext.extension_factor=5;
SET

You can view the new value of a configuration parameter that you have set using the SHOW command:

=# show gptext.idx_delim;
 gptext.idx_delim 
------------------
 |
(1 row)

Security and Tanzu Greenplum Text Indexes

Tanzu Greenplum Text security is based on VMware Greenplum security. Your privileges to execute Tanzu Greenplum Text functions depend on your privileges for the database table that is the source for the index. For example, if you have SELECT privileges for a VMware Greenplum table, then you have SELECT privileges for an index generated from that table.

Executing Tanzu Greenplum Text functions requires one of OWNER, SELECT, INSERT, UPDATE, or DELETE privileges, depending on the function. The OWNER is the role that created the table and has all privileges. See the VMware Greenplum Administrator Guide for information about setting privileges.
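
For example, granting privileges on the source table also grants the corresponding privileges on the index built from it. The following is a minimal sketch using standard Greenplum SQL; the table wikipedia.articles and the role analyst are hypothetical names:

=# -- hypothetical table and role names
=# GRANT SELECT ON wikipedia.articles TO analyst;
GRANT
=# GRANT INSERT, UPDATE, DELETE ON wikipedia.articles TO analyst;
GRANT

With SELECT, the analyst role can execute functions that search the index built from wikipedia.articles; the INSERT, UPDATE, and DELETE privileges apply to functions that modify the index.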

Enabling User Authentication

The gptext-auth utility enables or deactivates user password authentication for a single user account on the SolrCloud cluster web user interface (UI).

To avoid disruption, enable SolrCloud web authentication during the Tanzu Greenplum Text installation phase by editing the gptext_install_config file. See Install the Tanzu Greenplum Text Binary Distribution.

Note: Enabling authentication on a running cluster, changing the password, or deactivating authentication triggers a restart of the Tanzu Greenplum Text cluster.

The following options are available:

  • Enable password authentication

    $ gptext-auth enable-password --username <username> --password <password>
    

    or enter the password at the terminal, similar to:

    $ gptext-auth enable-password --username <username>
    Please input password:
    

    The command asks for user input (y or n) before continuing. The --username option is optional; if it is not provided, the default user account is solr.

    Note

    Enabling authentication triggers a restart of the Tanzu Greenplum Text cluster.

  • Deactivate password authentication

    $ gptext-auth disable-password 
    
    Note

    Deactivating authentication triggers a restart of the Tanzu Greenplum Text cluster.

  • Change password

    $ gptext-auth change-password --old-password <oldpassword> --new-password <newpassword>
    

    or

    $ gptext-auth change-password 
    Please input old password:
    Please input new password:
    
    Note

    Changing the password triggers a restart of the Tanzu Greenplum Text cluster.

See the gptext-auth reference page for more information about the command options.

ZooKeeper Administration

Apache ZooKeeper enables coordination between the Apache Solr and Tanzu Greenplum Text distributed processes through a shared namespace that resembles a file system. In ZooKeeper, a node (called a znode) can contain data, like a file, and can have child znodes, like a directory. ZooKeeper replicates data between multiple instances deployed as a cluster to provide a highly available, fault-tolerant service. Both Solr and Tanzu Greenplum Text store configuration files and share status by writing data to ZooKeeper znodes. Tanzu Greenplum Text stores information in the /gptext znode. The configuration files for a Tanzu Greenplum Text index are in the /gptext/configs/<index-name> znode.

The number of ZooKeeper instances in the cluster determines how many ZooKeeper node failures the cluster can tolerate and still remain active. The service remains available as long as a majority of the nodes are running and able to communicate with each other. To tolerate a failure of n nodes, the cluster must have 2n+1 nodes. A cluster of five nodes, for example, can tolerate two failed nodes.

ZooKeeper is very fast for read requests because it stores data in memory. If ZooKeeper begins to swap memory to disk, Solr and Tanzu Greenplum Text performance degrades and failures can occur, so it is critical to allocate sufficient memory to the ZooKeeper Java processes. To avoid ZooKeeper instances competing with VMware Greenplum segments for memory, deploy the ZooKeeper instances and VMware Greenplum segments on different hosts. The ZooKeeper and VMware Greenplum hosts must be on the same network and accessible with passwordless SSH by the gpadmin user. You can use the VMware Greenplum gpssh-exkeys utility to share SSH keys between ZooKeeper and VMware Greenplum hosts.
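
For example, you might exchange SSH keys between the VMware Greenplum hosts and dedicated ZooKeeper hosts by running gpssh-exkeys as the gpadmin user. This is a sketch only; the host names zkhost1 through zkhost3 are hypothetical:

# mdw, sdw1, and sdw2 are the Greenplum hosts shown earlier in this topic;
# zkhost1, zkhost2, and zkhost3 are hypothetical dedicated ZooKeeper hosts.
$ gpssh-exkeys -h mdw -h sdw1 -h sdw2 -h zkhost1 -h zkhost2 -h zkhost3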

You must start the ZooKeeper cluster before you start Tanzu Greenplum Text. When you start Tanzu Greenplum Text, the Solr nodes each load the replicas for the indexes they manage. With large numbers of indexes, shards, and replicas, starting up the cluster can generate a very high, atypical load on ZooKeeper. It can take a long time to load all of the indexes, and some ZooKeeper requests may time out while waiting for responses. Using the gptext-start --slow_start option starts the Solr nodes one at a time, providing a more orderly start-up and limiting the number of concurrent ZooKeeper requests.
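
For example:

$ gptext-start --slow_start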

You can use the Tanzu Greenplum Text command-line utility zkManager to monitor the ZooKeeper cluster. If the ZooKeeper cluster was installed by the Tanzu Greenplum Text installer, you can also start and stop the cluster using zkManager.

Checking ZooKeeper Status

Use the zkManager utility from the command line to check the ZooKeeper cluster status. The utility lists the hosts, ports, latency, and follower/leader mode for each ZooKeeper instance. If a node is down, its mode is listed as Down.

To check the ZooKeeper cluster status, run the zkManager state command.

$ zkManager state
20171016:12:59:47:026338 zkManager:gpdb:gpadmin-[INFO]:-Execute zookeeper state process.
20171016:12:59:47:026338 zkManager:gpdb:gpadmin-[INFO]:-Check zookeeper cluster state ...
20171016:12:59:47:026338 zkManager:gpdb:gpadmin-[INFO]:-   Host   port   Latency min/avg/max   Mode
20171016:12:59:47:026338 zkManager:gpdb:gpadmin-[INFO]:-   gpdb   2189   0/0/22                follower
20171016:12:59:47:026338 zkManager:gpdb:gpadmin-[INFO]:-   gpdb   2190   0/0/29                leader
20171016:12:59:47:026338 zkManager:gpdb:gpadmin-[INFO]:-   gpdb   2188   0/0/27                follower
20171016:12:59:47:026338 zkManager:gpdb:gpadmin-[INFO]:-Done.

In a database session, you can use the gptext.zookeeper_hosts() function to list the ZooKeeper hosts.

=# SELECT * FROM gptext.zookeeper_hosts();
  host  | port
--------+------
 gpdb51 | 2188
 gpdb51 | 2189
 gpdb51 | 2190
(3 rows)

Starting and Stopping the ZooKeeper Cluster

If the ZooKeeper cluster was installed by the Tanzu Greenplum text installer, the zkManager utility can start or stop the ZooKeeper cluster. To start the cluster, run the zkManager start command.

$ zkManager start
20171016:16:14:46:017845 zkManager:gpdb:gpadmin-[INFO]:-Execute zookeeper start process
20171016:16:14:46:017845 zkManager:gpdb:gpadmin-[INFO]:------------------------------------------------
20171016:16:14:46:017845 zkManager:gpdb:gpadmin-[INFO]:-Starting Zookeeper:
20171016:16:14:46:017845 zkManager:gpdb:gpadmin-[INFO]:------------------------------------------------
20171016:16:14:46:017845 zkManager:gpdb:gpadmin-[INFO]:-   Host   Zookeeper Dir
20171016:16:14:46:017845 zkManager:gpdb:gpadmin-[INFO]:-   gpdb   /data/master/zoo0
20171016:16:14:46:017845 zkManager:gpdb:gpadmin-[INFO]:-   gpdb   /data/master/zoo1
20171016:16:14:46:017845 zkManager:gpdb:gpadmin-[INFO]:-   gpdb   /data/master/zoo2
20171016:16:14:48:017845 zkManager:gpdb:gpadmin-[INFO]:-Check zookeeper cluster state ...
20171016:16:14:53:017845 zkManager:gpdb:gpadmin-[INFO]:-Done.

To stop ZooKeeper, run the zkManager stop command.

$ zkManager stop
20171016:16:14:08:016499 zkManager:gpdb:gpadmin-[INFO]:-Execute zookeeper stop process.
20171016:16:14:08:016499 zkManager:gpdb:gpadmin-[INFO]:------------------------------------------------
20171016:16:14:08:016499 zkManager:gpdb:gpadmin-[INFO]:-Stop Zookeeper:
20171016:16:14:08:016499 zkManager:gpdb:gpadmin-[INFO]:------------------------------------------------
20171016:16:14:08:016499 zkManager:gpdb:gpadmin-[INFO]:-   Host   Zookeeper Dir
20171016:16:14:08:016499 zkManager:gpdb:gpadmin-[INFO]:-   gpdb   /data/master/zoo0
20171016:16:14:08:016499 zkManager:gpdb:gpadmin-[INFO]:-   gpdb   /data/master/zoo1
20171016:16:14:08:016499 zkManager:gpdb:gpadmin-[INFO]:-   gpdb   /data/master/zoo2
20171016:16:14:09:016499 zkManager:gpdb:gpadmin-[INFO]:-Done.

See the zkManager reference for more information.

Checking SolrCloud Status

You can check the status of the SolrCloud cluster and indexes by running the gptext-state utility from the command line.

To check the state of the Tanzu Greenplum Text nodes and each index, run the gptext-state utility with the -D (--details) option. Example:

$ gptext-state -D
20180615:16:09:24:031986 gptext-state:mdw:gpadmin-[INFO]:-Execute GPText state ...
20180615:16:09:25:031986 gptext-state:mdw:gpadmin-[INFO]:-Check zookeeper cluster state ...
20180615:16:09:25:031986 gptext-state:mdw:gpadmin-[INFO]:-Check GPText cluster status...
20180615:16:09:25:031986 gptext-state:mdw:gpadmin-[INFO]:-Current GPText Version: 3.0.0
20180615:16:09:25:031986 gptext-state:mdw:gpadmin-[INFO]:-All nodes are up and running.
20180615:16:09:26:031986 gptext-state:mdw:gpadmin-[INFO]:------------------------------------------------
20180615:16:09:26:031986 gptext-state:mdw:gpadmin-[INFO]:-Index state details.
20180615:16:09:26:031986 gptext-state:mdw:gpadmin-[INFO]:------------------------------------------------
20180615:16:09:26:031986 gptext-state:mdw:gpadmin-[INFO]:-   database   index name                state
20180615:16:09:26:031986 gptext-state:mdw:gpadmin-[INFO]:-   demo       demo.twitter.message      Green
20180615:16:09:26:031986 gptext-state:mdw:gpadmin-[INFO]:-   demo       demo.wikipedia.articles   Green
20180615:16:09:26:031986 gptext-state:mdw:gpadmin-[INFO]:-Done.

This command reports the status of the Tanzu Greenplum Text nodes and the status of each Tanzu Greenplum Text index.

Run gptext-state list to view just the indexes.

The gptext-state healthcheck command checks the Tanzu Greenplum Text configuration files, the index status, required disk space, user privileges, and index and database consistency. By default, the required disk space check passes if at least 20% of disk space is free. You can set a different free disk threshold using the --disk_free option. For example:

[gpadmin@gpdb-sandbox ~]$ gptext-state healthcheck --disk_free=25
20160629:15:45:24:669652 gptext-state:gpdb-sandbox:gpadmin-[INFO]:-Execute healthcheck on GPText cluster!
20160629:15:45:24:669652 gptext-state:gpdb-sandbox:gpadmin-[INFO]:-Check GPText config files ...
20160629:15:45:24:669652 gptext-state:gpdb-sandbox:gpadmin-[INFO]:-GOOD
20160629:15:45:24:669652 gptext-state:gpdb-sandbox:gpadmin-[INFO]:-Check GPText index status ...
20160629:15:45:25:669652 gptext-state:gpdb-sandbox:gpadmin-[INFO]:-GOOD
20160629:15:45:25:669652 gptext-state:gpdb-sandbox:gpadmin-[INFO]:-Checking for required disk space...
20160629:15:45:25:669652 gptext-state:gpdb-sandbox:gpadmin-[INFO]:-GOOD
20160629:15:45:25:669652 gptext-state:gpdb-sandbox:gpadmin-[INFO]:-Checking for required user privileges...
20160629:15:45:25:669652 gptext-state:gpdb-sandbox:gpadmin-[INFO]:-GOOD
20160629:15:45:25:669652 gptext-state:gpdb-sandbox:gpadmin-[INFO]:-Checking for indexes and database consistency...
20160629:15:45:27:669652 gptext-state:gpdb-sandbox:gpadmin-[INFO]:-GOOD
20160629:15:45:27:669652 gptext-state:gpdb-sandbox:gpadmin-[INFO]:-Done.

See the gptext-state utility reference for additional options.

Starting or Stopping SolrCloud Nodes

Starting with Tanzu Greenplum Text 3.6.0, you can start and stop individual SolrCloud nodes, or a group of nodes.

To stop one or more SolrCloud nodes, run the gptext-stop command:

$ gptext-stop --nodes "mdw:18983_solr, sdw1:18983_solr"

Where:

  • -n|--nodes is a comma-separated list of nodes to stop. The node name is specified in the format <host>:<port>_solr.

The gptext-stop command is interactive and requires y or n user input to continue, similar to:

$ gptext-stop -n "test-server3:18983_solr, test-server3:18984_solr"
20210120:03:34:36:010966 gptext-stop:test-server:gpadmin-[INFO]:-Execute GPText cluster stop.
20210120:03:34:36:010966 gptext-stop:test-server:gpadmin-[INFO]:-Check zookeeper cluster state ...
20210120:03:34:37:010966 gptext-stop:test-server:gpadmin-[WARNING]:-Stop some of the Solr nodes might make some indices turns into yellow/red state. Replica recovery is expected after the nodes are up, please make sure there is no new data indexing during the nodes restart.
Solr nodes will be stopped. Do you want to continue ? (y/n): y

To start one or more SolrCloud nodes, run the gptext-start command:

$ gptext-start --nodes "mdw:18983_solr, sdw1:18983_solr"

Where:

  • -n|--nodes is a comma-separated list of nodes to start. The node name is specified in the format <host>:<port>_solr.

Recovering Tanzu Greenplum Text Nodes

Use the gptext-recover utility to recover down Tanzu Greenplum Text nodes, for example after a failed VMware Greenplum segment host is recovered.

With no arguments, the gptext-recover utility discovers down Tanzu Greenplum Text nodes and restarts them.

With the -f (or --force) option, if a Tanzu Greenplum Text node cannot be restarted and no shards are down, the node is deleted and recreated on the same host. Missing replicas are added, and the failed node and failed replicas are removed. If the index is in a red state, gptext-recover -f prints a message and exits.

The -H (--new_hosts) option recreates down Tanzu Greenplum Text nodes on new hosts that replace failed hosts. The down Tanzu Greenplum Text nodes are deleted and recreated on the new hosts. The argument to the -H option is a comma-separated list of the new hosts that replace the failed hosts. The number of new hosts must match the number of failed hosts. If shards are down, the utility advises reindexing. If only some replicas are down, the utility recreates the replicas on the new hosts and updates gptext.conf.

The -r option recovers replicas, but does not attempt to recover any down nodes.
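
The following invocations illustrate these options; the replacement host names are hypothetical:

$ gptext-recover                        # restart any down GPText nodes
$ gptext-recover -f                     # force: recreate nodes that cannot be restarted
$ gptext-recover -H newhost1,newhost2   # recreate down nodes on replacement hosts (hypothetical names)
$ gptext-recover -r                     # recover replicas only; do not recover down nodes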

Note: Before recovering Tanzu Greenplum Text nodes on newly added hosts, ensure that the following Tanzu Greenplum Text prerequisites are installed on the host:

  • Java 1.8
  • Python 2.6
  • The Linux lsof utility

Managing Index Read-Only Mode

In maintenance scenarios, you may want to block updates to an index while still being able to search its contents. Tanzu Greenplum Text allows you to set selected indexes to read-only with the gptext-readonly utility. When an index is set to read-only, Tanzu Greenplum Text blocks the processing of new updates and reloads the index, which guarantees that all in-flight updates and background merges are properly committed and finished. Once read-only mode is unset on these indexes, they resume accepting updates.

Viewing Solr Index Statistics

You can view Solr index statistics by running the gptext-state utility from the command line.

To list all Tanzu Greenplum Text indexes, enter the following command at the command line:

gptext-state list

To retrieve all statistics for an index:

gptext-state --index demo.wikipedia.articles

To retrieve the number of documents in an index:

gptext-state --index demo.wikipedia.articles --stats_columns=num_docs

To retrieve num_docs, the index size, and the last_modified date and time:

gptext-state --index demo.wikipedia.articles --stats_columns num_docs,size,last_modified

Backing Up and Restoring Tanzu Greenplum Text Indexes

With the gptext-backup management utility, you can back up a Tanzu Greenplum Text index so that, if needed, you can quickly recover from a failure. The backup can be restored to the same Tanzu Greenplum Text system or to another system with the same number of VMware Greenplum segments.

The gptext-backup management utility backs up an index and its configuration files to either a shared file system, which must be mounted on and writable by each host in the VMware Greenplum cluster, or to local storage on the VMware Greenplum master and segment hosts.

Backing Up to a Shared File System

To back up to a shared file system, use the -p (--path) command-line option to specify the location of a directory on the mounted file system and the -n (--name) option to provide a name for the backup. Specify the index to back up with the -i (--index) option.

$ gptext-backup -i <index-name> -p <path> -n <backup-name>

The gptext-backup utility then checks that:

  • the Tanzu Greenplum Text cluster is up
  • the shared file system is valid
  • the backup name specified with the -n option does not already exist in the directory specified with the -p option

The utility creates the new directory and then saves one copy of each index shard to that directory, along with the index's configuration files from ZooKeeper.

To save the configuration files only, with no data, add the -c (--backup_conf) command-line option.
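
For example, to back up only the index configuration files:

$ gptext-backup -i <index-name> -p <path> -n <backup-name> -c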

To restore an index from a shared file system, use the gptext-restore management utility. The Tanzu Greenplum Text system you restore to must be on a VMware Greenplum cluster with the same number of segments. The database and schema for the index must be present.

The -i (--index) option specifies the name of the Tanzu Greenplum Text index that will be restored. If the index exists, you must first drop it with the gptext.drop_index() user-defined function.

The -p (--path) option specifies the location of the directory containing the backup files—the directory that gptext-backup created on the shared file system.

$ gptext-restore -i <index-name> -p <path>

You can add the -c option to restore only the configuration files to ZooKeeper and create an empty Tanzu Greenplum Text index, without restoring any saved index data.
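
For example, to restore only the configuration files and create an empty index:

$ gptext-restore -i <index-name> -p <path> -c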

Backing Up to Local Storage

To back up to local storage on the VMware Greenplum cluster, add the local keyword to the gptext-backup command line.

A local Tanzu Greenplum Text backup has a unique name constructed by appending a timestamp to the index name. You do not use the -n option with local backups.

$ gptext-backup local -i <index-name>

On the master host, in the master data directory by default, the backup utility saves a JSON file with backup metadata and a directory containing the index's configuration files from ZooKeeper.

The utility backs up each index shard on the VMware Greenplum segment host with the Tanzu Greenplum Text node that manages the shard's lead replica. By default, the shard backup files are saved in a segment data directory.

The gptext-backup command output reports the locations of all backup files.

You can add the -p (--path) option to the gptext-backup command to specify a local directory where the backup will be saved. The directory must be present on every VMware Greenplum host and must be writable by the gpadmin user.

$ gptext-backup local -i <index-name> -p <path>

The backup files will be saved in the specified directory on each host instead of in the VMware Greenplum master and segment data directories.

To restore a backup saved to local storage, add the local keyword to the gptext-restore command line and specify the path to the backup directory on the master host.

$ gptext-restore local -p <path>

The <path> is the full path to the directory the gptext-backup command created on the master host, including the timestamp, for example $MASTER_DATA_DIRECTORY/demo.twitter.message_2018-05-08T15:32:21.397779.

See the gptext-backup reference for syntax and examples for running gptext-backup. See the gptext-restore reference for syntax and examples for running gptext-restore.

Expanding the Tanzu Greenplum Text Cluster

The gptext-expand management utility adds Tanzu Greenplum Text nodes to the cluster. There are two ways to add nodes:

  • Add Tanzu Greenplum Text nodes to existing hosts in the cluster. This option increases the number of Tanzu Greenplum Text nodes on each host.
  • Add Tanzu Greenplum Text nodes to new hosts added by using the VMware Greenplum gpexpand management utility to expand the VMware Greenplum system.

Adding Tanzu Greenplum Text Nodes to Existing Segment Hosts

To add nodes to existing segment hosts, run the gptext-expand utility with a command like the following:

gptext-expand -e -p /data1/nodes,/data2/nodes

This example adds two Tanzu Greenplum Text nodes to each host.

The -e (--existing) option specifies that nodes are to be added to existing hosts.

The -p (--expand_paths) option provides a list of directories where the new nodes' data directories are to be created. These should be the same directories that contain the VMware Greenplum segment data directories and the existing Tanzu Greenplum Text data directories. The number of directories in the list is the number of new nodes added on each host.

A directory can be repeated in the directory list multiple times to increase the number of new Tanzu Greenplum Text nodes to create. For example, if there is currently one Tanzu Greenplum Text node per host in the /data1/nodes directory, you could add three nodes with a command like the following:

gptext-expand -e -p /data1/nodes,/data2/nodes,/data2/nodes

This adds one node to the /data1/nodes directory and two nodes to the /data2/nodes directory, so there are two Tanzu Greenplum Text nodes in each directory.

Adding Tanzu Greenplum Text nodes affects new indexes, but not existing indexes. Replicas for new indexes are distributed across all of the nodes, including both the old nodes and the newly created nodes. Replicas for indexes that existed before running gptext-expand are not automatically moved. You can use the gptext-rebalance command to relocate replicas to the new nodes.

Adding Tanzu Greenplum Text Nodes to New Hosts

Check that the following Tanzu Greenplum Text prerequisites are installed on each new host added to the VMware Greenplum cluster:

  • Java 1.8
  • Python 2.6 or greater
  • Linux lsof utility

New hosts must be reachable by all hosts in the Tanzu Greenplum Text cluster, including existing hosts and the new hosts you are adding.

After expanding the VMware Greenplum cluster with the gpexpand management utility, call gptext-expand with the -H (--new_hosts) option and a list of the new hosts on which to install Tanzu Greenplum Text:

gptext-expand -H newhost1,newhost2

The gptext-expand utility installs the Tanzu Greenplum Text binaries on the new hosts and then creates new Tanzu Greenplum Text nodes on them.

Newly created indexes are automatically distributed among the new nodes. You can use the gptext-rebalance command to relocate replicas of existing indexes to the new nodes.

Rebalancing Replicas and Replica Leaders

After expanding the Tanzu Greenplum Text cluster with new nodes, rebalance the replicas onto the new nodes, and rebalance the replica leaders.

Use gptext-rebalance index to rebalance the replicas for a specific index across all Tanzu Greenplum Text nodes.

$ gptext-rebalance index -i demo.public.test 

See the gptext-rebalance reference for more details about the options and the rebalance rules.

When some SolrCloud cluster nodes have more replica leaders than other nodes, use the gptext-rebalance leader command to balance the leaders across the nodes.

To verify the state of the leaders in an index called demo.public.test, use a SQL command like:

SELECT index_name, core, node_name, is_leader 
FROM gptext.index_status()
WHERE index_name='demo.public.test';

The output is similar to:

    index_name     |                core                 |     node_name      | is_leader
-------------------+-------------------------------------+--------------------+-----------
 demo.public.test | demo.public.test_shard0_replica_n1 | gpadmin:18983_solr | t
 demo.public.test | demo.public.test_shard0_replica_n2 | gpadmin:18984_solr | f
 demo.public.test | demo.public.test_shard1_replica_n4 | gpadmin:18984_solr | f
 demo.public.test | demo.public.test_shard1_replica_n7 | gpadmin:18983_solr | t

In this example, node 18983_solr holds two replica leaders and node 18984_solr holds none. Rebalance the leaders across the nodes using:

$ gptext-rebalance leader -i demo.public.test

The leaders are now spread across the nodes, similar to:

    index_name     |                core                 |     node_name      | is_leader
-------------------+-------------------------------------+--------------------+-----------
 demo.public.test | demo.public.test_shard0_replica_n1 | gpadmin:18983_solr | f
 demo.public.test | demo.public.test_shard0_replica_n2 | gpadmin:18984_solr | t
 demo.public.test | demo.public.test_shard1_replica_n4 | gpadmin:18984_solr | f
 demo.public.test | demo.public.test_shard1_replica_n7 | gpadmin:18983_solr | t

Troubleshooting

Tanzu Greenplum Text errors are of the following types:

  • Solr errors
  • gptext errors

Most of the Solr errors are self-explanatory.

gptext errors are caused by misuse of a function or utility. The error message tells you which function or argument is incorrect.

Monitoring Logs

You can examine the VMware Greenplum and Solr logs for more information if errors occur. VMware Greenplum logs reside in:

<segment-directory>/pg_log

Solr logs reside in:

<GPDB path>/solr/logs
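
For example, you might tail the most recent log files. The paths below are illustrative only; substitute the actual segment data directory and Solr node directory for your deployment:

$ tail -n 50 /data/primary/gpseg0/pg_log/gpdb-*.csv   # hypothetical Greenplum segment log path
$ tail -n 50 /data/gptext/solr0/solr/logs/solr.log    # hypothetical Solr node log path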

Determining Segment Status with gptext-state

Use the gptext-state utility to determine if any primary or mirror segments are down. See gptext-state in the VMware Tanzu Greenplum Text Management Utilities Reference.
