VMware Greenplum Text administration includes security considerations, monitoring Solr index statistics, managing and monitoring ZooKeeper, and troubleshooting.

Viewing the Cluster Configuration

VMware Greenplum Text deploys Apache ZooKeeper and Apache Solr nodes on hosts in your VMware Greenplum network. Each node is a JVM server process listening for requests from other nodes. Use the gptext-state config command to list the host and port for each ZooKeeper and Solr node and the memory configuration for Solr nodes.

$ gptext-state config
20181112:12:38:26:018080 gptext-state:mdw:gpadmin-[INFO]:-Execute GPText state ...
20181112:12:38:27:018080 gptext-state:mdw:gpadmin-[INFO]:-Check zookeeper cluster state ...
20181112:12:38:27:018080 gptext-state:mdw:gpadmin-[INFO]:-Cluster Configurations.
20181112:12:38:27:018080 gptext-state:mdw:gpadmin-[INFO]:----------------------------------------------------------
20181112:12:38:27:018080 gptext-state:mdw:gpadmin-[INFO]:-JVM Min  |  Max    		 Xms1024M  |  Xmx2048M
20181112:12:38:27:018080 gptext-state:mdw:gpadmin-[INFO]:-Node information
20181112:12:38:27:018080 gptext-state:mdw:gpadmin-[INFO]:----------------------------------
20181112:12:38:27:018080 gptext-state:mdw:gpadmin-[INFO]:-   Host   Node Name         Port    Solr Dir
20181112:12:38:27:018080 gptext-state:mdw:gpadmin-[INFO]:-   sdw1   sdw1_solr:18983   18983   /data/gptext/solr0
20181112:12:38:27:018080 gptext-state:mdw:gpadmin-[INFO]:-   sdw1   sdw1_solr:18984   18984   /data/gptext/solr1
20181112:12:38:27:018080 gptext-state:mdw:gpadmin-[INFO]:-   sdw2   sdw2_solr:18983   18983   /data/gptext/solr0
20181112:12:38:27:018080 gptext-state:mdw:gpadmin-[INFO]:-   sdw2   sdw2_solr:18984   18984   /data/gptext/solr1
20181112:12:38:27:018080 gptext-state:mdw:gpadmin-[INFO]:-Zookeeper information
20181112:12:38:27:018080 gptext-state:mdw:gpadmin-[INFO]:----------------------------------
20181112:12:38:27:018080 gptext-state:mdw:gpadmin-[INFO]:-   Host   Port   Zookeeper Dir
20181112:12:38:27:018080 gptext-state:mdw:gpadmin-[INFO]:-   mdw    2189   /data/zoo/zoo0
20181112:12:38:27:018080 gptext-state:mdw:gpadmin-[INFO]:-   sdw2   2189   /data/zoo/zoo0
20181112:12:38:27:018080 gptext-state:mdw:gpadmin-[INFO]:-   sdw1   2189   /data/zoo/zoo0
20181112:12:38:27:018080 gptext-state:mdw:gpadmin-[INFO]:-Done.

You don't need these details to use the VMware Greenplum Text functions and utilities, but the information can be useful for monitoring and troubleshooting the cluster. For example, you can access the Solr Admin UI by browsing to the URL http://<hostname>:<port> on any Solr node. See Using the Solr Administration Interface for information about the Solr Admin UI.
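As an illustration, the host and port columns of the Node information table can be turned into Solr Admin UI addresses. The following is a minimal sketch, not part of VMware Greenplum Text: the function name solr_admin_urls is hypothetical, and it assumes that saved output has the same column layout as the sample above.

```shell
# Print one Solr Admin UI base URL per Solr node found in saved
# cluster configuration output read from standard input.
# Assumption: node rows have the layout shown in the sample output:
#   <timestamp> <log-prefix> <host> <host>_solr:<port> <port> <solr-dir>
solr_admin_urls() {
    awk '$4 ~ /_solr:[0-9]+$/ { print "http://" $3 ":" $5 }'
}
```

If the command output is saved to a file, running solr_admin_urls < saved-output.txt would print one URL per Solr node.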

Changing VMware Greenplum Text Server Configuration Parameters

Configuration parameters used with VMware Greenplum Text are built in to VMware Greenplum Text with default values. You set new values for the parameters in a VMware Greenplum session using the SET command, the same way you set VMware Greenplum session parameters. When you enter the SET command, VMware Greenplum Text updates the value in ZooKeeper so that the change persists between database sessions.

Note: The custom_variable_classes configuration parameter is removed in VMware Greenplum 6. You can set custom variables in a database session without error, so this step is not needed for VMware Greenplum 6.

With VMware Greenplum 4.x and 5.x, a one-time VMware Greenplum configuration change is needed so that VMware Greenplum allows you to set and display VMware Greenplum Text configuration parameters. Until you have performed this step, any attempt to set a VMware Greenplum Text parameter results in an "Unrecognized configuration parameter" error. You must declare a custom variable class for VMware Greenplum Text.

As the gpadmin user, enter the following commands in a shell:

$ gpconfig -c custom_variable_classes -v 'gptext'
$ gpstop -u

Once this step is completed, you can view and set VMware Greenplum Text configuration parameters in psql.

To view VMware Greenplum Text configuration parameters, you first need to fetch them from ZooKeeper into your VMware Greenplum session by executing the gptext.version() UDF.

=# SELECT gptext.version();
                       version
------------------------------------------------------
 Greenplum Text Analytics 3.2.0
(1 row)

Then you can use the SHOW command to display values of the parameters, for example:

=# SHOW gptext.idx_num_shards;
 gptext.idx_num_shards
-----------------------
 0
(1 row)

See VMware Greenplum Text Configuration Parameters for a complete list of configuration parameters.

VMware Greenplum Text uses the current values of the configuration parameters when you create a new index, so changing a configuration parameter affects new indexes, but does not affect existing indexes.

Change the values of VMware Greenplum Text configuration variables using the SET command in a session with a database that contains the VMware Greenplum Text schema. The following example sets values for three configuration parameters in a psql session:

=# set gptext.idx_buffer_size=10485760;
SET
=# set gptext.idx_delim='|';
SET
=# set gptext.extension_factor=5;
SET

You can view the new value of a configuration parameter that you have set using the SHOW command:

=# show gptext.idx_delim;
 gptext.idx_delim 
------------------
 |
(1 row)
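The fetch, set, and show sequence described above can be generated as a small SQL script. This is a minimal sketch under stated assumptions: the function name gptext_set_sql is hypothetical, the value is always written as a quoted SQL string literal, and the parameter name is not validated.

```shell
# Emit SQL that fetches the parameters into the session, sets one
# parameter, and then shows its value.
# Arguments: <parameter-name-without-gptext-prefix> <value>
gptext_set_sql() {
    param=$1
    # Double any single quotes so the value stays a valid SQL string literal.
    value=$(printf '%s' "$2" | sed "s/'/''/g")
    printf "SELECT gptext.version();\n"
    printf "SET gptext.%s='%s';\n" "$param" "$value"
    printf "SHOW gptext.%s;\n" "$param"
}
```

The generated statements could then be piped to psql, for example gptext_set_sql idx_delim '|' | psql -d <database>, where <database> contains the VMware Greenplum Text schema.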

Security and VMware Greenplum Text Indexes

VMware Greenplum Text security is based on VMware Greenplum security. Your privileges to execute VMware Greenplum Text functions depend on your privileges for the database table that is the source for the index. For example, if you have SELECT privileges for a table in VMware Greenplum, then you have SELECT privileges for an index generated from that table.

Executing VMware Greenplum Text functions requires one of OWNER, SELECT, INSERT, UPDATE, or DELETE privileges, depending on the function. The OWNER is the person who created the table and has all privileges. See the VMware Greenplum Administrator Guide for information about setting privileges.

Enabling User Authentication

The gptext-auth utility enables and deactivates user password authentication for a single user account for the SolrCloud cluster web user interface (UI).

To avoid disruption, enable SolrCloud web authentication during the VMware Greenplum Text installation phase by editing the gptext_install_config file. See Install the VMware Greenplum Text Binary Distribution.

Note: Enabling authentication on a running cluster, changing the password, or deactivating authentication triggers a restart of the VMware Greenplum Text cluster.

The following options are available:

  • Enable password authentication

    $ gptext-auth enable-password --username <username> --password <password>
    

    or input the password on the terminal, similar to:

    $ gptext-auth enable-password --username <username>
    Please input password:
    

    The command asks for user input (y or n) before continuing. The --username option is optional; if it is not provided, the default user account is solr.

    NOTE: Enabling authentication triggers a restart of the VMware Greenplum Text cluster.

  • Deactivate password authentication

    $ gptext-auth disable-password 
    

    NOTE: Deactivating authentication triggers a restart of the VMware Greenplum Text cluster.

  • Change password

    $ gptext-auth change-password --old-password <oldpassword> --new-password <newpassword>
    

    or

    $ gptext-auth change-password 
    Please input old password:
    Please input new password:
    

    NOTE: Changing the password triggers a restart of the VMware Greenplum Text cluster.

See the gptext-auth reference page for more information about the command options.

ZooKeeper Administration

Apache ZooKeeper enables coordination between the Apache Solr and VMware Greenplum Text distributed processes through a shared namespace that resembles a file system. In ZooKeeper, a node (called a znode) can contain data, like a file, and can have child znodes, like a directory. ZooKeeper replicates data between multiple instances deployed as a cluster to provide a highly available, fault-tolerant service. Both Solr and VMware Greenplum Text store configuration files and share status by writing data to ZooKeeper znodes. VMware Greenplum Text stores information in the /gptext znode. The configuration files for a VMware Greenplum Text index are in the /gptext/configs/<index-name> znode.

The number of ZooKeeper instances in the cluster determines how many ZooKeeper node failures the cluster can tolerate and still remain active. The service remains available as long as a majority of the nodes in the cluster are running and able to communicate with each other. To tolerate the failure of n nodes, the cluster must have at least 2n+1 nodes. A cluster of five nodes, for example, can tolerate two failed nodes.
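The 2n+1 rule can be checked with simple integer arithmetic. The following sketch uses hypothetical helper names (zk_tolerated_failures and zk_min_nodes) and only restates the rule above; it does not query a running cluster.

```shell
# Number of failed ZooKeeper nodes a cluster of the given size can tolerate.
# Derived from the 2n+1 rule: n = (nodes - 1) / 2, rounded down.
zk_tolerated_failures() {
    echo $(( ($1 - 1) / 2 ))
}

# Smallest cluster size that tolerates the given number of failed nodes.
zk_min_nodes() {
    echo $(( 2 * $1 + 1 ))
}
```

For example, zk_tolerated_failures 5 prints 2. A four-node cluster tolerates only one failure, the same as a three-node cluster, which is why odd cluster sizes are the usual choice.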

ZooKeeper is very fast for read requests because it stores data in memory. If ZooKeeper begins to swap memory to disk, Solr and VMware Greenplum Text performance will degrade and operations could fail, so it is critical to allocate sufficient memory to the ZooKeeper Java processes. To avoid ZooKeeper instances competing with VMware Greenplum segments for memory, you should deploy the ZooKeeper instances and VMware Greenplum segments on different hosts. The ZooKeeper and VMware Greenplum hosts must be on the same network and accessible with passwordless SSH by the gpadmin user. You can use the VMware Greenplum gpssh-exkeys utility to share SSH keys between ZooKeeper and VMware Greenplum hosts.

You must start the ZooKeeper cluster before you start VMware Greenplum Text. When you start VMware Greenplum Text, the Solr nodes each load the replicas for indexes they manage. With large numbers of indexes, shards, and replicas, starting up the cluster can generate a very high, atypical load on ZooKeeper. It can take a long time to get all indexes loaded and some ZooKeeper requests may time out waiting for responses. Using the gptext-start --slow_start option starts Solr nodes one at a time, providing a more ordered start-up and limiting the number of concurrent ZooKeeper requests.

The VMware Greenplum Text command-line utility zkManager can be used to monitor the ZooKeeper cluster. If the ZooKeeper cluster is bound to VMware Greenplum Text, you can also start and stop the cluster using zkManager.

Checking ZooKeeper Status

Use the zkManager utility from the command line to check the ZooKeeper cluster status. The utility lists the hosts, ports, latency, and follower/leader mode for each ZooKeeper instance. If a node is down, its mode is listed as Down.

To check the ZooKeeper cluster status, run the zkManager state command.

$ zkManager state
20171016:12:59:47:026338 zkManager:gpdb:gpadmin-[INFO]:-Execute zookeeper state process.
20171016:12:59:47:026338 zkManager:gpdb:gpadmin-[INFO]:-Check zookeeper cluster state ...
20171016:12:59:47:026338 zkManager:gpdb:gpadmin-[INFO]:-   Host   port   Latency min/avg/max   Mode
20171016:12:59:47:026338 zkManager:gpdb:gpadmin-[INFO]:-   gpdb   2189   0/0/22                follower
20171016:12:59:47:026338 zkManager:gpdb:gpadmin-[INFO]:-   gpdb   2190   0/0/29                leader
20171016:12:59:47:026338 zkManager:gpdb:gpadmin-[INFO]:-   gpdb   2188   0/0/27                follower
20171016:12:59:47:026338 zkManager:gpdb:gpadmin-[INFO]:-Done.
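For scripted monitoring, the Mode column of saved zkManager state output can be tallied. This is a minimal sketch: the function name zk_mode_summary is hypothetical, and it assumes that the port is the fourth whitespace-separated field and the mode is the last field of each instance row, as in the sample above. The exact layout of a row for a Down instance is not shown here, so treat that part as an assumption.

```shell
# Count ZooKeeper instances by mode from saved `zkManager state` output
# read from standard input.
# Assumption: instance rows have a numeric port in field 4 and the mode
# (leader, follower, or Down) in the last field.
zk_mode_summary() {
    awk '
        $4 ~ /^[0-9]+$/ {
            mode = tolower($NF)
            count[mode]++
        }
        END {
            printf "leader=%d follower=%d down=%d\n",
                count["leader"], count["follower"], count["down"]
        }'
}
```

For the sample output above, the summary would read leader=1 follower=2 down=0.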

In a database session, you can use the gptext.zookeeper_hosts() function to list the ZooKeeper hosts.

=# SELECT * FROM gptext.zookeeper_hosts();
  host  | port
--------+------
 gpdb51 | 2188
 gpdb51 | 2189
 gpdb51 | 2190
(3 rows)

Starting and Stopping the ZooKeeper Cluster

If the ZooKeeper cluster was installed by the VMware Greenplum Text installer, the zkManager utility can start or stop the ZooKeeper cluster. To start the cluster, run the zkManager start command.

$ zkManager start
20171016:16:14:46:017845 zkManager:gpdb:gpadmin-[INFO]:-Execute zookeeper start process
20171016:16:14:46:017845 zkManager:gpdb:gpadmin-[INFO]:------------------------------------------------
20171016:16:14:46:017845 zkManager:gpdb:gpadmin-[INFO]:-Starting Zookeeper:
20171016:16:14:46:017845 zkManager:gpdb:gpadmin-[INFO]:------------------------------------------------
20171016:16:14:46:017845 zkManager:gpdb:gpadmin-[INFO]:-   Host   Zookeeper Dir
20171016:16:14:46:017845 zkManager:gpdb:gpadmin-[INFO]:-   gpdb   /data/master/zoo0
20171016:16:14:46:017845 zkManager:gpdb:gpadmin-[INFO]:-   gpdb   /data/master/zoo1
20171016:16:14:46:017845 zkManager:gpdb:gpadmin-[INFO]:-   gpdb   /data/master/zoo2
20171016:16:14:48:017845 zkManager:gpdb:gpadmin-[INFO]:-Check zookeeper cluster state ...
20171016:16:14:53:017845 zkManager:gpdb:gpadmin-[INFO]:-Done.

To stop ZooKeeper, run the zkManager stop command.

$ zkManager stop
20171016:16:14:08:016499 zkManager:gpdb:gpadmin-[INFO]:-Execute zookeeper stop process.
20171016:16:14:08:016499 zkManager:gpdb:gpadmin-[INFO]:------------------------------------------------
20171016:16:14:08:016499 zkManager:gpdb:gpadmin-[INFO]:-Stop Zookeeper:
20171016:16:14:08:016499 zkManager:gpdb:gpadmin-[INFO]:------------------------------------------------
20171016:16:14:08:016499 zkManager:gpdb:gpadmin-[INFO]:-   Host   Zookeeper Dir
20171016:16:14:08:016499 zkManager:gpdb:gpadmin-[INFO]:-   gpdb   /data/master/zoo0
20171016:16:14:08:016499 zkManager:gpdb:gpadmin-[INFO]:-   gpdb   /data/master/zoo1
20171016:16:14:08:016499 zkManager:gpdb:gpadmin-[INFO]:-   gpdb   /data/master/zoo2
20171016:16:14:09:016499 zkManager:gpdb:gpadmin-[INFO]:-Done.

See the zkManager reference for more information.

Checking SolrCloud Status

You can check the status of the SolrCloud cluster and indexes by running the gptext-state utility from the command line.

To check the state of the VMware Greenplum Text nodes and each index, run the gptext-state utility with the -D (--details) option. Example:

$ gptext-state -D
20180615:16:09:24:031986 gptext-state:mdw:gpadmin-[INFO]:-Execute GPText state ...
20180615:16:09:25:031986 gptext-state:mdw:gpadmin-[INFO]:-Check zookeeper cluster state ...
20180615:16:09:25:031986 gptext-state:mdw:gpadmin-[INFO]:-Check GPText cluster status...
20180615:16:09:25:031986 gptext-state:mdw:gpadmin-[INFO]:-Current GPText Version: 3.0.0
20180615:16:09:25:031986 gptext-state:mdw:gpadmin-[INFO]:-All nodes are up and running.
20180615:16:09:26:031986 gptext-state:mdw:gpadmin-[INFO]:------------------------------------------------
20180615:16:09:26:031986 gptext-state:mdw:gpadmin-[INFO]:-Index state details.
20180615:16:09:26:031986 gptext-state:mdw:gpadmin-[INFO]:------------------------------------------------
20180615:16:09:26:031986 gptext-state:mdw:gpadmin-[INFO]:-   database   index name                state
20180615:16:09:26:031986 gptext-state:mdw:gpadmin-[INFO]:-   demo       demo.twitter.message      Green
20180615:16:09:26:031986 gptext-state:mdw:gpadmin-[INFO]:-   demo       demo.wikipedia.articles   Green
20180615:16:09:26:031986 gptext-state:mdw:gpadmin-[INFO]:-Done.

This command reports the status of the VMware Greenplum Text nodes and status of each VMware Greenplum Text index.
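To pick out indexes that need attention from saved gptext-state -D output, the state column can be filtered. The sketch below is an illustration only: the function name non_green_indexes is hypothetical, and it assumes that unhealthy states appear in the state column as the words Yellow or Red, in the same position as Green in the sample above.

```shell
# List indexes whose state is Yellow or Red from saved `gptext-state -D`
# output read from standard input.
# Assumption: index rows have exactly five whitespace-separated fields:
#   <timestamp> <log-prefix> <database> <index-name> <state>
non_green_indexes() {
    awk 'NF == 5 && tolower($5) ~ /^(yellow|red)$/ { print $4, $5 }'
}
```

The function prints nothing when every listed index is Green.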

Run gptext-state list to view just the indexes.

The gptext-state healthcheck command checks the VMware Greenplum Text configuration files, the index status, required disk space, user privileges, and index and database consistency. By default, the required disk space check passes if there is at least 20% disk free. You can set a different disk free threshold using the --disk_free option. For example:

[gpadmin@gpdb-sandbox ~]$ gptext-state healthcheck --disk_free=25
20160629:15:45:24:669652 gptext-state:gpdb-sandbox:gpadmin-[INFO]:-Execute healthcheck on GPText cluster!
20160629:15:45:24:669652 gptext-state:gpdb-sandbox:gpadmin-[INFO]:-Check GPText config files ...
20160629:15:45:24:669652 gptext-state:gpdb-sandbox:gpadmin-[INFO]:-GOOD
20160629:15:45:24:669652 gptext-state:gpdb-sandbox:gpadmin-[INFO]:-Check GPText index status ...
20160629:15:45:25:669652 gptext-state:gpdb-sandbox:gpadmin-[INFO]:-GOOD
20160629:15:45:25:669652 gptext-state:gpdb-sandbox:gpadmin-[INFO]:-Checking for required disk space...
20160629:15:45:25:669652 gptext-state:gpdb-sandbox:gpadmin-[INFO]:-GOOD
20160629:15:45:25:669652 gptext-state:gpdb-sandbox:gpadmin-[INFO]:-Checking for required user privileges...
20160629:15:45:25:669652 gptext-state:gpdb-sandbox:gpadmin-[INFO]:-GOOD
20160629:15:45:25:669652 gptext-state:gpdb-sandbox:gpadmin-[INFO]:-Checking for indexes and database consistency...
20160629:15:45:27:669652 gptext-state:gpdb-sandbox:gpadmin-[INFO]:-GOOD
20160629:15:45:27:669652 gptext-state:gpdb-sandbox:gpadmin-[INFO]:-Done.
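The disk space threshold amounts to a simple comparison. The following sketch only illustrates the rule described above and is not how gptext-state implements the check; the function name disk_free_ok is hypothetical.

```shell
# Succeed when the free percentage meets the required threshold.
# Arguments: <used-percent> [<required-free-percent>]
# The threshold defaults to 20, matching the default described in the text.
disk_free_ok() {
    used=$1
    required=${2:-20}
    [ $(( 100 - used )) -ge "$required" ]
}
```

A file system that is 78% used has 22% free, so it passes the default check but would fail with a threshold of 25. The used percentage could be read from the capacity column of df -P output for the relevant file system.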

See the gptext-state utility reference for additional options.

Starting or Stopping SolrCloud Nodes

Starting with VMware Greenplum Text 3.6.0, you can start and stop individual SolrCloud nodes or a group of nodes.

To stop a SolrCloud node, run the gptext-stop command:

$ gptext-stop --nodes "mdw:18983_solr, sdw1:18983_solr"

Where:

  • -n|--nodes is a comma-separated list of nodes to stop. The node name is specified in the format <host>:<port>_solr.

The gptext-stop command is interactive and requires y or n user input to continue, similar to:

$ gptext-stop -n "test-server3:18983_solr, test-server3:18984_solr"
20210120:03:34:36:010966 gptext-stop:test-server:gpadmin-[INFO]:-Execute GPText cluster stop.
20210120:03:34:36:010966 gptext-stop:test-server:gpadmin-[INFO]:-Check zookeeper cluster state ...
20210120:03:34:37:010966 gptext-stop:test-server:gpadmin-[WARNING]:-Stop some of the Solr nodes might make some indices turns into yellow/red state. Replica recovery is expected after the nodes are up, please make sure there is no new data indexing during the nodes restart.
Solr nodes will be stopped. Do you want to continue ? (y/n): y

To start a SolrCloud node, run the gptext-start command:

$ gptext-start --nodes "mdw:18983_solr, sdw1:18983_solr"

Where:

  • -n|--nodes is a comma-separated list of nodes to start. The node name is specified in the format <host>:<port>_solr.
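When several hosts run a Solr node on the same port, the node list can be assembled rather than typed by hand. This is a minimal sketch with a hypothetical function name, solr_nodes_arg; it separates entries with a comma and a space, as in the examples above.

```shell
# Build a node list in the <host>:<port>_solr format.
# Arguments: <port> <host> [<host> ...]
solr_nodes_arg() {
    port=$1
    shift
    list=
    for host in "$@"; do
        # Prefix a separator only when the list already has an entry.
        list="${list:+$list, }${host}:${port}_solr"
    done
    printf '%s\n' "$list"
}
```

For example, gptext-stop --nodes "$(solr_nodes_arg 18983 mdw sdw1)" passes the same node list as the gptext-stop example above.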

Recovering VMware Greenplum Text Nodes

Use the gptext-recover utility to recover down VMware Greenplum Text nodes, for example after a failed VMware Greenplum segment host is recovered.

With no arguments, the gptext-recover utility discovers down VMware Greenplum Text nodes and restarts them.

With the -f (or --force) option, if a VMware Greenplum Text node cannot be restarted and no shards are down, the node is deleted and created again on the same host. Missing replicas are added and the failed node and failed replicas are removed. If the index is in a red state, gptext-recover -f prints a message and exits.

The -H (--new_hosts) option allows recreating down VMware Greenplum Text nodes on new hosts that replace failed hosts. The down VMware Greenplum Text nodes are deleted and recreated on the new hosts. The argument to the -H option is a comma-separated list of the new hosts that are to replace the failed hosts. The number of new hosts must match the number of failed hosts. If shards are down, it advises reindexing. If only some replicas are down, it recreates the replicas on the new hosts and updates gptext.conf.

The -r option recovers replicas, but does not attempt to recover any down nodes.

Note: Before recovering VMware Greenplum Text nodes on newly added hosts, ensure that the following VMware Greenplum Text prerequisites have been installed on the host:

  • Java 1.8
  • Python 2.6 or greater
  • The Linux lsof utility

Viewing Solr Index Statistics

You can view Solr index statistics by running the gptext-state utility from the command line.

To list all VMware Greenplum Text indexes, enter the following command at the command line:

gptext-state list

A command line that retrieves all statistics for an index:

gptext-state --index demo.wikipedia.articles

A command line that retrieves the number of documents in an index:

gptext-state --index demo.wikipedia.articles --stats_columns=num_docs

A command line that retrieves num_docs, index size, and the date and time last_modified:

gptext-state --index demo.wikipedia.articles --stats_columns num_docs,size,last_modified

Backing Up and Restoring VMware Greenplum Text Indexes

With the gptext-backup management utility, you can back up a VMware Greenplum Text index so that, if needed, you can quickly recover from a failure. The backup can be restored to the same VMware Greenplum Text system or to another system with the same number of VMware Greenplum segments.

The gptext-backup management utility backs up an index and its configuration files to either a shared file system, which must be mounted on and writable by each host in the VMware Greenplum cluster, or to local storage on the VMware Greenplum master and segment hosts.

Backing Up to a Shared File System

To back up to a shared file system, use the -p (--path) command-line option to specify the location of a directory on the mounted file system and the -n (--name) option to provide a name for the backup. Specify the index to back up with the -i (--index) option.

$ gptext-backup -i <index-name> -p <path> -n <backup-name>

The gptext-backup utility then checks that:

  • the VMware Greenplum Text cluster is up
  • the shared file system is valid
  • the backup name specified with the -n option does not already exist in the directory specified with the -p option

The utility creates the new directory and then saves one copy of each index shard to that directory, along with the index's configuration files from ZooKeeper.
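Two of the conditions listed above can be checked locally before running the backup. The following sketch is an illustration, not the utility's own logic: the function name backup_precheck is hypothetical, and it tests only the host where it runs, whereas the shared file system must be mounted on and writable by every host.

```shell
# Local pre-flight check before a shared file system backup.
# Arguments: <path> <backup-name>
# Succeeds when <path> is a writable directory and <backup-name> does not
# already exist inside it.
backup_precheck() {
    path=$1
    name=$2
    if [ ! -d "$path" ] || [ ! -w "$path" ]; then
        echo "path is not a writable directory: $path" >&2
        return 1
    fi
    if [ -e "$path/$name" ]; then
        echo "backup name already exists: $path/$name" >&2
        return 1
    fi
    return 0
}
```

A wrapper script could call backup_precheck <path> <backup-name> first and run gptext-backup only if the check succeeds.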

To save the configuration files only, with no data, add the -c (--backup_conf) command-line option.

To restore an index from a shared file system, use the gptext-restore management utility. The VMware Greenplum Text system you restore to must be on a VMware Greenplum cluster with the same number of segments. The database and schema for the index must be present.

The -i (--index) option specifies the name of the VMware Greenplum Text index that will be restored. If the index exists, you must first drop it with the gptext.drop_index() user-defined function.

The -p (--path) option specifies the location of the directory containing the backup files—the directory that gptext-backup created on the shared file system.

$ gptext-restore -i <index-name> -p <path>

You can add the -c option to restore only the configuration files to ZooKeeper and create an empty VMware Greenplum Text index, without restoring any saved index data.

Backing Up to Local Storage

To back up to local storage on the VMware Greenplum cluster, add the local keyword to the gptext-backup command line.

A local VMware Greenplum Text backup has a unique name constructed by appending a timestamp to the index name. You do not use the -n option with local backups.

$ gptext-backup local -i <index-name>

On the master host, in the master data directory by default, the backup utility saves a JSON file with backup metadata and a directory containing the index's configuration files from ZooKeeper.

The utility backs up each index shard on the VMware Greenplum segment host with the VMware Greenplum Text node that manages the shard's lead replica. By default, the shard backup files are saved in a segment data directory.

The gptext-backup command output reports the locations of all backup files.

You can add the -p (--path) option to the gptext-backup command to specify a local directory where the backup will be saved. The directory must be present on every VMware Greenplum host and must be writable by the gpadmin user.

$ gptext-backup local -i <index-name> -p <path>

The backup files will be saved in the specified directory on each host instead of in the VMware Greenplum master and segment data directories.

To restore a backup saved to local storage, add the local keyword to the gptext-restore command line and specify the path to the backup directory on the master host.

$ gptext-restore local -p <path>

The <path> is the full path to the directory the gptext-backup command created on the master host, including the timestamp, for example $MASTER_DATA_DIRECTORY/demo.twitter.message_2018-05-08T15:32:21.397779.
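Because a local backup directory is named by appending a timestamp to the index name, both parts can be recovered from the path. This sketch uses hypothetical helper names (backup_index_name and backup_timestamp) and assumes the timestamp follows the last underscore in the directory name, as in the example path above, and that the path has no trailing slash.

```shell
# Print the index name encoded in a local backup directory path.
backup_index_name() {
    base=${1##*/}              # strip leading directories
    printf '%s\n' "${base%_*}" # drop the trailing _<timestamp>
}

# Print the timestamp encoded in a local backup directory path.
backup_timestamp() {
    base=${1##*/}
    printf '%s\n' "${base##*_}" # keep only what follows the last underscore
}
```

For the example path above, backup_index_name prints demo.twitter.message and backup_timestamp prints 2018-05-08T15:32:21.397779.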

See the gptext-backup reference for syntax and examples for running gptext-backup. See the gptext-restore reference for syntax and examples for running gptext-restore.

Expanding the VMware Greenplum Text Cluster

The gptext-expand management utility adds VMware Greenplum Text nodes to the cluster. There are two ways to add nodes:

  • Add VMware Greenplum Text nodes to existing hosts in the cluster. This option increases the number of VMware Greenplum Text nodes on each host.
  • Add VMware Greenplum Text nodes to new hosts added by using the VMware Greenplum gpexpand management utility to expand the VMware Greenplum system.

Adding VMware Greenplum Text Nodes to Existing Segment Hosts

To add nodes to existing segment hosts, run the gptext-expand utility with a command like the following:

gptext-expand -e -p /data1/nodes,/data2/nodes

This example adds two VMware Greenplum Text nodes to each host.

The -e (--existing) option specifies that nodes are to be added to existing hosts.

The -p (--expand_paths) option provides a list of directories where the new nodes' data directories are to be created. These should be the same directories that contain the VMware Greenplum segment data directories and existing VMware Greenplum Text data directories. The number of directories in the list is the number of new nodes that are added.

A directory can be repeated in the directory list multiple times to increase the number of new VMware Greenplum Text nodes to create. For example, if there is currently one VMware Greenplum Text node per host in the /data1/nodes directory, you could add three nodes with a command like the following:

gptext-expand -e -p /data1/nodes,/data2/nodes,/data2/nodes

This adds one node to the /data1/nodes directory and two nodes to the /data2/nodes directory so there are two VMware Greenplum Text nodes in each directory.
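Before running the command, it can help to confirm how many new nodes each directory will receive. The following sketch counts how often each directory appears in the list passed to -p; the function name expand_plan is hypothetical, and directory names are assumed to contain no spaces or commas.

```shell
# Print "<directory> <new-node-count>" for each directory in a
# comma-separated list of expansion paths.
expand_plan() {
    printf '%s\n' "$1" | tr ',' '\n' | sort | uniq -c | awk '{ print $2, $1 }'
}
```

For the list /data1/nodes,/data2/nodes,/data2/nodes, the function reports one new node for /data1/nodes and two for /data2/nodes, matching the description above.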

Adding VMware Greenplum Text nodes affects new indexes, but not existing indexes. Replicas for new indexes will be distributed across all of the nodes, including both old nodes and the newly created nodes. Replicas for indexes that existed before running gptext-expand are not automatically moved. You can use the gptext-rebalance command to relocate replicas to new nodes.

Adding VMware Greenplum Text Nodes to New Hosts

Check that the following VMware Greenplum Text prerequisites are installed on each new host added to the VMware Greenplum cluster:

  • Java 1.8
  • Python 2.6 or greater
  • Linux lsof utility

New hosts must be reachable by all hosts in the VMware Greenplum Text cluster, including existing hosts and the new hosts you are adding.

After expanding the VMware Greenplum cluster with the gpexpand management utility, call gptext-expand with the -H (--new_hosts) option and a list of the new hosts on which to install VMware Greenplum Text:

gptext-expand -H newhost1,newhost2

The gptext-expand utility installs VMware Greenplum Text binaries on the new hosts and then creates new VMware Greenplum Text nodes on the new hosts.

Newly created indexes will automatically be distributed among the new nodes. You can use the gptext-rebalance command to relocate replicas to new nodes.

Rebalancing Replicas and Replica Leaders

When you expand the VMware Greenplum Text cluster with new nodes, rebalance the replicas to the new nodes, and rebalance the replica leaders.

Use gptext-rebalance index to rebalance the replicas for a specific index across all VMware Greenplum Text nodes.

$ gptext-rebalance index -i demo.public.test 

See the gptext-rebalance reference for more details about the options and the rebalance rules.

When some SolrCloud cluster nodes have more replica leaders than other nodes, use the gptext-rebalance leader command to balance the leaders across the nodes.

To verify the state of the leaders in an index called demo.public.test, use a SQL command like:

SELECT index_name, core, node_name, is_leader 
FROM gptext.index_status()
WHERE index_name='demo.public.test';

The output is similar to:

    index_name     |                core                 |     node_name      | is_leader
-------------------+-------------------------------------+--------------------+-----------
 demo.public.test | demo.public.test_shard0_replica_n1 | gpadmin:18983_solr | t
 demo.public.test | demo.public.test_shard0_replica_n2 | gpadmin:18984_solr | f
 demo.public.test | demo.public.test_shard1_replica_n4 | gpadmin:18984_solr | f
 demo.public.test | demo.public.test_shard1_replica_n7 | gpadmin:18983_solr | t

In this example, node 18983_solr holds two replica leaders and node 18984_solr holds none. Rebalance the leaders across the nodes using:

$ gptext-rebalance leader -i demo.public.test

The leaders are spread across the nodes similar to:

    index_name     |                core                 |     node_name      | is_leader
-------------------+-------------------------------------+--------------------+-----------
 demo.public.test | demo.public.test_shard0_replica_n1 | gpadmin:18983_solr | f
 demo.public.test | demo.public.test_shard0_replica_n2 | gpadmin:18984_solr | t
 demo.public.test | demo.public.test_shard1_replica_n4 | gpadmin:18984_solr | f
 demo.public.test | demo.public.test_shard1_replica_n7 | gpadmin:18983_solr | t
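The leader distribution can also be tallied from saved query output. This is a minimal sketch with a hypothetical function name, leaders_per_node; it assumes the four-column, pipe-separated psql layout shown above with is_leader as the last column.

```shell
# Count replica leaders per node from saved psql output of the
# gptext.index_status() query, read from standard input.
# Assumption: data rows have four pipe-separated columns and the last
# column holds t or f.
leaders_per_node() {
    awk -F'|' '
        NF == 4 && $4 ~ /^ *[tf] *$/ {
            node = $3
            gsub(/^ +| +$/, "", node)   # trim padding around the node name
            seen[node] = 1
            if ($4 ~ /t/) leaders[node]++
        }
        END { for (n in seen) printf "%s %d\n", n, leaders[n] }
    ' | sort
}
```

Applied to the output after rebalancing, each of the two nodes is reported with one leader.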

Troubleshooting

VMware Greenplum Text errors are of the following types:

  • Solr errors
  • gptext errors

Most of the Solr errors are self-explanatory.

gptext errors are caused by misuse of a function or utility. They provide a message that tells you when you have used an incorrect function or argument.

Monitoring Logs

You can examine the VMware Greenplum and Solr logs for more information if errors occur. VMware Greenplum logs reside in:

segment-directory/pg_log

Solr logs reside in:

<GPDB path>/solr/logs

Determining Segment Status with gptext-state

Use the gptext-state utility to determine if any primary or mirror segments are down. See gptext-state in the VMware Greenplum Text Management Utilities Reference.
