Prerequisites

The VMware Greenplum Text installation includes the installation of Apache Solr Cloud and, optionally, Apache ZooKeeper.

If you are installing a new VMware Greenplum Text release into an existing VMware Greenplum Text system, follow the instructions in Upgrading VMware Greenplum Text instead.

The following are the VMware Greenplum Text installation prerequisites:

  • Install and configure your VMware Greenplum system, version 4.3.6 or higher. See Installing and Upgrading Greenplum.
  • VMware Greenplum Text runs on Red Hat Enterprise Linux or CentOS 5.x, 6.x, 7.x, or 8.x.
  • VMware Greenplum Text cannot be installed onto a shared NFS mount.
  • Install a JRE 1.8 on all hosts in the cluster.
  • Ensure that nc (netcat) is installed on all Greenplum cluster hosts (sudo yum install nc).
  • Installing lsof on all cluster hosts is recommended (sudo yum install lsof).
  • VMware Greenplum Text nodes can be installed on the VMware Greenplum cluster hosts alongside the Greenplum segments or on additional, non-database hosts accessible on the Greenplum cluster network. All hosts participating in the VMware Greenplum Text system must have the same operating system and configuration, and must allow passwordless SSH access for the gpadmin user. See the VMware Greenplum Installation Guide for instructions to configure hosts.
  • If you plan to place VMware Greenplum Text nodes on the VMware Greenplum segment hosts, ensure that you reserve memory for VMware Greenplum Text use when you configure VMware Greenplum. To determine the memory to set aside for VMware Greenplum Text, multiply the number of VMware Greenplum Text nodes to create on each Greenplum segment host by the JVM maximum size. Subtract this memory from the physical RAM when calculating the value for the VMware Greenplum gp_vmem_protect_limit server configuration parameter. See the VMware Greenplum server configuration parameter gp_vmem_protect_limit in the VMware Greenplum reference documentation for recommended memory calculation formulas or visit the GPDB Virtual Memory Calculator web site.
  • Apache Solr requires a ZooKeeper cluster with at minimum three nodes. You can install a "binding" ZooKeeper cluster with VMware Greenplum Text on the Greenplum cluster hosts, or you can use an existing ZooKeeper cluster. When deployed alongside VMware Greenplum segments, ZooKeeper performance can be affected under heavy database load. For best performance, install a ZooKeeper cluster with at least three nodes (five nodes recommended) on separate hosts with network connectivity to the Greenplum network. See ZooKeeper Best Practices for more information about optimizing ZooKeeper performance.
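
The memory reservation rule in the list above is simple arithmetic. The following is a minimal sketch with illustrative numbers only: the node count, JVM maximum size, and physical RAM are assumptions, not recommendations, and the variable names are hypothetical. The result is only the starting figure to feed into the gp_vmem_protect_limit formulas in the VMware Greenplum documentation.

```shell
# Illustrative values only (assumptions, not recommendations).
GPTEXT_NODES_PER_HOST=2      # VMware Greenplum Text nodes on each segment host
JVM_MAX_MB=2048              # JVM maximum size per node, for example -Xmx2048M
PHYSICAL_RAM_MB=65536        # physical RAM on the segment host (64 GB)

# Memory to set aside = nodes per host x JVM maximum size.
GPTEXT_RESERVED_MB=$((GPTEXT_NODES_PER_HOST * JVM_MAX_MB))

# RAM left over as the input to the gp_vmem_protect_limit calculation.
RAM_FOR_GREENPLUM_MB=$((PHYSICAL_RAM_MB - GPTEXT_RESERVED_MB))

echo "Reserved for Greenplum Text: ${GPTEXT_RESERVED_MB} MB"
echo "RAM for Greenplum sizing:    ${RAM_FOR_GREENPLUM_MB} MB"
```

With these example values, 4096 MB is reserved and 61440 MB remains for the VMware Greenplum memory calculation.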

Note: VMware Greenplum Text uses a temporary directory to extract and process intermediate files during installation and deployment. The default location is /tmp. If disk space or permission issues in your environment prevent the use of /tmp, specify an alternate directory by passing the -t <temp-dir> option to the commands.
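
A quick preflight check on each host can catch missing tools or a full temporary directory before the installer runs. The sketch below is one way to script such a check and is not part of the product: it looks for java, nc, and lsof on the PATH with the standard command -v builtin and reports free space with df. The function names and the TMP_DIR variable are hypothetical.

```shell
# Print the name of every listed tool that is not found on the PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Print the free space, in kilobytes, of the file system holding a directory.
free_kb() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

TMP_DIR=${TMP_DIR:-/tmp}     # /tmp is the default temporary directory

# java and nc are required; lsof is only recommended.
MISSING=$(missing_tools java nc lsof)
if [ -n "$MISSING" ]; then
    # $MISSING is left unquoted so the names print on one line.
    echo "Not found on $(uname -n):" $MISSING
else
    echo "java, nc, and lsof found on $(uname -n)"
fi
echo "Free space in ${TMP_DIR}: $(free_kb "$TMP_DIR") KB"
```

The check only confirms that a java executable is present; it does not verify that the version is 1.8.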

Install the VMware Greenplum Text Binary Distribution

  1. On the Greenplum master host, extract the VMware Greenplum Text distribution file. For example:

    $ cd /home/gpadmin
    $ tar xvfz greenplum-text-<version>-<platform>.tar.gz
    

    This creates the directory greenplum-text-<version>-<platform>, which contains two files: gptext_install_config and the VMware Greenplum Text installation binary, named greenplum-text-<version>-<platform>.bin.

  2. If necessary, grant execute permission to the VMware Greenplum Text binary. For example:

    $ chmod +x /home/gpadmin/greenplum-text-<version>-<platform>.bin
    
  3. If you are installing VMware Greenplum Text in a parent directory that is not writable by the gpadmin user, you must create the installation directories on each VMware Greenplum Text host machine and set ownership and permissions to allow the gpadmin user write access to the directories.

    For example, if you are installing VMware Greenplum Text in the default directory, /usr/local/greenplum-text-<version>, execute these commands on each host as root (or as gpadmin using sudo):

    mkdir /usr/local/greenplum-text-<version>
    mkdir /usr/local/greenplum-solr
    chown gpadmin:gpadmin /usr/local/greenplum-text-<version>
    chmod 775 /usr/local/greenplum-text-<version>
    chown gpadmin:gpadmin /usr/local/greenplum-solr
    chmod 775 /usr/local/greenplum-solr
    

    Note: You can use the VMware Greenplum gpssh command-line utility to execute these commands in parallel on all hosts if the gpadmin user has sudo privilege or if the root user has passwordless SSH access to all hosts. See the gpssh command reference in the VMware Greenplum utilities documentation for details.

    Complete the remaining steps as the gpadmin user.

  4. Edit the gptext_install_config file to set parameters for the installation. See Set Installation Parameters for details. Review the GPTEXT_ENABLE_USER_AUTH parameter, which controls user authentication for the SolrCloud web user interface, before you install. Enabling user authentication after installation, while the cluster is running, is a disruptive process.

  5. Run the VMware Greenplum Text installation binary as gpadmin on the master server:

    $ ./greenplum-text-<version>-<platform>.bin -c <gptext_install_config>
    
  6. Accept the license agreement and respond to the installer's prompts.

Optional Two-Part VMware Greenplum Text Installation

The VMware Greenplum Text two-part installation installs and deploys the VMware Greenplum Text software in separate steps. This gives you the option to install the software files to a read-only, shared directory mounted on all VMware Greenplum Text hosts in the cluster, rather than installing the software on every VMware Greenplum Text host.

If you install the VMware Greenplum Text software onto a shared drive, you must set the GPTEXT_CUSTOM_CONFIG_DIR parameter in the installation configuration file. This parameter specifies a writable directory that exists on every VMware Greenplum Text host where VMware Greenplum Text can store configuration files for external data sources. See VMware Greenplum Text installation parameters for more information about this parameter.

Run the VMware Greenplum Text installation in two parts by following the steps in this section.

  1. Prepare the VMware Greenplum Text installation directories as described in steps 1 through 3 of Install the VMware Greenplum Text Binary Distribution.

  2. Run the VMware Greenplum Text installation binary as gpadmin on the master server:

    $ ./greenplum-text-<version>-<platform>.bin -b
    

    Note that the -c <gptext_install_config> option is omitted.

  3. Source the VMware Greenplum Text environment script in the VMware Greenplum Text installation directory:

    $ source <gptext-install-dir>/greenplum-text_path.sh
    
  4. Edit the gptext_install_config file to set parameters for the VMware Greenplum Text deployment. See Set Installation Parameters for details. Be sure to uncomment and set the GPTEXT_CUSTOM_CONFIG_DIR parameter if you installed the software on a read-only drive. Also review the GPTEXT_ENABLE_USER_AUTH parameter, which controls user authentication for the SolrCloud web user interface. Enabling user authentication after installation, while the cluster is running, is a disruptive process.

  5. Deploy the VMware Greenplum Text cluster with the gptext-deploy command. The command requires the -c option to specify the installation configuration file. If you installed the VMware Greenplum Text software to a shared drive mounted on all VMware Greenplum Text hosts, also include the -m option. Without -m, gptext-deploy copies the VMware Greenplum Text software to all VMware Greenplum Text hosts.

    $ gptext-deploy -m -c <gptext_install_config>
    

Set Installation Parameters

A VMware Greenplum Text configuration file named gptext_install_config contains parameters to configure the VMware Greenplum Text installation. Edit the file and set the parameters as described in the following section.

Note: The GPTEXT_HOSTS and DATA_DIRECTORY installation parameters determine the number of VMware Greenplum Text nodes that are deployed.

The maximum number of VMware Greenplum Text nodes supported is 960. The best practice recommendation is to deploy fewer VMware Greenplum Text nodes with more memory rather than to divide the memory available to VMware Greenplum Text among a larger number of VMware Greenplum Text nodes. For example, if there are eight primary segments per host in the VMware Greenplum cluster, you should test with two or four VMware Greenplum Text nodes per host, adjusting the JAVA_OPTS installation parameter to divide the memory reserved for VMware Greenplum Text among them.
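
The relationship between the two parameters and the node limit can be sketched as follows. This is an illustration, not installer code: it assumes every GPTEXT_HOSTS entry is a distinct physical host, and it uses plain space-separated strings with hypothetical variable names (HOSTS, DATA_DIRS) in place of the arrays in gptext_install_config so that it runs in any POSIX shell.

```shell
# Example values mirroring the parameter examples in this document.
HOSTS="gptext_h1 gptext_h2 gptext_h3"
DATA_DIRS="/data/primary /data/primary"
MAX_NODES=960                # maximum number of supported nodes

# Print the number of arguments received.
count_items() {
    echo "$#"
}

# Word splitting of the unquoted variables is intentional here.
HOST_COUNT=$(count_items $HOSTS)
NODES_PER_HOST=$(count_items $DATA_DIRS)
TOTAL_NODES=$((HOST_COUNT * NODES_PER_HOST))

echo "${HOST_COUNT} hosts x ${NODES_PER_HOST} nodes per host = ${TOTAL_NODES} nodes"
if [ "$TOTAL_NODES" -gt "$MAX_NODES" ]; then
    echo "Exceeds the supported maximum of ${MAX_NODES} nodes" >&2
fi
```

Three hosts with two data directories each give six VMware Greenplum Text nodes, well under the limit.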

VMware Greenplum Text installation parameters

GPTEXT_HOSTS
An array of host names that determines the hosts on which to install VMware Greenplum Text. You may use the constant "ALLSEGHOSTS" to install VMware Greenplum Text on all VMware Greenplum segment hosts. The gpadmin user must have passwordless SSH access to the VMware Greenplum Text hosts from all other hosts in the Greenplum cluster.
declare -a GPTEXT_HOSTS=(gptext_h1 gptext_h2 gptext_h3)
GPTEXT_HOSTS="ALLSEGHOSTS"
If you use the constant "ALLSEGHOSTS", the number of VMware Greenplum Text node hosts is the same as the number of Greenplum segment hosts. If GPTEXT_HOSTS is set to an array of host names, the length of the array is the number of VMware Greenplum Text node hosts.
DATA_DIRECTORY
An array of directory paths where VMware Greenplum Text data directories are to be created. The number of directories in the array determines the number of VMware Greenplum Text nodes that will be created on each physical host. If GPTEXT_HOSTS lists multiple interfaces per host, the VMware Greenplum Text nodes are spread evenly across the interface addresses.
declare -a DATA_DIRECTORY=(/data/primary /data/primary)
GPTEXT_CUSTOM_CONFIG_DIR
The path to a directory where VMware Greenplum Text stores uploaded external data source configuration files and custom libraries. If you do not set this parameter, the default is to store these files in the `share` subdirectory of the VMware Greenplum Text installation directory. If you do specify a directory with this parameter, the directory is created on every Solr host in the cluster, and external configuration files and custom libraries will be stored there, leaving the VMware Greenplum Text installation directory free from application data.
JAVA_OPTS
Sets the minimum and maximum memory each SolrCloud JVM can use.
JAVA_OPTS="-Xms1024M -Xmx2048M"
GPTEXT_ENABLE_USER_AUTH
Set this parameter to true to enable user authentication for the SolrCloud web user interface. The default user account is `solr`.
GPTEXT_ENABLE_USER_AUTH=True
GPTEXT_ADMIN_PWD
The password value for the SolrCloud web user interface. The password is applied to the default account `solr` when `GPTEXT_ENABLE_USER_AUTH=True`.
GPTEXT_ADMIN_PWD=mypassword
GPTEXT_ADMIN_USER
The name of the SolrCloud web user account. The default is `solr`. Change this value to a user account of your preference. You may specify only a single SolrCloud web user account.
GPTEXT_ADMIN_USER=solr
GPTEXT_PORT_BASE
GP_MAX_PORT_LIMIT
Set a range of port numbers available to VMware Greenplum Text nodes. VMware Greenplum Text finds unused ports in the specified range.
GPTEXT_PORT_BASE=18983
GP_MAX_PORT_LIMIT=28983
SOLR_TIMEZONE
Sets the timezone for the Solr cluster. You may use three timezone formats:
1. GMT+offset, like SOLR_TIMEZONE="GMT+8".
2. GMT+/-long offset, like SOLR_TIMEZONE="GMT+0800".
3. TZ name, like SOLR_TIMEZONE="Asia/Shanghai". See [List of TZ database time zones](https://en.wikipedia.org/wiki/List_of_tz_database_time_zones) for a full list of the possible TZ name values.
SOLR_TIMEZONE="Asia/Tokyo"
If the timezone is not set, VMware Greenplum Text defaults to the timezone of the master host.
ZOO_CLUSTER
Whether to deploy a VMware Greenplum Text binding ZooKeeper cluster or use an existing ZooKeeper cluster. If set to "BINDING" the installation deploys a ZooKeeper cluster. To use an existing ZooKeeper cluster, set this parameter to a list of ZooKeeper nodes in the format "host1:port,host2:port,host3:port".
ZOO_CLUSTER="BINDING"
ZOO_HOSTS
If ZOO_CLUSTER is set to "BINDING", this parameter is an array of the hosts where the ZooKeeper nodes are to be installed. The array must contain 3, 5, or 7 host names, for example ZOO_HOSTS=(sdw1 sdw2 sdw3 sdw4 sdw5). If you are using a single host for ZooKeeper, specify it multiple times, for example, ZOO_HOSTS=(sdw1 sdw1 sdw1).
declare -a ZOO_HOSTS=(sdw1 sdw2 sdw3 sdw4 sdw5)
ZOO_DATA_DIR
The ZooKeeper data directory, required when ZOO_CLUSTER is set to "BINDING".
ZOO_DATA_DIR="/data/master/"
ZOO_GPTXTNODE
The node path in ZooKeeper for VMware Greenplum Text. This parameter is required whether ZOO_CLUSTER is set to "BINDING" or a list of hosts.
ZOO_GPTXTNODE="gptext"
ZOO_PORT_BASE
ZOO_MAX_PORT_LIMIT
A range of port numbers to use for the ZooKeeper cluster. Unused ports are allocated from within this range. The range must contain at least 4000 port numbers.
ZOO_PORT_BASE=2188
ZOO_MAX_PORT_LIMIT=12188
GPTEXT_JAVA_HOME
The home directory of the Java installation used to run the ZooKeeper and Solr processes. If not set, the JRE found through the PATH and JAVA_HOME environment variables is used.
GPTEXT_JAVA_HOME=/usr/java/jdk1.8.0_131
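
Two of the ZooKeeper constraints above are easy to check before running the installer. The sketch below uses hypothetical helper functions that are not part of VMware Greenplum Text. It assumes the size of the port range is the difference between ZOO_MAX_PORT_LIMIT and ZOO_PORT_BASE, and it uses a space-separated string in place of the ZOO_HOSTS array so that it runs in any POSIX shell.

```shell
# Succeed when the number of ZooKeeper hosts passed in is 3, 5, or 7.
valid_zoo_host_count() {
    case "$#" in
        3|5|7) return 0 ;;
        *)     return 1 ;;
    esac
}

# Succeed when the range from base ($1) to limit ($2) spans at least 4000 ports.
# The span is computed as limit - base, which is an assumption.
valid_zoo_port_range() {
    [ $(( $2 - $1 )) -ge 4000 ]
}

# Example values from the parameter descriptions above.
ZOO_HOSTS="sdw1 sdw2 sdw3 sdw4 sdw5"
ZOO_PORT_BASE=2188
ZOO_MAX_PORT_LIMIT=12188

# $ZOO_HOSTS is unquoted on purpose so each host becomes one argument.
if valid_zoo_host_count $ZOO_HOSTS; then
    echo "ZOO_HOSTS count is valid"
else
    echo "ZOO_HOSTS must list 3, 5, or 7 hosts" >&2
fi

if valid_zoo_port_range "$ZOO_PORT_BASE" "$ZOO_MAX_PORT_LIMIT"; then
    echo "ZooKeeper port range is large enough"
else
    echo "ZooKeeper port range must contain at least 4000 ports" >&2
fi
```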

Starting VMware Greenplum Text

First, make sure the VMware Greenplum Text command-line utilities are in your path by sourcing the VMware Greenplum and VMware Greenplum Text environment scripts. It is important to source the VMware Greenplum Text environment script each time you source the VMware Greenplum script. For example:

$ source /usr/local/greenplum-db-<version>/greenplum_path.sh
$ source /usr/local/greenplum-text-<version>/greenplum-text_path.sh

To use VMware Greenplum Text in a database, you must first use the gptext-installsql management utility to install the VMware Greenplum Text user-defined functions and other objects in the database:

$ gptext-installsql database [database2 ... ]

The VMware Greenplum Text objects are created in the gptext schema.

The ZooKeeper cluster must be running before you start VMware Greenplum Text. If you installed a binding ZooKeeper cluster, start it with the zkManager command-line utility.

$ zkManager start

Start VMware Greenplum Text with the gptext-start utility.

$ gptext-start
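
The first-time start sequence above can be wrapped in a small script. The following is a sketch under stated assumptions, not a supplied utility: it assumes both environment scripts have already been sourced, that a binding ZooKeeper cluster is in use, and that the commands run in the order this section presents them. The run and first_start_gptext functions, the DRY_RUN variable, and the database name demo are all hypothetical.

```shell
# Run a command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# First-time start sequence; stops at the first step that fails.
# Arguments: one or more database names for gptext-installsql.
first_start_gptext() {
    run gptext-installsql "$@" &&
    run zkManager start &&
    run gptext-start
}

# Preview the commands without running them.
DRY_RUN=1
first_start_gptext demo
```

With DRY_RUN=1 the script only prints the three commands, each prefixed with "+". Unset DRY_RUN to run them for real.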

Configure VMware Greenplum

VMware Greenplum Text configuration parameters are saved in ZooKeeper. You can, however, view and set VMware Greenplum Text configuration parameters in a VMware Greenplum session using the SHOW and SET commands.

If you are using VMware Greenplum 4.3.x or 5.x, you must first declare the VMware Greenplum Text custom variable class by adding it to the VMware Greenplum custom_variable_classes configuration parameter. The custom_variable_classes parameter is removed in VMware Greenplum 6, so this step is unnecessary if you have VMware Greenplum 6.

The custom_variable_classes configuration parameter is a comma-separated list of class names. It is unset by default. To see if any custom variable classes have already been configured, run this gpconfig command at the command line.

$ gpconfig -s custom_variable_classes

If no custom variable classes have been set, set the parameter with the following command.

$ gpconfig -c custom_variable_classes -v 'gptext'
20171029:12:29:11:028199 gpconfig:gpsne:gpadmin-[INFO]:-completed successfully

If other classes have been configured, add gptext to the existing list, separated by a comma.

Run gpstop -u to have VMware Greenplum reload the configuration file.
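
When other classes are already configured, the new value is the existing list with gptext appended. The helper below is a hypothetical sketch of that string handling. It assumes the current value contains no spaces, and the class name plr is only an example.

```shell
# Print the class list with gptext appended, unless it is already present.
# Argument: the current comma-separated custom_variable_classes value (may be empty).
with_gptext_class() {
    case ",$1," in
        ,,)          echo "gptext" ;;      # nothing configured yet
        *,gptext,*)  echo "$1" ;;          # gptext already listed
        *)           echo "$1,gptext" ;;   # append to the existing list
    esac
}

with_gptext_class ""             # prints: gptext
with_gptext_class "plr"          # prints: plr,gptext
with_gptext_class "plr,gptext"   # prints: plr,gptext
```

The printed value can then be passed to gpconfig -c custom_variable_classes -v in place of 'gptext'.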

View or set VMware Greenplum Text Configuration Parameters

When you want to view or set VMware Greenplum Text configuration parameters in a psql session, first execute the gptext.version() function to load the VMware Greenplum Text configuration parameters into the session.

=#  SELECT gptext.version();
           version
--------------------------------
 Greenplum Text Analytics 3.2.0
(1 row)

=# SHOW gptext.idx_delim;
 gptext.idx_delim
------------------
 ,
(1 row)

See Setting VMware Greenplum Text Configuration Parameters for more about VMware Greenplum Text configuration parameters.

Uninstalling VMware Greenplum Text

To uninstall VMware Greenplum Text, run the gptext-uninstall utility. You must have superuser permissions on all databases with VMware Greenplum Text schemas to run gptext-uninstall.

gptext-uninstall runs only if there is at least one database with a VMware Greenplum Text schema.

Execute:

$ gptext-uninstall