Describes how to install the Greenplum Database software binaries on all of the hosts that will comprise your Greenplum Database system, how to enable passwordless SSH for the gpadmin
user, and how to verify the installation.
Perform the following tasks in order:
Parent topic: Installing and Upgrading Greenplum
You must install Greenplum Database on each host machine of the Greenplum Database system.
VMware distributes the Greenplum Database software as a downloadable package that you install on each host system with the operating system's package management system.
Open source Greenplum Database releases are available as: source code tarballs, RPM installers for CentOS, and DEB packages for Debian and Ubuntu. See https://greenplum.org for links to source code and instructions to compile Greenplum Database from source, and for links to download pre-built binaries in RPM and DEB format. For the Ubuntu operating system, Greenplum also offers a binary that can be installed via the apt-get
command with the Ubuntu Personal Package Archive system.
NoteFor more information about download prerequisites, troubleshooting, and instructions, see Download Broadcom products and software.
You can download the Greenplum Database Server software package from Broadcom Support Portal under the desired Greemplum release.
Be sure to note the name and the file system location of the downloaded file.
VMware generates a SHA2 fingerprint for each Greenplum Database software download available from Broadcom Support Portal. This fingerprint enables you to verify that your downloaded file is unaltered from the original.
Follow the instructions in Verifying the VMware Greenplum Software Download to verify the integrity of the Greenplum Database Server software.
Before you begin installing Greenplum Database, be sure you have completed the steps in Configuring Your Systems to configure each of the master, standby master, and segment host machines for Greenplum Database.
ImportantAfter installing Greenplum Database, you must set Greenplum Database environment variables. See Setting Greenplum Environment Variables.
See Example Ansible Playbook for an example script that shows how you can automate creating the gpadmin
user and installing the Greenplum Database.
Follow these instructions to install Greenplum Database from a pre-built binary.
ImportantYou require sudo or root user access to install from a pre-built RPM or DEB file.
Download and copy the Greenplum Database package to the gpadmin
user's home directory on the master, standby master, and every segment host machine. The distribution file name has the format greenplum-db-<version>-<platform>.rpm
for RHEL, CentOS, and Oracle Linux systems, or greenplum-db-<version>-<platform>.deb
for Ubuntu systems, where <platform>
is similar to rhel7-x86_64
(Red Hat 7 64-bit).
NoteFor Oracle Linux installations, download and install the
rhel7-x86_64
distribution files.
With sudo (or as root), install the Greenplum Database package on each host machine using your system's package manager software.
For RHEL/CentOS systems, run the yum
command:
$ sudo yum install ./greenplum-db-<version>-<platform>.rpm
For Ubuntu systems, run the apt
command:
$ sudo apt install ./greenplum-db-<version>-<platform>.deb
The yum
or apt
command automatically installs software dependencies, copies the Greenplum Database software files into a version-specific directory under /usr/local
, /usr/local/greenplum-db-<version>
, and creates the symbolic link /usr/local/greenplum-db
to the installation directory.
Change the owner and group of the installed files to gpadmin
:
$ sudo chown -R gpadmin:gpadmin /usr/local/greenplum*
$ sudo chgrp -R gpadmin /usr/local/greenplum*
On RHEL/CentOS systems, you can use the rpm
command with the --prefix
option to install Greenplum Database to a non-default directory (instead of under /usr/local
). Note, however, that using rpm
does not automatically install Greenplum Database dependencies; you must manually install dependencies to each host system.
Follow these instructions to install Greenplum Database to a specific directory.
ImportantYou require sudo or root user access to install from a pre-built RPM file.
Download and copy the Greenplum Database package to the gpadmin
user's home directory on the master, standby master, and every segment host machine. The distribution file name has the format greenplum-db-<version>-<platform>.rpm
for RHEL and CentOS systems, or greenplum-db-<version>-<platform>.deb
for Ubuntu systems, where <platform>
is similar to rhel7-x86_64
(Red Hat 7 64-bit).
Manually install the Greenplum Database dependencies to each host system:
$ sudo yum install apr apr-util bash bzip2 curl krb5 libcurl libevent \
libxml2 libyaml zlib openldap openssh openssl openssl-libs perl readline rsync R sed tar zip
Use rpm
with the --prefix
option to install the Greenplum Database package to your chosen installation directory on each host machine:
$ sudo rpm --install ./greenplum-db-<version>-<platform>.rpm --prefix=<directory>
The rpm
command copies the Greenplum Database software files into a version-specific directory under your chosen <directory>
, <directory>/greenplum-db-<version>
, and creates the symbolic link <directory>/greenplum-db
to the versioned directory.
Change the owner and group of the installed files to gpadmin
:
$ sudo chown -R gpadmin:gpadmin <directory>/greenplum*
NoteAll example procedures in the Greenplum Database documentation assume that you installed to the default directory, which is
/usr/local
. If you install to a non-default directory, substitute that directory for/usr/local
.
If you install to a non-default directory using rpm
, you will need to continue using rpm
(and of yum
) to perform minor version upgrades; these changes are covered in the upgrade documentation.
The gpadmin
user on each Greenplum host must be able to SSH from any host in the cluster to any other host in the cluster without entering a password or passphrase (called "passwordless SSH"). If you enable passwordless SSH from the master host to every other host in the cluster ("1-n passwordless SSH"), you can use the Greenplum Database gpssh-exkeys
command-line utility to enable passwordless SSH from every host to every other host ("n-n passwordless SSH").
Log in to the master host as the gpadmin
user.
Source the path
file in the Greenplum Database installation directory.
$ source /usr/local/greenplum-db-<version>/greenplum_path.sh
NoteAdd the above
source
command to thegpadmin
user's.bashrc
or other shell startup file so that the Greenplum Database path and environment variables are set whenever you log in asgpadmin
.
Use the ssh-copy-id
command to add the gpadmin
user's public key to the authorized_hosts
SSH file on every other host in the cluster.
$ ssh-copy-id smdw
$ ssh-copy-id sdw1
$ ssh-copy-id sdw2
$ ssh-copy-id sdw3
. . .
This enables 1-n passwordless SSH. You will be prompted to enter the gpadmin
user's password for each host. If you have the sshpass
command on your system, you can use a command like the following to avoid the prompt.
$ SSHPASS=<password> sshpass -e ssh-copy-id smdw
In the gpadmin
home directory, create a file named hostfile_exkeys
that has the machine configured host names and host addresses (interface names) for each host in your Greenplum system (master, standby master, and segment hosts). Make sure there are no blank lines or extra spaces. Check the /etc/hosts
file on your systems for the correct host names to use for your environment. For example, if you have a master, standby master, and three segment hosts with two unbonded network interfaces per host, your file would look something like this:
mdw
mdw-1
mdw-2
smdw
smdw-1
smdw-2
sdw1
sdw1-1
sdw1-2
sdw2
sdw2-1
sdw2-2
sdw3
sdw3-1
sdw3-2
Run the gpssh-exkeys
utility with your hostfile_exkeys
file to enable n-n passwordless SSH for the gpadmin
user.
$ gpssh-exkeys -f hostfile_exkeys
To make sure the Greenplum software was installed and configured correctly, run the following confirmation steps from your Greenplum master host. If necessary, correct any problems before continuing on to the next task.
Log in to the master host as gpadmin
:
$ su - gpadmin
Use the gpssh
utility to see if you can log in to all hosts without a password prompt, and to confirm that the Greenplum software was installed on all hosts. Use the hostfile_exkeys
file you used to set up passwordless SSH. For example:
$ gpssh -f hostfile_exkeys -e 'ls -l /usr/local/greenplum-db-<version>'
If the installation was successful, you should be able to log in to all hosts without a password prompt. All hosts should show that they have the same contents in their installation directories, and that the directories are owned by the gpadmin
user.
If you are prompted for a password, run the following command to redo the ssh key exchange:
$ gpssh-exkeys -f hostfile_exkeys
greenplum_path.sh
— This file contains the environment variables for Greenplum Database. See Setting Greenplum Environment Variables.gpinitsystem
configuration files and host files that can be modified and used when installing and initializing a Greenplum Database system.