R Data Science Library Package

R packages are modules that contain R functions and data sets. Greenplum Database provides a collection of data science-related R libraries for use with the Greenplum Database PL/R language. You can download these libraries in .gppkg format from the Broadcom Support Portal, under the specific Greenplum release.

Note: For more information about download prerequisites, troubleshooting, and instructions, see Download Broadcom products and software.

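This chapter contains the following information:

  • R Data Science Libraries

  • Installing the R Data Science Library Package

  • Uninstalling the R Data Science Library Package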

For information about the Greenplum Database PL/R Language, see Greenplum PL/R Language Extension.


R Data Science Libraries

Libraries provided in the R Data Science package include:

  • abind

  • adabag

  • arm

  • assertthat

  • BH

  • bitops

  • car

  • caret

  • caTools

  • coda

  • colorspace

  • compHclust

  • curl

  • data.table

  • DBI

  • dichromat

  • digest

  • dplyr

  • e1071

  • flashClust

  • forecast

  • foreign

  • gdata

  • ggplot2

  • glmnet

  • gplots

  • gtable

  • gtools

  • hms

  • hybridHclust

  • igraph

  • labeling

  • lattice

  • lazyeval

  • lme4

  • lmtest

  • magrittr

  • MASS

  • Matrix

  • MCMCpack

  • minqa

  • MTS

  • munsell

  • neuralnet

  • nloptr

  • nnet

  • pbkrtest

  • plyr

  • quantreg

  • R2jags

  • R6

  • randomForest

  • RColorBrewer

  • Rcpp

  • RcppEigen

  • readr

  • reshape2

  • rjags

  • RobustRankAggreg

  • ROCR

  • rpart

  • RPostgreSQL

  • sandwich

  • scales

  • SparseM

  • stringi

  • stringr

  • survival

  • tibble

  • tseries

  • zoo

Installing the R Data Science Library Package

Before you install the R Data Science Library package, make sure that Greenplum Database is running, that you have sourced greenplum_path.sh, and that the $MASTER_DATA_DIRECTORY and $GPHOME environment variables are set.
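The environment variable prerequisites above can be checked with a small shell function before you run gppkg. The following is a minimal sketch, not part of Greenplum Database: the function name check_gp_env is hypothetical, and it only verifies that the two variables are set. It does not confirm that the database is running.

```shell
# Hypothetical helper (not shipped with Greenplum): report each required
# environment variable that is unset or empty. Returns 0 only if both are set.
check_gp_env() {
    _gp_missing=0
    if [ -z "${GPHOME:-}" ]; then
        echo "ERROR: \$GPHOME is not set; source greenplum_path.sh first" >&2
        _gp_missing=1
    fi
    if [ -z "${MASTER_DATA_DIRECTORY:-}" ]; then
        echo "ERROR: \$MASTER_DATA_DIRECTORY is not set" >&2
        _gp_missing=1
    fi
    return "$_gp_missing"
}
```

Chaining the check in front of the install command (check_gp_env && gppkg -i <package file>) would then start the installation only when both variables are set.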

  1. Locate the R Data Science library package that you built or downloaded.

    The file name format of the package is DataScienceR-<version>-rhel<N>-x86_64.gppkg.

  2. Copy the package to the Greenplum Database master host.

  3. Follow the instructions in Verifying the Greenplum Database Software Download to verify the integrity of the Greenplum Procedural Languages R Data Science Package software.

  4. Use the gppkg command to install the package. For example:

    $ gppkg -i DataScienceR-<version>-rhel<N>-x86_64.gppkg
    

    gppkg installs the R Data Science libraries on all nodes in your Greenplum Database cluster. The command also sets the R_LIBS_USER environment variable and updates the PATH and LD_LIBRARY_PATH environment variables in your greenplum_path.sh file.

  5. Restart Greenplum Database. You must re-source greenplum_path.sh before restarting your Greenplum cluster:

    $ source /usr/local/greenplum-db/greenplum_path.sh
    $ gpstop -r
    

The Greenplum Database R Data Science Modules are installed in the following directory:

$GPHOME/ext/DataScienceR/library
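One way to confirm that a particular library was installed is to look for it under this directory. The sketch below assumes the standard R library layout, in which each installed package is a directory containing a DESCRIPTION file. The function name has_r_library is hypothetical, and the check covers only the host on which it runs.

```shell
# Hypothetical helper: succeed if the R library named in $1 is present under
# the DataScienceR library directory on the local host.
# Assumes the standard R layout: <library dir>/<package>/DESCRIPTION.
has_r_library() {
    [ -f "${GPHOME}/ext/DataScienceR/library/$1/DESCRIPTION" ]
}
```

For example:

$ has_r_library ggplot2 && echo "ggplot2 is installed"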

Note: rjags libraries are installed in the $GPHOME/ext/DataScienceR/extlib/lib directory. If you want to use rjags and your $GPHOME is not /usr/local/greenplum-db, you must perform additional configuration steps to create a symbolic link from $GPHOME to /usr/local/greenplum-db on each node in your Greenplum Database cluster. For example:

$ gpssh -f all_hosts -e 'ln -s $GPHOME /usr/local/greenplum-db'
$ gpssh -f all_hosts -e 'chown -h gpadmin /usr/local/greenplum-db'
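To run the link commands only when they are needed, the condition in the note above can be expressed as a small test. This is a sketch under stated assumptions: the function name needs_rjags_symlink is hypothetical, and it compares the $GPHOME string only. It does not resolve existing symbolic links or check whether /usr/local/greenplum-db already exists.

```shell
# Hypothetical helper: succeed (return 0) when $GPHOME is something other
# than /usr/local/greenplum-db, that is, when the rjags link is needed.
needs_rjags_symlink() {
    # Strip one trailing slash so "/usr/local/greenplum-db/" compares equal.
    [ "${GPHOME%/}" != "/usr/local/greenplum-db" ]
}
```

For example:

$ needs_rjags_symlink && gpssh -f all_hosts -e 'ln -s $GPHOME /usr/local/greenplum-db'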

Uninstalling the R Data Science Library Package

Use the gppkg utility to uninstall the R Data Science Library package. You must include the version number in the package name you provide to gppkg.

To determine your R Data Science Library package version number and remove this package:

$ gppkg -q --all | grep DataScienceR
DataScienceR-<version>
$ gppkg -r DataScienceR-<version>

The command removes the R Data Science libraries from your Greenplum Database cluster. It also removes the R_LIBS_USER environment variable and updates the PATH and LD_LIBRARY_PATH environment variables in your greenplum_path.sh file to their pre-installation values.
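The query and removal steps can be combined so that the versioned package name does not have to be copied by hand. The following is a minimal sketch: the function name find_dsr_package is hypothetical, and it assumes that gppkg -q --all prints each installed package name at the start of its own line, as in the example above.

```shell
# Hypothetical helper: read package names on stdin (one per line, as printed
# by "gppkg -q --all") and print the first line that starts with
# "DataScienceR-". Prints nothing if no such line is found.
find_dsr_package() {
    sed -n '/^DataScienceR-/{p;q;}'
}
```

For example:

$ pkg=$(gppkg -q --all | find_dsr_package)
$ [ -n "$pkg" ] && gppkg -r "$pkg"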

Re-source greenplum_path.sh and restart Greenplum Database after you remove the R Data Science Library package:

$ source /usr/local/greenplum-db/greenplum_path.sh
$ gpstop -r

Note: When you uninstall the R Data Science Library package from your Greenplum Database cluster, any UDFs that you have created that use R libraries installed with this package will return an error.
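Before you uninstall, it can help to list the PL/R functions that may be affected. The sketch below builds a catalog query for that purpose. It is an illustration, not a Greenplum utility: the function name plr_udf_query is hypothetical, the query assumes that PL/R is registered under the language name plr, and it matches the library name as a plain substring of the function source, so it can report false positives.

```shell
# Hypothetical helper: print a catalog query that lists PL/R functions whose
# source text mentions the R library named in $1 (for example, ggplot2).
# The match is a simple substring search on pg_proc.prosrc.
plr_udf_query() {
    cat <<EOF
SELECT n.nspname, p.proname
FROM pg_proc p
JOIN pg_language l ON l.oid = p.prolang
JOIN pg_namespace n ON n.oid = p.pronamespace
WHERE l.lanname = 'plr'
  AND p.prosrc LIKE '%$1%';
EOF
}
```

For example, to run the query in one database (repeat for each database that uses PL/R):

$ plr_udf_query ggplot2 | psql -d <database>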
