VMware Tanzu Greenplum Text Documentation

VMware Tanzu Greenplum Text processes large volumes of raw text data (such as social media feeds or e-mail databases) into information that guides business and project decisions. Tanzu Greenplum Text combines the Tanzu Greenplum Database massively parallel processing (MPP) database server with Apache SolrCloud enterprise search.

Tanzu Greenplum Text includes powerful text search as well as support for text analysis. It supports business decision making by offering:

Multiple kinds of data: Tanzu Greenplum Text supports both semi-structured and unstructured data searches, which broadens the kinds of information you can find.
Multiple document sources: Tanzu Greenplum Text can index documents stored in Tanzu Greenplum Database tables or documents retrieved from external stores, such as HTTP or FTP servers, Amazon S3 or other S3-compatible storage, or Hadoop HDFS. Most document formats are recognized automatically.
Less schema dependence: Tanzu Greenplum Text does not require static schemas to locate information; schemas can change or be quite simple, and searches still return targeted results.
Natural language text processing: Tanzu Greenplum Text provides natural language processing (NLP) capabilities through the integrated Apache OpenNLP toolkit.
Text analytics: You can use Apache MADlib in Tanzu Greenplum Database for advanced machine learning, graph, statistics, and analytics functions.