Tanzu Greenplum Text lets you process large volumes of raw text data (such as social media feeds or e-mail databases) into mission-critical information that guides business and project decisions. Tanzu Greenplum Text joins the Greenplum Database massively parallel-processing database server with Apache SolrCloud enterprise search.
Tanzu Greenplum Text includes powerful text search as well as support for text analysis. Tanzu Greenplum Text supports business decision making by offering:
Multiple kinds of data: Tanzu Greenplum Text supports both semi-structured and unstructured data searches, which broadens the kinds of information you can find.
Multiple document sources: Tanzu Greenplum Text can index documents stored in Greenplum Database tables or documents retrieved from external stores, such as HTTP or FTP servers, Amazon S3 or other S3-compatible storage, or Hadoop HDFS. Most document formats are recognized automatically.
Less schema dependence: Tanzu Greenplum Text does not require static schemas to locate information; schemas can change or be quite simple, and searches still return targeted results.
Natural language text processing: Tanzu Greenplum Text provides NLP capabilities with the integrated Apache OpenNLP toolkit.
Text analytics: You can use Apache MADlib in Greenplum Database for advanced machine learning, graph, statistics, and analytics.