VMware Greenplum Text lets you process large volumes of raw text data (such as social media feeds or e-mail databases) into mission-critical information that guides business and project decisions. VMware Greenplum Text joins the Greenplum Database massively parallel processing (MPP) database server with Apache SolrCloud enterprise search.
VMware Greenplum Text includes powerful text search as well as support for text analysis. Greenplum Text supports business decision making by offering:
Multiple kinds of data: Greenplum Text supports both semi-structured and unstructured data searches, which broadens the kinds of information you can find.
Multiple document sources: Greenplum Text can index documents stored in Greenplum Database tables or documents retrieved from external stores, such as HTTP or FTP servers, Amazon S3 or other S3-compatible storage, or Hadoop HDFS. Most document formats are recognized automatically.
Less schema dependence: Greenplum Text does not require static schemas to locate information; schemas can change or be quite simple, and searches still return targeted results.
Natural language text processing: Greenplum Text provides NLP capabilities with the integrated Apache OpenNLP toolkit.
Text analytics: You can use Apache MADlib in Greenplum Database for advanced machine learning, graph, statistics, and analytics.