This topic describes how VMware GemFire integrates with VMware GemFire Search.
VMware GemFire Search is a search engine that provides indexing and searching capabilities when used with VMware GemFire. GemFire Search is built on Apache Lucene® version 9, a widely used Java full-text search engine. GemFire Search query and index definitions use the Lucene name in their syntax and APIs.
This topic requires you to have some familiarity with Apache Lucene’s indexing and search capabilities. For more information about Apache Lucene, see the Apache Lucene website.
VMware GemFire Search integration:
For more details, see the GemFire Search Javadocs for the classes and interfaces that implement Apache Lucene indexes and searches, including the following:
LuceneService
LuceneSerializer
LuceneIndexFactory
LuceneQuery
LuceneQueryFactory
LuceneQueryProvider
LuceneResultStruct
Minimum JDK version required: JDK 11
Join queries between regions are not supported.
Queries on multiple indexes within one region are not supported. You can create multiple indexes in a region, but each query must only use one index.
GemFire Search indexes are stored in on-heap memory only.
GemFire Search queries from within transactions are not supported. On an attempt to query from within a transaction, a LuceneQueryException is thrown, issuing an error message on the client (accessor) similar to the following:
Exception in thread "main" org.apache.geode.cache.lucene.LuceneQueryException:
Lucene Query cannot be executed within a transaction
...
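If an application must run a query while a transaction is open, one approach is to suspend the transaction, run the query, and then resume it. The sketch below assumes an existing cache and a previously created LuceneQuery; suspend and resume are part of GemFire's CacheTransactionManager API.

```java
// Sketch: run a Lucene query outside the current transaction by suspending it.
// Assumes `cache` is the Cache and `query` is a previously created LuceneQuery.
CacheTransactionManager txMgr = cache.getCacheTransactionManager();
TransactionId txId = txMgr.suspend();   // detach the transaction from this thread
try {
    PageableLuceneQueryResults<String, Object> results = query.findPages();
    // ... consume results ...
} finally {
    txMgr.resume(txId);                 // reattach before continuing transactional work
}
```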
GemFire Search does not allow mixed-type fields: a field's data cannot consist of both numeric and String values. For example, suppose an index on the field SSN has the following entries:

- Object_1 has String SSN = "1111"
- Object_2 has Integer SSN = 1111
- Object_3 has Float SSN = 1111.0

During indexing, an exception is thrown, because numeric values cannot be mixed with String values in the same field.
The “@” symbol must be escaped. GemFire Search uses the “@” character as a minimum-should-match operator. When querying for an email address using the KeywordAnalyzer, you must escape the “@” character. For example: "field1:john\@example.com".
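Escaping can be done by hand or with a small utility. The helper below is hypothetical (not part of the GemFire or Lucene APIs); it backslash-escapes '@' along with other characters that the Lucene query parser treats as operators.

```java
// Hypothetical helper (not part of the GemFire API): backslash-escapes
// characters that the Lucene query parser treats as operators, including '@'.
public class LuceneQueryEscaper {
    private static final String SPECIALS = "\\+-!():^[]\"{}~*?|&/@";

    public static String escape(String term) {
        StringBuilder sb = new StringBuilder(term.length());
        for (int i = 0; i < term.length(); i++) {
            char c = term.charAt(i);
            if (SPECIALS.indexOf(c) >= 0) {
                sb.append('\\');   // prefix the special character with a backslash
            }
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("john@example.com")); // john\@example.com
    }
}
```

The escaped term can then be embedded in a query string such as "field1:" plus the escaped value.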
The order of server creation with respect to index and region creation is important. The cluster configuration service cannot work if servers are created after index creation, but before region creation, because GemFire Search indexes are propagated to the cluster configuration after region creation. To start servers at multiple points within the start-up process, use this ordering:
An invalidate operation on a region entry does not invalidate a corresponding GemFire Search index entry. A query on a GemFire Search index that contains values that have been invalidated can return results that no longer exist. Therefore, do not combine entry invalidation with queries on GemFire Search indexes.
GemFire Search indexes are not supported for regions that have eviction configured with a local destroy action. Eviction can be configured with overflow to disk, but only the region data is overflowed to disk, not the GemFire Search index. On an attempt to create a region that has a GemFire Search index and eviction configured with local destroy, an UnsupportedOperationException is thrown, issuing an error message similar to the following:
[error 2017/05/02 16:12:32.461 PDT <main> tid=0x1]
java.lang.UnsupportedOperationException:
Exception in thread "main" java.lang.UnsupportedOperationException:
Lucene indexes on regions with eviction and action local destroy are not supported
...
Backups should be made for regions with GemFire Search indexes only when there are no puts, updates, or deletes in progress. A backup taken during active writes might capture an inconsistency between the region data and the GemFire Search index. Both the region operation and the associated index operation cause disk writes, but these writes are not done atomically. Therefore, if a backup is taken between the persisted write to a region and the resulting persisted write to the GemFire Search index, the backup represents inconsistent data in the region and the GemFire Search index.
You can install VMware GemFire Search on a GemFire Server or on a GemFire client.
To install VMware GemFire Search on a GemFire server:
Click I agree to Terms and Conditions. Click the HTTPS Download icon next to VMware GemFire Search. This downloads the VMware GemFire Search .gfm file.
Do one of the following:
Set the GEMFIRE_EXTENSIONS_REPOSITORY_PATH environment variable to the VMware GemFire Search extension path. For example, if your vmware-gemfire-search-VERSION.gfm file is located in /gemfire-extensions, run the following command:
export GEMFIRE_EXTENSIONS_REPOSITORY_PATH=/gemfire-extensions
Copy the downloaded file to the extensions directory of your GemFire installation. By default, this is the vmware-gemfire-XXX/extensions directory.
Start the locator:
gfsh>start locator --name locator1
Start the server:
gfsh>start server --name server1
You can use Maven to install VMware GemFire Search on a GemFire client.
To use VMware GemFire Search on a GemFire client, you must add the appropriate dependencies.
Add the following to the pom.xml file:
<repositories>
<repository>
<id>gemfire-release-repo</id>
<name>GemFire Release Repository</name>
<url>https://packages.broadcom.com/artifactory/gemfire/</url>
</repository>
</repositories>
To access the artifacts, add an entry to your .m2/settings.xml file:
<settings>
<servers>
<server>
<id>gemfire-release-repo</id>
<username>EXAMPLE-USERNAME</username>
<password>MY-PASSWORD</password>
</server>
</servers>
</settings>
Where:

- EXAMPLE-USERNAME is your support.broadcom.com user name.
- MY-PASSWORD is the Access Token you copied in step 3 in Prerequisites.

Add the dependencies to the project by adding the following to your pom.xml file:
<dependencies>
<dependency>
<groupId>com.vmware.gemfire</groupId>
<artifactId>gemfire-search</artifactId>
<version>1.1.0</version>
</dependency>
</dependencies>
Add the following to the build.gradle file:
repositories {
maven {
credentials {
username "$gemfireRepoUsername"
password "$gemfireRepoPassword"
}
url = uri("https://packages.broadcom.com/artifactory/gemfire/")
}
}
Add the following to the local .gradle/gradle.properties file or to the project gradle.properties file:
gemfireRepoUsername=EXAMPLE-USERNAME
gemfireRepoPassword=MY-PASSWORD
Where:

- EXAMPLE-USERNAME is your support.broadcom.com user name.
- MY-PASSWORD is the Access Token you copied in step 3 in Prerequisites.

Add the dependencies to the project by adding the following to your build.gradle file:
dependencies {
implementation "com.vmware.gemfire:gemfire-search:1.1.0"
}
To upgrade from VMware GemFire Search v1.0 to v1.1, you must destroy any existing Lucene indexes before upgrading, then recreate the indexes after upgrading.
1. Before upgrading, use gfsh to destroy the Lucene indexes. See Destroying an Index.
2. After upgrading, use gfsh to recreate the indexes. See Creating an Index.

To upgrade to VMware GemFire 10.1 from VMware GemFire 9.x by restarting the cluster with its previously generated cluster configuration, you must first use gfsh to destroy the Lucene indexes, then recreate the indexes after upgrading. Specifically:

1. Before upgrading, use gfsh to destroy the Lucene indexes. See Destroying an Index.
2. After upgrading, use gfsh to recreate the indexes. See Creating an Index.

Note: Rolling upgrades from previous GemFire versions to GemFire 10.1 with GemFire Search deployed are not supported.
When upgrading to VMware GemFire 10.1 from VMware GemFire 10.0, you do not need to destroy or recreate the indexes after upgrading.
You can interact with GemFire Search indexes through a Java API, through the gfsh command-line utility, or by means of the cache.xml configuration file.
When you create a GemFire Search index, you must provide three pieces of information: the name of the index, the name of the region to be indexed, and the names of the fields to be indexed. You must specify at least one field to be indexed.
If the object value for the entries in the region is a primitive type value without a field name, use __REGION_VALUE_FIELD to specify the field to be indexed. __REGION_VALUE_FIELD serves as the field name for entry values of all primitive types, including String, Long, Integer, Float, and Double.
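For instance, to index a region whose values are plain String objects, the field argument would be __REGION_VALUE_FIELD itself (the index and region names here are only illustrations):

```
gfsh>create lucene index --name=pageIndex --region=/pages --field=__REGION_VALUE_FIELD
```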
Each field has a corresponding analyzer to extract terms from text. When no analyzer is specified, org.apache.lucene.analysis.standard.StandardAnalyzer is used.
The index has an associated serializer that renders the indexed object as a GemFire Search document made up of searchable fields. The default serializer is a simple one that handles top-level fields, but does not render collections or nested objects.

VMware GemFire supplies a built-in serializer, FlatFormatSerializer, that handles collections and nested objects. For more information about GemFire Search indexes for nested objects, see Using FlatFormatSerializer to Index Fields Within Nested Objects.

As a third alternative, you can create your own serializer, which must implement the LuceneSerializer interface.
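As a sketch of what a custom serializer might look like (the Book value class and its getTitle() accessor are illustrative assumptions, not part of the GemFire API), a minimal implementation could index a single field:

```java
import java.util.Collection;
import java.util.Collections;
import org.apache.geode.cache.lucene.LuceneIndex;
import org.apache.geode.cache.lucene.LuceneSerializer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;

// Illustrative serializer that indexes only the "title" field of a hypothetical Book value.
public class TitleOnlySerializer implements LuceneSerializer<Book> {
    @Override
    public Collection<Document> toDocuments(LuceneIndex index, Book value) {
        Document doc = new Document();
        doc.add(new TextField("title", value.getTitle(), Field.Store.NO));
        return Collections.singletonList(doc);
    }
}
```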
In gfsh, use the create lucene index command to create GemFire Search indexes.
Example 1: This example creates a GemFire Search index with two fields. The default analyzer and the default serializer are used.
gfsh>create lucene index --name=indexName --region=/orders --field=customer,tags
Example 2: This example creates a GemFire Search index with two fields and specifies an analyzer for the second field; the DEFAULT keyword assigns the default analyzer to the first field. The default serializer is used.
gfsh>create lucene index --name=indexName --region=/orders
--field=customer,tags --analyzer=DEFAULT,org.apache.lucene.analysis.bg.BulgarianAnalyzer
For this example, the XML configuration file below specifies a GemFire Search index with three fields and three analyzers:
<cache
xmlns="http://geode.apache.org/schema/cache"
xmlns:lucene="http://geode.apache.org/schema/lucene"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://geode.apache.org/schema/cache
http://geode.apache.org/schema/cache/cache-1.0.xsd
http://geode.apache.org/schema/lucene
http://geode.apache.org/schema/lucene/lucene-1.0.xsd"
version="1.1">
<region name="region" refid="PARTITION">
<lucene:index name="myIndex">
<lucene:field name="a"
analyzer="org.apache.lucene.analysis.core.KeywordAnalyzer"/>
<lucene:field name="b"
analyzer="org.apache.lucene.analysis.core.SimpleAnalyzer"/>
<lucene:field name="c"
analyzer="org.apache.lucene.analysis.standard.ClassicAnalyzer"/>
<lucene:field name="d" />
</lucene:index>
</region>
</cache>
GemFire Search provides Lucene indexing and search capabilities over data stored in GemFire regions. The Lucene indexes themselves are managed and stored in GemFire regions. Lucene indexing can generate numerous “tombstones” that consume memory until they can be collected and removed. The rate of tombstone generation depends on the use case; the rate is higher in write-heavy indexed regions.
“Tombstones” are created by GemFire when consistency checking is enabled for a region. GemFire members do not immediately remove entries from the region when an application destroys the entry. Instead, the member retains the entry with its current version stamp for a period of time to detect possible conflicts with operations that have occurred. The retained entry is referred to as a tombstone.
To manage the memory consumed by tombstones, GemFire Search provides configuration settings for the GemFire AsyncEventQueue that is used as the queuing mechanism to update the Lucene indexes. These settings control how region events are batched and processed when applied to a Lucene index. Applications can change these settings from their defaults based on the use case.
The following configurable settings (Java system properties) help to manage tombstones:
- gemfire.search.batch-size: The maximum number of events in a single batch. Default value: 10000. Decreasing this value increases the rate of tombstone creation.
- gemfire.search.batch-time-interval: The maximum amount of time, in milliseconds, allowed for building a batch before it is delivered. Default value: 1000. Decreasing this value increases the rate of tombstone creation.
- gemfire.search.dispatcher-threads: The number of threads that process batches. Default value: 1. Increasing this value increases the rate of tombstone creation.
When either the gemfire.search.batch-size or the gemfire.search.batch-time-interval is reached during the creation of a batch, the batch is delivered to one of the gemfire.search.dispatcher-threads for processing.
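Because these are standard Java system properties, one way to set them is with gfsh's --J option at server startup (the values shown here are illustrative, not recommendations):

```
gfsh>start server --name=server1 --J=-Dgemfire.search.batch-size=5000 --J=-Dgemfire.search.batch-time-interval=500 --J=-Dgemfire.search.dispatcher-threads=2
```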
Several statistics can help decide how to tune these properties.
- CachePerfStats puts for the data region (for example, RegionStats-partition-<region_name>): The rate of put operations per second.
- CachePerfStats tombstones for the files region of the index and data region (for example, RegionStats-partition-<index_name>#_<region_name>.files): The number of tombstones. GemFire Search commits cause entries from previous commits to be destroyed; these destroys generate the tombstones, which help maintain region consistency across members.
- LuceneIndexStatistics commits for the index and data region (for example, <index_name>-/<region_name>): The number of GemFire Search commits. One or more commits occur for each batch processed by the LuceneEventListener. Each commit causes approximately 35 destroys and tombstones.
- AsyncEventQueueStatistics eventQueueSize for the index and data region (for example, asyncEventQueueStats-<index_name>#_<region_name>): The size of the AsyncEventQueue's queue of events to be processed. Batch processing must be fast enough to keep up with the put rate so that the queue does not grow continuously.
VMware GemFire supplies a built-in serializer, org.apache.geode.cache.lucene.FlatFormatSerializer. This serializer renders collections and nested objects as searchable fields, which you can access using the syntax fieldnameAtLevel1.fieldnameAtLevel2 for both indexing and querying.
For example, in the following data model, the Customer object contains both a collection of Person objects and an array of Page objects. The Person object also contains a Page object.
public class Customer implements Serializable {
private String name;
private Collection<String> phoneNumbers;
private Collection<Person> contacts;
private Page[] myHomePages;
......
}
public class Person implements Serializable {
private String name;
private String email;
private int revenue;
private String address;
private String[] phoneNumbers;
private Page homepage;
.......
}
public class Page implements Serializable {
private int id; // search integer in int format
private String title;
private String content;
......
}
The FlatFormatSerializer creates one document for each parent object, adding an indexed field for each data field in a nested object, identified by its qualified name. Similarly, collections are flattened and treated as tokens in a single field.
For example, the FlatFormatSerializer could convert a Customer object, with the structure described above, into a document containing fields such as name, contacts.name, and contacts.homepage.title, based on the indexed fields specified at index creation. Each segment is a field name, not a field type, because a class (such as Customer) can have more than one field of the same type (such as Person).
The serializer creates and indexes the fields that you specify when you request index creation.
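In the Java API, the index could be created along the following lines (a sketch, assuming a LuceneService obtained from the cache via LuceneServiceProvider):

```java
// Sketch: create the index with the built-in FlatFormatSerializer.
LuceneService luceneService = LuceneServiceProvider.get(cache);
luceneService.createIndexFactory()
    .setLuceneSerializer(new FlatFormatSerializer())
    .addField("name")
    .addField("contacts.name")
    .addField("contacts.email")
    .addField("contacts.address")
    .addField("contacts.homepage.title")
    .create("customerIndex", "Customer");
```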
In gfsh, use the create lucene index command, specifying the FlatFormatSerializer by its fully qualified name, org.apache.geode.cache.lucene.FlatFormatSerializer:
gfsh>create lucene index --name=customerIndex --region=Customer
--field=name,contacts.name,contacts.email,contacts.address,contacts.homepage.title
--serializer=org.apache.geode.cache.lucene.FlatFormatSerializer
The syntax for querying a nested field is the same as for querying a top-level field, with the addition of the qualifying parent field name, as in contacts.name:Jones77*. The qualified name distinguishes which “name” field is intended when there can be more than one “name” field at different hierarchical levels in the object.
Example Java query:
LuceneQuery query = luceneService.createLuceneQueryFactory()
.create("customerIndex", "Customer", "contacts.name:Jones77*", "name");
PageableLuceneQueryResults<K,Object> results = query.findPages();
Example gfsh query:
gfsh>search lucene --name=customerIndex --region=Customer
--queryString="contacts.name:Jones77*"
--defaultField=name
LuceneQuery<String, Person> query = luceneService.createLuceneQueryFactory()
.create(indexName, regionName, "name:John AND zipcode:97006", defaultField);
Collection<Person> results = query.findValues();
gfsh>search lucene --name=indexName --region=/orders --queryString="Jones*"
--defaultField=customer
For more information, see the gfsh search lucene command reference page.
A region-destroy operation does not cause the destruction of any GemFire Search indexes. You must destroy any GemFire Search indexes prior to destroying the associated region.
luceneService.destroyIndex(indexName, regionName);
An attempt to destroy a region before destroying its associated GemFire Search index will result in an error message similar to the following:
Region /orders cannot be destroyed because it defines Lucene index(es)
[/ordersIndex]. Destroy all Lucene indexes before destroying the region.
gfsh>destroy lucene index --name=indexName --region=/orders
For more information, see the gfsh destroy lucene index command reference page.
Destroy the GemFire Search index.
Recreate a new GemFire Search index.
The gfsh describe lucene index command displays details about a specified index. For more information about this command, see the gfsh describe lucene index command reference page.
The gfsh list lucene index command displays the list of GemFire Search indexes created for all members. For more information about this command, see the gfsh list lucene index command reference page.