The Compass Needle GigaSpaces integration allows a Lucene index to be stored within GigaSpaces. It can also automatically index the data grid using Compass OSEM support and mirror changes made to the data grid into the search engine.
Compass provides GigaSpaceDirectory, an implementation of the Lucene Directory that stores the index within the GigaSpaces data grid.
Here is a simple example of how it can be used:
IJSpace space = SpaceFinder.find("jini://*/*/mySpace");
GigaSpaceDirectory dir = new GigaSpaceDirectory(space, "test");
// ... (use the dir with IndexWriter and IndexSearcher)
In the above example, we created a directory on top of a GigaSpaces Space with an index named "test". The directory can now be used to create a Lucene IndexWriter and IndexSearcher.
The Lucene Directory interface represents a virtual file system. It is implemented on top of the Space by breaking each file into a file header, called FileEntry, and one or more FileBucketEntry instances. The FileEntry holds the metadata of the file (for example, its size and timestamp), while each FileBucketEntry holds one bucket-sized chunk of the actual file content. The bucket size can be set when constructing the GigaSpaceDirectory, but it must not be changed when connecting to an existing index.
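The header-plus-buckets layout described above can be illustrated with a small, self-contained sketch. It does not use the Compass classes: `Header` and `Bucket` are hypothetical stand-ins for FileEntry and FileBucketEntry, and two in-memory maps stand in for the Space, so the real entries may well be shaped differently.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Illustrative only: Header and Bucket are hypothetical stand-ins for
// FileEntry and FileBucketEntry, and the maps stand in for the Space.
class BucketSketch {

    // File header: metadata only, no content.
    static final class Header {
        final String name;
        final long size;
        final long lastModified;

        Header(String name, long size, long lastModified) {
            this.name = name;
            this.size = size;
            this.lastModified = lastModified;
        }
    }

    // One slice of file content, at most bucketSize bytes long.
    static final class Bucket {
        final String fileName;
        final int index;
        final byte[] data;

        Bucket(String fileName, int index, byte[] data) {
            this.fileName = fileName;
            this.index = index;
            this.data = data;
        }
    }

    private final Map<String, Header> headers = new HashMap<>();
    private final Map<String, Bucket> buckets = new HashMap<>();
    private final int bucketSize;

    BucketSketch(int bucketSize) {
        if (bucketSize <= 0) {
            throw new IllegalArgumentException("bucketSize must be positive");
        }
        this.bucketSize = bucketSize;
    }

    private static String key(String fileName, int index) {
        return fileName + "#" + index;
    }

    // Stores one header plus ceil(size / bucketSize) buckets.
    // Overwriting an existing file is not handled in this sketch.
    void write(String name, byte[] content) {
        headers.put(name, new Header(name, content.length, System.currentTimeMillis()));
        int index = 0;
        for (int offset = 0; offset < content.length; offset += bucketSize) {
            int end = Math.min(offset + bucketSize, content.length);
            buckets.put(key(name, index),
                    new Bucket(name, index, Arrays.copyOfRange(content, offset, end)));
            index++;
        }
    }

    // Number of buckets implied by the header size and the configured bucket size.
    int bucketCount(String name) {
        Header header = headers.get(name);
        if (header == null) {
            throw new IllegalArgumentException("no such file: " + name);
        }
        return (int) ((header.size + bucketSize - 1) / bucketSize);
    }

    // Reassembles the file. Bucket count and offsets are derived from bucketSize,
    // which is why every reader of an existing index must use the same value.
    byte[] read(String name) {
        int count = bucketCount(name);
        byte[] out = new byte[(int) headers.get(name).size];
        for (int i = 0; i < count; i++) {
            byte[] data = buckets.get(key(name, i)).data;
            System.arraycopy(data, 0, out, i * bucketSize, data.length);
        }
        return out;
    }

    public static void main(String[] args) {
        BucketSketch store = new BucketSketch(8);
        byte[] content = "hello lucene segment".getBytes(StandardCharsets.UTF_8);
        store.write("segments_1", content);
        System.out.println("buckets: " + store.bucketCount("segments_1"));
        System.out.println(new String(store.read("segments_1"), StandardCharsets.UTF_8));
    }
}
```

In this model, the reader computes how many buckets to fetch and where each one goes from the bucket size alone, which suggests why a directory opened with a different bucket size could not reassemble files written earlier.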
Note that it is preferable to configure the directory not to use the compound index format, as this yields better performance.
The GigaSpaces integration can also use GigaSpaces purely as a distributed lock manager, without storing the index on GigaSpaces. GigaSpaceLockFactory can be used for this.
Compass integrates easily with GigaSpaceDirectory as the index storage mechanism. The following example shows how to configure Compass to work against a GigaSpaces-based index named test:
<compass name="default">
    <connection>
        <space indexName="test" url="jini://*/*/mySpace"/>
    </connection>
</compass>
The following shows how to configure it using properties-based configuration:
compass.engine.connection=space://test:jini://*/*/mySpace
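Judging from the two examples above, the connection string appears to combine the index name and the Space URL as `space://<indexName>:<spaceUrl>`. The sketch below only illustrates how such a string breaks down; `SpaceConnectionSketch` is a hypothetical helper, not a Compass class, and Compass's own parsing may differ.

```java
// Hypothetical helper, not part of Compass: splits a connection string of the
// assumed form "space://<indexName>:<spaceUrl>" into its two parts.
class SpaceConnectionSketch {

    static final String PREFIX = "space://";

    final String indexName;
    final String spaceUrl;

    SpaceConnectionSketch(String indexName, String spaceUrl) {
        this.indexName = indexName;
        this.spaceUrl = spaceUrl;
    }

    static SpaceConnectionSketch parse(String connection) {
        if (connection == null || !connection.startsWith(PREFIX)) {
            throw new IllegalArgumentException("expected a connection starting with " + PREFIX);
        }
        String rest = connection.substring(PREFIX.length());
        // The first ':' separates the index name from the Space URL;
        // later colons (as in "jini://") belong to the URL itself.
        int separator = rest.indexOf(':');
        if (separator <= 0 || separator == rest.length() - 1) {
            throw new IllegalArgumentException("expected <indexName>:<spaceUrl> after " + PREFIX);
        }
        return new SpaceConnectionSketch(rest.substring(0, separator), rest.substring(separator + 1));
    }

    public static void main(String[] args) {
        SpaceConnectionSketch c = parse("space://test:jini://*/*/mySpace");
        System.out.println("index name: " + c.indexName);
        System.out.println("space url:  " + c.spaceUrl);
    }
}
```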
By default, when GigaSpaces is used as the Compass store, the index is kept in the non-compound file format. It is also automatically configured with an expiration-time-based index deletion policy so that multiple clients work correctly.
Compass can also be configured to use GigaSpaces only as a distributed lock manager, without storing the index on GigaSpaces (note that when GigaSpaces is configured as the actual store, the GigaSpaces lock factory is used by default). Here is how it can be configured:
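To give an intuition for why an expiration-time-based deletion policy helps multiple clients, here is a simplified, self-contained model. It is not the Compass or Lucene implementation: `Commit`, `retained`, and the timing values are assumptions made for illustration. The idea modelled is that older commit points are kept for a grace period, so clients that have not yet refreshed can still read the files they reference.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Simplified model of an expiration-time deletion policy. All names here are
// hypothetical; the actual policy used by Compass may behave differently.
class ExpirationSketch {

    static final class Commit {
        final String name;
        final long createdAtMillis;

        Commit(String name, long createdAtMillis) {
            this.name = name;
            this.createdAtMillis = createdAtMillis;
        }
    }

    // Given commits ordered oldest to newest, returns those that are kept:
    // the latest commit always survives, older ones only until they expire.
    static List<Commit> retained(List<Commit> commits, long nowMillis, long expirationMillis) {
        List<Commit> kept = new ArrayList<>();
        for (int i = 0; i < commits.size(); i++) {
            Commit commit = commits.get(i);
            boolean isLatest = (i == commits.size() - 1);
            boolean notExpired = nowMillis - commit.createdAtMillis < expirationMillis;
            if (isLatest || notExpired) {
                kept.add(commit);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Commit> commits = Arrays.asList(
                new Commit("commit-1", 0L),
                new Commit("commit-2", 200_000L),
                new Commit("commit-3", 400_000L));
        // Assumed values: "now" is 450 s after the first commit, expiration is 300 s.
        for (Commit commit : retained(commits, 450_000L, 300_000L)) {
            System.out.println("kept: " + commit.name);
        }
    }
}
```

With these assumed values, the first commit is older than the grace period and is dropped, while a client still reading from the second commit would not have its files removed underneath it.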
compass.engine.store.lockFactory.type=org.compass.needle.gigaspaces.store.GigaSpaceLockFactoryProvider
compass.engine.store.lockFactory.path=jini://*/*/mySpace?groups=kimchy
The GigaSpaces integration comes with a built-in external data source that can be used with the GigaSpaces Mirror Service. A mirror propagates changes made to the Space (data grid) into the search engine in a reliable, asynchronous manner. The following is an example of how it can be configured within a mirror processing unit (for more information, see the GigaSpaces Mirror Service documentation):
<beans xmlns="http://www.springframework.org/schema/beans" ...
    <bean id="compass" class="org.compass.spring.LocalCompassBean">
        <property name="classMappings">
            <list>
                <value>eg.Blog</value>
                <value>eg.Post</value>
                <value>eg.Comment</value>
            </list>
        </property>
        <property name="compassSettings">
            <props>
                <prop key="compass.engine.connection">space://blog:jini://*/*/searchContent</prop>
                <!-- Configure expiration time so other clients that haven't refreshed the cache will still see deleted files -->
                <prop key="compass.engine.store.indexDeletionPolicy.type">expirationtime</prop>
                <prop key="compass.engine.store.indexDeletionPolicy.expirationTimeInSeconds">300</prop>
            </props>
        </property>
    </bean>
    <bean id="compassDataSource" class="org.compass.needle.gigaspaces.CompassDataSource">
        <property name="compass" ref="compass" />
    </bean>
    <os-core:space id="mirrodSpace" url="/./mirror-service" schema="mirror" external-data-source="compassDataSource" />
</beans>
The above configuration mirrors any changes made in the data grid into the search engine through the Compass instance. Furthermore, it connects to a separate Space called searchContent and stores the index content there under the index name blog.