
Chapter 14. Introduction

14.1. Overview

Compass Gps provides integration with different indexable data sources using two interfaces: CompassGps and CompassGpsDevice. Both interfaces are very abstract, since different data sources are usually different in the way they work or the API they expose.

A device can be any type of indexable data source: a database (possibly accessed through an ORM tool), a file system, an FTP site, or a web site.

The main contract that a device is required to provide is the ability to index its data (using the index() operation). You can think of it as batch indexing the data source's data so that it is available to future search queries. A device may additionally mirror data changes, either actively or passively.

Compass Gps is built on top of the Compass Core module and uses all of its features, such as transactions (including the important batch_insert level for batch indexing), OSEM, and the simple API that comes with Compass Core.

When performing the index operation, it is very important NOT to perform it within an already running transaction. For LocalTransactionFactory, no outer LocalTransaction should be started. For JTATransactionFactory, no JTA transaction should be started and no CMT transaction should be defined at the method level (on an EJB Session Bean, for example). For SpringSyncTransactionFactory, no Spring transaction should wrap the index code, and the executing method should not be wrapped with a transaction (using a transaction proxy, for example).

14.2. CompassGps

CompassGps is the main interface within the Compass Gps module. It holds a list of CompassGpsDevices, and manages their lifecycle.

CompassGpsInterfaceDevice is an extension of CompassGps, and provides the needed abstraction between the Compass instance(s) and the given devices. Every implementation of CompassGps must also implement CompassGpsInterfaceDevice. The Compass Gps module comes with two implementations of CompassGps:

14.2.1. SingleCompassGps

Holds a single Compass instance, which is used for both the index operation and the mirror operation. When executing the index operation, SingleCompassGps clones the provided Compass instance. Additional or overriding settings for the cloned instance can be provided using indexSettings. The default overriding settings are batch_insert as the transaction isolation mode and the disabling of any cascading operations (as they usually do not make sense for index operations). A prime example of overriding the index operation settings is using a database as the index storage while defining file based storage for the index operation (the index is built on the file system and then copied to the database).

When calling the index operation on the SingleCompassGps, it will gracefully replace the current index (pointed to by the initialized single Compass instance) with the content of the index operation. Gracefully means that while the index operation is executing and building a temporary index, no write operations are allowed on the actual index, and while the actual index is being replaced by the temporary index, no read operations are allowed either.
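The locking behaviour described above can be illustrated with a minimal sketch. It does not use Compass classes and is not how Compass implements the replacement: GracefulIndex, its methods, and the list of strings standing in for an index are all hypothetical names chosen for this example. It only shows the ordering of the two restrictions: writers are held back for the whole rebuild, readers only during the swap.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.Supplier;

// Hypothetical stand-in for an index; not a Compass class.
class GracefulIndex {
    // Held for the whole rebuild, so writers wait until the new index is live.
    private final ReentrantLock writeGate = new ReentrantLock();
    // Write-locked only while content changes hands, so readers wait briefly.
    private final ReentrantReadWriteLock readSwap = new ReentrantReadWriteLock();
    private List<String> live = new ArrayList<>();

    void add(String doc) {
        writeGate.lock();
        try {
            readSwap.writeLock().lock();
            try {
                live.add(doc);
            } finally {
                readSwap.writeLock().unlock();
            }
        } finally {
            writeGate.unlock();
        }
    }

    boolean contains(String doc) {
        readSwap.readLock().lock();
        try {
            return live.contains(doc);
        } finally {
            readSwap.readLock().unlock();
        }
    }

    // Builds a temporary index from the data source, then swaps it in.
    void reindex(Supplier<List<String>> dataSource) {
        writeGate.lock(); // no writes to the live index while rebuilding
        try {
            // Reads are still served from the old index during this step.
            List<String> temporary = new ArrayList<>(dataSource.get());
            readSwap.writeLock().lock(); // reads blocked only for the swap
            try {
                live = temporary;
            } finally {
                readSwap.writeLock().unlock();
            }
        } finally {
            writeGate.unlock();
        }
    }
}
```

A caller would invoke reindex() with a supplier that reads the whole data source; anything added to the old index before the rebuild is discarded unless the data source also contains it.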

14.2.2. DualCompassGps

Holds two Compass instances. One, called indexCompass, is responsible for the index operation. The other, called mirrorCompass, is responsible for mirror operations. The main reason for having two different instances is that the transaction isolation level can greatly affect the performance of each operation. Usually the indexCompass instance will be configured with the batch_insert isolation level, while the mirrorCompass instance will use the default transaction isolation level (read_committed).

When calling the index operation on the DualCompassGps, it will gracefully replace the mirror index (pointed to by the initialized mirrorCompass instance) with the content of the newly built index (pointed to by the initialized indexCompass instance). Gracefully means that while the index operation is executing and building the index, no write operations are allowed on the mirror index, and while the mirror index is being replaced by the new index, no read operations are allowed either.

Both implementations of CompassGps allow setting or overriding the settings of the Compass instance that is responsible for the index process. One use of this feature that might yield performance improvements is when the index is stored within a database. The indexing process can be done on the local file system (in a temporary location), in either compound or non compound format, by setting the indexing Compass connection setting to point to a file system location. Both implementations will perform a "hot replace" of the file system index into the database location, automatically compounding or uncompounding based on the settings of both the index and the mirror Compass instances.

14.3. CompassGpsDevice

A Gps device must implement the CompassGpsDevice interface in order to provide device indexing. It is responsible for interacting with a data source and reflecting its data in the Compass index. Two examples of devices are a file system and a database accessed through an ORM tool (like Hibernate).

A device will provide the ability to index the data source (using the index() operation), which usually means iterating through the device data and indexing it. It might also provide "real time" monitoring of changes in the device, and applying them to the index as well.

A CompassGpsDevice cannot operate standalone, and must be a part of a CompassGps instance (even if we have only one device), since the device requires the Compass instance(s) in order to apply the changes to the index.

Each device has a name associated with it. A device name must be unique across all the devices within a single CompassGps instance.
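The uniqueness rule can be pictured with a small registry sketch. The source does not say how CompassGps reacts to a duplicate name, so rejecting it with an exception is an assumption made for this example, and NamedDeviceSketch and DeviceRegistrySketch are hypothetical names rather than Compass classes.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for a device that only carries its name.
class NamedDeviceSketch {
    final String name;

    NamedDeviceSketch(String name) {
        this.name = name;
    }
}

// Hypothetical registry; assumes duplicate names are rejected outright.
class DeviceRegistrySketch {
    private final Map<String, NamedDeviceSketch> devices = new LinkedHashMap<>();

    void addDevice(NamedDeviceSketch device) {
        if (device.name == null || device.name.isEmpty()) {
            throw new IllegalArgumentException("a device must have a name");
        }
        // putIfAbsent returns the previous mapping when the name is taken.
        if (devices.putIfAbsent(device.name, device) != null) {
            throw new IllegalArgumentException("duplicate device name: " + device.name);
        }
    }

    int size() {
        return devices.size();
    }
}
```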

14.3.1. MirrorDataChangesGpsDevice

As mentioned, the main operation in CompassGpsDevice is index(), which is responsible for batch indexing all the relevant data in the data source. Gps devices can also mirror real time data changes made to the data source by implementing the MirrorDataChangesGpsDevice interface (which extends the CompassGpsDevice interface).

There are two types of devices for mirroring data. ActiveMirrorGpsDevice provides data mirroring of the data source through explicit programmatic calls to performMirroring(). PassiveMirrorGpsDevice is a Gps device that gets notified of data changes made to the data source, and does not require user intervention in order to reflect data changes in the Compass index.

For ActiveMirrorGpsDevice, Compass Gps provides a ScheduledMirrorGpsDevice class, which wraps an ActiveMirrorGpsDevice and schedules the execution of the performMirroring() operation.
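The wrapping idea can be sketched with the standard java.util.concurrent scheduler. This is an illustration of the pattern only, not the ScheduledMirrorGpsDevice implementation: ActiveMirrorSketch is a simplified, hypothetical stand-in for ActiveMirrorGpsDevice, and the fixed delay in milliseconds is an assumed configuration parameter.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Simplified, hypothetical stand-in for an active mirror device.
interface ActiveMirrorSketch {
    void performMirroring();
}

// Wraps an active mirror and calls performMirroring() periodically.
class ScheduledMirrorSketch {
    private final ActiveMirrorSketch device;
    private final long periodMillis;
    private ScheduledExecutorService scheduler;

    ScheduledMirrorSketch(ActiveMirrorSketch device, long periodMillis) {
        this.device = device;
        this.periodMillis = periodMillis;
    }

    synchronized void start() {
        if (scheduler != null) {
            return; // already started
        }
        scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleWithFixedDelay(() -> {
            try {
                device.performMirroring();
            } catch (RuntimeException e) {
                // Swallowed so one failed run does not cancel later runs.
            }
        }, periodMillis, periodMillis, TimeUnit.MILLISECONDS);
    }

    synchronized void stop() {
        if (scheduler != null) {
            scheduler.shutdownNow();
            scheduler = null;
        }
    }
}

class ScheduledMirrorDemo {
    // Starts a scheduled mirror, waits for two runs, then stops it.
    static int runUntilMirroredTwice() {
        AtomicInteger runs = new AtomicInteger();
        CountDownLatch twice = new CountDownLatch(2);
        ScheduledMirrorSketch scheduled = new ScheduledMirrorSketch(() -> {
            runs.incrementAndGet();
            twice.countDown();
        }, 10);
        scheduled.start();
        try {
            twice.await(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            scheduled.stop();
        }
        return runs.get();
    }

    public static void main(String[] args) {
        System.out.println("mirror runs: " + runUntilMirroredTwice());
    }
}
```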

14.4. Programmatic Configuration

Compass Gps can be configured either programmatically or through an IoC container. All the devices provided by Compass Gps, as well as CompassGps itself, can be configured via the Spring Framework.

The following code snippet shows how to configure Compass Gps and manage its lifecycle.

Compass compass = ... // configure compass
CompassGps gps = new SingleCompassGps(compass);

CompassGpsDevice device1 = ... // configure the first device
device1.setName("device1");
gps.addDevice(device1);

CompassGpsDevice device2 = ... // configure the second device
device2.setName("device2");
gps.addDevice(device2);

gps.start();
// ...
// on application shutdown
gps.stop();

14.5. Parallel Device

The Compass Gps module provides a convenient base class for parallel indexing of devices (data sources). AbstractParallelGpsDevice and its supporting classes simplify parallel index operations for Gps devices (and are used by the Hibernate and Jpa Gps devices).

If we use the following aliases mapped to different sub indexes as an example:

Alias To Sub Index Mapping

The first step during the parallel device startup (the start operation) is to ask its derived class for its indexable entities (the parallel device support defines an index entity as an entity "template" about to be indexed, associated with a name and a set of sub indexes). In our case, the following are the indexed entities:

Parallel Index Entities

Then, still during the startup process, the index entities are partitioned using an IndexEntitiesPartitioner implementation. The default (and the only one provided built in) is the SubIndexIndexEntitiesPartitioner that partitions the entities based on their sub index allocation (this is also usually the best partitioning possible, as locking is performed on the sub index level). Here are the index entities partitioned:
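As a rough illustration of partitioning by sub index, the sketch below groups entities so that any two entities sharing a sub index land in the same partition. That grouping rule is an assumption made for this example (it keeps partitions from contending for the same sub index lock); the source does not spell out the exact algorithm, and IndexEntitySketch and SubIndexPartitionerSketch are hypothetical names, not the Compass classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical index entity: a name plus the sub indexes it is stored in.
class IndexEntitySketch {
    final String name;
    final Set<String> subIndexes;

    IndexEntitySketch(String name, String... subIndexes) {
        this.name = name;
        this.subIndexes = new HashSet<>(Arrays.asList(subIndexes));
    }
}

class SubIndexPartitionerSketch {
    // Entities that share any sub index end up in the same partition.
    static List<List<IndexEntitySketch>> partition(List<IndexEntitySketch> entities) {
        List<List<IndexEntitySketch>> partitions = new ArrayList<>();
        List<Set<String>> partitionSubIndexes = new ArrayList<>();
        for (IndexEntitySketch entity : entities) {
            List<IndexEntitySketch> merged = new ArrayList<>();
            Set<String> mergedSubIndexes = new HashSet<>(entity.subIndexes);
            // Absorb every existing partition that overlaps with this entity.
            for (int i = partitions.size() - 1; i >= 0; i--) {
                if (!Collections.disjoint(partitionSubIndexes.get(i), entity.subIndexes)) {
                    mergedSubIndexes.addAll(partitionSubIndexes.remove(i));
                    merged.addAll(0, partitions.remove(i));
                }
            }
            merged.add(entity);
            partitions.add(merged);
            partitionSubIndexes.add(mergedSubIndexes);
        }
        return partitions;
    }
}
```

With three hypothetical entities, one stored in sub index a, one in b, and one spanning a and c, the first and third would be grouped together while the second forms its own partition.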

Partitioned Index Entities

During the index operation, a ParallelIndexExecutor implementation will then execute the index operation using the partitioned index entities, and an IndexEntitiesIndexer implementation (which is provided by the derived class). The default implementation is ConcurrentParallelIndexExecutor which creates N threads during the index operation based on the number of partitioned entities and then executes the index process in parallel on the partitioned index entities. In our case, the following diagram shows the index process:

Concurrent Parallel Index Process

Compass also comes with a simple SameThreadParallelIndexExecutor which basically uses the same thread of execution to execute the index operation sequentially.
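The difference between the two execution strategies can be sketched as follows. These are not the Compass executors: EntitiesIndexerSketch is a hypothetical stand-in for the indexer supplied by the derived class, partitions are plain lists of entity names, and the use of one pool thread per partition mirrors the "N threads" description above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical stand-in for the indexer provided by the derived device.
interface EntitiesIndexerSketch {
    void index(List<String> entities);
}

// Runs each partition on its own thread and waits for all of them.
class ConcurrentExecutorSketch {
    static void execute(List<List<String>> partitions, EntitiesIndexerSketch indexer) {
        if (partitions.isEmpty()) {
            return;
        }
        ExecutorService pool = Executors.newFixedThreadPool(partitions.size());
        try {
            List<Future<?>> pending = new ArrayList<>();
            for (List<String> partition : partitions) {
                pending.add(pool.submit(() -> indexer.index(partition)));
            }
            for (Future<?> result : pending) {
                try {
                    result.get(); // propagate failures from worker threads
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted while indexing", e);
                } catch (ExecutionException e) {
                    throw new IllegalStateException("partition failed to index", e.getCause());
                }
            }
        } finally {
            pool.shutdown();
        }
    }
}

// Runs the partitions one after another on the calling thread.
class SameThreadExecutorSketch {
    static void execute(List<List<String>> partitions, EntitiesIndexerSketch indexer) {
        for (List<String> partition : partitions) {
            indexer.index(partition);
        }
    }
}
```

Because the indexer is called from several threads in the concurrent variant, whatever it writes to must be safe for concurrent use.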

14.6. Building a Gps Device

Building your own Gps device is straightforward: it is only as complex as getting the data from the data source or monitoring the data source for changes. The main API that a device must implement is index(), which by contract means that all the relevant data for indexing in the data source is indexed.

If you wish to perform real time mirroring of data changes from the data source to the index, you can control the lifecycle of the mirroring using the start() and stop() operations, and you must implement either the ActiveMirrorGpsDevice or the PassiveMirrorGpsDevice interface.

Compass Gps comes with a set of base classes for Gps devices that can help with the development of new Gps devices.
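To make the contract in this section concrete, here is a minimal sketch of an active mirror style device over an in-memory map. It deliberately avoids the real Compass interfaces and base classes: IndexSinkSketch, InMemorySinkSketch, and MapSourceDeviceSketch are hypothetical names, and the snapshot comparison used to detect changes is just one simple approach assumed for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical stand-in for whatever writes to the index; not a Compass type.
interface IndexSinkSketch {
    void save(String id, String text);
    void delete(String id);
}

// In-memory sink used only to observe what the device writes.
class InMemorySinkSketch implements IndexSinkSketch {
    final Map<String, String> docs = new HashMap<>();

    public void save(String id, String text) {
        docs.put(id, text);
    }

    public void delete(String id) {
        docs.remove(id);
    }
}

// Hypothetical device whose data source is a map of id to text.
class MapSourceDeviceSketch {
    private final Map<String, String> source;
    private final IndexSinkSketch sink;
    private Map<String, String> lastSeen = new HashMap<>();
    private boolean started;

    MapSourceDeviceSketch(Map<String, String> source, IndexSinkSketch sink) {
        this.source = source;
        this.sink = sink;
    }

    // start() and stop() bound the period in which mirroring is active.
    void start() {
        started = true;
    }

    void stop() {
        started = false;
    }

    // Batch operation: pushes everything in the data source to the index.
    void index() {
        for (Map.Entry<String, String> entry : source.entrySet()) {
            sink.save(entry.getKey(), entry.getValue());
        }
        lastSeen = new HashMap<>(source);
    }

    // Active mirroring: applies only what changed since the last snapshot.
    void performMirroring() {
        if (!started) {
            return;
        }
        for (Map.Entry<String, String> entry : source.entrySet()) {
            if (!Objects.equals(entry.getValue(), lastSeen.get(entry.getKey()))) {
                sink.save(entry.getKey(), entry.getValue());
            }
        }
        for (String id : lastSeen.keySet()) {
            if (!source.containsKey(id)) {
                sink.delete(id);
            }
        }
        lastSeen = new HashMap<>(source);
    }
}
```

A passive device would instead register a listener with its data source and apply each change as it is notified, so it would not need the snapshot or an explicit performMirroring() call.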