Payload-based indices


MG4J provides a special kind of index, called a payload-based index, that stores not text but metadata (dates, integers, etc.) related to a document. It is the default way of storing non-textual fields. Essentially, a payload-based index leverages the structure of a text-based index: it has no counts or positions, but each posting has a payload, that is, a piece of data related to the document the posting refers to. In this way, by creating an index with a single posting list (related to the term #) we are effectively storing metadata related to each document. The main advantage of this approach is that we get almost for free the sophisticated skipping structure of MG4J's indices, as well as support for splitting, combination, and so on.
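To make the idea concrete, here is a minimal toy model of a single posting list whose postings carry a payload instead of counts and positions. It is only a sketch: the class `PayloadPostingListSketch`, its methods, and the choice of a `long` payload are assumptions made for illustration, and they are not MG4J's classes or its on-disk format. The binary search merely stands in for the skipping structure mentioned above.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model only: hypothetical names, not MG4J's classes or file format. */
public class PayloadPostingListSketch {
    /** One posting: a document pointer plus its payload; no count, no positions. */
    private static final class Posting {
        final int document;
        final long payload;

        Posting(int document, long payload) {
            this.document = document;
            this.payload = payload;
        }
    }

    /** The single posting list of the index (conceptually, the list of the term "#"). */
    private final List<Posting> postings = new ArrayList<>();

    /** Appends a posting; document pointers must be strictly increasing. */
    public void add(int document, long payload) {
        if (!postings.isEmpty() && postings.get(postings.size() - 1).document >= document)
            throw new IllegalArgumentException("Document pointers must be strictly increasing");
        postings.add(new Posting(document, payload));
    }

    /** Returns the payload stored for a document, or null if this toy list has no posting for it. */
    public Long payload(int document) {
        int lo = 0, hi = postings.size() - 1;
        // Binary search over the sorted pointers, standing in for skipping.
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            int d = postings.get(mid).document;
            if (d == document) return postings.get(mid).payload;
            if (d < document) lo = mid + 1;
            else hi = mid - 1;
        }
        return null;
    }

    public static void main(String[] args) {
        PayloadPostingListSketch index = new PayloadPostingListSketch();
        index.add(0, 20050131L); // e.g. a date encoded as yyyymmdd (an assumption of this sketch)
        index.add(2, 20061215L);
        System.out.println(index.payload(2));
        System.out.println(index.payload(1));
    }
}
```

Because the structure is an ordinary posting list with extra data attached, anything that already works on posting lists (skipping, splitting, combining) applies to it unchanged, which is the point of the design.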

From the user's viewpoint there is no particular difference between standard and payload-based indices, with two exceptions. First, payload-based indices do not provide some files that would be nonsensical for them, such as the file of sizes or the global occurrence count. Second, searching a payload-based index is rather different from searching a text-based index: instead of term-based operators and Boolean combinators, you just get range queries.
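As an illustration of what a range query over payloads amounts to, the following sketch scans a toy posting list and returns the documents whose payload falls in a given interval. The class `PayloadRangeQuerySketch`, the parallel-array representation, the inclusive bounds, and the naive linear scan are all assumptions of this example; it does not show how MG4J evaluates such queries internally, nor its query syntax.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model only: hypothetical names, not MG4J's query machinery. */
public class PayloadRangeQuerySketch {
    /**
     * Returns the pointers of the documents whose payload lies in [from, to].
     *
     * @param documents document pointers of the single posting list, in increasing order
     * @param payloads  payloads[i] is the payload of documents[i]
     */
    public static List<Integer> documentsInRange(int[] documents, long[] payloads, long from, long to) {
        if (documents.length != payloads.length)
            throw new IllegalArgumentException("Each posting needs exactly one payload");
        List<Integer> result = new ArrayList<>();
        // Naive scan of the whole list; results come out in document order.
        for (int i = 0; i < documents.length; i++)
            if (payloads[i] >= from && payloads[i] <= to) result.add(documents[i]);
        return result;
    }

    public static void main(String[] args) {
        int[] documents = { 0, 1, 2 };
        long[] years = { 1999, 2005, 2010 }; // e.g. a publication year per document
        System.out.println(documentsInRange(documents, years, 2000, 2010));
    }
}
```

Note that the query is expressed purely in terms of payload values: there is no term to look up and nothing to combine with Boolean operators.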
