MG4J provides a special kind of index, called a
payload-based index, that is used to store not text
but rather metadata (dates, integers, etc.) related to a document. It is
the default way of storing non-textual fields. Essentially, a
payload-based index leverages the structure of a text-based index: it has
no counts or positions, but each posting has a
payload, that is, a piece of data related to the document
referred to by the posting. In this way, by creating an index with a single
posting list (related to the term #), we are effectively
storing metadata related to each document. The main advantage of this
approach is that we get almost for free the sophisticated skipping
structure of MG4J's indices, and support for splitting, combination, and
so on.
From the user's viewpoint, there is no particular difference between standard and payload-based indices, with two exceptions. First, the latter do not provide files that would be nonsensical for them, such as the file of sizes or the global occurrence count. Second, searching a payload-based index is rather different from searching a standard index: instead of term-based operators and Boolean combinators, you just get range queries.
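
As a purely conceptual illustration, the following sketch models a payload-based index as a single posting list whose postings carry a document pointer and a payload, and answers a range query over the payloads. It is not the MG4J API: the class PayloadIndexSketch, its methods add and range, and the use of a long as payload are all hypothetical names and choices made for this example. It scans the list linearly and ignores the skipping structure that a real index would provide.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of a payload-based index. This is NOT the MG4J API:
// every name below is hypothetical and chosen only for illustration.
final class PayloadIndexSketch {

    // One posting: a document pointer plus its payload. There are no
    // counts and no positions. The payload is assumed to be a long
    // (for example, a year or a timestamp).
    private static final class Posting {
        final int document;
        final long payload;

        Posting(int document, long payload) {
            this.document = document;
            this.payload = payload;
        }
    }

    // The single posting list of the index, in increasing document order.
    private final List<Posting> postings = new ArrayList<>();

    // Appends the payload for a document; documents must arrive in
    // strictly increasing order, as in an ordinary posting list.
    void add(int document, long payload) {
        if (!postings.isEmpty()
                && postings.get(postings.size() - 1).document >= document) {
            throw new IllegalArgumentException(
                "Documents must be added in increasing order");
        }
        postings.add(new Posting(document, payload));
    }

    // Range query: returns the documents whose payload lies in
    // [low, high], both ends included, by scanning the whole list.
    List<Integer> range(long low, long high) {
        List<Integer> result = new ArrayList<>();
        for (Posting p : postings) {
            if (p.payload >= low && p.payload <= high) {
                result.add(p.document);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        PayloadIndexSketch index = new PayloadIndexSketch();
        index.add(0, 2001L);
        index.add(1, 2005L);
        index.add(3, 1998L); // document 2 has no payload in this example
        System.out.println(index.range(2000L, 2005L)); // prints [0, 1]
    }
}
```

The sketch only shows the shape of the data and of the query: each document contributes at most one posting, and the only question asked of the index is which documents have a payload within a given interval.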