Querying MG4J

Querying MG4J is easy if you have already used a text-indexing system. The simplest possible query is a single term, e.g., class: the answer to such a query is the set of all documents (in our case, all the indexed files) that contain the word class (or any uppercase/lowercase variant of it).
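To make the semantics of a single-term query concrete, here is a minimal Java sketch built on a toy in-memory inverted index. It is not MG4J's API: the class SingleTermQuery and its methods are hypothetical, and the tokenizer (split on non-alphanumeric characters, then lowercase) is an assumption chosen only to illustrate case-insensitive matching.

```java
import java.util.*;

// Toy illustration (hypothetical, not MG4J's API): a single-term query returns
// every document containing the term in any uppercase/lowercase variant.
class SingleTermQuery {
    // Builds a toy inverted index: lowercased word -> sorted set of document numbers.
    static Map<String, SortedSet<Integer>> buildIndex(List<String> documents) {
        Map<String, SortedSet<Integer>> index = new HashMap<>();
        for (int doc = 0; doc < documents.size(); doc++) {
            // Assumed tokenizer: split on runs of non-letters/non-digits, then lowercase.
            for (String word : documents.get(doc).split("[^\\p{L}\\p{N}]+")) {
                if (word.isEmpty()) continue;
                index.computeIfAbsent(word.toLowerCase(Locale.ROOT), k -> new TreeSet<>()).add(doc);
            }
        }
        return index;
    }

    // Answers a single-term query: the set of documents containing the term.
    static SortedSet<Integer> query(Map<String, SortedSet<Integer>> index, String term) {
        return index.getOrDefault(term.toLowerCase(Locale.ROOT), new TreeSet<>());
    }

    public static void main(String[] args) {
        List<String> docs = List.of(
            "public class Foo {}",
            "A CLASS of its own",
            "no match here");
        System.out.println(query(buildIndex(docs), "class")); // prints [0, 1]
    }
}
```

Lowercasing both the indexed words and the query term is what makes class, Class and CLASS equivalent in this toy model.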

There are several additional operators you might want to try:

MG4J emphasises the intervals that satisfy the query. Clicking on the link of a document opens the document in the browser.

The description we have just given only scratches the surface of the queries you can write with MG4J: all operators can be freely combined, yielding very sophisticated constraints on the documents returned. More information on this topic can be found in the documentation of the package it.unimi.dsi.mg4j.search.

More sophisticated queries

MG4J actually provides very sophisticated query tuning. In particular, it provides scorers, which let you reorder the documents satisfying a query according to some criterion. To use these features you must use the command-line interface, although all settings will also apply to subsequent web queries.

Type $ to get some help on the available options. A basic command is $mode, which lets you choose the kind of result: just the document number and title, the intervals, snippets and so on. Some options (for instance, snippets) require a full index and a collection. The most interesting command, however, is $score, which lets you choose a scorer for your documents. For instance,

$score BM25Scorer VignaScorer

reproduces the standard settings: a BM25 scorer and a scorer that ranks first the documents in which your query is satisfied more frequently and within smaller intervals, linearly combined with equal weights. Scorers are described in the documentation of the package it.unimi.dsi.mg4j.search.score.
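The linear combination of scorers can be pictured with the following Java sketch. It does not use MG4J's scorer classes: the class LinearCombination is hypothetical, each scorer is modelled as a plain function from a document to a score, and the toy scores in main merely stand in for a BM25-like and a proximity-like scorer. Whether MG4J normalises scores before combining them is not modelled here.

```java
import java.util.*;
import java.util.function.ToDoubleFunction;

// Hypothetical sketch (not MG4J's classes): a linear combination of scorers.
class LinearCombination {
    // Combined score of doc = sum over i of weights[i] * scorers[i](doc).
    static <D> double score(List<ToDoubleFunction<D>> scorers, double[] weights, D doc) {
        if (scorers.size() != weights.length) throw new IllegalArgumentException("one weight per scorer");
        double total = 0;
        for (int i = 0; i < weights.length; i++) total += weights[i] * scorers.get(i).applyAsDouble(doc);
        return total;
    }

    // Returns the documents sorted by decreasing combined score.
    static <D> List<D> rank(Collection<D> docs, List<ToDoubleFunction<D>> scorers, double[] weights) {
        List<D> ranked = new ArrayList<>(docs);
        ranked.sort(Comparator.comparingDouble((D d) -> score(scorers, weights, d)).reversed());
        return ranked;
    }

    public static void main(String[] args) {
        // Toy per-document scores standing in for a BM25-like and a proximity-like scorer.
        Map<String, Double> bm25Like = Map.of("a", 2.0, "b", 1.0, "c", 0.5);
        Map<String, Double> proximityLike = Map.of("a", 0.0, "b", 3.0, "c", 1.0);
        ToDoubleFunction<String> s1 = bm25Like::get;
        ToDoubleFunction<String> s2 = proximityLike::get;
        // Equal weights: combined scores are a = 2.0, b = 4.0, c = 1.5.
        System.out.println(rank(List.of("a", "b", "c"), List.of(s1, s2), new double[] {1, 1})); // prints [b, a, c]
    }
}
```

Note how document b, ranked second by the first scorer alone, rises to the top once the second scorer contributes with the same weight.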

When you use a scorer, it is a good idea to turn multiplexing on: when multiplexing is on, each query is sent to all indices (by default, a query is directed to the first index specified on the command line). Just type

$mplex on
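Multiplexing can be pictured with the Java sketch below, in which each index is a toy map from terms to document numbers, kept in command-line order. The class Multiplexing is hypothetical, and merging the per-index answers by union is an assumption of the sketch rather than a description of MG4J's QueryEngine.

```java
import java.util.*;

// Hypothetical sketch (not MG4J's QueryEngine). With multiplexing off, a term is
// looked up only in the first index; with it on, the term is sent to every index
// and the answers are merged (assumed here to be a union).
class Multiplexing {
    static SortedSet<Integer> docs(Integer... ids) {
        return new TreeSet<>(Arrays.asList(ids));
    }

    // indices: index name -> (term -> documents), in command-line order.
    static SortedSet<Integer> query(LinkedHashMap<String, Map<String, SortedSet<Integer>>> indices,
                                    String term, boolean multiplex) {
        SortedSet<Integer> result = new TreeSet<>();
        for (Map<String, SortedSet<Integer>> index : indices.values()) {
            result.addAll(index.getOrDefault(term, Collections.emptySortedSet()));
            if (!multiplex) break; // default: only the first index is consulted
        }
        return result;
    }

    public static void main(String[] args) {
        LinkedHashMap<String, Map<String, SortedSet<Integer>>> indices = new LinkedHashMap<>();
        indices.put("text", Map.of("mg4j", docs(1, 2))); // first index on the command line
        indices.put("title", Map.of("mg4j", docs(3)));
        System.out.println(query(indices, "mg4j", false)); // prints [1, 2]
        System.out.println(query(indices, "mg4j", true));  // prints [1, 2, 3]
    }
}
```

Document 3 matches only in the title index, so it is found only when the query reaches every index.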

Of course, you can always choose a specific index with the colon notation. You can also change the weight of your indices (which is particularly useful when multiplexing):

$weight text:1 title:3

In this way, weight-based scorers will usually consider the title field three times more important than the text field.
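As a rough illustration of what the weights mean, the sketch below assumes a hypothetical weight-based scorer whose score is a weighted sum of per-field raw scores, the raw score being simply the number of query-term occurrences in the field. The class FieldWeights is not part of MG4J, and real scorers such as BM25 compute per-field scores differently; only the role of the weights is meant to carry over.

```java
import java.util.*;

// Hypothetical weight-based scorer (not MG4J's): each field's raw score, here
// the number of query-term occurrences in that field, is multiplied by the
// field's weight, and the products are summed.
class FieldWeights {
    static double score(Map<String, Double> weights, Map<String, Integer> occurrencesByField) {
        double total = 0;
        for (Map.Entry<String, Integer> e : occurrencesByField.entrySet()) {
            // Assumption of this sketch: fields without an explicit weight weigh 1.
            total += weights.getOrDefault(e.getKey(), 1.0) * e.getValue();
        }
        return total;
    }

    public static void main(String[] args) {
        Map<String, Double> weights = Map.of("text", 1.0, "title", 3.0); // as in "$weight text:1 title:3"
        double inTitleOnce = score(weights, Map.of("title", 1));
        double inTextTwice = score(weights, Map.of("text", 2));
        System.out.println(inTitleOnce + " " + inTextTwice); // prints "3.0 2.0"
    }
}
```

A single title occurrence outweighs two text occurrences, which is exactly the effect the $weight setting is meant to have.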

You can also change the way displayed snippets (or intervals) are chosen: MG4J provides an interval selector, a class that tries to choose the best intervals to be shown. You can set the maximum number of intervals and the maximum length of an interval:

$selector 3 40

will show at most three intervals, and intervals longer than 40 characters will be broken. All these changes are reflected in the web interface.
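The two parameters of an interval selector can be sketched in Java as follows. The class SnippetSelector is hypothetical and does not reproduce MG4J's selection logic: "best" is assumed to mean "shortest", and "broken" is assumed to mean "cut after the maximum length and marked with an ellipsis".

```java
import java.util.*;

// Hypothetical interval selector (not MG4J's). Keeps at most maxIntervals
// intervals, preferring shorter ones (assumed notion of "best"), shows them in
// text order, and cuts any interval longer than maxLength characters.
class SnippetSelector {
    // A half-open interval [start, end) of character offsets in the text.
    static final class Interval {
        final int start, end;
        Interval(int start, int end) { this.start = start; this.end = end; }
        int length() { return end - start; }
    }

    static List<String> select(String text, List<Interval> intervals, int maxIntervals, int maxLength) {
        List<Interval> sorted = new ArrayList<>(intervals);
        sorted.sort(Comparator.comparingInt(Interval::length)); // shortest first
        List<Interval> chosen = new ArrayList<>(sorted.subList(0, Math.min(maxIntervals, sorted.size())));
        chosen.sort(Comparator.comparingInt(in -> in.start)); // back to text order
        List<String> snippets = new ArrayList<>();
        for (Interval in : chosen) {
            String s = text.substring(in.start, in.end);
            snippets.add(s.length() > maxLength ? s.substring(0, maxLength) + "..." : s);
        }
        return snippets;
    }

    public static void main(String[] args) {
        String text = "aaaa bb cccccc d";
        List<Interval> intervals = List.of(
            new Interval(0, 4), new Interval(5, 7), new Interval(8, 14), new Interval(15, 16));
        System.out.println(select(text, intervals, 3, 3)); // prints [aaa..., bb, d]
    }
}
```

With at most three intervals, the longest one (cccccc) is dropped, and aaaa exceeds the three-character limit, so it is cut.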

If you want to learn more about query resolution, you should have a look at the documentation of the class it.unimi.dsi.mg4j.query.QueryEngine, which embodies all the logic used to answer queries in MG4J.