mnoGoSearch 3.3.7 reference manual: Full-featured search engine software
Chapter 8. Searching documents
mnoGoSearch sorts results first by relevancy and second by popularity rank.
The relevancy of every document found is calculated as 100% multiplied by the cosine of the angle between the weight vector for the request
and the weight vector for the document. The number of vector coordinates is equal to the product of the number of word forms in
the search query and the number of sections defined in indexer.conf. Every vector coordinate corresponds to
a word in the search query found in one of the document's sections. The value of this coordinate depends on the weight of this section,
defined by the wf
parameter (see the Section called Changing different document parts weights at search time),
and on whether the word is exactly the same as in the search query or is a word form or synonym of it.
One more coordinate is equal to the average distance between the searched words in the document. For the query's vector, this coordinate is equal to 0.
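The cosine formula described above can be sketched in a few lines of Python. This is only an illustration of the general idea, not mnoGoSearch's actual code: the function name `relevancy` and the example vectors are hypothetical, and the real engine applies further factors.

```python
import math

def relevancy(query_vec, doc_vec):
    """Return 100 * cosine of the angle between two weight vectors."""
    dot = sum(q * d for q, d in zip(query_vec, doc_vec))
    norm_q = math.sqrt(sum(q * q for q in query_vec))
    norm_d = math.sqrt(sum(d * d for d in doc_vec))
    if norm_q == 0 or norm_d == 0:
        return 0.0  # no angle is defined for a zero vector
    return 100.0 * dot / (norm_q * norm_d)

# Hypothetical example: 2 query words x 2 sections = 4 coordinates,
# plus a final coordinate for the average word distance (0 for the query).
query = [1.0, 1.0, 1.0, 1.0, 0.0]
doc = [1.0, 0.0, 1.0, 1.0, 0.5]
print(round(relevancy(query, doc), 2))
```

A document whose vector points in the same direction as the query vector scores 100; missing words or sections and a non-zero distance coordinate both widen the angle and lower the score.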
In the default configuration, search can produce quite small score values, because it expects that the words will be found in up to 256 document sections at the same time. See the description of the NumSections search.htm command for how to specify the real number of sections used and thus increase score values.
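To see why the assumed number of sections matters, consider a deliberately simplified model (an assumption for illustration, not the engine's real formula): one query word, unit weights, a query vector with one coordinate per expected section, and a document that matches in only one section. The helper name `single_hit_score` is hypothetical.

```python
import math

def single_hit_score(num_sections):
    """Cosine score (in percent) when the query expects a word in every
    section but the document contains it in exactly one of them."""
    # Query vector: num_sections unit coordinates; document: one unit hit.
    # dot = 1, |query| = sqrt(num_sections), |doc| = 1.
    return 100.0 / math.sqrt(num_sections)

for n in (256, 16, 4):
    print(n, single_hit_score(n))
```

Under this model, reducing the expected number of sections from 256 to 4 raises the score of the same document from 6.25 to 50.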
Other commands affecting document order and/or score value are: DateFactor, DocSizeWeight, MinCoordFactor, NumDistinctWordFactor, NumWordFactor, WordDistanceWeight.
The popularity rank is calculated in two stages. In the first stage, the value of the Weight
parameter
for every server is divided by the number of links from this server. This gives the weight of one link from this server.
In the second stage, for every page, the weights of all links pointing to this page are summed.
This sum is the popularity rank for this page. Self links, i.e. when a page
has a link to itself, do not affect the popularity rank.
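The two stages can be sketched as follows. This is a minimal illustration, not the indexer's implementation: the function name, the tuple layout, and the example sites are hypothetical, and it assumes that self links still count toward the number of links from a server in the first stage, which the text above does not specify.

```python
from collections import Counter, defaultdict

def popularity_rank(links, server_weight=None):
    """links: iterable of (source_site, source_page, target_page) tuples.
    server_weight: optional mapping site -> Weight (defaults to 1)."""
    server_weight = server_weight or {}
    links = list(links)
    # Stage 1: weight of one link = server Weight / number of links from it.
    # Assumption: self links are included in this count.
    out_count = Counter(site for site, _src, _dst in links)
    link_weight = {site: server_weight.get(site, 1.0) / n
                   for site, n in out_count.items()}
    # Stage 2: a page's rank is the sum of the weights of links pointing to it.
    rank = defaultdict(float)
    for site, src, dst in links:
        if src == dst:  # self links do not affect the rank
            continue
        rank[dst] += link_weight[site]
    return dict(rank)

example_links = [
    ("a.example", "a/1", "b/1"),
    ("a.example", "a/2", "b/1"),
    ("b.example", "b/1", "a/1"),
    ("b.example", "b/1", "b/1"),  # self link, ignored in stage 2
]
print(popularity_rank(example_links))
```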
By default, the value of the Weight
parameter is equal to 1 for all indexed servers.
You may change this value with the Weight command in the indexer.conf file or
directly in the server table, if you load the server configuration from this table.
If you place the
PopRankSkipSameSite yes
command in the indexer.conf file, the indexer will take only inter-site links (i.e. links from a page on
one site to a page on another site) for popularity rank calculation.
If you place the
PopRankFeedBack yes
command in the indexer.conf file, the indexer will calculate the site weight before page rank
calculation. To do that, the indexer calculates the sum of popularity rank for all pages from the same site. If this sum is
greater than 1, the weight for the site is set to this sum; otherwise, the site weight is set to 1.
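The feedback rule amounts to taking the larger of 1 and the summed page ranks for each site. A minimal sketch, with a hypothetical function name and input layout:

```python
def feedback_site_weight(page_ranks_by_site):
    """Map each site to the sum of its pages' popularity ranks,
    but never less than 1."""
    weights = {}
    for site, ranks in page_ranks_by_site.items():
        total = sum(ranks)
        weights[site] = total if total > 1 else 1
    return weights

print(feedback_site_weight({"a.example": [0.2, 0.3], "b.example": [2.0, 1.5]}))
```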
If you place the
PopRankUseTracking yes
command in the indexer.conf file, the indexer will calculate the site weight as the number of
tracked queries restricted to this site.
If you place the
PopRankUseShowCnt yes
command in the search.htm file, then for every result shown to the user, the
corresponding url.shows value will be increased by 1 if the relevancy of this result is greater than or equal to
the value specified by the
PopRankShowCntRatio
command (default value is 25.0).
If you place PopRankUseShowCnt yes
in the indexer.conf file, the indexer
will add to the URL's PopularityRank the value of url.shows multiplied by the value specified in the
PopRankShowCntWeight
command (default value is 0.01).
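The two halves of the show-count mechanism can be sketched as below. The function names are hypothetical; the default arguments mirror the documented defaults of PopRankShowCntRatio (25.0) and PopRankShowCntWeight (0.01).

```python
def count_show(shows, relevancy, ratio=25.0):
    """Search side: increase url.shows by 1 when the result's relevancy
    is greater than or equal to the PopRankShowCntRatio value."""
    return shows + 1 if relevancy >= ratio else shows

def rank_with_shows(pop_rank, shows, weight=0.01):
    """Indexer side: add url.shows * PopRankShowCntWeight to the rank."""
    return pop_rank + shows * weight

shows = count_show(49, relevancy=31.5)   # counted: 31.5 >= 25.0
print(shows, rank_with_shows(1.0, shows))
```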
Starting from version 3.3.7, it is possible to debug the score values calculated for the documents found. To debug a score value, go through these steps:
<!--restop--> .... [DebugScore: $(DebugScore)] <!--/restop-->
<!--res--> .... [ID=$(ID)] <!--/res-->
Note: The URL will look approximately like this:
http://hostname/cgi-bin/search.cgi?q=test+query&DebugURLID=100
DebugScore: url_id=82 RDsum=98 distance=84 (84/1) minmax=0.99091089 density=0.00196271 numword=0.90135133 wordform=0.00000000
This will give you an idea of why the score for the chosen document is too high or too low, and help you fine-tune parameters such as WordDistanceWeight or WordDensityFactor.
Note: Score debugging currently works only for queries with multiple search words. Queries with a single search word don't return debug information.
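When comparing several documents, it can help to build the debug URL and pick the DebugScore line apart with a small script. The sketch below is not part of mnoGoSearch; the helper names are hypothetical, and the parser assumes the line format shown in the sample output above.

```python
import re
from urllib.parse import urlencode

def debug_url(base, query, url_id):
    """Append the query and the DebugURLID parameter to the search.cgi URL."""
    return base + "?" + urlencode({"q": query, "DebugURLID": url_id})

def parse_debug_score(line):
    """Extract the name=value pairs of a DebugScore line as floats."""
    return {name: float(value)
            for name, value in re.findall(r"(\w+)=([0-9.]+)", line)}

print(debug_url("http://hostname/cgi-bin/search.cgi", "test query", 100))

sample = ("DebugScore: url_id=82 RDsum=98 distance=84 (84/1) "
          "minmax=0.99091089 density=0.00196271 "
          "numword=0.90135133 wordform=0.00000000")
print(parse_debug_score(sample))
```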
This feature allows the words between <a href="xxx"> and </a> to be assigned to the document that the link points to. To enable Crosswords, use the CrossWords command in indexer.conf and search.htm.
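The idea behind Crosswords can be illustrated with a short, self-contained sketch that collects link text and files it under the link target rather than the page containing the link. This is only a conceptual illustration using Python's standard HTML parser; the class name and example URL are hypothetical, and it says nothing about how the indexer stores these words internally.

```python
from collections import defaultdict
from html.parser import HTMLParser

class AnchorTextCollector(HTMLParser):
    """Collect the words between <a href="..."> and </a>, keyed by href."""

    def __init__(self):
        super().__init__()
        self.words_for = defaultdict(list)
        self._href = None  # target of the link currently open, if any

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")

    def handle_endtag(self, tag):
        if tag == "a":
            self._href = None

    def handle_data(self, data):
        if self._href:
            # Credit the words to the linked document, not the current page.
            self.words_for[self._href].extend(data.lower().split())

collector = AnchorTextCollector()
collector.feed('See the <a href="http://example.com/manual.html">Reference Manual</a> here.')
collector.close()
print(dict(collector.words_for))
```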