There are two ways to search the index. The first uses the query parser to construct a query from a string. The second lets you build your own queries through the Zend_Search_Lucene API.
Before choosing to use the provided Query Parser, please consider the following:
Both ways use the same API method to search through the index:
<?php
require_once('Zend/Search/Lucene.php');

$index = new Zend_Search_Lucene('/data/my_index');

$index->find($query);
?>
The Zend_Search_Lucene::find() method determines the input type automatically and, when given a string, uses the query parser to construct the appropriate Zend_Search_Lucene_Search_Query object.
It is important to note that find() is case sensitive. By default, LuceneIndexCreation.jar normalizes all documents to lowercase. This can be turned off with a command line switch (type LuceneIndexCreation.jar with no arguments for help). The case of the text supplied to find() must match that of the index. If the index is normalized to lowercase, then all text supplied to find() must pass through strtolower(), or else it may not match.
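The lowercase rule above can be applied in one place before every search. The following is a minimal sketch, assuming the index was built with the default lowercase normalization. The helper name normalizeSearchText() is hypothetical and not part of Zend_Search_Lucene, and strtolower() may not fold non-ASCII letters.

```php
<?php
// Hypothetical helper (not part of Zend_Search_Lucene): lowercases raw
// search text so that it matches an index normalized to lowercase.
function normalizeSearchText($text)
{
    return strtolower(trim($text));
}

$userInput = '  Zend Framework ';          // e.g. text typed into a search box
$query = normalizeSearchText($userInput);  // 'zend framework'

// With a real index, the normalized text would then be passed to find():
// $hits = $index->find($query);
echo $query, "\n";
```

If the index was created with normalization switched off, the text should be passed to find() unchanged instead.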
The search result is an array of Zend_Search_Lucene_Search_QueryHit objects. Each of these has two properties: $hit->id is the document number within the index and $hit->score is the score of the hit in the search result. The results are ordered by score, with the highest scores first.
The Zend_Search_Lucene_Search_QueryHit object also exposes each field of the Zend_Search_Lucene_Document found by the hit as a property of the hit. In this example, a hit is returned and the corresponding document has two fields: title and author.
<?php
require_once('Zend/Search/Lucene.php');

$index = new Zend_Search_Lucene('/data/my_index');

$hits = $index->find($query);

foreach ($hits as $hit) {
    echo $hit->score;
    echo $hit->title;
    echo $hit->author;
}
?>
Optionally, the original Zend_Search_Lucene_Document object can be returned from the Zend_Search_Lucene_Search_QueryHit. You can retrieve the document by calling the getDocument() method on the hit and then read its fields with the getFieldValue() method:
<?php
require_once('Zend/Search/Lucene.php');

$index = new Zend_Search_Lucene('/data/my_index');

$hits = $index->find($query);

foreach ($hits as $hit) {
    // return the Zend_Search_Lucene_Document object for this hit
    $document = $hit->getDocument();

    // return a Zend_Search_Lucene_Field object
    // from the Zend_Search_Lucene_Document (an object, so it is not echoed)
    $field = $document->getField('title');

    // return the string value of the Zend_Search_Lucene_Field object
    echo $document->getFieldValue('title');

    // same as getFieldValue()
    echo $document->title;
}
?>
The fields available from the Zend_Search_Lucene_Document object are determined at the time of indexing. The document fields are either indexed, or indexed and stored, in the document by the indexing application (e.g. LuceneIndexCreation.jar).
Note that the document identity ('path' in our example) is also stored in the index and must be retrieved from it.
Zend_Search_Lucene uses the same scoring algorithms as Java Lucene. Search results are ordered by score. Hits with a higher score come first, and documents with higher scores match the query better than documents with lower scores.
Roughly speaking, search hits that contain the searched term or phrase more frequently will have a higher score.
A score can be retrieved by accessing the score property of a hit:
<?php
$hits = $index->find($query);

foreach ($hits as $hit) {
    echo $hit->id;
    echo $hit->score;
}
?>
The Zend_Search_Lucene_Search_Similarity class is used to calculate the score. See the "Extensibility. Scoring Algorithms" section for details.
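Because hits arrive ordered by score, a caller can stop reading at the first hit that falls below a cutoff. The following is a minimal sketch of that idea. The helpers makeHit() and topHits() are hypothetical, the stdClass objects only stand in for Zend_Search_Lucene_Search_QueryHit, and the score values are made up for illustration.

```php
<?php
// Stand-in hits for illustration only; a real search returns
// Zend_Search_Lucene_Search_QueryHit objects exposing id and score.
function makeHit($id, $score)
{
    $hit = new stdClass();
    $hit->id = $id;
    $hit->score = $score;
    return $hit;
}

// Hypothetical helper: keep hits scoring at least $minScore, up to $limit.
// Assumes $hits is already ordered by descending score.
function topHits(array $hits, $minScore, $limit)
{
    $kept = array();
    foreach ($hits as $hit) {
        if ($hit->score < $minScore || count($kept) >= $limit) {
            break; // limit reached, or all remaining hits score lower
        }
        $kept[] = $hit;
    }
    return $kept;
}

// Made-up scores, highest first, as a search result would be ordered.
$hits = array(makeHit(7, 0.92), makeHit(3, 0.41), makeHit(12, 0.08));

foreach (topHits($hits, 0.25, 10) as $hit) {
    printf("%d: %.2f\n", $hit->id, $hit->score);
}
```

With a real index, the array returned by $index->find($query) would be passed to topHits() in place of the stand-in array. A suitable cutoff depends on the query and the index, so the 0.25 used here is arbitrary.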