Index creation and updating capabilities are implemented in both the Zend_Search_Lucene module and Java Lucene. You can use either implementation to build and maintain an index.
The PHP code listing below provides an example of how to index a file using the Zend_Search_Lucene indexing API:
<?php
// Setting the second argument to TRUE creates a new index
$index = new Zend_Search_Lucene('/data/my-index', true);

$doc = new Zend_Search_Lucene_Document();

// Store document URL to identify it in search result.
$doc->addField(Zend_Search_Lucene_Field::Text('url', $docUrl));

// Index document content
$doc->addField(Zend_Search_Lucene_Field::UnStored('contents', $docContent));

// Add document to the index.
$index->addDocument($doc);

// Write changes to the index.
$index->commit();
?>
Newly added documents can be retrieved from the index after the commit operation. Zend_Search_Lucene::commit() is called automatically at the end of script execution and before any search request.
Each commit() call generates a new index segment, so it should be called as rarely as possible. On the other hand, committing a large number of documents in one step requires more memory.
Automatic segment management is planned as a future Zend_Search_Lucene enhancement.
The same procedure is used to update an existing index. The only difference is that the index should be opened without the second parameter:
<?php
// Open existing index
$index = new Zend_Search_Lucene('/data/my-index');

$doc = new Zend_Search_Lucene_Document();

// Store document URL to identify it in search result.
$doc->addField(Zend_Search_Lucene_Field::Text('url', $docUrl));

// Index document content
$doc->addField(Zend_Search_Lucene_Field::UnStored('contents', $docContent));

// Add document to the index.
$index->addDocument($doc);

// Write changes to the index.
$index->commit();
?>
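Because the two listings differ only in the second constructor argument, a script that runs repeatedly may want to choose that argument at run time. The helper below is a minimal sketch of one way to do so. The function name shouldCreateIndex() is hypothetical and not part of Zend_Search_Lucene, and the rule it applies (a missing or empty directory means that no index exists yet) is an assumption about how the index directory is used, not documented library behavior.

```php
<?php
/**
 * Decide whether a new index has to be created at the given path.
 *
 * Hypothetical helper, not part of Zend_Search_Lucene.
 * Assumption: the directory is used only for the index, so a missing
 * or empty directory means that no index has been created yet.
 *
 * @param string $indexPath Path of the index directory
 * @return boolean TRUE if the index should be created, FALSE if it can be opened
 */
function shouldCreateIndex($indexPath)
{
    // No directory at all: nothing to open.
    if (!is_dir($indexPath)) {
        return true;
    }

    // Ignore the "." and ".." entries that scandir() always reports.
    $entries = array_diff(scandir($indexPath), array('.', '..'));

    // An empty directory contains no index files yet.
    return count($entries) === 0;
}
```

The result can then be passed as the second constructor argument, for example new Zend_Search_Lucene('/data/my-index', shouldCreateIndex('/data/my-index')). Passing TRUE for a directory that already holds an index should be avoided, which is why the helper returns TRUE only when nothing is there.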
Each commit() call (explicit or implicit) generates a new index segment.
Zend_Search_Lucene doesn't manage segments automatically, so you have to take care of segment size yourself. On the one hand, a large segment is more efficient; on the other hand, a large segment needs more memory during creation.
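One way to balance segment size against memory use is to commit after a fixed number of documents instead of after every document or only once at the end. The function below is a sketch of that idea. The name indexInBatches() and the default batch size of 1000 are illustrative assumptions, not library features or recommended values. The function relies only on the addDocument() and commit() methods shown in the listings above.

```php
<?php
/**
 * Add documents to an index, committing once per batch.
 *
 * Hypothetical helper, not part of Zend_Search_Lucene. $index may be
 * any object that provides addDocument() and commit().
 *
 * @param object  $index     Index to write to
 * @param array   $docs      Documents to add
 * @param integer $batchSize Documents per commit (1000 is an arbitrary default)
 * @return integer Number of explicit commit() calls made
 */
function indexInBatches($index, array $docs, $batchSize = 1000)
{
    if ($batchSize < 1) {
        throw new InvalidArgumentException('Batch size must be at least 1.');
    }

    $commits = 0;
    $pending = 0;

    foreach ($docs as $doc) {
        $index->addDocument($doc);
        $pending++;

        // A full batch has been added: write it out with a single commit.
        if ($pending === $batchSize) {
            $index->commit();
            $commits++;
            $pending = 0;
        }
    }

    // Write out the final, partially filled batch.
    if ($pending > 0) {
        $index->commit();
        $commits++;
    }

    return $commits;
}
```

With a real index, $index would be the Zend_Search_Lucene object opened as shown above and each element of $docs a Zend_Search_Lucene_Document. Since each commit produces one segment, the return value is also the number of segments the call adds. A larger batch size means fewer, larger segments at the cost of more memory per commit.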
Lucene Java and Luke (Lucene Index Toolbox - http://www.getopt.org/luke/) can be used to optimize an index with this version of Zend_Search_Lucene.