Search/Lucene.php
Zend Framework
LICENSE
This source file is subject to the new BSD license that is bundled with this package in the file LICENSE.txt. It is also available through the world-wide-web at this URL: http://framework.zend.com/license/new-bsd
If you did not receive a copy of the license and are unable to obtain it through the world-wide-web, please send an email to [email protected] so we can send you a copy immediately.
- Category
- Zend
- Copyright
- Copyright (c) 2005-2012 Zend Technologies USA Inc. (http://www.zend.com)
- License
- New BSD License
- Package
- Zend_Search_Lucene
- Version
- $Id: Lucene.php 24593 2012-01-05 20:35:02Z matthew $
\Zend_Search_Lucene
- Implements
- \Zend_Search_Lucene_Interface
Constants
Properties


boolean $_closeDirOnExit = true
File system adapter closing option: whether the directory adapter is closed when the index is closed.
Default value is true
Details
- Type
- boolean


boolean $_closed = false
Signals that the index is already closed: changes are committed and resources are cleaned up.
Default value is false
Details
- Type
- boolean


string $_defaultSearchField = null
Default field name for search.
Null means search through all fields.
Default value is null
Details
- Type
- string


\Zend_Search_Lucene_Storage_Directory $_directory = null
File system adapter.
Default value is null
Details
- Type
- \Zend_Search_Lucene_Storage_Directory


integer $_resultSetLimit = 0
Result set limit.
0 means no limit.
Default value is 0
Details
- Type
- integer


array $_segmentInfos = array()
Array of Zend_Search_Lucene_Index_SegmentInfo objects for the current version of the index.
Default value is array()
Details
- Type
- array


integer $_termsPerQueryLimit = 1024
Terms per query limit.
0 means no limit.
Default value is 1024
Details
- Type
- integer


\Zend_Search_Lucene_TermStreamsPriorityQueue $_termsStream = null
Terms stream priority queue object.
Default value is null
Details
- Type
- \Zend_Search_Lucene_TermStreamsPriorityQueue
Methods


__construct(\Zend_Search_Lucene_Storage_Directory_Filesystem | string $directory = null, $create = false) : void
Opens the index.
The constructor takes the index location as its first argument: either a string path to the index folder or a Directory object.
Name | Type | Description |
---|---|---|
$directory | \Zend_Search_Lucene_Storage_Directory_Filesystem or string | Index folder path or directory object |
$create | boolean | If true, a new index is created; defaults to false |
Exception | Description |
---|---|
\Zend_Search_Lucene_Exception |


_getIndexWriter() : \Zend_Search_Lucene_Index_Writer
Returns an instance of Zend_Search_Lucene_Index_Writer for the index
Type | Description |
---|---|
\Zend_Search_Lucene_Index_Writer |


_readPre21SegmentsFile() : void
Read segments file for pre-2.1 Lucene index format
Exception | Description |
---|---|
\Zend_Search_Lucene_Exception |


_readSegmentsFile() : void
Read segments file
Exception | Description |
---|---|
\Zend_Search_Lucene_Exception |


addDocument(\Zend_Search_Lucene_Document $document) : void
Adds a document to this index.
Name | Type | Description |
---|---|---|
$document | \Zend_Search_Lucene_Document |


closeTermsStream() : void
Close terms stream.
Should be called to clean up resources if the stream is not read to the end.


commit() : void
Commit changes resulting from delete() or undeleteAll() operations.
- Todo
- undeleteAll processing.


count() : integer
Returns the total number of documents in this index (including deleted documents).
Type | Description |
---|---|
integer |


create(mixed $directory) : \Zend_Search_Lucene_Interface
Create index
Name | Type | Description |
---|---|---|
$directory | mixed |
Type | Description |
---|---|
\Zend_Search_Lucene_Interface |


currentTerm() : \Zend_Search_Lucene_Index_Term | null
Returns term in current position
Type | Description |
---|---|
\Zend_Search_Lucene_Index_Term or null | |


delete(integer | \Zend_Search_Lucene_Search_QueryHit $id) : void
Deletes a document from the index.
$id is an internal document id
Name | Type | Description |
---|---|---|
$id | integer or \Zend_Search_Lucene_Search_QueryHit | Internal document id or query hit |
Exception | Description |
---|---|
\Zend_Search_Lucene_Exception |


docFreq(\Zend_Search_Lucene_Index_Term $term) : integer
Returns the number of documents in this index containing the $term.
Name | Type | Description |
---|---|---|
$term | \Zend_Search_Lucene_Index_Term |
Type | Description |
---|---|
integer |


find(\Zend_Search_Lucene_Search_Query | string $query) : array
Performs a query against the index and returns an array of Zend_Search_Lucene_Search_QueryHit objects.
Input is a query string or a Zend_Search_Lucene_Search_Query object.
Name | Type | Description |
---|---|---|
$query | \Zend_Search_Lucene_Search_Query or string | |
Type | Description |
---|---|
array | Array of Zend_Search_Lucene_Search_QueryHit objects |
Exception | Description |
---|---|
\Zend_Search_Lucene_Exception |


getActualGeneration(\Zend_Search_Lucene_Storage_Directory $directory) : integer
Get current generation number
Returns the generation number: 0 means the pre-2.1 index format, -1 means there are no segments files.
Name | Type | Description |
---|---|---|
$directory | \Zend_Search_Lucene_Storage_Directory |
Type | Description |
---|---|
integer |
Exception | Description |
---|---|
\Zend_Search_Lucene_Exception |

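The return convention above can be illustrated with a hypothetical helper that scans a list of file names. This is only a sketch under stated assumptions: it assumes generations appear as `segments_N` files with N written in base 36 (the Java Lucene convention), and it does not reproduce how getActualGeneration() actually locates the generation. `generationFromFileNames()` is not part of Zend_Search_Lucene.

```php
<?php

/**
 * Hypothetical helper (not part of Zend_Search_Lucene): derives a generation
 * number from directory file names using the documented convention.
 *   N > 0 : highest "segments_N" file (N assumed to be written in base 36)
 *   0     : only a pre-2.1 "segments" file exists
 *   -1    : no segments files at all
 */
function generationFromFileNames(array $fileNames): int
{
    $generation = -1;
    foreach ($fileNames as $name) {
        if ($name === 'segments') {
            $generation = max($generation, 0); // pre-2.1 format
        } elseif (preg_match('/^segments_([0-9a-z]+)$/', $name, $m) === 1) {
            // Other files such as "segments.gen" do not match and are ignored.
            $generation = max($generation, (int) base_convert($m[1], 36, 10));
        }
    }
    return $generation;
}
```

For example, a listing containing `segments_1` and `segments_a` yields 10, and an empty listing yields -1.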

getDefaultSearchField() : string
Get default search field.
Null means that search is performed through all fields by default.
Type | Description |
---|---|
string |


getDirectory() : \Zend_Search_Lucene_Storage_Directory
Returns the Zend_Search_Lucene_Storage_Directory instance for this index.
Type | Description |
---|---|
\Zend_Search_Lucene_Storage_Directory |


getDocument(integer | \Zend_Search_Lucene_Search_QueryHit $id) : \Zend_Search_Lucene_Document
Returns a Zend_Search_Lucene_Document object for the document number $id in this index.
Name | Type | Description |
---|---|---|
$id | integer or \Zend_Search_Lucene_Search_QueryHit | |
Type | Description |
---|---|
\Zend_Search_Lucene_Document |
Exception | Description |
---|---|
\Zend_Search_Lucene_Exception | Thrown if $id is out of range |


getFieldNames(boolean $indexed = false) : array
Returns a list of all unique field names that exist in this index.
Name | Type | Description |
---|---|---|
$indexed | boolean |
Type | Description |
---|---|
array |


getGeneration() : integer
Get generation number associated with this index instance
The same generation number, paired with a document number or a query string, guarantees the same result when reading from the index, so it can be used for search result caching.
Type | Description |
---|---|
integer |

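A minimal sketch of generation-keyed caching, assuming results are cached in process memory. `GenerationKeyedCache` is a hypothetical class, and the `$search` callable stands in for a call such as `$index->find($query)` so that the sketch runs without Zend Framework installed; in real use the generation would come from getGeneration().

```php
<?php

/**
 * Hypothetical in-memory cache keyed by (index generation, query string).
 * $search is injected in place of a real query call (assumption), so the
 * sketch is runnable on its own.
 */
final class GenerationKeyedCache
{
    /** @var array<string, mixed> cache key => cached result */
    private array $entries = [];

    public function get(int $generation, string $query, callable $search)
    {
        // NUL separator keeps "1" + "2x" distinct from "12" + "x".
        $key = $generation . "\0" . $query;
        if (!array_key_exists($key, $this->entries)) {
            $this->entries[$key] = $search($query); // cache miss: run the query
        }
        return $this->entries[$key];
    }
}
```

Because the generation is part of the key, entries from an older generation are simply never hit again once the index changes; a production cache would also need an eviction policy.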

getMaxBufferedDocs() : integer
Retrieve index maxBufferedDocs option
maxBufferedDocs is the minimum number of documents that must be buffered in memory before they are written into a new segment.
Default value is 10
Type | Description |
---|---|
integer |


getMaxMergeDocs() : integer
Retrieve index maxMergeDocs option
maxMergeDocs is the largest number of documents ever merged by addDocument(). Small values (e.g., less than 10,000) are best for interactive indexing, as this limits the length of pauses while indexing to a few seconds. Larger values are best for batched indexing and speedier searches.
Default value is PHP_INT_MAX
Type | Description |
---|---|
integer |


getMergeFactor() : integer
Retrieve index mergeFactor option
mergeFactor determines how often segment indices are merged by addDocument(). With smaller values, less RAM is used while indexing, and searches on unoptimized indices are faster, but indexing speed is slower. With larger values, more RAM is used during indexing, and while searches on unoptimized indices are slower, indexing is faster. Thus larger values (> 10) are best for batch index creation, and smaller values (< 10) for indices that are interactively maintained.
Default value is 10
Type | Description |
---|---|
integer |

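The trade-offs described for maxBufferedDocs, mergeFactor and maxMergeDocs can be explored with a deliberately simplified model of logarithmic merging. This sketch is an assumption, not the merge algorithm of Zend_Search_Lucene's index writer: each time maxBufferedDocs documents are buffered a new segment is written, and whenever the newest mergeFactor segments have the same size they are merged, unless the result would exceed maxMergeDocs. `simulateSegmentSizes()` is hypothetical.

```php
<?php

/**
 * Hypothetical, simplified merge model (not Zend's implementation).
 * Returns the sizes (document counts) of the segments left after adding
 * $docs documents, oldest segment first.
 */
function simulateSegmentSizes(
    int $docs,
    int $maxBufferedDocs,
    int $mergeFactor,
    int $maxMergeDocs = PHP_INT_MAX
): array {
    if ($maxBufferedDocs < 1 || $mergeFactor < 2) {
        throw new InvalidArgumentException('Need maxBufferedDocs >= 1 and mergeFactor >= 2');
    }
    $segments = [];
    $flushes = intdiv($docs, $maxBufferedDocs); // leftover docs stay buffered
    for ($i = 0; $i < $flushes; $i++) {
        $segments[] = $maxBufferedDocs; // buffer is full: write a new segment

        // Cascade merges while the newest $mergeFactor segments are equal in size.
        while (count($segments) >= $mergeFactor) {
            $tail = array_slice($segments, -$mergeFactor);
            if (count(array_unique($tail)) !== 1) {
                break;
            }
            $merged = array_sum($tail);
            if ($merged > $maxMergeDocs) {
                break; // merging would exceed maxMergeDocs
            }
            array_splice($segments, -$mergeFactor); // drop the merged segments
            $segments[] = $merged;
        }
    }
    return $segments;
}
```

In this model, 250 documents with maxBufferedDocs 10 leave seven segments with mergeFactor 10 but only three with mergeFactor 2 (at the cost of more merge work along the way), while a maxMergeDocs of 50 stops merging early and leaves many medium-sized segments.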

getResultSetLimit() : integer
Get result set limit.
0 means no limit
Type | Description |
---|---|
integer |


getSegmentFileName(integer $generation) : string
Get segments file name
Name | Type | Description |
---|---|---|
$generation | integer |
Type | Description |
---|---|
string |

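Assuming the index follows Java Lucene's naming convention (the generation written in base 36 after `segments_`, and a bare `segments` file for the pre-2.1 format, i.e. generation 0), the mapping can be sketched as below. `segmentFileNameSketch()` is hypothetical; getSegmentFileName() itself remains the authoritative source of the name.

```php
<?php

/**
 * Hypothetical helper mirroring the assumed Lucene 2.1+ naming convention:
 * generation 0 is the single pre-2.1 "segments" file, later generations are
 * "segments_" followed by the generation number in base 36.
 */
function segmentFileNameSketch(int $generation): string
{
    if ($generation === 0) {
        return 'segments'; // pre-2.1 index format
    }
    // base_convert() takes a numeric string and returns lowercase digits.
    return 'segments_' . base_convert((string) $generation, 10, 36);
}
```

Under this convention generation 35 maps to `segments_z`, 36 to `segments_10`, and 100 to `segments_2s`.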

getSimilarity() : \Zend_Search_Lucene_Search_Similarity
Retrieve the similarity used by the index reader
Type | Description |
---|---|
\Zend_Search_Lucene_Search_Similarity |


getTermsPerQueryLimit() : integer
Get terms per query limit.
0 means no limit (default value is 1024)
Type | Description |
---|---|
integer |


hasDeletions() : boolean
Returns true if any documents have been deleted from this index.
Type | Description |
---|---|
boolean |


hasTerm(\Zend_Search_Lucene_Index_Term $term) : boolean
Returns true if the index contains documents with the specified term.
Used for query optimization.
Name | Type | Description |
---|---|---|
$term | \Zend_Search_Lucene_Index_Term |
Type | Description |
---|---|
boolean |


isDeleted(integer $id) : boolean
Checks whether the document is deleted
Name | Type | Description |
---|---|---|
$id | integer |
Type | Description |
---|---|
boolean |
Exception | Description |
---|---|
\Zend_Search_Lucene_Exception | Thrown if $id is out of range |


maxDoc() : integer
Returns one greater than the largest possible document number.
This may be used to, e.g., determine how big to allocate a structure which will have an element for every document number in an index.
Type | Description |
---|---|
integer |


nextTerm() : \Zend_Search_Lucene_Index_Term | null
Scans terms dictionary and returns next term
Type | Description |
---|---|
\Zend_Search_Lucene_Index_Term or null | |


norm(integer $id, string $fieldName) : float
Returns a normalization factor for a (field, document) pair.
Name | Type | Description |
---|---|---|
$id | integer | |
$fieldName | string |
Type | Description |
---|---|
float |

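Assuming the index uses a default similarity with classic Lucene semantics, the length part of the normalization factor is 1/sqrt(number of terms in the field). The hypothetical `lengthNormSketch()` below computes only that part: boosts and the lossy one-byte storage of norms are deliberately left out, so its result will not match norm() exactly.

```php
<?php

/**
 * Sketch (assumption): the length-normalisation term of the classic Lucene
 * default similarity, 1 / sqrt(number of terms in the field).
 * Boosts and byte encoding of stored norms are not modelled.
 */
function lengthNormSketch(int $numTerms): float
{
    if ($numTerms < 1) {
        // Restriction of this sketch only; empty fields are not modelled.
        throw new InvalidArgumentException('numTerms must be at least 1');
    }
    return 1.0 / sqrt($numTerms);
}
```

Shorter fields get larger factors, so a match in a four-term title (0.5) weighs more than the same match in a sixteen-term field (0.25).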

numDocs() : integer
Returns the total number of non-deleted documents in this index.
Type | Description |
---|---|
integer |

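The difference between count(), maxDoc() and numDocs() can be shown with a hypothetical in-memory model, assuming document ids run contiguously from 0 and that deletion only flags a document (so its id stays allocated). `DocCountersSketch` is not part of Zend_Search_Lucene; it only mirrors the semantics stated in the method descriptions.

```php
<?php

/**
 * Hypothetical model of the document counters (not the Zend implementation).
 * Deleting a document only flags it, so it still counts in count()/maxDoc().
 */
final class DocCountersSketch
{
    /** @var bool[] docId => deleted flag */
    private array $deleted;

    public function __construct(int $documents)
    {
        $this->deleted = array_fill(0, $documents, false);
    }

    public function delete(int $id): void
    {
        $this->assertInRange($id);
        $this->deleted[$id] = true;
    }

    public function isDeleted(int $id): bool
    {
        $this->assertInRange($id);
        return $this->deleted[$id];
    }

    /** Total documents, deleted ones included. */
    public function count(): int
    {
        return count($this->deleted);
    }

    /** One greater than the largest document id; equals count() in this model. */
    public function maxDoc(): int
    {
        return $this->count();
    }

    /** Documents that are not deleted. */
    public function numDocs(): int
    {
        return count(array_filter($this->deleted, fn (bool $isDeleted): bool => !$isDeleted));
    }

    public function hasDeletions(): bool
    {
        return in_array(true, $this->deleted, true);
    }

    private function assertInRange(int $id): void
    {
        if ($id < 0 || $id >= count($this->deleted)) {
            throw new OutOfRangeException("Document id $id is out of range");
        }
    }
}
```

With five documents and two deletions, count() and maxDoc() both stay at 5 while numDocs() drops to 3.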

open(mixed $directory) : \Zend_Search_Lucene_Interface
Open index
Name | Type | Description |
---|---|---|
$directory | mixed |
Type | Description |
---|---|
\Zend_Search_Lucene_Interface |


setDefaultSearchField(string $fieldName) : void
Set default search field.
Null means that search is performed through all fields by default.
Default value is null
Name | Type | Description |
---|---|---|
$fieldName | string |


setFormatVersion(int $formatVersion) : void
Set index format version.
The index is converted to this format at the next update.
Name | Type | Description |
---|---|---|
$formatVersion | int |
Exception | Description |
---|---|
\Zend_Search_Lucene_Exception |


setMaxBufferedDocs(integer $maxBufferedDocs) : void
Set index maxBufferedDocs option
maxBufferedDocs is the minimum number of documents that must be buffered in memory before they are written into a new segment.
Default value is 10
Name | Type | Description |
---|---|---|
$maxBufferedDocs | integer |


setMaxMergeDocs(integer $maxMergeDocs) : void
Set index maxMergeDocs option
maxMergeDocs is the largest number of documents ever merged by addDocument(). Small values (e.g., less than 10,000) are best for interactive indexing, as this limits the length of pauses while indexing to a few seconds. Larger values are best for batched indexing and speedier searches.
Default value is PHP_INT_MAX
Name | Type | Description |
---|---|---|
$maxMergeDocs | integer |


setMergeFactor(integer $mergeFactor) : void
Set index mergeFactor option
mergeFactor determines how often segment indices are merged by addDocument(). With smaller values, less RAM is used while indexing, and searches on unoptimized indices are faster, but indexing speed is slower. With larger values, more RAM is used during indexing, and while searches on unoptimized indices are slower, indexing is faster. Thus larger values (> 10) are best for batch index creation, and smaller values (< 10) for indices that are interactively maintained.
Default value is 10
Name | Type | Description |
---|---|---|
$mergeFactor | integer |


setResultSetLimit(integer $limit) : void
Set result set limit.
0 (default) means no limit
Name | Type | Description |
---|---|---|
$limit | integer |

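As far as I understand the Zend Framework manual, the limit is applied while matching documents are collected in document-id order, so it yields the first N matches rather than the N best-scoring ones, and the returned hits are still ordered by score. A sketch under that assumption: `collectHits()` is hypothetical, and `$scores` (docId => score for every matching document) stands in for the query matcher's output.

```php
<?php

/**
 * Hypothetical model of a result set limit (assumption, not Zend's code):
 * matches are collected in document-id order until the limit is reached,
 * then the collected hits are sorted by score. 0 means "no limit".
 *
 * @param array<int, float> $scores docId => score of each matching document
 * @return int[] document ids, best score first
 */
function collectHits(array $scores, int $resultSetLimit): array
{
    ksort($scores); // scan matches in document-id order
    $hits = [];
    foreach ($scores as $docId => $score) {
        if ($resultSetLimit !== 0 && count($hits) >= $resultSetLimit) {
            break; // limit reached: later (possibly better) matches are skipped
        }
        $hits[$docId] = $score;
    }
    arsort($hits); // order collected hits by descending score
    return array_keys($hits);
}
```

With a limit of 2, the sketch returns documents 1 and 2 even though documents 3 and 7 score higher, which is the "first N, not best N" effect.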

setTermsPerQueryLimit(integer $limit) : void
Set terms per query limit.
0 means no limit
Name | Type | Description |
---|---|---|
$limit | integer |

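The limit matters for queries that expand into many index terms. As a hedged sketch, assuming a wildcard-style query is rewritten by enumerating matching terms from the dictionary and that exceeding the limit aborts the query with an exception, a prefix expansion might look like this. `expandPrefix()` is hypothetical, and the exception type and message are choices of the sketch, not Zend's.

```php
<?php

/**
 * Hypothetical prefix expansion with a terms-per-query limit (assumption).
 * Returns every dictionary term starting with $prefix; 0 disables the limit.
 */
function expandPrefix(array $dictionary, string $prefix, int $limit): array
{
    $matches = array_values(array_filter(
        $dictionary,
        fn (string $term): bool => strncmp($term, $prefix, strlen($prefix)) === 0
    ));
    if ($limit !== 0 && count($matches) > $limit) {
        throw new RuntimeException('Terms per query limit exceeded');
    }
    return $matches;
}
```

With the default limit of 1024 such an expansion only fails on very broad patterns; setting the limit to 0 removes the guard entirely.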

skipTo(\Zend_Search_Lucene_Index_Term $prefix) : void
Skip the terms stream up to the specified term prefix.
The prefix contains fully specified field info and a portion of the searched term.
Name | Type | Description |
---|---|---|
$prefix | \Zend_Search_Lucene_Index_Term |

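The stream methods skipTo(), currentTerm() and nextTerm() can be pictured with a hypothetical in-memory model, assuming terms are ordered by field name and then by term text, as in a Lucene term dictionary. `TermsStreamSketch` represents terms as `[field, text]` pairs instead of Zend_Search_Lucene_Index_Term objects and is not Zend's priority-queue implementation.

```php
<?php

/**
 * Hypothetical in-memory model of a terms stream (not Zend's
 * Zend_Search_Lucene_TermStreamsPriorityQueue). Terms are [field, text]
 * pairs kept sorted by field name, then by term text.
 */
final class TermsStreamSketch
{
    /** @var array<int, array{0: string, 1: string}> */
    private array $terms;
    private int $position = 0;

    public function __construct(array $terms)
    {
        usort($terms, fn (array $a, array $b): int => self::compare($a, $b));
        $this->terms = $terms;
    }

    /** Positions the stream at the first term that is >= the prefix term. */
    public function skipTo(string $field, string $textPrefix): void
    {
        $this->position = 0;
        while ($this->position < count($this->terms)
            && self::compare($this->terms[$this->position], [$field, $textPrefix]) < 0) {
            $this->position++;
        }
    }

    /** Returns the term at the current position, or null at the end. */
    public function currentTerm(): ?array
    {
        return $this->terms[$this->position] ?? null;
    }

    /** Advances the stream and returns the new current term, or null. */
    public function nextTerm(): ?array
    {
        if ($this->position < count($this->terms)) {
            $this->position++;
        }
        return $this->currentTerm();
    }

    private static function compare(array $a, array $b): int
    {
        // strcmp() compares bytes, avoiding PHP's numeric-string coercion.
        return strcmp($a[0], $b[0]) ?: strcmp($a[1], $b[1]);
    }
}
```

After skipTo('title', 'ze') the current term is the first title term at or after "ze", and nextTerm() walks forward from there until it returns null.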

termDocs(\Zend_Search_Lucene_Index_Term $term, \Zend_Search_Lucene_Index_DocsFilter | null $docsFilter = null) : array
Returns IDs of all documents containing term.
Name | Type | Description |
---|---|---|
$term | \Zend_Search_Lucene_Index_Term | |
$docsFilter | \Zend_Search_Lucene_Index_DocsFilter or null | |
Type | Description |
---|---|
array |


termDocsFilter(\Zend_Search_Lucene_Index_Term $term, \Zend_Search_Lucene_Index_DocsFilter | null $docsFilter = null) : \Zend_Search_Lucene_Index_DocsFilter
Returns documents filter for all documents containing term.
It performs the same operation as termDocs, but returns the result as a Zend_Search_Lucene_Index_DocsFilter object.
Name | Type | Description |
---|---|---|
$term | \Zend_Search_Lucene_Index_Term | |
$docsFilter | \Zend_Search_Lucene_Index_DocsFilter or null | |
Type | Description |
---|---|
\Zend_Search_Lucene_Index_DocsFilter |


termFreqs(\Zend_Search_Lucene_Index_Term $term, \Zend_Search_Lucene_Index_DocsFilter | null $docsFilter = null) : array
Returns an array of all term frequencies.
Result array structure: array(docId => freq, ...)
Name | Type | Description |
---|---|---|
$term | \Zend_Search_Lucene_Index_Term | |
$docsFilter | \Zend_Search_Lucene_Index_DocsFilter or null | |
Type | Description |
---|---|
array |


termPositions(\Zend_Search_Lucene_Index_Term $term, \Zend_Search_Lucene_Index_DocsFilter | null $docsFilter = null) : array
Returns an array of all term positions in the documents.
Result array structure: array(docId => array(pos1, pos2, ...), ...)
Name | Type | Description |
---|---|---|
$term | \Zend_Search_Lucene_Index_Term | |
$docsFilter | \Zend_Search_Lucene_Index_DocsFilter or null | |
Type | Description |
---|---|
array |
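
The result shapes documented for termDocs(), termFreqs() and termPositions() can be reproduced with a hypothetical toy inverted index over a single field. This is a sketch under assumptions: tokenization is a naive lowercase whitespace split, positions count from 0, and the helper functions are illustrative only, not Zend API. In this model, the document frequency of a term is simply the number of ids termDocsSketch() returns.

```php
<?php

/**
 * Hypothetical toy inverted index for one field (not Zend's index format).
 * Returns term => [docId => [pos1, pos2, ...]].
 */
function buildPostings(array $docs): array
{
    $postings = [];
    foreach ($docs as $docId => $text) {
        // Naive tokenizer (assumption): lowercase, split on whitespace.
        $tokens = preg_split('/\s+/', strtolower(trim($text)), -1, PREG_SPLIT_NO_EMPTY);
        foreach ($tokens as $pos => $token) {
            $postings[$token][$docId][] = $pos;
        }
    }
    return $postings;
}

/** Ids of all documents containing the term (termDocs() shape). */
function termDocsSketch(array $postings, string $term): array
{
    return array_keys($postings[$term] ?? []);
}

/** array(docId => freq, ...) (termFreqs() shape). */
function termFreqsSketch(array $postings, string $term): array
{
    return array_map('count', $postings[$term] ?? []); // keys are preserved
}

/** array(docId => array(pos1, pos2, ...), ...) (termPositions() shape). */
function termPositionsSketch(array $postings, string $term): array
{
    return $postings[$term] ?? [];
}
```

For the documents "Zend search" and "search the search index", the term "search" has frequencies array(0 => 1, 1 => 2) and positions array(0 => array(1), 1 => array(0, 2)).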