Search/Lucene/Index/SegmentInfo.php
Zend Framework
LICENSE
This source file is subject to the new BSD license that is bundled with this package in the file LICENSE.txt. It is also available through the world-wide-web at this URL: http://framework.zend.com/license/new-bsd If you did not receive a copy of the license and are unable to obtain it through the world-wide-web, please send an email to license@zend.com so we can send you a copy immediately.
- Category: Zend
- Copyright: Copyright (c) 2005-2012 Zend Technologies USA Inc. (http://www.zend.com)
- License: New BSD License
- Package: Zend_Search_Lucene
- Subpackage: Index
- Version: $Id: SegmentInfo.php 24593 2012-01-05 20:35:02Z matthew $
\Zend_Search_Lucene_Index_SegmentInfo
- Implements: \Zend_Search_Lucene_Index_TermsStream_Interface
Constants
FULL_SCAN_VS_FETCH_BOUNDARY = 5
"Full scan vs fetch" boundary.
If the filter selectivity is less than this value, a full scan is performed (since fetching term entries has some additional overhead).
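
The boundary can be read as a simple threshold test. The following is a minimal sketch only: `chooseTermReadStrategy()` is a hypothetical helper, and it assumes the selectivity is handed in as a plain number, since this page does not say how the class computes it.

```php
<?php
// Mirrors the documented constant value.
const FULL_SCAN_VS_FETCH_BOUNDARY = 5;

/**
 * Hypothetical helper: pick a read strategy from a selectivity value.
 * ASSUMPTION: the caller supplies the selectivity; its exact definition
 * inside the real class is not described on this page.
 */
function chooseTermReadStrategy(float $filterSelectivity): string
{
    // Below the boundary a full scan is preferred, as the description states.
    return $filterSelectivity < FULL_SCAN_VS_FETCH_BOUNDARY ? 'full-scan' : 'fetch';
}

echo chooseTermReadStrategy(2.0), "\n";
echo chooseTermReadStrategy(12.0), "\n";
```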
Properties

integer $_delGen
Delete file generation number.
-2 means autodetect the latest delete generation; -1 means there is no delete file; 0 means a pre-2.1 format delete file; X specifies the delete file that is used.
- Type: integer
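
The special values are easy to mix up, so here is a small sketch that spells them out. `describeDelGen()` is a hypothetical helper written for illustration; it is not part of Zend_Search_Lucene.

```php
<?php
/**
 * Hypothetical helper: translate a delete generation number into the
 * meaning documented for $_delGen.
 */
function describeDelGen(int $delGen): string
{
    switch ($delGen) {
        case -2:
            return 'autodetect latest delete generation';
        case -1:
            return 'no delete file';
        case 0:
            return 'pre-2.1 format delete file';
        default:
            if ($delGen > 0) {
                return "delete file generation $delGen";
            }
            // Values below -2 are not documented, so they are rejected here.
            throw new InvalidArgumentException("Unsupported generation: $delGen");
    }
}

echo describeDelGen(-1), "\n";
echo describeDelGen(3), "\n";
```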

mixed $_deleted = null
List of deleted documents.
A bitset if the bitset extension is loaded, or an array otherwise.
- Type: mixed

array|null $_docMap = null
Map of the document IDs. Used to get the new docID after removing deleted documents.
It is not very efficient in terms of memory usage, but it is much faster than other methods.
- Type: array | null
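
To illustrate what such a map looks like, here is a minimal sketch. `buildDocMap()` is a hypothetical function, and it assumes deleted documents are given as an array keyed by document ID; the class may build its map differently.

```php
<?php
/**
 * Hypothetical sketch: build an old-ID => new-ID map for a segment,
 * skipping deleted documents.
 * ASSUMPTION: $deleted is an array whose keys are the deleted document IDs.
 */
function buildDocMap(int $docCount, array $deleted, int $startId = 0): array
{
    $map = [];
    $newId = $startId;
    for ($id = 0; $id < $docCount; $id++) {
        if (isset($deleted[$id])) {
            continue; // deleted documents get no new ID
        }
        $map[$id] = $newId++;
    }
    return $map;
}

// Five documents, documents 1 and 3 deleted: maps 0 => 0, 2 => 1, 4 => 2.
print_r(buildDocMap(5, [1 => 1, 3 => 1]));
```

The trade-off mentioned above is visible here: the map holds one entry per surviving document, but a lookup is a single array access.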

array $_fields
Segment fields.
Array of Zend_Search_Lucene_Index_FieldInfo objects for this segment.
- Type: array

array $_fieldsDicPositions
Field positions in a dictionary.
(The term dictionary contains fields ordered by name.)
- Type: array

\Zend_Search_Lucene_Storage_File $_frqFile = null
Frequencies File object for stream-like terms reading.

boolean $_hasSingleNormFile
Segment has a single norms file.
If true, one .nrm file is used for all fields. Otherwise, .fN files are used.
- Type: boolean
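
The naming rule can be written down in one line. This is a sketch under the assumption that N in ".fN" is the field number; `normFileExtension()` is a hypothetical helper, not a library method.

```php
<?php
/**
 * Hypothetical helper: pick the norms file extension for a field.
 * ASSUMPTION: "N" in ".fN" stands for the field number.
 */
function normFileExtension(bool $hasSingleNormFile, int $fieldNum): string
{
    return $hasSingleNormFile ? '.nrm' : '.f' . $fieldNum;
}

echo normFileExtension(true, 3), "\n";
echo normFileExtension(false, 3), "\n";
```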

boolean $_isCompound
Use a compound segment file (*.cfs) to collect all other segment files (excluding .del files).
- Type: boolean

\Zend_Search_Lucene_Index_Term $_lastTerm = null
Last Term in a terms stream.

\Zend_Search_Lucene_Index_TermInfo $_lastTermInfo = null
Last TermInfo in a terms stream.

array|null $_lastTermPositions
An array of all term positions in the documents.
Array structure: array( docId => array( pos1, pos2, ...), ...)
Set to null if loading of term positions has to be skipped.
- Type: array | null

array $_norms = array()
Normalization factors.
An array of fieldName => normVector pairs. normVector is a binary string. Each byte corresponds to an indexed document in the segment and encodes a normalization factor (a float value, encoded by Zend_Search_Lucene_Search_Similarity::encodeNorm()).
- Type: array
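
Since each document owns exactly one byte of the vector, reading a document's encoded norm is a matter of indexing into the string. The sketch below only extracts the raw byte; turning it back into a float is left to the similarity class. `normByte()` and the sample bytes are made up for illustration.

```php
<?php
/**
 * Hypothetical helper: return the encoded norm byte for one document.
 * The byte is NOT decoded into a float here.
 */
function normByte(string $normVector, int $docId): int
{
    if ($docId < 0 || $docId >= strlen($normVector)) {
        throw new OutOfRangeException("No norm stored for document $docId");
    }
    return ord($normVector[$docId]);
}

// Made-up vector for a segment with three documents.
$vector = "\x7C\x78\x74";
echo normByte($vector, 1), "\n"; // prints 120 (0x78)
```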

\Zend_Search_Lucene_Storage_File $_prxFile = null
Positions File object for stream-like terms reading.

array $_segFileSizes
Associative array where the key is the file name and the value is the file size within the compound segment file (.cfs).
- Type: array

array $_segFiles
Associative array where the key is the file name and the value is the data offset in the compound segment file (.cfs).
- Type: array
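
Together, the offset map and the size map are enough to locate one logical file inside the compound file. A minimal sketch, with a hypothetical helper name and made-up file names, offsets, and sizes:

```php
<?php
/**
 * Hypothetical helper: find the byte range of one file inside a compound file.
 * $segFiles maps file name => data offset, $segFileSizes maps file name => size.
 */
function locateInCompoundFile(array $segFiles, array $segFileSizes, string $fileName): array
{
    if (!isset($segFiles[$fileName], $segFileSizes[$fileName])) {
        throw new RuntimeException("File $fileName is not part of the compound file");
    }
    return [
        'offset' => $segFiles[$fileName],
        'length' => $segFileSizes[$fileName],
    ];
}

// Made-up directory of a compound file.
$segFiles     = ['_1.tis' => 120,  '_1.frq' => 4216];
$segFileSizes = ['_1.tis' => 4096, '_1.frq' => 2048];

print_r(locateInCompoundFile($segFiles, $segFileSizes, '_1.frq'));
```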

array $_termDictionary
Term Dictionary Index.
Array of arrays (Zend_Search_Lucene_Index_Term objects are represented as arrays for performance reasons):
- [0] -> $termValue
- [1] -> $termFieldNum

The corresponding Zend_Search_Lucene_Index_TermInfo object is stored in $_termDictionaryInfos.
- Type: array

array $_termDictionaryInfos
Term Dictionary Index TermInfos.
Array of arrays (Zend_Search_Lucene_Index_TermInfo objects are represented as arrays for performance reasons):
- [0] -> $docFreq
- [1] -> $freqPointer
- [2] -> $proxPointer
- [3] -> $skipOffset
- [4] -> $indexPointer
- Type: array
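
The two arrays are parallel: entry i of one describes the same term as entry i of the other. The sketch below shows the documented layout with made-up values; `describeIndexEntry()` is a hypothetical helper that gives the numeric slots readable names.

```php
<?php
// Parallel arrays in the documented layout (all values are made up).
$termDictionary = [
    ['apple', 0],   // [0] term value, [1] field number
    ['banana', 0],
];
$termDictionaryInfos = [
    // [0] docFreq, [1] freqPointer, [2] proxPointer, [3] skipOffset, [4] indexPointer
    [3, 0, 0, 0, 0],
    [7, 42, 96, 0, 128],
];

/** Hypothetical helper: combine both arrays into one readable record. */
function describeIndexEntry(array $terms, array $infos, int $i): array
{
    list($value, $fieldNum) = $terms[$i];
    list($docFreq, $freqPointer, $proxPointer, $skipOffset, $indexPointer) = $infos[$i];

    return [
        'term'         => $value,
        'fieldNum'     => $fieldNum,
        'docFreq'      => $docFreq,
        'freqPointer'  => $freqPointer,
        'proxPointer'  => $proxPointer,
        'skipOffset'   => $skipOffset,
        'indexPointer' => $indexPointer,
    ];
}

print_r(describeIndexEntry($termDictionary, $termDictionaryInfos, 1));
```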

array $_termInfoCache = array()
TermInfo cache.
Size is 1024. Numbers are used instead of class constants for performance reasons.
- Type: array

integer $_termsScanMode
Terms scan mode.
Values:
- self::SM_TERMS_ONLY - terms are scanned, no additional info is retrieved
- self::SM_FULL_INFO - terms are scanned, frequency and position info is retrieved
- self::SM_MERGE_INFO - terms are scanned, frequency and position info is retrieved, document numbers are compacted (shifted if the segment has deleted documents)
- Type: integer

\Zend_Search_Lucene_Storage_File $_tisFile = null
Term Dictionary File object for stream-like terms reading.
Methods

__construct(\Zend_Search_Lucene_Storage_Directory $directory, string $name, integer $docCount, integer $delGen = 0, array | null $docStoreOptions = null, boolean $hasSingleNormFile = false, boolean $isCompound = null) : void
Zend_Search_Lucene_Index_SegmentInfo constructor

| Name | Type | Description |
|---|---|---|
| $directory | \Zend_Search_Lucene_Storage_Directory | |
| $name | string | |
| $docCount | integer | |
| $delGen | integer | |
| $docStoreOptions | array \| null | |
| $hasSingleNormFile | boolean | |
| $isCompound | boolean | |

_detectLatestDelGen() : integer
Detect latest delete generation
It is actually called from the writeChanges() method, or from the constructor when that is invoked by the index writer. In both cases the index write lock has already been obtained, so there is no need to take care of it here.

| Type | Description |
|---|---|
| integer | |

_getFieldPosition(integer $fieldNum) : integer
Get field position in a fields dictionary

| Name | Type | Description |
|---|---|---|
| $fieldNum | integer | |

| Type | Description |
|---|---|
| integer | |

_load21DelFile() : mixed
Load 2.1+ format deletions file
Returns a bitset or an array, depending on bitset extension availability

| Type | Description |
|---|---|
| mixed | |

_loadDelFile() : mixed
Load deletions file
Returns a bitset or an array, depending on bitset extension availability

| Type | Description |
|---|---|
| mixed | |

| Exception | Description |
|---|---|
| \Zend_Search_Lucene_Exception | |

_loadDictionaryIndex() : void
Load terms dictionary index

| Exception | Description |
|---|---|
| \Zend_Search_Lucene_Exception | |

_loadNorm(integer $fieldNum) : void
Load normalization factors from an index file

| Name | Type | Description |
|---|---|---|
| $fieldNum | integer | |

| Exception | Description |
|---|---|
| \Zend_Search_Lucene_Exception | |

_loadPre21DelFile() : mixed
Load pre-2.1 deletions file
Returns a bitset or an array, depending on bitset extension availability

| Type | Description |
|---|---|
| mixed | |

| Exception | Description |
|---|---|
| \Zend_Search_Lucene_Exception | |

closeTermsStream() : void
Close terms stream
Should be used for resource cleanup if the stream is not read to the end
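
The cleanup rule matters most when a loop over the stream exits early. The sketch below shows the pattern on a toy in-memory stand-in, not on the real class. `ToyTermsStream` and `readTermsUntil()` are hypothetical, terms are plain strings instead of Zend_Search_Lucene_Index_Term objects, and the stand-in assumes that the stream points at its first term right after a reset.

```php
<?php
/**
 * Toy stand-in that borrows the method names of the terms stream.
 * ASSUMPTION: after resetTermsStream() the first term is the current one.
 */
class ToyTermsStream
{
    private $terms;
    private $pos = null;

    public function __construct(array $terms)
    {
        $this->terms = array_values($terms);
        sort($this->terms); // a terms stream yields terms in order
    }

    public function resetTermsStream()
    {
        $this->pos = 0;
    }

    public function currentTerm()
    {
        if ($this->pos === null || !isset($this->terms[$this->pos])) {
            return null;
        }
        return $this->terms[$this->pos];
    }

    public function nextTerm()
    {
        if ($this->pos === null) {
            return null;
        }
        $this->pos++;
        return $this->currentTerm();
    }

    public function closeTermsStream()
    {
        $this->pos = null; // release whatever the stream holds
    }
}

/** Collect terms until $stop is reached, then close the stream. */
function readTermsUntil(ToyTermsStream $stream, string $stop): array
{
    $seen = [];
    $stream->resetTermsStream();
    try {
        while (($term = $stream->currentTerm()) !== null && $term !== $stop) {
            $seen[] = $term;
            $stream->nextTerm();
        }
    } finally {
        // The loop may stop before the end, so the stream is closed explicitly.
        $stream->closeTermsStream();
    }
    return $seen;
}

print_r(readTermsUntil(new ToyTermsStream(['b', 'a', 'c']), 'c'));
```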

compoundFileLength(string $extension) : integer
Get compound file length

| Name | Type | Description |
|---|---|---|
| $extension | string | |

| Type | Description |
|---|---|
| integer | |

count() : integer
Returns the total number of documents in this segment (including deleted documents).

| Type | Description |
|---|---|
| integer | |

currentTerm() : \Zend_Search_Lucene_Index_Term | null
Returns the term at the current position

| Type | Description |
|---|---|
| \Zend_Search_Lucene_Index_Term \| null | |

currentTermPositions() : array
Returns an array of all term positions in the documents.
Return array structure: array( docId => array( pos1, pos2, ...), ...)

| Type | Description |
|---|---|
| array | |
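
An array of this shape is straightforward to post-process. As a small illustration with made-up data, the sketch below derives the number of occurrences per document; `countOccurrences()` is a hypothetical helper.

```php
<?php
// Made-up positions in the documented structure: docId => array(pos1, pos2, ...).
$positions = [
    4 => [0, 7, 19],
    9 => [3],
];

/** Hypothetical helper: number of occurrences of the term per document. */
function countOccurrences(array $positions): array
{
    // array_map() with a single array keeps the docId keys intact.
    return array_map('count', $positions);
}

print_r(countOccurrences($positions));
```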

delete($id) : void
Deletes a document from the index segment.
$id is an internal document id

| Name | Type | Description |
|---|---|---|
| $id | integer | |

getDelGen() : integer
Returns the actual deletions file generation number.

| Type | Description |
|---|---|
| integer | |

getField(integer $fieldNum) : \Zend_Search_Lucene_Index_FieldInfo
Returns field info for the specified field

| Name | Type | Description |
|---|---|---|
| $fieldNum | integer | |

| Type | Description |
|---|---|
| \Zend_Search_Lucene_Index_FieldInfo | |

getFieldNum(string $fieldName) : integer
Returns the field index, or -1 if the field is not found

| Name | Type | Description |
|---|---|---|
| $fieldName | string | |

| Type | Description |
|---|---|
| integer | |

getFields(boolean $indexed = false) : array
Returns array of fields.
If the $indexed parameter is true, only indexed fields are returned.

| Name | Type | Description |
|---|---|---|
| $indexed | boolean | |

| Type | Description |
|---|---|
| array | |

getTermInfo(\Zend_Search_Lucene_Index_Term $term) : \Zend_Search_Lucene_Index_TermInfo
Scans terms dictionary and returns term info

| Name | Type | Description |
|---|---|---|
| $term | \Zend_Search_Lucene_Index_Term | |

| Type | Description |
|---|---|
| \Zend_Search_Lucene_Index_TermInfo | |

hasDeletions() : boolean
Returns true if any documents have been deleted from this index segment.

| Type | Description |
|---|---|
| boolean | |

hasSingleNormFile() : boolean
Returns true if the segment has a single norms file.

| Type | Description |
|---|---|
| boolean | |

isCompound() : boolean
Returns true if the segment is stored using a compound segment file.

| Type | Description |
|---|---|
| boolean | |

isDeleted($id) : boolean
Checks whether the document is deleted

| Name | Type | Description |
|---|---|---|
| $id | integer | |

| Type | Description |
|---|---|
| boolean | |

nextTerm() : \Zend_Search_Lucene_Index_Term | null
Scans terms dictionary and returns next term

| Type | Description |
|---|---|
| \Zend_Search_Lucene_Index_Term \| null | |

norm(integer $id, string $fieldName) : float
Returns the normalization factor for the specified document

| Name | Type | Description |
|---|---|---|
| $id | integer | |
| $fieldName | string | |

| Type | Description |
|---|---|
| float | |

normVector(string $fieldName) : string
Returns the norm vector, encoded in a byte string

| Name | Type | Description |
|---|---|---|
| $fieldName | string | |

| Type | Description |
|---|---|
| string | |

numDocs() : integer
Returns the total number of non-deleted documents in this segment.

| Type | Description |
|---|---|
| integer | |
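
The difference between count() and numDocs() comes down to whether deleted documents are included. A minimal sketch of that relationship, assuming deletions are tracked as an array keyed by document ID; `liveDocCount()` is a hypothetical helper.

```php
<?php
/**
 * Hypothetical helper: number of non-deleted documents in a segment.
 * ASSUMPTION: $deleted holds one entry per deleted document ID.
 */
function liveDocCount(int $docCount, array $deleted): int
{
    return $docCount - count($deleted);
}

$docCount = 5;                  // what count() would report
$deleted  = [1 => 1, 3 => 1];   // two deleted documents
echo liveDocCount($docCount, $deleted), "\n"; // prints 3
```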

openCompoundFile(string $extension, boolean $shareHandler = true) : \Zend_Search_Lucene_Storage_File
Opens an index file stored within the compound index file

| Name | Type | Description |
|---|---|---|
| $extension | string | |
| $shareHandler | boolean | |

| Type | Description |
|---|---|
| \Zend_Search_Lucene_Storage_File | |

| Exception | Description |
|---|---|
| \Zend_Search_Lucene_Exception | |

resetTermsStream() : integer
Reset terms stream
- $startId - id for the first document
- $compact - remove deleted documents

Returns the start document id for the next segment

| Type | Description |
|---|---|
| integer | |

| Exception | Description |
|---|---|
| \Zend_Search_Lucene_Exception | |

skipTo(\Zend_Search_Lucene_Index_Term $prefix) : void
Skip terms stream up to the specified term prefix.
The prefix contains fully specified field info and a portion of the searched term

| Name | Type | Description |
|---|---|---|
| $prefix | \Zend_Search_Lucene_Index_Term | |

| Exception | Description |
|---|---|
| \Zend_Search_Lucene_Exception | |

termDocs(\Zend_Search_Lucene_Index_Term $term, integer $shift = 0, \Zend_Search_Lucene_Index_DocsFilter | null $docsFilter = null) : array
Returns IDs of all the documents containing the term.

| Name | Type | Description |
|---|---|---|
| $term | \Zend_Search_Lucene_Index_Term | |
| $shift | integer | |
| $docsFilter | \Zend_Search_Lucene_Index_DocsFilter \| null | |

| Type | Description |
|---|---|
| array | |

termFreqs(\Zend_Search_Lucene_Index_Term $term, integer $shift = 0, \Zend_Search_Lucene_Index_DocsFilter | null $docsFilter = null) : \Zend_Search_Lucene_Index_TermInfo
Returns term freqs array.
Result array structure: array(docId => freq, ...)

| Name | Type | Description |
|---|---|---|
| $term | \Zend_Search_Lucene_Index_Term | |
| $shift | integer | |
| $docsFilter | \Zend_Search_Lucene_Index_DocsFilter \| null | |

| Type | Description |
|---|---|
| \Zend_Search_Lucene_Index_TermInfo | |

termPositions(\Zend_Search_Lucene_Index_Term $term, integer $shift = 0, \Zend_Search_Lucene_Index_DocsFilter | null $docsFilter = null) : \Zend_Search_Lucene_Index_TermInfo
Returns term positions array.
Result array structure: array(docId => array(pos1, pos2, ...), ...)

| Name | Type | Description |
|---|---|---|
| $term | \Zend_Search_Lucene_Index_Term | |
| $shift | integer | |
| $docsFilter | \Zend_Search_Lucene_Index_DocsFilter \| null | |

| Type | Description |
|---|---|
| \Zend_Search_Lucene_Index_TermInfo | |