Search/Lucene/Index/SegmentInfo.php
Zend Framework
LICENSE
This source file is subject to the new BSD license that is bundled with this package in the file LICENSE.txt. It is also available through the world-wide-web at this URL: http://framework.zend.com/license/new-bsd If you did not receive a copy of the license and are unable to obtain it through the world-wide-web, please send an email to [email protected] so we can send you a copy immediately.
- Category
- Zend
- Copyright
- Copyright (c) 2005-2012 Zend Technologies USA Inc. (http://www.zend.com)
- License
- New BSD License
- Package
- Zend_Search_Lucene
- Subpackage
- Index
- Version
- $Id: SegmentInfo.php 24593 2012-01-05 20:35:02Z matthew $
\Zend_Search_Lucene_Index_SegmentInfo
- Implements
- \Zend_Search_Lucene_Index_TermsStream_Interface
Constants

FULL_SCAN_VS_FETCH_BOUNDARY
= 5
"Full scan vs fetch" boundary.
If filter selectivity is less than this value, a full scan is performed, because fetching individual term entries carries additional overhead.
Properties


integer $_delGen
Delete file generation number
-2 means autodetect latest delete generation
-1 means 'there is no delete file'
0 means pre-2.1 format delete file
X specifies used delete file
- Type
- integer
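The value conventions above can be restated as a small lookup. This is only a sketch: `describeDelGen()` is a hypothetical helper that does not exist in Zend_Search_Lucene, and treating every positive value as "X" is an assumption.

```php
<?php
// Hypothetical helper (not part of Zend_Search_Lucene): restates the
// documented meaning of each $_delGen value as a readable string.
function describeDelGen(int $delGen): string
{
    if ($delGen === -2) {
        return 'autodetect latest delete generation';
    }
    if ($delGen === -1) {
        return 'there is no delete file';
    }
    if ($delGen === 0) {
        return 'pre-2.1 format delete file';
    }
    if ($delGen > 0) {
        // Assumption: any positive value is the "X" case from the description.
        return "delete file generation {$delGen}";
    }
    throw new InvalidArgumentException("Unsupported delete generation: {$delGen}");
}
```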


mixed $_deleted = null
List of deleted documents.
A bitset if the bitset extension is loaded, or an array otherwise.
- Type
- mixed
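A minimal sketch of the array fallback may make the two states (null versus a populated list) easier to picture. It assumes deleted document ids are stored as array keys; the helper names `markDeleted()` and `isMarkedDeleted()` are hypothetical, and the bitset branch is not modeled.

```php
<?php
// Illustrative only: models the array form of the deleted-documents list.
// Assumption: deleted ids are kept as array keys, so lookups are isset() checks.
function markDeleted(?array $deleted, int $id): array
{
    $deleted = $deleted ?? [];   // null is treated as "no deletions recorded yet"
    $deleted[$id] = 1;           // the stored value is irrelevant, only the key matters
    return $deleted;
}

function isMarkedDeleted(?array $deleted, int $id): bool
{
    return $deleted !== null && isset($deleted[$id]);
}
```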


array|null $_docMap = null
Map of the document IDs. Used to get the new docID after removing deleted documents.
It is not very efficient in terms of memory usage, but it is much faster than other methods.
- Type
- array | null
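One way such a map could be built is sketched below. `buildDocMap()` is a hypothetical function, and the representation of deletions as array keys is an assumption; the class may construct its map differently.

```php
<?php
// Hypothetical sketch: build an old-id => new-id map for a segment holding
// $docCount documents, skipping every id that appears as a key of $deleted.
function buildDocMap(int $docCount, array $deleted): array
{
    $docMap = [];
    $newId  = 0;
    for ($oldId = 0; $oldId < $docCount; $oldId++) {
        if (isset($deleted[$oldId])) {
            continue; // deleted documents receive no new id
        }
        $docMap[$oldId] = $newId++;
    }
    return $docMap;
}
```

The map costs one array entry per live document, which is the memory trade-off the description mentions, but each lookup is a single array access.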


array $_fields
Segment fields.
Array of Zend_Search_Lucene_Index_FieldInfo objects for this segment
- Type
- array


array $_fieldsDicPositions
Field positions in a dictionary.
(The term dictionary contains fields ordered by name.)
- Type
- array


\Zend_Search_Lucene_Storage_File $_frqFile = null
Frequencies File object for stream-like terms reading


boolean $_hasSingleNormFile
Segment has single norms file
If true, one .nrm file is used for all fields. Otherwise, .fN files are used.
- Type
- boolean


boolean $_isCompound
Use compound segment file (*.cfs) to collect all other segment files (excluding .del files)
- Type
- boolean


\Zend_Search_Lucene_Index_Term $_lastTerm = null
Last Term in a terms stream


\Zend_Search_Lucene_Index_TermInfo $_lastTermInfo = null
Last TermInfo in a terms stream


array|null $_lastTermPositions
An array of all term positions in the documents.
Array structure: array( docId => array( pos1, pos2, ...), ...)
Is set to null if term positions loading has to be skipped
- Type
- array | null


array $_norms = array()
Normalization factors.
An array fieldName => normVector.
normVector is a binary string. Each byte corresponds to an indexed document in a segment and encodes a normalization factor (a float value, encoded by Zend_Search_Lucene_Search_Similarity::encodeNorm()).
- Type
- array
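Because each document occupies exactly one byte of the norm vector, reading a document's encoded norm amounts to a string offset plus `ord()`. The sketch below illustrates this; `encodedNormAt()` is a hypothetical helper, and the sample bytes are made up. Turning the byte back into a float is left to the counterpart of encodeNorm() and is not shown.

```php
<?php
// Hypothetical helper: returns the raw encoded norm byte (0..255) stored
// for a document in a binary norm vector string.
function encodedNormAt(string $normVector, int $docId): int
{
    if ($docId < 0 || $docId >= strlen($normVector)) {
        throw new OutOfRangeException("No norm byte for document {$docId}");
    }
    // One byte per document: the document id is the byte offset.
    return ord($normVector[$docId]);
}
```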


\Zend_Search_Lucene_Storage_File $_prxFile = null
Positions File object for stream-like terms reading


array $_segFileSizes
Associative array where the key is the file name and the value is the file size within the compound segment file (.cfs).
- Type
- array


array $_segFiles
Associative array where the key is the file name and the value is the data offset in a compound segment file (.cfs).
- Type
- array


array $_termDictionary
Term Dictionary Index
Array of arrays (Zend_Search_Lucene_Index_Term objects are represented as arrays because of performance considerations):
[0] -> $termValue
[1] -> $termFieldNum
The corresponding Zend_Search_Lucene_Index_TermInfo object is stored in $_termDictionaryInfos.
- Type
- array


array $_termDictionaryInfos
Term Dictionary Index TermInfos
Array of arrays (Zend_Search_Lucene_Index_TermInfo objects are represented as arrays because of performance considerations):
[0] -> $docFreq
[1] -> $freqPointer
[2] -> $proxPointer
[3] -> $skipOffset
[4] -> $indexPointer
- Type
- array
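The two index arrays can be read side by side, as the sketch below suggests. It assumes that the same numeric index refers to the same term in both arrays; `describeIndexEntry()` and all sample values are hypothetical.

```php
<?php
// Hypothetical helper: combines one entry of the term dictionary index with
// its TermInfo entry. Assumption: both arrays share the same numeric index.
function describeIndexEntry(array $termDictionary, array $termDictionaryInfos, int $i): array
{
    // [0] -> term value, [1] -> field number
    [$termValue, $termFieldNum] = $termDictionary[$i];
    // [0] -> docFreq, [1] -> freqPointer, [2] -> proxPointer,
    // [3] -> skipOffset, [4] -> indexPointer
    [$docFreq, $freqPointer, $proxPointer, $skipOffset, $indexPointer] = $termDictionaryInfos[$i];

    return [
        'term'         => $termValue,
        'fieldNum'     => $termFieldNum,
        'docFreq'      => $docFreq,
        'freqPointer'  => $freqPointer,
        'proxPointer'  => $proxPointer,
        'skipOffset'   => $skipOffset,
        'indexPointer' => $indexPointer,
    ];
}

// Made-up sample data in the documented layout.
$sampleTerms = [['apple', 0], ['zebra', 0]];
$sampleInfos = [[3, 0, 0, 0, 24], [1, 40, 52, 0, 310]];
$entry = describeIndexEntry($sampleTerms, $sampleInfos, 1);
```

Plain arrays avoid per-entry object construction, which matches the performance motivation stated in the descriptions.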


array $_termInfoCache = array()
TermInfo cache
Size is 1024. Numbers are used instead of class constants because of performance considerations
- Type
- array
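A size-capped cache of this kind might look like the sketch below. The class name `BoundedCache` and the eviction policy (dropping the oldest entry) are assumptions made for illustration; the description states only the size limit of 1024.

```php
<?php
// Hypothetical bounded cache. The eviction policy (drop the oldest entry)
// is an assumption and is not taken from Zend_Search_Lucene.
final class BoundedCache
{
    private array $entries = [];
    private int $limit;

    public function __construct(int $limit = 1024)
    {
        $this->limit = $limit;
    }

    public function put(string $key, array $value): void
    {
        if (!isset($this->entries[$key]) && count($this->entries) >= $this->limit) {
            // PHP arrays keep insertion order, so the first key is the oldest.
            unset($this->entries[array_key_first($this->entries)]);
        }
        $this->entries[$key] = $value;
    }

    public function get(string $key): ?array
    {
        return $this->entries[$key] ?? null;
    }

    public function size(): int
    {
        return count($this->entries);
    }
}
```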


integer $_termsScanMode
Terms scan mode
Values:
self::SM_TERMS_ONLY - terms are scanned, no additional info is retrieved
self::SM_FULL_INFO - terms are scanned, frequency and position info is retrieved
self::SM_MERGE_INFO - terms are scanned, frequency and position info is retrieved, document numbers are compacted (shifted if segment has deleted documents)
- Type
- integer


\Zend_Search_Lucene_Storage_File $_tisFile = null
Term Dictionary File object for stream-like terms reading
Methods


__construct(\Zend_Search_Lucene_Storage_Directory $directory, string $name, integer $docCount, integer $delGen = 0, array | null $docStoreOptions = null, boolean $hasSingleNormFile = false, boolean $isCompound = null) : void
Zend_Search_Lucene_Index_SegmentInfo constructor
Name | Type | Description |
---|---|---|
$directory | \Zend_Search_Lucene_Storage_Directory | |
$name | string | |
$docCount | integer | |
$delGen | integer | |
$docStoreOptions | array \| null | |
$hasSingleNormFile | boolean | |
$isCompound | boolean |


_detectLatestDelGen() : integer
Detect latest delete generation
Is actually used from the writeChanges() method, or from the constructor if it is invoked from the Index writer. In both cases the index write lock has already been obtained, so this method does not need to acquire it.
Type | Description |
---|---|
integer |


_getFieldPosition(integer $fieldNum) : integer
Get field position in a fields dictionary
Name | Type | Description |
---|---|---|
$fieldNum | integer |
Type | Description |
---|---|
integer |


_load21DelFile() : mixed
Load 2.1+ format deletions file
Returns bitset or an array depending on bitset extension availability
Type | Description |
---|---|
mixed |


_loadDelFile() : mixed
Load deletions file
Returns bitset or an array depending on bitset extension availability
Type | Description |
---|---|
mixed |
Exception | Description |
---|---|
\Zend_Search_Lucene_Exception |


_loadDictionaryIndex() : void
Load terms dictionary index
Exception | Description |
---|---|
\Zend_Search_Lucene_Exception |


_loadNorm(integer $fieldNum) : void
Load normalization factors from an index file
Name | Type | Description |
---|---|---|
$fieldNum | integer |
Exception | Description |
---|---|
\Zend_Search_Lucene_Exception |


_loadPre21DelFile() : mixed
Load pre-2.1 deletions file
Returns bitset or an array depending on bitset extension availability
Type | Description |
---|---|
mixed |
Exception | Description |
---|---|
\Zend_Search_Lucene_Exception |


closeTermsStream() : void
Close terms stream
Should be used for resource cleanup if the stream is not read to the end


compoundFileLength(string $extension) : integer
Get compound file length
Name | Type | Description |
---|---|---|
$extension | string |
Type | Description |
---|---|
integer |


count() : integer
Returns the total number of documents in this segment (including deleted documents).
Type | Description |
---|---|
integer |


currentTerm() : \Zend_Search_Lucene_Index_Term | null
Returns term in current position
Type | Description |
---|---|
\Zend_Search_Lucene_Index_Term \| null |


currentTermPositions() : array
Returns an array of all term positions in the documents.
Return array structure: array( docId => array( pos1, pos2, ...), ...)
Type | Description |
---|---|
array |


delete(integer $id) : void
Deletes a document from the index segment.
$id is an internal document id
Name | Type | Description |
---|---|---|
$id | integer |


getDelGen() : integer
Returns actual deletions file generation number.
Type | Description |
---|---|
integer |


getField(integer $fieldNum) : \Zend_Search_Lucene_Index_FieldInfo
Returns field info for specified field
Name | Type | Description |
---|---|---|
$fieldNum | integer |
Type | Description |
---|---|
\Zend_Search_Lucene_Index_FieldInfo |


getFieldNum(string $fieldName) : integer
Returns field index or -1 if field is not found
Name | Type | Description |
---|---|---|
$fieldName | string |
Type | Description |
---|---|
integer |


getFields(boolean $indexed = false) : array
Returns array of fields.
If the $indexed parameter is true, only indexed fields are returned.
Name | Type | Description |
---|---|---|
$indexed | boolean |
Type | Description |
---|---|
array |


getTermInfo(\Zend_Search_Lucene_Index_Term $term) : \Zend_Search_Lucene_Index_TermInfo
Scans terms dictionary and returns term info
Name | Type | Description |
---|---|---|
$term | \Zend_Search_Lucene_Index_Term |
Type | Description |
---|---|
\Zend_Search_Lucene_Index_TermInfo |


hasDeletions() : boolean
Returns true if any documents have been deleted from this index segment.
Type | Description |
---|---|
boolean |


hasSingleNormFile() : boolean
Returns true if segment has single norms file.
Type | Description |
---|---|
boolean |


isCompound() : boolean
Returns true if segment is stored using compound segment file.
Type | Description |
---|---|
boolean |


isDeleted(integer $id) : boolean
Checks whether the document is deleted
Name | Type | Description |
---|---|---|
$id | integer |
Type | Description |
---|---|
boolean |


nextTerm() : \Zend_Search_Lucene_Index_Term | null
Scans terms dictionary and returns next term
Type | Description |
---|---|
\Zend_Search_Lucene_Index_Term \| null |


norm(integer $id, string $fieldName) : float
Returns the normalization factor for the specified document and field
Name | Type | Description |
---|---|---|
$id | integer | |
$fieldName | string |
Type | Description |
---|---|
float |


normVector(string $fieldName) : string
Returns norm vector, encoded in a byte string
Name | Type | Description |
---|---|---|
$fieldName | string |
Type | Description |
---|---|
string |


numDocs() : integer
Returns the total number of non-deleted documents in this segment.
Type | Description |
---|---|
integer |
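The relationship between count() and numDocs() reduces to a subtraction, sketched here under the assumption that deletions are held as an array keyed by document id (or null when there are none). `liveDocCount()` is a hypothetical function, not a method of this class.

```php
<?php
// Hypothetical sketch: the number of non-deleted documents is the total
// document count minus the number of deleted ids.
// Assumption: $deleted is null (no deletions) or an array keyed by document id.
function liveDocCount(int $docCount, ?array $deleted): int
{
    return $docCount - ($deleted === null ? 0 : count($deleted));
}
```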


openCompoundFile(string $extension, boolean $shareHandler = true) : \Zend_Search_Lucene_Storage_File
Opens an index file stored within a compound index file
Name | Type | Description |
---|---|---|
$extension | string | |
$shareHandler | boolean |
Type | Description |
---|---|
\Zend_Search_Lucene_Storage_File |
Exception | Description |
---|---|
\Zend_Search_Lucene_Exception |


resetTermsStream() : integer
Reset terms stream
$startId - id for the first document
$compact - remove deleted documents
Returns start document id for the next segment
Type | Description |
---|---|
integer |
Exception | Description |
---|---|
\Zend_Search_Lucene_Exception |


skipTo(\Zend_Search_Lucene_Index_Term $prefix) : void
Skip terms stream up to the specified term prefix.
The prefix contains fully specified field info and a portion of the searched term
Name | Type | Description |
---|---|---|
$prefix | \Zend_Search_Lucene_Index_Term |
Exception | Description |
---|---|
\Zend_Search_Lucene_Exception |


termDocs(\Zend_Search_Lucene_Index_Term $term, integer $shift = 0, \Zend_Search_Lucene_Index_DocsFilter | null $docsFilter = null) : array
Returns IDs of all the documents containing term.
Name | Type | Description |
---|---|---|
$term | \Zend_Search_Lucene_Index_Term | |
$shift | integer | |
$docsFilter | \Zend_Search_Lucene_Index_DocsFilter \| null |
Type | Description |
---|---|
array |


termFreqs(\Zend_Search_Lucene_Index_Term $term, integer $shift = 0, \Zend_Search_Lucene_Index_DocsFilter | null $docsFilter = null) : array
Returns term freqs array.
Result array structure: array(docId => freq, ...)
Name | Type | Description |
---|---|---|
$term | \Zend_Search_Lucene_Index_Term | |
$shift | integer | |
$docsFilter | \Zend_Search_Lucene_Index_DocsFilter \| null |
Type | Description |
---|---|
array |


termPositions(\Zend_Search_Lucene_Index_Term $term, integer $shift = 0, \Zend_Search_Lucene_Index_DocsFilter | null $docsFilter = null) : array
Returns term positions array.
Result array structure: array(docId => array(pos1, pos2, ...), ...)
Name | Type | Description |
---|---|---|
$term | \Zend_Search_Lucene_Index_Term | |
$shift | integer | |
$docsFilter | \Zend_Search_Lucene_Index_DocsFilter \| null |
Type | Description |
---|---|
array |
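To show how the array(docId => array(pos1, pos2, ...), ...) structure can be consumed, the sketch below finds documents in which one term is immediately followed by another. `docsWithAdjacentTerms()` is a hypothetical function operating on two such arrays; it is not part of this class, and the sample data in the usage lines is made up.

```php
<?php
// Hypothetical consumer of two position arrays in the documented structure
// array(docId => array(pos1, pos2, ...), ...). Returns the ids of documents
// where a position of the second term directly follows one of the first.
function docsWithAdjacentTerms(array $firstPositions, array $secondPositions): array
{
    $matches = [];
    foreach ($firstPositions as $docId => $positions) {
        if (!isset($secondPositions[$docId])) {
            continue; // the second term does not occur in this document
        }
        // Flip positions into keys so each adjacency test is an isset() check.
        $second = array_flip($secondPositions[$docId]);
        foreach ($positions as $pos) {
            if (isset($second[$pos + 1])) {
                $matches[] = $docId;
                break; // one adjacent pair is enough for this document
            }
        }
    }
    return $matches;
}

// Usage with made-up data: only document 0 has the terms at positions 5 and 6.
$adjacent = docsWithAdjacentTerms(
    [0 => [1, 5], 2 => [3]],
    [0 => [6], 2 => [7], 3 => [1]]
);
```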