Package nltk :: Package corpus :: Package reader :: Module util :: Class ConcatenatedCorpusView

Class ConcatenatedCorpusView

               object --+    
                        |    
util.AbstractLazySequence --+
                            |
                           ConcatenatedCorpusView

A 'view' of a corpus file that joins together one or more StreamBackedCorpusViews. At most one file handle is left open at any time.

Instance Methods
 
__init__(self, corpus_views)
x.__init__(...) initializes x; see x.__class__.__doc__ for signature
 
__len__(self)
Return the number of tokens in the corpus file underlying this corpus view.
 
close(self)
 
iterate_from(self, start_tok)
Return an iterator that generates the tokens in the corpus file underlying this corpus view, starting at token number start_tok.

Inherited from util.AbstractLazySequence: __add__, __cmp__, __contains__, __getitem__, __hash__, __iter__, __mul__, __radd__, __repr__, __rmul__, count, index

Inherited from object: __delattr__, __getattribute__, __new__, __reduce__, __reduce_ex__, __setattr__, __str__

Class Variables

Inherited from util.AbstractLazySequence (private): _MAX_REPR_SIZE

Instance Variables
  _pieces
A list of the corpus subviews that make up this concatenation.
  _offsets
A list of offsets, indicating the index at which each subview begins.
  _open_piece
The most recently accessed corpus subview (or None).

Properties

Inherited from object: __class__

Method Details

__init__(self, corpus_views)
(Constructor)

x.__init__(...) initializes x; see x.__class__.__doc__ for signature

Overrides: object.__init__
(inherited documentation)

__len__(self)
(Length operator)

Return the number of tokens in the corpus file underlying this corpus view.

Overrides: util.AbstractLazySequence.__len__
(inherited documentation)

iterate_from(self, start_tok)

Return an iterator that generates the tokens in the corpus file underlying this corpus view, starting at token number start_tok. If start_tok >= len(self), the iterator generates no tokens.

Overrides: util.AbstractLazySequence.iterate_from
(inherited documentation)
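
The iteration contract described for iterate_from can be sketched with plain lists standing in for subviews. This is an illustrative sketch under stated assumptions, not NLTK's implementation: the free function `iterate_from` below is hypothetical, and each "piece" is an ordinary Python list rather than a StreamBackedCorpusView.

```python
def iterate_from(pieces, start_tok):
    """Yield tokens from the concatenation of `pieces`, starting at
    global token index `start_tok`.

    Hypothetical stand-in: `pieces` is a list of lists, not corpus views.
    """
    offset = 0  # global index of the first token of the current piece
    for piece in pieces:
        n = len(piece)
        if start_tok < offset + n:
            # Skip into the first relevant piece; later pieces start at 0.
            local_start = max(start_tok - offset, 0)
            for tok in piece[local_start:]:
                yield tok
        offset += n


pieces = [["a", "b"], ["c"], ["d", "e"]]
print(list(iterate_from(pieces, 1)))  # ['b', 'c', 'd', 'e']
print(list(iterate_from(pieces, 3)))  # ['d', 'e']
print(list(iterate_from(pieces, 5)))  # []
```

The last call shows the boundary case from the description: when the start index is at least the total length, nothing is generated.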

Instance Variable Details

_offsets

A list of offsets, indicating the index at which each subview begins. In particular:

   _offsets[i] = sum([len(p) for p in _pieces[:i]])
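
The offset rule above can be illustrated with plain lists standing in for subviews. This is a minimal sketch, not NLTK's implementation: `compute_offsets` and `locate` are hypothetical helper names, and the pieces are ordinary Python lists rather than corpus views.

```python
from bisect import bisect_right
from itertools import accumulate


def compute_offsets(pieces):
    """Return a list where offsets[i] == sum(len(p) for p in pieces[:i])."""
    if not pieces:
        return []
    lengths = [len(p) for p in pieces]
    # The first piece starts at 0; each later piece starts where the
    # running total of the preceding lengths ends.
    return [0] + list(accumulate(lengths[:-1]))


def locate(offsets, index):
    """Map a global token index to (piece number, index within that piece).

    Assumes 0 <= index < total length; no bounds checking is done here.
    """
    piece_num = bisect_right(offsets, index) - 1
    return piece_num, index - offsets[piece_num]


pieces = [["a", "b", "c"], ["d"], ["e", "f"]]  # stand-ins for subviews
offsets = compute_offsets(pieces)
print(offsets)             # [0, 3, 4]
print(locate(offsets, 4))  # (2, 0)
```

Because the offsets are sorted, a binary search is enough to find which piece holds a given token index, which is one plausible reason to keep such a list alongside the pieces.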

_open_piece

The most recently accessed corpus subview (or None). Before a new subview is accessed, this subview will be closed.
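
The close-before-switch policy described for _open_piece can be sketched as follows. Everything here is hypothetical and for illustration only: `TrackedPiece` and `SingleOpenPieceView` are made-up classes that track an `is_open` flag instead of a real file handle, and they do not reproduce NLTK's code.

```python
class TrackedPiece:
    """Hypothetical subview that records whether it is 'open'."""

    def __init__(self, tokens):
        self.tokens = list(tokens)
        self.is_open = False

    def open(self):
        self.is_open = True

    def close(self):
        self.is_open = False


class SingleOpenPieceView:
    """Keeps at most one piece open by closing the previous one first."""

    def __init__(self, pieces):
        self._pieces = list(pieces)
        self._open_piece = None  # most recently accessed piece, or None

    def tokens_of(self, piece_num):
        piece = self._pieces[piece_num]
        if self._open_piece is not piece:
            # Close the previously accessed piece before opening a new one.
            if self._open_piece is not None:
                self._open_piece.close()
            piece.open()
            self._open_piece = piece
        return piece.tokens

    def close(self):
        if self._open_piece is not None:
            self._open_piece.close()
            self._open_piece = None


a, b = TrackedPiece(["x"]), TrackedPiece(["y", "z"])
view = SingleOpenPieceView([a, b])
view.tokens_of(0)
view.tokens_of(1)
print(a.is_open, b.is_open)  # False True
```

Tracking only the most recently accessed piece keeps the bookkeeping constant-size while still bounding the number of open handles at one.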