Package nltk :: Package corpus :: Package reader :: Module tagged :: Class TaggedCorpusView

Class TaggedCorpusView

               object --+        
                        |        
util.AbstractLazySequence --+    
                            |    
  util.StreamBackedCorpusView --+
                                |
                               TaggedCorpusView

A specialized corpus view for tagged documents. It can be customized via flags to divide the tagged corpus documents up by sentence or paragraph, and to include or omit part-of-speech tags. TaggedCorpusView objects are typically created by TaggedCorpusReader (not directly by NLTK users).
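The effect of the flags can be illustrated with a small standalone sketch. It does not use NLTK and is not the library's implementation: the function view_tokens and the sample text are hypothetical, and the sketch assumes paragraphs are separated by blank lines, sentences by newlines, and words from tags by sep.

```python
# Illustrative sketch only; view_tokens is a hypothetical helper, not an NLTK API.
SAMPLE = "The/DT dog/NN barked/VBD\nIt/PRP ran/VBD\n\nCats/NNS sleep/VBP"


def split_token(token, sep="/"):
    """Split 'word/TAG' on the last separator; the tag is None if sep is absent."""
    if sep in token:
        word, tag = token.rsplit(sep, 1)
        return word, tag
    return token, None


def view_tokens(text, tagged=True, group_by_sent=False, group_by_para=False, sep="/"):
    """Return the tokens of text, shaped by the three flags."""
    result = []
    for para_str in text.strip().split("\n\n"):      # assumed paragraph separator
        para = []
        for sent_str in para_str.splitlines():        # assumed sentence separator
            sent = [split_token(tok, sep) for tok in sent_str.split()]
            if not tagged:
                sent = [word for word, _tag in sent]  # drop the tags
            if group_by_sent:
                para.append(sent)                     # keep sentence boundaries
            else:
                para.extend(sent)                     # flatten sentences
        if group_by_para:
            result.append(para)                       # keep paragraph boundaries
        else:
            result.extend(para)                       # flatten paragraphs
    return result


words = view_tokens(SAMPLE, tagged=False)
tagged_sents = view_tokens(SAMPLE, tagged=True, group_by_sent=True)
paras = view_tokens(SAMPLE, tagged=False, group_by_sent=True, group_by_para=True)
```

With both grouping flags off the result is a flat list of tokens. Turning on group_by_sent adds one level of nesting, and group_by_para adds another on top of that.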

Instance Methods

__init__(self, corpus_file, encoding, tagged, group_by_sent, group_by_para, sep, word_tokenizer, sent_tokenizer, para_block_reader, tag_mapping_function=None)
Create a new corpus view, based on the file corpus_file, and read with para_block_reader.

read_block(self, stream)
Reads one paragraph at a time. Returns: list of any.

Inherited from util.StreamBackedCorpusView: __add__, __getitem__, __len__, __mul__, __radd__, __rmul__, close, iterate_from

Inherited from util.StreamBackedCorpusView (private): _open

Inherited from util.AbstractLazySequence: __cmp__, __contains__, __hash__, __iter__, __repr__, count, index

Inherited from object: __delattr__, __getattribute__, __new__, __reduce__, __reduce_ex__, __setattr__, __str__

Class Variables

Inherited from util.AbstractLazySequence (private): _MAX_REPR_SIZE

Instance Variables
Properties

Inherited from util.StreamBackedCorpusView: filename

Inherited from object: __class__

Method Details

__init__(self, corpus_file, encoding, tagged, group_by_sent, group_by_para, sep, word_tokenizer, sent_tokenizer, para_block_reader, tag_mapping_function=None)
(Constructor)

Create a new corpus view, based on the file corpus_file, and read with para_block_reader. See the class documentation for more information.

Parameters (inherited documentation; the names below are those of util.StreamBackedCorpusView.__init__):
  • filename - The path to the file that is read by this corpus view; in this constructor it is passed as corpus_file. filename can be either a string or a PathPointer.
  • startpos - The file position at which the view will start reading. This can be used to skip over preface sections. The signature of this constructor does not include startpos.
  • encoding - The Unicode encoding that should be used to read the file's contents. If no encoding is specified, then the file's contents will be read as a non-Unicode string (i.e., a str).
Overrides: util.StreamBackedCorpusView.__init__

read_block(self, stream)

Reads one paragraph at a time.

Parameters:
  • stream - an input stream
Returns: list of any - a block of tokens from the input stream
Overrides: util.StreamBackedCorpusView.read_block
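
As a rough illustration of reading "one paragraph at a time" from a stream, the sketch below consumes lines up to the next blank line and returns that paragraph's tokens. It is a standalone approximation, not NLTK's code: read_paragraph_block is a hypothetical name, and the blank-line paragraph boundary, whitespace word splitting, and "/" separator are all assumptions.

```python
import io


def read_paragraph_block(stream, sep="/"):
    """Read up to the next blank line and return a list of (word, tag) tuples.

    Hypothetical helper for illustration. Returns an empty list at end of stream.
    """
    tokens = []
    # readline() returns "" only at end of stream, which stops the iteration.
    for line in iter(stream.readline, ""):
        if not line.strip():
            if tokens:
                break       # a blank line ends a non-empty paragraph
            continue        # skip blank lines before the paragraph starts
        for token in line.split():
            if sep in token:
                word, tag = token.rsplit(sep, 1)
            else:
                word, tag = token, None
            tokens.append((word, tag))
    return tokens


stream = io.StringIO("The/DT dog/NN\nbarked/VBD\n\nCats/NNS sleep/VBP\n")
first = read_paragraph_block(stream)    # tokens of the first paragraph
second = read_paragraph_block(stream)   # tokens of the second paragraph
third = read_paragraph_block(stream)    # nothing left to read
```

Because each call leaves the stream positioned just after the paragraph it consumed, repeated calls walk through the file block by block.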