Package nltk :: Package corpus :: Package reader :: Module tagged :: Class MacMorphoCorpusReader

Class MacMorphoCorpusReader

      object --+        
               |        
api.CorpusReader --+    
                   |    
  TaggedCorpusReader --+
                       |
                      MacMorphoCorpusReader

A corpus reader for the MAC_MORPHO corpus. Each line contains a single tagged word, using '_' as a separator. Sentence boundaries are based on the end-sentence tag ('_.'). Paragraph information is not included in the corpus, so each paragraph returned by self.paras() and self.tagged_paras() contains a single sentence.
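The file format described above can be illustrated with a minimal, standalone sketch. It is not the reader's actual implementation: the helper name `parse_tagged_lines` and the sample tokens are hypothetical, and it assumes the tag is whatever follows the last '_' on a line. It groups one-token-per-line input into sentences, closing a sentence at the end-sentence tag, and then wraps each sentence in its own paragraph, as paras() and tagged_paras() are described to do.

```python
def parse_tagged_lines(lines, sep="_", end_tag="."):
    """Group one-token-per-line input into sentences of (word, tag) pairs.

    Hypothetical helper for illustration only; it does not use NLTK.
    """
    sentences, current = [], []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        # Assumption: the tag is the text after the LAST separator.
        word, found, tag = line.rpartition(sep)
        if not found:
            word, tag = line, None  # no separator: untagged token
        current.append((word, tag))
        if tag == end_tag:
            # The end-sentence tag closes the current sentence.
            sentences.append(current)
            current = []
    if current:
        sentences.append(current)  # trailing tokens without an end tag
    return sentences


# Made-up sample lines in the "word_TAG" layout.
sample = ["Jersei_N", "atinge_V", "recorde_N", "._.",
          "Resultado_N", "positivo_ADJ", "._."]

sentences = parse_tagged_lines(sample)
# No paragraph information exists, so each paragraph holds one sentence.
paragraphs = [[sentence] for sentence in sentences]
print(len(sentences), len(paragraphs))
```

Splitting on the last separator rather than the first keeps a token that itself contains '_' intact, at the cost of assuming that tags never contain the separator.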

Instance Methods
 
__init__(self, root, files, encoding=None, tag_mapping_function=None)
Construct a new Tagged Corpus reader for a set of documents located at the given root directory.
 
_read_block(self, stream)

Inherited from TaggedCorpusReader: paras, raw, sents, tagged_paras, tagged_sents, tagged_words, words

Inherited from api.CorpusReader: __repr__, abspath, abspaths, encoding, files, open

Inherited from api.CorpusReader (private): _get_root

Inherited from object: __delattr__, __getattribute__, __hash__, __new__, __reduce__, __reduce_ex__, __setattr__, __str__

    Deprecated since 0.8

Inherited from TaggedCorpusReader: read, tagged, tokenized

    Deprecated since 0.9.1

Inherited from api.CorpusReader: filenames

Inherited from api.CorpusReader (private): _get_items

Instance Variables

Inherited from api.CorpusReader (private): _encoding, _files, _root

Properties

Inherited from api.CorpusReader: root

Inherited from object: __class__

    Deprecated since 0.9.1

Inherited from api.CorpusReader: items

Method Details

__init__(self, root, files, encoding=None, tag_mapping_function=None)
(Constructor)

Construct a new Tagged Corpus reader for a set of documents located at the given root directory. Example usage:

>>> root = '/...path to corpus.../'
>>> reader = MacMorphoCorpusReader(root, r'.*\.txt')
Parameters:
  • root - The root directory for this corpus.
  • files - A list or regexp specifying the files in this corpus.
Overrides: TaggedCorpusReader.__init__
(inherited documentation)