Package nltk :: Package corpus :: Package reader :: Module tagged :: Class CategorizedTaggedCorpusReader

Class CategorizedTaggedCorpusReader

                 object --+    
                          |    
api.CategorizedCorpusReader --+
                              |
             object --+       |
                      |       |
       api.CorpusReader --+   |
                          |   |
         TaggedCorpusReader --+
                              |
                             CategorizedTaggedCorpusReader

A reader for part-of-speech tagged corpora whose documents are divided into categories based on their file identifiers.
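
To illustrate categories being derived from file identifiers, the following minimal sketch groups files by a regular expression. The helper name `categorize`, the sample file identifiers, and the pattern are hypothetical; the sketch does not import NLTK and is not the library's implementation.

```python
import re
from collections import defaultdict

def categorize(file_ids, cat_pattern):
    """Group file identifiers by the first regex group of cat_pattern.

    Hypothetical helper for illustration only.
    """
    cat_to_files = defaultdict(list)
    for file_id in file_ids:
        match = re.match(cat_pattern, file_id)
        if match:  # files that do not match the pattern get no category
            cat_to_files[match.group(1)].append(file_id)
    return dict(cat_to_files)

# Invented file identifiers: the directory name acts as the category.
file_ids = ["news/a01.txt", "news/a02.txt", "fiction/k01.txt"]
mapping = categorize(file_ids, r"(\w+)/.*")
print(mapping)
```

Under this assumption, asking for the category "news" would select the two files whose identifiers start with `news/`.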

Instance Methods

__init__(self, *args, **kwargs)
Initialize the corpus reader.

_resolve(self, files, categories)

raw(self, files=None, categories=None)
Returns: str: the given file or files as a single string.

words(self, files=None, categories=None)
Returns: list of str: the given file or files as a list of words and punctuation symbols.

sents(self, files=None, categories=None)
Returns: list of (list of str): the given file or files as a list of sentences or utterances, each encoded as a list of word strings.

paras(self, files=None, categories=None)
Returns: list of (list of (list of str)): the given file or files as a list of paragraphs, each encoded as a list of sentences, which are in turn encoded as lists of word strings.

tagged_words(self, files=None, categories=None, simplify_tags=False)
Returns: list of (str,str): the given file or files as a list of tagged words and punctuation symbols, encoded as tuples (word,tag).

tagged_sents(self, files=None, categories=None, simplify_tags=False)
Returns: list of (list of (str,str)): the given file or files as a list of sentences, each encoded as a list of (word,tag) tuples.

tagged_paras(self, files=None, categories=None, simplify_tags=False)
Returns: list of (list of (list of (str,str))): the given file or files as a list of paragraphs, each encoded as a list of sentences, which are in turn encoded as lists of (word,tag) tuples.

Inherited from api.CategorizedCorpusReader: categories, files

Inherited from api.CategorizedCorpusReader (private): _add, _init

Inherited from api.CorpusReader: __repr__, abspath, abspaths, encoding, open

Inherited from api.CorpusReader (private): _get_root

Inherited from object: __delattr__, __getattribute__, __hash__, __new__, __reduce__, __reduce_ex__, __setattr__, __str__

Deprecated since 0.8

Inherited from TaggedCorpusReader: read, tagged, tokenized

Deprecated since 0.9.1

Inherited from api.CorpusReader: filenames

Inherited from api.CorpusReader (private): _get_items

Instance Variables

Inherited from api.CategorizedCorpusReader (private): _c2f, _delimiter, _f2c, _file, _map, _pattern

Inherited from api.CorpusReader (private): _encoding, _files, _root

Properties

Inherited from api.CorpusReader: root

Inherited from object: __class__

Deprecated since 0.9.1

Inherited from api.CorpusReader: items

Method Details

__init__(self, *args, **kwargs)
(Constructor)

Initialize the corpus reader. Categorization arguments (cat_pattern, cat_map, and cat_file) are passed to the CategorizedCorpusReader constructor. The remaining arguments are passed to the TaggedCorpusReader constructor.
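
The argument routing described above can be sketched in plain Python. This is a minimal illustration, not the library's code: the helper `split_constructor_kwargs` is hypothetical, and the `sep` and `encoding` keywords stand in for whatever non-categorization arguments the caller supplies.

```python
# The three categorization keywords named in the constructor documentation.
CATEGORY_KEYS = ("cat_pattern", "cat_map", "cat_file")

def split_constructor_kwargs(kwargs):
    """Separate categorization keywords from all other keywords.

    Hypothetical helper: the first dict would go to the categorizing base
    class, the second to the tagged-corpus base class.
    """
    cat_kwargs = {k: v for k, v in kwargs.items() if k in CATEGORY_KEYS}
    reader_kwargs = {k: v for k, v in kwargs.items() if k not in CATEGORY_KEYS}
    return cat_kwargs, reader_kwargs

# Illustrative keyword arguments (values are invented).
cat_kwargs, reader_kwargs = split_constructor_kwargs(
    {"cat_pattern": r"(\w+)/.*", "sep": "/", "encoding": "utf8"}
)
print(cat_kwargs)
print(reader_kwargs)
```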

Overrides: api.CategorizedCorpusReader.__init__

raw(self, files=None, categories=None)

Returns: str: the given file or files as a single string.
Overrides: TaggedCorpusReader.raw: (inherited documentation)

words(self, files=None, categories=None)

Returns: list of str: the given file or files as a list of words and punctuation symbols.
Overrides: TaggedCorpusReader.words: (inherited documentation)

sents(self, files=None, categories=None)

Returns: list of (list of str): the given file or files as a list of sentences or utterances, each encoded as a list of word strings.
Overrides: TaggedCorpusReader.sents: (inherited documentation)

paras(self, files=None, categories=None)

Returns: list of (list of (list of str)): the given file or files as a list of paragraphs, each encoded as a list of sentences, which are in turn encoded as lists of word strings.
Overrides: TaggedCorpusReader.paras: (inherited documentation)
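
The return shapes of words, sents, and paras differ only in nesting depth. The sketch below uses invented sample data (not output from any real corpus) to show the three shapes; flattening one level of the paragraph structure gives the sentence shape, and flattening once more gives the word shape.

```python
from itertools import chain

# Invented data in the shape documented for paras():
# list of paragraphs -> list of sentences -> list of word strings.
paras = [
    [["The", "dog", "barked", "."], ["It", "was", "loud", "."]],
    [["Nobody", "woke", "."]],
]

# Shape documented for sents(): list of (list of str).
sents = list(chain.from_iterable(paras))

# Shape documented for words(): list of str.
words = list(chain.from_iterable(sents))

print(len(paras), len(sents), len(words))
```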

tagged_words(self, files=None, categories=None, simplify_tags=False)

Returns: list of (str,str): the given file or files as a list of tagged words and punctuation symbols, encoded as tuples (word,tag).
Overrides: TaggedCorpusReader.tagged_words: (inherited documentation)
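
The (word,tag) tuple encoding can be illustrated with a small parser. This is a sketch under stated assumptions: the helper names are hypothetical, the input line is invented, and the "/" separator between word and tag is assumed rather than taken from this page.

```python
def parse_tagged_token(token, sep="/"):
    """Split one 'word/TAG' token into a (word, tag) tuple.

    Hypothetical helper. The split happens at the last separator, so a word
    that itself contains the separator stays intact. A token without a
    separator gets the tag None.
    """
    word, found, tag = token.rpartition(sep)
    if not found:
        return (token, None)
    return (word, tag)

def parse_tagged_sentence(line, sep="/"):
    """Turn a whitespace-separated line of tagged tokens into a list of tuples."""
    return [parse_tagged_token(tok, sep) for tok in line.split()]

tagged = parse_tagged_sentence("The/AT dog/NN barked/VBD ./.")
print(tagged)
```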

tagged_sents(self, files=None, categories=None, simplify_tags=False)

Returns: list of (list of (str,str)): the given file or files as a list of sentences, each encoded as a list of (word,tag) tuples.
Overrides: TaggedCorpusReader.tagged_sents: (inherited documentation)

tagged_paras(self, files=None, categories=None, simplify_tags=False)

Returns: list of (list of (list of (str,str))): the given file or files as a list of paragraphs, each encoded as a list of sentences, which are in turn encoded as lists of (word,tag) tuples.
Overrides: TaggedCorpusReader.tagged_paras: (inherited documentation)
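
Every access method above accepts both a `files` and a `categories` argument. The sketch below shows one way such a pair could be reduced to a single file selection. It assumes the two arguments are mutually exclusive, which this page does not state; the function names and the lookup table are hypothetical.

```python
def resolve(files, categories, files_for_categories):
    """Reduce (files, categories) to a single file selection.

    Assumption: passing both arguments is an error. `files_for_categories`
    is a hypothetical callable that maps categories to file identifiers.
    """
    if files is not None and categories is not None:
        raise ValueError("Specify files or categories, not both")
    if categories is not None:
        return files_for_categories(categories)
    return files  # may be None, meaning "no restriction"

# Invented category table for the demonstration.
lookup = {"news": ["news/a01.txt"], "fiction": ["fiction/k01.txt"]}

def files_for(categories):
    """Accept one category name or a list of names (hypothetical helper)."""
    if isinstance(categories, str):
        categories = [categories]
    return sorted(f for c in categories for f in lookup[c])

print(resolve(None, "news", files_for))
print(resolve("news/a01.txt", None, files_for))
```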
