A reader for part-of-speech tagged corpora whose documents are divided
into categories based on their file identifiers.
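
To make "categories based on their file identifiers" concrete, here is a minimal sketch. It assumes each file identifier starts with a directory-style prefix that names its category. The file ids, the regular expression, and `build_category_map` are hypothetical illustrations, not part of this reader's documented interface.

```python
import re
from collections import defaultdict

# Hypothetical file identifiers; the category is ASSUMED to be the
# directory prefix before the first slash.
FILE_IDS = ["news/a01.pos", "news/a02.pos", "fiction/k01.pos", "humor/r01.pos"]
CATEGORY_PATTERN = re.compile(r"([^/]+)/")

def build_category_map(file_ids, pattern=CATEGORY_PATTERN):
    """Map each category to the sorted list of file ids that belong to it."""
    cat_to_files = defaultdict(list)
    for fid in file_ids:
        m = pattern.match(fid)
        if m is None:
            raise ValueError(f"cannot derive a category from {fid!r}")
        cat_to_files[m.group(1)].append(fid)
    return {cat: sorted(fids) for cat, fids in cat_to_files.items()}

cat_map = build_category_map(FILE_IDS)
print(sorted(cat_map))   # ['fiction', 'humor', 'news']
print(cat_map["news"])   # ['news/a01.pos', 'news/a02.pos']
```

With a pattern-based scheme like this, no separate category index file is needed. The trade-off is that every category must be recoverable from the identifier itself.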

raw(self, files=None, categories=None) -> str
    Returns the given file or files as a single string.
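
Every method in this class accepts either `files` or `categories` to select which documents to read. The sketch below shows one plausible way to resolve that selection before `raw` concatenates the selected files. It makes several assumptions:

- file contents live in a hypothetical in-memory dict;
- a file's category is its id's directory prefix;
- passing `None` for both arguments means "all files";
- passing both arguments is rejected. This last rule belongs to the sketch, not to this reader's documented behavior.

```python
# Hypothetical in-memory corpus: file id -> file contents.
CORPUS = {
    "news/a01.pos": "The/AT jury/NN said/VBD ./.\n",
    "news/a02.pos": "Prices/NNS rose/VBD ./.\n",
    "humor/r01.pos": "It/PPS was/BEDZ funny/JJ ./.\n",
}

def category_of(fid):
    # ASSUMPTION: the category is the directory prefix of the file id.
    return fid.split("/", 1)[0]

def resolve(files=None, categories=None):
    """Turn the files/categories arguments into a concrete list of file ids."""
    if files is not None and categories is not None:
        raise ValueError("specify files or categories, not both")  # sketch's rule
    if categories is not None:
        if isinstance(categories, str):
            categories = [categories]
        return sorted(f for f in CORPUS if category_of(f) in categories)
    if files is None:
        return sorted(CORPUS)          # no selection: read every file
    if isinstance(files, str):
        files = [files]
    return list(files)

def raw(files=None, categories=None):
    """Concatenate the contents of the selected files into one string."""
    return "".join(CORPUS[f] for f in resolve(files, categories))

print(repr(raw(categories="news")))
```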

words(self, files=None, categories=None) -> list of str
    Returns the given file or files as a list of words and punctuation symbols.
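
`words` discards the tags and keeps only the tokens. The sketch below assumes the common `word/TAG` layout: whitespace-separated tokens, with the tag after the last `/`. Here `words` is a standalone hypothetical function over a raw string, not the reader method itself.

```python
def words(raw_text, sep="/"):
    """Return a flat list of word strings, discarding each token's tag."""
    # Tokens are whitespace-separated; the tag follows the LAST separator,
    # so rsplit(..., 1) keeps any separator that is part of the word.
    return [token.rsplit(sep, 1)[0] for token in raw_text.split()]

text = "The/AT jury/NN said/VBD ./.\nPrices/NNS rose/VBD ./.\n"
print(words(text))
# ['The', 'jury', 'said', '.', 'Prices', 'rose', '.']
```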

sents(self, files=None, categories=None) -> list of (list of str)
    Returns the given file or files as a list of sentences or utterances,
    each encoded as a list of word strings.
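
For `sents`, the extra level of nesting comes from sentence boundaries. This sketch assumes one sentence per line, with blank lines between paragraphs. That layout is common for tagged corpora but is an assumption here, and the function is a hypothetical standalone illustration.

```python
def sents(raw_text, sep="/"):
    """One sentence per non-blank line; each sentence is a list of word strings."""
    result = []
    for line in raw_text.splitlines():
        tokens = line.split()
        if not tokens:          # blank lines separate paragraphs, not sentences
            continue
        result.append([tok.rsplit(sep, 1)[0] for tok in tokens])
    return result

text = "The/AT jury/NN said/VBD ./.\nPrices/NNS rose/VBD ./.\n"
print(sents(text))
# [['The', 'jury', 'said', '.'], ['Prices', 'rose', '.']]
```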

paras(self, files=None, categories=None) -> list of (list of (list of str))
    Returns the given file or files as a list of paragraphs, each encoded as
    a list of sentences, which are in turn encoded as lists of word strings.
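
`paras` adds one more level on top of `sents`. This sketch keeps the same assumed layout. A paragraph is a block of lines separated from the next by one or more blank lines, and each non-blank line in the block is a sentence. The function is a hypothetical illustration.

```python
import re

def paras(raw_text, sep="/"):
    """Split on blank lines into paragraphs, then on lines into sentences."""
    result = []
    for block in re.split(r"\n\s*\n", raw_text.strip()):
        sentences = [
            [tok.rsplit(sep, 1)[0] for tok in line.split()]
            for line in block.splitlines()
            if line.strip()
        ]
        if sentences:           # skip empty input
            result.append(sentences)
    return result

text = (
    "The/AT jury/NN said/VBD ./.\n"
    "It/PPS agreed/VBD ./.\n"
    "\n"
    "Prices/NNS rose/VBD ./.\n"
)
print(paras(text))
# [[['The', 'jury', 'said', '.'], ['It', 'agreed', '.']], [['Prices', 'rose', '.']]]
```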

tagged_words(self, files=None, categories=None, simplify_tags=False) -> list of (str, str)
    Returns the given file or files as a list of tagged words and punctuation
    symbols, encoded as (word, tag) tuples.
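
`tagged_words` keeps the tag alongside each word. In a `word/TAG` layout the word itself may contain the separator (for example, a fraction such as `1/2`), so splitting at the last separator is the safer rule. `split_word_tag` is a hypothetical helper name. In this sketch, tokens with no separator get the tag `None`.

```python
def split_word_tag(token, sep="/"):
    """Split a 'word/TAG' token at its LAST separator into a (word, tag) tuple."""
    word, found, tag = token.rpartition(sep)
    if not found:
        return (token, None)    # token carries no tag
    return (word, tag)

def tagged_words(raw_text, sep="/"):
    """Return a flat list of (word, tag) tuples in corpus order."""
    return [split_word_tag(tok, sep) for tok in raw_text.split()]

text = "The/AT jury/NN voted/VBD 1/2/CD ./."
print(tagged_words(text))
# [('The', 'AT'), ('jury', 'NN'), ('voted', 'VBD'), ('1/2', 'CD'), ('.', '.')]
```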

tagged_sents(self, files=None, categories=None, simplify_tags=False) -> list of (list of (str, str))
    Returns the given file or files as a list of sentences, each encoded as a
    list of (word, tag) tuples.
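
The `simplify_tags` flag requests a reduced tagset in place of the corpus's original tags. The actual mapping depends on the corpus and is not described here, so the table in this sketch (`COARSE`, `simplify`) is entirely hypothetical. It only shows where such a mapping would be applied while building `(word, tag)` sentences, under the one-sentence-per-line assumption.

```python
# HYPOTHETICAL coarse tagset; real corpora define their own simplification.
COARSE = {"AT": "DET", "JJ": "ADJ", "NN": "N", "NNS": "N", "VBD": "V"}

def simplify(tag):
    """Drop Brown-style suffixes such as '-TL', then map to a coarse tag."""
    base = tag.split("-")[0] or tag     # keep tags like '--' intact
    return COARSE.get(base, base)       # unknown tags pass through unchanged

def tagged_sents(raw_text, simplify_tags=False, sep="/"):
    """One sentence per non-blank line, each a list of (word, tag) tuples."""
    sentences = []
    for line in raw_text.splitlines():
        sent = []
        for tok in line.split():
            word, found, tag = tok.rpartition(sep)
            if not found:
                word, tag = tok, None   # untagged token
            elif simplify_tags:
                tag = simplify(tag)
            sent.append((word, tag))
        if sent:
            sentences.append(sent)
    return sentences

text = "The/AT Grand/JJ-TL Jury/NN-TL said/VBD ./.\n"
print(tagged_sents(text))
# [[('The', 'AT'), ('Grand', 'JJ-TL'), ('Jury', 'NN-TL'), ('said', 'VBD'), ('.', '.')]]
print(tagged_sents(text, simplify_tags=True))
# [[('The', 'DET'), ('Grand', 'ADJ'), ('Jury', 'N'), ('said', 'V'), ('.', '.')]]
```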

tagged_paras(self, files=None, categories=None, simplify_tags=False) -> list of (list of (list of (str, str)))
    Returns the given file or files as a list of paragraphs, each encoded as
    a list of sentences, which are in turn encoded as lists of (word, tag)
    tuples.
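
`tagged_paras` combines the paragraph nesting of `paras` with the `(word, tag)` tuples of `tagged_sents`. Under the same assumed layout (blank-line paragraphs, one sentence per line, `word/TAG` tokens), the hypothetical function below builds that structure. The code then flattens it to show that the tagged views are regroupings of a single token sequence.

```python
import re

def tagged_paras(raw_text, sep="/"):
    """Blank-line paragraphs -> lines as sentences -> (word, tag) tuples."""
    def pair(tok):
        word, found, tag = tok.rpartition(sep)
        return (word, tag) if found else (tok, None)

    paragraphs = []
    for block in re.split(r"\n\s*\n", raw_text.strip()):
        sentences = [[pair(t) for t in line.split()]
                     for line in block.splitlines() if line.strip()]
        if sentences:
            paragraphs.append(sentences)
    return paragraphs

text = "The/AT jury/NN said/VBD ./.\nIt/PPS agreed/VBD ./.\n\nPrices/NNS rose/VBD ./.\n"
result = tagged_paras(text)
print(len(result), [len(p) for p in result])   # 2 [2, 1]

# Flattening paragraphs and sentences recovers the flat (word, tag) sequence.
flat = [wt for para in result for sent in para for wt in sent]
print(flat[:3])   # [('The', 'AT'), ('jury', 'NN'), ('said', 'VBD')]
```

In this sketch, the flattened list matches what a flat `word/TAG` split of the same text would give. Only the grouping differs between the word, sentence, and paragraph views.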

Inherited from api.CategorizedCorpusReader: categories, files
Inherited from api.CorpusReader: __repr__, abspath, abspaths, encoding, open
Inherited from object: __delattr__, __getattribute__, __hash__, __new__, __reduce__, __reduce_ex__, __setattr__, __str__

Inherited from TaggedCorpusReader: read, tagged, tokenized

Inherited from api.CorpusReader: filenames