A reader for plaintext corpora whose documents are divided into
categories based on their file identifiers.
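As a rough illustration of what "categories based on their file identifiers" can mean, the sketch below derives a category from each identifier with a regular expression. The pattern, the helper names (`category_of`, `files_by_category`) and the sample identifiers are all hypothetical and are not taken from this reader's implementation.

```python
import re

# Hypothetical convention: the category is the directory part of the
# file identifier, e.g. "news/a.txt" belongs to the category "news".
CAT_PATTERN = re.compile(r"(\w+)/.*")


def category_of(file_id):
    """Return the category encoded in a file identifier, or None."""
    match = CAT_PATTERN.match(file_id)
    return match.group(1) if match else None


def files_by_category(file_ids):
    """Group file identifiers under the category derived from each one."""
    groups = {}
    for file_id in file_ids:
        groups.setdefault(category_of(file_id), []).append(file_id)
    return groups


file_ids = ["news/a.txt", "news/b.txt", "fiction/c.txt"]
groups = files_by_category(file_ids)
```

Identifiers that do not match the pattern end up under `None` here; a real reader may treat them differently.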
raw(self, files=None, categories=None)
Return type: str
Returns: the given file or files as a single string.
words(self, files=None, categories=None)
Return type: list of str
Returns: the given file or files as a list of words and punctuation symbols.
sents(self, files=None, categories=None)
Return type: list of (list of str)
Returns: the given file or files as a list of sentences or utterances, each
encoded as a list of word strings.
paras(self, files=None, categories=None)
Return type: list of (list of (list of str))
Returns: the given file or files as a list of paragraphs, each encoded as a
list of sentences, which are in turn encoded as lists of word strings.
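The four accessors differ mainly in how deeply their results are nested. The following is a minimal, self-contained sketch of those shapes and of selecting documents by `files` or by `categories`. It is a toy stand-in, not this class: the in-memory `CORPUS`, the `_resolve` helper, the regex-based splitting, and the rule that the two arguments are mutually exclusive are all assumptions made for illustration.

```python
import re

# Hypothetical in-memory corpus: file identifier -> text.
CORPUS = {
    "news/a.txt": "Stocks fell. Bonds rose.\n\nRain is expected.",
    "fiction/c.txt": "She smiled.",
}


def _resolve(files=None, categories=None):
    """Turn the files/categories arguments into a list of file identifiers."""
    if files is not None and categories is not None:
        # Assumption: the two selectors cannot be combined.
        raise ValueError("Specify files or categories, not both")
    if categories is not None:
        wanted = [categories] if isinstance(categories, str) else list(categories)
        # Assumption: the category is the directory part of the identifier.
        return [f for f in CORPUS if f.split("/", 1)[0] in wanted]
    if files is None:
        return list(CORPUS)
    return [files] if isinstance(files, str) else list(files)


def paras(files=None, categories=None):
    """list of (list of (list of str)): paragraphs -> sentences -> words."""
    result = []
    for file_id in _resolve(files, categories):
        # Naive paragraph split on blank lines.
        for block in re.split(r"\n\s*\n", CORPUS[file_id]):
            if not block.strip():
                continue
            # Naive sentence split after ., ! or ? followed by whitespace.
            sentences = re.split(r"(?<=[.!?])\s+", block.strip())
            # Naive tokenization into word runs and single punctuation marks.
            result.append([re.findall(r"\w+|[^\w\s]", s) for s in sentences])
    return result


def sents(files=None, categories=None):
    """list of (list of str): paras() flattened by one level."""
    return [sent for para in paras(files, categories) for sent in para]


def words(files=None, categories=None):
    """list of str: sents() flattened by one more level."""
    return [word for sent in sents(files, categories) for word in sent]


def raw(files=None, categories=None):
    """str: the selected texts joined into one string."""
    return "\n\n".join(CORPUS[f] for f in _resolve(files, categories))
```

With this toy data, `words(categories="fiction")` yields a flat list of tokens, while `paras("news/a.txt")` yields two paragraphs, the first of which holds two sentences.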
Inherited from api.CategorizedCorpusReader: categories, files
Inherited from api.CorpusReader: __repr__, abspath, abspaths, encoding, open
Inherited from object: __delattr__, __getattribute__, __hash__, __new__, __reduce__, __reduce_ex__, __setattr__, __str__

Inherited from PlaintextCorpusReader: read, tokenized

Inherited from api.CorpusReader: filenames