CorpusReader (subclass of object)
A base class for corpus reader classes, each of which can be used to
read a specific corpus format. Each individual corpus reader instance is
used to read a specific corpus, consisting of one or more files under a
common root directory. Each file is identified by its file identifier,
which is the relative path to the file from the root directory.
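
As a rough illustration of file identifiers being paths relative to the corpus root, the sketch below walks a directory and derives an identifier for each file. The helper name `file_identifiers` and the choice to normalize separators to forward slashes are assumptions made for this example; they are not taken from the CorpusReader source.

```python
import os
import tempfile


def file_identifiers(root):
    """Return sorted file identifiers (paths relative to root) for all files under root."""
    identifiers = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full_path = os.path.join(dirpath, name)
            # A file identifier is the path of the file relative to the corpus root.
            rel = os.path.relpath(full_path, root)
            # Assumption of this sketch: use forward slashes on every platform.
            identifiers.append(rel.replace(os.sep, "/"))
    return sorted(identifiers)


# Build a throwaway corpus directory to demonstrate.
with tempfile.TemporaryDirectory() as demo_root:
    os.makedirs(os.path.join(demo_root, "news"))
    for rel_path in ("a.txt", os.path.join("news", "b.txt")):
        with open(os.path.join(demo_root, rel_path), "w", encoding="utf-8") as f:
            f.write("hello corpus\n")
    demo_ids = file_identifiers(demo_root)

print(demo_ids)  # ['a.txt', 'news/b.txt']
```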
A separate subclass is defined for each corpus format. These
subclasses define one or more methods that provide 'views' on the corpus
contents, such as words() (for a list of words) and parsed_sents() (for a
list of parsed sentences). Called with no arguments, these methods return
the contents of the entire corpus. For most corpora, these methods also
accept one or more selection arguments, such as files or categories, which
can be used to select which portion of the corpus should be returned.
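
The view-plus-selection pattern described above might look roughly like the following. This is a minimal sketch, not the library's implementation: `SketchCorpusReader`, `WhitespaceWordReader`, and the `_resolve` helper are hypothetical names, and the corpus format (whitespace-separated words in UTF-8 text files) is assumed purely for illustration.

```python
import os
import tempfile


class SketchCorpusReader:
    """Hypothetical stand-in for a corpus reader base class."""

    def __init__(self, root, files):
        self._root = root          # root directory of the corpus
        self._files = list(files)  # file identifiers, relative to root

    def files(self):
        return list(self._files)

    def _resolve(self, files):
        # No selection means the whole corpus; a single string selects one file.
        if files is None:
            return self._files
        if isinstance(files, str):
            return [files]
        return list(files)


class WhitespaceWordReader(SketchCorpusReader):
    """Hypothetical subclass for a format with whitespace-separated words."""

    def words(self, files=None):
        result = []
        for file_id in self._resolve(files):
            path = os.path.join(self._root, file_id)
            with open(path, encoding="utf-8") as stream:
                result.extend(stream.read().split())
        return result


with tempfile.TemporaryDirectory() as demo_root:
    for name, text in (("a.txt", "the cat sat"), ("b.txt", "on the mat")):
        with open(os.path.join(demo_root, name), "w", encoding="utf-8") as f:
            f.write(text)
    reader = WhitespaceWordReader(demo_root, ["a.txt", "b.txt"])
    all_words = reader.words()             # no arguments: entire corpus
    b_words = reader.words(files="b.txt")  # selection argument: one file
```

Putting the selection logic in one shared helper keeps each view method short, so a subclass only has to describe how its own format is parsed.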
_encoding: The default unicode encoding for the files that make up this corpus.
_files: A list of the relative paths for the files that make up this corpus.
_root: The root directory for this corpus.
root (PathPointer): The directory where this corpus is stored.
Deprecated since 0.9.1: items
x.__init__(...) initializes x; see x.__class__.__doc__ for signature
repr(x)
Deprecated: Use corpus.files() instead.
Return the absolute path for the given file, as a PathPointer.
Return a list of the absolute paths (a list of PathPointer) for all files in this corpus, or for the given list of files, if specified.
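
The two path lookups described above can be sketched with plain strings. The function names `sketch_abspath` and `sketch_abspaths` and the example root are hypothetical, and this sketch returns ordinary path strings rather than PathPointer objects.

```python
import os


def sketch_abspath(root, file_id):
    """Return the absolute path for one file identifier under root."""
    return os.path.abspath(os.path.join(root, file_id))


def sketch_abspaths(root, all_files, files=None):
    """Return absolute paths for every file in the corpus, or for the given files."""
    if files is None:
        files = all_files      # no selection: every file in the corpus
    elif isinstance(files, str):
        files = [files]        # a single identifier selects one file
    return [sketch_abspath(root, f) for f in files]


demo_root = os.path.join("corpora", "demo")  # hypothetical corpus root
demo_files = ["a.txt", "b.txt"]

one_path = sketch_abspath(demo_root, "a.txt")
all_paths = sketch_abspaths(demo_root, demo_files)
some_paths = sketch_abspaths(demo_root, demo_files, files=["b.txt"])
```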
Return the unicode encoding for the given corpus file, if known. If
the encoding is unknown, or if the given file should be processed using
byte strings (...
Deprecated: Use corpus.abspaths() instead.
Return an open stream that can be used to read the given file. If the
file's encoding is not ...
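
One way the encoding lookup and the stream opening could fit together is sketched below. It rests on two assumptions that the text above does not fully spell out: that a file with a known encoding is opened as a decoding text stream, and that a file with no known encoding is opened as a byte stream. The names `lookup_encoding` and `open_corpus_file`, and the dict form of the encoding specification, are hypothetical.

```python
import os
import tempfile


def lookup_encoding(encoding_spec, file_id):
    """Return the encoding for file_id, or None if it is unknown.

    Assumption of this sketch: encoding_spec is None, a single encoding
    name, or a dict mapping file identifiers to encoding names.
    """
    if isinstance(encoding_spec, dict):
        return encoding_spec.get(file_id)
    return encoding_spec


def open_corpus_file(root, file_id, encoding_spec=None):
    """Open a corpus file, decoding to unicode only when an encoding is known."""
    path = os.path.join(root, file_id)
    encoding = lookup_encoding(encoding_spec, file_id)
    if encoding is None:
        return open(path, "rb")                 # unknown encoding: byte strings
    return open(path, "r", encoding=encoding)   # known encoding: unicode text


with tempfile.TemporaryDirectory() as demo_root:
    with open(os.path.join(demo_root, "a.txt"), "wb") as f:
        f.write("caf\u00e9".encode("latin-1"))
    with open_corpus_file(demo_root, "a.txt", {"a.txt": "latin-1"}) as stream:
        decoded_text = stream.read()
    with open_corpus_file(demo_root, "a.txt") as stream:
        raw_bytes = stream.read()
```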
items: Deprecated. Use corpus.files() instead.
Generated by Epydoc 3.0beta1 on Wed Aug 27 15:08:52 2008 | http://epydoc.sourceforge.net |