Package nltk :: Package corpus :: Package reader :: Module ycoe :: Class YCOECorpusReader

Class YCOECorpusReader


      object --+    
               |    
api.CorpusReader --+
                   |
                  YCOECorpusReader

Corpus reader for the York-Toronto-Helsinki Parsed Corpus of Old English Prose (YCOE), a 1.5-million-word, syntactically annotated corpus of Old English prose texts.

Instance Methods
 
__init__(self, root, encoding=None)
Initialize a reader for the YCOE corpus located under root.
 
documents(self, files=None)
Return a list of document identifiers for all documents in this corpus, or for the documents with the given file(s) if specified.
 
files(self, documents=None)
Return a list of file identifiers for the files that make up this corpus, or that store the given document(s) if specified.
 
_getfiles(self, documents, subcorpus)
Helper that selects the appropriate files for a given set of documents from a given subcorpus (pos or psd).
 
words(self, documents=None)
 
sents(self, documents=None)
 
paras(self, documents=None)
 
tagged_words(self, documents=None)
 
tagged_sents(self, documents=None)
 
tagged_paras(self, documents=None)
 
parsed_sents(self, documents=None)

Inherited from api.CorpusReader: __repr__, abspath, abspaths, encoding, open

Inherited from api.CorpusReader (private): _get_root

Inherited from object: __delattr__, __getattribute__, __hash__, __new__, __reduce__, __reduce_ex__, __setattr__, __str__

    Deprecated since 0.8
 
read(*args, **kwargs)
 
parsed(*args, **kwargs)
 
tokenized(*args, **kwargs)
 
tagged(*args, **kwargs)
 
chunked(*args, **kwargs)
    Deprecated since 0.9.1

Inherited from api.CorpusReader: filenames

Inherited from api.CorpusReader (private): _get_items

Instance Variables

Inherited from api.CorpusReader (private): _encoding, _files, _root

Properties

Inherited from api.CorpusReader: root

Inherited from object: __class__

    Deprecated since 0.9.1

Inherited from api.CorpusReader: items

Method Details

__init__(self, root, encoding=None)
(Constructor)


Initialize a reader for the YCOE corpus located under root.

Parameters:
  • root - A path pointer identifying the root directory for this corpus. If a string is specified, then it will be converted to a PathPointer automatically.
  • files - A list of the files that make up this corpus. This list can either be specified explicitly, as a list of strings; or implicitly, as a regular expression over file paths. The absolute path for each file will be constructed by joining the reader's root to each file name.
  • encoding - The default unicode encoding for the files that make up the corpus. encoding's value can be any of the following:
    • A string: encoding is the encoding name for all files.
    • A dictionary: encoding[file_id] is the encoding name for the file whose identifier is file_id. If file_id is not in encoding, then the file contents will be processed using non-unicode byte strings.
    • A list: encoding should be a list of (regexp, encoding) tuples. The encoding for a file whose identifier is file_id will be the encoding value for the first tuple whose regexp matches the file_id. If no tuple's regexp matches the file_id, the file contents will be processed using non-unicode byte strings.
    • None: the file contents of all files will be processed using non-unicode byte strings.
  • tag_mapping_function - A function for normalizing or simplifying the POS tags returned by the tagged_words() or tagged_sents() methods.
Overrides: api.CorpusReader.__init__
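(inherited documentation; the files and tag_mapping_function entries above come from the base class and are not accepted by this constructor, whose signature is __init__(self, root, encoding=None))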
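The four forms of the encoding parameter described above can be illustrated with a small, self-contained sketch. resolve_encoding is a hypothetical helper, not part of NLTK. The sketch also assumes that a regexp "matches" a file identifier in the sense of Python's re.match (anchored at the start of the identifier), which the parameter description does not specify.

```python
import re
from typing import Dict, List, Optional, Tuple, Union

# The four documented forms of the encoding parameter.
EncodingSpec = Union[None, str, Dict[str, str], List[Tuple[str, str]]]


def resolve_encoding(encoding: EncodingSpec, file_id: str) -> Optional[str]:
    """Return the encoding name for file_id, or None for raw byte strings.

    Hypothetical helper illustrating the documented rules; not NLTK's code.
    """
    if encoding is None:
        # None: every file is processed as non-unicode byte strings.
        return None
    if isinstance(encoding, str):
        # A string: one encoding name applies to all files.
        return encoding
    if isinstance(encoding, dict):
        # A dictionary: per-file lookup; unknown ids fall back to byte strings.
        return encoding.get(file_id)
    # A list of (regexp, encoding) tuples: the first matching regexp wins
    # (assumes re.match semantics, i.e. anchored at the start of file_id).
    for pattern, name in encoding:
        if re.match(pattern, file_id):
            return name
    # No tuple matched: fall back to byte strings.
    return None
```

For example, with the hypothetical file identifier "doc1.psd", resolve_encoding([(r".*\.psd$", "utf-8")], "doc1.psd") returns "utf-8", while a file identifier matched by no tuple yields None, meaning its contents are left as byte strings.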

files(self, documents=None)


Return a list of file identifiers for the files that make up this corpus, or that store the given document(s) if specified.

Overrides: api.CorpusReader.files
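The relationship between documents and files that documents(), files() and _getfiles() describe can be sketched as follows. The sketch assumes, without confirmation from this page, that each document is stored as exactly two files named <document>.pos and <document>.psd, one per subcorpus. DocumentFileIndex and its methods are hypothetical names and do not reproduce NLTK's implementation.

```python
from typing import Iterable, List, Optional, Union

# Assumed file extensions, one per subcorpus.
SUBCORPORA = ("pos", "psd")

Ids = Optional[Union[str, Iterable[str]]]


def _as_list(items: Union[str, Iterable[str]]) -> List[str]:
    # Accept either a single identifier or a collection of identifiers.
    return [items] if isinstance(items, str) else list(items)


class DocumentFileIndex:
    """Hypothetical sketch of the document <-> file mapping; not NLTK's code."""

    def __init__(self, documents: Iterable[str]):
        self._documents = sorted(set(documents))

    def files(self, documents: Ids = None) -> List[str]:
        # Each document contributes one file per subcorpus.
        docs = self._documents if documents is None else _as_list(documents)
        return sorted(f"{d}.{ext}" for d in docs for ext in SUBCORPORA)

    def documents(self, files: Ids = None) -> List[str]:
        if files is None:
            return list(self._documents)
        # Strip the subcorpus extension (after the last dot) and de-duplicate.
        return sorted({f.rsplit(".", 1)[0] for f in _as_list(files)})

    def getfiles(self, documents: Ids, subcorpus: str) -> List[str]:
        # Plays the role described for _getfiles: select the files of a
        # single subcorpus ("pos" or "psd") for the requested documents.
        if subcorpus not in SUBCORPORA:
            raise ValueError(f"unknown subcorpus: {subcorpus!r}")
        docs = self._documents if documents is None else _as_list(documents)
        for d in docs:
            if d not in self._documents:
                raise ValueError(f"document {d!r} not found")
        return [f"{d}.{subcorpus}" for d in docs]
```

Under the same assumption, words() and the tagged_* methods would plausibly read from getfiles(documents, "pos") and parsed_sents() from getfiles(documents, "psd"). This page itself only states that _getfiles chooses between the pos and psd subcorpora.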

read(*args, **kwargs)

Decorators:
  • @deprecated("Use .raw() or .words() or .tagged_words() or " ".parsed_sents() instead.")

Deprecated: Use .raw() or .words() or .tagged_words() or .parsed_sents() instead.

parsed(*args, **kwargs)

Decorators:
  • @deprecated("Use .parsed_sents() instead.")

Deprecated: Use .parsed_sents() instead.

tokenized(*args, **kwargs)

Decorators:
  • @deprecated("Use .words() instead.")

Deprecated: Use .words() instead.

tagged(*args, **kwargs)

Decorators:
  • @deprecated("Use .tagged_words() instead.")

Deprecated: Use .tagged_words() instead.

chunked(*args, **kwargs)

Decorators:
  • @deprecated("Operation no longer supported.")

Deprecated: Operation no longer supported.
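The @deprecated decorators on the methods above keep the old names callable while pointing users to their replacements, and each one prefixes the method's documentation with a "Deprecated:" notice. A minimal sketch of such a decorator, built on the standard warnings and functools modules, follows. It is a hypothetical stand-in rather than NLTK's own deprecated. The Reader class and its placeholder return value exist only for illustration.

```python
import functools
import warnings


def deprecated(message: str):
    """Hypothetical stand-in for the @deprecated decorator; not NLTK's code."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # Warn at the caller's line, then run the old behaviour unchanged.
            warnings.warn(
                f"Function {func.__name__}() has been deprecated. {message}",
                DeprecationWarning,
                stacklevel=2,
            )
            return func(*args, **kwargs)
        # Prefix the docstring so generated API docs show the notice.
        wrapper.__doc__ = f"Deprecated: {message}\n\n{func.__doc__ or ''}".rstrip()
        return wrapper
    return decorator


class Reader:
    """Hypothetical reader used only to demonstrate the decorator."""

    def tagged_words(self):
        # Placeholder data, not real corpus output.
        return [("word", "TAG")]

    @deprecated("Use .tagged_words() instead.")
    def tagged(self, *args, **kwargs):
        # The old name simply delegates to its replacement.
        return self.tagged_words()
```

Calling Reader().tagged() returns the same result as tagged_words() but also emits a DeprecationWarning. Outside __main__ and test runners, Python hides DeprecationWarning by default. Run with python -W default, or call warnings.simplefilter, to see the warning.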