Package nltk :: Package corpus :: Package reader :: Module propbank :: Class PropbankCorpusReader

Class PropbankCorpusReader

      object --+    
               |    
api.CorpusReader --+
                   |
                  PropbankCorpusReader

Corpus reader for the PropBank corpus, which augments the Penn Treebank with information about the predicate-argument structure of every verb instance. The corpus consists of two parts: the predicate-argument annotations themselves, and a set of frameset files that define, on a per-verb basis, the argument labels used by the annotations. Each frameset file contains one or more predicates, such as 'turn' or 'turn_on', each of which is divided into coarse-grained word senses called rolesets. For each roleset, the frameset file provides descriptions of the argument roles, along with examples.
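The frameset layout described above (predicates divided into rolesets, each carrying role descriptions) can be sketched with the standard library alone. The XML fragment below is a hand-written, hypothetical miniature rather than a real frameset file, and the element and attribute names (frameset, predicate, roleset, role, lemma, id, n, descr) are assumptions about the file schema. The roleset ids and role descriptions are invented, and find_roleset and describe_roles are illustrative helpers, not methods of this class.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature frameset, written by hand for illustration only.
FRAMESET_XML = """
<frameset>
  <predicate lemma="turn">
    <roleset id="turn.01" name="rotate">
      <roles>
        <role n="0" descr="turner"/>
        <role n="1" descr="thing turning"/>
      </roles>
    </roleset>
  </predicate>
  <predicate lemma="turn_on">
    <roleset id="turn.02" name="start a device">
      <roles>
        <role n="0" descr="operator"/>
        <role n="1" descr="device"/>
      </roles>
    </roleset>
  </predicate>
</frameset>
"""


def find_roleset(frameset_root, roleset_id):
    """Return the <roleset> element whose id matches, or None if absent."""
    for roleset in frameset_root.findall("predicate/roleset"):
        if roleset.get("id") == roleset_id:
            return roleset
    return None


def describe_roles(roleset):
    """Map each numbered argument (rendered here as 'ARGn') to its description."""
    return {
        "ARG" + role.get("n"): role.get("descr")
        for role in roleset.findall("roles/role")
    }


root = ET.fromstring(FRAMESET_XML)
roles = describe_roles(find_roleset(root, "turn.01"))
print(roles)
```

With a real corpus installed, roleset(roleset_id) plays the part of find_roleset here, returning the XML description for the requested roleset.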

Instance Methods

__init__(self, root, propfile, framefiles='', verbsfile=None, parse_filename_xform=None, parse_corpus=None, encoding=None)
Creates a reader for the PropBank corpus stored under root.

raw(self, files=None)
Returns: the text contents of the given files, as a single string.

instances(self)
Returns: a corpus view that acts as a list of PropbankInstance objects, one for each verb in the corpus.

lines(self)
Returns: a corpus view that acts as a list of strings, one for each line in the predicate-argument annotation file.

roleset(self, roleset_id)
Returns: the XML description for the given roleset.

verbs(self)
Returns: a corpus view that acts as a list of all verb lemmas in this corpus (from the verbs.txt file).

_read_instance_block(self, stream)

Inherited from api.CorpusReader: __repr__, abspath, abspaths, encoding, files, open

Inherited from api.CorpusReader (private): _get_root

Inherited from object: __delattr__, __getattribute__, __hash__, __new__, __reduce__, __reduce_ex__, __setattr__, __str__

    Deprecated since 0.9.1

Inherited from api.CorpusReader: filenames

Inherited from api.CorpusReader (private): _get_items

Instance Variables

Inherited from api.CorpusReader (private): _encoding, _files, _root

Properties

Inherited from api.CorpusReader: root

Inherited from object: __class__

    Deprecated since 0.9.1

Inherited from api.CorpusReader: items

Method Details

__init__(self, root, propfile, framefiles='', verbsfile=None, parse_filename_xform=None, parse_corpus=None, encoding=None)
(Constructor)

Creates a reader for the PropBank corpus stored under root.

Parameters:
  • root - The root directory for this corpus.
  • propfile - The name of the file containing the predicate-argument annotations (relative to root).
  • framefiles - A list or regexp specifying the frameset files for this corpus.
  • parse_filename_xform - A transform that should be applied to the filenames in this corpus. This should be a function of one argument (a filename) that returns a string (the new filename).
  • parse_corpus - The corpus containing the parse trees corresponding to this corpus. These parse trees are necessary to resolve the tree pointers used by propbank.
Overrides: api.CorpusReader.__init__
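
The parse_filename_xform parameter is an ordinary one-argument function from a filename string to a filename string. A minimal sketch, assuming the annotation file names parses with paths such as wsj/00/wsj_0001.mrg while the parse corpus uses bare names such as wsj_0001.mrg; both the path layout and the helper name strip_section_dir are assumptions made for illustration.

```python
import re


def strip_section_dir(filename):
    """Drop a leading 'wsj/NN/' section directory, if one is present."""
    return re.sub(r"^wsj/\d\d/", "", filename)


print(strip_section_dir("wsj/00/wsj_0001.mrg"))
```

A function like this would be supplied as parse_filename_xform=strip_section_dir when constructing the reader; filenames that do not match the pattern pass through unchanged.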

raw(self, files=None)

Returns:
the text contents of the given files, as a single string.

instances(self)

Returns:
a corpus view that acts as a list of PropbankInstance objects, one for each verb in the corpus.

lines(self)

Returns:
a corpus view that acts as a list of strings, one for each line in the predicate-argument annotation file.
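
Each string returned by lines() is one raw annotation record. The sketch below splits such a record into named fields using only the standard library. The field order (file, sentence number, word number, tagger, roleset, inflection, then pointer-label pairs) and the sample line are assumptions about the annotation file format, which this page does not specify. parse_line and its dictionary keys are hypothetical names; instances() is the documented way to obtain parsed PropbankInstance objects.

```python
def parse_line(line):
    """Split one whitespace-delimited annotation line into named fields."""
    pieces = line.split()
    if len(pieces) < 7:
        raise ValueError("expected at least 7 fields, got %d" % len(pieces))
    fileid, sentnum, wordnum, tagger, roleset, inflection = pieces[:6]
    # Remaining fields are assumed to look like '<tree pointer>-<label>';
    # split at the first '-' so labels such as 'ARGM-MOD' stay intact.
    pairs = [tuple(p.split("-", 1)) for p in pieces[6:]]
    return {
        "fileid": fileid,
        "sentnum": int(sentnum),
        "wordnum": int(wordnum),
        "tagger": tagger,
        "roleset": roleset,
        "inflection": inflection,
        "predicate": [ptr for ptr, label in pairs if label == "rel"],
        "arguments": [(ptr, label) for ptr, label in pairs if label != "rel"],
    }


# Sample record in the assumed format described above.
sample = "wsj/00/wsj_0001.mrg 0 8 gold join.01 vf--a 0:2-ARG0 7:0-ARGM-MOD 8:0-rel 9:1-ARG1"
record = parse_line(sample)
print(record["roleset"], record["arguments"])
```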

roleset(self, roleset_id)

Returns:
the XML description for the given roleset.

verbs(self)

Returns:
a corpus view that acts as a list of all verb lemmas in this corpus (from the verbs.txt file).