Package nltk :: Package corpus :: Package reader :: Module verbnet :: Class VerbnetCorpusReader

Class VerbnetCorpusReader

         object --+        
                  |        
   api.CorpusReader --+    
                      |    
xmldocs.XMLCorpusReader --+
                          |
                         VerbnetCorpusReader

Instance Methods
 
__init__(self, root, files, wrap_etree=False)
 
lemmas(self, classid=None)
Return a list of all verb lemmas that appear in any class, or in the classid if specified.
 
wordnetids(self, classid=None)
Return a list of all wordnet identifiers that appear in any class, or in classid if specified.
 
classids(self, lemma=None, wordnetid=None, fileid=None, classid=None)
Return a list of the verbnet class identifiers.
 
vnclass(self, fileid_or_classid)
Return an ElementTree containing the xml for the specified verbnet class.
 
files(self, vnclass_ids=None)
Return a list of files that make up this corpus.

Inherited from xmldocs.XMLCorpusReader: raw, xml

Inherited from api.CorpusReader: __repr__, abspath, abspaths, encoding, open

Inherited from api.CorpusReader (private): _get_root

Inherited from object: __delattr__, __getattribute__, __hash__, __new__, __reduce__, __reduce_ex__, __setattr__, __str__

    Index Initialization
 
_index(self)
Initialize the indexes _lemma_to_class, _wordnet_to_class, and _class_to_fileid by scanning through the corpus files.
 
_index_helper(self, xmltree, fileid)
Helper for _index()
 
_quick_index(self)
Initialize the indexes _lemma_to_class, _wordnet_to_class, and _class_to_fileid by scanning through the corpus files.
    Identifier conversion
 
longid(self, shortid)
Given a short verbnet class identifier (e.g. '37.10'), map it to a long id (e.g. 'confess-37.10').
 
shortid(self, longid)
Given a long verbnet class identifier (e.g. 'confess-37.10'), map it to a short id (e.g. '37.10').
    Pretty Printing
 
pprint(self, vnclass)
Return a string containing a pretty-printed representation of the given verbnet class.
 
pprint_subclasses(self, vnclass, indent='')
Return a string containing a pretty-printed representation of the given verbnet class's subclasses.
 
pprint_members(self, vnclass, indent='')
Return a string containing a pretty-printed representation of the given verbnet class's member verbs.
 
pprint_themroles(self, vnclass, indent='')
Return a string containing a pretty-printed representation of the given verbnet class's thematic roles.
 
pprint_frame(self, vnframe, indent='')
Return a string containing a pretty-printed representation of the given verbnet frame.
 
pprint_description(self, vnframe, indent='')
Return a string containing a pretty-printed representation of the given verbnet frame description.
 
pprint_syntax(self, vnframe, indent='')
Return a string containing a pretty-printed representation of the given verbnet frame syntax.
 
pprint_semantics(self, vnframe, indent='')
Return a string containing a pretty-printed representation of the given verbnet frame semantics.
    Deprecated since 0.8

Inherited from xmldocs.XMLCorpusReader: read

    Deprecated since 0.9.1

Inherited from api.CorpusReader: filenames

Inherited from api.CorpusReader (private): _get_items

Class Variables
  _LONGID_RE = re.compile(r'([^-\.]*)-([\d\+\.-]+)$')
Regular expression that matches (and decomposes) longids
  _SHORTID_RE = re.compile(r'[\d\+\.-]+$')
Regular expression that matches shortids
  _INDEX_RE = re.compile(r'<MEMBER name="\??([^"]+)" wn="([^"]*)...
Regular expression used by _index() to quickly scan the corpus for basic information.
Instance Variables
  _lemma_to_class
A dictionary mapping from verb lemma strings to lists of verbnet class identifiers.
  _wordnet_to_class
A dictionary mapping from wordnet identifier strings to lists of verbnet class identifiers.
  _class_to_fileid
A dictionary mapping from class identifiers to corresponding file identifiers.

Inherited from api.CorpusReader (private): _encoding, _files, _root

Properties

Inherited from api.CorpusReader: root

Inherited from object: __class__

    Deprecated since 0.9.1

Inherited from api.CorpusReader: items

Method Details

__init__(self, root, files, wrap_etree=False)
(Constructor)

Overrides: xmldocs.XMLCorpusReader.__init__

classids(self, lemma=None, wordnetid=None, fileid=None, classid=None)

Return a list of the verbnet class identifiers. If a file identifier is specified, then return only the verbnet class identifiers for classes (and subclasses) defined by that file. If a lemma is specified, then return only verbnet class identifiers for classes that contain that lemma as a member. If a wordnetid is specified, then return only identifiers for classes that contain that wordnetid as a member. If a classid is specified, then return only identifiers for subclasses of the specified verbnet class.
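The filtering rules above can be illustrated with a small stand-alone sketch. It is not NLTK's implementation: the three dictionaries are hand-written stand-ins for the reader's private indexes, the function name filter_classids is hypothetical, the classid (subclass) filter is left out because it needs the parsed xml, and rejecting more than one filter at a time is an assumption.

```python
# Hand-written stand-ins for the reader's indexes (illustrative values only).
lemma_to_class = {"put": ["put-9.1"], "place": ["put-9.1-1"]}
wordnet_to_class = {"wn-id-1": ["put-9.1"]}
class_to_fileid = {
    "put-9.1": "put-9.1.xml",
    "put-9.1-1": "put-9.1.xml",
    "confess-37.10": "confess-37.10.xml",
}


def filter_classids(lemma=None, wordnetid=None, fileid=None):
    """Return class identifiers, optionally restricted by a single filter."""
    given = [arg for arg in (lemma, wordnetid, fileid) if arg is not None]
    if len(given) > 1:
        # Assumption: the filters are treated as mutually exclusive.
        raise ValueError("specify at most one of lemma, wordnetid, fileid")
    if fileid is not None:
        # Classes and subclasses defined by that file.
        return sorted(c for c, f in class_to_fileid.items() if f == fileid)
    if lemma is not None:
        return list(lemma_to_class.get(lemma, []))
    if wordnetid is not None:
        return list(wordnet_to_class.get(wordnetid, []))
    # No filter: every known class and subclass.
    return sorted(class_to_fileid)
```

With no arguments the sketch returns every key of the class-to-file index, which matches the note further down that those keys list all classes and subclasses.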

vnclass(self, fileid_or_classid)

Return an ElementTree containing the xml for the specified verbnet class.

Parameters:
  • fileid_or_classid - An identifier specifying which class should be returned. Can be a file identifier (such as 'put-9.1.xml'), a verbnet class identifier (such as 'put-9.1'), or a short verbnet class identifier (such as '9.1').
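As a rough illustration of what looking up a class inside a parsed file can involve, the sketch below searches an ElementTree for the element with a given identifier. The helper name find_class and the miniature xml are made up for this example, and the layout (a VNCLASS root with nested VNSUBCLASS elements, each carrying an ID attribute) is an assumption about the corpus files, not something this page specifies.

```python
import xml.etree.ElementTree as ET

# Made-up miniature file; not real verbnet content.
SAMPLE_XML = """
<VNCLASS ID="put-9.1">
  <MEMBERS><MEMBER name="put" wn=""/></MEMBERS>
  <SUBCLASSES>
    <VNSUBCLASS ID="put-9.1-1">
      <MEMBERS><MEMBER name="place" wn=""/></MEMBERS>
    </VNSUBCLASS>
  </SUBCLASSES>
</VNCLASS>
""".strip()


def find_class(root, classid):
    """Return the class or subclass element whose ID equals classid, or None."""
    if root.get("ID") == classid:
        return root
    # iter() walks all descendants, so nested subclasses are found too.
    for subclass in root.iter("VNSUBCLASS"):
        if subclass.get("ID") == classid:
            return subclass
    return None


root = ET.fromstring(SAMPLE_XML)
subclass = find_class(root, "put-9.1-1")
```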

files(self, vnclass_ids=None)

Return a list of files that make up this corpus. If vnclass_ids is specified, then return the files that make up the specified verbnet class(es).

Overrides: api.CorpusReader.files
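A minimal sketch of the class-to-file lookup described for files(), assuming a class-to-file dictionary like _class_to_fileid is already populated. The name files_for, the sample data, and the choice to accept either a single identifier or a list are assumptions made for illustration.

```python
# Hand-written stand-ins (illustrative values only).
all_files = ["confess-37.10.xml", "put-9.1.xml"]
class_to_fileid = {
    "confess-37.10": "confess-37.10.xml",
    "put-9.1": "put-9.1.xml",
    "put-9.1-1": "put-9.1.xml",
}


def files_for(vnclass_ids=None):
    """Return all files, or only the files that define the given class ids."""
    if vnclass_ids is None:
        return list(all_files)
    if isinstance(vnclass_ids, str):
        # Assumption: a single identifier is treated as a one-element list.
        vnclass_ids = [vnclass_ids]
    return [class_to_fileid[classid] for classid in vnclass_ids]
```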

_index(self)

Initialize the indexes _lemma_to_class, _wordnet_to_class, and _class_to_fileid by scanning through the corpus files. This is fast with cElementTree (under 0.1 seconds), but quite slow (over 10 seconds) with the Python implementation of ElementTree.
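The following sketch shows one way such an index could be built by parsing each file and recursing into subclasses, in the spirit of _index() and _index_helper(). It is an illustration under assumptions, not the NLTK source: the element layout (MEMBERS/MEMBER and SUBCLASSES/VNSUBCLASS under each class), the function names, and the sample file are all made up for the example.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Made-up miniature corpus; wn values are placeholders, not real identifiers.
SAMPLE_FILES = {
    "put-9.1.xml": """
<VNCLASS ID="put-9.1">
  <MEMBERS><MEMBER name="put" wn="wn-a wn-b"/></MEMBERS>
  <SUBCLASSES>
    <VNSUBCLASS ID="put-9.1-1">
      <MEMBERS><MEMBER name="place" wn=""/></MEMBERS>
    </VNSUBCLASS>
  </SUBCLASSES>
</VNCLASS>
""".strip(),
}

lemma_to_class = defaultdict(list)
wordnet_to_class = defaultdict(list)
class_to_fileid = {}


def index_helper(xmltree, fileid):
    """Record one class element, then recurse into its subclasses."""
    vnclass = xmltree.get("ID")
    class_to_fileid[vnclass] = fileid
    for member in xmltree.findall("MEMBERS/MEMBER"):
        lemma_to_class[member.get("name")].append(vnclass)
        # The wn attribute holds zero or more space-separated identifiers.
        for wn in member.get("wn", "").split():
            wordnet_to_class[wn].append(vnclass)
    for subclass in xmltree.findall("SUBCLASSES/VNSUBCLASS"):
        index_helper(subclass, fileid)


def build_index(files):
    """Parse every file and index the class hierarchy it defines."""
    for fileid, text in files.items():
        index_helper(ET.fromstring(text), fileid)


build_index(SAMPLE_FILES)
```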

_quick_index(self)

Initialize the indexes _lemma_to_class, _wordnet_to_class, and _class_to_fileid by scanning through the corpus files. This doesn't do proper xml parsing, but it is good enough to find everything in the standard verbnet corpus, and it runs about 30 times faster than xml parsing with the Python ElementTree (only 2-3 times faster with cElementTree).
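A regex-based scan of this kind might look like the sketch below. The pattern is the _INDEX_RE value listed under Class Variable Details, with its wrapped line rejoined; everything else (the function name quick_index, the sample file, and deriving the top-level class id by stripping '.xml' from the file name) is an assumption made for illustration.

```python
import re
from collections import defaultdict

# Pattern copied from the _INDEX_RE value documented on this page.
INDEX_RE = re.compile(r'<MEMBER name="\??([^"]+)" wn="([^"]*)"/?>|'
                      r'VNSUBCLASS ID="([^"]+)"/?>')

# Made-up miniature corpus; wn values are placeholders, not real identifiers.
SAMPLE_FILES = {
    "put-9.1.xml": """
<VNCLASS ID="put-9.1">
  <MEMBERS>
    <MEMBER name="put" wn="wn-a wn-b"/>
  </MEMBERS>
  <SUBCLASSES>
    <VNSUBCLASS ID="put-9.1-1">
      <MEMBERS>
        <MEMBER name="?place" wn=""/>
      </MEMBERS>
    </VNSUBCLASS>
  </SUBCLASSES>
</VNCLASS>
""",
}


def quick_index(files):
    """Build the three indexes with a flat regex scan instead of xml parsing."""
    lemma_to_class = defaultdict(list)
    wordnet_to_class = defaultdict(list)
    class_to_fileid = {}
    for fileid, text in files.items():
        # Assumption: the file name is the top-level class id plus '.xml'.
        vnclass = fileid[:-len(".xml")]
        class_to_fileid[vnclass] = fileid
        for match in INDEX_RE.finditer(text):
            name, wn, subclass_id = match.groups()
            if name is not None:
                # A MEMBER element: credit it to the most recently seen class.
                lemma_to_class[name].append(vnclass)
                for wnid in wn.split():
                    wordnet_to_class[wnid].append(vnclass)
            else:
                # A VNSUBCLASS opening tag: later members belong to it.
                class_to_fileid[subclass_id] = fileid
                vnclass = subclass_id
    return lemma_to_class, wordnet_to_class, class_to_fileid


lemmas, wordnet_ids, class_files = quick_index(SAMPLE_FILES)
```

Because the scan is flat, the sketch relies on document order: any member that appears after a subclass tag is credited to that subclass.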

longid(self, shortid)

Given a short verbnet class identifier (e.g. '37.10'), map it to a long id (e.g. 'confess-37.10'). If shortid is already a long id, then return it as-is.

shortid(self, longid)

Given a long verbnet class identifier (e.g. 'confess-37.10'), map it to a short id (e.g. '37.10'). If longid is already a short id, then return it as-is.
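The two conversions can be sketched with the _LONGID_RE and _SHORTID_RE patterns listed under Class Variables. This is an illustration only: the list of known long ids and the lookup table built from it are hypothetical stand-ins for whatever the reader uses to map short ids back to long ones.

```python
import re

# Patterns copied from the class variables documented on this page.
LONGID_RE = re.compile(r'([^-\.]*)-([\d\+\.-]+)$')
SHORTID_RE = re.compile(r'[\d\+\.-]+$')

# Hypothetical set of known long ids, used to resolve short ids.
KNOWN_LONGIDS = ["confess-37.10", "put-9.1", "put-9.1-1"]


def shortid(longid):
    """Map 'confess-37.10' to '37.10'; return short ids unchanged."""
    if SHORTID_RE.match(longid):
        return longid
    match = LONGID_RE.match(longid)
    if match is None:
        raise ValueError("not a verbnet class identifier: %r" % longid)
    return match.group(2)


# Lookup table from short id to long id (an assumption of this sketch).
SHORT_TO_LONG = {shortid(known): known for known in KNOWN_LONGIDS}


def longid(short):
    """Map '37.10' to 'confess-37.10'; return long ids unchanged."""
    if LONGID_RE.match(short):
        return short
    if not SHORTID_RE.match(short):
        raise ValueError("not a verbnet class identifier: %r" % short)
    return SHORT_TO_LONG[short]
```

The long-to-short direction needs only the regular expression, while the short-to-long direction needs a table, because the class name cannot be derived from the number alone.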

pprint(self, vnclass)

Return a string containing a pretty-printed representation of the given verbnet class.

Parameters:
  • vnclass - A verbnet class identifier; or an ElementTree containing the xml contents of a verbnet class.

pprint_subclasses(self, vnclass, indent='')

Return a string containing a pretty-printed representation of the given verbnet class's subclasses.

Parameters:
  • vnclass - A verbnet class identifier; or an ElementTree containing the xml contents of a verbnet class.

pprint_members(self, vnclass, indent='')

Return a string containing a pretty-printed representation of the given verbnet class's member verbs.

Parameters:
  • vnclass - A verbnet class identifier; or an ElementTree containing the xml contents of a verbnet class.
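To give a feel for this kind of helper, here is a minimal sketch that formats the member verbs of a parsed class as an indented string. The output layout, the function name format_members, and the MEMBERS/MEMBER element path are assumptions for illustration; the actual method may format its output differently.

```python
import textwrap
import xml.etree.ElementTree as ET

# Made-up miniature class; not real verbnet content.
SAMPLE_XML = """
<VNCLASS ID="put-9.1">
  <MEMBERS>
    <MEMBER name="put" wn=""/>
    <MEMBER name="place" wn=""/>
  </MEMBERS>
</VNCLASS>
""".strip()


def format_members(vnclass, indent=""):
    """Return an indented, wrapped line listing the member verbs of vnclass."""
    names = [member.get("name") for member in vnclass.findall("MEMBERS/MEMBER")]
    if not names:
        return indent + "Members: (none)"
    return textwrap.fill(
        "Members: " + " ".join(names),
        width=70,
        initial_indent=indent,
        subsequent_indent=indent + "  ",
    )


members_text = format_members(ET.fromstring(SAMPLE_XML), indent="  ")
```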

pprint_themroles(self, vnclass, indent='')

Return a string containing a pretty-printed representation of the given verbnet class's thematic roles.

Parameters:
  • vnclass - A verbnet class identifier; or an ElementTree containing the xml contents of a verbnet class.

pprint_frame(self, vnframe, indent='')

Return a string containing a pretty-printed representation of the given verbnet frame.

Parameters:
  • vnframe - An ElementTree containing the xml contents of a verbnet frame.

pprint_description(self, vnframe, indent='')

Return a string containing a pretty-printed representation of the given verbnet frame description.

Parameters:
  • vnframe - An ElementTree containing the xml contents of a verbnet frame.

pprint_syntax(self, vnframe, indent='')

Return a string containing a pretty-printed representation of the given verbnet frame syntax.

Parameters:
  • vnframe - An ElementTree containing the xml contents of a verbnet frame.

pprint_semantics(self, vnframe, indent='')

Return a string containing a pretty-printed representation of the given verbnet frame semantics.

Parameters:
  • vnframe - An ElementTree containing the xml contents of a verbnet frame.

Class Variable Details

_INDEX_RE

Regular expression used by _index() to quickly scan the corpus for basic information.

Value:
re.compile(r'<MEMBER name="\??([^"]+)" wn="([^"]*)"/?>|VNSUBCLASS ID="([^"]+)"/?>')

Instance Variable Details

_class_to_fileid

A dictionary mapping from class identifiers to corresponding file identifiers. The keys of this dictionary provide a complete list of all classes and subclasses.