Class BNCWordView
Module: nltk.corpus.reader.bnc

object
  util.AbstractLazySequence
    util.StreamBackedCorpusView
      xmldocs.XMLCorpusView
        BNCWordView

A stream-backed corpus view specialized for use with the British National Corpus (BNC).

Instance Methods
__init__(self, filename, sent, tag, strip_space, stem)
Create a new corpus view based on a specified XML file.
handle_header(self, elt, context)
handle_elt(self, elt, context)
Convert an element into an appropriate value for inclusion in the view.
handle_word(self, elt)
handle_sent(self, elt)

Class Variables
  title = None
Title of the document.
  author = None
Author of the document.
  editor = None
  resps = None
Statement of responsibility.

Method Details

__init__(self, filename, sent, tag, strip_space, stem)

Create a new corpus view based on a specified XML file.

Note that the XMLCorpusView constructor does not take an encoding argument, because the character encoding is declared by the XML file itself.

  • filename - The name of the underlying file.
  • sent - If true, include sentence bracketing.
  • tag - The name of the tagset to use, or None for no tags.
  • strip_space - If true, strip spaces from word tokens.
  • stem - If true, then substitute stems for words.
Overrides: xmldocs.XMLCorpusView.__init__
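
As a rough illustration of how the tag, strip_space, and stem flags could shape the values in the view, the sketch below applies them to a tiny hand-written fragment in the style of BNC XML, using only the standard library. It is not the NLTK implementation: the helper names word_token and sentence_tokens are hypothetical, and the use of the hw attribute for stems and of the c5 attribute for tags is an assumption about the BNC markup. The list returned by sentence_tokens stands in for what one item of the view might look like when sent is true.

```python
import xml.etree.ElementTree as ET

# Hand-written fragment imitating BNC XML (an assumption, not real corpus data).
SAMPLE = (
    '<s n="1">'
    '<w c5="AT0" hw="the" pos="ART">The </w>'
    '<w c5="NN2" hw="cat" pos="SUBST">cats </w>'
    '<w c5="VVD" hw="sleep" pos="VERB">slept</w>'
    '<c c5="PUN">.</c>'
    '</s>'
)


def word_token(elt, tag=None, strip_space=True, stem=False):
    """Hypothetical helper: turn one <w> or <c> element into a token."""
    word = elt.text or ""
    if strip_space:
        word = word.strip()
    if stem:
        # Fall back to the surface form when no headword is recorded.
        word = elt.get("hw", word)
    if tag is not None:
        # Pair the word with the attribute named by the tagset.
        return (word, elt.get(tag))
    return word


def sentence_tokens(sent_elt, **flags):
    """Hypothetical helper: tokens for every <w> or <c> child of an <s>."""
    return [word_token(e, **flags) for e in sent_elt if e.tag in ("w", "c")]


sent = ET.fromstring(SAMPLE)
plain = sentence_tokens(sent)                      # words only
tagged = sentence_tokens(sent, tag="c5")           # (word, tag) pairs
stems = sentence_tokens(sent, stem=True)           # headwords instead of words
raw = sentence_tokens(sent, strip_space=False)     # trailing spaces kept
```

With sent false, the same tokens would presumably appear as a flat sequence of words rather than grouped by sentence.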

handle_elt(self, elt, context)

Convert an element into an appropriate value for inclusion in the view. Unless overridden by a subclass or by the elt_handler constructor argument, this method simply returns elt.

  • elt - The element that should be converted.
  • context - A string composed of element tags separated by forward slashes, indicating the XML context of the given element. For example, the string 'foo/bar/baz' indicates that the element is a baz element whose parent is a bar element and whose grandparent is a top-level foo element.
Returns: The view value corresponding to elt.
Overrides: xmldocs.XMLCorpusView.handle_elt
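
To make the context argument concrete, the following sketch derives the slash-separated context string for each element of a small document with the standard library's xml.etree.ElementTree. It only illustrates the string format described above; the function name iter_contexts is hypothetical, and this is not how XMLCorpusView itself reads the file.

```python
import io
import xml.etree.ElementTree as ET


def iter_contexts(xml_text):
    """Yield (context, element) pairs, e.g. ('foo/bar/baz', <baz element>)."""
    stack = []  # tags of the currently open elements, outermost first
    source = io.BytesIO(xml_text.encode("utf-8"))
    for event, elt in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            stack.append(elt.tag)
        else:
            # At the end event the element is complete; report its context.
            yield "/".join(stack), elt
            stack.pop()


contexts = [ctx for ctx, _ in iter_contexts("<foo><bar><baz/></bar><qux/></foo>")]
```

Elements are reported as they close, so the innermost baz element comes first with the context 'foo/bar/baz', and the top-level foo element comes last with the context 'foo'.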
