Module nltk.corpus.reader.bnc

Class BNCWordView

               object --+            
                        |            
util.AbstractLazySequence --+        
                            |        
  util.StreamBackedCorpusView --+    
                                |    
            xmldocs.XMLCorpusView --+
                                    |
                                   BNCWordView

A stream-backed corpus view specialized for use with the British National Corpus (BNC).

Instance Methods
 
__init__(self, filename, sent, tag, strip_space, stem)
Create a new corpus view based on a specified XML file.
 
handle_header(self, elt, context)
 
handle_elt(self, elt, context)
Convert an element into an appropriate value for inclusion in the view.
 
handle_word(self, elt)
 
handle_sent(self, elt)

Inherited from xmldocs.XMLCorpusView: read_block

Inherited from util.StreamBackedCorpusView: __add__, __getitem__, __len__, __mul__, __radd__, __rmul__, close, iterate_from

Inherited from util.StreamBackedCorpusView (private): _open

Inherited from util.AbstractLazySequence: __cmp__, __contains__, __hash__, __iter__, __repr__, count, index

Inherited from object: __delattr__, __getattribute__, __new__, __reduce__, __reduce_ex__, __setattr__, __str__

Class Variables
  title = None
Title of the document.
  author = None
Author of the document.
  editor = None
Editor of the document.
  resps = None
Statement of responsibility for the document.

Inherited from util.AbstractLazySequence (private): _MAX_REPR_SIZE
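The title, author, editor and resps variables default to None and presumably receive values from the document's header (handle_header carries no docstring on this page). As an illustration only, the sketch below pulls comparable fields out of a small TEI-style header with Python's standard xml.etree.ElementTree. The element paths under titleStmt, the made-up sample header, and the helper name read_header_metadata are assumptions; this is not NLTK's implementation.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from metadata field to an element path inside a
# TEI-style header. The paths are assumptions modelled on TEI's <titleStmt>.
FIELDS = {
    "title": "titleStmt/title",
    "author": "titleStmt/author",
    "editor": "titleStmt/editor",
    "resps": "titleStmt/respStmt",
}


def _text(elt):
    # Join all text fragments, including those nested in child elements
    # (e.g. <resp> and <name> inside <respStmt>), and normalise whitespace.
    return " ".join(" ".join(elt.itertext()).split())


def read_header_metadata(header):
    """Return a dict with one entry per field, or None when a field is absent."""
    meta = {}
    for field, path in FIELDS.items():
        elts = header.findall(".//" + path)
        meta[field] = "\n".join(_text(e) for e in elts) if elts else None
    return meta


header = ET.fromstring(
    "<teiHeader><fileDesc><titleStmt>"
    "<title>Sample text</title>"
    "<respStmt><resp>Data capture</resp><name>Someone</name></respStmt>"
    "</titleStmt></fileDesc></teiHeader>"
)
print(read_header_metadata(header))
# {'title': 'Sample text', 'author': None, 'editor': None, 'resps': 'Data capture Someone'}
```

Fields that do not occur in the header stay None, which matches the documented defaults above.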

Instance Variables

Inherited from xmldocs.XMLCorpusView (private): _tag_context, _tagspec

Properties

Inherited from util.StreamBackedCorpusView: filename

Inherited from object: __class__

Method Details

__init__(self, filename, sent, tag, strip_space, stem)
(Constructor)

Create a new corpus view based on a specified XML file.

Note that the XMLCorpusView constructor does not take an encoding argument, because the character encoding is declared by the XML files themselves.
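The same principle can be seen with Python's standard xml.etree.ElementTree, used here purely for illustration: when the parser receives raw bytes, it decodes them according to the XML declaration, so the caller never supplies an encoding. The one-element document below is made up.

```python
import xml.etree.ElementTree as ET

# Raw bytes as they might be read from disk. The XML declaration, not the
# caller, states that these bytes are Latin-1 (ISO-8859-1) encoded.
raw = '<?xml version="1.0" encoding="ISO-8859-1"?><w>café</w>'.encode("iso-8859-1")

# The parser reads the declaration and decodes the bytes accordingly.
elt = ET.fromstring(raw)
print(elt.text)  # café
```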

Parameters:
  • filename - The name of the underlying file.
  • sent - If true, include sentence bracketing.
  • tag - The name of the tagset to use, or None for no tags.
  • strip_space - If true, strip spaces from word tokens.
  • stem - If true, then substitute stems for words.
Overrides: xmldocs.XMLCorpusView.__init__
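To make the effect of sent, tag, strip_space and stem concrete, here is a self-contained sketch over a hand-written fragment in the style of the BNC XML edition, where each <w> element carries c5, pos and hw (headword) attributes. The function names word_value and view_items, the tagset names "c5" and "pos", the default values, and the fallback rules are assumptions made for illustration. This is not NLTK's code, and it only handles plain <w> and <c> children of <s>.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment modelled on BNC XML: <s> holds <w> (word) and
# <c> (punctuation) elements; trailing spaces belong to the token text.
SAMPLE = """<text>
<s n="1"><w c5="AT0" hw="the" pos="ART">The </w><w c5="NN2" hw="dog" pos="SUBST">dogs </w><w c5="VVD" hw="bark" pos="VERB">barked</w><c c5="PUN">.</c></s>
<s n="2"><w c5="PNP" hw="they" pos="PRON">They </w><w c5="VVD" hw="stop" pos="VERB">stopped</w><c c5="PUN">.</c></s>
</text>"""


def word_value(elt, tag=None, strip_space=True, stem=False):
    """Turn one <w> or <c> element into a token or a (token, tag) pair."""
    word = elt.text or ""
    if strip_space or stem:
        word = word.strip()
    if stem:
        # Use the headword; fall back to the surface form when there is none.
        word = elt.get("hw", word)
    if tag == "c5":
        return (word, elt.get("c5"))
    if tag == "pos":
        # <c> elements in this sample lack pos, so fall back to c5.
        return (word, elt.get("pos", elt.get("c5")))
    return word


def view_items(root, sent=False, **opts):
    """Return a flat token list, or a list of sentences when sent is true."""
    sentences = [
        [word_value(child, **opts) for child in s if child.tag in ("w", "c")]
        for s in root.iter("s")
    ]
    if sent:
        return sentences
    return [w for s in sentences for w in s]


root = ET.fromstring(SAMPLE)
print(view_items(root))
# ['The', 'dogs', 'barked', '.', 'They', 'stopped', '.']
print(view_items(root, sent=True, stem=True))
# [['the', 'dog', 'bark', '.'], ['they', 'stop', '.']]
print(view_items(root, tag="c5")[:2])
# [('The', 'AT0'), ('dogs', 'NN2')]
```

Keeping the per-element conversion (word_value) separate from the sentence grouping (view_items) mirrors the split between handle_word and handle_sent in the method list, although the actual division of work in NLTK may differ.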

handle_elt(self, elt, context)

Convert an element into an appropriate value for inclusion in the view. Unless overridden by a subclass or by the elt_handler constructor argument, this method simply returns elt.

Parameters:
  • elt - The element that should be converted.
  • context - A string composed of element tags separated by forward slashes, indicating the XML context of the given element. For example, the string 'foo/bar/baz' indicates that the element is a baz element whose parent is a bar element and whose grandparent is a top-level foo element.
Returns:
The view value corresponding to elt.
Overrides: xmldocs.XMLCorpusView.handle_elt
(inherited documentation)
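The context string can be illustrated independently of NLTK. The sketch below walks a small made-up tree with xml.etree.ElementTree and builds, for every element, the slash-separated tag path described above. The function name context_paths is hypothetical, and the recursive walk is only a model of how such strings are formed, not XMLCorpusView's own streaming code.

```python
import xml.etree.ElementTree as ET


def context_paths(elt, parent_context=""):
    """Yield (context, element) pairs, where context is the slash-separated
    tag path from the top-level element down to this element."""
    context = f"{parent_context}/{elt.tag}" if parent_context else elt.tag
    yield context, elt
    for child in elt:
        yield from context_paths(child, context)


root = ET.fromstring("<foo><bar><baz/></bar><qux/></foo>")
for context, elt in context_paths(root):
    print(context)
# foo
# foo/bar
# foo/bar/baz
# foo/qux
```

Here the baz element gets the context 'foo/bar/baz', matching the example in the parameter description.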