Package nltk :: Module util :: Class HTMLCleaner

Class HTMLCleaner

markupbase.ParserBase --+    
                        |    
    HTMLParser.HTMLParser --+
                            |
                           HTMLCleaner

Instance Methods
 
__init__(self)
Initialize and reset this instance.
 
handle_data(self, d)

handle_starttag(self, tag, attrs)

handle_endtag(self, tag)

clean_text(self)

Inherited from HTMLParser.HTMLParser: check_for_whole_start_tag, clear_cdata_mode, close, error, feed, get_starttag_text, goahead, handle_charref, handle_comment, handle_decl, handle_entityref, handle_pi, handle_startendtag, parse_endtag, parse_pi, parse_starttag, reset, set_cdata_mode, unescape, unknown_decl

Inherited from markupbase.ParserBase: getpos, parse_comment, parse_declaration, parse_marked_section, updatepos

Inherited from markupbase.ParserBase (private): _parse_doctype_attlist, _parse_doctype_element, _parse_doctype_entity, _parse_doctype_notation, _parse_doctype_subset, _scan_name
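
The inherited feed and close methods are what drive the overridden handlers: feed accepts markup (possibly in several pieces) and calls handle_starttag, handle_data and handle_endtag as it recognizes each construct, while the default handle_startendtag forwards a self-closing tag to both tag handlers. The sketch below illustrates this with the Python 3 module html.parser, which replaced the Python 2 module HTMLParser named on this page; the EventLogger class is hypothetical and is not part of NLTK.

```python
# Python 3 import; this page documents the Python 2 module HTMLParser.
from html.parser import HTMLParser


class EventLogger(HTMLParser):
    """Hypothetical subclass that records which handlers the parser calls."""

    def __init__(self):
        super().__init__()
        self.tag_events = []   # ("start" | "end", tag name) in call order
        self.text_parts = []   # text passed to handle_data, in call order

    def handle_starttag(self, tag, attrs):
        self.tag_events.append(("start", tag))

    def handle_endtag(self, tag):
        self.tag_events.append(("end", tag))

    def handle_data(self, data):
        self.text_parts.append(data)


logger = EventLogger()
logger.feed("<p>Hello <br/>wor")  # input may arrive in pieces
logger.feed("ld</p>")
logger.close()                     # process any text still buffered

tag_events = logger.tag_events
joined_text = "".join(logger.text_parts)
```

Text may reach handle_data in more than one chunk, so a subclass should accumulate the pieces rather than assume one call per text node.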

Class Variables

Inherited from HTMLParser.HTMLParser: CDATA_CONTENT_ELEMENTS

Inherited from markupbase.ParserBase (private): _decl_otherchars

Method Details

__init__(self)
(Constructor)

Initialize and reset this instance.

Overrides: HTMLParser.HTMLParser.__init__
(inherited documentation)

handle_data(self, d)

Overrides: HTMLParser.HTMLParser.handle_data

handle_starttag(self, tag, attrs)

Overrides: HTMLParser.HTMLParser.handle_starttag

handle_endtag(self, tag)

Overrides: HTMLParser.HTMLParser.handle_endtag
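
This page lists the overridden methods but not their bodies. As a rough illustration of how a class with this interface could strip markup, the sketch below collects text in handle_data, uses handle_starttag and handle_endtag to skip the contents of script and style elements, and returns the result from clean_text. Everything here is an assumption made for illustration: the class name TextOnlyCleaner, the skipped tag set, the whitespace normalization, and the use of Python 3's html.parser are not taken from NLTK's source.

```python
# Python 3 import; this page documents the Python 2 module HTMLParser.
from html.parser import HTMLParser


class TextOnlyCleaner(HTMLParser):
    """Hypothetical stand-in for HTMLCleaner; not NLTK's actual code."""

    SKIPPED = {"script", "style"}  # assumption: drop non-visible content

    def __init__(self):
        super().__init__()   # initialize and reset the parser state
        self._chunks = []    # text fragments seen so far
        self._skip_depth = 0 # > 0 while inside a skipped element

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, d):
        # Keep text only when we are not inside a skipped element.
        if self._skip_depth == 0:
            self._chunks.append(d)

    def clean_text(self):
        # Join the fragments and collapse runs of whitespace.
        return " ".join("".join(self._chunks).split())


cleaner = TextOnlyCleaner()
cleaner.feed("<html><style>p {color: red}</style>"
             "<p>Hello, <b>world</b>!</p></html>")
cleaner.close()
text = cleaner.clean_text()
print(text)
```

The calling pattern is the part most likely to carry over to the real class: construct the parser, pass markup to the inherited feed, call close, then read the result from clean_text.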