rxpop -- An RXP SAX driver for PyXML

What Is rxpop?

rxpop is a SAX2 driver for PyXML. It is implemented on top of the RXP XML parser.

rxpop exposes the standard Python SAX2 interface and call-backs plus a few additional capabilities, which are described below.

rxpop combines these two advantages:

You can learn more about RXP at http://www.cogsci.ed.ac.uk/~richard/rxp.html.

What's missing from rxpop

The following features are missing from this implementation of the rxpop driver. You may want to check this list before you begin:

Don't forget that it's all Open Source. If something you need is missing, you can add it yourself. You can also contact me for help.

License

RXP itself is covered by the GPL (GNU GENERAL PUBLIC LICENSE). For code specific to rxpop, I have used a less restrictive license (see below). I am grateful to the implementors of RXP and hope that you will respect their wishes and license.

Note that although I have used a very liberal license (below), RXP itself is under the GPL. You should consult your own lawyer about the implications of that, if you believe that might cause problems. Basically I am saying "IANAL" (I am not a lawyer).

Here is the license for the code I have added:

Copyright (c) 2002 Dave Kuhlman

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

Installation

See the file README.rxpop. Basically, you will need to do the following:

  1. Download, build, and install RXP. You can find it at http://www.cogsci.ed.ac.uk/~richard/rxp.html.
  2. Change directory to the PyXML directory and unzip the rxpop distribution. Use something like the following:

        cd /Python/PyXML-0.7
        unzip ../rxpop.zip

    NOTE! Unzipping replaces setup.py. If you have modified that file or have a version other than PyXML-0.7, you may want to patch in the rxpop stanza instead.

    You will need to modify include_dirs and library_dirs in the rxpop stanza of setup.py to reflect the location where you installed RXP.
    
  3. Build and install rxpop with the regular Distutils commands:
        python setup.py build
        python setup.py install
    
Calling and Using rxpop

Use the standard Python SAX2 interface. Here is an example:

    import xml.sax

    class ContentHandler:
        def __init__(self, parser):
            self.parser = parser
        # Called once for each start tag.
        def startElement(self, name, attrs):
            print 'element: %s' % name

    class ErrorHandler:
        def __init__(self, parser):
            self.parser = parser
        def warning(self, msg):
            print '*** (ErrorHandler.warning) msg:', msg
        def error(self, msg):
            print '*** (ErrorHandler.error) msg:', msg
        def fatalError(self, msg):
            print '*** (ErrorHandler.fatalError) msg:', msg

    def test(inFileName):
        # make_parser takes a list of driver module names.
        parser = xml.sax.make_parser(['xml.sax.drivers2.drv_rxpop'])
        handler = ContentHandler(parser)
        errorHandler = ErrorHandler(parser)
        parser.setContentHandler(handler)
        parser.setErrorHandler(errorHandler)
        # rxpop extension: set RXP flags before parsing.
        parser.setParserFlag('NoNoDTDWarning', 1)
        parser.setParserFlag('Validate', 1)
        parser.parse(inFileName)
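
If rxpop might not be installed on every machine, it can help to know what make_parser does in that case. The following is a minimal sketch, not part of rxpop, that relies on the standard library behavior of xml.sax.make_parser: it tries each named driver module in turn and falls back to the default driver when a module cannot be imported. The names ElementLister and parse_elements are hypothetical and exist only for this illustration.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler

RXPOP_DRIVER = 'xml.sax.drivers2.drv_rxpop'

class ElementLister(ContentHandler):
    """Hypothetical handler that records element names in document order."""
    def __init__(self):
        ContentHandler.__init__(self)
        self.names = []

    def startElement(self, name, attrs):
        self.names.append(name)

def parse_elements(source):
    """Parse source with rxpop if it can be imported, else the default driver."""
    # make_parser tries the listed drivers first, then its built-in defaults.
    parser = xml.sax.make_parser([RXPOP_DRIVER])
    handler = ElementLister()
    parser.setContentHandler(handler)
    parser.parse(source)
    return handler.names

names = parse_elements(io.BytesIO(b'<doc><item/><item/></doc>'))
```

Because the fallback is silent, code that depends on the rxpop extensions should check that the parser it received actually provides them.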

There are a few extensions not in the standard interface. See below.

Extensions

rxpop provides the following functions not provided by the standard SAX2 interface.

activateEventHandler

Turn a SAX2 call-back off or on. After turning a call-back off, the content handler will not receive events for that call-back.

Prototype:

    parser.activateEventHandler(callback_name, flag)

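Where callback_name is the name of a SAX2 call-back (for example, 'characters'), and flag is 1 to turn the call-back on or 0 to turn it off.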

Example:

    import xml.sax

    class ContentHandler:
        def __init__(self, parser):
            self.parser = parser
            self.target = ''
        def startDocument(self):
            # Ignore character data until a 'target' element starts.
            self.parser.activateEventHandler('characters', 0)
        def startElement(self, name, attrs):
            if name == 'target':
                self.parser.activateEventHandler('characters', 1)
        # SAX2 passes only the element name to endElement.
        def endElement(self, name):
            if name == 'target':
                self.parser.activateEventHandler('characters', 0)
        def characters(self, data):
            # Receives only the text inside 'target' elements.
            self.target += data

    def test(inFileName):
        # make_parser takes a list of driver module names.
        parser = xml.sax.make_parser(['xml.sax.drivers2.drv_rxpop'])
        handler = ContentHandler(parser)
        parser.setContentHandler(handler)
        parser.parse(inFileName)
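
For comparison, here is a sketch of how the same filtering is usually done when activateEventHandler is not available. It uses only the standard library driver, and the names TargetTextHandler and collect_target_text are hypothetical. The parser still delivers every characters event here, and the handler discards the unwanted ones with a flag of its own. With the rxpop extension, those call-backs are not delivered at all.

```python
import xml.sax
from xml.sax.handler import ContentHandler

class TargetTextHandler(ContentHandler):
    """Hypothetical handler that keeps text only inside 'target' elements."""
    def __init__(self):
        ContentHandler.__init__(self)
        self.in_target = False
        self.target = ''

    def startElement(self, name, attrs):
        if name == 'target':
            self.in_target = True

    def endElement(self, name):
        if name == 'target':
            self.in_target = False

    def characters(self, data):
        # Every characters event arrives here; the flag decides what to keep.
        if self.in_target:
            self.target += data

def collect_target_text(xml_bytes):
    handler = TargetTextHandler()
    xml.sax.parseString(xml_bytes, handler)
    return handler.target

text = collect_target_text(b'<doc>skip<target>keep</target>skip</doc>')
```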

getLocation

Return a tuple with two elements: line number and column number.

Prototype:

    lineNum, columnNum = parser.getLocation()

Note that the location mechanism defined by SAX2 is also available. That is, if the content handler contains a method setDocumentLocator, then that method will be called before any other event handlers (call-backs) are called. It will be passed a DocumentLocator object. The DocumentLocator object has one method: getLocation, which if called during the parse, returns a tuple containing the line number and column number.
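
The setDocumentLocator mechanism can be sketched with the standard library driver, which may be useful for trying the idea without rxpop installed. Note the assumption: the standard library locator exposes getLineNumber and getColumnNumber rather than the getLocation method described above, so the hypothetical helper location_of packs them into the same (line, column) tuple shape. LocatingHandler is likewise a name made up for this example.

```python
import xml.sax
from xml.sax.handler import ContentHandler

def location_of(locator):
    """Hypothetical helper: return (line, column) as one tuple."""
    return (locator.getLineNumber(), locator.getColumnNumber())

class LocatingHandler(ContentHandler):
    """Hypothetical handler that records where each element starts."""
    def __init__(self):
        ContentHandler.__init__(self)
        self.locator = None
        self.positions = []

    def setDocumentLocator(self, locator):
        # Called before any other call-back; keep the locator for later use.
        self.locator = locator

    def startElement(self, name, attrs):
        line, column = location_of(self.locator)
        self.positions.append((name, line, column))

handler = LocatingHandler()
xml.sax.parseString(b'<a>\n  <b/>\n</a>', handler)
positions = handler.positions
```

Each entry in positions holds an element name followed by the line and column reported while that element's start tag was being handled.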

setParserFlag

Set (or unset) an RXP flag. See the RXP documentation for information on RXP flags, their names, and their appropriate values.

Prototype:

    parser.setParserFlag(flagName, flag)

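Where flagName is the name of an RXP flag (for example, 'Validate' or 'NoNoDTDWarning', as used in the example above), and flag is the value to give it. The example above passes 1 to set a flag.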
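
When several flags are needed, or when the same code must also run with a driver that lacks this extension, a small helper can keep the calls in one place. This is only a sketch: apply_parser_flags and RecordingParser are hypothetical names, and RecordingParser is a stand-in that records calls so the helper can be tried without rxpop. The only rxpop call assumed is setParserFlag(flagName, flag) as shown in the prototype.

```python
def apply_parser_flags(parser, flags):
    """Set each RXP flag on parser; return the flag names that were applied.

    If the parser has no setParserFlag method (not an rxpop parser),
    nothing is set and an empty list is returned.
    """
    set_flag = getattr(parser, 'setParserFlag', None)
    if set_flag is None:
        return []
    applied = []
    for name in sorted(flags):
        set_flag(name, flags[name])
        applied.append(name)
    return applied

class RecordingParser:
    """Hypothetical stand-in for an rxpop parser; it only records calls."""
    def __init__(self):
        self.calls = []

    def setParserFlag(self, flagName, flag):
        self.calls.append((flagName, flag))

stub = RecordingParser()
applied = apply_parser_flags(stub, {'Validate': 1, 'NoNoDTDWarning': 1})
```

With a real rxpop parser, the helper would be called after make_parser and before parse, in the same place as the setParserFlag calls in the earlier example.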

Additional Information

You can learn more about RXP at http://www.cogsci.ed.ac.uk/~richard/rxp.html.

More information about PyXML is at http://www.python.org/sigs/xml-sig/.

Last update: 4/23/02

Dave Kuhlman
[email protected]
http://www.rexx.com/~dkuhlman