Gnosis and generateDS - Analysis, Comparison, and Evaluation

Dave Kuhlman

http://www.rexx.com/~dkuhlman

Aug 23, 2002

Front Matter

Abstract:

This paper compares the purpose and use of several capabilities of the Gnosis/objectify library with those of generateDS.py.

This paper is concerned mainly with the objectify module in the Gnosis library.

1 Introduction - What It Does

[Note: The Gnosis library actually contains several modules; however, in this paper we discuss only the objectify module.]

Gnosis/objectify -- Gnosis/objectify creates native Python data structures from an XML document. The Python programmer can then extract information from the XML document using Python data structures that are meaningful and that follow the structure of the XML document.

generateDS.py -- Given an XSchema definition of an XML document type, generateDS.py generates Python classes whose instances can represent elements in documents of that type. generateDS.py also generates a parser that can read and parse a document of the given type.

Our focus in this paper is on the ability of these two technologies to enable us to process XML documents using native Python data structures that are more convenient, usable, and meaningful than those offered by DOM.

In general, we shall be comparing two different approaches to this problem:

Gnosis objectify creates Python classes on the fly, without source code for those classes, as it loads an XML document. It will also reuse classes defined by the user to represent XML elements.
generateDS.py generates Python source code for the elements in an XML document (type) from an XSchema definition of that document type. This source code must be generated before an XML document (of that type) can be parsed. The source code for a specific document type can then be edited and extended with application-specific capabilities.

2 How to Use It

2.1 Gnosis/objectify Mini-How-to

David Mertz, the implementor of Gnosis/objectify, provides excellent documentation and commentary in his articles. Refer to the ``See Also'' section, below.

Here is a summary.

Import Gnosis/objectify. On my machine, I've installed the Gnosis library, so the following does it:
```
>>> import gnosis.xml.objectify
```

Create a Gnosis/objectify XML object:

```
>>> xml_obj = gnosis.xml.objectify.XML_Objectify('people.xml')
```

And from that, create a Python object:
```
>>> py_obj = xml_obj.make_instance()
```

Now we are ready to inspect and manipulate this object. For example:

```
>>> # Get the first person element inside the root element.
>>> person = py_obj.person[0]
>>>
>>> # Get the "ratio" attribute of the person element.
>>> print person.ratio
3.2
>>>
>>> # Get the "name" element in the person element.
>>> print person.name
<gnosis.xml.objectify._objectify._XO_name instance at 0x8253914>
>>>
>>> # Get the characters in the name element.
>>> print person.name.PCDATA
Alberta
```
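For comparison with DOM, the sketch below performs the same two lookups (the ratio attribute and the name text of the first person) with the standard library's xml.dom.minidom. The contents of people.xml are not shown in this paper, so the document string here is a hypothetical one consistent with the session above.

```python
from xml.dom import minidom

# Hypothetical contents of people.xml, consistent with the session above.
PEOPLE_XML = """<people>
  <person ratio="3.2">
    <name>Alberta</name>
  </person>
</people>"""

doc = minidom.parseString(PEOPLE_XML)

# First <person> element anywhere in the document.
person = doc.getElementsByTagName('person')[0]

# Attribute values come back as strings.
ratio = person.getAttribute('ratio')

# The text lives in a Text node that is the first child of <name>.
name = person.getElementsByTagName('name')[0].firstChild.data

print(ratio)
print(name)
```

Note the explicit getElementsByTagName calls and the extra firstChild step needed to reach the text; objectify's natural naming hides both.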

2.2 generateDS.py Mini-How-to

David Kuhlman, the implementor of generateDS.py, provides documentation at his Web site. This documentation is included in the generateDS.py distribution, which is also available at that site. Refer to the ``See Also'' section, below.

Here is a summary. Steps toward using generateDS.py:

1. Create an XSchema definition of the XML document type that you wish to process.
2. Process this XSchema definition with generateDS.py, which will generate Python class definitions and, optionally, subclass definitions.
3. Add your application-specific code to the subclasses.
4. Modify the import statement at the top of the subclass file so that it imports the file containing the superclasses.
5. Modify the main function in the subclass file to suit your needs.

You should now be able to parse and process XML documents by running the subclass file with Python.

3 Uses and Applications

This section describes several possible uses for these two technologies.

3.1 Loading and Using Configuration Files

Both tools seem suitable for reading and writing an XML document used as a configuration file.

A few notes:

Gnosis/objectify does not automatically provide a capability to re-write an XML document to disk. Adding this capability for your config file is easy, but you will have to do it yourself. No definition of your XML config file is needed.
generateDS.py, as always, requires you to produce an XML Schema definition of your XML config file. As a reward for your effort, however, you do get a free export method that re-writes the data structures to a file.
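A minimal sketch of the do-it-yourself writer for Gnosis/objectify follows. It assumes the layout suggested by the session in section 2.1: XML attributes appear as string members, character content as PCDATA, and child elements as instances (or lists of instances). It also assumes that any bookkeeping members start with an underscore, and it treats only str values as XML attributes. The function name to_xml is hypothetical, not part of Gnosis, and the sketch ignores mixed content and namespaces.

```python
from xml.sax.saxutils import escape, quoteattr


def to_xml(obj, tag, indent=''):
    """Serialize an objectify-style object tree back to XML text.

    String members become XML attributes, 'PCDATA' becomes character
    content, and other members become child elements (a list member
    becomes a sequence of repeated children).
    """
    attrs = []
    children = []
    text = None
    # Members are visited in insertion order, which is not guaranteed
    # to match the order of the original document.
    for key, value in vars(obj).items():
        if key.startswith('_'):
            continue                      # skip bookkeeping members
        if key == 'PCDATA':
            text = value
        elif isinstance(value, str):
            attrs.append(' %s=%s' % (key, quoteattr(value)))
        elif isinstance(value, list):
            children.extend((key, item) for item in value)
        else:
            children.append((key, value))

    start = '%s<%s%s' % (indent, tag, ''.join(attrs))
    if not children and text is None:
        return start + '/>\n'             # empty element
    if not children:
        return '%s>%s</%s>\n' % (start, escape(text), tag)

    # Mixed content (text plus children) is not handled: text is dropped.
    parts = [start + '>\n']
    for child_tag, child in children:
        parts.append(to_xml(child, child_tag, indent + '  '))
    parts.append('%s</%s>\n' % (indent, tag))
    return ''.join(parts)
```

In practice you would pass it the object returned by make_instance() together with the root element's tag name, then write the resulting string to the configuration file.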

3.2 Transformations on XML

Both Gnosis/objectify and generateDS.py seem very appropriate tools for performing transformations on XML documents.

3.2.1 Transformation with Gnosis/objectify

Two approaches seem appropriate:

Implement tree walking code that generates the output. Here is a very simple example:

```
import gnosis.xml.objectify

def show_people(inFileName):
    xml_obj = gnosis.xml.objectify.XML_Objectify(inFileName)
    people = xml_obj.make_instance()
    for person in people.person:
        print 'Person: %s' % person.name.PCDATA
```

Create a class for each element type in the document. Add a method to each class that generates output and possibly calls the same method in child classes for nested XML elements. Here is a code snippet that shows how to do this:

```
import gnosis.xml.objectify

class _XO_people:
    def export(self):
        # Delegate to each child <person> element.
        for person in self.person:
            person.export()

class _XO_person:
    def export(self):
        print 'Person:'
        print '    Name: %s' % self.name.PCDATA

def generate(inFileName):
    # Put our classes in the gnosis.xml.objectify namespace so that
    # objectify uses them for <people> and <person> elements.
    gnosis.xml.objectify._XO_people = _XO_people
    gnosis.xml.objectify._XO_person = _XO_person
    xml_obj = gnosis.xml.objectify.XML_Objectify(inFileName)
    root = xml_obj.make_instance()
    root.export()
```

Note the two assignment statements before ``objectifying'' the XML document. These put our classes into the gnosis.xml.objectify namespace, where objectify finds and uses them instead of creating classes of its own.

A couple of things to be aware of:

If the content of an element is empty, then there will be no PCDATA attribute in the corresponding Python instance. Consider, for example, the following XML content:
```
<person>
  <description></description>
</person>
```
The Python instance for the ``description'' element will have no member ``PCDATA''.
You can use a try: except: block to check for it.
If you are in the habit of inspecting objects (e.g. using dir(obj)), you may do a double-take on the fact that the attribute for a child element sometimes contains a list and sometimes contains a single item rather than a list with a single item. Yet, whether it holds a list or a single item, it responds correctly to the list access protocol (e.g. len(people.person) and for item in people.person:). How can this be? The answer is that a single child is an instance of a subclass of a class that implements both the __len__ and __getitem__ methods, so it behaves like a one-element list.
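Both points can be illustrated with a small stand-in class. Element and text_of are hypothetical names, and Element is a simplification rather than the class Gnosis uses. It shows how __len__ and __getitem__ let one child pass for a one-element list, and how a try/except around PCDATA handles empty elements.

```python
class Element(object):
    """Simplified stand-in for an objectify element instance.

    Defining __len__ and __getitem__ lets a single child element be
    used wherever a list of children is expected.
    """

    def __len__(self):
        return 1

    def __getitem__(self, index):
        if index in (0, -1):
            return self
        # IndexError is what ends a for-loop over a __getitem__ sequence.
        raise IndexError(index)


def text_of(element, default=''):
    """Return the element's PCDATA, or `default` for an empty element."""
    try:
        return element.PCDATA
    except AttributeError:
        return default
```

With these definitions, len(child), child[0], and for item in child: all work on a lone child, so code written for the list case also handles the single-item case.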

3.2.2 Transformation with generateDS.py

Here are several strategies you can use for implementing XML transformations using generateDS.py:

Modify an existing (generated) method -- Steps:
1. Generate the class file. This file will contain an export method in each generated class.
2. Modify this export method to provide the transformation.
3. Un-comment the call to the export method in the main function.
Write a new method -- Steps:
1. Generate both the class and subclass files.
2. In the subclass file, in each subclass, add a method that writes out the transformed XML element. The export methods in the generated class file can serve as examples.
3. In the main function, add a call to this new method on the root object.
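A sketch of the second strategy, writing a new method, is shown below. The classes person and people are hand-written stand-ins for generated superclasses (real generated classes have more members), the Sub suffix on the subclass names is an assumption about naming, and transform is a hypothetical method that turns the document into an HTML list.

```python
import sys
from xml.sax.saxutils import escape


# Stand-ins for superclasses generated by generateDS.py (hypothetical).
class person(object):
    def __init__(self, name='', ratio=''):
        self.name = name
        self.ratio = ratio


class people(object):
    def __init__(self, person=None):
        self.person = person or []


# Subclasses, as in the subclass file, each with a new transform method.
class personSub(person):
    def transform(self, outfile, level=0):
        outfile.write('%s<li>%s (ratio %s)</li>\n' % (
            '  ' * level, escape(self.name), escape(self.ratio)))


class peopleSub(people):
    def transform(self, outfile, level=0):
        outfile.write('%s<ul>\n' % ('  ' * level))
        for child in self.person:
            child.transform(outfile, level + 1)   # recurse into children
        outfile.write('%s</ul>\n' % ('  ' * level))


if __name__ == '__main__':
    # In a real subclass file, the root would come from the parser.
    root = peopleSub([personSub('Alberta', '3.2')])
    root.transform(sys.stdout)
```

Each subclass handles only its own element and delegates to its children, mirroring the structure of the generated export methods.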

It may be helpful to write a simple tree walk function that collects the instances that represent specific XML elements in a dictionary. For example, if all elements of a certain type have an ``id'' attribute or a ``name'' attribute (and the name is unique), then you might collect a dictionary whose keys are the IDs or the names. This will enable your transformation methods to look up instances (elements) that are not directly connected to the element currently being processed.
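A tree walk of this kind might look like the following. It assumes that element instances are ordinary Python objects whose children are members holding an instance or a list of instances, and that the tree has no back-references to parents (otherwise the walk would recurse forever). The function name collect is hypothetical.

```python
def collect(node, attr, index=None):
    """Walk an object tree, mapping each node's `attr` value to the node.

    Nodes without `attr` are not indexed but are still walked, so
    matching elements at any depth are found.
    """
    if index is None:
        index = {}
    key = getattr(node, attr, None)
    if key is not None:
        index[key] = node
    for value in vars(node).values():
        children = value if isinstance(value, list) else [value]
        for child in children:
            # Strings, numbers, and None have no __dict__; skip them.
            if hasattr(child, '__dict__'):
                collect(child, attr, index)
    return index
```

Calling collect(root, 'id') once, before the transformation starts, gives every transform method constant-time access to any element by its ID.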

4 Comparisons

4.1 Commonalities

How Gnosis/objectify and generateDS.py are similar:

Both Gnosis/objectify and generateDS.py enable us to load XML documents into native Python data structures.
Both technologies represent XML elements as instances of Python classes.
With both technologies, ``natural naming'' is used, i.e. the attributes of the classes that represent XML elements are the same as the names of the element's attributes and children.

4.2 Contrasts and differences

How Gnosis/objectify and generateDS.py are different:

generateDS.py requires a definition for each XML document type. Gnosis/objectify does not.
Code generated by generateDS.py applies to (is usable on) a single XML document type. Because Gnosis/objectify generates no code, this restriction does not apply to it. However, user-written code that employs Gnosis/objectify is likely to be applicable to only a single document type.
Gnosis/objectify can capture and reuse classes written in advance. (The classes must be inserted into the namespace of gnosis.xml.objectify.) The closest that generateDS.py comes to doing this is that it generates subclass stubs, where the user can add previously written code (Copy and paste sounds a bit clumsy, doesn't it?), or can add a mix-in class to the list of superclasses of specific subclasses.
Gnosis/objectify does not automatically provide the ability to write the structures loaded from an XML document back out to a file. If you need this capability, you will have to write it yourself, although Gnosis/objectify makes doing so very easy. generateDS.py provides an export method in each generated class. Of course, you'll have to provide the XSchema definition in order to produce that generated code.
Gnosis/objectify most likely adapts more easily to changes in the definition of the XML document. For example, if a new child element is added to an existing element, then with Gnosis/objectify, adding a bit of code to handle that new child element may be all that is needed. In contrast, the user of generateDS.py must either (1) modify the XSchema definition of the XML document and re-generate the source code or (2) edit the (previously) generated source code by hand. Editing the generated source code by hand is not that difficult; however, doing so means that re-generating from the XSchema definition in the future is likely to discard those hand edits or introduce errors. On the positive side, the recommended place to put user code is in the subclasses generated by generateDS.py, which enables the user to re-generate the (superclass) code without losing user-added code.
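The mix-in approach mentioned above can be sketched like this. ReportMixin stands for previously written code, person is a stand-in for a generated superclass, and personSub for its generated subclass stub; all three names are hypothetical. Listing the mix-in first among the superclasses means its methods take precedence in the method resolution order.

```python
class ReportMixin(object):
    """Previously written code that works with any object having `name`."""

    def report(self):
        return 'Name: %s' % self.name


# Stand-in for a superclass generated by generateDS.py (hypothetical).
class person(object):
    def __init__(self, name=''):
        self.name = name


# Generated subclass stub, with the mix-in added to its superclasses.
class personSub(ReportMixin, person):
    pass
```

Only the class statement of the stub changes, so re-generating the superclass file leaves the reused code untouched.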

4.3 Limitations and Restrictions

Both technologies, like DOM, load the entire document into memory. Therefore, neither is suitable for ``very large'' XML documents. You will have to decide how large ``very large'' is.

Both technologies are Python solutions. Python must be installed in order to use them. In addition, both require installation of PyXML, the standard XML support package for Python.

5 Summary

Both technologies make processing XML documents in Python exceptionally easy. Both are a step up from DOM. Both make it easy to add application-specific code. And the application-specific code is likely to be more meaningful and readable than equivalent code written for use with DOM.
