generateDS -- Generate Data Structures from XML Schema

Author: Dave Kuhlman
Address:
dkuhlman@rexx.com
http://www.rexx.com/~dkuhlman
Revision: 1.9a
Date: Dec. 4, 2006
Copyright: Copyright (c) 2004 Dave Kuhlman. This documentation and the software it describes is covered by The MIT License: http://www.opensource.org/licenses/mit-license.

Abstract

generateDS.py generates Python data structures (for example, class definitions) from an XML Schema document. These data structures represent the elements in an XML document described by the XML Schema. It also generates parsers that load an XML document into those data structures. In addition, a separate file containing subclasses (stubs) is optionally generated. The user can add methods to the subclasses in order to process the contents of an XML document.

1   Introduction

generateDS.py generates Python data structures (for example, class definitions) from an XML Schema document. These data structures represent the elements in an XML document described by the XML Schema. It also generates parsers that load an XML document into those data structures. In addition, a separate file containing subclasses (stubs) is optionally generated. The user can add methods to the subclasses in order to process the contents of an XML document.

The generated Python code contains:

The generated classes contain the following:

The generated subclass file contains one (sub-)class definition for each data representation class. If the subclass file is used, then the parser creates instances of the subclasses (instead of creating instances of the superclasses). This enables the user to extend the subclasses with "tree walk" methods, for example, that process the contents of the XML file. The user can also generate and extend multiple subclass files which use a single, common superclass file, thus implementing a number of different processes on the same XML document type.
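
The superclass/subclass arrangement can be sketched as follows. This is a hand-written stand-in, not actual generateDS.py output: the class name personSub follows the default "Sub" suffix, while the members, the getters, and the walk method are assumptions made for illustration.

```python
# Stand-in for a class in the generated superclass file (hypothetical members).
class person(object):
    def __init__(self, name='', children=None):
        self.name = name
        self.children = children or []

    def get_name(self):
        return self.name

    def get_children(self):
        return self.children


# Stand-in for the matching stub in the subclass file, extended by the user.
class personSub(person):
    def walk(self, depth=0):
        """Tree-walk method added by the user: collect indented names."""
        lines = ['%s%s' % ('  ' * depth, self.get_name())]
        for child in self.get_children():
            lines.extend(child.walk(depth + 1))
        return lines


root = personSub('Alice', [personSub('Bob'), personSub('Carol')])
print('\n'.join(root.walk()))
```

Because the parser would create personSub instances when the subclass file is used, a method such as walk is available on every node of the tree.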

This document explains (1) how to use generateDS.py; (2) how to use the Python code and data structures that it generates; and (3) how to modify the generated code for special purposes.

2   Where To Find It

2.1   Download

You can find the distribution here: http://www.rexx.com/~dkuhlman/generateDS-1.9a.tar.gz

It is also available at SourceForge: http://sourceforge.net/projects/generateds/

2.2   Support

There is a mailing list at SourceForge: http://sourceforge.net/mail/?group_id=183592

3   How to build and install it

Newer versions of Python have XML support in the Python standard library. For older versions of Python, install PyXML. You can find it at: http://pyxml.sourceforge.net/

De-compress the generateDS distribution file. Use something like the following:

tar xzvf generateDS-x.xx.tar.gz

Then, the regular Distutils commands should work:

python setup.py build
python setup.py install        # probably as root

4   How-to Use generateDS.py

4.1   Running generateDS.py

Run generateDS.py with a single argument, the XML Schema file that defines the data structures. For example, the following will generate Python source code for data structures described in people.xsd and will write it to the file people.py. In addition, it will write subclass stubs to the file peoplesubs.py:

python generateDS.py -o people.py -s peoplesubs.py people.xsd

Here is the usage message displayed by generateDS.py:

Usage: python generateDS.py [ options ] <in_xsd_file>
Options:
    -o <outfilename>         Output file name for data representation classes
    -s <subclassfilename>    Output file name for subclasses
    -p <prefix>              Prefix string to be pre-pended to the class names
    -n <mappingfilename>     Transform names with table in mappingfilename.
    -f                       Force creation of output files.  Do not ask.
    -a <namespaceabbrev>     Namespace abbreviation, e.g. "xsd:". Default = 'xs:'.
    -b <behaviorfilename>    Input file name for behaviors added to subclasses
    -m                       Generate properties for member variables
    --subclass-suffix="XXX"  Append XXX to the generated subclass names.  Default="Sub".
    --root-element="XXX"     Assume XXX is root element of instance docs.
                             Default is first element defined in schema.
    --super="XXX"            Super module name in subclass module. Default="???"
    --validator-bodies=path  Path to a directory containing files that provide
                             bodies (implementations) of validator methods.
    --use-old-getter-setter  Name getters and setters getVar() and setVar(),
                             instead of get_var() and set_var().

The following command line flags are recognized by generateDS.py:

-o <filename>
Write the data representation classes to file filename.
-s <filename>
Write the subclass stubs to file filename.
-p <prefix>
Prepend prefix to the name of each generated data structure (class).
-f
Force generation of output files even if they already exist. Do not ask before over-writing existing files.
-a <namespaceabbrev>

Namespace abbreviation, for example "xsd:". The default is 'xs:'. If the <schema> element in your XML Schema specifies something other than "xmlns:xs=", then you need to use this option. So, suppose you have the following at the beginning of your XML Schema file:

<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">

Then you can use the following command line option:

-a "xsd:"

But note that generateDS.py also tries to pick up the namespace prefix used in the XML Schema file automatically. If the <schema> element has an attribute "xmlns:xxx" whose value is "http://www.w3.org/2001/XMLSchema", then generateDS.py will use "xxx:" as the alias for the XML Schema namespace in the XML Schema document.

-b <behaviorfilename>
Input file name for behaviors to be added to subclasses. Specifies the name of an XML document containing descriptions of methods to be added to subclasses generated with the -s flag. The -b flag requires the -s flag. See the section on XMLBehaviors below.
-m
Generate property members and new style classes. Causes generated classes to inherit from class object. Generates a call to the built-in property function for each pair of getters and setters. This is experimental.
--subclass-suffix=<suffix>

Append suffix to the name of classes generated in the subclass file. The default, if omitted, is "Sub". For example, the following will append "_Action" to each generated subclass name:

generateDS.py --subclass-suffix="_Action" -s actions.py mydef.xsd

And the following will append nothing, making the superclass and subclass names the same:

generateDS.py --subclass-suffix="" -s actions.py mydef.xsd
--root-element=<element_name>
Make element_name the assumed root of instance documents. The default is the name of the element whose definition is first in the XML Schema document. This flag affects the parsing functions (for example, parse(), parseString()).
--super=<module_name>

Make module_name the name of the superclass module imported by the subclass module. If this flag is omitted, the following is generated near the top of the subclass file:

import ??? as supermod

and you will need to hand edit this so the correct superclass module is imported.

--validator-bodies=<path>
Obtain the bodies (implementations) for validator methods for members defined as simpleType from files in the directory specified by <path>. The name of each file in that directory should be the same as the simpleType name, with an optional ".py" extension. If a file is not provided for a given type, an empty body (pass) is generated. In these files, lines with "##" in the first two columns are ignored and are not inserted.
--use-old-getter-setter
generateDS.py now generates getter and setter methods (for variable "abc", for example) with the names get_abc() and set_abc(), which I believe is a more Pythonic style, instead of getAbc() and setAbc(), which was the old behavior. Use this flag to generate getters and setters in the old style (getAbc() and setAbc()).
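
The automatic detection of the namespace prefix described under the -a flag can be approximated as follows. This is a minimal sketch, not the code used inside generateDS.py; the function name find_schema_prefix is hypothetical.

```python
from xml.dom import minidom

XSD_NAMESPACE = 'http://www.w3.org/2001/XMLSchema'


def find_schema_prefix(schema_text, default='xs:'):
    """Return the prefix (with trailing colon) bound to the XML Schema namespace."""
    root = minidom.parseString(schema_text).documentElement
    attrs = root.attributes
    for i in range(attrs.length):
        attr = attrs.item(i)
        # Namespace declarations appear as attributes named "xmlns:<prefix>".
        if attr.name.startswith('xmlns:') and attr.value == XSD_NAMESPACE:
            return attr.name[len('xmlns:'):] + ':'
    return default


schema = '<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"/>'
print(find_schema_prefix(schema))
```

When no matching declaration is found, the sketch falls back to the documented default, 'xs:'.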

4.2   Name conflicts

4.2.1   Conflicts with Python keywords

In some cases the element and attribute names in an XML document will conflict with Python keywords. In order to avoid these clashes, generateDS.py contains a table that maps names that might clash to acceptable names. This table is a Python dictionary named NameTable. The user can modify existing entries and add additional name-replacement pairs to this table, for example, if new conflicts occur.
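
A name-replacement table of this kind might look like the sketch below. Only the name NameTable comes from generateDS.py; the entries shown, the clean_name helper, and the fallback for keywords missing from the table are assumptions made for illustration.

```python
import keyword

# Hypothetical entries; the actual NameTable in generateDS.py may differ.
NameTable = {
    'class': 'klass',
    'import': 'emport',
    'type': 'ttype',
}


def clean_name(name):
    """Map an XML element or attribute name to a usable Python name."""
    if name in NameTable:
        return NameTable[name]
    if keyword.iskeyword(name):
        # Assumed fallback for keywords that are missing from the table.
        return name + '_'
    return name


# A user could add a new pair when a new conflict turns up.
NameTable['global'] = 'global_value'

for xml_name in ('class', 'global', 'lambda', 'address'):
    print('%s -> %s' % (xml_name, clean_name(xml_name)))
```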

4.2.2   Conflicts between child elements and attributes

In some cases the name of a child element and the name of an attribute will be the same. (I believe, but am not sure, that this is allowed by XML Schema.) Since generateDS.py treats both child elements and attributes as members of the generated class, this is a name conflict. Therefore, where such conflicts exist, generateDS.py modifies the name of the attribute by adding "_attr" to its name.
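
The renaming rule can be illustrated with a few lines of Python. The helper resolve_member_names is hypothetical and only demonstrates the "_attr" suffix; it is not taken from generateDS.py.

```python
def resolve_member_names(child_names, attribute_names):
    """Return attribute member names, renaming those that clash with a child."""
    children = set(child_names)
    return [name + '_attr' if name in children else name
            for name in attribute_names]


# The element has children "name" and "address" and attributes "id" and "name".
print(resolve_member_names(['name', 'address'], ['id', 'name']))
```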

4.3   Supported features of XML Schema

The following constructs in the XML Schema are supported:

  • Attributes of types xs:string, xs:integer, xs:float, and xs:boolean.
  • Repeated sub-elements specified with maxOccurs="unbounded".
  • Sub-elements of simple types xs:string, xs:integer, and xs:float.
  • Sub-elements of complex types defined separately in the XML Schema document.

See file people.xsd for examples of the definition of data types and structures. Also see the section on The XML Schema Input to generateDS.

4.3.1   Attributes + no nested children

Element definitions that contain attributes but no nested child elements provide access to their data content through getter and setter methods getValueOf_ and setValueOf_ and member variable valueOf_.

4.3.2   Mixed content

Elements that are defined to contain both text and nested child elements have "mixed content". generateDS.py provides access to mixed content, but the generated data structures (classes) are fundamentally different from that generated for other elements. See section Mixed content for more details.

Note that elements defined with attributes but with no nested sub-elements do not need to be declared as "mixed". For these elements, character data is captured in a member variable valueOf_, and can be accessed with member methods getValueOf_ and setValueOf_.

4.3.3   anyAttribute

generateDS.py supports anyAttribute. For example, if an element is defined as follows:

<xs:element name="Tool">
   <xs:complexType>
      <xs:attribute name="PartNumber" type="xs:string" />
      <xs:anyAttribute processContents="skip" />
   </xs:complexType>
</xs:element>

Then generateDS.py will generate a class with a member variable anyAttributes_ containing a dictionary. Any attributes found in the instance XML document that are not explicitly defined for this element will be stored in this dictionary. generateDS.py also generates getters and setters as well as code for parsing and export. generateDS.py ignores processContents. See section anyAttribute for more details.
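
The effect can be sketched with a hand-written class. This is not generator output: the getter name get_anyAttributes_ and the body of build are assumptions, and only the member name anyAttributes_ is taken from the description above.

```python
from xml.dom import minidom


class Tool(object):
    """Hand-written stand-in for the class generated for element Tool."""

    def __init__(self, PartNumber=None):
        self.PartNumber = PartNumber
        self.anyAttributes_ = {}

    def get_anyAttributes_(self):
        return self.anyAttributes_

    def build(self, node):
        attrs = node.attributes
        for i in range(attrs.length):
            attr = attrs.item(i)
            if attr.name == 'PartNumber':
                # Explicitly defined attribute: stored in its own member.
                self.PartNumber = attr.value
            else:
                # Anything else is collected in the dictionary.
                self.anyAttributes_[attr.name] = attr.value


doc = minidom.parseString('<Tool PartNumber="A-17" color="red" weight="3"/>')
tool = Tool()
tool.build(doc.documentElement)
print(tool.PartNumber, sorted(tool.get_anyAttributes_().items()))
```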

4.3.4   Element extensions

generateDS.py now generates subclasses for extensions, that is when an element definition contains something like this:

<xs:extension base="sometag">

Limitation -- There is an important limitation, however: member names duplicated (overridden ?) in an extension generate erroneous code. Sigh. I guess I needed something more to do.

Several of the generated methods have been refactored so that subclasses can reuse the code in their superclasses. Take a look at the generated code to learn how to use it.

The Python compiler/interpreter requires that it has seen a superclass before it sees the subclass that uses it. Because of this, generateDS.py delays generating a subclass until after its superclass has been generated. Therefore, the order in which classes are generated may be different from what you expect.
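
The delayed generation can be pictured as a simple ordering pass. The sketch below is an illustration of the idea only, not the algorithm used in generateDS.py; the function order_classes and the class names are hypothetical.

```python
def order_classes(bases):
    """Order class names so that each base class precedes its subclasses.

    bases maps each class name to the name of its base class, or to None.
    """
    ordered = []
    pending = list(bases)
    while pending:
        deferred = []
        for name in pending:
            base = bases[name]
            if base is None or base in ordered or base not in bases:
                ordered.append(name)
            else:
                deferred.append(name)   # superclass not generated yet
        if len(deferred) == len(pending):
            raise ValueError('circular extension among: %s' % deferred)
        pending = deferred
    return ordered


# BType extends AType and CType extends BType, but AType is defined last.
print(order_classes({'BType': 'AType', 'CType': 'BType', 'AType': None}))
```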

4.3.5   Attribute groups

generateDS.py now handles definition and use of attribute groups. For example, a definition like the following:

<xs:attributeGroup name="favorites">
    <xs:attribute name="fruit" />
    <xs:attribute name="vegetable" />
</xs:attributeGroup>

And, a reference or use like the following:

<xs:element name="person">
    <xs:complexType mixed="0">
        <xs:attributeGroup ref="favorites" />
        o
        o
        o

Results in generation of class person that contains members fruit and vegetable.

4.3.6   Substitution groups

generateDS.py now handles a limited range of substitution groups. There is an important limitation, however: generateDS.py handles substitution groups that involve complex types, but does not handle those that involve (substitute for) simple types (for example, xs:string, xs:integer, etc.). This is because the code generated for members defined as simple types does not provide the information needed to handle substitution groups.

4.3.7   Primitive types

generateDS.py supports some, but not all, simple types defined in "XML Schema Part 0: Primer Second Edition" (http://www.w3.org/TR/xmlschema-0/; see section "Simple Types" and appendix B). Validation is performed for some simple types. When performed, validation is done while the XML document is being read and instances are created.

Here is a list of supported simple types:

  • xs:string -- No validation.
  • xs:token -- No validation. White space between tokens is coerced to a single blank between tokens.
  • xs:integer, xs:short, xs:long -- All treated the same. Checked for valid integer.
  • xs:float, xs:double, xs:decimal -- All treated the same. Checked for valid float.
  • xs:positiveInteger -- Checked for valid range (> 0).
  • xs:nonPositiveInteger -- Checked for valid range (<= 0).
  • xs:negativeInteger -- Checked for valid range (< 0).
  • xs:nonNegativeInteger -- Checked for valid range (>= 0).
  • xs:date, xs:dateTime -- All treated the same. No validation.
  • xs:boolean -- Checked for one of 0, false, 1, true.
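
A few of the checks listed above can be written out as plain functions. This is a sketch of equivalent checks, not the code that generateDS.py emits; the function names are hypothetical, and the token helper also strips leading and trailing white space, which is an assumption.

```python
def validate_boolean(text):
    """xs:boolean: accept only 0, false, 1, true."""
    if text not in ('0', 'false', '1', 'true'):
        raise ValueError('invalid xs:boolean value: %r' % text)
    return text in ('1', 'true')


def validate_positive_integer(text):
    """xs:positiveInteger: a valid integer in the range > 0."""
    value = int(text)    # raises ValueError if text is not an integer
    if value <= 0:
        raise ValueError('xs:positiveInteger must be > 0: %r' % text)
    return value


def normalize_token(text):
    """xs:token: collapse white space between tokens to a single blank."""
    return ' '.join(text.split())


print(validate_boolean('true'), validate_positive_integer('7'))
print(repr(normalize_token('red   green\n blue')))
```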

4.3.8   simpleType

generateDS.py generates minimal support for members defined as simpleType. However, the code generated by generateDS.py does not enforce restrictions. For notes on how to enforce restrictions, see section simpleType and validators.

A simpleType can be a restriction on a primitive type or on a defined element type. So, for example, the following will generate valid code:

<xs:element name="percent">
    <xs:simpleType>
        <xs:restriction base="xs:integer">
            <xs:minInclusive value="1"/>
            <xs:maxInclusive value="100"/>
        </xs:restriction>
    </xs:simpleType>
</xs:element>

And, the following will also generate valid code:

<xs:simpleType name="emptyString">
    <xs:restriction base="xs:string">
        <xs:whiteSpace value="collapse"/>
    </xs:restriction>
</xs:simpleType>

<xs:element name="merge">
    <xs:complexType>
        <xs:simpleContent>
            <xs:extension base="emptyString">
                <xs:attribute name="fromTag" type="xs:string"/>
                <xs:attribute name="toTag" type="xs:string"/>
            </xs:extension>
        </xs:simpleContent>
    </xs:complexType>
</xs:element>

4.3.9   simpleType and validators

Here are a few notes that should help you use validator methods to enforce restrictions.

  • Default behavior -- The generated code, by default, treats the value of a member whose type is a simpleType as if it were declared as type xs:string.

  • Validator method stubs -- For a member variable declared as a simpleType named X, a validator method validate_X is generated. Example -- from:

    <xs:simpleType name="tAnyName">
        <xs:restriction base="xs:string"/>
    </xs:simpleType>
    

    The class generated by generateDS.py will contain the following method definition:

    def validate_tAnyName(self, value):
        # Validate type tAnyName, a restriction on xs:string.
        pass
    
  • Calls to validator methods -- For a member variable declared as a simpleType X, a call to validate_X is added to the build method. Example -- from:

    <xs:element name="person">
        <xs:complexType mixed="0">
            <xs:sequence>
                <xs:element name="test2" type="tAnyName"/>
            </xs:sequence>
        </xs:complexType>
    </xs:element>
    

    generateDS.py produces the following call:

    self.validate_tAnyName(self.test2)    # validate type tAnyName
    
  • Code bodies for validator methods can be added either (1) manually or (2) automatically from an external source. See command line flag --validator-bodies and see below.

You can add code to the validator method stub to enforce the restriction for the base type and further restrictions imposed on that base type. This can be done in the following ways:

  1. Add code manually after generation. I recommend that you use the -s command line flag and override the validator method in the resulting subclass file.
  2. Or, supply code bodies (implementations) in an external source and ask generateDS.py to insert those code bodies into generated validator methods. Here are notes on how to do this:
    • Use the --validator-bodies=path command line flag to specify a directory.
    • In that directory, provide one file for each simpleType. The name of the file should be the same as the name of the simpleType with an optional extension ".py". generateDS.py looks for a file named type_name.py, first, and if not found, looks for a file named type_name.
    • If the --validator-bodies=path is not on the command line or neither type_name.py nor type_name is found, an empty body (a pass statement) is generated.
    • Lines from the file are inserted as is, except that lines containing "##" in the first two columns are omitted. Note that you will need to provide the correct indentation for a method in a class, specifically 8 spaces.
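
The lookup rules for --validator-bodies can be summarized in code. The following is a sketch under the rules stated above, not the implementation in generateDS.py; the function load_validator_body and the sample body file are hypothetical.

```python
import os
import tempfile


def load_validator_body(directory, type_name):
    """Return the body for validate_<type_name>, or a pass statement."""
    if directory:
        # Look for type_name.py first, then for type_name.
        for filename in (type_name + '.py', type_name):
            path = os.path.join(directory, filename)
            if os.path.isfile(path):
                with open(path) as infile:
                    # Lines with "##" in the first two columns are omitted.
                    lines = [line for line in infile
                             if not line.startswith('##')]
                return ''.join(lines)
    return '        pass\n'


# A temporary directory stands in for the --validator-bodies path.
bodies_dir = tempfile.mkdtemp()
with open(os.path.join(bodies_dir, 'tAnyName.py'), 'w') as outfile:
    outfile.write('## Body for validate_tAnyName (this line is dropped).\n')
    outfile.write('        if not value.strip():\n')
    outfile.write("            raise ValueError('empty name')\n")

print(load_validator_body(bodies_dir, 'tAnyName'))
print(load_validator_body(bodies_dir, 'missingType'))
```

Note that the sample body file already carries the 8 spaces of indentation needed for a method body in a class.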

The support for simpleType in generateDS.py has the following limitations (among others, I'm sure):

  • It only works for simpleType defined with and referenced through a name. It does not work for "in-line" definitions. So, for example, the following works:

    <xs:element name="person">
        <xs:complexType>
            <xs:sequence>
                <xs:element name="test3" type="tAnyName"/>
            </xs:sequence>
        </xs:complexType>
    </xs:element>
    
    <xs:simpleType name="tAnyName">
        <xs:restriction base="xs:string"/>
    </xs:simpleType>
    

    But, the following does not work:

    <xs:element name="person">
        <xs:complexType>
            <xs:sequence>
                <xs:element name="test3">
                    <xs:simpleType name="tAnyName">
                        <xs:restriction base="xs:string"/>
                    </xs:simpleType>
                </xs:element>
            </xs:sequence>
        </xs:complexType>
    </xs:element>
    
  • Attributes defined as a simple type are not supported.

5   The XML Schema Input to generateDS

generateDS.py actually accepts a subset of XML Schema. The sample XML Schema file should give you a picture of how to describe an XML file and the Python classes that you will generate. And here are some notes that should help:

Here are a few additional rules that will help you to write XML Schema files for generateDS.py:

5.1   Additional constructions

Here are a few additional constructions that generateDS.py understands.

5.1.1   <complexType> at top-level

You can use the <complexType> element at top level (instead of <element>) to define an element. So, for example, instead of:

<xs:element name="server-type">
    <xs:complexType>
        <xs:sequence>
            <xs:element name="server-name" type="xs:string"/>
            <xs:element name="server-description" type="xs:string"/>
        </xs:sequence>
    </xs:complexType>
</xs:element>

you can use the following, which is equivalent:

<xs:complexType name="server-type">
    <xs:sequence>
        <xs:element name="server-name" type="xs:string"/>
        <xs:element name="server-description" type="xs:string"/>
    </xs:sequence>
</xs:complexType>

5.1.2   Use of "ref" instead of "name" and "type" attributes

You can use the "ref" attribute to refer to another element definition, instead of using the "name" and "type" attributes. So, for example, you can use the following:

<xs:element name="server-info">
    <xs:complexType>
        <xs:sequence>
            <xs:element name="server-comment" type="xs:string"/>
            <xs:element ref="server-type" />
        </xs:sequence>
    </xs:complexType>
</xs:element>

in place of this:

<xs:element name="server-info">
    <xs:complexType>
        <xs:sequence>
            <xs:element name="server-comment" type="xs:string"/>
            <xs:element name="server-type" type="server-type"/>
        </xs:sequence>
    </xs:complexType>
</xs:element>

5.1.3   Extension types

generateDS.py generates a subclass for each element that is defined as the extension of a base element. So, for the following:

<xs:complexType name="BType">
    <xs:complexContent>
        <xs:extension base="AType">
            <xs:sequence>
                o
                o
                o

generateDS.py will generate something like the following:

class BType(AType):
    o
    o
    o

5.1.4   Elements containing mixed content

generateDS.py generates special code to handle elements defined as containing mixed content, that is, elements defined with attribute mixed="true". See section Mixed content for more details.

6   XMLBehaviors

With the use of the "-b" command line flag, generateDS.py will also accept as input an XML document instance that describes behaviors to be added to subclasses when the subclass file is generated with the "-s" command line switch.

An example is provided in the Demos/Xmlbehavior sub-directory of the distribution.

The XMLBehaviors capability in generateDS.py was inspired and, for the most part, designed by gian paolo ciceri (gp.ciceri@suddenthinks.com). This work is part of our work on our application development project for Quixote.

6.1   The XMLBehaviors input file

This section describes the XMLBehavior XML document that is used as input to generateDS.py. The XMLBehavior XML document is an XML instance document (given as an argument to the "-b" command line flag) that describes behaviors (methods) to be added to class definitions in the subclass file (generated with the "-s" command line flag).

See file xmlbehavior_po.xml in the Demos/Xmlbehavior directory in the distribution for an example that you can use as a model.

The elements in the XMLBehavior document type are the following:

  • <xb:xml-behavior> -- The base element in the document.
    • <xb:base-impl-url> -- The root (left-most portion) of URL containing implementation bodies. Implementation URLs are appended to this base URL.
    • <xb:behaviors> -- A list of behaviors.
      • <xb:behavior> -- Describes a single XMLBehavior.
        • <xb:class> -- The name of the class to which this behavior is to be added.
        • <xb:name> -- The name of the behavior/method. Must conform to Python name syntax.
        • <xb:args> -- A list of arguments to the behavior/method.
          • <xb:arg> -- A positional argument to the method.
            • <xb:name> -- The name of the argument.
            • <xb:data-type> -- The data-type of the argument.
        • <xb:return-type> -- The data-type of the value returned by the behavior/method.
        • <xb:impl-url> -- The URL of the implementation body. This value will be concatenated to the right-hand side of the base-impl-url.
        • <xb:ancillaries> -- A list of ancillary behaviors/methods. Each ancillary has a role, which defines how it is to be used.
          • <xb:ancillary> -- A specification of an ancillary behavior/method.
            • <xb:name> -- The name of the behavior/method. Must conform to Python name syntax.
            • <xb:role> -- The method's role. The following values are supported:
              • "DBC-precondition" -- A Design By Contract-style pre-condition check. This method will be called before the core behavior/method itself.
              • "DBC-postcondition" -- A Design By Contract-style post-condition check. This method will be called after the core behavior/method itself.
            • <xb:args> -- A list of arguments to the ancillary behavior/method. The element has the same content as the <xb:args> element for the core behavior/method.
            • <xb:return-type> -- The data-type of the value returned by the behavior/method.
            • <xb:impl-url> -- The URL of the implementation body. This value will be concatenated to the right-hand side of the base-impl-url.
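
The calling order implied by the two roles can be illustrated with a hand-written class. This document does not show the code that generateDS.py emits for ancillaries, so everything below (class name, method names, and the way the checks are wired together) is a hypothetical illustration of the "before" and "after" semantics only.

```python
class PurchaseOrderSub(object):
    """Hypothetical subclass with one behavior and two ancillaries."""

    def __init__(self, items):
        self.items = list(items)
        self.log = []

    # Ancillary with role "DBC-precondition" (hypothetical name).
    def add_item_precondition(self, item):
        self.log.append('pre')
        return bool(item)

    # Ancillary with role "DBC-postcondition" (hypothetical name).
    def add_item_postcondition(self, item):
        self.log.append('post')
        return item in self.items

    # Core behavior; its body stands in for one fetched from <xb:impl-url>.
    def add_item_core(self, item):
        self.log.append('core')
        self.items.append(item)

    def add_item(self, item):
        # Pre-condition runs before the core method, post-condition after it.
        if not self.add_item_precondition(item):
            raise ValueError('pre-condition failed')
        result = self.add_item_core(item)
        if not self.add_item_postcondition(item):
            raise ValueError('post-condition failed')
        return result


po = PurchaseOrderSub(['pen'])
po.add_item('ink')
print(po.log)
```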

6.2   Implementing other sources for implementation bodies

generateDS.py contains a function get_impl_body() that implements the ability to retrieve implementation bodies. The current implementation retrieves implementation bodies from an Internet Web URL. Other sources for implementation bodies can be implemented by modifying get_impl_body().

As an example, the version that follows first tries to retrieve an implementation body from a Web address. If that fails, it attempts to obtain the implementation body from a file in the local file system, using <xb:base-impl-url> as the path to a directory of files, each of which contains one implementation body, and <xb:impl-url> as the file name. This implementation of get_impl_body was provided by Colin Dembovsky of Systemsfusion Inc. Thanks, Colin. (I've included it in the generateDS.py script, but commented out, for those who want to use and possibly extend it.) Here is that version:

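import urllib2    # Python 2 standard library

def get_impl_body(classBehavior, baseImplUrl, implUrl):
    # Default body, used when no implementation can be found.
    impl = '        pass\n'
    if implUrl:
        trylocal = 0
        if baseImplUrl:
            implUrl = '%s%s' % (baseImplUrl, implUrl)
        # First, try to retrieve the body from a Web address.
        try:
            implFile = urllib2.urlopen(implUrl)
            impl = implFile.read()
            implFile.close()
        except Exception:
            trylocal = 1
        # If that fails, treat the URL as a path in the local file system.
        if trylocal:
            try:
                implFile = open(implUrl)
                impl = implFile.read()
                implFile.close()
            except Exception:
                print '*** Implementation at %s not found.' % implUrl
    return impl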
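
For readers on Python 3, the same fallback logic might be written as shown below. This is a sketch, not part of generateDS.py: it assumes urllib.request in place of urllib2 and UTF-8 encoded bodies, and the temporary directory and file name in the demonstration are hypothetical.

```python
import os
import tempfile
import urllib.request


def get_impl_body(classBehavior, baseImplUrl, implUrl):
    impl = '        pass\n'
    if implUrl:
        if baseImplUrl:
            implUrl = '%s%s' % (baseImplUrl, implUrl)
        try:
            # First, try to retrieve the body from a Web address.
            with urllib.request.urlopen(implUrl) as implFile:
                impl = implFile.read().decode('utf-8')
        except (ValueError, OSError):
            # Not a fetchable URL: treat it as a path in the local file system.
            try:
                with open(implUrl) as implFile:
                    impl = implFile.read()
            except OSError:
                print('*** Implementation at %s not found.' % implUrl)
    return impl


# Demonstration with a local directory used as the base "URL".
base = tempfile.mkdtemp() + os.sep
with open(base + 'describe.py', 'w') as outfile:
    outfile.write("        return 'described'\n")

print(get_impl_body(None, base, 'describe.py'))
```

A plain file path is rejected by urlopen before any network access is attempted, so the local fallback also works when no connection is available.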

7   How-to Use the Generated Source Code

7.1   The parsing functions

The simplest use is to call one of the parsing functions in the generated source file. You may be able to use one of these functions without change, or can modify one to fit your needs. generateDS.py generates the following parsing functions:

  • parse -- Parse an XML document from a file.
  • parseString -- Parse an XML document from a string.

These parsing functions are generated in both the superclass and the subclass files. Note the call to the export method. You may need to comment out or un-comment this call to export according to your needs.

For example, if the generated source is in people.py, then, from the command line, run something like the following:

python people.py people.xml

Or, from within other Python code, use something like the following:

import people
rootObject = people.parse('people.xml')

7.2   The export methods

The generated classes contain methods export and exportLiteral which can be called to export classes to several text formats, in particular to an XML instance document and a Python module containing Python literals. See the generated parse functions for examples showing how to call the export methods.

7.2.1   Method export

The export method in generated classes writes out an XML document that represents the instance that contains it and its child elements. So, for example, if your instance tree was created by one of the parsing functions described above, then calling export on the root element should reproduce the input XML document, differing only with respect to ignorable white space.
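
The recursive nature of export can be sketched with a hand-written class. The signature export(outfile, level) and the members shown are assumptions made for illustration; consult the generated code for the actual method.

```python
import io
from xml.sax.saxutils import escape


class person(object):
    """Hand-written stand-in for a generated class (hypothetical members)."""

    def __init__(self, name='', friends=None):
        self.name = name
        self.friends = friends or []

    def export(self, outfile, level=0):
        indent = '    ' * level
        outfile.write('%s<person>\n' % indent)
        outfile.write('%s    <name>%s</name>\n' % (indent, escape(self.name)))
        # Each child element exports itself one level deeper.
        for friend in self.friends:
            friend.export(outfile, level + 1)
        outfile.write('%s</person>\n' % indent)


root = person('Alice', [person('Bob')])
out = io.StringIO()
root.export(out)
print(out.getvalue())
```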

7.2.2   Method exportLiteral

generateDS.py generates Python classes that represent the elements in an XML document, given an XML Schema definition of the XML document type. The exportLiteral method will export a Python literal representation of the Python instances of the classes that represent an XML document.

7.2.2.1   What It Does

When generateDS.py generates the Python source code for your classes, it also generates an exportLiteral method in each class. If you call this method on the root (top-most) object, it will write out a literal representation of your class instances as Python code.

generateDS.py also generates a function at top level (parseLiteral) that parses an XML document and calls the "exportLiteral" method on the root object to write the data structure (instances of your generated classes) as a Python module that you can import to (re-)create instances of the classes that represent your XML document.

7.2.2.2   Why You Might Care

generateDS.py was designed and built with the assumption that we are not interested in marking up text content at all. What we really want is a way to represent structured and nested data in text. It takes the statement "I want to represent nested data structures in text" entirely seriously. Given that assumption, there may be times when you want a more "Pythonic" textual representation of the Python data structures for which generateDS.py has generated code. exportLiteral enables you to produce that representation.

This feature means that the classes that you generate from an XML schema support the interchangeability of XML and Python literals. This means that, given classes generated by generateDS.py for your XML document type, you can perform the following transformations:

  • Translate an XML document into a Python module containing a literal definition of the contents of the XML document.
  • Translate the literal definition of a Python data structure into an XML instance document.

This capability enables you to:

  • Work with an XML (text) document, then exchange it for a Python text representation of the content of that document.
  • Work with a Python literal text representation of your XML document, then exchange that for an XML document that represents the same content.
  • "Freeze" your XML document as a Python module that you can import. The module can be edited with your text editor, so perhaps it would be better to say that it is frozen, but not too hard. The classes that you generate with generateDS.py can be used to:
    1. Read in an XML document.
    2. (Optionally) modify the Python instances that represent that XML document.
    3. Write the instances out as a Python module that you can later import.

7.2.2.3   How to use it

See the generated function parseLiteral for an example of how to use exportLiteral.
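
The idea behind exportLiteral can also be shown with a small stand-in class. The signature, the members, and the variable name rootObj are assumptions made for illustration; the literal format written by the generated code may differ.

```python
import io


class person(object):
    """Hand-written stand-in for a generated class (hypothetical members)."""

    def __init__(self, name='', friends=None):
        self.name = name
        self.friends = friends or []

    def exportLiteral(self, outfile, level=0):
        indent = '    ' * level
        outfile.write('%sperson(\n' % indent)
        outfile.write('%s    name=%r,\n' % (indent, self.name))
        outfile.write('%s    friends=[\n' % indent)
        for friend in self.friends:
            friend.exportLiteral(outfile, level + 2)
            outfile.write(',\n')
        outfile.write('%s    ],\n' % indent)
        outfile.write('%s)' % indent)


root = person('Alice', [person('Bob')])
out = io.StringIO()
out.write('rootObj = ')
root.exportLiteral(out)
out.write('\n')
module_text = out.getvalue()
print(module_text)

# Executing the literal text re-creates equivalent instances.
namespace = {'person': person}
exec(module_text, namespace)
copy = namespace['rootObj']
print(copy.name, copy.friends[0].name)
```

In practice the literal text would be saved as a module and imported, rather than passed to exec as in this demonstration.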

7.3   Building instances

If you have an instance of a minidom node that represents an element in an XML document, you can also use the 'build' member function to populate an instance of the corresponding class. Here is an example:

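from xml.dom import minidom
from xml.dom import Node
from people import person    # assumes the generated module is people.py

inFileName = 'people.xml'
doc = minidom.parse(inFileName)
# documentElement skips any comments that come before the root element.
rootNode = doc.documentElement
people = []
for child in rootNode.childNodes:
    if child.nodeType == Node.ELEMENT_NODE and child.nodeName == 'person':
        obj = person()
        obj.build(child)
        people.append(obj)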
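
To try this pattern without a generated module, a hand-written class with a build method can stand in for the generated one. The class below is an assumption made for illustration (its members and the body of build are hypothetical); the loop is the same as in the example above.

```python
from xml.dom import minidom
from xml.dom import Node


class person(object):
    """Hand-written stand-in for a generated class."""

    def __init__(self):
        self.id = None
        self.name = ''

    def build(self, node):
        # Attributes and child elements both become members of the instance.
        self.id = node.getAttribute('id') or None
        for child in node.childNodes:
            if child.nodeType == Node.ELEMENT_NODE and child.nodeName == 'name':
                self.name = ''.join(
                    text.data for text in child.childNodes
                    if text.nodeType == Node.TEXT_NODE)


xml_text = (
    '<people>'
    '<person id="1"><name>Alice</name></person>'
    '<person id="2"><name>Bob</name></person>'
    '</people>'
)
doc = minidom.parseString(xml_text)
people = []
for child in doc.documentElement.childNodes:
    if child.nodeType == Node.ELEMENT_NODE and child.nodeName == 'person':
        obj = person()
        obj.build(child)
        people.append(obj)

print([(p.id, p.name) for p in people])
```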

7.4   Using the subclass module

If you choose to use the generated subclass module, and I encourage you to do so, you may need to edit and modify that file. Here are some of the things that you must do (look for "???"):

  • Edit the import statement at the top of the file. It should import the generated superclass file. Note that you can also use the --super= command line flag to insert this automatically.
  • Edit the USAGE_TEXT string so that it gives a help message appropriate for your use.
  • Edit the main function toward the bottom of the file. It should call a method of the root subclass, possibly one that you have added.

You can also (and most likely will want to) add methods to the generated classes. See the section How-to Modify the Generated Code for more on this.

The classes generated from each element definition provide getter and setter methods to access its attributes and child elements.

Elements that are referenced but not defined (i.e. that are simple, for example strings, integers, floats, and booleans) are accessed through getter and setter methods in the class in which they are referenced.
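
As a rough sketch of the pattern described in this section, the following uses a hand-written stand-in for a generated superclass; in real use that class would be imported from the generated superclass module. The names person, personSub, getName, getInterest, addInterest, and describe are assumptions chosen for illustration, and the accessor names in your generated code depend on your schema.

```python
# Hand-written stand-in for a generated superclass.  In real use, this
# class would come from the generated superclass module.
class person(object):
    def __init__(self, name='', interest=None):
        self.name = name
        self.interest = interest or []
    def getName(self): return self.name
    def setName(self, name): self.name = name
    def getInterest(self): return self.interest
    def addInterest(self, value): self.interest.append(value)


# Subclass of the kind found in the subclass module, extended with an
# application-specific method (describe is a made-up name).
class personSub(person):
    def describe(self):
        # Use the getters rather than touching member variables directly.
        return '%s: %s' % (self.getName(), ', '.join(self.getInterest()))


p = personSub(name='Alberta')
p.addInterest('gardening')
p.addInterest('reading')
print(p.describe())
```

Keeping describe in the subclass means the superclass file can be regenerated without losing the application code.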

7.5   Elements with attributes but no nested children

Element definitions that contain attributes but no nested child elements provide access to their data content through getter and setter methods getValueOf_ and setValueOf_ and member variable valueOf_.
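
The following is a minimal sketch of how such a class behaves, not the code that generateDS.py actually emits. The price element, its currency attribute, and the body of build are assumptions made up for the example; only valueOf_, getValueOf_, and setValueOf_ come from the description above.

```python
from xml.dom import minidom, Node

# Hand-written stand-in for the class generated for an element such as
# <price currency="USD">12.50</price> (attributes, no child elements).
class price(object):
    def __init__(self, currency='', valueOf_=''):
        self.currency = currency
        self.valueOf_ = valueOf_
    def getCurrency(self): return self.currency
    def getValueOf_(self): return self.valueOf_
    def setValueOf_(self, valueOf_): self.valueOf_ = valueOf_
    def build(self, node_):
        # The attribute becomes an ordinary member ...
        self.currency = node_.getAttribute('currency')
        # ... and the character data is collected into valueOf_.
        text = ''
        for child in node_.childNodes:
            if child.nodeType == Node.TEXT_NODE:
                text += child.nodeValue
        self.valueOf_ = text


doc = minidom.parseString('<price currency="USD">12.50</price>')
obj = price()
obj.build(doc.documentElement)
print(obj.getCurrency(), obj.getValueOf_())
```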

7.6   Mixed content

The goal of generateDS.py is to support data structures represented in XML, as opposed to text mark-up. However, it does provide some support for mixed content. For mixed content, though, the data structures and code generated by generateDS.py are fundamentally different from those generated for elements that do not contain mixed content.

There are limitations, of course. A known limitation is related to extension elements. Specifically, if an element contains mixed content, and this element extends a base class, then the base class and any classes it extends must be defined to contain mixed content. This is due to the fact that generateDS.py generates a data structure (class) for elements containing mixed content that is fundamentally different from that generated for other elements.

Here is an example of mixed content:

<note>This is a <bold>nice</bold> comment.</note>

When an element is defined with something like the following:

<xs:complexType mixed="true">
    <xs:sequence>
        o
        o
        o

then, instead of generating a class whose named members refer to nested elements, a class containing a list of instances of class MixedContainer is generated. In order to process the content of a mixed content element, the code you write will need to walk this list of instances of MixedContainer and check the type of each item in that list. Basically, the structure becomes more DOM-like in the sense that it has a list of children, rather than named fields.

Instances of MixedContainer have the following methods:

  • getCategory -- Returns one of the following, depending on the content:
    • CategoryText -- Text content.
    • CategorySimple -- Simple elements, that is, elements defined as xs:string, xs:integer, etc. For these, the member variable content_type, accessible through method getContenttype, will contain one of TypeString, TypeInteger, TypeFloat, TypeDecimal, TypeDouble, or TypeBoolean.
    • CategoryComplex -- Complex elements represented by a generated class. For these, the member variable name, accessible through method getName, holds the element/tag name, and the member variable value, accessible through method getValue, holds the instance.
  • getContenttype -- Returns one of TypeString, TypeInteger, TypeFloat, TypeDecimal, TypeDouble, or TypeBoolean. Valid only when category is CategorySimple.
  • getName -- For CategoryComplex, returns the name of the element.
  • getValue -- Returns the value of this chunk of content. Its type depends on the value returned by getCategory and getContenttype.

Note that elements defined with attributes but with no nested sub-elements do not need to be declared as "mixed". For these elements, character data is captured in a member variable valueOf_, and can be accessed with member methods getValueOf_ and setValueOf_.
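
Walking a list of MixedContainer instances might look roughly like the sketch below. MixedContainer is re-created here by hand so that the example runs on its own: the constructor signature and the numeric values of the constants are assumptions, as are the bold stand-in class and the collect_text helper. The list is also built by hand, since the name of the member that holds it in a generated class is not given here.

```python
# Hand-written stand-in for the generated MixedContainer class.  Only the
# parts needed for this sketch are reproduced; the numeric values of the
# constants are arbitrary.
class MixedContainer(object):
    CategoryText, CategorySimple, CategoryComplex = 1, 2, 3
    TypeString, TypeInteger = 2, 3
    def __init__(self, category, content_type, name, value):
        self.category = category
        self.content_type = content_type
        self.name = name
        self.value = value
    def getCategory(self): return self.category
    def getContenttype(self): return self.content_type
    def getName(self): return self.name
    def getValue(self): return self.value


# Hypothetical stand-in for a generated class for the <bold> element.
class bold(object):
    def __init__(self, valueOf_=''):
        self.valueOf_ = valueOf_
    def getValueOf_(self): return self.valueOf_


def collect_text(items):
    """Walk a list of MixedContainer instances and flatten it to text."""
    parts = []
    for item in items:
        category = item.getCategory()
        if category == MixedContainer.CategoryText:
            parts.append(item.getValue())
        elif category == MixedContainer.CategorySimple:
            # The value is a plain Python value; convert it to text.
            parts.append(str(item.getValue()))
        elif category == MixedContainer.CategoryComplex:
            # The value is an instance of a (stand-in) generated class.
            parts.append(item.getValue().getValueOf_())
    return ''.join(parts)


# Content corresponding to:
#   <note>This is a <bold>nice</bold> comment.</note>
content = [
    MixedContainer(MixedContainer.CategoryText, None, '', 'This is a '),
    MixedContainer(MixedContainer.CategoryComplex, None, 'bold', bold('nice')),
    MixedContainer(MixedContainer.CategoryText, None, '', ' comment.'),
]
print(collect_text(content))
```

The dispatch on getCategory is the essential part; what you do in each branch depends on your application.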

7.7   anyAttribute

For elements that specify anyAttribute, generateDS.py produces a class containing the following:

  • A member variable anyAttributes_ containing a Python dictionary. After parsing an XML instance document, this dictionary will contain name-value pairs for any attributes in the instance document not explicitly defined for that element.
  • The following getters and setters: getAnyAttributes_ and setAnyAttributes_.
  • Code to export the attribute names and values stored in the dictionary.
  • Code to parse attributes in addition to those explicitly defined for the element and store them in the dictionary.

Note: Attributes that are explicitly defined for an element are not stored in the dictionary anyAttributes_.

generateDS.py ignores the processContents attribute on the anyAttribute element in the XML Schema.
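
To make the behavior concrete, here is a small sketch written by hand; it is not the generated code. The item element, its declared id attribute, and the body of build are assumptions for illustration. Only anyAttributes_, getAnyAttributes_, and setAnyAttributes_ are taken from the description above.

```python
from xml.dom import minidom

# Hand-written stand-in for a class generated for an element that
# declares one attribute (id) and also allows anyAttribute.
class item(object):
    def __init__(self, id=None, anyAttributes_=None):
        self.id = id
        self.anyAttributes_ = anyAttributes_ or {}
    def getAnyAttributes_(self): return self.anyAttributes_
    def setAnyAttributes_(self, anyAttributes_):
        self.anyAttributes_ = anyAttributes_
    def build(self, node_):
        attrs = node_.attributes
        for i in range(attrs.length):
            attr = attrs.item(i)
            if attr.name == 'id':
                # Explicitly defined attributes get their own member.
                self.id = attr.value
            else:
                # Everything else is stored in the dictionary.
                self.anyAttributes_[attr.name] = attr.value


doc = minidom.parseString('<item id="a1" color="red" size="9"/>')
obj = item()
obj.build(doc.documentElement)
print(obj.id, sorted(obj.getAnyAttributes_().items()))
```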

8   How-to Modify the Generated Code

This section attempts to explain how to modify and add features to the generated code.

8.1   Adding features to class definitions

You can add new member definitions to a generated class. Look at the 'export' and 'exportLiteral' member functions for examples of how to access member variables and how to walk nested sub-elements.

Here are interesting places to look in each class definition:

  • The 'export' and 'exportLiteral' methods -- These methods walk the object tree. You can consider copying and renaming them to produce other tree walking methods.
  • The 'build' method -- This method extracts information from the minidom node. You can inspect the 'build' methods to learn how to extract information for other purposes.

And, if you need methods that are common to and shared by several of the generated subclasses, you can put them in a new class and add that class to the superclass list for each of your subclasses.
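
A minimal sketch of that approach follows. The classes person and company are hand-written stand-ins for generated superclasses, and LabelMixin and its label method are hypothetical names invented for this example.

```python
# Hand-written stand-ins for two generated superclasses.
class person(object):
    def __init__(self, name=''):
        self.name = name
    def getName(self): return self.name

class company(object):
    def __init__(self, name=''):
        self.name = name
    def getName(self): return self.name


# New class holding the methods shared by several subclasses.
class LabelMixin(object):
    def label(self):
        # Relies only on getName, which both stand-ins provide.
        return '%s(%s)' % (self.__class__.__name__, self.getName())


# Add the shared class to the superclass list of each subclass.
class personSub(person, LabelMixin):
    pass

class companySub(company, LabelMixin):
    pass


print(personSub('Alberta').label())
print(companySub('Acme').label())
```

Because the shared class depends only on accessors that each superclass already provides, it can be kept in a module of its own and imported by each subclass file.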

Although you can add your own methods to the generated superclasses, I recommend that you add methods to the generated subclasses in the subclass module generated with the -s command line flag, and then edit the subclass module in order to build your application. Why?

  • The superclasses are cluttered with other code. Using the subclass file enables you to keep your application code separate.
  • By putting your application code in the subclass file, you will be able to reuse the superclass file. You can generate multiple subclass files from the same XML Schema definition file. Each of these subclass files can import the same superclass file.

Here are some alternatives to using the subclass file:

  • Add more than one method to each generated (super-)class. Each method implements a separate task or "application". If the number of tasks grows, this will create maintenance difficulties, however.
  • Re-generate multiple (super-)class files. Add methods to the classes in these separate files to implement different tasks. This, of course, will not work well if, for example, you have modified the parser since generating the file.

9   Examples and Demonstrations

Under the directory Demos are several examples:

Suggested uses:

10   Limitations of generateDS

10.1   XML Schema limitations

There are things in XML Schema that are not supported. You will have to use a restricted subset of XML Schema to define your data structures. See above for supported features. See people.xsd and people.xml for examples.

And, then, try it on your XML Schema, and let me know about what does not work.

10.2   Large documents

Warning -- This section describes an optional generated SAX parser which, I believe, is currently broken for all but the simplest schemas. Generation of a SAX parser has not been updated for the latest changes to generateDS.py. In particular, when names of elements are reused (in different parent elements), the SAX parser becomes confused. Until I've been able to figure out how to fix this, you are advised not to use the SAX parser.

generateDS.py generates two kinds of parsers: one kind is based on SAX and the other is built on minidom. See the generated functions saxParse(), parse(), and parseString(). Using the SAX parser instead of the minidom parser should reduce memory requirements for large documents, since the minidom parser, but not the SAX parser, constructs a DOM tree for the entire document in memory.

However, both styles of parsers construct instances of the data structures generated by generateDS.py. This means that, even when the SAX parser is used, generateDS.py may not be well-suited for applications that read large XML documents, although what "large" means depends on your hardware. Notice that the minidom parsing functions (parse() and parseString()) over-write the variable doc so as to enable Python to reclaim the space occupied by the DOM tree, which may help alleviate the memory problem to some extent when the minidom parser is used.
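
The idea of overwriting the variable that refers to the DOM tree can be sketched as follows. This is an illustration under assumptions rather than the generated code: the people class is a hand-written stand-in, and parse_string is a hypothetical helper, not the generated parseString().

```python
from xml.dom import minidom, Node

# Hand-written stand-in for a generated root class.
class people(object):
    def __init__(self):
        self.names = []
    def build(self, node_):
        for child in node_.childNodes:
            if (child.nodeType == Node.ELEMENT_NODE and
                    child.nodeName == 'person'):
                self.names.append(child.getAttribute('name'))


def parse_string(in_string):
    doc = minidom.parseString(in_string)
    root_node = doc.documentElement
    root_obj = people()
    root_obj.build(root_node)
    # Overwrite the names that refer to the DOM tree; from here on only
    # the built instances are needed.
    root_node = None
    doc = None
    return root_obj


root = parse_string(
    '<people><person name="Alberta"/><person name="Bernardo"/></people>')
print(root.names)
```

Note that the DOM tree and the built instances still coexist in memory while build runs, so this reduces the footprint after parsing, not the peak during it.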

11   See Also

Python: The Python home page.

Dave's Page: My home page, which contains more Python stuff.