Data Mapping Techniques -- WDDX

Introduction

This paper explores a technique for exploiting the interchangeability between XML documents and Python data structures.

It is often useful to load an XML document into Python data structures so that they can be processed. The DOM interface performs this task. However, DOM data structures are specific to DOM. Often it is desirable to load the XML document into custom or application-specific data structures.

There are a number of approaches to this problem that come to mind:

In both of the above techniques, custom Python code is used to perform much of the conversion, plucking data values from the XML document and creating instances of the Python data structures. In effect, the developer's control over the conversion process is encoded in custom Python code.

This paper presents an alternative technique. It transforms the original XML document into an XML document in a canonical form, specifically WDDX, and then uses the (un-)marshaller that is distributed with PyXML to convert that document into Python objects. In effect, XSLT is used to customize the conversion: the developer's control over the conversion process is encoded in an XSLT stylesheet.

This technique has the following benefits:

Our interest in this paper is to provide help with implementing an equivalence between XML documents and Python data structures.

Note, however, that the technique we describe in this paper will work for any language for which there is an implementation of the marshaller/unmarshaller provided in generic.py (in PyXML).

The top level process is composed of the following steps:

  1. Use an XSLT processor and the stylesheet that you have written to transform the source XML document into an XML document of the form accepted by the class generic.Unmarshaller.

  2. Use generic.py (in the PyXML distribution) to load the generated/marshalled XML into Python data structures.

Here is sample code that performs this transformation:

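    # Python 2 code: PyXML and libxsltmod both predate Python 3.
    import generic      # the marshaller module from the PyXML distribution
    import libxsltmod   # Python binding for an XSLT processor

    class MsgHandler:
        """Receives the messages emitted by the XSLT processor."""
        def write(self, msg):
            print '***', msg

    def loadfile(inFileName, stylesheetFile):
        msgHandler = MsgHandler()
        # Step 1: transform the source document into the canonical form.
        # The 'f' flags presumably mark the next argument as a file name.
        s1 = libxsltmod.translate_to_string(
            'f', stylesheetFile,
            'f', inFileName,
            msgHandler)
        print s1
        # Step 2: unmarshal the canonical XML into Python objects.
        um = generic.Unmarshaller()
        ds = um.loads(s1)
        # show() is assumed to be a method of the application's own class.
        ds.show()
        return ds

Some notes about this code: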

A note on the term "WDDX" -- I don't believe that the XML documents that we are generating (for input to the generic.py Unmarshaller) follow the DTD for WDDX. That doesn't concern me much for our purposes here, since in this technique we are building documents for input to generic.py. However, if you plan to share or "syndicate" those documents, then you will want to pay attention to generating XML documents that obey a publicly known DTD. In the meantime, I believe that what this paper describes is in the "spirit" of WDDX in the sense that it marshals and unmarshals data structures in a way that is programming-language neutral. However, the occurrence of so many quasi-quotes in this paragraph should be a caution. As should paragraphs that refer to themselves.


Description of the canonical XML form

This technique generates XML that can be processed by the Unmarshaller in generic.py, which is included in the PyXML distribution.

This section describes the XML elements that we must generate. Effectively, we are describing the XML elements generated by class generic.Marshaller and function generic.dumps() and accepted by class generic.Unmarshaller and function generic.loads(). (generic.py is in the PyXML distribution.)

To create an instance of a class, generate something like the following:

    <object class="object_class_name" module="object_classes">
        <tuple/>
        <dictionary>
            <string>member_1</string>
            <string>value_1</string>
            <string>member_2</string>
            <string>value_2</string>
            o
            o
            o
        </dictionary>
    </object>

Here are a few things to notice about this generated XML:

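One point that is easy to miss is that the children of `<dictionary>` alternate between a member name and that member's value. The following is a minimal sketch of how such a document could be turned into an instance. It is written for Python 3 with the standard library's xml.etree.ElementTree as a stand-in for generic.Unmarshaller; it is not PyXML's code. The class `Person`, the members `name` and `city`, and the `build` helper are hypothetical. The sketch assumes that classes are looked up in a plain dictionary and it ignores the empty `<tuple/>`.

```python
import xml.etree.ElementTree as ET

# Hypothetical canonical document; Person, name, and city are invented.
CANONICAL = """
<object class="Person" module="object_classes">
    <tuple/>
    <dictionary>
        <string>name</string>
        <string>Alice</string>
        <string>city</string>
        <string>Oslo</string>
    </dictionary>
</object>
"""


class Person:
    """Stand-in for an application class defined in object_classes."""


def build(object_elem, classes):
    """Create an instance from one <object> element.

    Assumption: classes are looked up in a plain dict, and the empty
    <tuple/> is ignored, so __init__ is never called.
    """
    cls = classes[object_elem.get("class")]
    children = list(object_elem.find("dictionary"))
    # Children alternate: member name, member value, member name, ...
    members = {name.text: value.text
               for name, value in zip(children[0::2], children[1::2])}
    instance = cls.__new__(cls)
    instance.__dict__.update(members)
    return instance


person = build(ET.fromstring(CANONICAL), {"Person": Person})
print(person.name, person.city)
```

The even/odd slicing is the whole trick: every member costs exactly two child elements, so a stylesheet that emits a name without a value (or the reverse) shifts every later pair.
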
To create a list of objects, generate something like the following:

    <string>member_variable_name</string>
    <list>
        <object class="object_class_name" module="object_classes">
            o
            o
            o
        </object>
        o
        o
        o
    </list>

Or:

    <string>member_variable_name</string>
    <list>
        <string>value_1</string>
        <string>value_2</string>
        o
        o
        o
    </list>

Here are a few things to notice about this generated XML:

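In both forms the `<list>` sits in the value position of a name/value pair, directly after the `<string>` that names the member. A small recursive decoder makes that nesting visible. This is a sketch for Python 3 using xml.etree.ElementTree, not the PyXML implementation; the `decode` function and the member names `title` and `keywords` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: one plain string member and one list member.
FRAGMENT = """
<dictionary>
    <string>title</string>
    <string>Report</string>
    <string>keywords</string>
    <list>
        <string>xml</string>
        <string>python</string>
    </list>
</dictionary>
"""


def decode(elem):
    """Decode <string>, <list>, and <dictionary> elements recursively."""
    if elem.tag == "string":
        return elem.text or ""
    if elem.tag == "list":
        # A list is simply its children, decoded in document order.
        return [decode(child) for child in elem]
    if elem.tag == "dictionary":
        children = list(elem)
        # Pair each member name with the element that follows it.
        return {decode(key): decode(value)
                for key, value in zip(children[0::2], children[1::2])}
    raise ValueError("unhandled element: %s" % elem.tag)


members = decode(ET.fromstring(FRAGMENT))
print(members)
```
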
To create a string value, generate the following:

    <string>value_1</string>

To create an integer value, generate the following:

    <int>101</int>

To create a float value, generate the following:

    <float>1.23</float>

You can use class Marshaller in generic.py to determine the format of other data types. The following code will print a sample of the input to the Unmarshaller:

    import generic

    m = generic.Marshaller()
    ds1 = ([11,22], 333, 'bbb')
    s1 = m.dumps(ds1)
    print s1
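
If PyXML is not at hand, the shape of the output can be approximated. The sketch below is a Python 3 stand-in built on xml.etree.ElementTree; the `encode` function is hypothetical and only covers strings, integers, floats, lists, and tuples, using the element names that appear in this paper. The real Marshaller output may differ in detail (for example, a wrapping root element or extra attributes), so running the snippet above remains the authoritative check.

```python
import xml.etree.ElementTree as ET

# Element names taken from the examples in this paper.
SCALAR_TAGS = {str: "string", int: "int", float: "float"}


def encode(value):
    """Return an Element in the canonical form for a simple Python value."""
    if isinstance(value, (list, tuple)):
        elem = ET.Element("list" if isinstance(value, list) else "tuple")
        for item in value:
            elem.append(encode(item))
        return elem
    elem = ET.Element(SCALAR_TAGS[type(value)])
    elem.text = str(value)
    return elem


ds1 = ([11, 22], 333, 'bbb')
xml_text = ET.tostring(encode(ds1), encoding="unicode")
print(xml_text)
```
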

Commonly Used Templates -- A Template Cookbook

This section presents some (skeletons of) templates that produce commonly needed XML elements, for input to class generic.Unmarshaller. It can be viewed as a cookbook for creating XSLT templates to perform common data structure loading tasks.


Create an object

To create an instance of a class from the current element, create a template rule similar to the following:

    <xsl:template match="object_element_name">
        <xsl:element name="object">
            <xsl:attribute name="class">class_name</xsl:attribute>
            <xsl:attribute name="module">object_classes</xsl:attribute>
            <xsl:element name="tuple"/>
            <xsl:element name="dictionary">
                <xsl:element name="string">
                    <xsl:text>member_x</xsl:text>
                </xsl:element>
                <xsl:element name="string">
                    <xsl:value-of select="@attribute_x"/>
                </xsl:element>
                <xsl:element name="string">
                    <xsl:text>sub_object_list</xsl:text>
                </xsl:element>
                <xsl:element name="list">
                    <xsl:apply-templates select="./*"/>
                </xsl:element>
            </xsl:element>
        </xsl:element>
    </xsl:template>
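
Where object_element_name is the name of the source element that the rule matches, class_name is the Python class to instantiate, object_classes is the module that defines that class, member_x is the name of a member variable, and attribute_x is the attribute of the source element that supplies its value.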

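For readers more at home in Python than in XSLT, here is a rough Python 3 analogue of what this template rule emits for one source element. It is only a sketch built on xml.etree.ElementTree: the helper names `add_string` and `object_from`, the source element `book`, its attribute `title`, and the class `Book` are all hypothetical. The `<list>` is left empty here, where the template would recurse into the child elements.

```python
import xml.etree.ElementTree as ET


def add_string(parent, text):
    """Append <string>text</string> to parent."""
    child = ET.SubElement(parent, "string")
    child.text = text
    return child


def object_from(source, class_name, attribute_x):
    """Build the <object> element that the template rule would emit."""
    obj = ET.Element("object",
                     {"class": class_name, "module": "object_classes"})
    ET.SubElement(obj, "tuple")
    members = ET.SubElement(obj, "dictionary")
    # Member name, then its value taken from an attribute of the source.
    add_string(members, "member_x")
    add_string(members, source.get(attribute_x, ""))
    # Second member: a list that the template fills via apply-templates.
    add_string(members, "sub_object_list")
    ET.SubElement(members, "list")
    return obj


source = ET.fromstring('<book title="Dune"/>')
result = object_from(source, "Book", "title")
print(ET.tostring(result, encoding="unicode"))
```
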

Add a string member data item

To add a member variable to the current object with a simple string value that comes from an attribute of the current element, do the following:

Add the following snippet to the current template:

    <xsl:element name="string">
        <xsl:text>member_variable_name</xsl:text>
    </xsl:element>
    <xsl:element name="string">
        <xsl:value-of select="@attribute_name"/>
    </xsl:element>
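
Where member_variable_name is the name of the member variable to create and attribute_name is the attribute of the current element that supplies its value.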

To add a member variable to the current object with a simple string value that comes from the text (node) in the current element, do the following:

Add the following snippet to the current template:

    <xsl:element name="string">
        <xsl:text>member_variable_name</xsl:text>
    </xsl:element>
    <xsl:element name="string">
        <xsl:value-of select="."/>
    </xsl:element>
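
Where member_variable_name is the name of the member variable to create. Its value is the text content of the current element.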

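The two snippets differ only in where the value comes from. In XSLT, select="@attribute_name" yields the attribute's value (or an empty string if the attribute is absent), while select="." yields the string-value of the element, which is the concatenation of all text inside it, including text in nested elements. The Python 3 sketch below mirrors both with xml.etree.ElementTree; the element `item`, the attribute `code`, and the helper `string_pair` are hypothetical.

```python
import xml.etree.ElementTree as ET


def string_pair(name, value):
    """Return the two <string> elements for one member: name, then value."""
    name_elem = ET.Element("string")
    name_elem.text = name
    value_elem = ET.Element("string")
    value_elem.text = value
    return [name_elem, value_elem]


source = ET.fromstring('<item code="A7">Blue <b>steel</b> bolt</item>')

# Counterpart of <xsl:value-of select="@attribute_name"/>.
from_attribute = source.get("code", "")

# Counterpart of <xsl:value-of select="."/>: all descendant text, joined.
from_text = "".join(source.itertext())

dictionary = ET.Element("dictionary")
dictionary.extend(string_pair("code", from_attribute))
dictionary.extend(string_pair("description", from_text))
print(ET.tostring(dictionary, encoding="unicode"))
```
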

Create a list of objects

To create a list of objects from a nested list of elements, do the following:

Step 1. Add the following snippet to the parent template:

    <xsl:element name="string">
        <xsl:text>object_list</xsl:text>
    </xsl:element>
    <xsl:element name="list">
        <xsl:apply-templates select="./object_element_name"/>
    </xsl:element>
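
Where object_list is the name of the member variable that will hold the list and object_element_name is the name of the child elements to convert.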

Step 2. Add a template rule for the sub-element:

    <xsl:template match="object_element_name">
        <xsl:element name="object">
            <xsl:attribute name="class">class_name</xsl:attribute>
            <xsl:attribute name="module">object_classes</xsl:attribute>
            <xsl:element name="tuple"/>
            <xsl:element name="dictionary">
                <xsl:element name="string">
                    <xsl:text>x</xsl:text>
                </xsl:element>
                <xsl:element name="string">
                    <xsl:value-of select="@X"/>
                </xsl:element>
                <xsl:element name="string">
                    <xsl:text>object_list</xsl:text>
                </xsl:element>
                <xsl:element name="list">
                    <xsl:apply-templates select="./*"/>
                </xsl:element>
            </xsl:element>
        </xsl:element>
    </xsl:template>
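
Where object_element_name is the name of the source element that the rule matches, class_name is the Python class to instantiate, object_classes is the module that defines that class, x is the name of a member variable, and X is the attribute of the source element that supplies its value.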

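The two steps can also be read as two cooperating functions: a parent rule that selects only the child elements it wants, and a child rule that turns each of them into an `<object>`. The following Python 3 sketch imitates that division of labour with xml.etree.ElementTree. It illustrates the control flow only and is not a replacement for the stylesheet; the source document and the names `library`, `book`, `note`, `Library`, and `Book` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical source document.
SOURCE = """
<library name="Main">
    <book title="Dune"/>
    <book title="Emma"/>
    <note>not selected</note>
</library>
"""


def string_elem(text):
    elem = ET.Element("string")
    elem.text = text
    return elem


def make_object(class_name, members):
    """Build <object>; members is a list of (name, value_element) pairs."""
    obj = ET.Element("object",
                     {"class": class_name, "module": "object_classes"})
    ET.SubElement(obj, "tuple")
    dictionary = ET.SubElement(obj, "dictionary")
    for name, value in members:
        dictionary.append(string_elem(name))
        dictionary.append(value)
    return obj


def book_rule(book):
    """Step 2: the rule for the sub-element."""
    return make_object("Book", [("x", string_elem(book.get("title", "")))])


def library_rule(library):
    """Step 1: the parent rule; selects only the <book> children."""
    object_list = ET.Element("list")
    for book in library.findall("book"):   # like select="./book"
        object_list.append(book_rule(book))
    return make_object("Library", [("object_list", object_list)])


canonical = library_rule(ET.fromstring(SOURCE))
books = canonical.find("dictionary/list")
print(len(books))
```

Selecting by name, as in Step 1, skips the `note` element. Selecting with "./*", as the sub-element template does, would hand every child to some rule, so each child element then needs a matching template.
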
Last update: 1/4/02