Data Mapping Techniques -- WDDX
Introduction
This paper explores a technique for exploiting the interchangeability
between XML documents and Python data structures.
It is often useful to load an XML document into Python data
structures so that they can be processed. The DOM interface
performs this task. However, DOM data structures are specific to
DOM. Often it is desirable to load the XML document into custom or
application-specific data structures.
There are a number of approaches to this problem that come to mind:
- Use the SAX interface -- Scan the XML document, creating custom
Python data structures while doing so.
- Use the DOM interface -- Load the XML document into a DOM tree,
then perform a tree walk on the DOM tree, creating custom
Python data structures while doing so.
In both of the above techniques, custom Python code is used to
perform much of the conversion, plucking data values from the XML
document and creating instances of the Python data structures. In
effect, the developer's control over the conversion process is
encoded in custom Python code.
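As a rough illustration of the SAX approach, the following sketch scans a small document and builds custom objects as elements are encountered. The Person class, the element and attribute names, and the sample document are all hypothetical. The sketch is written in Python 3 and uses only the standard-library xml.sax module, so it is not specific to PyXML.

```python
import xml.sax


class Person:
    """Hypothetical application-specific class (not from the paper)."""

    def __init__(self, name):
        self.name = name
        self.interests = []


class PeopleHandler(xml.sax.ContentHandler):
    """Builds Person objects while the SAX parser scans the document."""

    def __init__(self):
        super().__init__()
        self.people = []
        self._text = []

    def startElement(self, name, attrs):
        if name == "person":
            # Each <person> element becomes a new custom object.
            self.people.append(Person(attrs.get("name", "")))
        elif name == "interest":
            # Start collecting the text of this element.
            self._text = []

    def characters(self, content):
        # Text may arrive in several chunks, so collect the pieces.
        self._text.append(content)

    def endElement(self, name):
        if name == "interest":
            self.people[-1].interests.append("".join(self._text).strip())


SAMPLE = """<people>
  <person name="Alice"><interest>xml</interest><interest>python</interest></person>
  <person name="Bob"><interest>xslt</interest></person>
</people>"""

handler = PeopleHandler()
xml.sax.parseString(SAMPLE.encode("utf-8"), handler)
for person in handler.people:
    print(person.name, person.interests)
```

Note how every decision about which element becomes which object lives in the handler class. That is the custom Python code the paragraph above refers to.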
This paper presents an alternative technique. It transforms the
original XML document into an XML document having a canonical form,
specifically WDDX, then uses the (un-)marshaller that is distributed
with PyXML to convert that into Python objects. In effect, XSLT is
used to customize the conversion. The developer's control over the
conversion process is encoded in an XSLT stylesheet.
This technique has the following benefits:
- XSLT can be used to perform the conversion.
- We can provide a set of XSLT templates that can be easily
adapted to each of a set of common transformations.
- We hope to provide help with generating the templates for the
XSLT stylesheet used in the conversion process. Perhaps we can
enable the developer to describe the mapping from specific XML
elements to specific Python data structures in an easier way, and
then generate XSLT stylesheet templates that perform that mapping
(or, more correctly, that convert to the XML elements that can be
automatically loaded into Python data structures).
Our interest in this paper is to provide help with implementing an
equivalence between XML documents and Python data structures.
Note, however, that the technique we describe in this paper will
work for any language for which there is an implementation of the
marshaller/unmarshaller provided in generic.py (in PyXML).
The top level process is composed of the following steps:
- Use an XSLT processor and the stylesheet that you have written
to transform the source XML document into an XML document of the form
accepted by the class generic.Unmarshaller.
- Use generic.py (in the PyXML distribution) to load the
generated/marshalled XML into Python data structures.
Here is sample code that performs this transformation:
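import generic
import libxsltmod

class MsgHandler:
    # Handler object passed to the XSLT processor; write() prints
    # each message it is given.
    def __init__(self):
        pass

    def write(self, msg):
        print '***', msg

def loadfile(inFileName, stylesheetFile):
    msgHandler = MsgHandler()
    # Apply the stylesheet to the source document.
    s1 = libxsltmod.translate_to_string(
        'f', stylesheetFile,
        'f', inFileName,
        msgHandler)
    print s1
    # Load the transformed XML into Python objects.
    um = generic.Unmarshaller()
    ds = um.loads(s1)
    # show() must be defined by the class of the top-level object.
    ds.show()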
Some notes about this code:
A note on the term "WDDX" -- I don't believe that the XML documents
that we are generating (for input to the generic.py Unmarshaller)
follow the DTD for WDDX. That doesn't concern me much for our
purposes here, since in this technique we are building documents
for input to generic.py. However, if you plan to share or
"syndicate" those documents, then you will want to pay attention to
generating XML documents that obey a publicly known DTD. In the
meantime, I believe that what this paper describes is in the
"spirit" of WDDX in the sense that it marshals and unmarshals data
structures in a way that is programming language neutral.
However, the occurrence of so many quasi-quotes in this paragraph
should be a caution. As should paragraphs that refer to
themselves.
Description of the canonical XML form
This technique generates XML that can be processed by the Unmarshaller
in generic.py, which is included in the PyXML distribution.
This section describes the XML elements that we must generate.
Effectively, we are describing the XML elements generated by class
generic.Marshaller and function generic.dumps() and
accepted by class generic.Unmarshaller and function
generic.loads(). (generic.py is in the PyXML distribution.)
To create an instance of a class, generate something like
the following:
<object class="object_class_name" module="object_classes">
  <tuple/>
  <dictionary>
    <string>member_1</string>
    <string>value_1</string>
    <string>member_2</string>
    <string>value_2</string>
    o
    o
    o
  </dictionary>
</object>
Here are a few things to notice about this generated XML:
- The name of the class, an instance of which is created, is
the value of the attribute class, e.g. "object_class_name".
- This class must be defined in a module whose name is the value of
the attribute module, e.g. "object_classes". So, in this
case we would need a module object_classes.py.
- The empty tuple in this generated XML could contain parameters to
be passed to the constructor of the class. If this tuple is empty,
the constructor is not called. However, member variables for
the instance will be initialized (see next bullet).
- The dictionary contains the names and values of the member
variables to be set in the instance. The format is a member name
followed by its value followed by the next member name followed by
its value, and so on.
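To make the layout concrete, the following sketch builds such an object element from a class name, a module name, and a dictionary of members. It is a hand-written illustration in Python 3 using the standard-library ElementTree module, not the code in generic.py. The helper name object_element and the sample values are hypothetical, and only string members are handled.

```python
import xml.etree.ElementTree as ET


def object_element(class_name, module_name, members):
    """Return an <object> element in the layout described above.

    Hypothetical helper for illustration; handles string members only.
    """
    obj = ET.Element("object", {"class": class_name, "module": module_name})
    # Empty tuple: no constructor arguments.
    ET.SubElement(obj, "tuple")
    dictionary = ET.SubElement(obj, "dictionary")
    for name, value in members.items():
        # Each member is written as a name followed by its value.
        ET.SubElement(dictionary, "string").text = name
        ET.SubElement(dictionary, "string").text = str(value)
    return obj


elem = object_element(
    "object_class_name",
    "object_classes",
    {"member_1": "value_1", "member_2": "value_2"},
)
xml_text = ET.tostring(elem, encoding="unicode")
print(xml_text)
```

The point to notice is that the dictionary element is flat: names and values are siblings, paired only by their position.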
To create a list of objects, generate something like the
following:
<string>member_variable_name</string>
<list>
  <object class="object_class_name" module="object_classes">
    o
    o
    o
  </object>
  o
  o
  o
</list>
Or:
<string>member_variable_name</string>
<list>
  <string>value_1</string>
  <string>value_2</string>
  o
  o
  o
</list>
Here are a few things to notice about this generated XML:
- If this list is to be the value of a member variable of a
class, generate this code within the dictionary that defines the
member variables of an instance of a class.
- The list will become the value of the member variable
member_variable_name.
To create a string value, generate the following:
<string>value_1</string>
To create an integer value, generate the following:
<int>101</int>
To create a float value, generate the following:
<float>1.23</float>
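The following sketch shows one way a reader of this format might turn the elements above into Python values. It is a simplified stand-in written for this illustration in Python 3, not the Unmarshaller in generic.py. It handles only string, int, float, list, tuple, dictionary and object, and it looks classes up in a dictionary supplied by the caller instead of importing the module named in the module attribute. The names unmarshal and Point and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET


class Point:
    """Hypothetical class standing in for one defined in object_classes.py."""

    def __init__(self, *args):
        self.args = args


def unmarshal(elem, classes):
    """Convert one element of the canonical form into a Python value."""
    tag = elem.tag
    if tag == "string":
        return elem.text or ""
    if tag == "int":
        return int(elem.text)
    if tag == "float":
        return float(elem.text)
    if tag == "list":
        return [unmarshal(child, classes) for child in elem]
    if tag == "tuple":
        return tuple(unmarshal(child, classes) for child in elem)
    if tag == "dictionary":
        # Children alternate: key, value, key, value, ...
        values = [unmarshal(child, classes) for child in elem]
        return dict(zip(values[0::2], values[1::2]))
    if tag == "object":
        cls = classes[elem.get("class")]
        args = unmarshal(elem.find("tuple"), classes)
        members = unmarshal(elem.find("dictionary"), classes)
        # Create the instance without running __init__, then call the
        # constructor only if the tuple supplied arguments.
        obj = cls.__new__(cls)
        if args:
            obj.__init__(*args)
        obj.__dict__.update(members)
        return obj
    raise ValueError("unsupported element: %s" % tag)


SAMPLE = """<object class="Point" module="object_classes">
  <tuple/>
  <dictionary>
    <string>label</string><string>origin</string>
    <string>x</string><int>101</int>
    <string>y</string><float>1.23</float>
    <string>tags</string><list><string>a</string><string>b</string></list>
  </dictionary>
</object>"""

point = unmarshal(ET.fromstring(SAMPLE), {"Point": Point})
print(point.label, point.x, point.y, point.tags)
```

Because the tuple in the sample is empty, the sketch never runs the constructor, which mirrors the behavior described earlier for an empty tuple.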
You can use class Marshaller in generic.py to determine the format of other
data types. The following code will print a sample of the input to the
Unmarshaller:
import generic
m = generic.Marshaller()
ds1 = ([11,22], 333, 'bbb')
s1 = m.dumps(ds1)
print s1
Commonly Used Templates -- A Template Cookbook
This section presents some (skeletons of) templates that produce
commonly needed XML elements, for input to class
generic.Unmarshaller. It can be viewed as a cookbook for creating
XSLT templates to perform common data structure loading tasks.
Create an object
To create an instance of a class from the current element, create a
template rule similar to the following:
<xsl:template match="object_element_name">
  <xsl:element name="object">
    <xsl:attribute name="class">object_class_name</xsl:attribute>
    <xsl:attribute name="module">object_classes</xsl:attribute>
    <xsl:element name="tuple"/>
    <xsl:element name="dictionary">
      <xsl:element name="string">
        <xsl:text>member_x</xsl:text>
      </xsl:element>
      <xsl:element name="string">
        <xsl:value-of select="@attribute_x"/>
      </xsl:element>
      <xsl:element name="string">
        <xsl:text>sub_object_list</xsl:text>
      </xsl:element>
      <xsl:element name="list">
        <xsl:apply-templates select="./*"/>
      </xsl:element>
    </xsl:element>
  </xsl:element>
</xsl:template>
Where:
- object_element_name is the name of the element.
- object_class_name is the name of the class. An instance of this
class will be created from the element.
- object_classes is the name of the module in which the class
is defined. Create a .py file with this name containing the class
definition.
Additional notes:
- This example creates a member variable named member_x
with a string value from the attribute named attribute_x.
- This example creates a list of sub-objects and assigns it to
member variable sub_object_list.
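For readers who want to check what this template rule emits, the following sketch performs the same mapping in plain Python 3 with the standard-library ElementTree module. It is only an approximation for illustration, not an XSLT processor. The function to_object is hypothetical, it applies the same rule to every child element (whereas apply-templates would select whichever rule matches each child), and the sample input document is made up.

```python
import xml.etree.ElementTree as ET


def to_object(source, class_name="object_class_name", module="object_classes"):
    """Mimic the template rule: map one source element to an <object>."""
    obj = ET.Element("object", {"class": class_name, "module": module})
    ET.SubElement(obj, "tuple")
    dictionary = ET.SubElement(obj, "dictionary")
    # member_x takes its value from the attribute attribute_x.
    ET.SubElement(dictionary, "string").text = "member_x"
    ET.SubElement(dictionary, "string").text = source.get("attribute_x", "")
    # sub_object_list holds one <object> per child element.
    ET.SubElement(dictionary, "string").text = "sub_object_list"
    sub_list = ET.SubElement(dictionary, "list")
    for child in source:
        sub_list.append(to_object(child, class_name, module))
    return obj


SOURCE = """<object_element_name attribute_x="outer">
  <object_element_name attribute_x="inner"/>
</object_element_name>"""

result = to_object(ET.fromstring(SOURCE))
print(ET.tostring(result, encoding="unicode"))
```

Comparing the printed XML with the canonical form described earlier is a quick way to confirm that a stylesheet is producing input the unmarshaller can accept.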
Add a string member data item
To add a member variable to the current object with a simple string
value that comes from an attribute of the current element, do the
following:
Add the following snippet to the current template:
<xsl:element name="string">
  <xsl:text>member_variable_name</xsl:text>
</xsl:element>
<xsl:element name="string">
  <xsl:value-of select="@attribute_name"/>
</xsl:element>
Where:
- member_variable_name is the name of the member data item
to be added to the current instance.
- attribute_name is the name of the attribute that provides
the value.
To add a member variable to the current object with a simple string
value that comes from the text (node) in the current element,
do the following:
Add the following snippet to the current template:
<xsl:element name="string">
  <xsl:text>member_variable_name</xsl:text>
</xsl:element>
<xsl:element name="string">
  <xsl:value-of select="."/>
</xsl:element>
Where:
- member_variable_name is the name of the member data item
to be added to the current instance.
Create a list of objects
To create a list of objects from a nested list of elements, do the
following:
Step 1. Add the following snippet to the parent template:
<xsl:element name="string">
  <xsl:text>object_list</xsl:text>
</xsl:element>
<xsl:element name="list">
  <xsl:apply-templates select="./object_element_name"/>
</xsl:element>
Where:
- object_list is the name of the member variable to be
added to the parent instance.
- object_element_name is the element/tag of the
sub-elements. One object will be created and added to the list
(object_list) for each sub-element of this name.
Step 2. Add a template rule for the sub-element:
<xsl:template match="object_element_name">
  <xsl:element name="object">
    <xsl:attribute name="class">object_class_name</xsl:attribute>
    <xsl:attribute name="module">object_classes</xsl:attribute>
    <xsl:element name="tuple"/>
    <xsl:element name="dictionary">
      <xsl:element name="string">
        <xsl:text>x</xsl:text>
      </xsl:element>
      <xsl:element name="string">
        <xsl:value-of select="@X"/>
      </xsl:element>
      <xsl:element name="string">
        <xsl:text>object_list</xsl:text>
      </xsl:element>
      <xsl:element name="list">
        <xsl:apply-templates select="./*"/>
      </xsl:element>
    </xsl:element>
  </xsl:element>
</xsl:template>
Where:
- object_element_name is the name of the element.
- object_class_name is the name of the class. An instance of this
class will be created from the element.
- object_classes is the name of the module in which the class
is defined. Create a .py file with this name containing the class
definition.
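A minimal sketch of what object_classes.py might contain is shown below, written in Python 3. The class name, the members x and object_list, and the show() method are assumptions chosen to match the placeholders in the template above and the ds.show() call in the earlier sample code. As noted earlier, the constructor is not called when the tuple is empty, so the sketch does not rely on __init__ to set defaults.

```python
class object_class_name:
    """Hypothetical class; rename to match the class attribute you generate."""

    def show(self, indent=0):
        """Print this object and, recursively, the objects in its list."""
        pad = "  " * indent
        # getattr with a default, because __init__ may never have run.
        print("%sx = %s" % (pad, getattr(self, "x", None)))
        for child in getattr(self, "object_list", []):
            child.show(indent + 1)


# Simulate what the unmarshaller is described as doing: create the
# instance without calling the constructor, then set its members.
parent = object_class_name.__new__(object_class_name)
parent.__dict__.update({"x": "1", "object_list": []})
child = object_class_name.__new__(object_class_name)
child.__dict__.update({"x": "2", "object_list": []})
parent.object_list.append(child)
parent.show()
```

Keeping the class free of required constructor arguments means the same definition works whether or not the generated tuple is empty.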
Last update: 1/4/02