It is often useful to load an XML document into Python data structures so that they can be processed. The DOM interface performs this task. However, the DOM builds a generic tree of nodes, not the objects an application actually works with. Often it is desirable to load the XML document into custom, application-specific data structures.
There are a number of approaches to this problem that come to mind:
This paper presents an alternative technique. It transforms the original XML document into an XML document having a canonical form, specifically, the format read and written by the generic marshaller in PyXML (generic.py), and then uses that (un-)marshaller to convert the result into Python objects. In effect, XSLT is used to customize the conversion: the developer's control over the conversion process is encoded in an XSLT stylesheet.
This technique has the following benefits:
Note, however, that the technique we describe in this paper will work for any language for which there is an implementation of the marshaller/unmarshaller provided in generic.py (in PyXML).
The top level process is composed of the following steps:
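import sys

import generic
import libxsltmod


class MsgHandler:
    """Receives messages from the XSLT processor through write()."""
    def __init__(self):
        pass

    def write(self, msg):
        print '***', msg


def loadfile(inFileName, stylesheetFile):
    msgHandler = MsgHandler()
    # Step 1: apply the stylesheet to the input document.
    # s1 holds the marshalled (generic.py format) XML as a string.
    s1 = libxsltmod.translate_to_string(
        'f', stylesheetFile,
        'f', inFileName,
        msgHandler)
    print s1
    # Step 2: unmarshal the XML into Python objects.
    um = generic.Unmarshaller()
    ds = um.loads(s1)
    # Step 3: display the result (the top-level class must define show()).
    ds.show()


if __name__ == '__main__':
    # Arguments: input XML file, then XSLT stylesheet file.
    loadfile(sys.argv[1], sys.argv[2])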
Some notes about this code:
<xsl:strip-space elements="tag1 tag2 ..."/>
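The call `ds.show()` in `loadfile` assumes that the top-level object built by the unmarshaller has a `show()` method, so the classes named in the generated `class` and `module` attributes must exist in an importable module (`object_classes` in the examples below). Here is a hypothetical sketch of such a module. The class names `Item` and `Document` and their members are illustrative, not from the original, and the sketch assumes the unmarshaller restores each member by setting an instance attribute of the same name, so the constructors take no arguments.

```python
# object_classes.py: hypothetical application classes (names are illustrative).
# Assumption: the unmarshaller restores each member listed in the <dictionary>
# element as an instance attribute, so __init__ only supplies defaults.


class Showable(object):
    """Mixin that prints whatever lines() returns."""

    def lines(self, indent=0):
        raise NotImplementedError

    def show(self):
        for line in self.lines():
            print(line)


class Item(Showable):
    def __init__(self):
        self.name = ''

    def lines(self, indent=0):
        return ['%sItem: %s' % ('    ' * indent, self.name)]


class Document(Showable):
    def __init__(self):
        self.title = ''
        self.object_list = []  # filled from a <list> of <object> elements

    def lines(self, indent=0):
        result = ['%sDocument: %s' % ('    ' * indent, self.title)]
        for child in self.object_list:
            result.extend(child.lines(indent + 1))
        return result
```

Keeping the formatting in `lines()` and the printing in `show()` makes the output easy to check in a test without capturing standard output.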
This section describes the XML elements that we must generate. Effectively, we are describing the XML elements generated by class generic.Marshaller and function generic.dumps() and accepted by class generic.Unmarshaller and function generic.loads(). (generic.py is in the PyXML distribution.)
To create an instance of a class, generate something like the following:
<object class="object_class_name" module="object_classes">
    <tuple/>
    <dictionary>
        <string>member_1</string>
        <string>value_1</string>
        <string>member_2</string>
        <string>value_2</string>
        ...
    </dictionary>
</object>
Here are a few things to notice about this generated XML:
<string>member_variable_name</string>
<list>
    <object class="object_class_name" module="object_classes">
        ...
    </object>
    ...
</list>
Or:
<string>member_variable_name</string>
<list>
    <string>value_1</string>
    <string>value_2</string>
    ...
</list>
Here are a few things to notice about this generated XML:
<string>value_1</string>
To create an integer value, generate the following:
<int>101</int>
To create a float value, generate the following:
<float>1.23</float>
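To make the mapping from these elements to Python values concrete, here is a minimal, stdlib-only decoder for the subset of elements shown in this section. It is a simplified stand-in written for illustration, not PyXML's generic.Unmarshaller: it looks classes up in an explicit `classes` dictionary (an assumption of the sketch) instead of importing the module named in the `module` attribute, and it creates instances without calling `__init__` and then sets their members directly.

```python
import xml.etree.ElementTree as ET


def decode(elem, classes):
    """Convert one marshalled element into a Python value.

    `classes` maps a class name (the `class` attribute) to a Python class;
    this registry is an assumption of the sketch, not part of generic.py.
    """
    tag = elem.tag
    if tag == 'string':
        return elem.text or ''
    if tag == 'int':
        return int(elem.text)
    if tag == 'float':
        return float(elem.text)
    if tag == 'list':
        # Iterating an element visits child elements only, so the
        # whitespace between them is ignored.
        return [decode(child, classes) for child in elem]
    if tag == 'tuple':
        return tuple(decode(child, classes) for child in elem)
    if tag == 'dictionary':
        items = [decode(child, classes) for child in elem]
        # Children alternate: key, value, key, value, ...
        return dict(zip(items[0::2], items[1::2]))
    if tag == 'object':
        cls = classes[elem.get('class')]
        obj = cls.__new__(cls)  # skip __init__ (a choice of this sketch)
        obj.__dict__.update(decode(elem.find('dictionary'), classes))
        return obj
    raise ValueError('unsupported element <%s>' % tag)


class Item(object):
    pass


SAMPLE = """<object class="Item" module="object_classes">
  <tuple/>
  <dictionary>
    <string>name</string>
    <string>first</string>
    <string>sizes</string>
    <list>
      <int>101</int>
      <float>1.23</float>
    </list>
  </dictionary>
</object>"""

item = decode(ET.fromstring(SAMPLE), {'Item': Item})
print(item.name, item.sizes)
```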
You can use class Marshaller in generic.py to determine the format of other data types. The following code will print a sample of the input to the Unmarshaller:
import generic
m = generic.Marshaller()
ds1 = ([11,22], 333, 'bbb')
s1 = m.dumps(ds1)
print s1
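When a stylesheet misbehaves, it can help to check its output against the object format described above before handing it to the unmarshaller. The sketch below uses `xml.etree.ElementTree` to check one `<object>` element for `class` and `module` attributes, a `<tuple/>` followed by a `<dictionary>`, and dictionary children that pair `<string>` member names with values. The function name `check_object` and the rules it enforces are assumptions drawn from the examples in this section, not a description of what generic.Unmarshaller itself validates.

```python
import xml.etree.ElementTree as ET


def check_object(elem):
    """Return a list of problems found in one <object> element (empty if none).

    The expected shape is taken from the examples in the text; it is not an
    exhaustive description of the generic.py format.
    """
    if elem.tag != 'object':
        return ['expected <object>, found <%s>' % elem.tag]
    problems = []
    for attr in ('class', 'module'):
        if not elem.get(attr):
            problems.append('missing %s attribute' % attr)
    children = list(elem)
    if [c.tag for c in children] != ['tuple', 'dictionary']:
        problems.append('expected <tuple/> followed by <dictionary>')
        return problems
    entries = list(children[1])
    if len(entries) % 2:
        problems.append('dictionary has an odd number of children')
    # Even positions hold member names, each of which must be a <string>.
    for key in entries[0::2]:
        if key.tag != 'string':
            problems.append('member name must be <string>, found <%s>' % key.tag)
    return problems


GOOD = """<object class="object_class_name" module="object_classes">
  <tuple/>
  <dictionary>
    <string>member_1</string>
    <string>value_1</string>
  </dictionary>
</object>"""

print(check_object(ET.fromstring(GOOD)))
```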
<xsl:template match="object_element_name">
    <xsl:element name="object">
        <xsl:attribute name="class">class_name</xsl:attribute>
        <xsl:attribute name="module">object_classes</xsl:attribute>
        <xsl:element name="tuple"/>
        <xsl:element name="dictionary">
            <xsl:element name="string">
                <xsl:text>member_x</xsl:text>
            </xsl:element>
            <xsl:element name="string">
                <xsl:value-of select="@attribute_x"/>
            </xsl:element>
            <xsl:element name="string">
                <xsl:text>sub_object_list</xsl:text>
            </xsl:element>
            <xsl:element name="list">
                <xsl:apply-templates select="./*"/>
            </xsl:element>
        </xsl:element>
    </xsl:element>
</xsl:template>
Where:
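For readers more at home in Python than in XSLT, the sketch below performs roughly the same mapping as the object template above, using `xml.etree.ElementTree`: one input element becomes an `<object>` whose dictionary holds a member taken from an attribute plus a `sub_object_list` built from the element's children. It is only a way to see what the template produces, not a replacement for the stylesheet. The input tag `item`, attribute `name`, class `Item`, and member `name` are hypothetical stand-ins for `object_element_name`, `attribute_x`, `class_name`, and `member_x`.

```python
import xml.etree.ElementTree as ET


def add_string(parent, text):
    """Append <string>text</string>, like <xsl:element name="string">."""
    ET.SubElement(parent, 'string').text = text


def transform(elem):
    """Rough Python mirror of the template: <item name="..."> -> <object>."""
    obj = ET.Element('object', {'class': 'Item', 'module': 'object_classes'})
    ET.SubElement(obj, 'tuple')
    dictionary = ET.SubElement(obj, 'dictionary')
    # Member taken from an attribute (xsl:value-of select="@name").
    add_string(dictionary, 'name')
    add_string(dictionary, elem.get('name', ''))
    # Member holding the transformed children (xsl:apply-templates select="./*").
    # Unlike XSLT, every child is assumed to be another <item>.
    add_string(dictionary, 'sub_object_list')
    sub_list = ET.SubElement(dictionary, 'list')
    for child in elem:
        sub_list.append(transform(child))
    return obj


source = ET.fromstring('<item name="top"><item name="a"/><item name="b"/></item>')
print(ET.tostring(transform(source)).decode())
```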
Add the following snippet to the current template:
<xsl:element name="string">
    <xsl:text>member_variable_name</xsl:text>
</xsl:element>
<xsl:element name="string">
    <xsl:value-of select="@attribute_name"/>
</xsl:element>
Where:
Add the following snippet to the current template:
<xsl:element name="string">
    <xsl:text>member_variable_name</xsl:text>
</xsl:element>
<xsl:element name="string">
    <xsl:value-of select="."/>
</xsl:element>
Where:
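Copying an element's content with `<xsl:value-of select="."/>` is where `xsl:strip-space` matters: without it, whitespace-only text nodes between child elements stay in the source tree and become part of the element's string value. The sketch below uses Python's `xml.etree.ElementTree` (not XSLT) on a made-up document to show the effect. `string_value` approximates the XPath string value by concatenating all descendant text, and `string_value_stripped` roughly imitates strip-space by dropping whitespace-only text nodes; both function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# A made-up input document: <tag1> has element children separated by
# indentation, and each <tag2> holds plain text.
SAMPLE = """<root>
  <tag1>
    <tag2>alpha</tag2>
    <tag2>beta</tag2>
  </tag1>
</root>"""


def string_value(elem):
    """Concatenate all descendant text, like the XPath string value."""
    return ''.join(elem.itertext())


def string_value_stripped(elem):
    """Same, but skip whitespace-only text nodes, roughly what
    xsl:strip-space does for the elements it lists."""
    return ''.join(t for t in elem.itertext() if t.strip())


root = ET.fromstring(SAMPLE)
tag1 = root.find('tag1')
print(repr(string_value(tag1)))
print(repr(string_value_stripped(tag1)))
```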
Step 1. Add the following snippet to the parent template:
<xsl:element name="string">
    <xsl:text>object_list</xsl:text>
</xsl:element>
<xsl:element name="list">
    <xsl:apply-templates select="./object_element_name"/>
</xsl:element>
Where:
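Step 1 selects only `./object_element_name` children, whereas the general object template earlier uses `./*`. The difference matters when an element has mixed children: only the named ones should become entries in the list. The sketch below uses the limited XPath support in Python's `ElementTree.findall` on a made-up parent element to show which children each expression picks; the tag names are placeholders.

```python
import xml.etree.ElementTree as ET

# Hypothetical parent element with two kinds of children.
PARENT = ET.fromstring(
    '<parent>'
    '<object_element_name id="1"/>'
    '<note>not an object</note>'
    '<object_element_name id="2"/>'
    '</parent>'
)

# Like select="./*": every element child, in document order.
all_children = PARENT.findall('./*')

# Like select="./object_element_name": only the named children.
named_children = PARENT.findall('./object_element_name')

print([c.tag for c in all_children])
print([c.get('id') for c in named_children])
```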
<xsl:template match="object_element_name">
    <xsl:element name="object">
        <xsl:attribute name="class">class_name</xsl:attribute>
        <xsl:attribute name="module">object_classes</xsl:attribute>
        <xsl:element name="tuple"/>
        <xsl:element name="dictionary">
            <xsl:element name="string">
                <xsl:text>x</xsl:text>
            </xsl:element>
            <xsl:element name="string">
                <xsl:value-of select="@X"/>
            </xsl:element>
            <xsl:element name="string">
                <xsl:text>object_list</xsl:text>
            </xsl:element>
            <xsl:element name="list">
                <xsl:apply-templates select="./*"/>
            </xsl:element>
        </xsl:element>
    </xsl:element>
</xsl:template>
Where: