XSLT and generateDS - Analysis, Comparison, and Evaluation
Dave Kuhlman
http://www.rexx.com/~dkuhlman
Email: [email protected]
Aug 13, 2002
Abstract:
In this paper we consider and compare several technologies for
performing transformations on XML documents. In particular we look
at the use of generateDS.py to transform XML documents and
compare that use with XSLT.
1 Introduction - What It Does
The problem space -- What are we trying to accomplish? In this paper
we discuss technologies for performing transformations on XML documents,
that is, transformations where the source is an XML document and the
target (result) is any one of another XML document, an HTML or XHTML
document, a plain text file, a ``special'' text file (e.g. TeX/LaTeX),
etc.
In this paper, we pay special attention to the following technologies
(but also mention others):
- XSLT performs a transformation by using an
XSLT engine and a stylesheet containing rules to transform
the element tree from one XML document into another element tree,
and then to generate a new document from that new element tree.
- generateDS generates Python classes that represent
the elements in an XML document. These classes are generated from
an XML Schema which describes the XML documents to be processed. In
addition, a parser is generated which, when executed, creates
instances of the classes from an XML document. The classes can be
extended with methods that generate a new document. Thus, the
generated parser and the extended classes can perform
transformations by parsing an XML document and generating a new
document.
Caution -- I'm a Python lover and advocate. I'm biased
toward Python as a solution to problems. I'm also the implementor
of generateDS.py, so I'm likely to be partial towards it,
also. In what follows, I've tried to present a balanced view, but
``let the reader beware''.
Principles:
- The use of generateDS.py to perform transformations
is isomorphic to XSLT up to coding style. That is, they
can perform the same tasks, but use a different language and coding
style. XSLT uses template rules and a stylesheet.
generateDS.py uses XML Schema for the definition of
input documents and uses Python to add the methods that generate a
new document.
- generateDS.py uses methods added to data
representation classes to perform a tree walk.
- XSLT uses (1) patterns to match XML elements, (2)
templates that specify the output to be generated for each pattern,
and (3) an engine that matches the patterns and generates the
output.
How to use XSLT:
- Create a stylesheet file. The stylesheet file contains the
template rules that guide the XSLT engine.
- Create the templates (in the stylesheet file). Each template
contains a pattern and an output template. The XSLT
engine applies the template if the pattern matches. When the
pattern matches, the XSLT engine uses the output template
to generate output.
- Execute the XSLT engine, applying the XSLT
stylesheet to the XML document to produce the output.
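The division of labor described above (patterns, output templates, and an engine that matches one to the other) can be sketched in a few lines of plain Python. This is only a toy illustration of the idea. It is not XSLT, and it is not how a real XSLT engine is implemented: the ``patterns'' are bare tag names, and the names RULES, rule, and apply_templates are invented for this sketch.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Toy rule table: tag name (the "pattern") -> function (the "output template").
# All names here are hypothetical; real XSLT patterns are far richer than tag names.
RULES = {}


def rule(tag):
    """Register a template function for elements with the given tag."""
    def register(func):
        RULES[tag] = func
        return func
    return register


def apply_templates(element):
    """The 'engine': find the rule that matches this element and run it.

    Elements without a rule fall back to processing their children,
    loosely imitating a default rule.
    """
    template = RULES.get(element.tag)
    if template is None:
        return "".join(apply_templates(child) for child in element)
    return template(element)


@rule("people")
def people_template(element):
    items = "".join(apply_templates(child) for child in element)
    return "<ul>" + items + "</ul>"


@rule("person")
def person_template(element):
    return "<li>" + escape(element.findtext("name", default="")) + "</li>"


SOURCE = (
    "<people>"
    "<person><name>Alice</name></person>"
    "<person><name>Bob</name></person>"
    "</people>"
)


def transform(xml_text):
    """Apply the rule table to a document given as a string."""
    return apply_templates(ET.fromstring(xml_text))


if __name__ == "__main__":
    print(transform(SOURCE))
```

Note that the rule functions never say in which order elements are visited. That decision belongs to apply_templates, which is the point of the comparison.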
How to use generateDS.py - (Details on this process are at
http://www.rexx.com/~dkuhlman/generateDS.html.):
- Create an XML Schema definition of the input XML document type.
- Use generateDS.py to generate the data representation classes
and the parser. Optionally, generate subclasses of the data representation
classes.
- Add export methods to the generated classes (or
subclasses). These methods will most likely perform a tree walk on
the data structures (classes) created by parsing the XML document.
And, these methods will generate output (hint: use print
or call write on an open file/stream).
- In the main function, call the parse function to parse the
XML document and create the document tree, then call the added
export method on the root object.
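The steps above can be sketched as follows. The classes here are hand-written stand-ins for the kind of data representation classes that generateDS.py generates. Real generated code is larger and its names and details differ, so treat People, Person, build, export_text, and this parse function as assumptions made for illustration only.

```python
import io
import xml.etree.ElementTree as ET

# Hand-written stand-ins for generated data representation classes.
# These only illustrate the shape of the approach, not actual generated code.


class Person:
    def __init__(self, name=""):
        self.name = name

    @classmethod
    def build(cls, node):
        """Create an instance from an XML element."""
        return cls(name=node.findtext("name", default=""))

    # Added export method: writes this element's part of the output.
    def export_text(self, outfile):
        outfile.write("Person: %s\n" % self.name)


class People:
    def __init__(self, persons=None):
        self.persons = persons if persons is not None else []

    @classmethod
    def build(cls, node):
        return cls([Person.build(child) for child in node.findall("person")])

    # Added export method: the parent explicitly calls its children.
    def export_text(self, outfile):
        outfile.write("People (%d)\n" % len(self.persons))
        for person in self.persons:
            person.export_text(outfile)


def parse(xml_text):
    """Parse the document and return the root object of the tree."""
    return People.build(ET.fromstring(xml_text))


def main():
    source = (
        "<people>"
        "<person><name>Alice</name></person>"
        "<person><name>Bob</name></person>"
        "</people>"
    )
    root = parse(source)
    out = io.StringIO()
    root.export_text(out)
    return out.getvalue()


if __name__ == "__main__":
    print(main(), end="")
```

The tree walk is ordinary method calling: People.export_text decides when and whether each Person is exported.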
2 Objective Comparisons
Feature                     | XSLT      | generateDS.py | YAML
Declarative                 | Yes       | No            | No
Requires input definition   | No        | Yes           | No
Table/data driven           | Yes       | No            | No
Uses native data structures | No        | Yes           | Yes
Standardized input          | Yes (XML) | Yes (XML)     | Yes (YAML)
Full programming language   | No        | Yes           | Yes
Requires code generation    | No        | Yes           | No
Supports XML                | Yes       | Yes           | No
I have included YAML in this table for several reasons:
- I'm interested in YAML and in figuring out what to use it for
and how to use it.
- YAML can be thought of as an alternative representation
for structured data. It is appropriate where and when strict
compatibility with XML is not a requirement.
- A YAML implementation loads YAML files into native data structures.
If Python is used, the data structures are Python lists, dictionaries, etc.
- YAML uses an indented format to store structured
data in files (and strings). YAML proponents claim that this format
is cleaner and easier to read and edit than the equivalent XML
representation.
- Transforming YAML to XML and XML to YAML is a straightforward
process and is relatively easy to implement, when an interchange
between YAML and XML is needed. In fact, generating YAML files from
XML documents is an ideal task for generateDS.py.
- There is a Python implementation of YAML. You can find it at
the YAML home page.
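To give a feel for why the YAML to XML direction is straightforward, here is a minimal sketch that starts from the native data structures a YAML loader would return. No YAML library is used: the literal dictionary DATA stands in for the loader's result. The mapping convention (dictionary keys become child elements, list members become repeated item elements) is just one arbitrary choice, the function names are invented for this sketch, and it assumes the dictionary keys are valid XML element names.

```python
import xml.etree.ElementTree as ET


def to_xml(tag, value):
    """Convert nested dicts, lists, and scalars into an element tree."""
    element = ET.Element(tag)
    if isinstance(value, dict):
        # Each key becomes a child element (keys assumed to be valid XML names).
        for key, child in value.items():
            element.append(to_xml(str(key), child))
    elif isinstance(value, list):
        # Each list member becomes an <item> child (an arbitrary convention).
        for child in value:
            element.append(to_xml("item", child))
    else:
        element.text = str(value)
    return element


# Stand-in for what a YAML loader might return for a small document.
DATA = {"name": "Alice", "langs": ["Python", "XSLT"]}


def render(tag, value):
    """Serialize the converted tree to a string."""
    return ET.tostring(to_xml(tag, value), encoding="unicode")


if __name__ == "__main__":
    print(render("person", DATA))
```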
Explanation of features in the above table:
- Declarative -- Is input in the form of declarations or rules
rather than imperative, executable statements? For example,
XSLT uses XSL stylesheets and XSL template rules, whereas
generateDS.py and PyYaml use executable Python
code.
- Requires input definition -- Does processing require
definition of the structure of input files? For example,
generateDS.py requires an XML Schema definition of the XML
input files to be processed.
- Uses native data structures -- Does processing use native
data structures in the supported programming language? Example:
generateDS.py uses instances of Python classes and Python
lists plus simple Python data types to represent the contents of an
XML file. PyYaml uses Python lists and dictionaries plus simple
Python data types to represent the contents of a YAML file.
- Standardized input -- Is the format of data input described
in a publicly available standard? What is that standard?
- Full programming language -- Does implementation of a
transformation use and make available a full programming language?
For example, generateDS.py and PyYaml both employ Python.
- Requires code generation -- Does the system require the user
to generate code before use? For example, generateDS.py
requires the user/developer to generate Python code from an XML Schema
definition of the input files.
- Supports XML -- Is XML supported? For example,
XSLT uses XML to define transformations (XML/XSL
stylesheets) and takes XML files as input. And,
generateDS.py uses XML (XML Schema) to define the structure
of input files and takes XML as data input files.
Consequences:
- Support for XML -- Because XSLT stylesheets are XML,
there is the possibility of using XML and XSL editors to develop
transformations (XSL stylesheets).
- Requires code generation -- Possibly a burden. However, in
the case of generateDS.py, the generated code for a
specific XML Schema definition can be reused to define multiple
transformations on the same document type.
- Uses native data structures -- The ability to use the native data
structures of the programming language is very helpful for
defining transformations on XML input. This is especially true when
(1) the native data structures map reasonably directly to the
structures in the XML input document and (2) the native data
structures are reasonably high level. For example, with
generateDS.py there is a 1-to-1 correspondence between XML
elements and instances of Python classes. And, elements in YAML
files map very directly onto Python lists and dictionaries. Compare
this with the use of DOM, which also uses native (Python) data
structures, but requires laborious picking through children of
children of children, etc., in order to find information of interest
in an XML document.
- Full programming language -- ``XSLT was never intended as a
complete answer to the problem of transforming XML documents.''
Because XSLT lacks a complete programming language, some
transformation tasks may be difficult or awkward.
generateDS.py, which does provide access to a full
programming language, specifically Python, can be a more complete
solution and can more appropriately be applied to tasks that are
awkward for XSLT. However, note that while these tasks
can be solved with a full programming language, some will
require more labor to do so.
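The contrast drawn above between DOM and class instances can be made concrete with a small sketch. The first function uses xml.dom.minidom from the Python standard library and picks through child nodes by hand. The second works on instances of two hypothetical classes (People and Person, written by hand here, not generated) that mirror the document's elements one-to-one.

```python
from xml.dom import minidom

SOURCE = (
    "<people>"
    "<person><name>Alice</name></person>"
    "<person><name>Bob</name></person>"
    "</people>"
)


def names_via_dom(xml_text):
    """Collect names by picking through DOM child nodes by hand."""
    names = []
    document = minidom.parseString(xml_text)
    for person in document.documentElement.childNodes:
        if person.nodeType != person.ELEMENT_NODE or person.tagName != "person":
            continue
        for child in person.childNodes:
            if child.nodeType == child.ELEMENT_NODE and child.tagName == "name":
                # The text itself is yet another level of child nodes.
                text = "".join(
                    node.data for node in child.childNodes
                    if node.nodeType == node.TEXT_NODE
                )
                names.append(text)
    return names


# Hypothetical classes with a 1-to-1 correspondence to the XML elements.
class Person:
    def __init__(self, name):
        self.name = name


class People:
    def __init__(self, persons):
        self.persons = persons


def names_via_objects(people):
    """Collect the same names from class instances."""
    return [person.name for person in people.persons]


if __name__ == "__main__":
    print(names_via_dom(SOURCE))
    print(names_via_objects(People([Person("Alice"), Person("Bob")])))
```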
3 Subjective Comparisons
Here are some comparison points (we'll go into details later):
- The use of generateDS.py is more readable for Python
programmers. XSLT is (possibly) more readable for those used to
reading XML. (I don't really believe this. However, it is
important to recognize that different programming languages and
styles will appeal to and be more understandable to different
users.)
- XSLT is declarative. generateDS.py is imperative
and object-oriented.
- XSLT uses multiple stylesheets to generate different
output from the same input XML document type. generateDS.py
can use multiple subclass modules with the same (super-)class module
to produce different styles of output.
- XSLT is most appropriate for producing ``text''
output, for example HTML, XML, and ``plain'' text.
generateDS.py is suitable for producing these text formats
and other formats as well, for example, generating PDF,
inserting content into a relational database, updating a database or
repository, etc. This is due to the fact that the export methods
added to the classes generated by generateDS.py are
written in Python and can do nearly anything that Python can do.
- Is development with XSLT easy or hard? Can
XSLT stylesheets be debugged?
XSLT is exceptionally verbose. Your mileage may vary, but
even for those who like to read XML, XSLT seems cluttered and
``dense''. In contrast, Python, which is used to encode export
methods for generateDS.py, is an exceptionally clear and readable language. And,
the separation of representation classes from the classes that
contain the export functions helps to make the transformation and
generation process especially clear.
However, with respect to XSLT, the readability of XML and
XSLT stylesheets may be a red herring. There are editors for XSLT
stylesheets which hide the actual XML and present a graphical (or
alternative) view of the stylesheet. I do not have experience with
these tools, so I can't say how helpful they are. Perhaps the best
that I can do is to say that if you intend to do serious work with
XSLT, you certainly should evaluate several of these tools. I'm
guessing that there are some who would say that XSLT is
not even intended to be edited directly and that it should
only be edited with an XSLT editor.
Suggestion -- If you attempt to evaluate the readability of XSLT
stylesheets, you may want to consider two dimensions to readability:
- Is the syntax transparent, i.e. can you perceive the intention of
a snippet of XSLT through the syntax?
- Is the logic transparent? And, can you predict the behavior
of a stylesheet by reading the stylesheet? Here are a few
additional and analogous questions: If you make a change to a
stylesheet in one location, does it cause changes in the behavior of
other parts of the stylesheet? In other words, do changes have
objectionable non-local effects?
Since XSLT is declarative, the logic is in a separate program,
the XSLT engine.
Since generateDS.py is imperative, all control logic is
explicit and visible in the added export methods. For example, an
export method in a parent node (XML element) explicitly calls the
export methods in child nodes.
For Pythonistas, ``Explicit is better than implicit.'' (See
The Zen of Python (by Tim Peters).) If you also feel this
way, the fact that the logic of XSLT processing is in some sense
hidden in the XSLT engine may be a negative for you. Although an
XSLT editor may make the stylesheet more readable, controlling the
logic of an XSLT engine may remain somewhat mysterious.
In XSLT, input processing (pattern matching) is mixed with
output generation. Each template rule contains both a pattern and
an output template. In generateDS.py input processing and
output generation can be separated. With generateDS.py,
subclasses can be generated in a separate file (from the
superclasses), so that the output processing (the export methods,
which are added to the subclasses) can be in a separate file from
the data representation and parsing classes.
Some user/developers will prefer the unification or encapsulation of
the pattern matching code with the output generation code that
XSLT provides. Others will prefer the ability to organize
their code in a way that separates or localizes output generation
code from the parsing (recognizing) code. generateDS.py
provides this separation.
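Here is a minimal sketch of that separation. For brevity everything is in one file, with comments marking what would be separate modules. The class names are invented for this sketch and are not the names generateDS.py would generate. The data representation classes contain no output code, and each set of subclasses adds one style of output.

```python
import io


# --- "Superclass module": data representation only, no output code. ---
class Person:
    def __init__(self, name):
        self.name = name


class People:
    def __init__(self, persons):
        self.persons = persons


# --- "Subclass module A": export as HTML (no escaping, for brevity). ---
class PersonHtml(Person):
    def export(self, outfile):
        outfile.write("<li>%s</li>" % self.name)


class PeopleHtml(People):
    def export(self, outfile):
        outfile.write("<ul>")
        for person in self.persons:
            person.export(outfile)
        outfile.write("</ul>")


# --- "Subclass module B": export as plain text. ---
class PersonText(Person):
    def export(self, outfile):
        outfile.write("- %s\n" % self.name)


class PeopleText(People):
    def export(self, outfile):
        for person in self.persons:
            person.export(outfile)


def render(people_cls, person_cls, names):
    """Build a tree from the chosen subclasses and export it to a string."""
    root = people_cls([person_cls(name) for name in names])
    out = io.StringIO()
    root.export(out)
    return out.getvalue()


if __name__ == "__main__":
    print(render(PeopleHtml, PersonHtml, ["Alice", "Bob"]))
    print(render(PeopleText, PersonText, ["Alice", "Bob"]), end="")
```

Choosing a different output style means choosing a different set of subclasses. The superclasses are not touched.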
The template rules in an XSLT stylesheet form an
unordered collection. The export methods used in
generateDS.py are organized in classes that follow the
hierarchical structure of the XML document type. However, the
classes can be organized as an unordered collection. In practice,
generateDS.py generates the classes (roughly) in order
from top to bottom.
One thing to realize is that in declarative and rule-based
systems (of which XSLT is an example), the addition of one
(or more) rules to an existing system can change the behavior of
that system in ways that are difficult to predict. The same thing
can be said of changes made to the logic (executable code) in an
imperative system (e.g. generateDS.py). It is a
subjective judgment as to which (changes to a rule-based system or
changes to an imperative system) are more difficult to manage,
predict, etc.
Is development with XSLT easy or hard, and can XSLT stylesheets be
debugged? I'm going to have to leave
these questions to those who have more experience with
XSLT.
However, for those of you who seek to evaluate this aspect of
XSLT, here are a few questions you may want to ask:
- Do useful debugging tools and techniques exist for
XSLT?
- Is it possible to trace the evaluation of an XML document
under an XSLT stylesheet?
- Can a developer produce a listing of the template rules
attempted and selected and the XML elements evaluated as rules are
selected?
With respect to transformations implemented with
generateDS.py, available debugging techniques are those
available for debugging any Python code, e.g. pdb (the
Python debugger) or another Python debugger if you have one,
print statements, etc.
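As one example of an ordinary Python technique applied to export methods, the following sketch records which export method is called on which class, in call order. The decorator traced, the list TRACE, and the two classes are all invented for this illustration. Nothing here is provided by generateDS.py.

```python
import functools
import io

TRACE = []  # Collected (class name, method name) pairs, in call order.


def traced(method):
    """Record each call to an export method before running it."""
    @functools.wraps(method)
    def wrapper(self, *args, **kwargs):
        TRACE.append((type(self).__name__, method.__name__))
        return method(self, *args, **kwargs)
    return wrapper


class Person:
    def __init__(self, name):
        self.name = name

    @traced
    def export(self, outfile):
        outfile.write(self.name + "\n")


class People:
    def __init__(self, persons):
        self.persons = persons

    @traced
    def export(self, outfile):
        for person in self.persons:
            person.export(outfile)


def run():
    """Export a small tree and return the recorded call trace."""
    TRACE.clear()
    root = People([Person("Alice"), Person("Bob")])
    root.export(io.StringIO())
    return list(TRACE)


if __name__ == "__main__":
    for class_name, method_name in run():
        print(class_name, method_name)
```

For interactive inspection, a call to pdb.set_trace() placed inside an export method stops execution there and lets you examine the object being exported.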
4 Evaluation
In this section, we try to offer some guidance about when to use
each of these two technologies and what to use each for.
If you already know XSLT and have experience with it and
especially if you do not know Python, lean toward XSLT.
If you already know Python and do not know XSLT, lean toward
generateDS.py.
If you need to generate HTML, XML, or plain text, then
XSLT seems appropriate. generateDS.py is
appropriate for these output types, but also for generating PDF
(e.g. using the ReportLab library), for updating relational (and
other) databases, etc. Note that if you have a need to generate PDF
from XML, you should also look at the ReportLab RML2PDF package.
It's proprietary and a bit expensive. However, from my quick
reading at ReportLab's Web site, it looks very powerful and quite
well thought out. See RML2PDF: the XML Based Reporting
Solution.
Additional suggestions on when and where to use
generateDS.py:
- Transformations to structured text files: XML to HTML; XML to
XML; XML to text.
- Transformations to other formats: XML to PDF (but look at the
ReportLab RML2PDF package, too); XML to database; etc.
- Analysis of XML documents -- Sometimes the process of generating
output requires analysis and search of the source (XML) document.
generateDS.py is especially appropriate when complex
analysis, searching, and testing are required.
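A small sketch of the analysis case: once the document has been parsed into instances, searching and testing are plain Python. The Order and Item classes, their attributes, and the sample data below are all hypothetical. They stand in for classes that would be generated from a schema for an order document.

```python
# Hypothetical data representation classes for an order document.
class Item:
    def __init__(self, name, price, quantity):
        self.name = name
        self.price = price
        self.quantity = quantity


class Order:
    def __init__(self, order_id, items):
        self.order_id = order_id
        self.items = items

    # Added analysis method: computed from the children, not stored in the XML.
    def total(self):
        return sum(item.price * item.quantity for item in self.items)


def large_orders(orders, threshold):
    """Return the ids of orders whose total exceeds the threshold."""
    return [order.order_id for order in orders if order.total() > threshold]


# Sample data standing in for the result of parsing an XML document.
ORDERS = [
    Order("A1", [Item("pen", 2, 10), Item("ink", 5, 1)]),
    Order("B2", [Item("desk", 150, 1)]),
]

if __name__ == "__main__":
    print(large_orders(ORDERS, 100))
```

An export method could call large_orders first and then generate output only for the orders it returns.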
It is claimed that XSLT is not a
solution to all transformation problems and that some of them
call for a full programming language. generateDS.py can be
viewed as an attempt to make that alternative more usable.
generateDS.py provides the parser and data structures that
make it easier to write transformations on XML.
One approach or view -- Use a full programming language for
business logic; use XSLT for styling and formatting
(e.g. to generate HTML). Here are several approaches for doing so:
- Use business logic in a full programming language to produce
XML documents. Then use XSLT to transform those documents
into a viewable format, e.g. HTML.
- Use business logic in a full programming language to directly
produce a viewable format, e.g. HTML.
- Use generateDS.py to transform XML to XML. Then use
XSLT to transform the resulting XML documents into a
viewable format, e.g. HTML.
- You could even generate XML documents of a generic type from
YAML files, then use generateDS.py to implement
transformations from that common XML document type to a variety of
XML document types, and finally use XSLT to transform
those XML documents to a viewable format.
As you can see, if you put all these tools in your toolkit, you have
a lot of options.
License, See also, Etc.
Copyright (c) 2002 Dave Kuhlman
Permission is hereby granted, free of charge, to any person obtaining
a copy of this software and associated documentation files (the
"Software"), to deal in the Software without restriction, including
without limitation the rights to use, copy, modify, merge, publish,
distribute, sublicense, and/or sell copies of the Software, and to
permit persons to whom the Software is furnished to do so, subject to
the following conditions:
The above copyright notice and this permission notice shall be
included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.