XSLT and generateDS - Analysis, Comparison, and Evaluation

Dave Kuhlman

http://www.rexx.com/~dkuhlman

Aug 13, 2002

Front Matter

Abstract:

In this paper we consider and compare several technologies for performing transformations on XML documents. In particular we look at the use of generateDS.py to transform XML documents and compare that use with XSLT.



1 Introduction - What It Does

The problem space -- What are we trying to accomplish? In this paper we discuss technologies for performing transformations on XML documents, that is, transformations where the source is an XML document and the target (result) is any one of another XML document, an HTML or XHTML document, a plain text file, a ``special'' text file (e.g. TeX/LaTeX), etc.

In this paper, we pay special attention to XSLT and generateDS.py (but also mention others, such as YAML).

Caution -- I'm a Python lover and advocate. I'm biased toward Python as a solution to problems. I'm also the implementor of generateDS.py, so I'm likely to be partial towards it, also. In what follows, I've tried to present a balanced view, but ``let the reader beware''.

Principles:

How to use XSLT:

How to use generateDS.py - (Details on this process are at http://www.rexx.com/~dkuhlman/generateDS.html.):

 
2 Objective Comparisons

Feature                      XSLT       generateDS.py  YAML/PyYaml
---------------------------  ---------  -------------  -----------
Declarative                  Yes        No             No
Requires input definition    No         Yes            No
Table/data driven            Yes        No             No
Uses native data structures  No         Yes            Yes
Standardized input           Yes (XML)  Yes (XML)      Yes (YAML)
Full programming language    No         Yes            Yes
Requires code generation     No         Yes            No
Supports XML                 Yes        Yes            No

I have included YAML in this table for several reasons:

Explanation of features in the above table:

Consequences:

 
3 Subjective Comparisons

Here are some comparison points (we'll go into details later):

3.1 Readability

XSLT is exceptionally verbose. Your mileage may vary, but even for those who like to read XML, XSLT seems cluttered and ``dense''. In contrast, Python, which is used to encode export methods for generateDS.py, is an exceptionally clear and readable language. And the separation of representation classes from the classes that contain the export functions helps to make the transformation and generation process especially clear.

However, with respect to XSLT, the readability of XML and XSLT stylesheets may be a red herring. There are editors for XSLT stylesheets which hide the actual XML and present a graphical (or alternative) view of the stylesheet. I do not have experience with these tools, so I can't say how helpful they are. Perhaps the best that I can do is to say that if you intend to do serious work with XSLT, you certainly should evaluate several of these tools. I'm guessing that some would say that XSLT is not even intended to be edited directly and that it should only be edited with an XSLT editor.

Suggestion -- If you attempt to evaluate the readability of XSLT stylesheets, you may want to consider two dimensions to readability:

3.2 Declarative vs. Imperative

Since XSLT is declarative, the control logic is in a separate program, the XSLT engine.

Since transformations written with generateDS.py are imperative, all control logic is explicit and visible in the added export methods. For example, an export method in a parent node (XML element) explicitly calls the export methods in child nodes.
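Here is a minimal sketch of what ``explicit'' means here. The classes People and Person and the helper render are hypothetical and hand-written for illustration; they are not actual generateDS.py output, and the real generated classes differ in detail. The point is only that the parent's export method decides when, and in what order, each child is exported.

```python
import io


# Hypothetical, hand-written classes in the style of export methods.
# They are NOT actual generateDS.py output.
class Person:
    def __init__(self, name):
        self.name = name

    def export(self, out, level=0):
        # Write this node, indented according to its depth.
        out.write("%sPerson: %s\n" % ("  " * level, self.name))


class People:
    def __init__(self, persons):
        self.persons = persons

    def export(self, out, level=0):
        out.write("%sPeople:\n" % ("  " * level))
        # Control flow is explicit: the parent walks its children
        # and calls each child's export method itself.
        for person in self.persons:
            person.export(out, level + 1)


def render(people):
    """Run the export and return the generated text (hypothetical helper)."""
    out = io.StringIO()
    people.export(out)
    return out.getvalue()
```

In XSLT, the corresponding traversal would be requested from the engine rather than written out as a loop.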

For Pythonistas, ``Explicit is better than implicit.'' (See The Zen of Python (by Tim Peters).) If you also feel this way, the fact that the logic of XSLT processing is in some sense hidden in the XSLT engine may be a negative for you. Although an XSLT editor may make the stylesheet more readable, controlling the logic of an XSLT engine may remain somewhat mysterious.

3.3 Structure and Organization

In XSLT, input processing (pattern matching) is mixed with output generation. Each template rule contains both a pattern and an output template. In generateDS.py input processing and output generation can be separated. With generateDS.py, subclasses can be generated in a separate file (from the superclasses), so that the output processing (the export methods, which are added to the subclasses) can be in a separate file from the data representation and parsing classes.
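The separation described above can be sketched roughly as follows. This is a simplified illustration, not generateDS.py output: the class names Book and BookHtml, the build method, and the function transform are assumptions made up for this example, and both ``files'' are shown in one block for brevity.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


# --- would live in the "superclass" file: representation and parsing ---
class Book:
    def __init__(self, title="", author=""):
        self.title = title
        self.author = author

    @classmethod
    def build(cls, element):
        # Populate an instance from a <book> element.
        return cls(
            title=element.findtext("title", default=""),
            author=element.findtext("author", default=""),
        )


# --- would live in the "subclass" file: output generation only ---
class BookHtml(Book):
    def export_html(self):
        # Escape text so that characters such as & and < are valid in HTML.
        return "<li><em>%s</em> by %s</li>" % (
            escape(self.title), escape(self.author))


def transform(xml_text):
    """Parse a <library> document and return an HTML list (hypothetical)."""
    root = ET.fromstring(xml_text)
    books = [BookHtml.build(node) for node in root.findall("book")]
    return "<ul>%s</ul>" % "".join(book.export_html() for book in books)
```

Because the output code is confined to the subclass, a second output format could be added as another subclass without touching the parsing code.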

Some user/developers will prefer the unification or encapsulation of the pattern matching code with the output generation code that XSLT provides. Others will prefer the ability to organize their code in a way that separates or localizes output generation code from the parsing (recognizing) code. generateDS.py provides this separation.

The template rules in an XSLT stylesheet form an unordered collection. The export methods used in generateDS.py are organized in classes that follow the hierarchical structure of the XML document type. However, the classes can be organized as an unordered collection. In practice, generateDS.py generates the classes (roughly) in order from top to bottom.

One thing to realize is that in declarative and rule-based systems (of which XSLT is an example), the addition of one (or more) rules to an existing system can change the behavior of that system in ways that are difficult to predict. The same thing can be said of changes made to the logic (executable code) in an imperative system (e.g. generateDS.py). It is a subjective judgment as to which (changes to a rule-based system or changes to an imperative system) are more difficult to manage, predict, etc.

3.4 Development and Debugging

I'm going to have to leave these questions to those who have more experience with XSLT.

However, for those of you who seek to evaluate this aspect of XSLT, here are a few questions you may want to ask:

With respect to transformations implemented with generateDS.py, available debugging techniques are those available for debugging any Python code, e.g. pdb (the Python debugger) or another Python debugger if you have one, print statements, etc.
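As a small illustration of these techniques, here is a sketch of an export method instrumented for debugging. The class Item, the trace parameter, and the helper export_with_trace are hypothetical names invented for this example; only pdb.set_trace() is a real standard-library call.

```python
import io


# Hypothetical class for illustration; not generateDS.py output.
class Item:
    def __init__(self, name, price):
        self.name = name
        self.price = price

    def export(self, out, trace=None):
        if trace is not None:
            # "Print statement" debugging: record the state being exported.
            trace.write("export Item name=%r price=%r\n"
                        % (self.name, self.price))
        # To step through interactively instead, uncomment the next line:
        # import pdb; pdb.set_trace()
        out.write("%s: %.2f\n" % (self.name, self.price))


def export_with_trace(item):
    """Export one item and return (output text, trace text)."""
    out, trace = io.StringIO(), io.StringIO()
    item.export(out, trace)
    return out.getvalue(), trace.getvalue()
```

Since an export method is ordinary Python, a breakpoint can be placed at exactly the node whose output looks wrong.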

 
4 Evaluation

In this section, we try to offer some guidance about when to use each of these two technologies and what to use each for.

If you already know XSLT and have experience with it, and especially if you do not know Python, lean toward XSLT. If you already know Python and do not know XSLT, lean toward generateDS.py.

If you need to generate HTML, XML, or plain text, then XSLT seems appropriate. generateDS.py is appropriate for these output types, but also for generating PDF (e.g. using the ReportLab library), for updating relational (and other) databases, etc. Note that if you have a need to generate PDF from XML, you should also look at the ReportLab RML2PDF package. It's proprietary and a bit expensive. However, from my quick reading at ReportLab's Web site, it looks very powerful and quite well thought out. See RML2PDF: the XML Based Reporting Solution.
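To make the database case concrete, here is a minimal sketch of an export method whose ``output'' is a row in a relational table rather than text. It uses the standard sqlite3 module; the class Customer, the customer table, the method export_sql, and the function load are all hypothetical names chosen for this example and are not part of generateDS.py.

```python
import sqlite3
import xml.etree.ElementTree as ET


# Hypothetical class for illustration; not generateDS.py output.
class Customer:
    def __init__(self, ident, name):
        self.ident = ident
        self.name = name

    @classmethod
    def build(cls, element):
        # Populate an instance from a <customer id="..."> element.
        return cls(int(element.get("id")),
                   element.findtext("name", default=""))

    def export_sql(self, connection):
        # Parameterized insert into the table created by load().
        connection.execute(
            "INSERT INTO customer (id, name) VALUES (?, ?)",
            (self.ident, self.name),
        )


def load(xml_text, connection):
    """Insert every <customer> in the document into the database."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS customer "
        "(id INTEGER PRIMARY KEY, name TEXT)")
    for node in ET.fromstring(xml_text).findall("customer"):
        Customer.build(node).export_sql(connection)
    connection.commit()
```

A typical call would pass an open connection, for example sqlite3.connect(":memory:"), together with the text of the XML document.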

Additional suggestions on when and where to use generateDS.py:

It is claimed that XSLT is not a solution to all (transformation) problems and that, for some of them, a full programming language is needed. generateDS.py can be viewed as an attempt to make that alternative more usable. generateDS.py provides the parser and data structures that make it easier to write transformations on XML.

One approach or view -- Use a full programming language for business logic; use XSLT for styling and formatting (e.g. to generate HTML). Here are several approaches for doing so:

As you can see, if you put all these tools in your toolkit, you have a lot of options.

 
License, See also, Etc.

Copyright (c) 2002 Dave Kuhlman

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

See Also:

Dave's Web Site
for more software and information on using Python for XML and the Web

The main Python Web Site
for more information on Python

The Python XML Special Interest Group
for more information on processing XML with Python

ReportLab
for more information on the ReportLab library and RML2PDF for generating PDF documents

The XML C library for Gnome -- XSLT
for more information on the Xmlsoft XSLT library (along with its Python interface)

XSL Considered Harmful, by Michael Leventhal
for an article critical of XSL

The YAML home page
for more information on YAML (and PyYaml)
