XSLT and generateDS - Analysis, Comparison, and Evaluation

Dave Kuhlman

http://www.rexx.com/~dkuhlman
Email: [email protected]

Aug 13, 2002

Front Matter

Abstract:

In this paper we consider and compare several technologies for performing transformations on XML documents. In particular we look at the use of generateDS.py to transform XML documents and compare that use with XSLT.

1 Introduction - What It Does

The problem space -- What are we trying to accomplish? In this paper we discuss technologies for performing transformations on XML documents, that is, transformations where the source is an XML document and the target (result) is any one of another XML document, an HTML or XHTML document, a plain text file, a ``special'' text file (e.g. TeX/LaTeX), etc.

In this paper, we pay special attention to the following technologies (but also mention others):

XSLT performs a transformation by using an XSLT engine and a stylesheet containing rules to transform the element tree from one XML document into another element tree, and then to generate a new document from that new element tree.
generateDS generates Python classes that represent the elements in an XML document. These classes are generated from an XML Schema which describes the XML documents to be processed. In addition, a parser is generated which, when executed, creates instances of the classes from an XML document. The classes can be extended with methods that generate a new document. Thus, the generated parser and the extended classes can perform transformations by parsing an XML document and generating a new document.

Caution -- I'm a Python lover and advocate. I'm biased toward Python as a solution to problems. I'm also the implementor of generateDS.py, so I'm likely to be partial towards it, also. In what follows, I've tried to present a balanced view, but ``let the reader beware''.

Principles:

The use of generateDS.py to perform transformations is isomorphic to XSLT up to coding style. That is, they can perform the same tasks, but use a different language and coding style. XSLT uses template rules and a stylesheet. generateDS.py uses XML Schema for the definition of input documents and uses Python to add the methods that generate a new document.
generateDS.py uses methods added to data representation classes to perform a tree walk.
XSLT uses (1) patterns to match XML elements, (2) templates that specify the output to be generated for each pattern, and (3) an engine that matches the patterns and generates the output.

How to use XSLT:

Create a stylesheet file. The stylesheet file contains the template rules that guide the XSLT engine.
Create the templates (in the stylesheet file). Each template contains a pattern and an output template. The XSLT engine applies the template if the pattern matches. When the pattern matches, the XSLT engine uses the output template to generate output.
Execute the XSLT engine, applying the XSLT stylesheet to the XML document to produce the output.
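
To make the pattern/template/engine division concrete, here is a minimal sketch of the same model written in Python. It is not XSLT and not an XSLT engine: the patterns are plain tag names, the ``templates'' are Python functions, and the element names (people, person, name) and the helper names (template, apply_templates, apply_children) are made up for this illustration.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Hypothetical input document; the element names are invented for this sketch.
SOURCE = (
    "<people>"
    "<person><name>Alice</name></person>"
    "<person><name>Bob</name></person>"
    "</people>"
)

# The "stylesheet": an unordered collection of rules, keyed by pattern.
RULES = {}


def template(pattern):
    """Register a function as the output template for a pattern (a tag name)."""
    def register(func):
        RULES[pattern] = func
        return func
    return register


@template("people")
def people_rule(elem):
    return "<ul>" + apply_children(elem) + "</ul>"


@template("person")
def person_rule(elem):
    return "<li>" + apply_children(elem) + "</li>"


@template("name")
def name_rule(elem):
    return escape(elem.text or "")


def apply_templates(elem):
    """The "engine": select the rule whose pattern matches elem and apply it."""
    rule = RULES.get(elem.tag)
    if rule is None:
        # Fallback chosen for this sketch: emit nothing, but visit children.
        return apply_children(elem)
    return rule(elem)


def apply_children(elem):
    return "".join(apply_templates(child) for child in elem)


def transform(xml_text):
    return apply_templates(ET.fromstring(xml_text))


result = transform(SOURCE)
print(result)
```

Note that none of the rules says in what order elements are visited. That decision lives in apply_templates and apply_children, which play the role of the engine.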

How to use generateDS.py - (Details on this process are at http://www.rexx.com/~dkuhlman/generateDS.html.):

Create an XML Schema definition of the input XML document type.
Use generateDS.py to generate the data representation classes and the parser. Optionally, generate subclasses of the data representation classes.
Add export methods to the generated classes (or subclasses). These methods will most likely perform a tree walk on the data structures (classes) created by parsing the XML document, and they will generate output (hint: use print or call write on an open file/stream).
In the main function, call the parser to parse the XML document and create the document tree, then call the added export method on the root object.
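
The last two steps can be sketched as follows. The classes below are hand-written stand-ins, not actual generateDS.py output: the class names, the build and export_text methods, and the sample document are all assumptions made for illustration, and the code that generateDS.py really generates will differ in its details.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical input document; the element names are invented for this sketch.
SOURCE = (
    "<people>"
    "<person><name>Alice</name></person>"
    "<person><name>Bob</name></person>"
    "</people>"
)


class Person:
    """Stand-in for a data representation class for <person>."""

    def __init__(self, name=""):
        self.name = name

    @classmethod
    def build(cls, node):
        return cls(name=node.findtext("name", default=""))

    # The added export method: writes this element's part of the output.
    def export_text(self, outfile, level=0):
        outfile.write("%s- %s\n" % ("  " * level, self.name))


class People:
    """Stand-in for a data representation class for the root <people>."""

    def __init__(self, persons=None):
        self.persons = persons if persons is not None else []

    @classmethod
    def build(cls, node):
        return cls([Person.build(child) for child in node.findall("person")])

    # The tree walk is explicit: the parent calls the export method of
    # each of its children.
    def export_text(self, outfile, level=0):
        outfile.write("People:\n")
        for person in self.persons:
            person.export_text(outfile, level + 1)


def parse(xml_text):
    """Stand-in for the parser: builds the tree of instances."""
    return People.build(ET.fromstring(xml_text))


def main():
    root = parse(SOURCE)
    out = io.StringIO()
    root.export_text(out)
    return out.getvalue()


output = main()
print(output, end="")
```

Writing to an open stream rather than calling print directly makes it easy to send the same export to a file, to standard output, or to a string buffer.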

2 Objective Comparisons

Feature	XSLT	generateDS.py	YAML/PyYaml
Declarative	Yes	No	No
Requires input definition	No	Yes	No
Table/data driven	Yes	No	No
Uses native data structures	No	Yes	Yes
Standardized input	Yes (XML)	Yes (XML)	Yes (YAML)
Full programming language	No	Yes	Yes
Requires code generation	No	Yes	No
Supports XML	Yes	Yes	No

I have included YAML in this table for several reasons:

I'm interested in YAML and in figuring out what to use it for and how to use it.
YAML can be thought of as an alternative representation for structured data. It is appropriate where and when strict compatibility with XML is not a requirement.
A YAML loader reads YAML files into native data structures. If Python is used, the data structures are Python lists, dictionaries, etc.
YAML uses an indented format to store structured data in files (and strings). YAML proponents claim that this format is cleaner and easier to read and edit than the equivalent XML representation.
Transforming YAML to XML and XML to YAML is a straightforward process and is relatively easy to implement when an interchange between YAML and XML is needed. In fact, generating YAML files from XML documents is an ideal task for generateDS.py.
There is a Python implementation of YAML. You can find it at the YAML home page.
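
As a rough illustration of why the conversion from YAML to XML is straightforward, the sketch below turns nested Python dictionaries and lists into an XML element tree. No YAML library is used: the data literal stands in for what a YAML loader would return, and the function name to_element and the item tag used for list entries are arbitrary choices made here. It also assumes that every dictionary key is a valid XML element name.

```python
import xml.etree.ElementTree as ET

# Stand-in for the result of loading a YAML file (assumed structure).
data = {"name": "Alice", "interests": ["chess", "go"]}


def to_element(tag, value):
    """Convert a dict/list/scalar into an element named tag."""
    elem = ET.Element(tag)
    if isinstance(value, dict):
        # Each key becomes a child element (keys must be valid XML names).
        for key, child in value.items():
            elem.append(to_element(key, child))
    elif isinstance(value, list):
        # List entries have no name of their own, so use a generic tag.
        for item in value:
            elem.append(to_element("item", item))
    else:
        elem.text = str(value)
    return elem


root = to_element("person", data)
xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
```

The reverse direction needs more decisions (for example, how to represent attributes and mixed content), which is where knowledge of the document type helps.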

Explanation of features in the above table:

Declarative -- Is input in the form of declarations or rules rather than imperative, executable statements? For example, XSLT uses XSL stylesheets and XSL template rules, whereas generateDS.py and PyYaml use executable Python code.
Requires input definition -- Does processing require definition of the structure of input files? For example, generateDS.py requires an XML Schema definition of the XML input files to be processed.
Uses native data structures -- Does processing use native data structures in the supported programming language? Example: generateDS.py uses instances of Python classes and Python lists plus simple Python data types to represent the contents of an XML file. PyYaml uses Python lists and dictionaries plus simple Python data types to represent the contents of a YAML file.
Standardized input -- Is the format of data input described in a publicly available standard? What is that standard?
Full programming language -- Does implementation of a transformation use and make available a full programming language? For example, generateDS.py and PyYaml both employ Python.
Requires code generation -- Does the system require the user to generate code before use? For example, generateDS.py requires the user/developer to generate Python code from an XML Schema definition of the input files.
Supports XML -- Is XML supported? For example, XSLT uses XML to define transformations (XML/XSL stylesheets) and takes XML files as input. And, generateDS.py uses XML (XML Schema) to define the structure of input files and takes XML as data input files.

Consequences:

Support for XML -- Because XSLT stylesheets are XML, there is the possibility of using XML and XSL editors to develop transformations (XSL stylesheets).
Requires code generation -- Possibly a burden. However, in the case of generateDS.py, the generated code for a specific XML Schema definition can be reused to define multiple transformations on the same document type.
Uses native data structures -- The ability to use native data structures of the programming language is very helpful for defining transformations on XML input. This is especially true when (1) the native data structures map reasonably directly to the structures in the XML input document and (2) the native data structures are reasonably high level. For example, with generateDS.py there is a 1-to-1 correspondence between XML elements and instances of Python classes. And, elements in YAML files map very directly onto Python lists and dictionaries. Compare this with the use of DOM, which also uses native (Python) data structures, but requires laborious picking through children of children of children etc. in order to find information of interest in an XML document.
Full programming language -- ``XSLT was never intended as a complete answer to the problem of transforming XML documents.'' Because XSLT lacks a complete programming language, some transformation tasks may be difficult or awkward. generateDS.py, which does provide access to a full programming language, specifically Python, can be a more complete solution and can more appropriately be applied to tasks that are awkward for XSLT. However, note that while these tasks can be solved with a full programming language, some will require more labor to do so.
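
The contrast with DOM drawn above can be sketched as follows. Both functions collect the same names from the same document. The first walks a DOM tree with xml.dom.minidom from the Python standard library. The second uses small hand-written classes as stand-ins for data representation classes: they are not actual generateDS.py output, and the element names are invented for the example.

```python
import xml.dom.minidom
import xml.etree.ElementTree as ET

# Hypothetical input document; the element names are invented for this sketch.
SOURCE = (
    "<people>"
    "<person><name>Alice</name></person>"
    "<person><name>Bob</name></person>"
    "</people>"
)


def names_via_dom(xml_text):
    """Pick through children of children, checking node types by hand."""
    doc = xml.dom.minidom.parseString(xml_text)
    names = []
    for person in doc.documentElement.childNodes:
        if person.nodeType != person.ELEMENT_NODE or person.tagName != "person":
            continue
        for child in person.childNodes:
            if child.nodeType == child.ELEMENT_NODE and child.tagName == "name":
                text = "".join(
                    node.data
                    for node in child.childNodes
                    if node.nodeType == node.TEXT_NODE
                )
                names.append(text)
    return names


class Person:
    """Stand-in for a data representation class for <person>."""

    def __init__(self, name):
        self.name = name


class People:
    """Stand-in for a data representation class for <people>."""

    def __init__(self, persons):
        self.persons = persons


def build_people(xml_text):
    """Stand-in for a parser that returns instances of the classes above."""
    root = ET.fromstring(xml_text)
    return People(
        [Person(p.findtext("name", default="")) for p in root.findall("person")]
    )


def names_via_classes(xml_text):
    """With one class per element type, the data is a plain attribute away."""
    return [person.name for person in build_people(xml_text).persons]


print(names_via_dom(SOURCE))
print(names_via_classes(SOURCE))
```

The node-type and tag-name checks in names_via_dom are the ``picking through'' that the class-based version pushes into the parser, where it is written once per document type.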

3 Subjective Comparisons

Here are some comparison points (we'll go into details later):

The use of generateDS.py is more readable for Python programmers. XSLT is (possibly) more readable for those used to reading XML. (I don't really believe this. However, it is important to recognize that different programming languages and styles will appeal to and be more understandable to different users.)
XSLT is declarative. generateDS.py is imperative and object-oriented.
XSLT uses multiple stylesheets to generate different output from the same input XML document type. generateDS.py can use multiple subclass modules with the same (super-)class module to produce different styles of output.
XSLT is most appropriate for producing ``text'' output, for example HTML, XML, and ``plain'' text. generateDS.py is suitable for producing these text formats and other formats as well, for example, generating PDF, inserting content into a relational database, updating a database or repository, etc. This is because the export methods added to the classes generated by generateDS.py are written in Python and can do nearly anything that Python can do.
Is development with XSLT easy or hard? Can XSLT stylesheets be debugged?

3.1 Readability

XSLT is exceptionally verbose. Your mileage may vary, but even for those who like to read XML, XSLT seems cluttered and ``dense''. In contrast, Python, which is used to write the export methods for generateDS.py, is an exceptionally clear and readable language. And, the separation of representation classes from the classes that contain the export functions helps to make the transformation and generation process especially clear.

However, with respect to XSLT the readability of XML and XSLT stylesheets may be a red herring. There are editors for XSLT stylesheets which hide the actual XML and present a graphical (or alternative) view of the stylesheet. I do not have experience with these tools, so I can't say how helpful they are. Perhaps the best that I can do is to say that if you intend to do serious work with XSLT, you certainly should evaluate several of these tools. I'm guessing that there are some who would say that XSLT is not even intended to be edited directly and that it should only be edited with an XSLT editor.

Suggestion -- If you attempt to evaluate the readability of XSLT stylesheets, you may want to consider two dimensions to readability:

Is the syntax transparent, i.e. can you perceive the intention of a snippet of XSLT through the syntax?
Is the logic transparent? And, can you predict the behavior of a stylesheet by reading the stylesheet? Here are a few additional and analogous questions: If you make a change to a stylesheet in one location, does it cause changes in the behavior of other parts of the stylesheet? In other words, do changes have objectionable non-local effects?

3.2 Declarative vs. Imperative

Since XSLT is declarative, the logic is in a separate program, the XSLT engine.

Since generateDS.py is imperative, all control logic is explicit and visible in the added export methods. For example, an export method in a parent node (XML element) explicitly calls the export methods in child nodes.

For Pythonistas, ``Explicit is better than implicit.'' (See The Zen of Python (by Tim Peters).) If you also feel this way, the fact that the logic of XSLT processing is in some sense hidden in the XSLT engine may be a negative for you. Although an XSLT editor may make the stylesheet more readable, controlling the logic of an XSLT engine may remain somewhat mysterious.

3.3 Structure and Organization

In XSLT, input processing (pattern matching) is mixed with output generation. Each template rule contains both a pattern and an output template. In generateDS.py input processing and output generation can be separated. With generateDS.py, subclasses can be generated in a separate file (from the superclasses), so that the output processing (the export methods, which are added to the subclasses) can be in a separate file from the data representation and parsing classes.

Some user/developers will prefer the unification or encapsulation of the pattern matching code with the output generation code that XSLT provides. Others will prefer the ability to organize their code in a way that separates or localizes output generation code from the parsing (recognizing) code. generateDS.py provides this separation.
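
The separation described above can be sketched like this. The superclass holds only data, and each output style lives in its own subclass. Everything is shown in one listing for brevity, whereas in practice the subclasses would sit in separate modules. The class names, the export method, and the export_all helper are hypothetical, and the subclass modules that generateDS.py generates are organized differently in their details.

```python
import io
from xml.sax.saxutils import escape


# --- would live in the (super-)class module: data representation only ---
class Person:
    def __init__(self, name):
        self.name = name


# --- would live in one subclass module: HTML output ---
class PersonHtml(Person):
    def export(self, outfile):
        outfile.write("<li>%s</li>\n" % escape(self.name))


# --- would live in another subclass module: plain text output ---
class PersonText(Person):
    def export(self, outfile):
        outfile.write("* %s\n" % self.name)


def export_all(person_class, names):
    """Build instances of the chosen subclass and export them to a string."""
    out = io.StringIO()
    for name in names:
        person_class(name).export(out)
    return out.getvalue()


html_output = export_all(PersonHtml, ["Alice", "Bob"])
text_output = export_all(PersonText, ["Alice", "Bob"])
print(html_output, end="")
print(text_output, end="")
```

Choosing the output style then amounts to choosing which subclass module to use, while the data representation code stays untouched.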

The template rules in an XSLT stylesheet form an unordered collection. The export methods used in generateDS.py are organized in classes that follow the hierarchical structure of the XML document type. However, the classes can be organized as an unordered collection. In practice, generateDS.py generates the classes (roughly) in order from top to bottom.

One thing to realize is that in declarative and rule-based systems (of which XSLT is an example), the addition of one (or more) rules to an existing system can change the behavior of that system in ways that are difficult to predict. The same thing can be said of changes made to the logic (executable code) in an imperative system (e.g. generateDS.py). It is a subjective judgment as to which (changes to a rule-based system or changes to an imperative system) are more difficult to manage, predict, etc.

3.4 Development and Debugging

I'm going to have to leave these questions to those who have more experience with XSLT.

However, for those of you who seek to evaluate this aspect of XSLT, here are a few questions you may want to ask:

Do useful debugging tools and techniques exist for XSLT?
Is it possible to trace the evaluation of an XML document under an XSLT stylesheet?
Can a developer produce a listing of the template rules attempted and selected and the XML elements evaluated as rules are selected?

With respect to transformations implemented with generateDS.py, available debugging techniques are those available for debugging any Python code, e.g. pdb (the Python debugger) or another Python debugger if you have one, print statements, etc.
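
As one example of such a technique, the sketch below records which export methods are called, and in what order, by wrapping them in a small decorator. The decorator name traced, the TRACE list, and the classes are all hypothetical and are not part of generateDS.py. For interactive inspection, placing a call to the built-in breakpoint() inside an export method drops into pdb at that point.

```python
import functools
import io

TRACE = []  # records "ClassName.method" for each traced call, in call order


def traced(method):
    """Wrap an export method so that each call is appended to TRACE."""
    @functools.wraps(method)
    def wrapper(self, outfile):
        TRACE.append("%s.%s" % (type(self).__name__, method.__name__))
        return method(self, outfile)
    return wrapper


class Person:
    def __init__(self, name):
        self.name = name

    @traced
    def export(self, outfile):
        outfile.write("%s\n" % self.name)


class People:
    def __init__(self, persons):
        self.persons = persons

    @traced
    def export(self, outfile):
        # The parent explicitly calls the export method of each child.
        for person in self.persons:
            person.export(outfile)


out = io.StringIO()
People([Person("Alice"), Person("Bob")]).export(out)
print(TRACE)
```

Because the tree walk is ordinary Python, the trace simply follows the call order of the export methods.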

4 Evaluation

In this section, we try to offer some guidance about when to use each of these two technologies and what to use each for.

If you already know XSLT and have experience with it and especially if you do not know Python, lean toward XSLT. If you already know Python and do not know XSLT, lean toward generateDS.py.

If you need to generate HTML, XML, or plain text, then XSLT seems appropriate. generateDS.py is appropriate for these output types, but also for generating PDF (e.g. using the ReportLab library), for updating relational (and other) databases, etc. Note that if you have a need to generate PDF from XML, you should also look at the ReportLab RML2PDF package. It's proprietary and a bit expensive. However, from my quick reading at ReportLab's Web site, it looks very powerful and quite well thought out. See RML2PDF: the XML Based Reporting Solution.

Additional suggestions on when and where to use generateDS.py:

Transformations to structured text files: XML to HTML; XML to XML; XML to text.
Transformations to other formats: XML to PDF (but look at the ReportLab RML2PDF package, too); XML to database; etc.
Analysis of XML documents -- Sometimes the process of generating output requires analysis and search of the source (XML) document. generateDS.py is especially appropriate when complex analysis, searching, and testing is required.

It is claimed that XSLT is not a solution to all (transformation) problems and that, for some of them, a full programming language is needed. generateDS.py can be viewed as an attempt to make that alternative more usable. generateDS.py provides the parser and data structures that make it easier to write transformations on XML.

One approach or view -- Use a full programming language for business logic; use XSLT for styling and formatting (e.g. to generate HTML). Here are several approaches for doing so:

Use business logic in a full programming language to produce XML documents. Then use XSLT to transform those documents into a viewable format, e.g. HTML.
Use business logic in a full programming language to directly produce a viewable format, e.g. HTML.
Use generateDS.py to transform XML to XML. Then use XSLT to transform the resulting XML documents into a viewable format, e.g. HTML.
You could even generate XML documents of a generic type from YAML files, then use generateDS.py to implement transformations from that common XML document type to a variety of XML document types, and finally use XSLT to transform those XML documents to a viewable format.

As you can see, if you put all these tools in your toolkit, you have a lot of options.

License, See also, Etc.

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

See Also:

Dave's Web Site: for more software and information on using Python for XML and the Web

The main Python Web Site: for more information on Python

The Python XML Special Interest Group: for more information on processing XML with Python

ReportLab: for more information on the ReportLab library and RML2PDF for generating PDF documents

The XML C library for Gnome -- XSLT: for more information on the Xmlsoft XSLT library (along with its Python interface)

XSL Considered Harmful, by Michael Leventhal: for an article critical of XSL

The YAML home page: for more information on YAML (and PyYaml)
