Appendix A. A brief introduction to XSL

Table of Contents

XSL processing model
Context is important
Programming features
Generating HTML output
Generating formatting objects

XSL is both a transformation language and a formatting language. The XSLT transformation part lets you scan through a document's structure and rearrange its content any way you like. You can write out the content using a different set of XML tags, and generate text as needed. For example, you can scan through a document to locate all headings and then insert a generated table of contents at the beginning of the document, at the same time writing out the content marked up as HTML. XSL is also a rich formatting language, letting you apply typesetting controls to all components of your output. With a good formatting back end, it is capable of producing high quality printed pages.

An XSL stylesheet is written using XML syntax, and is itself a well-formed XML document. That makes the basic syntax familiar, and enables an XML processor to check for basic syntax errors. The stylesheet instructions use special element names, which typically begin with xsl: to distinguish them from any XML tags you want to appear in the output. The XSL namespace is identified at the top of the stylesheet file. As with other XML, any XSL elements that are not empty will require a closing tag. And some XSL elements have specific attributes that control their behavior. It helps to keep a good XSL reference book handy.

The following are examples of a simple XSL stylesheet applied to a simple XML file to generate HTML output.

Example A.1. Simple XML file

<?xml version="1.0"?>
<document>
<title>Using a mouse</title>
<para>It's easy to use a mouse. Just roll it
around and click the buttons.</para>
</document>

Example A.2. Simple XSL stylesheet

<?xml version='1.0'?>
<xsl:stylesheet
          xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version='1.0'>
<xsl:output method="html"/>

<xsl:template match="document">
  <HTML><HEAD><TITLE>
    <xsl:value-of select="./title"/>
  </TITLE>
  </HEAD>
  <BODY>
    <xsl:apply-templates/>
  </BODY>
  </HTML>
</xsl:template>

<xsl:template match="title">
  <H1><xsl:apply-templates/></H1>
</xsl:template>

<xsl:template match="para">
  <P><xsl:apply-templates/></P>
</xsl:template>

</xsl:stylesheet>

Example A.3. HTML output

<HTML>
<HEAD>
<TITLE>Using a mouse</TITLE>
</HEAD>
<BODY>
<H1>Using a mouse</H1>
<P>It's easy to use a mouse. Just roll it
around and click the buttons.</P>
</BODY>
</HTML>

XSL processing model

XSL is a template language, not a procedural language. That means a stylesheet specifies a sample of the output, not a sequence of programming steps to generate it. A stylesheet consists of a mixture of output samples with instructions of what to put in each sample. Each bit of output sample and instructions is called a template.

In general, you write a template for each element type in your document. That lets you concentrate on handling just one element at a time, and keeps a stylesheet modular. The power of XSL comes from processing the templates recursively. That is, each template handles the processing of its own element, and then calls other templates to process its children, and so on. Since an XML document is always a single root element at the top level that contains all of the nested descendant elements, the XSL templates also start at the top and work their way down through the hierarchy of elements.

Take the DocBook <para> paragraph element as an example. To convert this to HTML, you want to wrap the paragraph content with the HTML tags <p> and </p>. But a DocBook <para> can contain any number of in-line DocBook elements marking up the text. Fortunately, you can let other templates take care of those elements, so your XSL template for <para> can be quite simple:

<xsl:template match="para">
  <p>
    <xsl:apply-templates/>
  </p>
</xsl:template>

The <xsl:template> element starts a new template, and its match attribute indicates where to apply the template, in this case to any <para> elements. The template says to output a literal <p> string and then execute the <xsl:apply-templates/> instruction. This tells the XSL processor to look among all the templates in the stylesheet for any that should be applied to the content of the paragraph. If each template in the stylesheet includes an <xsl:apply-templates/> instruction, then all descendants will eventually be processed. When it is through recursively applying templates to the paragraph content, it outputs the </p> closing tag.
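As a sketch of one of those other templates, the following handles the in-line DocBook <emphasis> element. It is written here for illustration and is not copied from the DocBook stylesheets:

<xsl:template match="emphasis">
  <i>
    <xsl:apply-templates/>
  </i>
</xsl:template>

With this template in place, an input of <para>Click the <emphasis>left</emphasis> button.</para> should produce <p>Click the <i>left</i> button.</p>. The <para> template never mentions <emphasis>; its <xsl:apply-templates/> instruction simply hands each child node to whichever template matches it.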

Context is important

Since you are not writing a linear procedure to process your document, the context of where and how to apply each modular template is important. The match attribute of <xsl:template> provides that context for most templates. There is an entire expression language, XPath, for identifying what parts of your document should be handled by each template. The simplest context is just an element name, as in the example above. But you can also specify elements as children of other elements, elements with certain attribute values, the first or last elements in a sequence, and so on. The following is how the DocBook <formalpara> element is handled:

<xsl:template match="formalpara">
  <p>
    <xsl:apply-templates/>
  </p>
</xsl:template>

<xsl:template match="formalpara/title">
  <b><xsl:apply-templates/></b>
  <xsl:text> </xsl:text>
</xsl:template>

<xsl:template match="formalpara/para">
  <xsl:apply-templates/>
</xsl:template>

There are three templates defined, one for the <formalpara> element itself, and one for each of its children elements. The match attribute value formalpara/title in the second template is an XPath expression indicating a <title> element that is an immediate child of a <formalpara> element. This distinguishes such titles from other <title> elements used in DocBook. XPath expressions are the key to controlling how your templates are applied.

In general, when more than one template matches the same node, the XSL processor has internal rules that choose the more specific template over the less specific one. That lets you control the details, but also provides a fallback mechanism to a less specific template when you do not supply the full context for every combination of elements. This feature is illustrated by the third template, for formalpara/para. By including this template, the stylesheet processes a <para> within <formalpara> in a special way, in this case by not outputting the HTML <p> tags already output by its parent. If this template had not been included, then the processor would have fallen back to the template specified by match="para" described above, which would have output a second set of <p> tags.
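The following sketch shows the same fallback behavior with an attribute test. Both templates are hypothetical and assume in-line <emphasis> elements that may carry a role attribute:

<!-- General case: any emphasis element -->
<xsl:template match="emphasis">
  <i><xsl:apply-templates/></i>
</xsl:template>

<!-- More specific case: only emphasis elements with role="bold" -->
<xsl:template match="emphasis[@role='bold']">
  <b><xsl:apply-templates/></b>
</xsl:template>

An <emphasis role="bold"> element matches both patterns, and the processor picks the second because a pattern with a predicate has a higher default priority than a bare element name. Any other <emphasis> falls back to the first template. If two patterns would otherwise tie, XSLT also lets you set a priority attribute on <xsl:template> to decide between them.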

You can also control template context with XSL modes, which are used extensively in the DocBook stylesheets. Modes let you process the same input more than once in different ways. A mode attribute in an <xsl:template> definition adds a specific mode name to that template. When the same mode name is used in <xsl:apply-templates/>, it acts as a filter to narrow the selection of templates to only those selected by the match expression and that have that mode name. This lets you define two different templates for the same element match that are applied under different contexts. For example, there are two templates defined for DocBook <listitem> elements:

<xsl:template match="listitem">
  <li><xsl:apply-templates/></li>
</xsl:template>

<xsl:template match="listitem" mode="xref">
  <xsl:number format="1"/>
</xsl:template>

The first template is for the normal list item context where you want to output the HTML <li> tags. The second template is called with <xsl:apply-templates select="$target" mode="xref"/> in the context of processing <xref> elements. In this case the select attribute locates the ID of the specific list item and the mode attribute selects the second template, whose effect is to output its item number when it is in an ordered list. Because there are many such special needs when processing <xref> elements, it is convenient to define a mode name xref to handle them all.
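A simplified sketch of the calling side might look like the following. It assumes that <xref> carries a linkend attribute naming the id of its target. The real DocBook xref template does considerably more, so treat this only as an outline of the mode mechanism:

<xsl:template match="xref">
  <!-- Find the first element whose id matches this xref's linkend -->
  <xsl:variable name="target" select="(//*[@id = current()/@linkend])[1]"/>
  <a href="#{@linkend}">
    <!-- Only templates declared with mode="xref" are candidates here -->
    <xsl:apply-templates select="$target" mode="xref"/>
  </a>
</xsl:template>

If $target is a <listitem>, the mode="xref" template shown above is selected rather than the one that outputs <li> tags, even though both match the same element.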

Keep in mind that mode settings do not automatically get passed down to other templates through <xsl:apply-templates/>. You have two choices for processing children while in a template with a mode.

  • To continue using that mode, process the children with <xsl:apply-templates mode="mode"/>, where mode is the same mode name. The processor will look for templates with that mode name that match on the child elements. There is no fallback to your templates without a mode, so if a child does not have a template match with that mode, it is handled only by the processor's built-in rules for that mode and not by your mode-less template for that element. If you want to fall back to the mode-less templates for such children, then include a template like the following:

    <xsl:template  match="*"  mode="mode">
      <xsl:apply-templates select="." />
    </xsl:template>

    For any child element that does not have a template in that mode, this template will cause it to be processed with the mode-less templates.

  • To use the regular mode-less templates, process the children with <xsl:apply-templates />. You can also use named templates, which do not have a mode.
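
The two choices can be seen together in the following sketch, which builds a short contents list before the body. The <document> and <section> element names and the toc mode name are assumptions made for this example:

<xsl:template match="document">
  <html>
    <body>
      <!-- First pass: section titles only, as a contents list -->
      <ul>
        <xsl:apply-templates select="section" mode="toc"/>
      </ul>
      <!-- Second pass: full content with the mode-less templates -->
      <xsl:apply-templates/>
    </body>
  </html>
</xsl:template>

<!-- Stay in toc mode while descending to the title -->
<xsl:template match="section" mode="toc">
  <li>
    <xsl:apply-templates select="title" mode="toc"/>
  </li>
</xsl:template>

<!-- Drop the mode so the title's children use the regular templates -->
<xsl:template match="title" mode="toc">
  <xsl:apply-templates/>
</xsl:template>

Each section title is processed twice: once in toc mode to build the list, and again by the mode-less templates when <xsl:apply-templates/> processes the whole document. The title template in toc mode deliberately leaves the mode off for its children, so any in-line markup inside a title is handled by the regular templates.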

Programming features

Although XSL is template-driven, it also has some features of traditional programming languages. The following are some examples from the DocBook stylesheets.

Assign a value to a variable:
<xsl:variable name="refelem" select="name($target)"/>

If statement:
<xsl:if test="$show.comments">
    <i><xsl:call-template name="inline.charseq"/></i>
</xsl:if>

Case statement:
<xsl:choose>
    <xsl:when test="@columns">
        <xsl:value-of select="@columns"/>
    </xsl:when>
    <xsl:otherwise>1</xsl:otherwise>
</xsl:choose>

Call a template by name like a subroutine, passing parameter values and accepting a return value:
<xsl:call-template name="xref.xreflabel">
   <xsl:with-param name="target" select="$target"/>
</xsl:call-template>

However, you cannot always use these constructs as you do in other programming languages. Variables in particular have very different behavior.

Using variables and parameters

XSL provides two elements that let you assign a value to a name: <xsl:variable> and <xsl:param>. These share the same name space and syntax for assigning names and values. Both can be referred to using the $name syntax. The main difference between these two elements is that a param's value acts as a default value that can be overridden when a template is called using a <xsl:with-param> element as in the last example above.

The following are two examples from DocBook:

<xsl:param name="cols">1</xsl:param>
<xsl:variable name="segnum" select="position()"/>

In both elements, the name of the parameter or variable is specified with the name attribute. So the name of the param here is cols and the name of the variable is segnum. The value of either can be supplied in two ways. The value of the first example is the text node "1" and is supplied as the content of the element. The value of the second example is supplied as the result of the expression in its select attribute, and the element itself has no content.
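The two ways of supplying a value can be compared in the following sketch. The variable names are invented for this example. Note in particular that a string literal in a select attribute needs its own inner quotes, because the attribute value is an XPath expression:

<!-- Value from element content: the text "1" -->
<xsl:variable name="cols.default">1</xsl:variable>

<!-- Value from an XPath expression: the number 1 -->
<xsl:variable name="cols.number" select="1"/>

<!-- A string literal needs its own quotes inside the attribute -->
<xsl:variable name="label" select="'title'"/>

<!-- Without the inner quotes, this selects child <title> elements instead -->
<xsl:variable name="titles" select="title"/>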

The feature of XSL variables that is odd to new users is that once you assign a value to a variable, you cannot assign a new value within the same scope. Doing so will generate an error. So variables are not used as dynamic storage bins the way they are in other languages. They hold a fixed value within their scope of application, and then disappear when the scope is exited. This feature is a result of the design of XSL, which is template-driven and not procedural. This means there is no definite order of processing, so you cannot rely on the values of changing variables. To use variables in XSL, you need to understand how their scope is defined.

Variables defined outside of all templates are considered global variables, and they are readable within all templates. The value of a global variable is fixed, and its global value cannot be altered from within any template. However, a template can create a local variable of the same name and give it a different value. That local value remains in effect only within the scope of the local variable.
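The following sketch illustrates a global variable and a local variable of the same name. The variable name label and the choice of elements are assumptions made for this example:

<!-- Global variable: a direct child of <xsl:stylesheet> -->
<xsl:variable name="label" select="'Note'"/>

<xsl:template match="warning">
  <!-- Local variable hides the global one inside this template only -->
  <xsl:variable name="label" select="'Warning'"/>
  <b><xsl:value-of select="$label"/></b>
  <xsl:apply-templates/>
</xsl:template>

<xsl:template match="note">
  <!-- No local definition, so $label is the global value 'Note' -->
  <b><xsl:value-of select="$label"/></b>
  <xsl:apply-templates/>
</xsl:template>

The global value is never changed. The template for <warning> simply sees its own local value until its scope ends.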

Variables defined within a template remain in effect only within their permitted scope, which is defined as all following siblings and their descendants. To understand such a scope, you have to remember that XSL instructions are true XML elements that are embedded in an XML family hierarchy of XSL elements, often referred to as parents, children, siblings, ancestors and descendants. Taking the family analogy a step further, think of a variable assignment as a piece of advice that you are allowed to give to certain family members. You can give your advice only to your younger siblings (those that follow you) and their descendants. Your older siblings will not listen, neither will your parents or any of your ancestors. To stretch the analogy a bit, it is an error to try to give different advice under the same name to the same group of listeners (in other words, to redefine the variable). Keep in mind that this family is not the elements of your document, but just the XSL instructions in your stylesheet. To help you keep track of such scopes in hand-written stylesheets, it helps to indent nested XSL elements. The following is an edited snippet from the DocBook stylesheet file pi.xsl that illustrates different scopes for two variables:

 1 <xsl:template name="dbhtml-attribute">
 2 ...
 3    <xsl:choose>
 4       <xsl:when test="$count>count($pis)">
 5          <!-- not found -->
 6       </xsl:when>
 7       <xsl:otherwise>
 8          <xsl:variable name="pi">
 9             <xsl:value-of select="$pis[$count]"/>
10          </xsl:variable>
11          <xsl:choose>
12             <xsl:when test="contains($pi,concat($attribute, '='))">
13                <xsl:variable name="rest"
                    select="substring-after($pi,concat($attribute,'='))"/>
14                <xsl:variable name="quote"
                         select="substring($rest,1,1)"/>
15                <xsl:value-of
                    select="substring-before(substring($rest,2),$quote)"/>
16             </xsl:when>
17             <xsl:otherwise>
18             ...
19             </xsl:otherwise>
20          </xsl:choose>
21       </xsl:otherwise>
22    </xsl:choose>
23 </xsl:template>

The scope of the variable pi begins on line 8 where it is defined in this template, and ends on line 20 when its last sibling ends.[1] The scope of the variable rest begins on line 13 and ends on line 15. Fortunately, line 15 outputs an expression using the value before it goes out of scope.

What happens when an <xsl:apply-templates/> element is used within the scope of a local variable? Do the templates that are applied to the document children get the variable? The answer is no. The templates that are applied are not actually within the scope of the variable. They exist elsewhere in the stylesheet and are not following siblings or their descendants.

To pass a value to another template, you pass a parameter using the <xsl:with-param> element. The receiving template must be expecting the parameter by defining it using a <xsl:param> element with the same parameter name. For that reason, parameter passing is usually done with calls to a specific named template using <xsl:call-template>, where you know exactly which template receives the value, although it works with <xsl:apply-templates> too. Any passed parameters whose names are not defined in the called template are ignored.

The following is an example of parameter passing from docbook.xsl:

<xsl:call-template name="head.content">
   <xsl:with-param name="node" select="$doc"/>
</xsl:call-template>

Here a template named head.content is being called and passed a parameter named node whose content is the value of the $doc variable in the current context. The top of that template looks like the following:

<xsl:template name="head.content">
   <xsl:param name="node" select="."/>
   ...

The template is expecting the parameter because it has a <xsl:param> defined with the same name. The value in this definition is the default value. This would be the parameter value used in the template if the template was called without passing that parameter.
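The same mechanism works with <xsl:apply-templates>. The following is a minimal sketch, not taken from the DocBook stylesheets; the parameter name class and its values are invented for this example:

<xsl:template match="orderedlist">
  <ol>
    <xsl:apply-templates select="listitem">
      <!-- Passed to whichever template matches each listitem -->
      <xsl:with-param name="class" select="'numbered'"/>
    </xsl:apply-templates>
  </ol>
</xsl:template>

<xsl:template match="listitem">
  <!-- Default value, used when no caller passes the parameter -->
  <xsl:param name="class" select="'plain'"/>
  <li class="{$class}"><xsl:apply-templates/></li>
</xsl:template>

A list item reached through the <orderedlist> template gets class="numbered". A list item reached any other way gets the default, class="plain".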

Generating HTML output

You generate HTML from your DocBook XML files by applying the HTML version of the stylesheets. This is done by using the HTML driver file docbook/html/docbook.xsl as your stylesheet. That is the master stylesheet file that uses <xsl:include> to pull in the component files it needs to assemble a complete stylesheet for producing HTML.

The way the DocBook stylesheet generates HTML is to apply templates that output a mix of text content and HTML elements. Starting at the top level in the main file docbook.xsl:

<xsl:template match="/">
  <xsl:variable name="doc" select="*[1]"/>
  <html>
  <head>
    <xsl:call-template name="head.content">
      <xsl:with-param name="node" select="$doc"/>
    </xsl:call-template>
  </head>
  <body>
    <xsl:apply-templates/>
  </body>
  </html>
</xsl:template>

This template matches the root node of your input document, which is the parent of the document's root element, and starts the process of recursively applying templates. It first defines a variable named doc and then outputs two literal HTML elements <html> and <head>. Then it calls a named template head.content to process the content of the HTML <head>, closes the <head> and starts the <body>. There it uses <xsl:apply-templates/> to recursively process the entire input document. Then it just closes out the HTML file.

Simple HTML elements can be generated as literal elements as shown here. But if the HTML being output depends on the context, you need something more powerful to select the element name and possibly add attributes and their values. The following is a fragment from sections.xsl that shows how a heading tag is generated using the <xsl:element> and <xsl:attribute> elements:

 1 <xsl:element name="h{$level}">
 2   <xsl:attribute name="class">title</xsl:attribute>
 3   <xsl:if test="$level &lt; 3">
 4     <xsl:attribute name="style">clear: all</xsl:attribute>
 5   </xsl:if>
 6   <a>
 7     <xsl:attribute name="name">
 8       <xsl:call-template name="object.id"/>
 9     </xsl:attribute>
10     <b><xsl:copy-of select="$title"/></b>
11   </a>
12 </xsl:element>

This whole example is generating a single HTML heading element. Line 1 begins the HTML element definition by identifying the name of the element. In this case, the name is an expression that includes the variable $level passed as a parameter to this template. Thus a single template can generate <h1>, <h2>, etc. depending on the context in which it is called. Line 2 defines a class="title" attribute that is added to this element. Lines 3 to 5 add a style="clear: all" attribute, but only if the heading level is less than 3. Line 6 opens an <a> anchor element. Although this looks like a literal output string, it is actually modified by lines 7 to 9 that insert the name attribute into the <a> element. This illustrates that XSL is managing output elements as active element nodes, not just text strings. Line 10 outputs the text of the heading title, also passed as a parameter to the template, enclosed in HTML boldface tags. Line 11 closes the anchor tag with the literal </a> syntax, while line 12 closes the heading tag by closing the element definition. Since the actual element name is a variable, it could not use the literal syntax.

As you follow the sequence of nested templates processing elements, you might be wondering how the ordinary text of your input document gets to the output. In the file docbook.xsl you will find the following template that handles any text not processed by any other template:

<xsl:template match="text()">
  <xsl:value-of select="."/>
</xsl:template>

This template's body consists of the "value" of the text node, which is just its text. In general, all XSL processors have some built-in templates to handle any content for which your stylesheet does not supply a matching template. This template serves the same function but appears explicitly in the stylesheet.
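For comparison, the built-in rules described in the XSLT 1.0 Recommendation behave as if the following templates were present in every stylesheet. They are shown here only as a sketch of that default behavior; you do not need to write them yourself:

<!-- Elements and the root node: keep descending into the children -->
<xsl:template match="*|/">
  <xsl:apply-templates/>
</xsl:template>

<!-- Text and attribute nodes: copy the text through to the output -->
<xsl:template match="text()|@*">
  <xsl:value-of select="."/>
</xsl:template>

<!-- Processing instructions and comments: output nothing -->
<xsl:template match="processing-instruction()|comment()"/>

Any template you write for the same node takes precedence over these built-in rules.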

Generating formatting objects

You generate formatting objects from your DocBook XML files by applying the fo version of the stylesheets. This is done by using the fo driver file docbook/fo/docbook.xsl as your stylesheet. That is the master stylesheet file that uses <xsl:include> to pull in the component files it needs to assemble a complete stylesheet for producing formatting objects. Generating a formatting objects file is only half the process of producing typeset output. You also need an XSL-FO processor such as FOP.

The DocBook fo stylesheet works in a similar manner to the HTML stylesheet. Instead of outputting HTML tags, it outputs text marked up with <fo:something> tags. For example, to indicate that some text should be kept in-line and typeset with a monospace font, it might look like the following:

<fo:inline-sequence  font-family="monospace">/usr/man</fo:inline-sequence>

The templates in docbook/fo/inline.xsl that produce this output for a DocBook <filename> element look like the following:

<xsl:template match="filename">
  <xsl:call-template name="inline.monoseq"/>
</xsl:template>

<xsl:template name="inline.monoseq">
  <xsl:param name="content">
    <xsl:apply-templates/>
  </xsl:param>
  <fo:inline-sequence font-family="monospace">
    <xsl:copy-of select="$content"/>
  </fo:inline-sequence>
</xsl:template>
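
Because the content parameter has a default value, a caller can also supply its own content. The following hypothetical template, which is not part of the DocBook stylesheets, passes a modified version of the element content to the same named template:

<xsl:template match="prompt">
  <xsl:call-template name="inline.monoseq">
    <!-- Overrides the default content defined in inline.monoseq -->
    <xsl:with-param name="content">
      <xsl:text>[</xsl:text>
      <xsl:apply-templates/>
      <xsl:text>]</xsl:text>
    </xsl:with-param>
  </xsl:call-template>
</xsl:template>

When the <filename> template calls inline.monoseq without a parameter, $content takes its default value, which is the result of applying templates to the children of the current element. In this sketch the passed value replaces that default, so the same content appears wrapped in square brackets inside the monospace in-line object.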

There are dozens of XSL-FO tags and attributes specified in the XSL standard. It is beyond the scope of this document to cover how all of them are used in the DocBook stylesheets. Fortunately, this is only an intermediate format that you probably will not have to deal with very much directly unless you are writing your own stylesheets.



[1] Technically, the scope extends to the end tag of the parent of the <xsl:variable> element. That is effectively the last sibling.