Chapter 36. Data, XML, and XPath

Table of Contents

1. Data in LZX
2. What is XML?
2.1. Whitespace
3. XPath
3.1. Supported XPath functionality
3.2. More XPath Documentation

The term data refers to numbers, characters, or text in a form that can be displayed or manipulated by a Laszlo application. In a Laszlo application, data takes the form of either an XML document (or fragment) or a JavaScript object. This chapter contains a brief introduction to XML and the associated XPath syntax. JavaScript objects are described in the section on Objects in Chapter 2, Language Preliminaries.

1. Data in LZX

LZX is designed to make it easy to manipulate data and tie that data to a user interface. In particular, LZX provides for:

  • embedding data directly into an application

  • receiving data from or sending data to a remote data source at runtime

  • receiving data from or sending data to a web service

  • creating and manipulating data at runtime

  • binding data to the user-interface declaratively as well as with script

For a gentle introduction to databinding and data manipulation in OpenLaszlo applications, start with the tutorials.

2. What is XML?

Chances are, if you're reading this document, you already know something about XML. Briefly put, XML is a markup language for describing structured data. XML syntax is very well-defined. This enables a large number of systems that understand data encoded as XML to inter-operate. (LZX itself is actually an application of XML [see Chapter 2, Language Preliminaries]).

If you don't already know what the words document, element and attribute mean in the context of XML, you should probably read one of the following decent introductions or grab a book:

In general, an XML document is hierarchical: the nodes of the tree are called elements, and the named values attached to an element are called attributes. The following is a sample XML document:

Example 36.1. An XML Document

<?xml version="1.0" encoding="ISO-8859-1"?>
<bookshelf>
  <book binding="paperback">
    <title>Acts of the Apostles</title>
    <author>John F.X. Sundman</author>
    <publisher>Rosalita Associates</publisher>
    <price>15.00</price>
    <year>1999</year>
    <category>thriller</category>
    <rating>4.5</rating>
  </book>
  <book binding="casebound">
    <title>Shock</title>
    <author>Robin Cook</author>
    <publisher>Putnam</publisher>
    <price>24.95</price>
    <year>2001</year>
    <category>thriller</category>
    <rating>3.5</rating>
  </book>
  <book binding="paperback">
    <title>Cheap Complex Devices</title>
    <editor>John Compton Sundman</editor>
    <publisher>Rosalita Associates</publisher>
    <price>11.00</price>
    <year>2002</year>
    <category>metafiction</category>
    <rating>5.0</rating>
  </book>
</bookshelf>  

As all XML documents must, it has exactly one outermost element, the document root: <bookshelf>. The root element has three child elements, all named <book>. Each <book> element has a single attribute named "binding". Inside each <book> are several other elements. In general, XML documents can be arbitrarily deep, and the structure of elements and attributes is highly variable.
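The structure described above can be checked with any XML parser. The following is a minimal sketch in Python, using the standard library's xml.etree.ElementTree module rather than the OpenLaszlo runtime; the abbreviated copy of Example 36.1 and all variable names are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the bookshelf document from Example 36.1
# (the XML declaration and most child elements are omitted).
BOOKSHELF_XML = """<bookshelf>
  <book binding="paperback">
    <title>Acts of the Apostles</title>
    <author>John F.X. Sundman</author>
  </book>
  <book binding="casebound">
    <title>Shock</title>
    <author>Robin Cook</author>
  </book>
  <book binding="paperback">
    <title>Cheap Complex Devices</title>
    <editor>John Compton Sundman</editor>
  </book>
</bookshelf>"""

root = ET.fromstring(BOOKSHELF_XML)

# The single document root element.
root_name = root.tag

# Direct children of the root, in document order.
child_names = [child.tag for child in root]

# Each <book> carries one attribute, "binding".
bindings = [book.get("binding") for book in root]

# Child elements vary from book to book (author vs. editor).
third_book_children = [child.tag for child in root[2]]

print(root_name, child_names, bindings, third_book_children)
```

The third book has an <editor> where the others have an <author>, which is a small instance of the variability mentioned above.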

For those interested in the details, the XML specification itself can be found at the W3C website. OpenLaszlo supports the full XML 1.0 specification with the following exceptions:

  • Limited support for namespaces. Namespaces are stripped when the XML document is parsed.

  • Limited character set support. (Only 8-bit characters from the Microsoft Windows Cp1252 character set are supported, regardless of the character set declared in the XML file.)

  • No support for external entity declarations.

  • There are some restrictions on data size. There is a maximum of 64 KBytes for the text content for an element and a maximum of 64 KBytes for the combined length of an element's attributes and the name of the element.

2.1. Whitespace

Handling of "whitespace" (spaces, tabs, linefeeds and carriage returns) is one reason XML can be problematic as an unambiguous data interchange format: applications make different assumptions about how whitespace should be handled. In OpenLaszlo applications the problem is compounded because whitespace is handled differently in proxied and SOLO applications.

In OpenLaszlo applications, the runtime doesn't trim whitespace. Rather, it entirely removes text nodes that consist only of whitespace. For example,

<foo>
<bar/>

</foo>

would otherwise have two whitespace-only text nodes, one before and one after the <bar> tag; the runtime removes both.

This is because programs typically don't expect to get all those whitespace nodes, and furthermore the client XML parser used in SOLO applications cannot handle data that does not conform. It does mean, however, that an element cannot have all-whitespace text content: you get no node at all instead, which in some cases is not really the right thing either.
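To make the behavior concrete, the following sketch uses Python's standard xml.dom.minidom module (not the OpenLaszlo runtime) to show the two whitespace-only text nodes a generic parser produces for the fragment above, and then removes them. The helper names is_blank_text and drop_blank_text are hypothetical; the removal step only approximates the behavior described in this section.

```python
from xml.dom import minidom

# The fragment from the text: whitespace before and after <bar/>.
SOURCE = "<foo>\n<bar/>\n\n</foo>"


def is_blank_text(node):
    """Return True for a text node that contains only whitespace."""
    return node.nodeType == node.TEXT_NODE and node.data.strip() == ""


def drop_blank_text(element):
    """Recursively remove whitespace-only text nodes below element."""
    for child in list(element.childNodes):
        if is_blank_text(child):
            element.removeChild(child)
        elif child.nodeType == child.ELEMENT_NODE:
            drop_blank_text(child)


foo = minidom.parseString(SOURCE).documentElement

# A generic DOM parser keeps both whitespace runs as text nodes.
before = [child.nodeName for child in foo.childNodes]

drop_blank_text(foo)

# After removal only the <bar> element is left.
after = [child.nodeName for child in foo.childNodes]

print(before)
print(after)
```

Note that the same rule would leave an element such as <foo> </foo> with no child nodes at all, which is the drawback mentioned above.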

The lesson here is that handling whitespace can be tricky and that you should pay careful attention to it if your data does not appear as you expect, especially if the behavior changes between SWF and DHTML implementations of the same program.

[Warning]
[DHTML]

For apps running in DHTML, in SOLO mode, the web browser's XML parser is very picky about the data source. This may result in applications that run when compiled to SWF failing to run when compiled to DHTML.

The server delivering the data must set the HTTP Content-Type header to text/xml. It also helps to include an <?xml ... ?> declaration, with no whitespace between the start of the file and the <?xml ... ?> declaration, or between the declaration and the start of the data.

In a JSP file, you can do this at the top, to avoid whitespace:

<%@ page import="java.util.*" %><%@ page import="java.io.*" %><%@ page
contentType="text/xml; charset=UTF-8" %><?xml version='1.0' encoding='UTF-8'
standalone='yes' ?><%@ page import="java.util.*,org.jdom.output.*"%>
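The same two requirements (a text/xml content type, and nothing before the XML declaration) apply to any server technology. As a rough sketch outside JSP, here is a minimal Python WSGI application; the function name xml_app, the sample body, and the fake_start_response helper are all assumptions made for illustration.

```python
def xml_app(environ, start_response):
    """Minimal WSGI application that serves a small XML document."""
    # The declaration is the very first thing in the body: no blank
    # line or space precedes it, and the data follows it directly.
    body = (
        "<?xml version='1.0' encoding='UTF-8'?>"
        "<bookshelf><book binding='paperback'/></bookshelf>"
    ).encode("utf-8")
    headers = [
        ("Content-Type", "text/xml; charset=UTF-8"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]


# Exercise the application without opening a network socket.
captured = {}


def fake_start_response(status, headers):
    """Record what the application would send as status and headers."""
    captured["status"] = status
    captured["headers"] = dict(headers)


chunks = xml_app({}, fake_start_response)
print(captured["headers"]["Content-Type"])
print(chunks[0][:5])
```

To serve it for real, the application could be passed to any WSGI server, for example wsgiref.simple_server.make_server from the Python standard library.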

3. XPath

LZX data access and binding make heavy use of the W3C XPath standard for identifying parts of an XML document. LZX supports a subset of XPath. The complete list of XPath expressions supported in LZX appears in a table in a later section.

Because XML documents have a tree structure, XPath is basically a set of syntax rules for identifying tree nodes. XPath rules are based on a path notation, hence the name. XPath includes expressions and a library of functions for manipulating data. For example, the name() function returns the name of a node, and text() returns its text content. Thus XPath serves as a pattern-matching language uniquely suited to matching patterns in XML documents.

XPath notation is similar to the notation used to identify files in modern operating systems. Paths can be relative or absolute; absolute paths start at the topmost node, called the root, and begin with the slash (/) character. The language of genealogy is used to denote the relationship of nodes to their near neighbors. A node can have, for example, a parent, a grandparent, children, grandchildren, and siblings.

The slash / is used to separate parents from children. Consider the XML document example above. The XPath expression /bookshelf/book selects all book elements. The XPath expression /bookshelf/book/title selects all titles, and so forth.

Square brackets are used to further specify elements. For example, the XPath expression /bookshelf/book[1] selects the first book child element of the bookshelf element.

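The @ character is used as a shorthand to refer to an XML attribute. Thus, /bookshelf/book[@price] selects all books with a price attribute. (In the sample document above, price is a child element rather than an attribute, so this expression would match nothing there.)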

The XPath expression /bookshelf/book[@binding='paperback'] selects the nodes for books whose binding attribute has the value paperback.

The XPath expression /bookshelf/book[@price]/@price selects the prices of all books with price attributes.
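The expressions above can be tried out with any XPath-capable library. The sketch below uses Python's standard xml.etree.ElementTree module, which implements only a limited XPath subset and is not the LZX implementation. It evaluates paths relative to the element they are called on, so with <bookshelf> as the context node the absolute path /bookshelf/book is written simply as book. The abbreviated sample data and the variable names are assumptions.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the bookshelf document from Example 36.1.
BOOKSHELF_XML = """<bookshelf>
  <book binding="paperback">
    <title>Acts of the Apostles</title>
    <price>15.00</price>
  </book>
  <book binding="casebound">
    <title>Shock</title>
    <price>24.95</price>
  </book>
  <book binding="paperback">
    <title>Cheap Complex Devices</title>
    <price>11.00</price>
  </book>
</bookshelf>"""

bookshelf = ET.fromstring(BOOKSHELF_XML)

# /bookshelf/book: all book elements.
all_books = bookshelf.findall("book")

# /bookshelf/book/title: all titles.
titles = [title.text for title in bookshelf.findall("book/title")]

# /bookshelf/book[1]: the first book (positions are 1-based).
first_title = bookshelf.find("book[1]/title").text

# /bookshelf/book[@binding='paperback']: books bound as paperbacks.
paperbacks = bookshelf.findall("book[@binding='paperback']")

# /bookshelf/book[@price]: books with a price *attribute*.
with_price_attribute = bookshelf.findall("book[@price]")

# book[price] tests for a price *child element* instead.
with_price_element = bookshelf.findall("book[price]")

print(len(all_books), titles, first_title)
print(len(paperbacks), len(with_price_attribute), len(with_price_element))
```

Because price is stored as a child element in this sample document, the attribute test finds no books while the child-element test finds all three.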

The concepts involved in using XPath are few and simple, and the notation is straightforward. Nevertheless, by using XPath functions it is possible to perform increasingly sophisticated tests on XML nodes.

The full power of this pattern-matching syntax is best seen in programming examples, such as those written in XSLT. The XPath specification is online at http://www.w3.org/TR/xpath.

See Chapter 37, Data Access and Binding, for an explanation of how XPath is used in LZX to provide powerful databinding and data manipulation.

3.1. Supported XPath functionality

XPath is an extensive specification that is largely, but not entirely, implemented in LZX.

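The table that follows the example dataset below shows the XPath functionality implemented in OpenLaszlo. Its "In this case" column gives the result of each expression against that dataset: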

Example 36.2. Datasets

<canvas height="80" width="500" >
  <dataset name="myData">
    <myXML>
      <person show="simpsons">
        <firstName>Homer</firstName>
        <lastName>Simpson</lastName>
      </person>
      <person show="simpsons">
        <firstName>Marge</firstName>
        <lastName>Simpson</lastName>
      </person>
      <person show="simpsons">
        <firstName>Montgomery</firstName>
        <lastName>Burns</lastName>
      </person>
    </myXML>
  </dataset>
</canvas>

| Example | Meaning | In this case |
|---|---|---|
| myData:/myXML[1]/person[1] | Just the first "person" node | Homer |
| myData:/myXML[1]/person | All the "person" nodes | Homer, Marge, Montgomery |
| myData:/myXML[1]/person[2-3] | "person" nodes 2 to 3 inclusive | Marge, Montgomery |
| myData:/myXML[1]/person[2-] | "person" nodes 2 and onwards | Marge, Montgomery |
| myData:/myXML[1]/person[-2] | "person" nodes up to and including 2 | Homer, Marge |
| myData:/myXML[1]/person[@show] | All "person" nodes that have a "show" attribute | Homer, Marge, Montgomery |
| myData:/myXML[1]/person[@show = 'simpsons'] | All "person" nodes whose "show" attribute equals "simpsons" (the comparison is case-sensitive) | Homer, Marge, Montgomery |
| myData:/myXML/*/firstName | All "firstName" nodes under any node in "myXML" | Homer, Marge, Montgomery |
| Attributes and Functions | | |
| myData:/myXML[1]/person[1]/@show | The "show" attribute of the first "person" node | simpsons |
| myData:/myXML[1]/person[1]/lastName/text() | The text of the "lastName" node of the first "person" node | Simpson |
| myData:/myXML[1]/person[1]/last() | The number of "person" nodes | 3 |
| myData:/myXML[1]/person[1]/position() | When used for a replicated view, this will be the position of the view in the set | n/a |
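The range forms in the table ([2-3], [2-] and [-2]) are not part of standard XPath 1.0 syntax, so a general-purpose XPath library will not interpret them as ranges. The sketch below shows, in Python and under that assumption, how the same selections can be emulated with list slicing over the dataset from Example 36.2. The helper names person_range and first_names are hypothetical.

```python
import xml.etree.ElementTree as ET

# The contents of the myData dataset from Example 36.2.
MY_DATA = """<myXML>
  <person show="simpsons">
    <firstName>Homer</firstName>
    <lastName>Simpson</lastName>
  </person>
  <person show="simpsons">
    <firstName>Marge</firstName>
    <lastName>Simpson</lastName>
  </person>
  <person show="simpsons">
    <firstName>Montgomery</firstName>
    <lastName>Burns</lastName>
  </person>
</myXML>"""


def person_range(root, first=None, last=None):
    """Return <person> children from 1-based position first to last,
    inclusive. Passing None leaves that end of the range open."""
    people = root.findall("person")
    start = 0 if first is None else first - 1
    return people[start:last]


def first_names(nodes):
    """Collect the text of each node's <firstName> child."""
    return [node.findtext("firstName") for node in nodes]


root = ET.fromstring(MY_DATA)

two_to_three = first_names(person_range(root, 2, 3))    # like person[2-3]
two_onwards = first_names(person_range(root, first=2))  # like person[2-]
up_to_two = first_names(person_range(root, last=2))     # like person[-2]

print(two_to_three, two_onwards, up_to_two)
```

The element names here are spelled firstName and lastName, exactly as in the dataset, because XPath name tests are case-sensitive.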

3.2. More XPath Documentation

XPath is commonly used with XSLT, a language for transforming one XML document into another XML document, and also by some web browsers. Decent XSLT documentation often contains good documentation on XPath. You may also find the following online documents useful: