Chapter 10. Herding XML in Scala

XML has long since become the lingua franca of machine-to-machine communication on the Internet. The format’s combination of human readability, standardization, and tool support has made working with XML an inevitability for programmers. Yet, writing code that deals in XML is an unpleasant chore in most programming languages. Scala improves this situation.

As with the Actor functionality we learned about in Chapter 9, Robust, Scalable Concurrency with Actors, Scala’s XML support is implemented partly as a library, with some built-in syntax support. It feels to the programmer like an entirely natural part of the language. Convenient operators add a spoonful of syntactic sugar to the task of diving deep into complex document structures, and pattern matching further sweetens the deal. Outputting XML is just as pleasant.

Scala also allows inline XML, a feature that is unusual among programming languages and particularly handy. Almost anywhere you might put a string, you can put XML. This feature makes templating and configuration a breeze, and lets us test our use of XML without so much as opening a file.

Let’s explore working with XML in Scala. First, we’ll look at reading and navigating an XML document. Then, we’ll produce XML output programmatically and demonstrate uses for inline XML.

Reading XML

We’ll start with the basics: how to turn a string full of XML into a data structure we can work with.

// code-examples/XML/reading/from-string-script.scala

import scala.xml._

val someXMLInAString = """
<sammich>
  <bread>wheat</bread>
  <meat>salami</meat>
  <condiments>
    <condiment expired="true">mayo</condiment>
    <condiment expired="false">mustard</condiment>
  </condiments>
</sammich>
"""

val someXML = XML.loadString(someXMLInAString)
assert(someXML.isInstanceOf[scala.xml.Elem])

All fine and well. We’ve transformed the string into an Elem, a single XML element. Elem is a subtype of NodeSeq, Scala’s type for storing a sequence of XML nodes. Were our XML document in a file on disk, we could have used the loadFile method of the same XML object.
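As a minimal sketch of the file-based variant, the script below writes a small document to a temporary file and reads it back with loadFile. The temporary file and the names tempFile and fromDisk exist only for this illustration.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.Files
import scala.xml.XML

// Create a throwaway file holding a tiny document (illustration only).
val tempFile = Files.createTempFile("sammich", ".xml")
Files.write(tempFile, "<sammich><bread>rye</bread></sammich>".getBytes(StandardCharsets.UTF_8))

// loadFile accepts a file name or a java.io.File and returns the root Elem.
val fromDisk = XML.loadFile(tempFile.toFile)

println(fromDisk.label)            // the name of the root element
println((fromDisk \ "bread").text) // the text inside <bread>

Files.delete(tempFile)
```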

Since we’re supplying the XML ourselves, we can skip the XML.loadString step and just assign a chunk of markup to a val or var.

// code-examples/XML/reading/inline-script.scala

import scala.xml._

val someXML =
<sammich>
  <bread>wheat</bread>
  <meat>salami</meat>
  <condiments>
    <condiment expired="true">mayo</condiment>
    <condiment expired="false">mustard</condiment>
  </condiments>
</sammich>

assert(someXML.isInstanceOf[scala.xml.Elem])

Exploring XML

If we paste the previous example into the interpreter, we can explore our sandwich using some handy tools provided by NodeSeq.

scala> someXML \ "bread"
res2: scala.xml.NodeSeq = <bread>wheat</bread>

That backslash, which the documentation calls a projection function, says, “find me child elements named bread”. We’ll always get a NodeSeq back when using a projection function. If we’re only interested in what’s between the tags, we can use the text method.

scala> (someXML \ "bread").text
res3: String = wheat

Tip

It’s valid syntax to say someXML \ "bread" text, without parentheses or the dot before the call to text. You’ll still get the same result, but it’s harder to read. Parentheses make your intent clear.

We’ve only inspected the outermost layer of our sandwich. Let’s try to get a NodeSeq of the condiments.

scala> someXML \ "condiment"
res4: scala.xml.NodeSeq =

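What went wrong? The \ function searches only the direct children of an element; it doesn’t descend any deeper into the XML structure, and our <condiment> elements are nested inside <condiments>. To search descendants at any depth, we use its sister function, \\ (two backslashes).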

scala> someXML \\ "condiment"
res5: scala.xml.NodeSeq = <condiment expired="true">mayo</condiment>
  <condiment expired="false">mustard</condiment>

Much better. (We split the single output line into two lines so it would fit on the page.) We dove into the structure and pulled out the two <condiment> elements. Looks like one of the condiments has gone bad, though. We can find out if any of the condiments has expired by extracting its expired attribute. All it takes is an @ before the attribute name.

scala> (someXML \\ "condiment")(0) \ "@expired"
res6: scala.xml.NodeSeq = true

We used the (0) to pick the first of the two condiments that were returned by (someXML \\ "condiment").
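To recap the operators from this section, here is a short script-style sketch that contrasts \ and \\ and reads an attribute. The val names and the bestBefore attribute are hypothetical, chosen only for this illustration.

```scala
import scala.xml._

val someXML =
  <sammich>
    <bread>wheat</bread>
    <meat>salami</meat>
    <condiments>
      <condiment expired="true">mayo</condiment>
      <condiment expired="false">mustard</condiment>
    </condiments>
  </sammich>

// \ looks only at direct children, so it must be applied one level at a time.
val direct = someXML \ "condiment"
val viaChildren = someXML \ "condiments" \ "condiment"

// \\ searches descendants at any depth.
val viaDescendants = someXML \\ "condiment"

// Attributes are selected with an @ prefix; a missing one yields an empty NodeSeq.
val first = viaDescendants(0)
val expired = (first \ "@expired").text
val missing = first \ "@bestBefore" // hypothetical attribute that is not in the document

println(direct.length + " " + viaChildren.length + " " + viaDescendants.length)
println(expired + " " + missing.isEmpty)
```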

Looping & Matching XML

The previous bit of code extracted the value of the expired attribute (true, in this case), but it didn’t tell us which condiment is expired. If we were handed an arbitrary XML sandwich, how would we identify the expired condiments? We can loop through the XML.

// code-examples/XML/reading/for-loop-script.scala

for (condiment <- (someXML \\ "condiment")) {
  if ((condiment \ "@expired").text == "true")
    println("the " + condiment.text + " has expired!")
}

Because NodeSeq inherits the same familiar methods that most Scala collection types carry, tools like for loops apply directly. In the above example, we extract the <condiment> nodes, loop over each of them, and test whether or not their expired attribute equals the string “true”. We have to ask for the text of a given condiment; otherwise, we’d get a string representation of the entire <condiment> element, tags included.
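Since NodeSeq behaves like a collection, the projection functions also combine with guards, filter, and map. The sketch below collects condiment names instead of printing them; expiredNames and freshNames are names made up for this example.

```scala
import scala.xml._

val someXML =
  <sammich>
    <bread>wheat</bread>
    <meat>salami</meat>
    <condiments>
      <condiment expired="true">mayo</condiment>
      <condiment expired="false">mustard</condiment>
    </condiments>
  </sammich>

// A for comprehension with a guard, yielding the text of each expired condiment.
val expiredNames =
  for {
    condiment <- someXML \\ "condiment"
    if (condiment \ "@expired").text == "true"
  } yield condiment.text

// The same kind of query written with filter and map, this time for fresh condiments.
val freshNames =
  (someXML \\ "condiment")
    .filter(c => (c \ "@expired").text == "false")
    .map(_.text)

println(expiredNames.mkString(", "))
println(freshNames.mkString(", "))
```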

We can also use pattern matching on XML structures. Cases in pattern matches can be written in terms of XML literals; expressions between curly braces ({}) escape back to standard Scala pattern matching syntax. To match all XML nodes in the escaped portion of a pattern match, use an underscore (wildcard) followed by an asterisk (_*). To bind what you’ve matched on to a variable, prefix the match with the variable name and an @ sign.

Let’s put all that together into one example. We’ll include the original XML document again so you can follow along as we pattern match on XML.

// code-examples/XML/reading/pattern-matching-script.scala

import scala.xml._

val someXML =
<sammich>
  <bread>wheat</bread>
  <meat>salami</meat>
  <condiments>
    <condiment expired="true">mayo</condiment>
    <condiment expired="false">mustard</condiment>
  </condiments>
</sammich>

someXML match {
  case <sammich>{ingredients @ _*}</sammich> => {
    for (cond @ <condiments>{_*}</condiments> <- ingredients)
      println("condiments: " + cond.text)
  }
}

Here, we bind the contents of our <sammich> structure (that is, what’s inside the opening and closing tag) to a variable called ingredients. Then, as we iterate through the ingredients in a for loop, the pattern on the left of the arrow keeps only the <condiments> elements and binds each whole element to a temporary variable, cond. The text of each cond is printed.
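XML patterns also work one node at a time. The following sketch defines a hypothetical describe function (not part of the chapter’s examples) that matches a few element shapes; note that XML patterns match on the element name and children and ignore attributes.

```scala
import scala.xml._

// describe is a made-up helper for this illustration.
def describe(node: Node): String = node match {
  // {kind} binds the single child node of <bread>.
  case <bread>{kind}</bread> => "bread: " + kind.text
  // {items @ _*} binds all children; whitespace text nodes are skipped by the collect.
  case <condiments>{items @ _*}</condiments> =>
    "condiments: " + items.collect { case e: Elem => e.text }.mkString(", ")
  // Anything else falls through to a default case.
  case other => "something else: " + other.label
}

val breadLine = describe(<bread>wheat</bread>)
val condimentsLine = describe(
  <condiments>
    <condiment expired="true">mayo</condiment>
    <condiment expired="false">mustard</condiment>
  </condiments>)
val otherLine = describe(<meat>salami</meat>)

println(breadLine)
println(condimentsLine)
println(otherLine)
```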

The same tools that let us easily manipulate complex data structures in Scala are readily available for XML processing. As a readable alternative to XSLT, Scala’s XML library makes reading and parsing XML a breeze. It also gives us equally powerful tools for writing XML, which we’ll explore in the next section.

Writing XML

While some languages construct XML through complex object serialization mechanisms, Scala’s support for XML literals makes writing XML far simpler. Essentially, when you want XML, just write XML. To interpolate variables and expressions, escape out to Scala with curly braces, as we did in the pattern matching examples above.

scala> var name = "Bob"
name: java.lang.String = Bob

scala> val bobXML =
     | <person>
     |   <name>{name}</name>
     | </person>
bobXML: scala.xml.Elem =
<person>
  <name>Bob</name>
</person>

As we can see, the name variable was substituted when we constructed the XML document assigned to bobXML. That evaluation only occurs once; were name subsequently redefined, the <name> element of bobXML would still contain the string "Bob".
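Here is a small script-style sketch of that behavior. It also shows that interpolated strings are escaped when the document is serialized, so markup characters in the data cannot break the output. The risky value is made up for this illustration.

```scala
import scala.xml._

var name = "Bob"
val bobXML = <person><name>{name}</name></person>

// Reassigning the var afterward does not touch the XML that was already built.
name = "Alice"
val stillBob = (bobXML \ "name").text

// Special characters in interpolated text are escaped on output.
val risky = "Tom & <Jerry>"
val riskyXML = <name>{risky}</name>
val serialized = riskyXML.toString

println(stillBob)      // Bob
println(serialized)    // <name>Tom &amp; &lt;Jerry&gt;</name>
println(riskyXML.text) // Tom & <Jerry>
```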

A Real-World Example

For a more complete example, let’s say we’re designing that favorite latter-day “hello world”, a blogging system. We’ll start with a class to represent an Atom-friendly blog post.

// code-examples/XML/writing/post.scala

import java.text.SimpleDateFormat
import java.util.Date

class Post(val title: String, val body: String, val updated: Date) {
  lazy val dashedDate = {
    // Tag URIs use a four-digit year.
    val dashed = new SimpleDateFormat("yyyy-MM-dd")
    dashed.format(updated)
  }

  lazy val atomDate = {
    // RFC 3339 needs zero-padded, 24-hour times; the -05:00 offset is hard-coded.
    val rfc3339 = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss'-05:00'")
    rfc3339.format(updated)
  }

  lazy val slug = title.toLowerCase.replaceAll("\\W", "-")
  lazy val atomId  = "tag:example.com," + dashedDate + ":/" + slug
}

Beyond the obvious title and body attributes, we’ve defined several lazily evaluated values in our Post class. These values will come in handy when we transmute our posts into an Atom feed, a standard way to syndicate blogs between computers on the Web. Atom documents are a flavor of XML, and a perfect application for demonstrating the process of outputting XML with Scala.
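The Post class hard-codes its time zone offset and turns every non-word character into its own hyphen. As a hedged alternative, not the chapter’s implementation, the sketch below lets SimpleDateFormat print the offset itself (the XXX pattern letter, available since Java 7) and collapses runs of punctuation in the slug. The UTC zone and the sample title are assumptions made for this example.

```scala
import java.text.SimpleDateFormat
import java.util.{Date, TimeZone}

// "XXX" prints the zone offset as +hh:mm or -hh:mm, or as Z for UTC.
val rfc3339 = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ssXXX")
rfc3339.setTimeZone(TimeZone.getTimeZone("UTC")) // assumed zone, for a repeatable result

val epoch = rfc3339.format(new Date(0L))

// Collapse each run of non-word characters into one hyphen, then trim the ends.
val slug = "Hello, XML World!"
  .toLowerCase
  .replaceAll("\\W+", "-")
  .replaceAll("^-|-$", "")

println(epoch) // 1970-01-01T00:00:00Z
println(slug)  // hello-xml-world
```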

We’ll define an AtomFeed class that takes a sequence of Post objects as its sole argument.

// code-examples/XML/writing/atom-feed.scala

import scala.xml.XML

class AtomFeed(posts: Seq[Post]) {
  val feed =
  <feed xmlns="http://www.w3.org/2005/Atom">
    <title>My Blog</title>
    <subtitle>A fancy subtitle.</subtitle>
    <link href="http://example.com/"/>
    <link href="http://example.com/atom.xml" rel="self"/>
    <updated>{posts(0).atomDate}</updated>
    <author>
      <name>John Doe</name>
      <uri>http://example.com/about.html</uri>
    </author>
    <id>http://example.com/</id>
    {for (post <- posts) yield
    <entry>
      <title>{post.title}</title>
      <link href={"http://example.com/" + post.slug + ".html"} rel="alternate"/>
      <id>{post.atomId}</id>
      <updated>{post.atomDate}</updated>
      <content type="html">{post.body}</content>
      <author>
        <name>John Doe</name>
        <uri>http://example.com/about.html</uri>
      </author>
    </entry>
    }
  </feed>

  def write = XML.saveFull(Config.atomPath, feed, "UTF-8", true, null)
}

We’re making heavy use of the ability to escape out to Scala expressions in this example. Whenever we need a piece of dynamic information (for example, the date of the first post in the sequence, formatted for the Atom standard), we simply escape out and write Scala as we normally would. In the latter half of the <feed> element, we use a for comprehension to yield successive blocks of dynamically formatted XML.
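The same two techniques, a for comprehension that yields elements and an attribute computed from a Scala expression, can be seen in isolation in this small sketch. The ingredient list and the URL are placeholders for illustration.

```scala
import scala.xml._

val ingredients = List("wheat", "salami")

// Each yielded <li> becomes a child of <ul>.
val list = <ul>{for (item <- ingredients) yield <li>{item}</li>}</ul>

// Attribute values can be Scala expressions, too.
val link = <a href={"http://example.com/" + ingredients.head + ".html"}>{ingredients.head}</a>

println(list) // <ul><li>wheat</li><li>salami</li></ul>
println((link \ "@href").text)
```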

The write method of AtomFeed demonstrates the use of the saveFull method, provided by the scala.xml library. (Config.atomPath is assumed to be a file path defined elsewhere in the application.) saveFull writes an XML document to disk, optionally with a different encoding and document type declaration. In later releases of the library, saveFull is deprecated in favor of save, which accepts the same arguments. Alternatively, the XML.write method of the same object accepts any java.io.Writer, should you need buffering, piping, etc.

Writing XML with Scala is straightforward: construct the document you need with inline XML, use interpolation where dynamic content is to be substituted, and make use of the handy convenience methods to write your completed documents to disk or to other output streams.
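When the destination is not a file, XML.write accepts any java.io.Writer. The sketch below serializes a tiny document into a StringWriter; the document itself is a cut-down stand-in for the feed above.

```scala
import java.io.StringWriter
import scala.xml.XML

val doc = <feed><title>My Blog</title></feed>

// Arguments: the writer, the node, the encoding name, whether to emit an XML
// declaration, and an optional document type (null for none).
val out = new StringWriter
XML.write(out, doc, "UTF-8", true, null)
val serialized = out.toString

println(serialized)
```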

Recap and What’s Next

XML has become ubiquitous in software applications, yet few languages make working with XML a simple task. We learned how Scala accelerates XML development by making it easy to read and write XML.

In the next chapter, we’ll learn how Scala provides rich support for creating your own Domain-Specific Languages (DSLs).
