XML has long since become the lingua franca of machine-to-machine communication on the Internet. The format’s combination of human readability, standardization, and tool support has made working with XML an inevitability for programmers. Yet, writing code that deals in XML is an unpleasant chore in most programming languages. Scala improves this situation.
As with the Actor functionality we learned about in Chapter 9, Robust, Scalable Concurrency with Actors, Scala’s XML support is implemented partly as a library, with some built-in syntax support. It feels to the programmer like an entirely natural part of the language. Convenient operators add a spoonful of syntactic sugar to the task of diving deep into complex document structures, and pattern matching further sweetens the deal. Outputting XML is just as pleasant.
Unusual in programming languages and particularly handy, Scala allows inline XML. Most anywhere you might put a string, you can put XML. This feature makes templating and configuration a breeze, and lets us test our use of XML without so much as opening a file.
Let’s explore working with XML in Scala. First, we’ll look at reading and navigating an XML document. Then, we’ll produce XML output programmatically and demonstrate uses for inline XML.
We’ll start with the basics: how to turn a string full of XML into a data structure we can work with.
// code-examples/XML/reading/from-string-script.scala

import scala.xml._

val someXMLInAString = """
<sammich>
  <bread>wheat</bread>
  <meat>salami</meat>
  <condiments>
    <condiment expired="true">mayo</condiment>
    <condiment expired="false">mustard</condiment>
  </condiments>
</sammich>
"""

val someXML = XML.loadString(someXMLInAString)
assert(someXML.isInstanceOf[scala.xml.Elem])
All fine and well. We’ve transformed the string into an Elem, a subtype of NodeSeq, which is Scala’s type for storing a sequence of XML nodes. Were our XML document in a file on disk, we could have used the loadFile method of the same XML object.
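As a minimal sketch of what loadString hands back, and of what happens when the input is not well-formed, consider the following. The sandwich fragments are our own illustrative inputs, and wrapping the call in scala.util.Try is just one way to handle the parser's exception.

```scala
import scala.util.Try
import scala.xml.{Elem, NodeSeq, XML}

// Well-formed input parses to an Elem.
val parsed: Elem = XML.loadString("<sammich><bread>wheat</bread></sammich>")

// An Elem is also a NodeSeq (of length one), so it can be used as either.
val asSeq: NodeSeq = parsed
val rootLabel = parsed.label

// The <bread> tag is never closed here, so the parser throws.
// Try captures the exception as a Failure instead of aborting the script.
val broken = Try(XML.loadString("<sammich><bread>wheat</sammich>"))
```

Here rootLabel holds the name of the outermost element, and broken.isFailure tells us the second document was rejected.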
Since we’re supplying the XML ourselves, we can skip the XML.loadString step and just assign a chunk of markup to a val or var.
// code-examples/XML/reading/inline-script.scala

import scala.xml._

val someXML =
<sammich>
  <bread>wheat</bread>
  <meat>salami</meat>
  <condiments>
    <condiment expired="true">mayo</condiment>
    <condiment expired="false">mustard</condiment>
  </condiments>
</sammich>

assert(someXML.isInstanceOf[scala.xml.Elem])
If we paste the previous example into the interpreter, we can explore our sandwich using some handy tools provided by NodeSeq.
scala> someXML \ "bread"
res2: scala.xml.NodeSeq = <bread>wheat</bread>
That backslash, which the documentation calls a projection function, says, “find me elements named bread”. We’ll always get a NodeSeq back when using a projection function. If we’re only interested in what’s between the tags, we can use the text method.
scala> (someXML \ "bread").text
res3: String = wheat
It’s valid syntax to say someXML \ "bread" text, without parentheses or the dot before the call to text. You’ll still get the same result, but it’s harder to read. Parentheses make your intent clear.
We’ve only inspected the outermost layer of our sandwich. Let’s try to get a NodeSeq of the condiments.
scala> someXML \ "condiment"
res4: scala.xml.NodeSeq =
What went wrong? The \ function looks only at the direct children of an element; it doesn’t descend any deeper into the XML structure. To do that, we use its sister function, \\ (two backslashes).
scala> someXML \\ "condiment"
res5: scala.xml.NodeSeq = <condiment expired="true">mayo</condiment>
  <condiment expired="false">mustard</condiment>
Much better. (We split the single output line into two lines so it would fit on the page.) We dove into the structure and pulled out the two <condiment> elements. Looks like one of the condiments has gone bad, though. We can find out if any of the condiments has expired by extracting its expired attribute. All it takes is an @ before the attribute name.
scala> (someXML \\ "condiment")(0) \ "@expired"
res6: scala.xml.NodeSeq = true
We used the (0) to pick the first of the two condiments that were returned by (someXML \\ "condiment").
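To see the two projection functions and attribute lookup side by side, here is a small sketch. The sandwich value is a trimmed copy of our document, and the names viaChildren, viaDescendants, and flags are ours, chosen for illustration.

```scala
import scala.xml.NodeSeq

val sandwich =
  <sammich>
    <bread>wheat</bread>
    <condiments>
      <condiment expired="true">mayo</condiment>
      <condiment expired="false">mustard</condiment>
    </condiments>
  </sammich>

// \ only looks at direct children, so we step through <condiments> first.
val viaChildren: NodeSeq = sandwich \ "condiments" \ "condiment"

// \\ searches every descendant, so one step is enough.
val viaDescendants: NodeSeq = sandwich \\ "condiment"

// Ask each <condiment> element in turn for its expired attribute.
val flags: Seq[String] = viaChildren.map(c => (c \ "@expired").text)
```

Both projections find the same two elements. Chaining \ spells out the path explicitly, which can be the safer choice when a tag name might also appear elsewhere in the document.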
The expression (someXML \\ "condiment")(0) \ "@expired" extracted the value of the expired attribute (true, in this case), but it didn’t tell us which condiment is expired. If we were handed an arbitrary XML sandwich, how would we identify the expired condiments? We can loop through the XML.
// code-examples/XML/reading/for-loop-script.scala

for (condiment <- (someXML \\ "condiment")) {
  if ((condiment \ "@expired").text == "true")
    println("the " + condiment.text + " has expired!")
}
Because NodeSeq inherits the same familiar methods that most Scala collection types carry, tools like for loops apply directly. In the above example, we extract the <condiment> nodes, loop over each of them, and test whether or not their expired attribute equals the string “true”. We have to specify that we want the text of a given condiment; otherwise, we’d get a string representation of the entire element, tags included.
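Because a NodeSeq behaves like a collection, the same check can also be written so that it returns values instead of printing them. The following is a sketch under that assumption; sandwich, expired, and fresh are illustrative names of our own.

```scala
import scala.xml.NodeSeq

val sandwich =
  <sammich>
    <condiments>
      <condiment expired="true">mayo</condiment>
      <condiment expired="false">mustard</condiment>
    </condiments>
  </sammich>

// A for comprehension with a guard and a yield collects the names
// of the expired condiments.
val expired: Seq[String] =
  for {
    condiment <- sandwich \\ "condiment"
    if (condiment \ "@expired").text == "true"
  } yield condiment.text

// filter keeps whole elements, so the result is still a NodeSeq.
val fresh: NodeSeq =
  (sandwich \\ "condiment").filter(c => (c \ "@expired").text == "false")
```

Yielding the text gives us plain strings to work with, while filter preserves the XML nodes for further navigation.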
We can also use pattern matching on XML structures. Cases in pattern matches can be written in terms of XML literals; expressions between curly braces ({}) escape back to standard Scala pattern matching syntax. To match all XML nodes in the escaped portion of a pattern match, use an underscore (wildcard) followed by an asterisk (_*). To bind what you’ve matched on to a variable, prefix the match with the variable name and an @ sign.
Let’s put all that together into one example. We’ll include the original XML document again so you can follow along as we pattern match on XML.
// code-examples/XML/reading/pattern-matching-script.scala

import scala.xml._

val someXML =
<sammich>
  <bread>wheat</bread>
  <meat>salami</meat>
  <condiments>
    <condiment expired="true">mayo</condiment>
    <condiment expired="false">mustard</condiment>
  </condiments>
</sammich>

someXML match {
  case <sammich>{ingredients @ _*}</sammich> => {
    for (cond @ <condiments>{_*}</condiments> <- ingredients)
      println("condiments: " + cond.text)
  }
}
Here, we bind the contents of our <sammich> structure (that is, what’s inside the opening and closing tag) to a variable called ingredients. Then, as we iterate through the ingredients in a for loop, the pattern to the left of the <- keeps only the <condiments> elements and binds each one to a temporary variable, cond. Each cond is printed as text, which includes the text of everything nested inside it.
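The same pattern syntax works in an ordinary match expression with several cases. The describe function below is a hypothetical helper of our own, sketched to show a single-node binding ({kind}) next to a sequence binding ({items @ _*}).

```scala
import scala.xml.Node

def describe(node: Node): String = node match {
  // {kind} matches exactly one child node and binds it.
  case <bread>{kind}</bread> => "bread: " + kind.text
  case <meat>{kind}</meat>   => "meat: " + kind.text
  // {items @ _*} binds all children; collect keeps only <condiment>
  // elements and skips any whitespace text nodes between them.
  case <condiments>{items @ _*}</condiments> =>
    val names = items.collect { case c @ <condiment>{_*}</condiment> => c.text }
    "condiments: " + names.mkString(", ")
  case other => "unrecognized: " + other.label
}

val breadLine = describe(<bread>wheat</bread>)
val otherLine = describe(<pickle/>)
```

Note that a pattern such as <bread>{kind}</bread> only matches when the element has exactly one child, so elements with mixed or empty content fall through to the last case.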
The same tools that let us easily manipulate complex data structures in Scala are readily available for XML processing. As a readable alternative to XSLT, Scala’s XML library makes reading and parsing XML a breeze. It also gives us equally powerful tools for writing XML, which we’ll explore in the next section.
While some languages construct XML through complex object serialization mechanisms, Scala’s support for XML literals makes writing XML far simpler. Essentially, when you want XML, just write XML. To interpolate variables and expressions, escape out to Scala with curly braces, as we did in the pattern matching examples above.
scala> var name = "Bob"
name: java.lang.String = Bob

scala> val bobXML =
     | <person>
     |   <name>{name}</name>
     | </person>
bobXML: scala.xml.Elem =
<person>
  <name>Bob</name>
</person>
As we can see, the name variable was substituted when we constructed the XML document assigned to bobXML. That evaluation only occurs once; were name subsequently redefined, the <name> element of bobXML would still contain the string "Bob".
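A short sketch makes both points concrete: interpolation happens once, at construction time, and interpolated strings are escaped when the document is rendered. The values person, risky, and escaped are illustrative names of our own.

```scala
var name = "Bob"
val person = <person><name>{name}</name></person>

// Reassigning the variable does not touch the XML we already built.
name = "Alice"
val nameNow = (person \ "name").text

// Interpolated strings are treated as text, not markup, so special
// characters are escaped when the element is rendered as a string.
val risky = "Tom & Jerry <tm>"
val escaped = <name>{risky}</name>
val rendered = escaped.toString
```

After this runs, nameNow is still "Bob", rendered is <name>Tom &amp;amp; Jerry &amp;lt;tm&amp;gt;</name> with the entities shown literally as &amp;, &lt;, and &gt;, and escaped.text gives back the original unescaped string.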
For a more complete example, let’s say we’re designing that favorite latter-day “hello world”, a blogging system. We’ll start with a class to represent an Atom-friendly blog post.
// code-examples/XML/writing/post.scala

import java.text.SimpleDateFormat
import java.util.Date

class Post(val title: String, val body: String, val updated: Date) {
  // Four-digit year, as the tag URI scheme expects in its date part.
  lazy val dashedDate = {
    val dashed = new SimpleDateFormat("yyyy-MM-dd")
    dashed.format(updated)
  }

  // 24-hour, zero-padded time; the UTC offset is hardcoded to -05:00.
  lazy val atomDate = {
    val rfc3339 = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss'-05:00'")
    rfc3339.format(updated)
  }

  // Lowercase the title and replace each non-word character with a dash.
  lazy val slug = title.toLowerCase.replaceAll("\\W", "-")

  lazy val atomId = "tag:example.com," + dashedDate + ":/" + slug
}
Beyond the obvious title and body attributes, we’ve defined several lazily evaluated values in our Post class. These attributes will come in handy when we transmute our posts into an Atom feed, the standard way to syndicate blogs between computers on the Web. Atom documents are a flavor of XML, and a perfect application for demonstrating the process of outputting XML with Scala.
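To make the slug computation concrete, here is what the expression used in Post does to a sample title. The simpleSlug function mirrors that expression; tidySlug is a hypothetical variant of our own, not part of the Post class, that collapses runs of punctuation and trims stray dashes.

```scala
// Same expression as the slug value in Post: every non-word character
// (anything other than a-z, A-Z, 0-9, or underscore) becomes a dash.
def simpleSlug(title: String): String =
  title.toLowerCase.replaceAll("\\W", "-")

// Hypothetical refinement: treat a run of non-word characters as one
// separator, then strip a leading or trailing dash.
def tidySlug(title: String): String =
  title.toLowerCase.replaceAll("\\W+", "-").replaceAll("^-|-$", "")

val plain = simpleSlug("Hello, World!") // "hello--world-"
val tidy  = tidySlug("Hello, World!")   // "hello-world"
```

Either form is fine for an example feed. Keep in mind that \W also matches non-ASCII letters, so titles in other scripts would be reduced to dashes.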
We’ll define an AtomFeed class that takes a sequence of Post objects as its sole argument.
// code-examples/XML/writing/atom-feed.scala

import scala.xml.XML

class AtomFeed(posts: Seq[Post]) {
  val feed =
  <feed xmlns="http://www.w3.org/2005/Atom">
    <title>My Blog</title>
    <subtitle>A fancy subtitle.</subtitle>
    <link href="http://example.com/"/>
    <link href="http://example.com/atom.xml" rel="self"/>
    <updated>{posts(0).atomDate}</updated>
    <author>
      <name>John Doe</name>
      <uri>http://example.com/about.html</uri>
    </author>
    <id>http://example.com/</id>
    {for (post <- posts) yield
    <entry>
      <title>{post.title}</title>
      <link href={"http://example.com/" + post.slug + ".html"} rel="alternate"/>
      <id>{post.atomId}</id>
      <updated>{post.atomDate}</updated>
      <content type="html">{post.body}</content>
      <author>
        <name>John Doe</name>
        <uri>http://example.com/about.html</uri>
      </author>
    </entry>
    }
  </feed>

  def write = XML.saveFull(Config.atomPath, feed, "UTF-8", true, null)
}
We’re making heavy use of the ability to escape out to Scala expressions in this example. Whenever we need a piece of dynamic information (for example, the date of the first post in the sequence, formatted for the Atom standard), we simply escape out and write Scala as we normally would. In the latter half of the <feed> element, we use a for comprehension to yield successive blocks of dynamically formatted XML.
The write method of AtomFeed demonstrates the use of the saveFull method, provided by the scala.xml library. saveFull writes an XML document to disk, optionally in different encoding schemes and with different document type declarations. Alternatively, the write method of the same XML object will make use of any java.io.Writer variant, should you need buffering, piping, etc.
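As a rough sketch of writing to a java.io.Writer rather than a file, the following renders a tiny feed into a StringWriter. It assumes the XML.write(writer, node, encoding, xmlDecl, doctype) signature found in the scala-xml releases we are familiar with; method names and parameters have shifted between versions, so check the API documentation for the release you use. The feed value is a cut-down stand-in for the AtomFeed document.

```scala
import java.io.StringWriter
import scala.xml.XML

// A cut-down stand-in for the feed built by AtomFeed.
val feed =
  <feed xmlns="http://www.w3.org/2005/Atom"><title>My Blog</title></feed>

// Any java.io.Writer will do; StringWriter keeps the example in memory.
val out = new StringWriter()

// true asks for an XML declaration; null means no DOCTYPE,
// mirroring the arguments passed to saveFull above.
XML.write(out, feed, "UTF-8", true, null)

val rendered = out.toString
```

Swapping the StringWriter for a BufferedWriter or an OutputStreamWriter would send the same output to a file or a socket.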
Writing XML with Scala is straightforward: construct the document you need with inline XML, use interpolation where dynamic content is to be substituted, and make use of the handy convenience methods to write your completed documents to disk or to other output streams.
XML has become ubiquitous in software applications, yet few languages make working with XML a simple task. We learned how Scala accelerates XML development by making it easy to read and write XML.
In the next chapter, we’ll learn how Scala provides rich support for creating your own Domain-Specific Languages (DSLs).