Authoring a HowTo document

The HowTo documents are authored in XML, using the DocBook XML DTD. They are maintained in the Darwin CVS archive and will soon be built regularly.

What is XML?

XML is a structured document language that describes text according to what kind of data it is, rather than how it should look. If you are used to simply making text bold when you want it to be bold, then it can take a bit of adjustment to understand the XML authoring model.

In a traditional text editor, if you wanted to make someone's email address stand out in a paragraph, you would apply some special formatting. In an RTF document you might make it bold; in an HTML document you might look up the syntax for a mailto anchor so people can click on the address to send email. XML makes your life easier. All you need to do is identify the data as an email address, such as <email>[email protected]</email>, and when the document is processed the tools decide what to do with it. For RTF it might become bold; for HTML it might become a hot link.
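
As a small illustration, a DocBook paragraph containing an email address might be marked up as in the sketch below. The address is a placeholder invented for this example, not a real contact.

```xml
<!-- Illustrative fragment only; author@example.com is a placeholder address. -->
<para>
  Send corrections to <email>author@example.com</email>.
</para>
```

Nothing in the fragment says how the address should look. That decision is left to whichever stylesheet processes the document.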

With a structured editing language like XML, the writer's job changes to one of providing content and identifying what that content is. How the content is presented is handled separately.

Authoring XML is a bit like programming. After you create a document, you need a 'compiler' to parse it and make sure it is a legal document, so that the production tools will be able to do their job. This is known as validating the document. Some editors will do the validation for you, but many do not and require you to use a separate tool.
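
To make the idea of a "legal document" concrete, here is a sketch of a fragment that any XML parser would accept as well-formed, but that a validating parser should reject against the DocBook DTD, followed by a corrected form. The list text is borrowed from this document purely for illustration.

```xml
<!-- Well-formed XML, but not valid DocBook:
     a listitem may not contain bare text. -->
<itemizedlist>
  <listitem>automatic tag insertion</listitem>
</itemizedlist>

<!-- Valid DocBook: the text is wrapped in a para. -->
<itemizedlist>
  <listitem><para>automatic tag insertion</para></listitem>
</itemizedlist>
```

An editor with tag constraints would normally prevent the first form from being typed at all. With a plain text editor, you would only find the problem when you run the validator.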

Choosing an XML editor

Because XML is still young, there are not many high-quality editors to be found. Hopefully this situation will change soon. The important features to consider in an editor are:

  • automatic tag insertion

  • tag constraints: not allowing you to use illegal tags while editing

  • validation against the Document Type Definition (DTD)

If you don't find an XML editor you like that provides validation or tag constraints, don't worry about it; just use your favorite editor. You will have to do a bit of cleanup when you run a validating parser on your document, to fix any errors you've made.

Editors you might want to check out are:

There are a couple of free GUI-based XML editors in development:

Getting the DocBook DTD

Now that you have an editor, you need the Document Type Definition (DTD). The DocBook DTD is available at the OASIS site, http://www.oasis-open.org/docbook/xml.

Make sure you get the XML DTD, not the SGML version.

Technically you don't need to have the DTD on your local system, because your XML document can point to it on the web (as the released versions of the HOWTO documents do). While authoring a document, however, you may find it faster and more convenient to point your document to a local copy of the DTD. The HOWTO template provides an example of this.
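
The difference between the two setups comes down to the system identifier in the document type declaration. The sketch below assumes DocBook XML V4.1.2 and a hypothetical local install path; substitute the version you downloaded and the location where you unpacked it.

```xml
<?xml version="1.0"?>
<!-- Released form: the system identifier points at the DTD on the web.
     Version 4.1.2 is an assumption made for this example. -->
<!DOCTYPE article PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN"
  "http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd">
<!-- While authoring, only the system identifier needs to change, for example
     to "file:///usr/local/sgml/docbook-xml/docbookx.dtd" (a hypothetical path). -->
<article>
  <title>My HOWTO</title>
  <para>Body text goes here.</para>
</article>
```

Remember to switch the system identifier back to the web address before the document is released.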

Learning the DocBook DTD

DocBook is the standard DTD used for creating technical documentation in the Open Source world. It's highly flexible, which means you can create documents as diverse as an API reference or this HowTo article with it.

This flexibility also leads to the criticism that DocBook has too many tags and is too complex for a mere human to use. Norm Walsh has responded to this with a simplified DocBook DTD, at http://www.nwalsh.com/docbook/simple/index.html, which you can use if you prefer.

While DocBook can be complex if you are authoring an API reference or using tables, the typical HowTo document is really quite easy to author. This document so far has used fewer than ten unique tags for the body text (the author and revision information do use a few extra tags, but that's the kind of thing you can just copy anyway).
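
To give a feel for how little markup a simple HowTo body needs, here is a minimal sketch that uses only seven DocBook elements. The titles, text, and email address are invented placeholders, and the document type declaration and author information are omitted for brevity.

```xml
<!-- Illustrative skeleton; all titles, text, and the address are placeholders. -->
<article>
  <title>Example HOWTO</title>
  <sect1>
    <title>Getting started</title>
    <para>
      Questions can be sent to <email>author@example.com</email>.
    </para>
    <itemizedlist>
      <listitem><para>First point</para></listitem>
      <listitem><para>Second point</para></listitem>
    </itemizedlist>
  </sect1>
</article>
```

Each additional section is another sect1 with its own title, so most of the writing effort goes into paragraphs and lists.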

Here are some good tutorials and other resources to get you started using DocBook. Some of these may refer to the SGML version, which is practically identical except for using a different identifier at the top and using mixed-case tags:

Using the HOWTO template

The XML version of this document serves as a pretty comprehensive template for HOWTOs. However, that's a lot of deleting to do to get started on your own writing. You might find it easier to use the Darwin HOWTO Template, which you can get here:

http://www.opensource.apple.com/projects/documentation/howto/xml/Darwin_HOWTO_Template.xml.

Processing XML

Creating an XML document is one thing, but processing it into HTML or PDF is another. There are multiple tools used in the process, and many alternatives to choose from, most of which are in various states of development.

The Darwin Documentation project is still setting up the build system, and the tools used are subject to change at any time. Right now, we use the following:

With these tools installed, your CLASSPATH needs to include the path to saxon.jar, similar to this line from my .tcshrc file:

setenv CLASSPATH /Volumes/RonStuff/Users/rhayden/DocBook/saxon/saxon.jar

The Java command to transform the XML into HTML looks like this, where the docbook.xsl path is the location of the DocBook XSL stylesheets on your system:

java com.icl.saxon.StyleSheet myfile.xml /usr/local/sgml/docbook-xsl/html/docbook.xsl > myfile.html

Given that this solution is Java-based, it cannot currently be run on the Darwin release. We're looking into some Perl-based solutions and other alternatives to change that in the near future.