Chapter 1. XML and DocBook

Table of Contents

1.1. DocBook
1.2. A few words about XSLT
1.3. Presentation of the DocBook XML Tools installed at CERN
1.4. My First DocBook File
1.5. Translation to HTML
1.6. What about those Style Sheets?
1.7. Translation to PDF
1.8. Anatomy of a DocBook Tag
1.9. The Structure of a DocBook File
1.10. The Document Type Declaration

The aim of this chapter is to give the reader an introduction to XML and one of its best-known vocabularies, DocBook.

1.1. DocBook

The DocBook markup system is described in the printed book DocBook - The Definitive Guide [TDG1999]. There is an up-to-date local HTML copy at CERN [TDG2002]. This tutorial is best used alongside that reference documentation, because it covers only a small part of the (almost) 400 DocBook elements. The complete list of elements, together with all available attributes and entity sets, is explained in the reference. This tutorial limits itself mainly to the principles of the DocBook markup language and uses examples to introduce the rich DocBook element set.

There are two DocBook-related discussion lists, one about DocBook itself [DBLIST] and one about its applications [DBAPPSLIST].

1.1.1. DocBook's past and future

DocBook is already more than ten years old. It began in 1991 as a joint project of HaL Computer Systems and O'Reilly. Its popularity grew, and in 1994, the Davenport Group became an officially chartered entity responsible for DocBook's maintenance. DocBook V1.2.2 was published at that time. In mid-1998, it became a Technical Committee (TC) of the Organization for the Advancement of Structured Information Standards (OASIS).

DocBook V3.1 was released in February 1999 and was, like the versions before it, SGML only. Since February 2001, when DocBook V4.1 and DocBook XML V4.1.2 became an OASIS Standard, SGML and XML have been supported on an equal basis. More details are on the OASIS DocBook history Web page. The current V4.2 DTD versions for both XML and SGML were released in July 2002. Version 5 is foreseen to become available at the beginning of 2003.

The OASIS TC observes a very cautious policy regarding changes to the DTD. Backward-incompatible changes can only be introduced in major releases (4.0, 5.0, 6.0, and so on), and only if the change was described in comments in the DTD in the previous major release. In particular, the next version of DocBook, 5.0, will be XML compliant, and this will introduce a lot of changes.

1.1.2. Who uses DocBook?

Because DocBook has a very rich element set, is freely available, is well documented, and comes with a good set of production tools, it has become widespread. It has been adopted by many producers of software (open source as well as commercial), by students and faculty for courses and theses, and by document managers everywhere. The following is a non-exhaustive list of who uses DocBook at present.

  • Many books at O'Reilly, the publishing house, are marked up in DocBook (in particular [DocBook, The Definitive Guide]).

  • A lot of computer documentation, such as that for the following projects: XFree86, GNOME, KDE, FreeBSD, PHP, the Linux Documentation Project.

  • Similarly, a lot of the documentation from commercial software and hardware vendors. For instance, Sun, Red Hat, SuSE, HP, Cogent Real-Time Systems, Conectiva, Rational, MandrakeSoft, Caldera, Apple (Darwin docs), etc. use DocBook.

  • Producing generated documentation from code comments (GNOME, Linux kernel, XSLT Standard Library).

  • For training material, where from a single document, one can produce presentation slides, sample files, and printed handouts.

  • Whitepapers, like system or formal specifications (e.g., the RELAX NG specification) and proposals.

  • For maintaining websites, in particular those hosting FAQs.

  • Producing presentation slides, courseware, theses and dissertations.

  • Using a single DocBook source with various stylesheets to document applications in various ways: online manuals (PDF, HTML), context-sensitive help (HTML, HTML Help), man pages and formatted text (also using non-XSLT HTML-to-text conversion), and filtered, conditionalized versions (using profiling).

  • The complete development chain of a product, including a description of the product itself, the automated test tool documentation, the defect tracking database, related software (O/S, networking, Apache, SQL, etc.).

  • Single sourcing to ensure consistency, by generating three targets from the same DocBook document: product API specifications (targeted at internal developers), an API reference manual (targeted at our customers), and API validation code (in different programming languages).

1.1.3. Why DocBook?

The DocBook markup language, maintained by the OASIS consortium, is specifically suited for technical documentation. It provides a rich set of tags for describing content, especially that of software documentation.

The following key points help explain what DocBook is.

DocBook is a markup language.

It is very similar to HTML in this respect. The tags give some structure to your document, and appear intermixed with the informational text.

This characteristic makes it quite different from desktop publishing (DTP) tools, which spend most of their effort on "making the text look nice". In the case of DocBook and its associated XML tools, the rendering is done indirectly: a transformation stage generates the format for a given output medium.
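
As a minimal sketch of what such markup looks like, the following DocBook paragraph mixes tags with ordinary text. The sentence itself is invented for illustration; the elements used (para, command, option, filename) are standard DocBook elements.

```xml
<!-- Illustrative fragment only: the sentence is made up for this example -->
<para>
  Run <command>ls</command> with the <option>-l</option> option
  to list the contents of <filename>/tmp</filename> in long format.
</para>
```

The tags state what each piece of text is (a command, an option, a file name). How these pieces are rendered is decided later, in the transformation stage.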

DocBook was mainly developed for technical documentation.

DocBook would be perfectly suitable for documenting, say, car engine parts. However, it is strongly biased towards the documentation of computer programs.

DocBook is maintained by an independent consortium.

The OASIS consortium is in charge of maintaining and evolving this standard through the DocBook Technical Committee. This is a guarantee of independence with respect to proprietary software and standards.

Major industry players such as Boeing and IBM are members of OASIS. A more complete list of members is on the OASIS site.

Technically, DocBook is a (SGML or) XML DTD.

This means that one can take advantage of the many (SGML and) XML-aware tools that are available, often freely. While DocBook as an XML implementation is quite recent, it has a long history as an SGML application.
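
For an XML document, the link to the DTD is made through a document type declaration at the top of the file. The following is a sketch, assuming the V4.2 XML DTD and an article as the root element; the choice of article and its content are purely illustrative.

```xml
<?xml version="1.0"?>
<!-- Public and system identifiers of the DocBook XML V4.2 DTD -->
<!DOCTYPE article PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
  "http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd">
<article>
  <title>A minimal article</title>
  <para>A validating XML parser can check this file against the DTD.</para>
</article>
```

Because the declaration names the DTD explicitly, generic XML tools need no DocBook-specific knowledge to validate the file.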

DocBook is not a presentation language.

DocBook takes care not to specify what the final documentation looks like. This allows the writer to concentrate on the organization and meaning of the document being written. All presentation issues are delegated to style sheets.

This ensures that all your documents have a consistent appearance, whoever the technical writer is.

DocBook is customizable.

Thanks to its modular organization, it is quite easy to customize the DTD to meet user needs. However, one must be aware that customization must respect SGML/XML conventions and that it can lead to incompatibilities.

If DocBook is used in conjunction with Norman Walsh's modular XSLT stylesheets, it is also possible to customize the way a DocBook file is printed or put online.
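
Such a customization usually takes the form of a small stylesheet that imports the stock one and overrides a few settings. The following is a sketch only: the import path is a placeholder that depends on where the stylesheets are installed, and section.autolabel is used here as one example of a parameter provided by the DocBook XSL distribution.

```xml
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="1.0">

  <!-- Placeholder path (assumption): adjust to the local installation -->
  <xsl:import href="/path/to/docbook-xsl/html/docbook.xsl"/>

  <!-- Override one parameter: number sections automatically -->
  <xsl:param name="section.autolabel" select="1"/>

</xsl:stylesheet>
```

The DocBook source is then processed with this customization file instead of the stock stylesheet, so the original stylesheets stay untouched.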

DocBook is comprehensive.

The large number of tags defined in DocBook guarantees that it can accommodate a wide range of situations and of processing expectations.

This in turn makes it a bit difficult to learn, but one can manage to write documentation knowing only a limited set of tags, referring to the reference documentation when needed.

DocBook uses long and understandable tags.

Examples of such tags are <itemizedlist> and <literallayout>. This makes DocBook source text easier to read than HTML source, for example. As a drawback, typing those long tags can become quite tedious, but specialized editor modes, such as Emacs' psgml mode, can alleviate this burden. There are also some complete DocBook authoring tools, but most of these are not free.
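
To make the comparison concrete, here is a short sketch using the two tags named above. The content is invented for illustration; the comments give the rough HTML counterparts.

```xml
<!-- A bulleted list; HTML would use <ul> and <li> -->
<itemizedlist>
  <listitem>
    <para>First item.</para>
  </listitem>
  <listitem>
    <para>Second item.</para>
  </listitem>
</itemizedlist>

<!-- Text whose line breaks and spacing are kept as typed -->
<literallayout>Line breaks
   and indentation
are kept as typed.</literallayout>
```

The DocBook names are longer to type, but they tell the reader directly what each element holds.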