1.10. The Document Type Declaration

In non-trivial documents the document type declaration (the very first lines of a DocBook file) can be somewhat more complicated than the one or two lines shown in the document skeleton presented earlier.

1.10.1. Internal general entities

The following example shows how entities allow one to define abbreviations for text strings. They can also be used to declare a convenient name for a character that is unavailable on a standard keyboard.

Example 1.8. Entities used to share text

  1 <?xml version="1.0" encoding="iso-8859-1" ?>
    <!DOCTYPE article PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN" "http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd" [
      <!ENTITY CERN "Conseil Européen pour la Recherche Nucléaire">
      <!ENTITY WWW "<application>World Wide Web</application>">
  5   <!ENTITY W3C "&WWW; Consortium">
    ]>
     
    <article id="The.Web" lang="en">
    <articleinfo>
 10 <title>The origin of the &WWW;</title>
    </articleinfo>
     
    <section id="introduction">
    <title>General Introduction</title>
 15  
    <para> 
    Tim Berners-Lee and collaborators invented the &WWW; at 
    <acronym>CERN</acronym> (&CERN;) at the beginning of the 
    nineteen nineties.
 20 </para>
    
    <para>
    Today all further developments are coordinated by the &W3C; 
    (MIT, INRIA, Keio).
 25 </para>
     
    </section>
    </article>

Several important points should be noted in Example 1.8. First, line 1 shows that we have to declare an explicit encoding when the file contains non-ASCII characters that are not encoded in UTF-8 or UTF-16, the encodings an XML processor assumes when no encoding declaration is present. In particular, on line 3, where we define the CERN entity, we use the character é (e with acute accent), which is part of the ISO-8859-1 encoding specified on line 1. Alternatively, we could use numeric Unicode character references and replace line 3 with the following declaration:

<!ENTITY CERN "Conseil Europ&#xe9;en pour la Recherche Nucl&#xe9;aire">

In this case no explicit encoding declaration is needed, provided the rest of the file is pure ASCII. Further, line 4 defines the WWW entity, whose replacement text is an application element, showing that markup can be used inside entity definitions, while the definition of the W3C entity on line 5 shows that an entity definition can contain references to previously declared entities (&WWW; in this case).
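These points can be checked with Python's standard xml.etree.ElementTree module, which is built on the expat parser. The sketch below is a hypothetical, trimmed version of Example 1.8: the external DocBook DTD is left out so nothing has to be fetched, and the names BODY and parse are illustrative only. It parses the CERN declaration once as ISO-8859-1 bytes with an encoding declaration and once with character references and no declaration, and confirms that both give the same tree and that &W3C; expands to an application element followed by text.

```python
import xml.etree.ElementTree as ET

# Hypothetical trimmed version of Example 1.8 (no external DTD, short body).
BODY = """<!DOCTYPE article [
  <!ENTITY CERN "{cern}">
  <!ENTITY WWW "<application>World Wide Web</application>">
  <!ENTITY W3C "&WWW; Consortium">
]>
<article>
  <para>&CERN;</para>
  <para>&W3C;</para>
</article>
"""

def parse(cern_value, encoding=None):
    """Build and parse the document; add an XML declaration only if encoding is given."""
    prolog = f'<?xml version="1.0" encoding="{encoding}"?>\n' if encoding else ""
    text = prolog + BODY.format(cern=cern_value)
    # Without a declaration the parser assumes UTF-8; pure ASCII is valid UTF-8.
    return ET.fromstring(text.encode(encoding or "ascii"))

# Literal é stored as ISO-8859-1 bytes, declared on the first line.
literal = parse("Conseil Européen pour la Recherche Nucléaire", "iso-8859-1")
# Character references only: the bytes are ASCII, so no declaration is needed.
charref = parse("Conseil Europ&#xe9;en pour la Recherche Nucl&#xe9;aire")

cern_para, w3c_para = literal.findall("para")
print(cern_para.text)                     # Conseil Européen pour la Recherche Nucléaire
print(w3c_para.find("application").text)  # World Wide Web
print("".join(w3c_para.itertext()))       # World Wide Web Consortium
print(ET.tostring(literal) == ET.tostring(charref))  # True
```

The markup inside the WWW entity becomes a real child element of the paragraph, not literal angle brackets in the text.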

In the body of the text, the defined entities are referenced on lines 10, 17, 18, and 23.

When you process the above file, dbw3cexa.xml, with the docbook2html procedure, you obtain the file dbw3cexa.html, shown in Example 1.9 as displayed by the Lynx text-mode browser.

Example 1.9. HTML version generated from a DocBook file as displayed by Lynx

                                      The origin of the World Wide Web

The origin of the World Wide Web
     ________________________________________________________________

   Table of Contents

   General Introduction

General Introduction

   Tim Berners-Lee and collaborators invented the World Wide Web at CERN
   (Conseil Européen pour la Recherche Nucléaire) at the beginning of the
   nineteen nineties.

   Today  all further developments are coordinated by the World Wide Web
   Consortium (MIT, INRIA, Keio).


Commands: Use arrow keys to move, '?' for help, 'q' to quit, '<-' to go back.
  Arrow keys: Up and Down to move.  Right to follow a link; Left to go back.
 H)elp O)ptions P)rint G)o M)ain screen Q)uit /=search [delete]=history list 

The consistent use of entities has a number of advantages:

  • you do not have to type a possibly long text string several times;

  • it allows you to centralize changes to commonly used phrases and terms in a single place, thus guaranteeing consistency;

  • if the entity name is well chosen, it makes the source documents easier to read and maintain.
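The second advantage, centralized changes, can be illustrated with a minimal sketch using Python's standard xml.etree.ElementTree module. The miniature document and the helper name expanded_text are hypothetical; the WWW entity is referenced both directly and indirectly through W3C, and changing its single declaration changes every expansion.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature document: &WWW; is used directly and via &W3C;.
DOC = """<!DOCTYPE article [
  <!ENTITY WWW "{www}">
  <!ENTITY W3C "&WWW; Consortium">
]>
<article>
  <title>The origin of the &WWW;</title>
  <para>Tim Berners-Lee invented the &WWW; at CERN.</para>
  <para>Developments are coordinated by the &W3C;.</para>
</article>"""

def expanded_text(www):
    """Parse DOC with the given WWW value and return each child's full text."""
    root = ET.fromstring(DOC.format(www=www))
    return ["".join(child.itertext()) for child in root]

print(expanded_text("World Wide Web"))
print(expanded_text("WWW"))  # one declaration changed, three expansions follow
```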

1.10.2. General external entities

For larger documents, it is convenient to keep chapters or other large text chunks in individual files that can be maintained separately. In such a case, the files to be included in the master document are declared in the document type declaration as general external entities.

For instance, the present tutorial is split at the chapter and appendix level into separate files, which are included by a single main driver file as follows (only part of the file is shown):

Example 1.10. Including external files using entities

  1 <!DOCTYPE book SYSTEM
      "/afs/cern.ch/sw/XML/XMLBIN/share/www.oasis-open.org/docbook/xmldtd-4.2/docbookx.dtd"
    [
      <!ENTITY bookinfo SYSTEM "bookinfo.xml">
  5   <!ENTITY introduction SYSTEM "introduction.xml">
        ....
      <!ENTITY emacs SYSTEM "emacs.xml">
      <!ENTITY examples SYSTEM "examples.xml">
      <!ENTITY glossary SYSTEM "glossary.xml">
 10 ]>
    <book id="dbatcern" lang="en">
    &bookinfo;
    &introduction;
      ...
 15 &emacs;
    &examples;
    &glossary;
    </book>
    

Each entity reference, written with the &xxx; syntax, must correspond to an <!ENTITY xxx ...> declaration; for an external entity, that declaration names a file that has to exist on the system. A reference to a non-existing file will result in an error. For instance, line 4 declares the bookinfo entity as an external reference to the file bookinfo.xml, whose content is included by the entity reference &bookinfo; on line 12.
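Resolution of external entities can be tried out with Python's standard xml.sax package. This is a minimal sketch under stated assumptions: the driver and chapter files are hypothetical and are written to a temporary directory, and the driver uses only an internal subset (no DocBook DTD) so that nothing has to be fetched. Since Python 3.7.1 the SAX parser does not expand external general entities unless the feature_external_ges feature is switched on, so the sketch enables it explicitly. TitleCollector and collect_titles are illustrative names.

```python
import os
import tempfile
import xml.sax
from xml.sax.handler import ContentHandler, feature_external_ges

class TitleCollector(ContentHandler):
    """Records the text of every <title> element, whichever file it came from."""
    def __init__(self):
        super().__init__()
        self.titles = []
        self._buffer = None  # None while outside a <title>

    def startElement(self, name, attrs):
        if name == "title":
            self._buffer = []

    def characters(self, content):
        if self._buffer is not None:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "title":
            self.titles.append("".join(self._buffer))
            self._buffer = None

# Hypothetical driver file modelled on Example 1.10.
DRIVER = """<?xml version="1.0"?>
<!DOCTYPE book [
  <!ENTITY introduction SYSTEM "introduction.xml">
  <!ENTITY glossary SYSTEM "glossary.xml">
]>
<book id="dbatcern" lang="en">
&introduction;
&glossary;
</book>
"""

# Hypothetical chapter files referenced by the SYSTEM identifiers above.
CHAPTERS = {
    "introduction.xml": '<chapter id="introduction"><title>Introduction</title></chapter>',
    "glossary.xml": '<glossary id="glossary"><title>Glossary</title></glossary>',
}

def collect_titles(directory):
    """Write all files into directory, then parse the driver with entity expansion on."""
    for name, text in [("book.xml", DRIVER), *CHAPTERS.items()]:
        with open(os.path.join(directory, name), "w", encoding="utf-8") as f:
            f.write(text)
    parser = xml.sax.make_parser()
    # Relative SYSTEM identifiers are resolved against the driver file's directory.
    parser.setFeature(feature_external_ges, True)
    handler = TitleCollector()
    parser.setContentHandler(handler)
    parser.parse(os.path.join(directory, "book.xml"))
    return handler.titles

with tempfile.TemporaryDirectory() as tmp:
    print(collect_titles(tmp))  # ['Introduction', 'Glossary']
```

The titles arrive in the order of the entity references in the driver file, as if the chapter files had been pasted in at those points.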

Entity names should be unique throughout the document: if the same entity is declared more than once, the first declaration is binding and later ones are ignored. An entity can, of course, be referenced several times (e.g., &WWW; on lines 10 and 17 of Example 1.8). Multiple references to an internal entity are useful, for instance, for a copyright notice or other company message that has to be repeated on every page or at the beginning of each chapter.
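Both rules can be observed with Python's standard xml.etree.ElementTree module in a minimal sketch. The NOTICE entity and the document are hypothetical: the entity is declared twice and referenced in two chapters, and every reference expands to the text of the first declaration.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: NOTICE is declared twice and referenced twice.
DOC = """<!DOCTYPE book [
  <!ENTITY NOTICE "Copyright notice, first declaration">
  <!ENTITY NOTICE "Copyright notice, second declaration">
]>
<book>
  <chapter><para>&NOTICE;</para></chapter>
  <chapter><para>&NOTICE;</para></chapter>
</book>"""

root = ET.fromstring(DOC)
# Collect the expansion of every reference, in document order.
notices = [para.text for para in root.iter("para")]
print(notices)  # ['Copyright notice, first declaration', 'Copyright notice, first declaration']
```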