Validating with XIncludes

The major advantage of XIncludes is that they let you create modular files that can be individually validated. But there is a small problem. The xi:include element itself is not a valid DocBook element, so the master document will not validate. The usual way around this is to resolve the includes before validating. That replaces each invalid xi:include element with the DocBook content it references, which should be valid where it is inserted. It is also possible to customize the DTD to permit validation before the XIncludes are resolved.

The xmllint utility, which is part of the libxml2 toolkit that xsltproc is built on, can resolve XIncludes and validate XML documents without modifying the DocBook DTD. It has an --xinclude option that resolves XIncludes, and a --postvalid option that validates after the includes are resolved. You should also add the --noent (“no entities”) option so that all system entities are resolved before validating. So to validate a book document that has XIncludes for its chapters, you could use this command:

xmllint --noout --xinclude --postvalid --noent book.xml

The effect of this command is to replace each xi:include element with its content, and then validate the result. The validation process never sees the xi:include element, so there is no conflict with the DTD. The --noout option suppresses the normal output of xmllint, which is the complete XML content, so that it only reports validation errors. You can omit the --noout option if you want to examine the resolved document.
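
To make the resolution step concrete, here is a minimal sketch in Python that uses the standard library's xml.etree.ElementInclude module instead of xmllint. The in-memory FILES table, the memory_loader function, and the file names are assumptions made for illustration. The Python standard library does not validate against a DTD, so this shows only the replacement of xi:include by its content, not the validation that follows.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

XI_INCLUDE = "{http://www.w3.org/2001/XInclude}include"

# Hypothetical in-memory "files" standing in for book.xml and one chapter.
FILES = {
    "book.xml": (
        '<book xmlns:xi="http://www.w3.org/2001/XInclude">'
        "<title>User's Guide</title>"
        '<xi:include href="intro.xml"/>'
        "</book>"
    ),
    "intro.xml": (
        "<chapter><title>Introduction</title><para>Hello.</para></chapter>"
    ),
}


def memory_loader(href, parse, encoding=None):
    """Return a parsed element for parse='xml', or the raw text otherwise."""
    data = FILES[href]
    if parse == "xml":
        return ET.fromstring(data)
    return data


def resolve(name):
    """Parse the named document and replace each xi:include in place."""
    root = ET.fromstring(FILES[name])
    ElementInclude.include(root, loader=memory_loader)
    return root


book = resolve("book.xml")
# Any xi:include elements that survived resolution would show up here.
remaining = list(book.iter(XI_INCLUDE))

print([child.tag for child in book])  # prints ['title', 'chapter']
print(remaining)                      # prints []
```

After resolution the book element contains a chapter where the xi:include used to be, which is the form of the document that a validator would then check.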

For Java, the Xerces-J parser (version 2.5.0 and later) will resolve XIncludes. It can validate the including and included files separately, but it cannot validate the merged content in one step. Validating the including file with Xerces therefore requires adding the xi:include element to the DocBook DTD, as described in the next section. To validate the assembled document, you need to resolve the XIncludes into a temporary file and then validate that file. See the section “Java processors and XIncludes” for information on what is available.

Even if you resolve your XIncludes, you may still run into a problem with xml:base attributes that are inserted into the resolved document in some circumstances. See the section “Adding xml:base to the DTD” for more information.

DTD customizations for XIncludes

There are situations with modular documentation files where you would rather not resolve your XIncludes before validating. That is, you would like to validate a file that still contains xi:include elements. Some tools also do not resolve XIncludes at all. For example, if you load the master book document into a validating editor, the editor will complain about the xi:include elements.

Adding XInclude to the DTD

You might think you could just add the xi:include element to the DocBook DTD. But declaring the element is not enough: it also has to be added to the content model of every element in which it might appear. Because an XInclude can stand in for many combinations of elements, trying to cover all possible uses would make the content models in the DTD hopelessly complex.

If you are willing to limit where you put XIncludes in your document, however, you can create a DTD customization that supports your usage. You declare the XInclude elements and then add them to the content models of selected elements. The following example permits XIncludes that pull in chapters, appendixes, and other immediate children of the book element, a fairly typical use of XIncludes. First, create a system entity (a file) that contains your DTD modifications, such as the following.

Example 23.2. DTD customization for XIncludes

<!ELEMENT xi:include (xi:fallback?) >
<!ATTLIST xi:include
    xmlns:xi   CDATA       #FIXED    "http://www.w3.org/2001/XInclude"
    href       CDATA       #IMPLIED
    parse      (xml|text)  "xml"
    xpointer   CDATA       #IMPLIED
    encoding   CDATA       #IMPLIED
    accept     CDATA       #IMPLIED
    accept-language CDATA  #IMPLIED >

<!ELEMENT xi:fallback ANY>
<!ATTLIST xi:fallback
    xmlns:xi   CDATA   #FIXED   "http://www.w3.org/2001/XInclude" >

<!ENTITY % local.chapter.class "| xi:include">

All lines but the last declare the xi:include and xi:fallback elements and their attributes. Those declarations make the elements available, but do not put them in any content models. The last line adds the xi:include element to the local.chapter.class parameter entity (be sure to include the pipe symbol). That entity is used in the DocBook DTD to extend the list of elements permitted as children of book. So wherever you could put a chapter element, now you can put an xi:include element that points to a file that contains a chapter. The following shows some other possibilities for placement of XIncludes.

<!-- inside chapter or section elements -->
<!ENTITY % local.divcomponent.mix "| xi:include">
<!-- inside para, programlisting, literallayout, etc. -->
<!ENTITY % local.para.char.mix "| xi:include">
<!-- inside bookinfo, chapterinfo, etc. -->
<!ENTITY % local.info.class "| xi:include">

Now these DTD extensions need to be made available to your documents. You can do that with a customization layer for the DocBook DTD, or you can add them to the internal DTD subset in each file. For example, if you put the above content in a file named xinclude.mod, you can reference that DTD module as follows:

<!DOCTYPE book SYSTEM "docbookx.dtd" [
<!ENTITY % xinclude SYSTEM "xinclude.mod">
%xinclude;
]>
<book>
<title>User's Guide</title>
<xi:include href="intro.xml" xmlns:xi="http://www.w3.org/2001/XInclude"/>
...

The internal subset of the DTD declares a parameter entity named xinclude whose system identifier points to the xinclude.mod file, and then references that entity with %xinclude; to pull in the declarations. With these changes in place, the document can be validated without resolving its XIncludes, as long as they appear only in the places that your DTD changes permit.
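
Because such a customization permits xi:include only in selected places, it can be useful to check where the XIncludes in a document actually sit before you validate. The following is a rough sketch in Python, not part of any DocBook tool. The misplaced_includes function, its allowed_parents parameter, and the sample document are hypothetical, and the default assumes that only the local.chapter.class extension for children of book is in effect.

```python
import xml.etree.ElementTree as ET

XI_INCLUDE = "{http://www.w3.org/2001/XInclude}include"


def misplaced_includes(xml_text, allowed_parents=("book",)):
    """List (parent tag, href) for each xi:include outside the allowed parents."""
    root = ET.fromstring(xml_text)
    problems = []
    for parent in root.iter():
        for child in parent:
            if child.tag == XI_INCLUDE and parent.tag not in allowed_parents:
                problems.append((parent.tag, child.get("href")))
    return problems


# Hypothetical master document: one include under book, one under chapter.
SAMPLE = """<book xmlns:xi="http://www.w3.org/2001/XInclude">
  <title>User's Guide</title>
  <xi:include href="intro.xml"/>
  <chapter>
    <title>Setup</title>
    <xi:include href="steps.xml"/>
  </chapter>
</book>"""

print(misplaced_includes(SAMPLE))  # prints [('chapter', 'steps.xml')]
```

If you also extend local.divcomponent.mix, you would pass a longer allowed_parents tuple, for example one that includes chapter and section.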

Adding xml:base to the DTD

If you are using DocBook DTD version 4.2 or earlier, you may get validation errors from xml:base attributes after resolving your XIncludes. When an included file is in a directory that is different from the including file's directory, the XInclude processor adds an xml:base attribute to the top-level element of the included content. This attribute enables any relative file references in the included file to be resolved relative to the included file, rather than relative to the including document. Because xml:base was added to the DocBook DTD only in version 4.3, the attribute generates a validation error for documents written to earlier versions of the DTD. If you use a customized DTD, you could add the attribute yourself to the local.common.attrib parameter entity in the DTD as follows:

<!ENTITY % local.common.attrib  "xml:base  CDATA  #IMPLIED">

Or you could put this in the DOCTYPE declaration for documents that have this problem:

<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
     "http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd" [
<!ENTITY % local.common.attrib "xml:base  CDATA  #IMPLIED">
]>

This DTD extension to version 4.2 permits the xml:base attribute on almost all elements in DocBook, so validation will succeed. The extension will have to be removed when you upgrade to DocBook version 4.3 or later, which supports xml:base natively.
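
To find out whether a resolved document is affected at all, you can look for xml:base attributes in it. The following is a small sketch in Python using only the standard library. The elements_with_xml_base function and the sample resolved document are hypothetical and shown for illustration only.

```python
import xml.etree.ElementTree as ET

# ElementTree reports xml:base under the fixed XML namespace URI.
XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"


def elements_with_xml_base(xml_text):
    """Return (tag, xml:base value) for each element carrying xml:base."""
    root = ET.fromstring(xml_text)
    return [
        (element.tag, element.get(XML_BASE))
        for element in root.iter()
        if XML_BASE in element.attrib
    ]


# Hypothetical output of an XInclude processor for a chapter in a subdirectory.
RESOLVED = """<book>
  <title>User's Guide</title>
  <chapter xml:base="chapters/intro.xml">
    <title>Introduction</title>
  </chapter>
</book>"""

print(elements_with_xml_base(RESOLVED))  # prints [('chapter', 'chapters/intro.xml')]
```

An empty list means the resolved document carries no xml:base attributes, so the DTD extension is not needed for that document.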