SGML provides a mechanism to indicate that particular pieces of the document should be processed in a special way. These are termed “marked sections”.
A marked section has the following general form:
<![ KEYWORD [ Contents of marked section ]]>
As you would expect of an SGML construct, a marked section starts with <!.
The first square bracket begins to delimit the marked section.
KEYWORD describes how this marked section should be processed by the parser.
The second square bracket indicates that the content of the marked section starts here.
The marked section is finished by closing the two square brackets, and then returning to the document context from the SGML context with >.
The CDATA and RCDATA keywords denote the marked section's content model, and allow you to change it from the default.
When an SGML parser is processing a document it keeps track of what is called the “content model”.
Briefly, the content model describes what sort of content the parser is expecting to see, and what it will do with it when it finds it.
The two content models you will probably find most useful are CDATA and RCDATA.
CDATA is for “Character Data”. If the parser is in this content model then it is expecting to see characters, and characters only. In this model the < and & symbols lose their special status, and will be treated as ordinary characters.
RCDATA is for “Entity references and character data”. If the parser is in this content model then it is expecting to see characters and entities. < loses its special status, but & will still be treated as the start of a general entity reference.
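The difference between the two content models can be illustrated with a toy model in Python. This is only a sketch, not how a real SGML parser works: the function read_content and the ENTITIES table are hypothetical names invented for this illustration, and a real parser takes its entity definitions from the DTD.

```python
import re

# Hypothetical entity table for this sketch; a real parser reads these from the DTD.
ENTITIES = {"lt": "<", "gt": ">", "amp": "&"}


def read_content(text, model):
    """Return the characters a parser would pass on for `text` under `model`."""
    if model == "CDATA":
        # Characters only: < and & are treated as ordinary characters.
        return text
    if model == "RCDATA":
        # < is ordinary, but &name; is still recognised as an entity reference.
        # Unknown names are left untouched in this simplified model.
        return re.sub(
            r"&(\w+);",
            lambda m: ENTITIES.get(m.group(1), m.group(0)),
            text,
        )
    raise ValueError("unsupported content model: " + model)


sample = "a < b &amp; c"
print(read_content(sample, "CDATA"))   # the entity reference is kept as typed
print(read_content(sample, "RCDATA"))  # the entity reference is expanded
```

In both cases the < passes through unchanged; only the handling of & differs.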
This is particularly useful if you are including some verbatim text that contains lots of < and & characters. While you could go through the text ensuring that every < is converted to &lt; and every & is converted to &amp;, it can be easier to mark the section as containing only CDATA. When the SGML parser encounters this it will treat the < and & symbols embedded in the content as ordinary characters.
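The two approaches can be compared with a short Python sketch. The helper names escape_markup and wrap_in_cdata are hypothetical and exist only for this illustration; the sketch assumes the verbatim text is available as a plain string.

```python
def escape_markup(text):
    """Replace & and < with entity references.

    & is replaced first so that the & in a freshly written &lt; is not
    escaped a second time.
    """
    return text.replace("&", "&amp;").replace("<", "&lt;")


def wrap_in_cdata(text):
    """Wrap text in a CDATA marked section, leaving < and & untouched."""
    if "]]>" in text:
        # The first ]]> in the content would close the marked section early.
        raise ValueError("text contains ]]>, which would end the marked section")
    return "<![ CDATA [" + text + "]]>"


fragment = "<p>Tom & Jerry</p>"
print(escape_markup(fragment))
print(wrap_in_cdata(fragment))
```

The escaped form changes every occurrence of the two characters, while the marked section leaves the text exactly as written. The one thing the content cannot contain is the closing sequence ]]>.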
Note: When you use CDATA or RCDATA in examples of text marked up in SGML, keep in mind that the content of CDATA is not validated. You have to check the included SGML text using other means. You could, for example, write the example in another document, validate the example code, and then paste it into your CDATA content.
Example 3-15. Using a CDATA Marked Section
<para>Here is an example of how you would include some text
  that contained many <literal>&lt;</literal>
  and <literal>&amp;</literal> symbols. The sample
  text is a fragment of HTML. The surrounding text (&lt;para&gt; and
  &lt;programlisting&gt;) are from DocBook.</para>

<programlisting>
  <![ CDATA [
    <p>This is a sample that shows you some of the elements within
      HTML. Since the angle brackets are used so many times, it is
      simpler to say the whole example is a CDATA marked section
      than to use the entity names for the left and right angle
      brackets throughout.</p>

    <ul>
      <li>This is a listitem</li>
      <li>This is a second listitem</li>
      <li>This is a third listitem</li>
    </ul>

    <p>This is the end of the example.</p>
  ]]>
</programlisting>
If you look at the source for this document you will see this technique used throughout.
If the keyword is INCLUDE then the contents of the marked section will be processed. If the keyword is IGNORE then the marked section is ignored and will not be processed. It will not appear in the output.
Example 3-16. Using INCLUDE and IGNORE in Marked Sections
<![ INCLUDE [
  This text will be processed and included.
]]>

<![ IGNORE [
  This text will not be processed or included.
]]>
By itself, this is not too useful. If you wanted to remove text from your document you could cut it out, or wrap it in comments.
It becomes more useful when you realize you can use parameter entities to control this. Remember that parameter entities can only be used in SGML contexts, and the keyword of a marked section is an SGML context.
For example, suppose that you produced a hard-copy version of some documentation and an electronic version. In the electronic version you wanted to include some extra content that was not to appear in the hard-copy.
Create a parameter entity, and set its value to INCLUDE. Write your document, using marked sections to delimit content that should only appear in the electronic version. In these marked sections use the parameter entity in place of the keyword.
When you want to produce the hard-copy version of the document, change the parameter entity's value to IGNORE and reprocess the document.
Example 3-17. Using A Parameter Entity to Control a Marked Section
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0//EN" [
<!ENTITY % electronic.copy "INCLUDE">
]>

...

<![ %electronic.copy [
  This content should only appear in the electronic
  version of the document.
]]>
When producing the hard-copy version, change the entity's definition to:
<!ENTITY % electronic.copy "IGNORE">
On reprocessing the document, the marked sections that use %electronic.copy as their keyword will be ignored.
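The effect of switching the entity's value can be sketched in Python. This is a deliberately simplified model and not a real SGML parser: it assumes marked sections are not nested, it only understands the INCLUDE and IGNORE keywords, and the function name resolve_marked_sections is hypothetical.

```python
import re

# Matches one non-nested marked section: <![ KEYWORD [ content ]]>
# The keyword may be a literal name or a parameter entity reference (%name).
MARKED_SECTION = re.compile(r"<!\[\s*(%?[\w.-]+);?\s*\[(.*?)\]\]>", re.DOTALL)


def resolve_marked_sections(text, parameter_entities):
    """Keep INCLUDE sections, drop IGNORE sections, leave others untouched."""

    def replace(match):
        keyword, content = match.group(1), match.group(2)
        if keyword.startswith("%"):
            # Look up the parameter entity's replacement text.
            keyword = parameter_entities[keyword[1:]]
        if keyword == "INCLUDE":
            return content
        if keyword == "IGNORE":
            return ""
        # Other keywords (CDATA, RCDATA) are outside the scope of this sketch.
        return match.group(0)

    return MARKED_SECTION.sub(replace, text)


document = "Common text. <![ %electronic.copy [Electronic only. ]]>More common text."
print(resolve_marked_sections(document, {"electronic.copy": "INCLUDE"}))
print(resolve_marked_sections(document, {"electronic.copy": "IGNORE"}))
```

Only the value passed for electronic.copy changes between the two calls; the document text itself stays the same, which is the point of using a parameter entity as the keyword.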
Create a new file, section.sgml, that contains the following:
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0//EN" [
<!ENTITY % text.output "INCLUDE">
]>

<html>
  <head>
    <title>An example using marked sections</title>
  </head>

  <body>
    <p>This paragraph <![ CDATA [contains many <
      characters (< < < < <) so it is easier
      to wrap it in a CDATA marked section.]]></p>

    <![ IGNORE [
    <p>This paragraph will definitely not be included in the
      output.</p>
    ]]>

    <![ %text.output [
    <p>This paragraph might appear in the output, or it
      might not.</p>

    <p>Its appearance is controlled by the %text.output
      parameter entity.</p>
    ]]>
  </body>
</html>
Normalize this file using osgmlnorm and examine the output. Notice which paragraphs have appeared, which have disappeared, and what has happened to the content of the CDATA marked section.
Change the definition of the text.output entity from INCLUDE to IGNORE. Re-normalize the file, and examine the output to see what has changed.
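If you switch between the two values often, the edit to the entity declaration could be scripted. The following Python sketch is one possible approach and makes several assumptions: the helper name set_parameter_entity is hypothetical, and it expects the declaration to use double quotes and to appear in the form shown in section.sgml.

```python
import re


def set_parameter_entity(sgml, name, value):
    """Return `sgml` with the declaration <!ENTITY % name "..."> set to `value`."""
    pattern = re.compile(
        r'(<!ENTITY\s+%\s+' + re.escape(name) + r'\s+")[^"]*(")'
    )
    # A function is used as the replacement so that `value` is inserted literally.
    updated, count = pattern.subn(lambda m: m.group(1) + value + m.group(2), sgml)
    if count == 0:
        raise ValueError("no declaration found for parameter entity " + name)
    return updated


declaration = '<!ENTITY % text.output "INCLUDE">'
print(set_parameter_entity(declaration, "text.output", "IGNORE"))
```

The returned text would then be written back to section.sgml before the file is normalized again.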