Entities are a mechanism for assigning names to chunks of content. As an SGML parser processes your document, any entity references it finds are replaced by the content of the corresponding entity.
This is a good way to have reusable, easily changeable chunks of content in your SGML documents. It is also the only way to include one marked-up file inside another using SGML.
There are two types of entities, which are used in two different situations: general entities and parameter entities.
You cannot use general entities in an SGML context (although you define them in one). They can only be used in your document. Contrast this with parameter entities.
Each general entity has a name. When you want to reference a general entity (and therefore include whatever text it represents in your document), you write &entity-name;. For example, suppose you had an entity called current.version which expanded to the current version number of your product. You could write:
<para>The current version of our product is &current.version;.</para>
When the version number changes, you can simply change the definition of the general entity and reprocess your document.
You can also use general entities to enter characters that you could not otherwise include in an SGML document. For example, < and & cannot normally appear in an SGML document. When the SGML parser sees the < symbol it assumes that a tag (either a start tag or an end tag) is about to appear, and when it sees the & symbol it assumes the next text will be the name of an entity.
Fortunately, you can use the two general entities &lt; and &amp; whenever you need to include one or other of these.
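As a small illustration of these two entities, here is a sketch of a paragraph that needs a literal less-than sign and literal ampersands. The sentence is invented for this example, and it assumes the DTD in use defines the lt and amp entities, as the HTML and DocBook DTDs do.

```sgml
<!-- Hypothetical paragraph: &lt; expands to "<" and &amp; expands to "&" -->
<para>The expression a &lt; b &amp;&amp; c uses one less-than sign
  and two ampersands.</para>
```

Once the parser has expanded the entity references, the expression in the text reads a < b && c.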
A general entity can only be defined within an SGML context. Typically, this is done immediately after the DOCTYPE declaration.
Example 3-10. Defining General Entities
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0//EN" [
<!ENTITY current.version "3.0-RELEASE">
<!ENTITY last.version "2.2.7-RELEASE">
]>
Notice how the DOCTYPE declaration has been extended by adding a square bracket at the end of the first line. The two entities are then defined over the next two lines, before the square bracket is closed, and then the DOCTYPE declaration is closed.
The square brackets are necessary to indicate that we are extending the DTD indicated by the DOCTYPE declaration.
Like general entities, parameter entities are used to assign names to reusable chunks of text. However, whereas general entities can only be used within your document, parameter entities can only be used within an SGML context.
Parameter entities are defined in a similar way to general entities. However, instead of using &entity-name; to refer to them, use %entity-name;[1]. The definition also includes the % between the ENTITY keyword and the name of the entity.
Example 3-11. Defining Parameter Entities
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0//EN" [
<!ENTITY % param.some "some">
<!ENTITY % param.text "text">
<!ENTITY % param.new "%param.some more %param.text">

<!-- %param.new now contains "some more text" -->
]>
This may not seem particularly useful. It will be.
Add a general entity to example.sgml.
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" [
<!ENTITY version "1.1">
]>

<html>
  <head>
    <title>An Example HTML File</title>
  </head>

  <!-- You might well have some comments in here as well -->

  <body>
    <p>This is a paragraph containing some text.</p>

    <p>This paragraph contains some more text.</p>

    <p align="right">This paragraph might be right-justified.</p>

    <p>The current version of this document is: &version;</p>
  </body>
</html>
Validate the document using onsgmls.
Load example.sgml into your web browser (you may need to copy it to example.html before your browser recognizes it as an HTML document).
Unless your browser is very advanced, you will not see the entity reference &version; replaced with the version number. Most web browsers have very simplistic parsers which do not handle proper SGML[2].
The solution is to normalize your document using an SGML normalizer. The normalizer reads in valid SGML and outputs equally valid SGML which has been transformed in some way. One of the ways in which the normalizer transforms the SGML is to expand all the entity references in the document, replacing the entities with the text that they represent.
You can use osgmlnorm to do this.
% osgmlnorm example.sgml > example.html
You should find a normalized (i.e., entity references expanded) copy of your document in example.html, ready to load into your web browser.
If you look at the output from osgmlnorm you will see that it does not include a DOCTYPE declaration at the start. To include this you need to use the -d option:
% osgmlnorm -d example.sgml > example.html
[1] Parameter entities use the percent symbol.
[2] This is a shame. Imagine all the problems and hacks (such as Server Side Includes) that could be avoided if they did.