(American Standard Code for Information Interchange) This standard character encoding scheme is used extensively in data transmission.


(American National Standards Institute) This group is the U.S. member organization that belongs to the ISO, the International Organization for Standardization.


Attributes augment the element on which they appear; they also provide additional information about the element.

Attributes appear as name-value pairs in the element's start-tag. For example, to assign the value hostname to the role attribute of systemitem, you would use the markup: <systemitem role="hostname">.
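
As a minimal sketch of how an application sees such a name-value pair, the following Python fragment parses the systemitem start-tag with the standard library's ElementTree module. The element content (example.org) is an assumption added only so that the element is complete.

```python
import xml.etree.ElementTree as ET

# Parse one element whose start-tag carries a single attribute.
# The text content "example.org" is illustrative, not from the glossary.
item = ET.fromstring('<systemitem role="hostname">example.org</systemitem>')

# Attributes are exposed as a mapping from attribute name to value.
role = item.get("role")
print(item.tag, item.attrib, role)
```

The attribute travels with the element rather than with its content, which is why it is read from the element object and not from its text.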


A pointer, verbal or graphical or both, to a component of an illustration or a text object.


“Cooked” data, as distinct from “raw,” is a collection of elements and character data that's ready for presentation. The processor is not expected to rearrange, select, or suppress any of the elements, but simply present them as specified.

See Also Raw.

document type definition (DTD)

A set of declarations that defines the names of the elements and their attributes, and that specifies rules for their combination or sequence. You can store a DTD at the beginning of a document or externally in a separate file.


Elements define the hierarchical structure of a document, such as a paragraph, a chapter, and so on. They may contain either text or other subelements. In XML, every non-empty element has a start-tag and an end-tag and contains some part of the document content. Empty elements have no content.
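
The hierarchy can be sketched in Python with the standard library's ElementTree module. The document below is a made-up fragment: the element names chapter, title, para, and anchor are used only for illustration.

```python
import xml.etree.ElementTree as ET

# A small, hypothetical document: one chapter containing three subelements,
# the last of which is an empty element.
doc = ET.fromstring(
    "<chapter><title>Intro</title><para>Some text.</para><anchor/></chapter>"
)

# Iterating over an element yields its direct subelements in document order.
children = [child.tag for child in doc]

# An empty element has neither text nor subelements.
empty = [child.tag for child in doc if child.text is None and len(child) == 0]

print(children, empty)
```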

element declaration

A statement in the DTD defining an element and declaring the order in which it may appear in the document and what other elements it may include.


A name assigned (by means of a declaration) to some chunk of data so it can be referred to by that name; the data can be of various kinds: a string of characters, a special character (e.g., one unavailable on a standard keyboard), an external chapter or graphics file, or a set of declarations in a DTD, for instance. The way an entity is referred to depends on the type of data and where it is being referenced. XML has parameter, external, internal, and data entities.

entity declaration

A statement in the DTD or document that assigns an SGML/XML name to an entity so you can reference it.

external entity

An external entity is a general entity that refers to another document. External entities are often used to incorporate parsable text documents, like legal notices or chapters, into larger units, like chapters or books.

external subset

Element, attribute, and other declarations that compose (part of) a document type definition that are stored in an external entity, and referenced from a document's document type declaration using a public or system identifier.


Text objects like sidebars, figures, tables, and graphics are said to float when their actual place in the document is not fixed. For presentation on a printed page, for instance, a graphic may float to the top of the next page if it is too tall to fit on the page on which it actually falls in the sequence of words and of other like objects in the document.

general entity

An entity referenced by a name that starts with an ampersand (&) and ends with a semicolon. Most of the time general entities are used in SGML/XML documents, not in the DTD. There are two types, external and internal entities, and they refer either to special characters or to text objects like commonly repeated phrases or names or chapters.


(HyperText Markup Language) This is the format of files published on the World Wide Web. HTML is an application of SGML/XML; to author in HTML using SGML/XML-based authoring software, you simply need the HTML DTD.

internal entity

A general entity that references a piece of text (including its markup and even other internal entities), usually as a keyboard shortcut.
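
A minimal sketch of an internal entity in use, assuming Python's standard ElementTree module as the processor. The entity name prod and its replacement text are hypothetical. The entity is declared in the internal subset and referenced from the document content, and the parser substitutes the replacement text.

```python
import xml.etree.ElementTree as ET

# The internal subset declares a general entity named "prod" (hypothetical).
source = """<!DOCTYPE para [
  <!ENTITY prod "Example Product">
]>
<para>Welcome to &prod;.</para>"""

# The parser expands the reference &prod; while reading the content.
para = ET.fromstring(source)
text = para.text
print(text)
```

Because expansion happens in the parser, the application never sees the reference itself, only the replacement text.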

internal subset

Element, attribute, and other declarations that compose (part of) a document type definition that are stored in a document, within the document type declaration.


The Internet is a worldwide communications network originally developed by the U.S. Department of Defense as a distributed system with no single point of failure. The Internet has seen an explosion in commercial use since the development of easy-to-use software for accessing the Internet.


(International Organization for Standardization) The ISO is an industry-supported organization that establishes worldwide standards for everything from data interchange formats to film speed specifications.


Markup is anything added to the content of the document that describes the text.


Meta-information is information about a document, such as the specification of its author or its date of composition, as opposed to the content of a document itself.

parameter entity

An entity usually referenced in the DTD by a name that starts with a percent sign (%) and ends with a semicolon. In DocBook, parameter entities are mainly used to facilitate customization of the DTD, but they can also be used to control marked sections of a document.


A parser is a specialized software program that recognizes XML markup in a document. A parser that reads a DTD and checks and reports on markup errors is a validating XML parser. A parser can be built into an XML editor to prevent incorrect tagging and to check whether a document contains all the required elements.
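
The error-reporting role of a parser can be sketched with Python's standard ElementTree module. This parser is non-validating: it checks only that the markup is well formed and does not check the document against a DTD. The helper name is_well_formed is an assumption made for this example.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if the parser accepts the markup, False on a markup error."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        # Raised for errors such as mismatched or unclosed tags.
        return False
    return True

print(is_well_formed("<para>ok</para>"))
print(is_well_formed("<para>oops</chapter>"))
```

Checking that a document contains all the required elements needs a validating parser, which is outside what this sketch shows.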

processing instruction

An essentially arbitrary string preceded by a question mark and delimited by angle brackets that is intended to convey information to an application that processes an XML instance. For example, the processing instruction <?linebreak> might cause the formatter to introduce a line break at the position where the processing instruction occurs.

In XML documents, processing instructions should have the form:

<?pitarget param1="value1" param2="value2"?>

The pitarget should be a name that the processing application will recognize. Additional information in the PI should be added using “attribute syntax.”
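
As a sketch of how a processing application might read such a PI, the fragment below uses Python's standard minidom module, which keeps processing instructions in the tree. The surrounding para element is an assumption, and the regular expression is one simple way to read the “attribute syntax”; it is not part of any XML API.

```python
import re
from xml.dom import minidom

# A hypothetical paragraph containing the PI form shown above.
doc = minidom.parseString(
    '<para>one<?pitarget param1="value1" param2="value2"?>two</para>'
)

# Find the first processing instruction among the paragraph's children.
pi = next(
    node for node in doc.documentElement.childNodes
    if node.nodeType == node.PROCESSING_INSTRUCTION_NODE
)

# The parser hands over the data as one string; splitting it into
# name-value pairs is left to the application.
params = dict(re.findall(r'(\w+)="([^"]*)"', pi.data))
print(pi.target, params)
```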

public identifier

An abstract identifier for an SGML or XML document, DTD, or external entity.


“Raw” data is just a collection of elements, with no additional punctuation or information about presentation. To continue the cooking metaphor, raw data is just a set of ingredients. It's up to the processor to select appropriate elements, arrange them for display, and add required presentational information.

See Also Cooked.


Standard Generalized Markup Language, an international standard (ISO 8879) that specifies the rules for the creation of platform-independent markup languages for electronic texts.


A file that specifies the presentation or appearance of a document; there are several standards for such stylesheets, including CSS, FOSIs, DSSSL, and, most recently, XSL. Vendors often have proprietary stylesheet formats as well.

system identifier

In SGML/XML, a local, system-dependent identifier for a document, DTD, or external entity. Usually a filename on the local system.

In XML, a system identifier is required to be a URI.


In the world of SGML/XML, a tag is a marker embedded in a document that indicates the purpose or function of the element. Each element has a beginning tag and an end tag, or is empty. Its syntax is a name enclosed in angle brackets (<>). For instance, <para> is a tag in DocBook used to mark the beginning of a paragraph, </para> marks the end of a paragraph, while <xref ... /> is an empty tag.


The Unicode Consortium is a non-profit organization founded to develop, extend and promote use of the Unicode Standard, which specifies the representation of multi-lingual text in modern software products and standards.

Unicode supports characters that are one to four bytes wide rather than the 8-bit codes currently supported by most systems. In each of its seventeen 16-bit planes (numbered 0 to 16) Unicode encodes 65,536 characters rather than only 256 for one-byte encodings, such as ASCII, ISO-8859-1, etc. Thus, in total, Unicode allows one to encode well over 1 million characters.

In its Basic Multilingual Plane (Plane 0, BMP) it encodes most of the world's commonly used languages, with rarer, ancient, or specialized characters encoded in the higher planes. The first 128 code points coincide with ASCII, so that for text using that character set only one byte is needed, whereas for most other languages two bytes suffice.

Of particular interest are the XML standard encodings UTF-8 (Unicode Transformation Format, 8-bit encoding form), which serializes a Unicode scalar value (code point) as a sequence of one to four bytes, and UTF-16 (Unicode Transformation Format, 16-bit encoding form), which serializes a Unicode scalar value (code point) as a sequence of two bytes (or four bytes, as a surrogate pair, for code points outside the Basic Multilingual Plane), in either big-endian or little-endian format.
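
The byte counts can be checked with a short Python sketch. The four sample characters are chosen here for illustration: one from ASCII, two more from the BMP, and one from a higher plane, which UTF-16 represents as a surrogate pair of four bytes.

```python
# Sample code points: U+0041 (A), U+00E9 (e with acute accent),
# U+20AC (euro sign), and U+1F600 (an emoji outside the BMP).
samples = ["A", "\u00e9", "\u20ac", "\U0001F600"]

# Map each code point to its encoded length in (UTF-8, UTF-16 big-endian).
widths = {
    f"U+{ord(ch):04X}": (len(ch.encode("utf-8")), len(ch.encode("utf-16-be")))
    for ch in samples
}

for code_point, (utf8_len, utf16_len) in widths.items():
    print(code_point, utf8_len, utf16_len)
```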


Uniform Resource Identifier, the W3C's codification of the name and address syntax of present and future objects on the Internet. In its most basic form, a URI consists of a scheme name (such as file, http, ftp, news, mailto, gopher) followed by a colon, followed by a path whose nature is determined by the scheme that precedes it (see RFC 1630).

URI is the umbrella term for URNs, URLs, and all other Uniform Resource Identifiers.
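
The scheme-colon-path structure can be sketched with Python's standard urllib.parse module. The three identifiers below are made-up examples, not addresses taken from this glossary.

```python
from urllib.parse import urlsplit

# Hypothetical identifiers using three of the schemes named above.
examples = [
    "http://example.org/docs/index.html",
    "mailto:user@example.org",
    "file:///tmp/book.xml",
]

# The scheme is everything before the first colon; what follows is
# interpreted according to that scheme.
schemes = [urlsplit(uri).scheme for uri in examples]
paths = [urlsplit(uri).path for uri in examples]
print(schemes, paths)
```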


Uniform Resource Locator, a name and address for an existing object accessible over the Internet. is an example of a URL (see RFC 1738).


Uniform Resource Name, the result of an evolving attempt to define a name and address syntax for persistent objects accessible over the Internet; urn:foo:a123,456 is a legal URN consisting of three colon-separated fields: urn, followed by a namespace identifier, followed by a namespace-specific string (see RFC 1737 and RFC 2141 for details).
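
The three fields of the example URN can be separated with a plain string split, as in this Python sketch. The variable names are assumptions. Splitting at most twice keeps any later colons inside the third field.

```python
urn = "urn:foo:a123,456"

# Split on the first two colons only, so the third field is kept whole.
prefix, namespace_id, remainder = urn.split(":", 2)
print(prefix, namespace_id, remainder)
```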


The World Wide Web Consortium (

World Wide Web

Often referred to as WWW or the Web, this usually refers to information available on the Internet that can be easily accessed with software usually called a “browser.” Organizations publish their information on the Web in a format known as HTML (or more recently in XML with an accompanying CSS or XSL stylesheet). This information is usually referred to as their “home page” or “web site”.


Some elements, such as chapter, have important semantic significance. Other elements serve no obvious purpose except to contain a number of other elements. For example, bookinfo has no important semantics; it merely serves as a container for the meta-information about a book. Elements that are just containers are sometimes called “wrappers.”


The Extensible Markup Language, a subset of SGML designed specifically for use over the Web.


Extensible Stylesheet Language, an evolving language for stylesheets to be attached to XML documents. The stylesheet is itself an XML document.