Provides a generic, configurable facility for parsing XML and WBXML, with options to combine parsing with validation and auto-correction using a single interface.
The XML framework is based on the SAX (Simple API for XML) specification. It is also designed for use with WBXML (Wireless Binary XML).
The following key concepts refer to XML:
Tag: angled brackets containing a tag name and, optionally, attributes, for instance <book> or <book author="Jane Austen">.
Element: a sequence beginning with an opening tag and ending with a closing tag, possibly containing nested elements, for instance <book>Pride and Prejudice</book>.
Content: the text or character data (CDATA) contained inside the tags, for instance "Pride and Prejudice".
Attribute: a name-value pair separated by an equals sign, for instance author="Jane Austen".
Attribute type: one of certain data types defined for attributes, for instance CDATA.
Document Type Definition (DTD): a document which defines a particular use of XML (the names, attributes and values permitted).
Well-formed: a well-formed XML document conforms to the general syntax of XML, which is defined in the official XML specification.
Valid: a valid XML document conforms to a DTD.
Processing instruction: a tag specifying that content should be sent to a named content processor, sometimes used to enclose scripts. For instance: <?xml-stylesheet type="text/css" href="test.css"?>
Prefix: a name prefixed to a tag to identify the DTD which defines it.
URI: the web address associated with a prefix, for instance http://www.w3.org/XML/1998/namespace.
Namespace: the association between a prefix and a URI which distinguishes the prefix from synonymous prefixes associated with a different URI.
Whitespace: outside the content, whitespace is generally meaningless and the parser should ignore it.
Encoding: the scheme by which text is encoded as a sequence of chars or integers, for instance "encoding=utf-8".
Prefix mapping: a declaration which places the document tag set in a unique namespace. It is implemented as an attribute with a URI as its value, for instance xmlns="http://www.w3.org/1999/xlink".
The following key concepts refer to WBXML:
String dictionary: a WBXML document is encoded and decoded by means of a table of frequently encountered strings, which the body of the document references by index in order to compress the data.
Extension token: WBXML extends XML syntax with extension tokens, which different applications use in different ways. One use is to refer to a string table created specifically for each message and transmitted in the preamble of the message.
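The index-based lookup described above can be illustrated with a minimal sketch in standard C++. It is not the WBXML wire format and not the Symbian string dictionary API: the StringTable class, the expand function and the one-byte index width are all assumptions made for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical string table: the position of a string in the vector is
// the index that the document body uses to refer to it.
class StringTable {
public:
    explicit StringTable(std::vector<std::string> entries)
        : entries_(std::move(entries)) {}

    // Returns the string stored at the given index, or throws if the
    // index is not in the table.
    const std::string& lookup(std::size_t index) const {
        if (index >= entries_.size()) {
            throw std::out_of_range("index not in string table");
        }
        return entries_[index];
    }

private:
    std::vector<std::string> entries_;
};

// Expands a body of one-byte indices (an assumed width) into the
// strings they refer to.
std::vector<std::string> expand(const std::vector<std::uint8_t>& body,
                                const StringTable& table) {
    std::vector<std::string> out;
    out.reserve(body.size());
    for (std::uint8_t index : body) {
        out.push_back(table.lookup(index));
    }
    return out;
}
```

A body that repeats the same tag name many times stores the name once in the table and one small index per occurrence, which is where the compression comes from.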
The parser framework conforms to the event-based SAX specification. It outputs an event when it starts or finishes reading one of the following:
a document
a start tag
an end tag
a prefix mapping
a processing instruction
character data
ignorable whitespace
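The event-based model can be sketched in standard C++ as follows. This is a toy, not the framework's parser: EventHandler, RecordingHandler and parseToy are hypothetical names, and the scanner recognises only plain start tags, end tags and character data (no attributes, prefix mappings, processing instructions or error recovery).

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical handler interface: one callback per kind of event.
struct EventHandler {
    virtual ~EventHandler() = default;
    virtual void onStartDocument() = 0;
    virtual void onStartElement(const std::string& name) = 0;
    virtual void onContent(const std::string& text) = 0;
    virtual void onEndElement(const std::string& name) = 0;
    virtual void onEndDocument() = 0;
};

// Handler that records every event it receives as a line of text.
struct RecordingHandler : EventHandler {
    std::vector<std::string> log;
    void onStartDocument() override { log.push_back("start-document"); }
    void onStartElement(const std::string& n) override { log.push_back("start:" + n); }
    void onContent(const std::string& t) override { log.push_back("content:" + t); }
    void onEndElement(const std::string& n) override { log.push_back("end:" + n); }
    void onEndDocument() override { log.push_back("end-document"); }
};

// Toy scanner: walks the input once and reports each construct to the
// handler as soon as it has been read, without building a tree.
void parseToy(const std::string& xml, EventHandler& handler) {
    handler.onStartDocument();
    std::size_t pos = 0;
    while (pos < xml.size()) {
        if (xml[pos] == '<') {
            std::size_t close = xml.find('>', pos);
            if (close == std::string::npos) break;  // unterminated tag: stop scanning
            std::string name = xml.substr(pos + 1, close - pos - 1);
            if (!name.empty() && name[0] == '/') {
                handler.onEndElement(name.substr(1));
            } else {
                handler.onStartElement(name);
            }
            pos = close + 1;
        } else {
            std::size_t next = xml.find('<', pos);
            if (next == std::string::npos) next = xml.size();
            handler.onContent(xml.substr(pos, next - pos));
            pos = next;
        }
    }
    handler.onEndDocument();
}
```

Passing the string <book>Pride and Prejudice</book> to parseToy with a RecordingHandler produces five events in order: start of document, start of the book element, the character data, end of the book element, end of document.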
Client programs call a single interface which selects plugins to provide the appropriate parser (based on the type of the document to be parsed) and, if required, sets up a chain of further plugins to perform various validations. The string dictionaries used to parse WBXML are also accessed as plugins.
The XML framework consists of classes which model the main constituents of the architecture: the framework as a whole, the parser plugins and extensions to XML, the content processor chain and the content handler mechanism.
The entire parser framework is represented by the CParser class: a client with an XML document to be parsed creates a CParser object and calls its parse functions.
The information needed to choose a plugin is held in the classes CMatchData (data about plugins) and RDocumentParameters (data about the document to be parsed).
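As a rough illustration of choosing a plugin from data about the document, the sketch below keys a registry on a document type string. It is written in standard C++ and does not use CMatchData or RDocumentParameters: PluginRegistry, ParserPlugin and the type strings "text/xml" and "text/wbxml" are assumptions for the example.

```cpp
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical plugin interface.
struct ParserPlugin {
    virtual ~ParserPlugin() = default;
    virtual std::string name() const = 0;
};

struct XmlPlugin : ParserPlugin {
    std::string name() const override { return "xml"; }
};

struct WbxmlPlugin : ParserPlugin {
    std::string name() const override { return "wbxml"; }
};

// Maps a document type string to a factory that creates the matching plugin.
class PluginRegistry {
public:
    using Factory = std::function<std::unique_ptr<ParserPlugin>()>;

    void add(const std::string& docType, Factory factory) {
        factories_[docType] = std::move(factory);
    }

    // Returns a new plugin for the document type, or nullptr if no
    // plugin has been registered for it.
    std::unique_ptr<ParserPlugin> select(const std::string& docType) const {
        auto it = factories_.find(docType);
        if (it == factories_.end()) return nullptr;
        return it->second();
    }

private:
    std::map<std::string, Factory> factories_;
};

// Builds a registry holding the two assumed document types.
PluginRegistry makeDefaultRegistry() {
    PluginRegistry registry;
    registry.add("text/xml", []() -> std::unique_ptr<ParserPlugin> {
        return std::make_unique<XmlPlugin>();
    });
    registry.add("text/wbxml", []() -> std::unique_ptr<ParserPlugin> {
        return std::make_unique<WbxmlPlugin>();
    });
    return registry;
}
```

The client names only the document type; which concrete class does the parsing stays hidden behind the registry.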
Individual parser plugins implement the MParser interface. They are associated, through the TParserInitParams class, with a character set converter (to convert other formats to Unicode), a string dictionary and an element stack. The RElementStack class is the data structure used to store XML elements: tags are represented by the RTagInfo class and attributes by the RAttribute class. The Symbian OS Framework is delivered with two parser plugins, one for XML and one for WBXML. The XML parser consists of the class CXmlParser, which is wrapped around the class CExpat, an implementation of the stream-based Expat parser. The WBXML parser is implemented as the class CWmxmlParser.
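One plausible use of an element stack is to track which elements are currently open, so that each end tag can be checked against the innermost start tag. The sketch below shows that idea in standard C++. ElementStack is a hypothetical class and makes no claim about how RElementStack is actually used inside the plugins.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stack of the names of the elements that are currently open.
class ElementStack {
public:
    // Called when a start tag is read.
    void push(const std::string& name) { open_.push_back(name); }

    // Called when an end tag is read. Pops and returns true if the name
    // matches the innermost open element; otherwise leaves the stack
    // unchanged and returns false.
    bool pop(const std::string& name) {
        if (open_.empty() || open_.back() != name) return false;
        open_.pop_back();
        return true;
    }

    std::size_t depth() const { return open_.size(); }
    bool empty() const { return open_.empty(); }

private:
    std::vector<std::string> open_;
};
```

With this bookkeeping, a document is correctly nested only if every call to pop succeeds and the stack is empty when the document ends.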
The XML framework provides for extensions to XML: at present WBXML is implemented. WBXML requires use of string dictionaries and extension tokens, which are represented by the classes RStringDictionaryCollection, MWbxmlExtensionHandler and TExtensionTokens.
Content processors are plugins which perform further operations on the output of a parser plugin. They implement the interface MContentProcessor and are associated, through the TContentProcessorInitParams class, with a string dictionary and element stack. They are organised into chains by the MContentSource class, which directs the output of each plugin to the next plugin in the chain.
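The chaining idea can be sketched in standard C++ as a series of stages, each of which may change or drop what it receives before passing it on. The names ContentSink, Stage, WhitespaceFilter, Trimmer and Collector are hypothetical, and only character data is modelled; the real MContentProcessor and MContentSource interfaces are not reproduced here.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical receiver of character data.
struct ContentSink {
    virtual ~ContentSink() = default;
    virtual void onContent(const std::string& text) = 0;
};

// A stage is a sink that can forward its output to the next sink.
class Stage : public ContentSink {
public:
    void setNext(ContentSink& next) { next_ = &next; }

protected:
    void forward(const std::string& text) {
        if (next_ != nullptr) next_->onContent(text);
    }

private:
    ContentSink* next_ = nullptr;
};

// Drops content that consists only of whitespace.
struct WhitespaceFilter : Stage {
    void onContent(const std::string& text) override {
        if (text.find_first_not_of(" \t\r\n") != std::string::npos) forward(text);
    }
};

// Removes leading and trailing whitespace before forwarding.
struct Trimmer : Stage {
    void onContent(const std::string& text) override {
        const char* ws = " \t\r\n";
        std::size_t first = text.find_first_not_of(ws);
        if (first == std::string::npos) {
            forward("");
            return;
        }
        std::size_t last = text.find_last_not_of(ws);
        forward(text.substr(first, last - first + 1));
    }
};

// End of the chain: stores whatever reaches it.
struct Collector : ContentSink {
    std::vector<std::string> received;
    void onContent(const std::string& text) override { received.push_back(text); }
};
```

A chain is assembled by calling setNext on each stage in turn, for example filter to trimmer to collector. Output fed to the first stage then reaches the collector only if every stage in between forwards it.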
A client application which is designed to react to the output of the XML framework must implement the MContentHandler interface. The functions to be implemented correspond to the SAX specification discussed above.