The Symbian XML framework supplies an XML parser plugin which is based on Expat. However, regular users of XML may want to create a new parser plugin that is customised to provide the robustness and features required for their specific purposes.
You create a new parser plugin by implementing the MParser interface. A full discussion of how to write a parser is beyond the scope of this document. The main data structures which you need are contained in the class TParserInitParams, which is typically passed as a parameter to the constructor of an MParser implementation.
TParserInitParams has the following member classes.
CCharSetConverter
Used to convert text to and from Unicode.
MContentHandler
The interface to the application which you write to handle the output of the parser: discussed in Using Symbian XML Framework.
RStringDictionaryCollection
A collection of string dictionaries: discussed in Customising a Parser. A string dictionary is an implementation of the MStringDictionary interface: it is used to tokenise XML input into tagged elements in accordance with the DTD associated with the document to be parsed.
RElementStack
An array structure used to stack elements in the order in which the parser encounters them.
The Symbian OS XML framework defines certain standard features which a parser may have, and in designing your parser you should consider which features it will provide. The features concern the formatting of the input and output and the information which the parser reports to the calling program in addition to the tags and parsed text. The following is a list of features.
The parser reports unrecognised tags.
The parser reports an error when it encounters unrecognised tags.
The parser reports the namespace.
The parser reports the namespace prefix.
The parser reports the namespace mappings.
The parser converts elements and attributes to lower case: that is, it is case-insensitive like an HTML parser.
The parser describes the data in a specified encoding: the default is UTF-8.
The parser allows external entities to appear as attribute values: the default is to raise an error when this happens.
The parser accepts XML 1.0 and XML 1.1. The default is to accept XML 1.0 only.
The parser sends all the content for an element in a single chunk: selection of this feature affects the implementation of the parsing methods.
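A parser has to remember which of these features are currently switched on, and a bit mask is one common way to do so. The following is a minimal sketch in plain C++ rather than Symbian code: the enumerator names and values are hypothetical stand-ins for the framework's feature flags, the two error constants are stand-ins for Symbian error codes, and the decision to refuse unsupported flags is an assumption.

```cpp
// Hypothetical stand-ins for the framework's feature flags. The real names
// and values come from the XML framework headers, not from this sketch.
enum TDemoFeature
    {
    EDemoReportNamespaces      = 0x01,
    EDemoReportPrefixes        = 0x02,
    EDemoConvertToLowerCase    = 0x04,
    EDemoSendContentInOneChunk = 0x08
    };

const int KDemoErrNone = 0;           // stand-in for KErrNone
const int KDemoErrNotSupported = -5;  // stand-in for KErrNotSupported

class CDemoFeatureSet
    {
public:
    // aSupported is the mask of features this parser is able to provide.
    explicit CDemoFeatureSet(int aSupported)
        : iSupported(aSupported), iEnabled(0) {}

    int EnableFeature(int aFeature)
        {
        if ((aFeature & ~iSupported) != 0)
            return KDemoErrNotSupported;   // refuse flags the parser cannot honour
        iEnabled |= aFeature;
        return KDemoErrNone;
        }

    int DisableFeature(int aFeature)
        {
        if ((aFeature & ~iSupported) != 0)
            return KDemoErrNotSupported;
        iEnabled &= ~aFeature;
        return KDemoErrNone;
        }

    bool IsFeatureEnabled(int aFeature) const
        {
        return aFeature != 0 && (iEnabled & aFeature) == aFeature;
        }

private:
    int iSupported;   // features the parser can provide
    int iEnabled;     // features currently switched on
    };
```

A parser that supports only namespace reporting would be constructed with the mask EDemoReportNamespaces | EDemoReportPrefixes, and a request to enable any other flag would return the not-supported code.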
The following list contains the six methods of the MParser interface which must be implemented in a parser plugin. Three of them concern the parser features listed above. Two other methods perform the parsing: their purpose is to implement the parse functions of the CParser class discussed in Using Symbian XML Framework. The method Release() is provided because interfaces do not have destructor functions.
EnableFeature(): Enables one of the parser features. The input parameter is a flag defined in the enumeration TParserFeature.
DisableFeature(): Disables one of the parser features. The input parameter is a flag defined in the enumeration TParserFeature.
IsFeatureEnabled(): Checks whether one of the parser features is enabled. The input parameter is a flag defined in the enumeration TParserFeature.
ParseChunkL(): Parses part of a document. Implements CParser::ParseL().
ParseLastChunkL(): Parses the last part of a document: may be called with null input. Implements CParser::ParseEndL().
Release(): Must be called to release resources when the framework has finished using the parser implementation.
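The two parsing methods imply that a document may arrive in pieces and that markup may be split across two pieces. The following sketch, in plain C++ with std::string standing in for an 8-bit descriptor, shows one way to hold back an unfinished tag until the next chunk arrives and to accept an empty final chunk. The class and its trivial tag scanner are hypothetical and are not part of the framework.

```cpp
#include <string>
#include <vector>

class CDemoChunkParser
    {
public:
    // Parses part of a document. A tag split across two chunks is held back
    // until the rest of it arrives.
    void ParseChunk(const std::string& aChunk)
        {
        iPending += aChunk;
        Drain();
        }

    // Parses the last part of a document. aFinalChunk may be empty.
    // Returns false if the document ends in the middle of a tag.
    bool ParseLastChunk(const std::string& aFinalChunk)
        {
        iPending += aFinalChunk;
        Drain();
        const bool complete = iPending.find('<') == std::string::npos;
        iPending.clear();
        return complete;
        }

    // The tag names reported so far, in document order.
    const std::vector<std::string>& Tags() const { return iTags; }

private:
    // Reports every complete <...> tag in the buffer and keeps any
    // unfinished tail for the next call.
    void Drain()
        {
        std::string::size_type open;
        while ((open = iPending.find('<')) != std::string::npos)
            {
            const std::string::size_type close = iPending.find('>', open);
            if (close == std::string::npos)
                {
                iPending.erase(0, open);   // keep the partial tag
                return;
                }
            iTags.push_back(iPending.substr(open + 1, close - open - 1));
            iPending.erase(0, close + 1);
            }
        iPending.clear();                  // only character data is left; ignored here
        }

    std::string iPending;
    std::vector<std::string> iTags;
    };
```

A real implementation would pass each tag to the content handler instead of storing it, and would leave with an error code rather than return false.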
Some documents contain markup from more than one XML application, which means that the parser may encounter tags and attributes which look the same but belong to different namespaces. This is why the MParser interface provides for the reporting of namespaces. XML associates tags and attributes with namespaces by adding a prefix to them, and the prefixes are mapped to the URI where the namespace is defined. The class RTagInfo is provided to hold this information. It is initialised with three strings representing the URI, prefix and local name, and has three member functions to retrieve them: Uri(), Prefix() and LocalName(). If you want your application to parse documents which combine multiple namespaces, your implementation of MParser should hold each parsed tag and attribute in an RTagInfo object. The content handler will then have sufficient information to react differently to tags in different namespaces.
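The following sketch illustrates why a content handler needs all three pieces of information. It is plain C++ with a hypothetical stand-in for RTagInfo (std::string replaces RString), and the comparison helper and the second namespace URI are invented for illustration.

```cpp
#include <string>

// Hypothetical stand-in for RTagInfo, so that the sketch compiles as
// ordinary C++.
class TDemoTagInfo
    {
public:
    TDemoTagInfo(const std::string& aUri, const std::string& aPrefix,
                 const std::string& aLocalName)
        : iUri(aUri), iPrefix(aPrefix), iLocalName(aLocalName) {}

    const std::string& Uri() const { return iUri; }
    const std::string& Prefix() const { return iPrefix; }
    const std::string& LocalName() const { return iLocalName; }

private:
    std::string iUri;
    std::string iPrefix;
    std::string iLocalName;
    };

// Two tags name the same element only when both the namespace URI and the
// local name agree. The prefix is a document-local alias and is not compared.
inline bool IsSameElement(const TDemoTagInfo& aLeft, const TDemoTagInfo& aRight)
    {
    return aLeft.Uri() == aRight.Uri() && aLeft.LocalName() == aRight.LocalName();
    }
```

With this helper, a title tag in the XHTML namespace and a title tag in some other namespace compare as different elements even though their local names are identical.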
Some XML applications, notably WBXML, extend XML syntax by adding extension tokens to the markup language. The WBXML specification defines nine global extension tokens but does not assign semantics to them. The meaning of extension tokens is specific to the document in which they are used (users are free to give them any significance whatever), but they are typically used in combination with compression to identify data which needs to be compressed in a specific way. For instance, extension tokens are sometimes used to identify data as being variables rather than constants, or as having a particular data type. To handle extension tokens, a parser plugin must implement the method WbxmlExtensionHandler::OnExtensionL() with three parameters: aData, aToken and aErrorCode. The first parameter holds the actual data, the second specifies the global extension token and the third is the error code.
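As an illustration, the sketch below groups the nine global extension tokens by the kind of data that follows them, using the token values given in the WBXML specification, and shows a handler with the three parameters described above. It is plain C++: the handler class, the enumeration and the error constant are hypothetical stand-ins, and the parameter types are assumptions rather than the framework's own signature.

```cpp
#include <string>

const int KDemoErrNone = 0;   // stand-in for KErrNone

// The WBXML specification defines the global extension tokens in three
// groups, according to what data follows the token.
enum TDemoExtensionKind
    {
    EDemoNotAnExtension,
    EDemoInlineString,   // EXT_I_0 to EXT_I_2: followed by an inline string
    EDemoInteger,        // EXT_T_0 to EXT_T_2: followed by a multi-byte integer
    EDemoSingleByte      // EXT_0 to EXT_2: no data follows
    };

inline TDemoExtensionKind ClassifyExtensionToken(int aToken)
    {
    switch (aToken)
        {
        case 0x40: case 0x41: case 0x42: return EDemoInlineString;
        case 0x80: case 0x81: case 0x82: return EDemoInteger;
        case 0xC0: case 0xC1: case 0xC2: return EDemoSingleByte;
        default:                         return EDemoNotAnExtension;
        }
    }

// Hypothetical handler: it records the data of each extension it accepts.
class CDemoExtensionHandler
    {
public:
    CDemoExtensionHandler() : iHandled(0) {}

    void OnExtension(const std::string& aData, int aToken, int aErrorCode)
        {
        if (aErrorCode != KDemoErrNone)
            return;   // a real implementation would leave with aErrorCode
        if (ClassifyExtensionToken(aToken) == EDemoNotAnExtension)
            return;   // not one of the nine global extension tokens
        iLastData = aData;
        ++iHandled;
        }

    int Handled() const { return iHandled; }
    const std::string& LastData() const { return iLastData; }

private:
    int iHandled;
    std::string iLastData;
    };
```

What the handler does with the data is entirely application-specific, since the specification assigns no meaning to the tokens.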
When a parse fails, the parser object must be destroyed. This means that the implementations of the MParser and MContentHandler methods must contain calls to User::LeaveIfError() with an error code as parameter. Specific error codes are supplied for various cases: they are discussed in the Error Codes section of this document.
class CMyParser : public MParser
    {
public:
    /** Enable a feature. */
    virtual TInt EnableFeature(TInt aParserFeature)
        {
        // your code here to enable the specified feature
        return KErrNotSupported; // placeholder until the feature is implemented
        }
    /** Disable a feature. */
    virtual TInt DisableFeature(TInt aParserFeature)
        {
        // your code here to disable the specified feature
        return KErrNotSupported; // placeholder until the feature is implemented
        }
    /** See if a feature is enabled. */
    virtual TBool IsFeatureEnabled(TInt aParserFeature) const
        {
        // your code here to check if the specified feature is enabled
        return EFalse; // placeholder: no features enabled yet
        }
    /** Parses a descriptor that contains part of a document. */
    virtual void ParseChunkL(const TDesC8& aChunk)
        {
        // your code here
        }
    /** Parses a descriptor that contains the last part of a document. */
    virtual void ParseLastChunkL(const TDesC8& aFinalChunk)
        {
        // your code here
        }
    /** Interfaces don't have a destructor, so we have an explicit method instead. */
    virtual void Release()
        {
        // your code here
        }
    };
You sometimes want to use one of the parser plugins supplied with the XML framework but need to modify it to suit the structure of a particular document: in particular it is common to modify the WBXML parser. This section explains how to customise the WBXML parser.
Suppose that you have to parse a WBXML document with a DTD which has not previously been implemented for the Symbian OS XML framework. This means that you need to add a new string table representing the DTD.
Parsers use string dictionaries to convert a file of text strings into a string pool of RString objects: these are a Symbian OS C++ construct designed to perform comparison and manipulation of strings very rapidly. String pools are discussed in the Symbian OS Guide. The principle behind them is to construct a table of frequently occurring strings, to calculate integer constants representing the offset of each string from the beginning of the table, and to process the integers instead of the strings. A tool exists to perform these calculations and create the C++ code: all you have to do is create the input to the tool in the form of a string table.
A string table is a text file having the extension .st. It contains the names of C++ enumeration constants paired with the strings they refer to. Each pair occupies a line of text and its two elements are separated by white space, as in this example:
stringtable Wml1_1CodePage00TagTable
EA a
EAnchor anchor
EAccess access
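The principle of replacing strings with integers can be sketched in a few lines of plain C++. This is an illustration only: the class is hypothetical, it uses a simple index into a vector where the generated code uses offsets into a table, and it is not the code that the conversion tool produces.

```cpp
#include <cstddef>
#include <string>
#include <vector>

class CDemoStringTable
    {
public:
    // Adds a string and returns its index, which acts as its token.
    int Add(const std::string& aText)
        {
        iStrings.push_back(aText);
        return static_cast<int>(iStrings.size()) - 1;
        }

    // Returns the token for aText, or -1 when aText is not in the table.
    int TokenFor(const std::string& aText) const
        {
        for (std::size_t i = 0; i < iStrings.size(); ++i)
            {
            if (iStrings[i] == aText)
                return static_cast<int>(i);
            }
        return -1;
        }

    // Returns the string for a valid token; throws std::out_of_range otherwise.
    const std::string& StringFor(int aToken) const
        {
        return iStrings.at(static_cast<std::size_t>(aToken));
        }

private:
    std::vector<std::string> iStrings;
    };
```

Once each tag has been looked up, later comparisons are integer comparisons: testing whether an element is an anchor means comparing its token with the token for 'anchor' rather than comparing characters.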
The simplest use for a string table arises when you use WBXML as a method of compressing generic XML. In such a case you simply create a single .st file for all the frequent strings which you expect the parser to encounter. In our example scenario the task is slightly more complex because you are parsing a specific XML application which conforms to a DTD. A DTD specifies elements, and perhaps also attributes and attribute values, and these must be held in three separate .st files. You create the files as described above, but the file containing attribute values requires care. The left-hand column of an attribute value string table must be exactly the same as the left-hand column of the corresponding attribute string table: that is, it must list the same constant names in the same order. The right-hand column of an attribute value table contains the values defined for the attributes. However, some attributes may have no value defined: in this case the attribute value table contains a line consisting only of the constant name, followed not by white space but by the end of the line. The following two examples show a fragment of an attribute string table and the corresponding attribute value string table.
EAcceptcharset accept-charset
EAlign1 align
EAlign2 align
EAlign3 align

EAcceptcharset
EAlign1
EAlign2 bottom
EAlign3 top
In this example, the attribute 'accept-charset' has no value defined for it, so the constant name 'EAcceptcharset' is paired with nothing in the attribute value table. The attribute 'align' may take no value or the values 'bottom' and 'top': therefore the first table pairs it with three different constant names and the second table pairs the constant names with nothing, with 'bottom' and with 'top'.
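Because the two tables must list the same constant names in the same order, it can be useful to check them before running the conversion tool. The following plain C++ sketch is a hypothetical helper, not part of the Symbian tools. It assumes that it is given only the entry lines of each table (without the stringtable header line) and that each string is a single word.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

typedef std::pair<std::string, std::string> TDemoEntry;   // constant name, string

// Splits table lines of the form "EName text" into (name, text) pairs.
// A line holding only a constant name produces an empty text.
inline std::vector<TDemoEntry> ParseTableLines(const std::string& aText)
    {
    std::vector<TDemoEntry> entries;
    std::istringstream lines(aText);
    std::string line;
    while (std::getline(lines, line))
        {
        std::istringstream fields(line);
        std::string name;
        std::string value;
        if (!(fields >> name))
            continue;              // skip blank lines
        fields >> value;           // stays empty when there is no second column
        entries.push_back(TDemoEntry(name, value));
        }
    return entries;
    }

// Returns true when both tables list the same constant names in the same order.
inline bool NamesMatch(const std::vector<TDemoEntry>& aAttributes,
                       const std::vector<TDemoEntry>& aValues)
    {
    if (aAttributes.size() != aValues.size())
        return false;
    for (std::size_t i = 0; i < aAttributes.size(); ++i)
        {
        if (aAttributes[i].first != aValues[i].first)
            return false;
        }
    return true;
    }
```

Applied to the two fragments above, the check succeeds, and the entries for EAcceptcharset and EAlign1 in the value table have empty strings.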
The data structure used to define a DTD is called a code page: a set of string tables as described above is an implementation of a code page. When the string tables are converted into C++ the data in them is held in a structure called a string dictionary. Since the same XML application may have more than one DTD, there may be more than one code page and the associated string dictionaries are held in a structure called a string dictionary collection, with functionality to switch between one code page and another.
You convert the string tables to C++ by invoking the conversion tool from the build files when you compile your parser. The conversion tool can be found in ...\epoc32\tools\
The Symbian OS Guide explains how to customise the .mmp and bld.inf files for your project to call the tool at build time.
The XML framework is designed to manage numerous parser implementations and has functionality to choose the implementation most suited to the current document. The criteria used to make the selection are held in the Xml::CMatchData class. When this information does not force the selection of exactly one parser, the framework by default chooses a Symbian-supplied parser if one is present: otherwise it chooses the one with the lowest UID. When you have created a parser implementation you also create a resource file which supplies this information. The field implementation_uid should contain the UID of the plugin, the default_data field should contain the document type it can parse, and the opaque_data field should specify the supplier (Symbian or other). The following is a specimen resource file.
RESOURCE REGISTRY_INFO validatorInfo
    {
    dll_uid = 0x10273863;
    interfaces =
        {
        INTERFACE_INFO
            {
            interface_uid = 0x101FAA0B;
            implementations =
                {
                IMPLEMENTATION_INFO
                    {
                    implementation_uid = 0x10273864;
                    version_no = 2;
                    display_name = "Example parser";
                    default_data = "text/xml||text/wbxml";
                    opaque_data = "LicenseeX";
                    }
                };
            }
        };
    }
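The default rule described above (prefer a Symbian-supplied parser, otherwise take the lowest UID) can be sketched as follows. This is plain C++ written for illustration and is not the framework's own selection code: the structure and function are hypothetical, and choosing the lowest UID when several Symbian-supplied parsers match is an assumption.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical summary of one matching plugin.
struct TDemoCandidate
    {
    unsigned int iUid;         // implementation_uid from the resource file
    bool iSymbianSupplied;     // derived from the opaque_data field
    };

// Returns the index of the candidate chosen by default, or -1 if there are none.
inline int ChooseDefaultParser(const std::vector<TDemoCandidate>& aMatches)
    {
    int best = -1;
    for (std::size_t i = 0; i < aMatches.size(); ++i)
        {
        if (best < 0)
            {
            best = static_cast<int>(i);
            continue;
            }
        const TDemoCandidate& candidate = aMatches[i];
        const TDemoCandidate& current = aMatches[static_cast<std::size_t>(best)];
        // A Symbian-supplied parser beats any other; ties go to the lowest UID.
        const bool better =
            (candidate.iSymbianSupplied && !current.iSymbianSupplied) ||
            (candidate.iSymbianSupplied == current.iSymbianSupplied &&
             candidate.iUid < current.iUid);
        if (better)
            best = static_cast<int>(i);
        }
    return best;
    }
```

This shows why the opaque_data field matters: a plugin marked as supplied by a licensee is only chosen by default when no Symbian-supplied parser matches the document.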
Error codes are supplied in the header file xmlframeworkerrors.h. They refer to six areas of functionality and the names are self-explanatory. When a parser fails, it typically leaves with the appropriate error code. A plugin may not require some of the error codes, depending on its functionality: for instance, if string dictionaries are not used, neither is the associated error code.
Plugin selection errors are returned by the framework when ECom fails to supply a plugin. KErrXmlGeneratorPluginNotFound is supplied although the current framework does not include an XML generator. KErrXmlPluginNotFound is returned when a call to construct a content processor fails.
Charset converter errors are returned by CCharSetConverter. A character set may be either not supported at all or not available: not available means that there is no functionality to convert to and from that character set.
String dictionary errors are returned by the automatically generated string dictionary code.
General errors refer to an entire document rather than local parse failures.
There is only one error code associated with the parser selection functionality. KErrXmlMoreThanOneParserMatched is only an error if the flag KXmlLeaveOnManyFlag is set.
The constants KErrXmlFirst and KErrXmlLast are not error codes but the bounds of the XML error message space: they allow you to specify that you only want to handle XML errors.
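The two bounds make it possible to test whether an error belongs to the XML framework before deciding how to handle it. The following is a minimal sketch in plain C++: the numeric values are placeholders chosen for illustration (the real values are defined in xmlframeworkerrors.h), and the helper function is hypothetical.

```cpp
#include <algorithm>

// Placeholder values for illustration only. The real bounds are defined in
// xmlframeworkerrors.h as KErrXmlFirst and KErrXmlLast.
const int KDemoErrXmlFirst = -17550;
const int KDemoErrXmlLast  = -17600;

// Returns true when aError lies inside the XML error space. Error codes are
// negative, so the "first" bound may be numerically larger than the "last"
// one: the check therefore orders the bounds before comparing.
inline bool IsXmlError(int aError)
    {
    const int low = std::min(KDemoErrXmlFirst, KDemoErrXmlLast);
    const int high = std::max(KDemoErrXmlFirst, KDemoErrXmlLast);
    return aError >= low && aError <= high;
    }
```

An application could call such a helper on the error code it receives when a parse leaves, handle the error itself when the result is true, and pass any other error on unchanged.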