Symbian
Symbian OS Library

SYMBIAN OS V9.3


Extending Symbian OS XML Framework


Writing a Parser Plugin

Example scenario

The Symbian XML framework supplies an XML parser plugin which is based on Expat. However, regular users of XML may want to create a new parser plugin that is customised to provide the robustness and features required for their specific purposes.

Implementing MParser

You create a new parser plugin by implementing the MParser interface. A full discussion of how to write a parser is beyond the scope of this document. The main data structures which you need are held in the class TParserInitParams, which is typically passed as a parameter to the constructor of an MParser implementation. TParserInitParams has the following members.

CCharSetConverter

Used to convert text to and from Unicode.

MContentHandler

The interface to the application which you write to handle the output of the parser: discussed in Using Symbian XML Framework.

RStringDictionaryCollection

A collection of string dictionaries: discussed in Customising a Parser. A string dictionary is an implementation of the MStringDictionary interface: it is used to tokenise XML input into tagged elements in accordance with the DTD associated with the document to be parsed.

RElementStack

An array structure used to stack elements in the order in which the parser encounters them.

The Symbian OS XML framework defines certain standard features which a parser may have, and in designing your parser you should consider which features it will provide. The features concern the formatting of the input and output, and the information which the parser reports to the calling program in addition to the tags and parsed text.

The MParser interface declares six methods which must be implemented in a parser plugin; they are shown in the sample code below. Three of them, EnableFeature(), DisableFeature() and IsFeatureEnabled(), concern the parser features described above. Two others, ParseChunkL() and ParseLastChunkL(), perform the parsing: their purpose is to implement the parse functions of the CParser class discussed in Using Symbian XML Framework. The method Release() is provided because interfaces do not have destructors.

Some documents contain markup from more than one XML application, which means that the parser may encounter tags and attributes which look the same but belong to different namespaces. This is why the MParser interface provides for the reporting of namespaces. XML associates tags and attributes with namespaces by adding a prefix to them, and each prefix is mapped to the URI which identifies the namespace. The class RTagInfo is provided to hold this information. It is initialised with three strings representing the URI, prefix and local name, and has three member functions to retrieve them: Uri(), Prefix() and LocalName(). If you want your application to parse documents which combine multiple namespaces, your implementation of MParser should hold each parsed tag and attribute in an RTagInfo object. The content handler will then have sufficient information to react differently to tags in different namespaces.
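The following is a minimal sketch, in portable C++, of how a content handler might use this information. TagInfo is a hypothetical stand-in for RTagInfo which stores std::string values, and DescribeTitle() and the two namespace URIs are chosen purely for illustration; none of this is framework code.

```cpp
#include <string>
#include <utility>

// Hypothetical stand-in for RTagInfo, written in standard C++ for illustration.
class TagInfo {
public:
    TagInfo(std::string uri, std::string prefix, std::string localName)
        : uri_(std::move(uri)),
          prefix_(std::move(prefix)),
          localName_(std::move(localName)) {}

    const std::string& Uri() const { return uri_; }
    const std::string& Prefix() const { return prefix_; }
    const std::string& LocalName() const { return localName_; }

private:
    std::string uri_;
    std::string prefix_;
    std::string localName_;
};

// Illustrative content-handler logic: two tags with the same local name
// are told apart by the URI of the namespace they belong to.
std::string DescribeTitle(const TagInfo& tag) {
    if (tag.LocalName() != "title") return "not a title";
    if (tag.Uri() == "http://www.w3.org/1999/xhtml") return "XHTML title";
    if (tag.Uri() == "http://www.w3.org/2000/svg") return "SVG title";
    return "title in unknown namespace";
}
```

The comparison is made on the URI rather than the prefix because different documents may bind different prefixes to the same namespace.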

Some XML applications, notably WBXML, extend XML syntax by adding extension tokens to the markup language. The WBXML specification defines nine global extension tokens but does not assign semantics to them. The meaning of extension tokens is specific to the document in which they are used (users are free to give them any significance whatever), but they are typically used in combination with compression to identify data which needs to be compressed in a specific way. For instance, extension tokens are sometimes used to identify data as being variables rather than constants, or as having a particular data type. To handle extension tokens, a parser plugin must implement the method WbxmlExtensionHandler::OnExtensionL(), which takes three parameters: aData, aToken and aErrorCode. The first parameter holds the actual data, the second specifies the global extension token and the third is the error code.
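As a rough illustration, the nine global extension tokens can be grouped by the kind of data that follows them in the byte stream. The token values below are taken from the WBXML specification; the enumeration and function names are hypothetical and are not part of the Symbian framework.

```cpp
#include <cstdint>

// What follows a global extension token in the WBXML byte stream.
enum class ExtensionPayload {
    InlineString,   // EXT_I_0..EXT_I_2: a terminated inline string follows
    Integer,        // EXT_T_0..EXT_T_2: a multi-byte integer follows
    None,           // EXT_0..EXT_2: the token stands alone
    NotAnExtension  // any other token value
};

// Hypothetical helper: classify a token before passing it to the handler.
ExtensionPayload ClassifyExtensionToken(std::uint8_t token) {
    switch (token) {
        case 0x40: case 0x41: case 0x42:
            return ExtensionPayload::InlineString;
        case 0x80: case 0x81: case 0x82:
            return ExtensionPayload::Integer;
        case 0xC0: case 0xC1: case 0xC2:
            return ExtensionPayload::None;
        default:
            return ExtensionPayload::NotAnExtension;
    }
}
```

A parser could use such a classification to decide how much of the input to read into aData before reporting the extension.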

When a parse fails, the parser object must be destroyed. This means that the implementation of the MParser and MContentHandler methods must contain calls to User::LeaveIfError() with an error code as parameter. Specific error codes are supplied for various cases: they are discussed in the Error Codes section of this document.

Sample code

class CMyParser : public MParser
{
public:
    /** Enable a feature. */
    virtual TInt EnableFeature(TInt aParserFeature)
    {
        // Your code here to enable the specified feature.
        // Placeholder: report the feature as unsupported until it is implemented.
        return KErrNotSupported;
    }

    /** Disable a feature. */
    virtual TInt DisableFeature(TInt aParserFeature)
    {
        // Your code here to disable the specified feature.
        // Placeholder: report the feature as unsupported until it is implemented.
        return KErrNotSupported;
    }

    /** See if a feature is enabled. */
    virtual TBool IsFeatureEnabled(TInt aParserFeature) const
    {
        // Your code here to check if the specified feature is enabled.
        // Placeholder: no feature is enabled in this skeleton.
        return EFalse;
    }

    /** Parses a descriptor that contains part of a document. */
    virtual void ParseChunkL(const TDesC8& aChunk)
    {
        // Your code here: parse aChunk and report events to the content handler.
    }

    /** Parses a descriptor that contains the last part of a document. */
    virtual void ParseLastChunkL(const TDesC8& aFinalChunk)
    {
        // Your code here: parse aFinalChunk and complete the parse.
    }

    /** Interfaces don't have a destructor, so we have an explicit method instead. */
    virtual void Release()
    {
        // Your code here: free the resources owned by this parser.
    }
};
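One way to back the three feature methods is a pair of bit masks, one recording the features the parser supports and one recording those currently enabled. The sketch below is portable C++ rather than Symbian code: the feature bits and the FeatureSet class are invented for illustration, and only the two error values mirror KErrNone and KErrNotSupported.

```cpp
#include <cstdint>

const int kErrNone = 0;          // same value as Symbian's KErrNone
const int kErrNotSupported = -5; // same value as Symbian's KErrNotSupported

// Hypothetical feature bits, invented for this sketch.
const std::uint32_t kFeatureReportNamespaces = 0x01;
const std::uint32_t kFeatureLowerCaseTags    = 0x02;
const std::uint32_t kFeatureRawContent       = 0x04;

class FeatureSet {
public:
    explicit FeatureSet(std::uint32_t supported)
        : supported_(supported), enabled_(0) {}

    // Refuses any request that names a feature outside the supported mask.
    int EnableFeature(std::uint32_t feature) {
        if ((feature & ~supported_) != 0) return kErrNotSupported;
        enabled_ |= feature;
        return kErrNone;
    }

    int DisableFeature(std::uint32_t feature) {
        if ((feature & ~supported_) != 0) return kErrNotSupported;
        enabled_ &= ~feature;
        return kErrNone;
    }

    // True only if every requested bit is currently enabled.
    bool IsFeatureEnabled(std::uint32_t feature) const {
        return feature != 0 && (enabled_ & feature) == feature;
    }

private:
    std::uint32_t supported_;
    std::uint32_t enabled_;
};
```

A parser class could hold one such object and forward its EnableFeature(), DisableFeature() and IsFeatureEnabled() calls to it.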


Customising a Parser

You sometimes want to use one of the parser plugins supplied with the XML framework but need to modify it to suit the structure of a particular document: in particular it is common to modify the WBXML parser. This section explains how to customise the WBXML parser.

Example scenario

You have to parse a WBXML document with a DTD which has not previously been implemented for the Symbian OS XML framework. This means that you need to add a new string table representing the DTD.

Implementing a new string table

Parsers use string dictionaries to convert a file of text strings into a string pool of RString objects: these are a Symbian OS C++ construct designed to perform comparison and manipulation of strings very rapidly. String pools are discussed in the Symbian OS Guide. The principle behind them is to construct a table of frequently occurring strings, to calculate integer constants representing the offset of each string from the beginning of the table, and to process the integers instead of the strings. A tool exists to perform these calculations and create the C++ code: all you have to do is create the input to the tool in the form of a string table.

A string table is a text file having the extension .st. It contains the names of C++ enumeration constants paired with the strings they refer to. Each pair occupies a line of text and its two elements are separated by white space, as in this example:

stringtable Wml1_1CodePage00TagTable
EA              a
EAnchor         anchor
EAccess         access
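The relationship between the two columns can be pictured in portable C++ as an enumeration paired with an array of strings. This is only a sketch of the principle and not the output of the conversion tool; the names kTagStrings, FindTag() and TagString() are hypothetical.

```cpp
#include <cstring>

// Constants named as in the left hand column of the string table.
enum TagIndex { EA = 0, EAnchor, EAccess, ETagCount };

// Strings from the right hand column, in the same order.
static const char* const kTagStrings[ETagCount] = { "a", "anchor", "access" };

// Convert text to its table index once; returns -1 if the string is not in the table.
int FindTag(const char* text) {
    for (int i = 0; i < ETagCount; ++i) {
        if (std::strcmp(kTagStrings[i], text) == 0) return i;
    }
    return -1;
}

// Recover the text for an index; returns nullptr if the index is out of range.
const char* TagString(int index) {
    return (index >= 0 && index < ETagCount) ? kTagStrings[index] : nullptr;
}
```

Once a tag has been converted to its index, later comparisons are integer comparisons such as index == EAnchor, which is the source of the speed advantage.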

The simplest use for a string table arises when you use WBXML as a method of compressing generic XML. In such a case you simply create a single .st file for all the frequent strings which you expect the parser to encounter. In our example scenario the task is slightly more complex because you are parsing a specific XML application which conforms to a DTD. A DTD specifies elements, and perhaps also attributes and attribute values, and these must be held in three separate .st files.

You create the files as described above: it is the file containing attribute values which requires care. The left hand column of an attribute value string table must be exactly the same as the left hand column of the corresponding attribute string table: it must list the same constant names, in the same order. The right hand column of an attribute value table contains the values defined for the attributes. If no value is defined for an attribute, the attribute value table contains a line consisting only of the constant name, followed immediately by the end of the line rather than by white space. The following two examples show a fragment of an attribute string table and the corresponding attribute value string table.

EAcceptcharset                  accept-charset
EAlign1                         align
EAlign2                         align
EAlign3                         align

EAcceptcharset
EAlign1
EAlign2                         bottom
EAlign3                         top

In this example, the attribute 'accept-charset' has no value defined for it, so the constant name 'EAcceptcharset' is paired with nothing in the attribute value table. The attribute 'align' may take no value or the values 'bottom' and 'top': therefore the first table pairs it with three different constant names and the second table pairs the constant names with nothing, with 'bottom' and with 'top'.
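Because the two tables list the same constants in the same order, a single constant indexes both. The portable C++ sketch below illustrates that pairing with the four entries from the example; the arrays, the DecodedAttribute structure and DecodeAttribute() are hypothetical and do not reproduce the generated string dictionary code.

```cpp
#include <string>

// Constants shared by the attribute table and the attribute value table.
enum AttributeToken { EAcceptcharset = 0, EAlign1, EAlign2, EAlign3, EAttributeCount };

// Right hand column of the attribute table.
static const char* const kAttributeNames[EAttributeCount] = {
    "accept-charset", "align", "align", "align"
};

// Right hand column of the attribute value table; nullptr marks "no value defined".
static const char* const kAttributeValues[EAttributeCount] = {
    nullptr, nullptr, "bottom", "top"
};

struct DecodedAttribute {
    std::string name;
    std::string value;
    bool hasValue;
};

// Look up both halves with the same constant. The token must be a valid
// AttributeToken other than EAttributeCount.
DecodedAttribute DecodeAttribute(AttributeToken token) {
    DecodedAttribute result;
    result.name = kAttributeNames[token];
    result.hasValue = kAttributeValues[token] != nullptr;
    if (result.hasValue) result.value = kAttributeValues[token];
    return result;
}
```

If the two tables were listed in different orders, the same constant would select a name from one attribute and a value from another, which is why the ordering requirement matters.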

The data structure used to define a DTD is called a code page: a set of string tables as described above is an implementation of a code page. When the string tables are converted into C++ the data in them is held in a structure called a string dictionary. Since the same XML application may have more than one DTD, there may be more than one code page and the associated string dictionaries are held in a structure called a string dictionary collection, with functionality to switch between one code page and another.
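The idea of a collection which switches between code pages can be sketched in portable C++ as follows. DictionaryCollection and its methods are hypothetical names used for illustration; the sketch does not reproduce the interface of RStringDictionaryCollection.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Illustrative collection: each code page is a table of strings indexed by token.
class DictionaryCollection {
public:
    // Adds a code page and returns its index.
    std::size_t AddCodePage(std::vector<std::string> strings) {
        pages_.push_back(std::move(strings));
        return pages_.size() - 1;
    }

    // Makes the given page current; returns false if no such page exists.
    bool SwitchCodePage(std::size_t page) {
        if (page >= pages_.size()) return false;
        current_ = page;
        return true;
    }

    // Resolves a token against the current page only.
    // Returns nullptr if no page is loaded or the token is out of range.
    const std::string* Lookup(std::size_t token) const {
        if (pages_.empty() || token >= pages_[current_].size()) return nullptr;
        return &pages_[current_][token];
    }

private:
    std::vector<std::vector<std::string>> pages_;
    std::size_t current_ = 0;
};
```

The same token value can therefore resolve to different strings depending on which code page is current.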

You convert the string tables to C++ by invoking the conversion tool from the build files when you compile your parser. The conversion tool can be found in ...\epoc32\tools\. The Symbian OS Guide explains how to customise the .mmp and bld.inf files for your project to call the tool at build time.


Using your parser as a plugin

The XML framework is designed to manage numerous parser implementations and has functionality to choose the implementation most suited to the current document. The criteria used to make the selection are held in the Xml::CMatchData class. When this information does not force the selection of exactly one parser, the framework first prefers a Symbian-supplied parser if one is present; otherwise it chooses the parser with the lowest UID. When you have created a parser implementation you must also create a resource file which supplies this information. The field implementation_uid should contain the UID of the plugin, the default_data field should contain the document types it can parse, and the opaque_data field should specify the supplier (Symbian or other). The following is a specimen resource file.

RESOURCE REGISTRY_INFO validatorInfo
    {
    dll_uid = 0x10273863;
    interfaces = 
        {
        INTERFACE_INFO
            {
            interface_uid = 0x101FAA0B;
            implementations = 
                {
                IMPLEMENTATION_INFO
                    {
                    implementation_uid = 0x10273864;
                    version_no = 2;
                    display_name = "Example parser";
                    default_data = "text/xml||text/wbxml";
                    opaque_data = "LicenseeX";
                    }
                };
            }
        };
    }
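The default selection rule described above can be sketched in portable C++. ParserCandidate and ChooseDefault() are hypothetical, and the sketch assumes that when several Symbian-supplied parsers match, the lowest UID among them wins; the source does not state what the framework does in that case.

```cpp
#include <cstdint>
#include <vector>

// Illustrative record of one matching plugin.
struct ParserCandidate {
    std::uint32_t uid;     // corresponds to implementation_uid
    bool symbianSupplied;  // derived from opaque_data
};

// Prefer a Symbian-supplied parser; otherwise take the lowest UID.
// Returns nullptr if there are no candidates.
const ParserCandidate* ChooseDefault(const std::vector<ParserCandidate>& candidates) {
    const ParserCandidate* best = nullptr;
    for (const ParserCandidate& c : candidates) {
        if (best == nullptr) {
            best = &c;
        } else if (c.symbianSupplied != best->symbianSupplied) {
            // Exactly one of the two is Symbian-supplied: that one wins.
            if (c.symbianSupplied) best = &c;
        } else if (c.uid < best->uid) {
            // Same supplier class: the lower UID wins (assumption for the Symbian case).
            best = &c;
        }
    }
    return best;
}
```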


Error codes

Error codes are supplied in the header file xmlframeworkerrors.h. They refer to six areas of functionality and the names are self-explanatory. When a parser fails, it typically leaves by calling User::Leave() with the appropriate error code as a parameter. A plugin may not require some of the error codes, depending on its functionality: for instance, if string dictionaries are not used, neither is the associated error code.

Plugin selection errors are returned by the framework when ECom fails to supply a plugin. KErrXmlGeneratorPluginNotFound is supplied although the current framework does not include an XML generator. KErrXmlPluginNotFound is returned when a call to construct a content processor fails.

KErrXmlStringDictionaryPluginNotFound

KErrXmlParserPluginNotFound

KErrXmlGeneratorPluginNotFound

KErrXmlPluginNotFound

Charset converter errors are returned by CCharSetConverter. A character set may be either not supported at all or not available: not available means that there is no functionality to convert to and from that character set.

KErrXmlBadCharacterConversion

KErrXmlUnsupportedCharacterSet

KErrXmlUnavailableCharacterSet

String dictionary errors are returned by the automatically generated string dictionary code.

KErrXmlUnsupportedElement

KErrXmlUnsupportedAttribute

KErrXmlUnsupportedAttributeValue

KErrXmlMissingStringDictionary

General errors refer to an entire document rather than local parse failures.

KErrXmlUnsupportedDocumentVersion

KErrXmlDocumentCorrupt

KErrXmlStringPoolTableNotFound

KErrXmlBadIndex

KErrXmlUnsupportedExtInterface

There is only one error code associated with the parser selection functionality. KErrXmlMoreThanOneParserMatched is only an error if the flag KXmlLeaveOnManyFlag is set.

KErrXmlMoreThanOneParserMatched

The constants KErrXmlFirst and KErrXmlLast are not error codes but the bounds of the XML error code range: they allow you to specify that you only want to handle XML errors.

KErrXmlFirst

KErrXmlLast
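A range test against the two bounds might look like the following portable C++ sketch. The numeric values are placeholders invented for illustration and are not the values defined in xmlframeworkerrors.h; IsXmlError() is likewise a hypothetical helper.

```cpp
#include <algorithm>

// Placeholder bounds for illustration only: NOT the real KErrXmlFirst/KErrXmlLast values.
const int kExampleErrXmlFirst = -1000;
const int kExampleErrXmlLast  = -1099;

// True if the error lies within the XML range, whichever bound is numerically lower.
bool IsXmlError(int error,
                int first = kExampleErrXmlFirst,
                int last = kExampleErrXmlLast) {
    const int low = std::min(first, last);
    const int high = std::max(first, last);
    return error >= low && error <= high;
}
```

Symbian error codes are negative, so the "first" bound may be numerically greater than the "last"; taking the minimum and maximum avoids depending on that ordering.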