MTXml.h File Reference


Detailed Description

MoSync Tiny XML parser.

MTXml is a simple XML parser which can handle most XML 1.0 and 1.1 documents. It has a SAX-like interface, and is re-entrant in that it can take a partial XML document and request additional data when needed.

In the interests of performance, MTXml is not a conforming XML processor as defined by the W3C Recommendation. It does not validate documents, and it checks only a few of the well-formedness constraints. It even ignores some "fatal errors". Still, it should be able to properly parse a well-formed document.

It must be fed sufficiently large portions of data to parse at least one Name or Attribute in one go.

MTXml is a destructive parser, which means that it modifies the input data during parsing. For example, it overwrites some key bytes with null terminators for callback delivery. The processing of UTF-8 and entity data replaces the original bytes with Latin-1 characters.
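To make the consequence of destructive parsing concrete, here is a minimal sketch of the general technique. It is not MTXml's actual code, and `terminate_name` is a hypothetical helper introduced only for illustration. It overwrites the delimiter that follows a name with a null terminator so the name can be delivered as a C string, which is why the buffer you feed must be writable and never a string literal.

```c
#include <stddef.h>

/* Hypothetical helper, for illustration only (not part of MTXml).
 * Scans a name starting at `p`, overwrites the first delimiter after it
 * with '\0', and returns a pointer to the byte after that delimiter.
 * Returns NULL if the buffer ends before a delimiter is found, which
 * means the name is incomplete and more data is needed. */
char *terminate_name(char *p)
{
    while (*p != '\0') {
        if (*p == ' ' || *p == '>' || *p == '/' || *p == '=') {
            *p = '\0';      /* the original delimiter byte is destroyed */
            return p + 1;
        }
        p++;
    }
    return NULL;            /* incomplete name in this buffer */
}
```

The NULL case also shows why a chunk must contain at least one complete Name or Attribute: a name cut off at the end of the buffer cannot be terminated in place.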

MTXml does not keep any state beyond the bare minimum required for its parsing. For example, there is no tag stack. Not even the name of the latest tag is saved. Thus, you will probably need to keep your own state variables, different for each document type that you parse.
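As one possible way to keep such state yourself, the sketch below implements a small fixed-size tag stack. All names here (`TagStack`, `tag_push`, `tag_pop`, the size limits) are hypothetical and not part of MTXml; the idea is that you would call them from your own start-tag and end-tag handlers.

```c
#include <string.h>

#define MAX_DEPTH 16   /* assumed nesting limit for this sketch */
#define MAX_NAME  32   /* assumed maximum tag name size, including '\0' */

/* Application-side state; MTXml itself keeps no tag stack. */
typedef struct {
    char names[MAX_DEPTH][MAX_NAME];
    int depth;
} TagStack;

/* Call from a start-tag handler.
 * Returns 0 on success, -1 if the stack is full. */
int tag_push(TagStack *s, const char *name)
{
    if (s->depth >= MAX_DEPTH)
        return -1;
    /* Copy the name instead of storing the pointer, on the assumption
     * that it points into the caller's input buffer. Long names are
     * truncated to MAX_NAME - 1 characters. */
    strncpy(s->names[s->depth], name, MAX_NAME - 1);
    s->names[s->depth][MAX_NAME - 1] = '\0';
    s->depth++;
    return 0;
}

/* Call from an end-tag handler.
 * Returns 0 if `name` matches the innermost open tag, -1 otherwise. */
int tag_pop(TagStack *s, const char *name)
{
    if (s->depth == 0 || strcmp(s->names[s->depth - 1], name) != 0)
        return -1;
    s->depth--;
    return 0;
}
```

Because MTXml checks only a few well-formedness constraints, a mismatch reported by `tag_pop` is something the application has to detect and handle on its own.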

If you use mtxFeed(), the parser treats all text as Latin-1, passing UTF-8 without decoding.

If you use mtxFeedProcess(), the parser will determine if UTF-8 is used and, if so, convert all strings reported by MTX callbacks to Latin-1. It will also convert standard entity references (the XML entities and the HTML 4.01 Latin-1 entities).

UTF-16 is not supported.

MTXml should not crash or freeze on any input, but its callbacks will deliver unpredictable data if it is fed a document that is not well-formed, or a document it does not support.

See also:
http://www.w3.org/TR/xml11/ - The W3C Specification of XML 1.1 (online)

http://www.w3.org/TR/html401/sgml/entities.html#h-24.2 - HTML 4.01 character entity references, Latin-1 (online)

#include <ma.h>

Namespaces

namespace  Mtx

Classes

struct  MTXContext
class  Mtx::XmlListener
class  Mtx::MtxListener
class  Mtx::Context

Typedefs

typedef MTXContext MTXContext

Functions

void mtxStart (MTXContext *context)
int mtxFeed (MTXContext *context, char *data)
int mtxFeedProcess (MTXContext *context, char *data)
void mtxStop (MTXContext *context)
int mtxProcess (MTXContext *context, char *data)
unsigned char mtxBasicUnicodeConvert (MTXContext *context, int unicode)


Typedef Documentation

typedef struct MTXContext MTXContext
 


Function Documentation

void mtxStart (MTXContext *context)
 

Initializes a context's internal state. Must be called before the first call to mtxFeed() with this context.

int mtxFeed (MTXContext *context, char *data)
 

Parses data in a context.

The data must be null-terminated. It need not be the entire XML document; MTXContext::dataRemains() will be called with any data that could not be completely parsed. You can then call this function again when you have more data.

You must not call this function from within an MTXml callback. Doing so would corrupt the parser's internal state.

Returns:
Non-zero if mtxStop() was called from a callback within the call to this function, zero otherwise.
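One common way to handle the data reported through MTXContext::dataRemains() is to move it to the front of your buffer and append the next chunk behind it before feeding again. The sketch below shows only that buffer handling. `carry_over` is a hypothetical helper, and it assumes the callback hands you a pointer to the unparsed data inside your own buffer together with its length, which this page does not spell out.

```c
#include <string.h>

/* Hypothetical helper (not part of MTXml).
 * Moves the `len` unparsed bytes at `tail`, which is assumed to lie
 * inside `buf`, to the start of `buf` and null-terminates them.
 * Returns the offset at which newly received data should be appended,
 * or -1 if the tail does not fit together with its terminator. */
int carry_over(char *buf, int capacity, const char *tail, int len)
{
    if (len < 0 || len >= capacity)
        return -1;
    memmove(buf, tail, (size_t)len);   /* memmove: regions may overlap */
    buf[len] = '\0';
    return len;
}
```

After appending new data at the returned offset and null-terminating the result, the whole buffer can be passed to mtxFeed() again, from outside any callback.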

int mtxFeedProcess (MTXContext *context, char *data)
 

Parses data in a context.

Data sent to callbacks will have its UTF-8 characters and standard entities converted to Latin-1.

See also:
mtxFeed()

void mtxStop (MTXContext *context)
 

If called from within an MTXml callback, this function ensures that no more callbacks will be called during this feed and that it is safe to call mtxStart() on the context.

However, calling mtxFeed() from within an MTXml callback is still not allowed.

If called from outside an MTXml callback, this function has no effect.

int mtxProcess (MTXContext *context, char *data)
 

Processes data, converting UTF-8 and standard entities to Latin-1. Returns the length of the processed string, or a negative value on error. Does not cause any callbacks, even on error, except MTXContext::unicodeCharacter(). Does not modify the context.
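The UTF-8 part of this kind of processing can be sketched as follows. This is an illustration of the technique under simplifying assumptions, not MTXml's implementation: `utf8_to_latin1` is a hypothetical name, entities are not handled, and code points above U+00FF are simply treated as errors, whereas the library hands them to MTXContext::unicodeCharacter().

```c
/* Illustrative only, not MTXml's implementation.
 * Converts the null-terminated UTF-8 string `s` to Latin-1 in place.
 * Returns the new length, or -1 on a malformed sequence or a code
 * point above U+00FF. On error the buffer may be partially converted. */
int utf8_to_latin1(char *s)
{
    unsigned char *in = (unsigned char *)s;
    unsigned char *out = (unsigned char *)s;

    while (*in != 0) {
        if (*in < 0x80) {
            *out++ = *in++;                  /* ASCII: copy unchanged */
        } else if ((*in == 0xC2 || *in == 0xC3) &&
                   (in[1] & 0xC0) == 0x80) {
            /* Two-byte sequence encoding U+0080..U+00FF */
            *out++ = (unsigned char)(((in[0] & 0x03) << 6) |
                                     (in[1] & 0x3F));
            in += 2;
        } else {
            return -1;                       /* outside Latin-1 or malformed */
        }
    }
    *out = 0;
    return (int)(out - (unsigned char *)s);
}
```

The conversion can be done in place because the Latin-1 output is never longer than the UTF-8 input, so the write position never overtakes the read position.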

unsigned char mtxBasicUnicodeConvert (MTXContext *context, int unicode)
 

A basic implementation of MTXContext::unicodeCharacter(). Converts the following code points into their Latin-1 equivalents:
U+2000 - U+200A: space.
U+2010 - U+2015: dash.
U+2018 - U+201B: apostrophe.
U+201C - U+201F: double quotes.
U+2022: middle dot.
U+20AC: euro sign.
U+2122: trade mark sign.

Returns 0 for all other input, which causes a parse error.

Note:
The number of conversions performed by this function may be increased in future versions.
Parameters:
context Ignored.
unicode The Unicode code point to be converted.
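If you need a custom MTXContext::unicodeCharacter() handler, a table in the same spirit as the list above could look like the sketch below. It is a hypothetical re-creation, not the library source, and the returned byte values are assumptions. In particular, ISO 8859-1 has no euro or trade mark sign, so this sketch assumes the Windows-1252 values 0x80 and 0x99 for those two.

```c
/* Hypothetical re-creation of the mapping listed above; the byte
 * values are assumptions, not taken from the MTXml source.
 * Returns 0 for any code point it does not know. */
unsigned char basic_unicode_convert(int unicode)
{
    if (unicode >= 0x2000 && unicode <= 0x200A) return ' ';   /* spaces */
    if (unicode >= 0x2010 && unicode <= 0x2015) return '-';   /* dashes */
    if (unicode >= 0x2018 && unicode <= 0x201B) return '\'';  /* single quotes */
    if (unicode >= 0x201C && unicode <= 0x201F) return '"';   /* double quotes */
    if (unicode == 0x2022) return 0xB7;  /* bullet -> middle dot */
    if (unicode == 0x20AC) return 0x80;  /* euro: assumed Windows-1252 */
    if (unicode == 0x2122) return 0x99;  /* trade mark: assumed Windows-1252 */
    return 0;                            /* unknown code point */
}
```

A custom handler could extend this table with further substitutions before falling back to 0.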


Generated on Sat Feb 13 00:15:38 2010 for MoSync 2 beta 1 by  doxygen 1.4.6-NO