Author: | Dave Kuhlman |
---|---|
Address: | dkuhlman@rexx.com http://www.rexx.com/~dkuhlman |
Revision: | 1.0a |
Date: | Nov. 20, 2003 |
Copyright: | Copyright (c) 2003 Dave Kuhlman. This documentation and relaxngcompact software is covered by The MIT License: http://www.opensource.org/licenses/mit-license. |
Abstract
This document describes a parser for the RELAX NG compact syntax written in Python.
relaxngcompact is a parser for (most of) the RELAX NG compact syntax. It produces a parse tree composed of instances of class ASTNode. The parser is build on PLY.
The parser is not complete. There are grammatical constructions that it does not process. I hope to fill it out later. In the mean time, because PLY makes it reasonably easy to add new grammar rules, it should be fairly easy to extend the parser to cover constructions that you need. (A few notes on how to do this are included below.) If you add new rules, please pass them back to me. If you do not believe that you can add them yourself, please send me a request.
A distribution file containing the source, a few sample compact syntax files that the parser has been tested on, and this documentation is at: relaxngcompact-1.0a.tar.gz.
Use the standard Python Disutils installation. Run:
python setup.py build python setup.py install # as root
Use the run_tests.py script in the tests sub-directory to test the parser. For example:
cd tests python run_tests.py
From the command line -- Examples:
Get help:
$ python relaxngcompact.py --help usage: relaxngcompact.py [options] <infile.rng> example: relaxngcompact.py infile.rng Parse RELAX NG compact syntax. options: --version show program's version number and exit -h, --help show this help message and exit -d, --debug run lexer and parser in debug mode
Parse a file and show the parse tree:
$ python relaxngcompact.py test6.rng
From within Python code -- Here is an example:
import sys import relaxngcompact tree = relaxngcompact.parse('test5.rng') tree.show(sys.stdout, 0)
The show method in class ASTNode provides an example which shows how to walk the parse tree:
# # Representation of a node in the AST (abstract syntax tree). # class ASTNode: o o o def show(self, outstream, level): self.show_level(outstream, level) if self.children: childCount = len(self.children) else: childCount = 0 if self.value is None: value = 'None' else: value = '"%s"' % self.value # Show the node type and value (if there is one). outstream.write('Type: %s Value: %s Child #: %d\n' % ( NodeType2NameDict[self.nodeType], value, childCount)) # Show the children. for child in self.children: child.show(outstream, level + 1) def show_level(self, outstream, level): for idx in range(level): outstream.write(' ')
Here are a few additional notes:
Constants are provided for the various node types, for example:
NodeType_elementpattern = 1 NodeType_textpattern = 2 NodeType_pattern = 3 o o o
There is a dictionary (NodeType2NameDict) that maps the node type constants to descriptive names. For example:
>>> import relaxngcompact >>> relaxngcompact.NodeType2NameDict[relaxngcompact.NodeType_textpattern] 'TextPattern'
Class ASTNode provides access to the following attributes for each node in the parse tree:
As mentioned above, the parser is not complete. There are a few grammatical constructions in the RELAX NG compact syntax are not handled. This section gives guidance that will help with adding support for those constructions, among other possible extensions.
Note that this section requires an understanding of PLY. In fact, if you are familiar with PLY, then you may be able to skip this section altogether.
In order to add a new AST node type, do the following:
Add a node type constant. For example:
NodeType_newnodename = 36
Add an element to the NodeType2NameDict dictionary. For example:
NodeType_newnodename: 'NewNodeName',
In order to enable the lexer to recognize a new token (type), do the following:
Add a string to the tuple relaxngcompact.tokens. This is the name that you will use in your grammatical function to refer to the token.
Define the token:
If the token is punctionation, then add a constant with a name beginning with "t_". For example:
t_PERIOD = r'\.'
If the token is a keyword, then add a new element to the relaxngcompact.reserved dictionary. For example:
'newtoken': 'NEWTOKEN',
where:
In order to add a new grammar rule that recognizes a new grammatical expression, do the following:
Add a new AST node type to represent the expression. See above.
If the rule must recognize any new tokens not already defined, then add new token types. See above.
Add a new parsing function to implement the grammar rule. Here is an example:
def p_newrulename(t): 'newrulename : NEWTOKEN other_rule_1 COMMA other_rule_2' node = ASTNode(NodeType_newrule, t[2], t[4]) t[0] = node
Explanation:
If the rule must recognize any complex, nested expressions (for example other_rule1 and other_rule_2, in our example), and these are not already defined, then add grammar rules for those expressions.
The parser creates a parse tree composed of nodes that are instances of class ASTNode. There are several ways that you can process this tree.
Add methods to the class relaxngcompact.ASTNode.
Write a function that walks the tree and performs some operation at each node. Here is an example that you can use as a skeleton. This one counts the nodes in the tree:
import sys import relaxngcompact def count_nodes(tree): count = 1 for child in tree.getChildren(): count += count_nodes(child) return count def main(): args = sys.argv[1:] tree = relaxngcompact.parse(args[0]) count = count_nodes(tree) print 'count: %d' % count if __name__ == '__main__': main()
Copyright (c) 2003 Dave Kuhlman
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
Version 1.0a (11/20/2003)
Initial release.
RELAX NG home page: A starting point for information about RELAX NG.
PLY (Python Lex-Yacc) -- "yet another implementation of lex and yacc for Python".
Python home: The Python Web site.
Docutils: Python Documentation Utilities -- This document was formatted with Docutils.