=============================================== A Python Parser for the RELAX NG Compact Syntax =============================================== :author: Dave Kuhlman :address: dkuhlman@rexx.com http://www.rexx.com/~dkuhlman :revision: 1.0a :date: Nov. 20, 2003 :copyright: Copyright (c) 2003 Dave Kuhlman. This documentation and relaxngcompact software is covered by The MIT License: http://www.opensource.org/licenses/mit-license. :abstract: This document describes a parser for the RELAX NG compact syntax written in Python. .. sectnum:: :depth: 4 .. contents:: :depth: 4 Introduction ============ ``relaxngcompact`` is a parser for (most of) the RELAX NG compact syntax. It produces a parse tree composed of instances of class ASTNode. The parser is build on PLY_. .. _PLY: http://systems.cs.uchicago.edu/ply/ Limitations ----------- The parser is *not* complete. There are grammatical constructions that it does not process. I hope to fill it out later. In the mean time, because PLY makes it reasonably easy to add new grammar rules, it should be fairly easy to extend the parser to cover constructions that you need. (A few notes on how to do this are included below.) If you add new rules, please pass them back to me. If you do not believe that you can add them yourself, please send me a request. Requirements ------------ - You will need Python, of course. - You will need to install PLY, which is available at: http://systems.cs.uchicago.edu/ply/. Installation Etc ================ Where to get it --------------- A distribution file containing the source, a few sample compact syntax files that the parser has been tested on, and this documentation is at: `relaxngcompact-1.0a.tar.gz`_. .. _`relaxngcompact-1.0a.tar.gz`: http://www.rexx.com/~dkuhlman/relaxngcompact-1.0a.tar.gz Installation ------------ Use the standard Python Disutils installation. Run:: python setup.py build python setup.py install # as root Testing the installation ------------------------ Use the ``run_tests.py`` script in the ``tests`` sub-directory to test the parser. For example:: cd tests python run_tests.py How to Use the Parser ===================== Running the parser ------------------ 1. From the command line -- Examples: - Get help:: $ python relaxngcompact.py --help usage: relaxngcompact.py [options] example: relaxngcompact.py infile.rng Parse RELAX NG compact syntax. options: --version show program's version number and exit -h, --help show this help message and exit -d, --debug run lexer and parser in debug mode - Parse a file and show the parse tree:: $ python relaxngcompact.py test6.rng 2. From within Python code -- Here is an example:: import sys import relaxngcompact tree = relaxngcompact.parse('test5.rng') tree.show(sys.stdout, 0) Using the parse tree -------------------- The ``show`` method in class ``ASTNode`` provides an example which shows how to walk the parse tree:: # # Representation of a node in the AST (abstract syntax tree). # class ASTNode: o o o def show(self, outstream, level): self.show_level(outstream, level) if self.children: childCount = len(self.children) else: childCount = 0 if self.value is None: value = 'None' else: value = '"%s"' % self.value # Show the node type and value (if there is one). outstream.write('Type: %s Value: %s Child #: %d\n' % ( NodeType2NameDict[self.nodeType], value, childCount)) # Show the children. for child in self.children: child.show(outstream, level + 1) def show_level(self, outstream, level): for idx in range(level): outstream.write(' ') Here are a few additional notes: - Constants are provided for the various node types, for example:: NodeType_elementpattern = 1 NodeType_textpattern = 2 NodeType_pattern = 3 o o o - There is a dictionary (``NodeType2NameDict``) that maps the node type constants to descriptive names. For example:: >>> import relaxngcompact >>> relaxngcompact.NodeType2NameDict[relaxngcompact.NodeType_textpattern] 'TextPattern' - Class ``ASTNode`` provides access to the following attributes for each node in the parse tree: - ``getType()``: Returns the node type (e.g. one of ``NodeType_elementpattern``, ``NodeType_textpattern``, ...). - ``getValue()``: Returns the value of the node. If the node contains text, then the value will contain this string. - ``getChildren()``: Returns a list of children (instances of ``ASTNode``). How to Extend the Parser ======================== As mentioned above, the parser is not complete. There are a few grammatical constructions in the RELAX NG compact syntax are not handled. This section gives guidance that will help with adding support for those constructions, among other possible extensions. Note that this section requires an understanding of PLY. In fact, if you are familiar with PLY, then you may be able to skip this section altogether. Adding a new AST node type -------------------------- In order to add a new AST node type, do the following: 1. Add a node type constant. For example:: NodeType_newnodename = 36 2. Add an element to the ``NodeType2NameDict`` dictionary. For example:: NodeType_newnodename: 'NewNodeName', Adding a new token type ----------------------- In order to enable the lexer to recognize a new token (type), do the following: 1. Add a string to the tuple ``relaxngcompact.tokens``. This is the name that you will use in your grammatical function to refer to the token. 2. Define the token: - If the token is punctionation, then add a constant with a name beginning with "t\_". For example:: t_PERIOD = r'\.' - If the token is a keyword, then add a new element to the ``relaxngcompact.reserved`` dictionary. For example:: 'newtoken': 'NEWTOKEN', where: - ``newtoken`` is the string to be recognized by the lexer. - ``NEWTOKEN`` is the name that will be used to refer to the token in grammar rules/functions. Adding a new grammar rule ------------------------- In order to add a new grammar rule that recognizes a new grammatical expression, do the following: 1. Add a new AST node type to represent the expression. See above. 2. If the rule must recognize any new tokens not already defined, then add new token types. See above. 3. Add a new parsing function to implement the grammar rule. Here is an example:: def p_newrulename(t): 'newrulename : NEWTOKEN other_rule_1 COMMA other_rule_2' node = ASTNode(NodeType_newrule, t[2], t[4]) t[0] = node Explanation: - The function name must begin with "p\_". - The doc string for the function contains the grammar rule. - In the body of the function/rule, ``t[1]``, ``t[2]``, ... refer to the first, second, ... components of the matched expression. - The rule/function returns the node that it constructs by assigning a value to ``t[0]``. 4. If the rule must recognize any complex, nested expressions (for example ``other_rule1`` and ``other_rule_2``, in our example), and these are not already defined, then add grammar rules for those expressions. How to Process the Parse Tree ============================= The parser creates a parse tree composed of nodes that are instances of class ``ASTNode``. There are several ways that you can process this tree. 1. Add methods to the class ``relaxngcompact.ASTNode``. 2. Write a function that walks the tree and performs some operation at each node. Here is an example that you can use as a skeleton. This one counts the nodes in the tree:: import sys import relaxngcompact def count_nodes(tree): count = 1 for child in tree.getChildren(): count += count_nodes(child) return count def main(): args = sys.argv[1:] tree = relaxngcompact.parse(args[0]) count = count_nodes(tree) print 'count: %d' % count if __name__ == '__main__': main() License ======= Copyright (c) 2003 Dave Kuhlman Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. Change History ============== - Version 1.0a (11/20/2003) Initial release. See Also ======== `RELAX NG home page`_: A starting point for information about RELAX NG. .. _`RELAX NG home page`: http://www.relaxng.org/ `PLY (Python Lex-Yacc)`_ -- "yet another implementation of lex and yacc for Python". .. _`PLY (Python Lex-Yacc)`: http://systems.cs.uchicago.edu/ply/ `Python home`_: The Python Web site. .. _`Python home`: http://www.python.org `Docutils`_: Python Documentation Utilities -- This document was formatted with Docutils. .. _`Docutils`: http://docutils.sourceforge.net/