A Python Parser for the RELAX NG Compact Syntax

Author: Dave Kuhlman
Address:
dkuhlman@rexx.com
http://www.rexx.com/~dkuhlman
Revision: 1.0a
Date: Nov. 20, 2003
Copyright: Copyright (c) 2003 Dave Kuhlman. This documentation and relaxngcompact software is covered by The MIT License: http://www.opensource.org/licenses/mit-license.

Abstract

This document describes a parser for the RELAX NG compact syntax written in Python.

Contents

1���Introduction

relaxngcompact is a parser for (most of) the RELAX NG compact syntax. It produces a parse tree composed of instances of class ASTNode. The parser is build on PLY.

1.1���Limitations

The parser is not complete. There are grammatical constructions that it does not process. I hope to fill it out later. In the mean time, because PLY makes it reasonably easy to add new grammar rules, it should be fairly easy to extend the parser to cover constructions that you need. (A few notes on how to do this are included below.) If you add new rules, please pass them back to me. If you do not believe that you can add them yourself, please send me a request.

1.2���Requirements

2���Installation Etc

2.1���Where to get it

A distribution file containing the source, a few sample compact syntax files that the parser has been tested on, and this documentation is at: relaxngcompact-1.0a.tar.gz.

2.2���Installation

Use the standard Python Disutils installation. Run:

python setup.py build
python setup.py install     # as root

2.3���Testing the installation

Use the run_tests.py script in the tests sub-directory to test the parser. For example:

cd tests
python run_tests.py

3���How to Use the Parser

3.1���Running the parser

  1. From the command line -- Examples:

    • Get help:

      $ python relaxngcompact.py --help
      usage:
          relaxngcompact.py [options] <infile.rng>
      example:
          relaxngcompact.py infile.rng
      
      Parse RELAX NG compact syntax.
      options:
        --version    show program's version number and exit
        -h, --help   show this help message and exit
        -d, --debug  run lexer and parser in debug mode
      
    • Parse a file and show the parse tree:

      $ python relaxngcompact.py test6.rng
      
  2. From within Python code -- Here is an example:

    import sys
    import relaxngcompact
    
    tree = relaxngcompact.parse('test5.rng')
    tree.show(sys.stdout, 0)
    

3.2���Using the parse tree

The show method in class ASTNode provides an example which shows how to walk the parse tree:

#
# Representation of a node in the AST (abstract syntax tree).
#
class ASTNode:
    o
    o
    o
    def show(self, outstream, level):
        self.show_level(outstream, level)
        if self.children:
            childCount = len(self.children)
        else:
            childCount = 0
        if self.value is None:
            value = 'None'
        else:
            value = '"%s"' % self.value
        # Show the node type and value (if there is one).
        outstream.write('Type: %s  Value: %s  Child #: %d\n' % (
            NodeType2NameDict[self.nodeType],
            value,
            childCount))
        # Show the children.
        for child in self.children:
            child.show(outstream, level + 1)

    def show_level(self, outstream, level):
        for idx in range(level):
            outstream.write('  ')

Here are a few additional notes:

  • Constants are provided for the various node types, for example:

    NodeType_elementpattern =            1
    NodeType_textpattern =               2
    NodeType_pattern =                   3
        o
        o
        o
    
  • There is a dictionary (NodeType2NameDict) that maps the node type constants to descriptive names. For example:

    >>> import relaxngcompact
    >>> relaxngcompact.NodeType2NameDict[relaxngcompact.NodeType_textpattern]
    'TextPattern'
    
  • Class ASTNode provides access to the following attributes for each node in the parse tree:

    • getType(): Returns the node type (e.g. one of NodeType_elementpattern, NodeType_textpattern, ...).
    • getValue(): Returns the value of the node. If the node contains text, then the value will contain this string.
    • getChildren(): Returns a list of children (instances of ASTNode).

4���How to Extend the Parser

As mentioned above, the parser is not complete. There are a few grammatical constructions in the RELAX NG compact syntax are not handled. This section gives guidance that will help with adding support for those constructions, among other possible extensions.

Note that this section requires an understanding of PLY. In fact, if you are familiar with PLY, then you may be able to skip this section altogether.

4.1���Adding a new AST node type

In order to add a new AST node type, do the following:

  1. Add a node type constant. For example:

    NodeType_newnodename = 36
    
  2. Add an element to the NodeType2NameDict dictionary. For example:

    NodeType_newnodename: 'NewNodeName',
    

4.2���Adding a new token type

In order to enable the lexer to recognize a new token (type), do the following:

  1. Add a string to the tuple relaxngcompact.tokens. This is the name that you will use in your grammatical function to refer to the token.

  2. Define the token:

    • If the token is punctionation, then add a constant with a name beginning with "t_". For example:

      t_PERIOD = r'\.'
      
    • If the token is a keyword, then add a new element to the relaxngcompact.reserved dictionary. For example:

      'newtoken': 'NEWTOKEN',
      

      where:

      • newtoken is the string to be recognized by the lexer.
      • NEWTOKEN is the name that will be used to refer to the token in grammar rules/functions.

4.3���Adding a new grammar rule

In order to add a new grammar rule that recognizes a new grammatical expression, do the following:

  1. Add a new AST node type to represent the expression. See above.

  2. If the rule must recognize any new tokens not already defined, then add new token types. See above.

  3. Add a new parsing function to implement the grammar rule. Here is an example:

    def p_newrulename(t):
        'newrulename : NEWTOKEN other_rule_1 COMMA other_rule_2'
        node = ASTNode(NodeType_newrule, t[2], t[4])
        t[0] = node
    

    Explanation:

    • The function name must begin with "p_".
    • The doc string for the function contains the grammar rule.
    • In the body of the function/rule, t[1], t[2], ... refer to the first, second, ... components of the matched expression.
    • The rule/function returns the node that it constructs by assigning a value to t[0].
  4. If the rule must recognize any complex, nested expressions (for example other_rule1 and other_rule_2, in our example), and these are not already defined, then add grammar rules for those expressions.

5���How to Process the Parse Tree

The parser creates a parse tree composed of nodes that are instances of class ASTNode. There are several ways that you can process this tree.

  1. Add methods to the class relaxngcompact.ASTNode.

  2. Write a function that walks the tree and performs some operation at each node. Here is an example that you can use as a skeleton. This one counts the nodes in the tree:

    import sys
    import relaxngcompact
    
    def count_nodes(tree):
        count = 1
        for child in tree.getChildren():
            count += count_nodes(child)
        return count
    
    def main():
        args = sys.argv[1:]
        tree = relaxngcompact.parse(args[0])
        count = count_nodes(tree)
        print 'count: %d' % count
    
    if __name__ == '__main__':
        main()
    

6���License

Copyright (c) 2003 Dave Kuhlman

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

7���Change History

8���See Also

RELAX NG home page: A starting point for information about RELAX NG.

PLY (Python Lex-Yacc) -- "yet another implementation of lex and yacc for Python".

Python home: The Python Web site.

Docutils: Python Documentation Utilities -- This document was formatted with Docutils.