PySgrep - Python Wrappers for Sgrep

Author: Dave Kuhlman
Address:
dkuhlman@rexx.com
http://www.rexx.com/~dkuhlman
Revision: 1.1a
Date: Sept. 18, 2006
Copyright: Copyright (c) 2006 Dave Kuhlman. All Rights Reserved. This software is subject to the provisions of the MIT License http://www.opensource.org/licenses/mit-license.php.

Abstract

This document describes PySgrep, a Python wrapper for sgrep.

1   What is PySgrep?

PySgrep is a Python extension module that enables Python code to call and control the functionality in Sgrep. You can learn more about Sgrep and obtain the original sgrep source at the sgrep home page.

2   Where to Get PySgrep

You can obtain PySgrep as either a Zip file or a gzipped tar file here: http://www.rexx.com/~dkuhlman/#pysgrep.

3   Building and Installing PySgrep

To build PySgrep, do the following:

  1. Unpack the original sgrep source code:

    tar xvzf sgrep-1.94a.tar.gz
    
  2. Change to the sgrep directory and unzip pysgrep.zip:

    cd sgrep-1.94a
    unzip pysgrep-???.zip
    
  3. Configure sgrep:

    ./configure
    
  4. Build PySgrep:

    python setup.py build
    
  5. Install PySgrep:

    python setup.py install
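
After the install step, a quick way to check the result is to try importing the module. The following is a minimal sketch, not part of PySgrep: the helper name is_importable is made up for illustration, and the only name taken from this document is the module name sgreplib.

```python
def is_importable(module_name):
    """Return True if module_name can be imported by this interpreter."""
    try:
        __import__(module_name)
    except ImportError:
        return False
    return True


if __name__ == "__main__":
    # "sgreplib" is the module name used in the import step of this document.
    if is_importable("sgreplib"):
        print("sgreplib is installed")
    else:
        print("sgreplib not found; check the build and install steps")
```

Run it with the same Python interpreter that was used for "python setup.py install", since the module is installed for that interpreter only.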
    

4   Calling PySgrep

To use PySgrep, do the following:

  1. Import the module:

    import sgreplib
    
  2. Set the call-back object (see below). For example:

    class Collector:
        def __init__(self):
            pass
        def write(self, msg):
            print '(Collector.write)', msg,
    # ...
    callBackObject = Collector()
    sgreplib.set_callback_object(callBackObject)
    
  3. Execute the query. For example:

    query = '"<name>" __ "</name>"'
    options = (
        '-o',
        '%f %i %j %l "%r"\\n',
        #'-F',
        #'files1.dat',
        query,
        'data/people.xml',
        'data/outline.xml',
        'data/people_1.xml',
    )
    sgreplib.execute_query_with_args(options)
    

Rationale for this style of interaction with PySgrep: the option-passing style was chosen for two reasons. (1) It was easy to implement, since it required the least alteration to the sgrep code. (2) Once you have learned the sgrep command line options and arguments, you already know the options needed to use PySgrep.
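
Because the arguments are just a sequence of strings in sgrep command line order, they can be assembled by ordinary Python code. The following is a minimal sketch under that assumption. The helper build_sgrep_args is hypothetical and is not part of sgreplib; it handles only the "-o" option used in this document.

```python
def build_sgrep_args(query, files, output_format=None):
    """Assemble an argument tuple in sgrep command line order.

    build_sgrep_args is a hypothetical helper, not part of sgreplib.
    """
    args = []
    if output_format is not None:
        # -o sets the output format, as in the examples in this document.
        args.extend(["-o", output_format])
    args.append(query)
    args.extend(files)
    return tuple(args)


options = build_sgrep_args(
    '"<name>" __ "</name>"',
    ["data/people.xml", "data/outline.xml"],
    output_format='%f %i %j %l "%r"\\n',
)
# options can then be passed to sgreplib.execute_query_with_args(options).
```

The resulting tuple has the same shape as the hand-written options tuple in step 3 above.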

5   Receiving Results From PySgrep

PySgrep passes results back to the caller through a call-back object. A PySgrep call-back object is an instance of any class that implements a method named write which takes a single argument (the text returned). Here are some simple examples of PySgrep call-back classes:

class Collector:
    def __init__(self):
        pass
    def write(self, msg):
        print '(Collector.write)', msg

class ListCollector:
    def __init__(self):
        self.collection = []
    def write(self, msg):
        self.collection.append(msg)
    def getCount(self):
        return len(self.collection)
    def getCollection(self):
        return self.collection

During the execution of a query, the write method of the call-back object is called whenever a newline appears in the result stream. Hint: You can exercise some control over when the write method is called by inserting newlines into the result stream with the "-o" option. For example, the following option would cause the write method to be called for each region found:

-o '%r\n'

The following option would cause the write method to be called twice for each region found:

-o '%f(%i)\n%r\n'
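
When the output format produces two calls per region, the call-back object can regroup them into one record. The following is a minimal sketch, assuming the two-line format shown above and assuming that write receives one line per call. PairCollector is a hypothetical class, and the sample lines fed to it are made up so that the example runs without sgreplib.

```python
class PairCollector:
    """Group consecutive write() calls into (header, region) pairs.

    Hypothetical call-back class; assumes the output format
    '%f(%i)\\n%r\\n', so that two write() calls arrive per region.
    """

    def __init__(self):
        self.pairs = []
        self._pending_header = None

    def write(self, msg):
        text = msg.rstrip("\n")
        if self._pending_header is None:
            # First call of a pair: the '%f(%i)' line.
            self._pending_header = text
        else:
            # Second call of a pair: the '%r' line.
            self.pairs.append((self._pending_header, text))
            self._pending_header = None


# Simulate the calls PySgrep would make for two regions (made-up lines).
collector = PairCollector()
for line in ["people.xml(1)\n", "<name>Alice</name>\n",
             "people.xml(2)\n", "<name>Bob</name>\n"]:
    collector.write(line)
```

Note that a region whose text itself contains a newline would presumably trigger extra calls and break this pairing, so the sketch only suits queries whose regions stay on one line.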

6   Interface to sgreplib

6.1   set_callback_object(object)

Set the object which will receive the query results. The object should be an instance of a class that implements a write method that takes a single argument, the result string. The write method will be called whenever there is a newline in the result stream.

6.2   set_error_callback_object(object)

Set the object which will receive the error messages. The object should be an instance of a class that implements a write method that takes a single argument, the error message string. The write method will be called whenever there is a newline in the error message stream.
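
The two call-back setters can be combined with execute_query_with_args (section 6.3) so that results and error messages are collected separately. The following is a minimal sketch. The helper run_query and the stand-in class FakeSgrep are hypothetical; FakeSgrep only imitates the three functions documented in this section, and its output is made up so that the example runs without the extension module.

```python
class ListCollector:
    """Call-back object that stores every message it receives."""

    def __init__(self):
        self.collection = []

    def write(self, msg):
        self.collection.append(msg)


def run_query(sgrep_module, args):
    """Run one query and return (results, errors) as two lists.

    run_query is a hypothetical helper. sgrep_module is expected to
    provide the three functions documented in this section.
    """
    results = ListCollector()
    errors = ListCollector()
    sgrep_module.set_callback_object(results)
    sgrep_module.set_error_callback_object(errors)
    sgrep_module.execute_query_with_args(args)
    return results.collection, errors.collection


class FakeSgrep:
    """Stand-in for sgreplib so the sketch runs without the extension."""

    def __init__(self):
        self._out = None
        self._err = None

    def set_callback_object(self, obj):
        self._out = obj

    def set_error_callback_object(self, obj):
        self._err = obj

    def execute_query_with_args(self, args):
        # Made-up behaviour: echo the arguments as one result line
        # and emit one error line.
        self._out.write("args: %s\n" % " ".join(args))
        self._err.write("fake warning\n")


results, errors = run_query(FakeSgrep(), ("-o", "%r\\n", '"a"', "x.txt"))
```

With the real module, the call would be run_query(sgreplib, options) after "import sgreplib".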

6.3   execute_query_with_args(args)

Execute a query. args is a tuple or list of strings that obeys the sgrep rules for command line arguments. So, for example, if you wish to perform a query that is equivalent to the following:

$ sgrep -o '%f %i %j %l "%r"\n' '"<name>" __ "</name>"' people.xml outline.xml

then you might use the following Python script:

import sgreplib

# Collector is the call-back class defined in section 5.
query = '"<name>" __ "</name>"'
options = (
    '-o',
    '%f %i %j %l "%r"\\n',
    query,
    'people.xml',
    'outline.xml',
)
sgreplib.set_callback_object(Collector())
sgreplib.execute_query_with_args(options)
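
If you already have a working sgrep command line, one way to obtain the equivalent argument tuple is to let the standard shlex module split it. The following is a minimal sketch, assuming the command line uses POSIX-style shell quoting, which is what shlex.split applies. The leading "sgrep" program name is left out because only the arguments are passed to PySgrep.

```python
import shlex

# The arguments of the shell command shown above, without the leading "sgrep".
# A raw string keeps the backslash in \n exactly as typed.
command_line = r"""-o '%f %i %j %l "%r"\n' '"<name>" __ "</name>"' people.xml outline.xml"""

# shlex.split removes the shell quotes and keeps each quoted
# argument as a single string.
options = tuple(shlex.split(command_line))
```

The result is the same five-element tuple as the hand-written options above, including the two-character backslash-n sequence in the output format.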