Dave's Page

Author: Dave Kuhlman
Address:
dkuhlman@rexx.com
http://www.rexx.com/~dkuhlman
Revision: 1.1f
Date: Jan. 2, 2007
Copyright: Copyright (c) 2004 Dave Kuhlman. This documentation is covered by The MIT License: http://www.opensource.org/licenses/mit-license.

Abstract

Open Source software projects by Dave Kuhlman. These projects are implemented in or for Python. These projects center around XML, parsing XML, etc. They provide tools for building data mapping and Web services. Keywords are: python, xml, editor, text processing, python training.

1   Caution

Work in Progress.

You are entering an alpha-ware zone.

2   Training Documents for Python

Here are several documents that are intended as part of a self-training course and as materials for Python training:

2.1   Python 101 -- Beginning Python

Beginning Python programmers and Pythonista wannabes can start with Python 101 -- Beginning Python.

2.2   Python 201 -- (Slightly) Advanced Python

Look at Python 201 -- (Slightly) Advanced Python for several slightly more advanced topics on Python programming.

2.3   Proposed Python courses

Here is a summary of several courses that I am prepared to teach: Python Course Descriptions.

And, here are more detailed course outlines of these courses:

3   Python Software and Python Extensions

3.1   generateDS.py -- Generate Python data bindings from XML Schema

generateDS.py generates Python data structures from an Xschema document. It generates a file containing: (1) a Python class for each element definition, (2) a parser (using minidom from PyXML) for XML documents that satisfy the Xschema document. The class definitions contain: (1) a constructor with initializers for member variables, (2) get and set methods for member variables, (3) a 'build' member function used during parsing to populate an instance, (4) an 'export' function that will re-create the XML element in an XML document.
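To make the shape of the generated code concrete, here is a hand-written sketch of the four pieces listed above. It is not actual generateDS.py output: the Person class, its name member, and the parse_string helper are invented for illustration, and it uses xml.dom.minidom from the Python standard library.

```python
import io
from xml.dom import minidom


class Person:
    """Hand-written stand-in for a class generated from an element definition."""

    def __init__(self, name=''):
        # (1) Constructor with an initializer for each member variable.
        self.name = name

    # (2) Get and set methods for member variables.
    def getName(self):
        return self.name

    def setName(self, name):
        self.name = name

    # (3) 'build' populates this instance from a DOM node during parsing.
    def build(self, node):
        for child in node.childNodes:
            if child.nodeType == child.ELEMENT_NODE and child.tagName == 'name':
                self.name = ''.join(
                    t.data for t in child.childNodes
                    if t.nodeType == t.TEXT_NODE)

    # (4) 'export' re-creates the XML element on an output stream.
    # For brevity, this sketch does not escape special characters.
    def export(self, outfile, level=0):
        indent = '    ' * level
        outfile.write('%s<person>\n' % indent)
        outfile.write('%s    <name>%s</name>\n' % (indent, self.name))
        outfile.write('%s</person>\n' % indent)


def parse_string(text):
    """Parse an XML string with minidom and build a Person from its root."""
    doc = minidom.parseString(text)
    person = Person()
    person.build(doc.documentElement)
    return person


person = parse_string('<person><name>Alice</name></person>')
out = io.StringIO()
person.export(out)
print(out.getvalue())
```

The real generated classes handle attributes, nested elements, and repeated children; see the generateDS.py documentation for what is actually produced.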

The distribution now contains a SWIG subdirectory with support for processing the interface-description XML documents that SWIG 1.3 generates (with SWIG's "-xml" switch). It can serve as a reasonably extensive example of the use of generateDS.py and can also be used as a basis for building processors for the XML output from SWIG.

Here is some documentation on generateDS.py.

And, here is the implementation of generateDS and associated files.

A bit of documentation and a sample/test file are included.

Here is an analysis document that compares the use of generateDS for performing transformations on XML documents with the use of XSLT for this same purpose.

I've also compared the use of generateDS.py with the Gnosis/objectify library. You can find this Gnosis/objectify comparison document here.

generateDS.py is used extensively in my work on FSM/REST. See, for example, my work with AOLserver and FSM/REST.

3.2   PySgrep

I'm working on Python wrappers for Sgrep, a tool that searches structured text. Here is more information about PySgrep.

You can build PySgrep by unrolling the original sgrep distribution (sgrep-1.94a.tar.gz), then unrolling pysgrep-1.1a.zip on top of it. (If you do not have unzip, use pysgrep-1.1a.tar.gz instead.) See pysgrep.html or README_pysgrep for details.

3.3   libxml_saxlib

libxml_saxlib is a Python extension module that enables you to use the SAX interface of libxml in order to parse XML documents.

You can read about libxml_saxlib at libxml_saxlib.html.

And, you can find a version that will build on Linux using Python's Distutils at libxml_saxlib-1.1a.tar.gz.
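For readers new to SAX-style (event-driven) parsing, here is a minimal sketch of the idea. It uses the xml.sax module from the Python standard library, not libxml_saxlib, whose own interface is described in libxml_saxlib.html; the ElementCounter class is invented for illustration.

```python
import xml.sax


class ElementCounter(xml.sax.ContentHandler):
    """Count start tags and collect character data as events arrive."""

    def __init__(self):
        super().__init__()
        self.counts = {}
        self.text = []

    def startElement(self, name, attrs):
        # Called once for each opening tag the parser encounters.
        self.counts[name] = self.counts.get(name, 0) + 1

    def characters(self, content):
        # Called with chunks of character data; skip whitespace-only chunks.
        if content.strip():
            self.text.append(content.strip())


handler = ElementCounter()
xml.sax.parseString(b'<doc><item>one</item><item>two</item></doc>', handler)
print(handler.counts)
print(handler.text)
```

The parser never builds a tree; the handler sees each event once and keeps only what it needs.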

3.4   libxml_domlib

libxml_domlib is a Python extension module that enables you to use the DOM interface of libxml in order to parse XML documents. You can read about libxml_domlib at libxml_domlib.html. And, you can find a version that will build on Linux using Python's Distutils at libxml_domlib-1.2a.tar.gz.

3.5   libxsltmod

libxsltmod is a Python extension module that enables you to use libxslt to perform XSLT transformations from Python scripts. You can read about libxsltmod at libxsltmod.html. And, you can find a version that will build on Linux using Python's Distutils here: libxsltmod-1.5a.tar.gz.

3.6   rxpop -- SAX support for PyXML built on RXP

rxpop is intended as an alternative to the pyexpat and sgmlop parsers. It is an alternative in the sense that the parser driver is implemented in C. It is an interface to the SAX parser in RXP, which is available at http://www.cogsci.ed.ac.uk/~richard/rxp.html.

A few notes on rxpop are at rxpop.html.

And, rxpop itself is at rxpop.zip.

3.7   libxmlop -- SAX support for PyXML built on libxml

libxmlop is intended as an alternative to the pyexpat and sgmlop parsers. It is an alternative in the sense that the parser driver is implemented in C. It is an interface to the SAX parser in libxml2, which is available at http://xmlsoft.org.

A few notes on libxmlop are at libxmlop.html.

And, libxmlop itself is at libxmlop.zip.

3.8   Tree support for libxml using SWIG

pytreeswiglibxml provides SWIG-generated wrappers for the tree support in libxml2, which is available at http://xmlsoft.org.

First, several qualifications -- There is tree support for libxml which comes with the libxml2 distribution. That is the official support, and the effort that has gone into it is much more extensive than what is described in this document.

Still, the tree support provided here does show what can be done with SWIG. It shows that we can very quickly produce extensive and usable support for a large C library using SWIG.

There are restrictions and limitations:

  • There are some types of nodes that either must be avoided or, when used, must be used in a restricted way. Basically, to use this wrapping you will need to be aware of the different types of tree nodes implemented by libxml and the capabilities and restrictions of each type of node.
  • Some memory management must be explicitly performed. In particular, when you are finished with a document (tree), you should call xmlFreeDoc() and not use the document or nodes in it after you have done so.

On the positive side, this implementation does satisfy several desirable requirements:

  • The tree, the nodes in it, and the connections between those nodes are represented in C, not in Python. This means that creating the tree is quite fast and does not take up space for Python objects. And, because the links between nodes are represented in C and not in Python, we do not have to worry about circular references between Python objects.
  • Python objects that wrap the tree and nodes in it (the shadow classes generated by SWIG) (1) are created as needed and (2) are destroyed when not needed (e.g. when the reference count reaches zero). A consequence of this is that we can walk a tree and, if we re-use and over-write the same variable, we will not keep Python objects for a large number of nodes.

Tree support for libxml2 generated with SWIG is available at pytreeswiglibxml-1.0a.tar.gz.

3.9   dtGenerator.py -- Generate Python data type implementation

Python enables you to define new data-types that can be manipulated from Python scripts. (See http://www.python.org/doc/current/ext/defining-new-types.html for instructions on how to do that.) One way to implement a new Python data-type is to copy the file Objects/xxobject.c in the source distribution of Python and then start replacing text and hacking on it. xxobject.c puts you many steps ahead of starting from scratch (many thanks to the Python development crew for providing it), but in my work on libxml_domlib, I had to implement several data-types and even starting with xxobject.c became tedious.

So I wrote dtGenerator.py to do some of the work for me. Basically, dtGenerator.py begins with a template that is very similar to xxobject.c, then does some of the replacement for you. It also generates skeletons of "getter" functions.
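The template-and-replace approach can be sketched as follows. This is only an illustration of the idea, not the dtGenerator.py implementation: the template text, the generate_getters function, and the Node type with its members are all invented here.

```python
from string import Template

# Hypothetical C template fragment; the real dtGenerator.py template differs.
GETTER_TEMPLATE = Template('''\
static PyObject *
${type_name}_get_${member}(${type_name}Object *self, void *closure)
{
    /* TODO: return the value of '${member}'. */
    Py_RETURN_NONE;
}
''')


def generate_getters(type_name, members):
    """Return C source text with one getter skeleton per member name."""
    return '\n'.join(
        GETTER_TEMPLATE.substitute(type_name=type_name, member=member)
        for member in members)


source = generate_getters('Node', ['name', 'parent'])
print(source)
```

Each skeleton still has to be filled in by hand, but the repetitive naming and boilerplate are produced mechanically.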

You can read some notes about dtGenerator.py in dtGenerator.html.

You can find a copy of it at dtGenerator.zip.

3.10   SWIG XML -- Generate XML (for Python) from SWIG

Note: SWIG (since at least version 1.3.15) contains built-in support for generating XML. And, this support is more extensive than the SWIG extension described here. My recommendation is that the built-in support be used. The SWIG XML support described here may be of interest if you need to learn how to extend SWIG.

This package provides an extension to SWIG that enables SWIG to generate XML (instead of Python code, Perl code, Java code, etc). The generated XML code can serve as input to a code generator (possibly written in Python) or a code analysis system.

Also included in this package is a Python module that can parse the XML output from the SWIG extension and create a tree of Python objects that represent the SWIG XML. Note that this Python module was generated by generateDS.py.

In order to use this SWIG extension, you must download the CVS development version of SWIG.

You can read more about swigxml at swigxml.html.

You can find the files needed to build this extension to SWIG at swigxml-1.0a.zip.

SWIG 1.3 now provides the ability to generate XML documents. You should consider that support more official than what I have implemented. Its output is very extensive, and you will not have to build anything extra. If you decide to use it, you may want to look at the SWIG sub-directory in the generateDS distribution, which contains Python support for processing the XML output from SWIG 1.3. And, if you use the XML capability in SWIG 1.3, be sure to send a message to the SWIG group thanking them and encouraging them to keep supporting it.

3.11   Special purpose scriptable text editors

I've been experimenting with technology for constructing scriptable editors for specialized tasks. Python is my choice for a scripting language.

3.11.1   pyeditor -- A Python scriptable text editor

I've created a scriptable text editor containing a Python command line using pyscintilla and PyGtk. pyscintilla is a wrapping that exposes the Scintilla editor and its capabilities. PyGtk exposes the Gtk toolkit to Python.

I've implemented much of the higher level functionality in Python. You can extend the editor and add new operations in Python.

In order to use pyeditor, you will need pyscintilla from Archaeopteryx and also PyGtk.

Here is more information on pyeditor.

And here is a distribution file for the python scriptable editor.

3.11.2   WxEditor

WxEditor is an attempt to provide functionality similar to pyeditor, but built on top of wxPython and the wxStyledTextCtrl in wxPython. It needs more polish. pyeditor (above) is much more usable. But there is enough there to show what can be done.

In order to use WxEditor, you will need wxPython, which you can find here: http://wxPython.org.

And you can find WxEditor at wxeditor-1.0a.zip.

A few words about what this technology gives you -- pyeditor is implemented in Python. It uses the Python wrapping (pyscintilla) implemented in Python and C++, which provides access to the editor component (Scintilla) implemented in C++. WxEditor is also implemented in Python, but is built on top of the wxPython GUI toolkit and the wxStyledTextCtrl, which in turn is built on Scintilla. This makes it relatively easy to add new features to either editor. For example, it is straightforward to add a new menu item and, depending on the complexity of the feature, to implement that feature in Python using calls to the GUI toolkit (pygtk or wxPython) and calls to the Scintilla text editor widget (pyscintilla or wxStyledTextCtrl). Basically, you can modify these editors to make them do anything that Python + pyscintilla/Scintilla + PyGtk can do, or in the case of WxEditor, what Python + wxStyledTextCtrl/Scintilla + wxPython can do.

I'm hoping that these editors can serve as a starting point and basis for the implementation of future customized and special-purpose text editors built on Python and either pyscintilla and PyGtk or wxPython. Of special interest: libxml_domlib, libxml_saxlib, and libxsltmod are usable from within the editor. For example, it is quite easy to feed the contents of the current selection to the SAX parser exposed by libxml_saxlib.

3.12   Data Mapping

I'm working on solutions to the problem of mapping (and converting) XML documents onto Python data structures and back. Here are some results from that work.

3.12.1   XSLT transformations

This technique uses XSLT to transform an XML document into a canonical XML document, and then to load that XML document into Python data structures.

You can learn more about this technique in this document on data mapping transforms.

And a sample of how to use it is at XsltDatamapping.zip.

3.13   A Parser for RELAX NG Compact Syntax

I've implemented (most of) a parser in Python for the RELAX NG compact syntax.

It's written in Python and uses PLY (yet another implementation of lex and yacc for Python). It produces a parse tree whose nodes are instances of a class ASTNode, which is defined in the parser module.

It recognizes most but not all of the compact syntax. Hopefully it recognizes enough to be useful, and it can be extended when necessary.

You can find documentation on the parser here: http://www.rexx.com/~dkuhlman/relaxngcompact.html

And, you can find a distribution file here: http://www.rexx.com/~dkuhlman/relaxngcompact-1.0a.tar.gz

3.14   A Generator for Adapters/Wrappers for Java Code

generate_wrappers.py generates support files that enable Python to use the classes and methods in a Java source code file.

You can find documentation here: http://www.rexx.com/~dkuhlman/generate_wrappers.html.

And, the distribution is here: http://www.rexx.com/~dkuhlman/generate_wrappers-1.0a.tar.gz.

4   Pylons

Here is a quick start document on Pylons: Pylons Quick Site Development.

5   Zope, CMS, CPS, etc

Zope is powerful, but has a long, steep learning curve. This section has documentation and support for Zope.

5.1   CPS

These documents offer support on building sites with CPS.

5.1.1   Notes on Customizing a CPS Site

I'm working through the process of customizing a CPS site and developing an application with CPS. You can read notes on this here: Notes on Customizing a CPS Site.

5.1.2   A Workflow Implementation Procedure

This document contains notes on how to implement a business process as a CPS workflow. You can read it here: A Workflow Implementation Procedure.

5.1.3   Understanding and using the CPS Remote Controller

Here is a document that explains CPSRemoteControl, which is a CPS product that enables you to manipulate your CPS site using XML-RPC. You can read it here: Understanding and using the CPS Remote Controller.

6   Applications and Samples and Documentation

6.2   Support for AOLserver and PyWX

6.2.1   AOLserver and PyWX

I've been exploring the use of AOLserver, PyWX (Python on top of AOLserver), and PostgreSQL (with AOLserver and Python).

Here is a how-to document on my experiences with AOLserver. You will also find sections on using the Quixote templating language and on using Quixote with AOLserver and PyWX.

6.3   Support for Quixote and REST Etc

You can find my writings about Quixote and REST and so on here: http://www.rexx.com/~dkuhlman/quixote_index.html.

6.4   Amazon Web services

Amazon.com has a Web services interface. It supports two styles, one of which is XML over HTTP, which is REST-like. Here is a bit of support for that XML over HTTP, REST-like interface to Amazon Web services. It helps you to parse and process the XML response documents from Amazon.com.

Here is documentation on Amazon Webservices support.

And, the code is at amazon_ws_support-1.0.tar.gz.

7   Text Processing

7.1   ODF writer for Docutils

rst2odt.py/odtwriter.py is a writer for Docutils that translates reST (reStructuredText) into an ODF (Open Document Format) .odt file which is usable with the OpenOffice.org toolset.

Documentation -- You can learn more about odtwriter here: documentation on Odtwriter for Docutils.

Distribution -- The distribution file is here: source distribution of odtwriter.

odtwriter is also available via Subversion from the Docutils repository under docutils/sandbox/dkuhlman/OpenDocument/. The following will download Docutils including odtwriter and associated files into your current directory:

$ svn checkout svn://svn.berlios.de/docutils/trunk docutils

For more information about access to the Docutils Subversion repository, see: http://docutils.sourceforge.net/docs/dev/repository.html.

7.2   A Docutils writer for the Documenting Python system

I've written a reStructuredText writer for use with the Python project's documentation tools. It is intended to be used as part of the Docutils tool set.

A brief introduction is at rstpythonlatex_intro.html.

And, there is a distribution file at rstpythonlatex-1.0b.zip. The distribution contains a README and a bit of additional documentation.

7.3   Python LaTeX Setup Information

I frequently use the "Documenting Python" system for producing documentation on Python topics. This system translates LaTeX into various viewable formats. So, I've written some documentation and some support on how to set up for processing documents with the Python LaTeX documentation system.

I also use reStructuredText (reST) to create the LaTeX files that I feed to the "Documenting Python" system. In order to do so, I've extended Docutils with the ability to translate reStructuredText to Python LaTeX. You can learn more about Docutils and reStructuredText at the Docutils home page.

Here is documentation that describes how to do the set-up needed for this processing.

And, here is a distribution file for Python LaTeX setup that contains the source document, Makefile, etc.

7.5   Macros for the JED Text Editor

JED is a powerful but light-weight text editor. I've used a variety of text editors, and JED is my favorite.

This document has a number of macros that I find especially useful. You can find it here: Macros for the JED Text Editor.

8   More Python Stuff

8.1   Python Comments

I've written various notes about Python and Jython and Training.

8.2   A Python XML FAQ and How-to

I've written a small document which might help you get started on processing XML with Python. You can find it at http://www.rexx.com/~dkuhlman/pyxmlfaq.html.

8.3   SciTE Python Properties

SciTE is a very nice text editor for editing Python code on both Linux and MS Windows. It has lots of features. However, I've found that I have had to customize the Python properties file so that SciTE will use 4 spaces and no tabs for indentation.

Here are a few lines of code that you can copy and paste into your python.properties file in order to get this behavior. Add them below the lines that define file.patterns.py, which is near the top of python.properties:

#
# Use standard Python indentation and block comment characters.
#
tabsize.$(file.patterns.py)=4
indent.size.$(file.patterns.py)=4
use.tabs.$(file.patterns.py)=0
comment.block.python=##
comment.block.at.line.start.python=1

9   Utilities and miscellaneous information

9.1   zip-ls -- A Zip file listing program written in Python

I've implemented a Zip file listing program that gives me some of the listing and formatting options that I've wanted from unzip -l and unzip -Z. It's written in Python using the zipfile module from the Python standard library.

Documentation on this program is at zip-ls.html.

And, there is a distribution file at zip-ls.zip.
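As a rough idea of how little code a basic listing takes with the zipfile module, here is a minimal sketch. It is not the zip-ls source: the list_zip function and its output format are invented for illustration, and the archive is built in memory so that the example runs without any files on disk.

```python
import io
import zipfile


def list_zip(zip_file):
    """Return one formatted line per archive member: size, date, name."""
    lines = []
    with zipfile.ZipFile(zip_file) as archive:
        for info in archive.infolist():
            year, month, day = info.date_time[:3]
            lines.append('%10d  %04d-%02d-%02d  %s' % (
                info.file_size, year, month, day, info.filename))
    return lines


# Build a small archive in memory so the example needs no files on disk.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, 'w') as archive:
    archive.writestr('hello.txt', 'hello world')
    archive.writestr('docs/readme.txt', 'read me')
buffer.seek(0)

for line in list_zip(buffer):
    print(line)
```

Each ZipInfo object also carries the compressed size, CRC, and other fields, which is where additional listing options could come from.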

9.2   Computer Assembly How-to

I've assembled my own computer. And, it works.

So, I've written a document that attempts to help you assemble a computer from components such as a case and power supply, CPU, motherboard, hard disk drive, etc. You can find this document here: Computer Assembly How-to.

9.3   Installation of IPTables-firewall on Debian

I've installed Arno's IPtables-firewall on my gateway machine. It provides a firewall and also does NAT (network address translation) and IP masquerading. So it both protects the machines on my small sub-net and gives them access to the Internet.

You can find instructions on how to install this firewall on a Debian system (Libranet Debian GNU/Linux, in my case) here: Installation of IPTables-firewall on Debian.