extract_doc.py -- Extract Python Source Code Documentation

Dave Kuhlman

http://www.rexx.com/~dkuhlman

Release 1.0a
July 22, 2003

Front Matter

Copyright (c) 2003 Dave Kuhlman. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

Abstract:

This document describes extract_doc.py, a program that extracts documentation from Python source code files and produces reStructuredText output.

1 Description

extract_doc extracts documentation embedded in Python source code files.

Currently, it generates reStructuredText. An extension that generates LaTeX for the Python LaTeX documentation system is being investigated.

extract_doc is derived from and uses code in pydoc.py from the Python standard library.

One goal of extract_doc is to keep the code simple enough that both the implementation and the output it produces can be customized for specific applications or by individual users.

2 Where to Get It

You can find a distribution file for extract_doc at: http://www.rexx.com/~dkuhlman/extract_doc.zip.

3 How to Use extract_doc

Here is the usage information from extract_doc:

Usage:
    python extract_doc.py [options] <module_name>
Options:
    -h, --help      Display this help message.
    -r, --rest      Extract to decorated reST.
    -l, --latex     Extract to Python LaTeX (module doc type). Not implemented.
    -p, --pager     Use a pager; else write to stdout.
    -o, --over      Use over *and* under title adornment, else only under.
Example:
    python extract_doc.py -r mymodule1
    python extract_doc.py -p -o -r mymodule2
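The options above map naturally onto the standard getopt module. The following is a minimal sketch, assuming Python 3, of how such a command line could be parsed; parse_args and the keys of the options dictionary are hypothetical names, and the real option handling in extract_doc.py may be organized differently.

```python
import getopt

# Hypothetical parser mirroring the usage text; extract_doc.py itself
# may handle its options differently.
SHORT_OPTS = "hrlpo"
LONG_OPTS = ["help", "rest", "latex", "pager", "over"]


def parse_args(argv):
    """Parse argv (without the program name).

    Returns (options, module_name). module_name is None when help was
    requested or when exactly one module name was not given. Unknown
    flags raise getopt.GetoptError.
    """
    opts, args = getopt.getopt(argv, SHORT_OPTS, LONG_OPTS)
    options = {"help": False, "format": None, "pager": False, "over": False}
    for flag, _ in opts:
        if flag in ("-h", "--help"):
            options["help"] = True
        elif flag in ("-r", "--rest"):
            options["format"] = "rest"
        elif flag in ("-l", "--latex"):
            options["format"] = "latex"  # listed as "Not implemented"
        elif flag in ("-p", "--pager"):
            options["pager"] = True
        elif flag in ("-o", "--over"):
            options["over"] = True
    if options["help"] or len(args) != 1:
        return options, None
    return options, args[0]
```

With the second example from the usage text, parse_args(["-p", "-o", "-r", "mymodule2"]) returns ({'help': False, 'format': 'rest', 'pager': True, 'over': True}, 'mymodule2').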

3.1 Command line flag descriptions

-r, --rest: Generate reStructuredText output.
-l, --latex: Generate LaTeX for the Python LaTeX documentation system. Not yet implemented.
-p, --pager: Push the generated output through a pager instead of writing it to stdout. The pager is selected automatically; on my system it selects less.
-o, --over: Generate over and under title adornment, else generate under title adornment only.
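The -o and -p flags affect only presentation. As a sketch of what they control, assuming Python 3, the hypothetical helpers make_title and emit below build a reST title with or without an overline and route text either to stdout or through pydoc.pager from the standard library. They are illustrations, not functions taken from extract_doc.py.

```python
import pydoc
import sys


def make_title(text, char="=", over=False):
    """Return the lines of a reST section title.

    over=False: underline only (the default).
    over=True:  overline and underline, as requested by -o/--over.
    """
    # reST requires the adornment to be at least as long as the title text.
    rule = char * len(text)
    return [rule, text, rule] if over else [text, rule]


def emit(text, use_pager=False):
    """Write text to stdout, or page it when -p/--pager is given."""
    if use_pager:
        # pydoc.pager chooses a pager itself (e.g. from the PAGER variable).
        pydoc.pager(text)
    else:
        sys.stdout.write(text)
```

For example, emit("\n".join(make_title("Module mymodule2", over=True)) + "\n") writes an overlined and underlined title to stdout.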

4 How to Modify extract_doc

extract_doc contains one important class: ReSTDoc. It is a subclass of class Doc in module pydoc in the Python standard library. As such, one might expect it to follow the other subclasses of Doc (for example, TextDoc) closely. It does not: ReSTDoc is a fairly radical rewrite of TextDoc. The rewrite had these goals:
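Subclassing pydoc.Doc means inheriting its dispatch step: in current versions of pydoc, Doc.document() examines an object with the inspect module and calls docmodule, docclass, docroutine, or docother. The hypothetical TraceDoc subclass below, a sketch assuming Python 3, overrides those four methods only to report which one was chosen.

```python
import inspect
import pydoc


class TraceDoc(pydoc.Doc):
    """Hypothetical subclass that reports which doc* method was chosen.

    pydoc.Doc.document() inspects its argument and calls the matching
    method; each override here just returns its own name.
    """

    def docmodule(self, obj, name=None, *args):
        return "docmodule"

    def docclass(self, obj, name=None, *args):
        return "docclass"

    def docroutine(self, obj, name=None, *args):
        return "docroutine"

    def docother(self, obj, name=None, *args):
        return "docother"


tracer = TraceDoc()
print(tracer.document(inspect))       # docmodule
print(tracer.document(pydoc.Doc))     # docclass
print(tracer.document(len))           # docroutine
print(tracer.document(42, "answer"))  # docother
```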

Produce reStructuredText (rather than text or HTML).
Provide code that is simple, consistent, and clear enough so that others can understand and modify it.

Basically, I want it to produce reStructuredText and to enable others to customize the reStructuredText that it produces for their individual needs.

The current class ReSTDoc produces reStructuredText. You can try it for yourself.

Here is a bit of guidance for the second aspect of the goal, i.e. modifiability:

Output is accumulated by calling self.push(line) for each line of text to be produced.
Four methods produce output:
- docmodule is called for the module. It is responsible for producing the documentation for a module.
- docclass is called for each class. It is responsible for producing the documentation for a class.
- docroutine is called for each method (in a class) and each function (at top level in a module). It is responsible for producing the documentation for a method or a function.
- docother is called for each data member. It is responsible for producing the documentation for a data member.
Module inspect from the Python standard library is used to examine the internals of an object: to obtain its members, to determine its type (e.g. method or function), to format the arguments of a function, etc.
Function getdoc in module pydoc is called to get the documentation for an object, for example the documentation for a module, a class, a method, or a function.
There is a method (emphasize) to emphasize a piece of text. It adds asterisks around the text.
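The pattern described above (accumulate lines with push, dispatch to the four doc* methods, use inspect for members and signatures, and pydoc.getdoc for docstrings) can be sketched in a few dozen lines. This is a minimal illustration assuming Python 3, not the actual ReSTDoc code: MiniReSTDoc, heading, body, and text are hypothetical names, and only push, emphasize, and the doc* method names come from the description.

```python
import inspect
import pydoc


class MiniReSTDoc(pydoc.Doc):
    """Hypothetical, much-reduced stand-in for ReSTDoc (not its real code)."""

    def __init__(self):
        self.lines = []

    def push(self, line=""):
        # All output is accumulated one line at a time.
        self.lines.append(line)

    def text(self):
        return "\n".join(self.lines)

    def emphasize(self, text):
        # reST emphasis: surround the text with asterisks.
        return "*%s*" % text

    def heading(self, text, char):
        self.push(text)
        self.push(char * len(text))
        self.push()

    def body(self, obj):
        # pydoc.getdoc returns the cleaned-up docstring, or "" if none.
        doc = pydoc.getdoc(obj)
        if doc:
            self.push(doc)
            self.push()

    def docmodule(self, obj, name=None, *args):
        self.heading("Module %s" % obj.__name__, "=")
        self.body(obj)
        for mname, member in inspect.getmembers(obj):
            if mname.startswith("_") or inspect.ismodule(member):
                continue
            if inspect.isclass(member) or inspect.isfunction(member):
                # Skip classes and functions imported from other modules.
                if member.__module__ == obj.__name__:
                    self.document(member, mname)
            elif not callable(member):
                self.document(member, mname)  # data member -> docother

    def docclass(self, obj, name=None, *args):
        self.heading("Class %s" % (name or obj.__name__), "-")
        self.body(obj)
        # vars() holds only what the class itself defines (no inherited members).
        for mname, member in sorted(vars(obj).items()):
            if inspect.isfunction(member) and (
                    mname == "__init__" or not mname.startswith("_")):
                self.document(member, mname)

    def docroutine(self, obj, name=None, *args):
        try:
            sig = str(inspect.signature(obj))
        except (TypeError, ValueError):  # some builtins have no signature
            sig = "(...)"
        # Escape "*" so reST does not read *args as the start of emphasis.
        sig = sig.replace("*", "\\*")
        self.push("%s %s" % (self.emphasize(name or obj.__name__), sig))
        self.push()
        self.body(obj)

    def docother(self, obj, name=None, *args):
        self.push("%s = %s" % (self.emphasize(name), repr(obj)))
        self.push()


if __name__ == "__main__":
    import colorsys
    fmt = MiniReSTDoc()
    fmt.document(colorsys, "colorsys")
    print(fmt.text())
```

Customizing the output then means editing one doc* method, for instance changing how docroutine renders a signature, without touching the others.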

In order to produce your own customized documentation extraction capability, you might want to do the following:

Copy class ReSTDoc.
Modify methods docmodule, docclass, docroutine, and docother in class ReSTDoc.
Copy function extract_to_rest.
Modify function extract_to_rest:
- Add your own title, prefatory material, etc. Note where method genTitle is called and where the "Generated by ..." content is added.
- Add your own end-of-doc content. Add this after the call to formatter.document().
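The three places to customize in a copied extract_to_rest (title and prefatory material, the body produced by formatter.document(), and end-of-document content) can be sketched as a small driver. This assumes Python 3 and that the formatter's document(obj, name) returns the body as a string; extract_custom, its parameters, and the preface text are hypothetical, and pydoc.TextDoc stands in for a customized ReSTDoc only so that the example runs with the standard library alone.

```python
import importlib
import pydoc


def extract_custom(module_name, formatter, title=None, preface=None):
    """Hypothetical driver following the steps above; not extract_to_rest.

    formatter is assumed to be any object whose document(obj, name)
    returns the body text, e.g. an instance of your copied ReSTDoc.
    """
    module = importlib.import_module(module_name)
    lines = []
    # 1. Your own title and prefatory material (where extract_to_rest
    #    calls genTitle and adds the "Generated by ..." content).
    heading = title or "Module %s" % module_name
    rule = "=" * len(heading)
    lines += [rule, heading, rule, ""]
    lines += [preface or "Generated by my_extract_doc.", ""]
    # 2. The body, produced by the formatter.
    lines.append(formatter.document(module, module_name))
    # 3. Your own end-of-document content, added after formatter.document().
    lines += ["", ".. end of generated documentation"]
    return "\n".join(lines)


if __name__ == "__main__":
    # pydoc.TextDoc stands in for a customized ReSTDoc here.
    print(extract_custom("colorsys", pydoc.TextDoc()))
```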

5 Related Work

5.1 PySource - Python Source Reader

This documentation extractor takes a very different approach. It is not modelled on pydoc in the Python standard library. It does not use the inspect module from the Python standard library. (I grepped for "inspect" in sandbox/davidg/pysource_reader.) The documentation says that it:

"... scans a parsed Python module, and returns an ordered tree containing the names, docstrings (including attribute and additional docstrings), and additional info ..."

The approach followed by PySource appears more complex than that of extract_doc, but also more powerful. I'm going to guess that a simple-minded programmer (like me) would need more start-up time before modifying and customizing PySource for user-specific needs than before doing the same with extract_doc.
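To make the contrast with the inspect-based approach concrete, here is a toy illustration of the parse-based idea using the standard ast module (Python 3.8 or later). It is not PySource code and makes no claim about how PySource is implemented; scan is a hypothetical name. It builds an ordered tree of names and docstrings, including the "attribute docstrings" mentioned in the quote, without importing or executing the module.

```python
import ast


def scan(body):
    """Return an ordered list of (kind, name, docstring, children) tuples.

    Works purely on the parse tree: the module is never imported.
    """
    tree = []
    for i, node in enumerate(body):
        following = body[i + 1] if i + 1 < len(body) else None
        if isinstance(node, ast.ClassDef):
            tree.append(("class", node.name, ast.get_docstring(node),
                         scan(node.body)))
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            tree.append(("function", node.name, ast.get_docstring(node), []))
        elif (isinstance(node, ast.Assign) and len(node.targets) == 1
              and isinstance(node.targets[0], ast.Name)):
            # A string literal directly after an assignment serves as an
            # "attribute docstring"; inspect cannot see these at run time.
            doc = None
            if (isinstance(following, ast.Expr)
                    and isinstance(following.value, ast.Constant)
                    and isinstance(following.value.value, str)):
                doc = following.value.value
            tree.append(("attribute", node.targets[0].id, doc, []))
    return tree


SOURCE = '''
"""Demo module."""
LIMIT = 10
"""Maximum retries."""

class Spam:
    """A class."""
    def eggs(self):
        """A method."""

def ham():
    pass
'''

module = ast.parse(SOURCE)
print(ast.get_docstring(module))  # Demo module.
for item in scan(module.body):
    print(item)
```

The trade-off matches the comparison above: working on the parse tree sees more (attribute docstrings, definition order) and avoids importing the code, but it means handling syntax-tree details that the inspect-based approach gets for free.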

I'd appreciate any comments and comparisons that others might have.

6 Credits

Thanks to the developers of Docutils, in particular David Goodger, the project lead.

Thanks to Ka-Ping Yee for pydoc.

pydoc - Documentation generator and online help system.

Generated by Docutils from reStructuredText source.

About this document ...

extract_doc.py -- Extract Python Source Code Documentation, July 22, 2003, Release 1.0a

This document was generated using the LaTeX2HTML translator.

LaTeX2HTML is Copyright © 1993, 1994, 1995, 1996, 1997, Nikos Drakos, Computer Based Learning Unit, University of Leeds, and Copyright © 1997, 1998, Ross Moore, Mathematics Department, Macquarie University, Sydney.

The application of LaTeX2HTML to the Python documentation has been heavily tailored by Fred L. Drake, Jr. Original navigation icons were contributed by Christopher Petrilli.

The reStructuredText to Python LaTeX translator (writer) was developed by Dave Kuhlman with extensive help from the Docutils project and is available from CVS at the Docutils project page at SourceForge.net in project "sandbox" under "dkuhlman".