23 SWIG and Modula-3

This chapter describes SWIG's support of Modula-3. You should be familiar with the basics of SWIG, especially typemaps.

23.1 Overview

The Modula-3 support is very basic and highly experimental! Many features are still not designed satisfactorily and I need more discussion about the odds and ends. Don't rely on any feature; incompatible changes are likely in the future! The Modula-3 generator has already proved useful for interfacing to the following libraries:

  1. PLPlot
  2. FFTW

I took some more time to explain why I think what I'm doing is right. So the introduction got a bit longer than it should ... ;-)

23.1.1 Why not scripting ?

SWIG started as a wrapper generator from the fast compiled languages C and C++ to high-level scripting languages like Python. Although scripting languages are designed to make programming life easier by hiding machine internals from the programmer, there are several aspects of today's scripting languages that are unfavourable in my opinion.

Besides C, C++, and Cluster (a Modula derivative for Amiga computers), I have evaluated several scripting-like languages in the past: different dialects of BASIC, Perl, ARexx (a variant of Rexx for Amiga computers), and shell scripts. I found them too inconsistent, too weak in distinguishing types, and too weak in encapsulating pieces of code. Eventually I started several projects in Python because of its fine syntax. But when the projects became larger I lost track of them. I became convinced that one cannot have maintainable code in a language that is not statically typed. In fact the main advantages of scripting languages, e.g. regular expression matching and complex built-in datatypes like lists and dictionaries, are not advantages of the language itself but can be provided by function libraries.

23.1.2 Why Modula-3 ?

Modula-3 is a compiled language in the tradition of Niklaus Wirth's Modula-2, which is in turn a successor of the popular Pascal. I have chosen Modula-3 because of its logical syntax, its strong modularization, and its type system, which is very detailed for machine types compared to other languages. Of course it supports all the modern features such as exceptions, objects, garbage collection, and threads. While C++ programmers must master three languages, namely the preprocessor, C and ++, Modula-3 is made in one go and the language definition is really compact.

On the one hand, Modula-3 code in normal modules is safe (but probably less efficient), with a great deal of static and dynamic checking. On the other hand, you can write efficient but less safe code in the style of C within UNSAFE modules.

Unfortunately Modula-3's safety and strength require more writing than scripting languages do. Today if I want to save characters I prefer Haskell (similar to OCaml); it's statically typed, too.

23.1.3 Why C / C++ ?

Although it is no problem to write Modula-3 programs that perform as fast as C, most libraries are not written in Modula-3 but in C. Fortunately the binary interface of most function libraries can be addressed from Modula-3. Even more fortunately, even non-C libraries may provide C header files. This is where SWIG becomes helpful.

23.1.4 Why SWIG ?

Having C headers and the ability to interface to C libraries still leaves you the work of writing Modula-3 interfaces to them. To make things comfortable you will also need wrappers that convert between the high-level features of Modula-3 (garbage collection, exceptions) and the low level of the C libraries.

SWIG converts C headers to Modula-3 interfaces for you. You could call the C functions without loss of efficiency, but it won't be a joy because you could not pass TEXTs or open arrays and you would have to process error return codes rather than exceptions. With some typemaps, however, SWIG will also generate wrappers that bring the whole Modula-3 comfort to you. If the library API is ill-designed, writing appropriate typemaps can still be time-consuming. E.g. C programmers are very creative in working around missing data types like (real) enumerations and sets. You should turn such workarounds back into the Modula-3 way, otherwise you lose static safety and consistency.

But you still have a problem: C library interfaces are often deficient. They lack certain information because C compilers do not care about it. You should integrate detailed type information by adding typedefs and consts, and you should persuade the C library programmer to add this information to the library's interface. Only this way can other language users benefit from your work, and only this way can you easily update your interfaces when a new library version is released. You will realise that writing good SWIG interfaces is very costly and that the effort only pays off for libraries that keep evolving.

Without SWIG you would probably never consider calling C++ libraries from Modula-3. But with SWIG this is worth considering. SWIG can write C wrappers for C++ functions and object methods that may throw exceptions. In fact it breaks C++ libraries down to C interfaces, which can in turn be called from Modula-3. To complete the picture you can hide the C interface behind Modula-3 classes and exceptions.

Although SWIG does the best it can, it can only serve as a one-way strategy. That means you can use C++ libraries from Modula-3 (even with callback functions), but it's certainly not possible to smoothly integrate Modula-3 code into a C / C++ project.

23.2 Conception

23.2.1 Interfaces to C libraries

Modula-3 has integrated support for calling C functions. The standard Modula-3 libraries also use it extensively to call OS functions. The Modula-3 part of SWIG and the corresponding SWIG library modula3.swg contain code that uses these features. Because of the built-in support there is no need for the SWIG kernel to generate wrappers written in C. All conversion and argument checking can be done in Modula-3 and the interfacing is quite efficient. All you have to do is to write pieces of Modula-3 code that SWIG puts together.

C library support integrated in Modula-3
Pragma <* EXTERNAL *> | Precedes a declaration of a PROCEDURE that is implemented in an external library instead of a Modula-3 module.
Pragma <* CALLBACK *> | Precedes a declaration of a PROCEDURE that should be called by external library code.
Module Ctypes | Contains Modula-3 types that match some basic C types.
Module M3toC | Contains routines that convert between Modula-3's TEXT type and C's char * type.

In each run of SWIG the Modula-3 part generates several files:

Module name scheme | Identifier for %insert | Description
ModuleRaw.i3 | m3rawintf | Declaration of types that are equivalent to those of the C library, and of EXTERNAL procedures as the interface to the C library functions.
ModuleRaw.m3 | m3rawimpl | Almost empty.
Module.i3 | m3wrapintf | Declaration of comfortable wrappers to the C library functions.
Module.m3 | m3wrapimpl | Implementation of the wrappers that convert between Modula-3 and C types, check for validity of values, hand over resource management to the garbage collector using WeakRefs, and raise exceptions.
m3makefile | m3makefile | Adds the modules above to the Modula-3 project and specifies the name of the Modula-3 wrapper library to be generated. Today I'm not sure if it is a good idea to create an m3makefile in each run, because SWIG must be started for each Modula-3 module it creates. Thus the m3makefile is overwritten each time. :-(

Here's a scheme of how the function calls to Modula-3 wrappers are redirected to C library functions:

Modula-3 wrapper (Module.i3, generated by Modula-3 part of SWIG)
        |
        v
Modula-3 interface to C (ModuleRaw.i3, generated by Modula-3 part of SWIG)  -->  C library

I still have no good concept for how to split C library interfaces into type-oriented interfaces. A Module in Modula-3 represents an Abstract DataType (or call it a static class, i.e. a class without virtual methods). E.g. if you have a principal type, say Database, it is good Modula-3 style to set up one Module with the name Database where the database type is declared with the name T and where all functions that operate on it are declared.

The normal operation of SWIG is to generate a fixed set of files per call. To generate multiple modules one has to write one SWIG interface per module (different SWIG interfaces can share common data). Identifiers belonging to a different module may be ignored (%ignore) and the principal type must be renamed (%typemap).

23.2.2 Interfaces to C++ libraries

Interfaces to C++ files are much more complicated and there are some more design decisions that have not yet been made. Modula-3 has no support for C++ functions, but C++ compilers should support generating C++ functions with a C interface.

Here's a scheme of how the function calls to Modula-3 wrappers are redirected to C++ library functions:

Modula-3 wrapper                                C++ library
Module.i3
generated by Modula-3 part of SWIG
        |                                            ^
        v                                            |
Modula-3 interface to C                 -->     C interface to C++
ModuleRaw.i3                                    module_wrap.cxx
generated by Modula-3 part of SWIG              generated by the SWIG core

Wrapping C++ libraries raises additional problems.

Be warned: I have not yet written a SWIG interface for any C++ library, so I'm not sure yet whether this is possible or sensible.

23.3 Preliminaries

23.3.1 Compilers

There are different Modula-3 compilers around: cm3, pm3, ezm3, Klagenfurt Modula-3, Cambridge Modula-3. SWIG itself does not contain compiler-specific code, but the library file modula3.swg may do so. For testing examples I use Critical Mass cm3.

23.3.2 Additional Commandline Options

There are some experimental command line options that prevent SWIG from generating interface files. Instead files are emitted that may assist you when writing SWIG interface files.

Modula-3 specific options | Description
-generateconst <file> | Disable generation of interfaces and wrappers. Instead write code for computing numeric values of constants to the specified file.
    C code may contain several constant definitions written as preprocessor macros. Other language modules of SWIG use compute-once-use-readonly variables or functions to wrap such definitions. All of them can invoke C code dynamically for computing the macro values. But if one wants to turn them into Modula-3 integer constants, enumerations or set types, the values of these expressions have to be known statically. Although definitions like (1 << FLAG_MAXIMIZEWINDOW) must be considered good C style, they are hard to convert to Modula-3 since the value computation can use every feature of C.
    Thus I implemented this switch to extract all constant definitions and write a C program that outputs their values. It works for numeric constants only and treats all of them as double. Future versions may generate a C++ program that can detect the type of the macros by overloaded output functions. Then strings can also be processed.
-generaterename <file> | Disable generation of interfaces and wrappers. Instead generate suggestions for %rename.
    C libraries use a naming style that is neither homogeneous nor similar to that of Modula-3. C function names often contain a prefix denoting the library and some name components separated by underscores or capitalization changes. To get library interfaces that are really Modula-3 like you should rename the functions with the %rename directive. This switch outputs a list of such directives with a name suggestion generated by a simple heuristic.
-generatetypemap <file> | Disable generation of interfaces and wrappers. Instead generate templates for some basic typemaps.
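
As a rough illustration of the idea behind -generateconst, the following C sketch prints the values of a few macro constants, converting each one to double as described above. It is not the program SWIG actually emits: the macro values, the helper macro EMIT, the function emit_constants and the "name = value" output format are all assumptions made for this example.

```c
#include <stdio.h>

/* Hypothetical macro definitions standing in for a library header. */
#define FLAG_MAXIMIZEWINDOW 3
#define MASK_MAXIMIZEWINDOW (1 << FLAG_MAXIMIZEWINDOW)
#define DEFAULT_SCALE (3.0 / 2)

/* Print one constant as "NAME = value". The C compiler evaluates the
   macro expression; the result is converted to double, so only numeric
   constants can be handled this way. Yields 1 if something was written. */
#define EMIT(out, name) \
    (fprintf((out), "%s = %g\n", #name, (double)(name)) > 0)

/* Write all known constants to out and return how many were written.
   (Hypothetical helper; a generated program would list every macro
   found in the header.) */
int emit_constants(FILE *out)
{
    int count = 0;
    count += EMIT(out, FLAG_MAXIMIZEWINDOW);
    count += EMIT(out, MASK_MAXIMIZEWINDOW);
    count += EMIT(out, DEFAULT_SCALE);
    return count;
}
```

Calling emit_constants(stdout) from a main function prints one line per macro. The point of the detour through a compiled C program is that the values become plain numbers which can then be written into a Modula-3 interface as static constants.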

23.4 Modula-3 typemaps

23.4.1 Inputs and outputs

Each C procedure has a bunch of inputs and outputs. Inputs are passed as function arguments; outputs are returned through reference arguments that the function updates and through the function value.
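
To make the three roles concrete, here is a small C function written for this illustration only (text_checksum is hypothetical and not part of any library mentioned in this chapter). It has one input argument, one output passed by reference, and a return value used as an error code.

```c
#include <stddef.h>

/* Hypothetical library function, for illustration only.
   Input:   str       - text to process, only read by the function
   Output:  checksum  - reference argument, updated by the function
   Return:  0 on success, -1 on error (a typical C error code) */
int text_checksum(const char *str, unsigned *checksum)
{
    unsigned sum = 0;

    if (str == NULL || checksum == NULL)
        return -1;                      /* report failure to the caller */

    for (; *str != '\0'; ++str)
        sum += (unsigned char)*str;     /* add up the character codes */

    *checksum = sum;                    /* deliver the output value */
    return 0;
}
```

A comfortable Modula-3 wrapper for such a function would presumably take a TEXT, return the checksum as its result, and turn the -1 return code into an exception; which typemaps control each of these steps is the subject of the rest of this section.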

Each C type can have several typemaps, each of which applies only when the type is used for an input argument, for an output argument, or for a return value. A further typemap may specify the direction that is used for certain parameters. I have chosen this separation in order to be able to write general typemaps for the typemap library modula3.swg. In the library code the final usage of the type is not known. Using separate typemaps for each possible use allows appropriate definitions for each case. If these pre-definitions are fine then the direction of the function parameter is the only hint the user must give.

The typemaps specific to Modula-3 follow a common naming scheme: a typemap name starts with "m3", followed by "raw" or "wrap" depending on whether it controls the generation of ModuleRaw.i3 or Module.i3, respectively. Next comes "in" for typemaps applied to input arguments, "out" for output arguments, "arg" for all kinds of arguments, or "ret" for returned values.

The main task of SWIG is to build wrapper functions, i.e. functions that convert values between C and Modula-3 and call the corresponding C function. Modula-3 wrapper functions generated by SWIG consist of the following parts:

Typemap | Example | Description
m3wrapargvar | $1: INTEGER := $1_name; | Declaration of some variables needed for temporary results.
m3wrapargconst | $1 = "$1_name"; | Declaration of some constant, maybe for debug purposes.
m3wrapargraw | ORD($1_name) | The expression that should be passed as argument to the raw Modula-3 interface function.
m3wrapargdir | out | Referential arguments can be used for input, output, update. ???
m3wrapinmode | READONLY | One of Modula-3 parameter modes VALUE (or empty), VAR, READONLY
m3wrapinname | | New name of the input argument.
m3wrapintype | | Modula-3 type of the input argument.
m3wrapindefault | | Default value of the input argument
m3wrapinconv | $1 := M3toC.SharedTtoS($1_name); | Statement for converting the Modula-3 input value to C compliant value.
m3wrapincheck | IF Text.Length($1_name) > 10 THEN RAISE E("str too long"); END; | Check the integrity of the input value.
m3wrapoutname | | Name of the RECORD field to be used for returning multiple values. This applies to referential output arguments that shall be turned into return values.
m3wrapouttype | | Type of the value that is returned instead of a referential output argument.
m3wrapoutconv | |
m3wrapoutcheck | |
m3wrapretraw | |
m3wrapretname | |
m3wraprettype | |
m3wrapretvar | |
m3wrapretconv | |
m3wrapretcheck | |
m3wrapfreearg | M3toC.FreeSharedS(str,arg1); | Free resources that were temporarily used in the wrapper. Since this step should never be skipped, SWIG will put it in the FINALLY branch of a TRY .. FINALLY structure.

23.4.2 Subranges, Enumerations, Sets

Subranges, enumerations, and sets are machine-oriented types that make Modula very strong and expressive compared with the type systems of many other languages.

Using them extensively makes Modula code very safe and readable.

C supports enumerations, too, but they are not as safe as the ones of Modula. Thus they are abused for many things: for named choices, for integer constant definitions, for sets. To make it complete, every way of defining a value in C (#define, const int, enum) is used somewhere for defining something that must be handled completely differently in Modula-3 (INTEGER, enumeration, SET).
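
The following C fragment sketches the three uses side by side. All names in it are hypothetical and chosen only for this illustration; the comments state which Modula-3 kind of type each declaration conceptually corresponds to.

```c
/* Hypothetical declarations, for illustration only. */

/* 1. A named choice: conceptually a Modula-3 enumeration. */
enum color { COLOR_RED, COLOR_GREEN, COLOR_BLUE };

/* 2. Bit flags that are combined with |: conceptually a Modula-3 SET. */
enum window_flag {
    FLAG_MAXIMIZE  = 1 << 0,
    FLAG_RESIZABLE = 1 << 1,
    FLAG_BORDER    = 1 << 2
};

/* 3. A plain number that merely borrows the enum syntax:
      conceptually a Modula-3 INTEGER constant. */
enum { MAX_NAME_LENGTH = 64 };

/* A combination of flags is an ordinary int in C, so the compiler
   cannot tell a flag set from a color or from any other integer. */
int window_has_flag(int flags, enum window_flag flag)
{
    return (flags & flag) != 0;
}
```

All three declarations look alike to a C compiler, which is why the intended Modula-3 type cannot be derived from the header alone and has to be supplied in the SWIG interface.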

I played around with several %features and %pragmas that split the task up into converting the C bit patterns (integer or bit set) into Modula-3 bit patterns (integer or bit set) and changing the type as requested. See the corresponding example. This is quite messy and not satisfying. So the best you can currently do is to rewrite constant definitions manually, though this is tedious work that I'd like to automate.

23.4.3 Objects

Declarations of C++ classes are mapped to OBJECT types, and SWIG tries to retain the access hierarchy "public - protected - private" using partial revelation. However, the implementation is not really useful yet.

23.4.4 Imports

Pieces of Modula-3 code provided by typemaps may contain identifiers from foreign modules. If the typemap m3wrapinconv for blah * contains code using the function M3toC.SharedTtoS you may declare %typemap("m3wrapinconv:import") blah * %{M3toC%}. Then the module M3toC is imported if the m3wrapinconv typemap for blah * is used at least once. Use %typemap("m3wrapinconv:import") blah * %{MyConversions AS M3toC%} if you need module renaming. Unqualified import is not supported.

It is cumbersome to add this typemap to each piece of Modula-3 code; the mechanism is mainly useful when writing general typemaps for the typemap library modula3.swg. For a monolithic module you might be better off adding the imports directly:

%insert(m3rawintf) %{
IMPORT M3toC;
%}

23.4.5 Exceptions

Modula-3 provides a further kind of function output: exceptions.

Any piece of Modula-3 code that SWIG inserts due to a typemap can raise an exception. This way you can also convert an error code from a C function into a Modula-3 exception.

The RAISES clause is controlled by typemaps with the throws extension. If the typemap m3wrapinconv for blah * contains code that may raise the exception OSError.E you should declare %typemap("m3wrapinconv:throws") blah * %{OSError.E%}.

23.4.6 Example

The generation of wrappers in Modula-3 needs very fine control to take advantage of the language features. Here is an example of a generated wrapper where almost everything is generated by a typemap:

         (* %relabel  m3wrapinmode m3wrapinname m3wrapintype  m3wrapindefault *)
  PROCEDURE Name     (READONLY       str       :    TEXT    :=      ""       )
              (* m3wrapoutcheck:throws *)
     : NameResult RAISES {E} =
    CONST
      arg1name = "str";                  (* m3wrapargconst *)
    VAR
      arg0   : C.char_star;              (* m3wrapretvar *)
      arg1   : C.char_star;              (* m3wrapargvar *)
      arg2   : C.int;
      result : RECORD
           (*m3wrapretname  m3wraprettype*)
                 unixPath : TEXT;
           (*m3wrapoutname  m3wrapouttype*)
                 checksum : CARDINAL;
               END;
    BEGIN
      TRY
        arg1 := M3toC.SharedTtoS(str);   (* m3wrapinconv *)
        IF Text.Length(str) > 10 THEN    (* m3wrapincheck *)
          RAISE E("str too long");
        END;
 (* m3wrapretraw           m3wrapargraw *)
        arg0 := MessyToUnix  (arg1,   arg2);
        result.unixPath := M3toC.CopyStoT(arg0);  (* m3wrapretconv *)
        result.checksum := arg2;         (* m3wrapoutconv *)
        IF result.checksum = 0 THEN      (* m3wrapoutcheck *)
          RAISE E("invalid checksum");
        END;
      FINALLY
        M3toC.FreeSharedS(str,arg1);     (* m3wrapfreearg *)
      END;
    END Name;

23.5 More hints to the generator

23.5.1 Features

Feature | Example | Description
multiretval | %m3multiretval get_box; or %feature("modula3:multiretval") get_box; | Let the denoted function return a RECORD rather than a plain value. This RECORD contains all arguments with "out" direction including the return value of the C function (if there is one). If more than one argument is "out" then the function must have the multiretval feature activated. SWIG does not activate it automatically; the user must request it explicitly, to prevent mistakes.
constnumeric | %constnumeric(12) twelve; or %feature("constnumeric","12") twelve; | This feature can be used to tell SWIG's Modula-3 back-end the value of an identifier. This is necessary in the cases where it was defined by a non-trivial C expression. This feature is used by the -generateconst option. In future it may be generalized to other kinds of values such as strings.

23.5.2 Pragmas

Pragma | Example | Description
unsafe | %pragma(modula3) unsafe="true"; | Mark the raw interface modules as UNSAFE. This will be necessary in many cases.
library | %pragma(modula3) library="m3fftw"; | Specifies the library name for the wrapper library to be created. It should be distinct from the name of the library to be wrapped.

23.6 Remarks