A Compilation object manages the classes, methods, and
temporary state generated as a result of compiling a
single top-level ModuleExp.
class Compilation
{ ...;
ClassType[] classes;
boolean immediate;
public ClassType addClass
(LambdaExp lexp, String name)
{ ... }
public ClassType(ModuleExp exp, ...)
{ ...; addClass (exp, ...); }
}
Each Compilation may create one or more ClassType
objects, each of which generates
the bytecodes for one class. Each ClassType is generated
from a LambdaExp, including the top ModuleExp.
The boolean immediate is true if we are compiling for immediate loading,
and is false if the target is one or more .class files.
The addClass method does all the work to
compile a given LambdaExp. It creates a ClassType,
adds it to Compilation's classes array,
and generates Method objects for the constructor and the
main apply method.
Once the Xapply
XMethod has been created, addClass
emits some bytecodes to set up the
incoming parameters, and then invokes the virtual compile
method on the body of the LambdaExp, which generates
the code that does the actual work of the procedure.
The Compilation constructor gets a ModuleExp,
which it passes to addClass. The compile
method of LambdaExp (which gets called for all lambdas
except the dummy top-level) also calls addClass to generate
the class corresponding to the lambda, and then it emits
instructions to create a new instance of the generated Procedure class,
and pushes it on the Java stack.
Most operations in the Java VM leave their result on the VM stack,
where they are available for succeeding operations.
The obvious and general way to compile an expression is therefore to generate
bytecode instructions that leave the result (in the form
of a Object reference) on the stack.
This handles most cases quite well, but we can do better.
We specify a Target parameter when invoking
the compile method; the Target
specifies where to leave the result.
public abstract class Target
{ ...;
public abstract void compileFromStack
(Compilation comp, Type stackType);
public static final Target Ignore
= new IgnoreTarget();
}
An Expression's compile method
does not have to handle all the kinds of Targets,
as long as it can generate code to leave the result on the VM stack,
and then invoke compileFromStack,
which is responsible for moving the result to the actual target.
The simplest Target is an
IgnoreTarget. It is used when the
result of an expression will be ignored, but we still need to evaluate
it for possible side-effects. The implementation of
IgnoreTarget.compileFromStack just emits
an instruction to pop a value from the VM stack.
Expressions that have no side-effects can check if the target
is an IgnoreTarget, and then immediately
return. This saves a useless push-pop pair.
The usual Target is an
StackTarget. This specifies that an expression
should leave the result on the VM stack.
Normally, the type of the result is Object,
but a StackTarget can specify some other
expected type, when that can be determined.
The implementation of StackTarget.compileFromStack
is also trivial: If the type of the result on the stack is a sub-type
of the expected target type, nothing needs to be done;
otherwise, it generates code to do the type conversion.
Things get more interesting when we come to
ConditionalTarget.
public class ConditionalTarget
extends Target
{ ...;
public Label ifTrue, ifFalse;
}
A ConditionalTarget is used when compiling
the test expression in a conditional. The expression is
evaluated as a boolean value; if the result is true,
control transfers to ifTrue; otherwise
control transfers to ifFalse.
Using ConditionalTarget makes it straight-forward
to generate optimal code for nested conditionals, including
and and or macros, and
(when inlining) functions such as not
and eq?.
The ClassType and Method classes are in a separate
gnu.bytecode package, which is an intermediate-level
interface to code generation and Java .class files.
It is essentially independent of Scheme or the rest of Kawa.
class ClassType extends Type
{ ...;
CpoolEntry[] constant_pool;
Method methods; // List of methods.
Field fields; // List of fields.
public Field addField
(String name, Type type, int flags)
{ ...Create new field... }
public method addMethod(String name,...)
{ ...Create new method... }
public void writeToStream
(OutputStream stream) { ... }
public void writeToFile(String filename)
{ ... }
public byte[] writeToArray()
{ ... }
}
The ClassType class is the main class of the bytecode package.
It manages a list Fields, a list of Methods,
and the constant pool.
There are utility methods for adding new fields, methods, and constant
pool entries.
When the ClassType has been fully built, the
writeToFile method
can be used to write out the contents into a file.
The result has the format of a .class file [JavaVMSpec].
Alternatively, the class can be written
to an internal byte array (that has the same layout as a .class
file) using the writeToArray method.
The resulting byte array may be used by a ClassLoader
to define a new class for immediate execution.
Both of the these methods are implemented on top of the more
general writeToStream.
Each method is represented by a Method object.
class Method implements AttrContainer
{ ...;
Type[] arg_types;
Type return_type;
Attribute attributes;
}
An AttrContainer is an object that
contains zero or more Attributes.
The Java .class file format is quite extensible.
Much of the information is stored in named attributes.
There are standard attributes, but an application can also define
new ones (that are supposed to be ignored by applications that
do not understand them). Each class file may have a set of top-level
attributes. In addition, each field and method may have attributes.
Some standard attributes may have nested sub-attributes.
public abstract class Attribute
{ ...;
AttrContainer container;
String name;
}
An Attribute's container
specifies who owns the attribute.
The attribute also has a name, plus methods
to gets its size, write it out, etc.
The most interesting (and large) standard Attribute
occurs in a method and has the name "Code". It
contains the actual bytecode instructions of a non-native non-abstract method,
and we represent it using CodeAttr.
class CodeAttr extends Attribute
{ ...;
Variable addLocal(Type t, String name)
{ ... }
public void emitLoad(Variable var)
{ ... }
public void emitPushInt(int i)
{ ... }
public void putLineNumber(int lineno)
{ ... }
}
As an example of the level of functionality,
emitPushInt compiles code to push an integer
i on stack.
It selects the right instruction, and if i is too
big for one of the instructions that take an inline value,
it will create a constant pool entry for i,
and push that.
The method addLocal creates a new local variable
(and makes sure debugging information is emitted for it),
while emitLoad pushes the value of
the variable on the stack.
Kawa calls putLineNumber to indicate that the current location
corresponds to a given line number. These are emitted in the
.class file, and most Java interpreters will use them
when printing a stack trace.
We use gnu.bytecode mainly for generating
.class files, but it also has classes to
read .class files, and also classes to print
a ClassType in readable format.
The combination makes for a decent Java dis-assembler.
There are other toolkits for creating or analyzing
.class files, but gnu.bytecode
was written to provide a lot of support for code generation
while having little overhead. For example, some assemblers
represent each instruction using an Instruction
instance, whereas CodeAttr just stores all
the instruction in a byte array. Using a linked list of
Instructions may be more “object-oriented”,
and it does make it easier to do peep-hole optimizations,
but the time and space overhead compared to using an array of bytes is huge.
(If you do need to do peephole optimizations, then it makes
sense to use a doubly-linked list of Instructions,
but to use that in conjunction with CodeAttr.
You will in any case want a byte-array representation for input and output.)
A Scheme quoted form or self-evaluating form expands to a QuoteExp.
Compiling a QuoteExp would seem a trivial exercise, but it is not.
There is no way to embed (say) a list literal in Java code.
Instead we create a static field in the top-level class
for a each (different) QuoteExp in the body we are compiling.
The code compiled for a QuoteExp then just needs
to load the value from the corresponding static field.
The tricky part is making sure that the static field gets
initialized when the top-level class is loaded to the
value of the literal. This is easy when compiling for immediate
execution: after the compiled class has been loaded and initialized, we use
reflection to set the field to the literal value. When compiling
to a class file, things are harder.
The basic idea is that for:
(define (foo) '(3 . 4))
we compile the equivalent of:
class foo extends Procedure0
{
Object static lit1
= Pair.make(IntNum.make(3),
IntNum.make(4));
public Object apply0()
{ return lit1; }
}
When the compiled class foo is loaded, we do:
Class fooCl = Class.forName("foo");
Procedure fooPr
= (Procedure) fooCl.newInstance ();
// Using foo:
Object result = fooPr.apply0 ();
How does the Kawa compiler generate the appropriate Pair.make
expression as shown above? In earlier versions of Kawa, a class whose
instances could be literals implemented
the Compilable interface. Then the compiler just called
the methods in Compilable, and these would generate the
code needed to re-create the literal.
This had the advantage that the compiler was not limited to a few
classes of literals it knew about. One problem is that it caused
cross-package dependencies, since any class that could be a literal
has to implement gnu.expr.Compilable.
Another problem was that writing the Compilable
methods required knowing how gnu.bytecode works.
The key insight is that saving compile-time literals and then restoring
them at run-time is a kind of “object persistence”, similar
to that offered by Java serialization. One option considered was to
use standard serialization, but that required some way to store the
serialized data in a class file.
The article Long-Term
Persistence for JavaBeans suggested an alternative.
I realized that using Externalizable classes provided
the more abstract serialization we needed.
Let us start with a summary: An object that can be a literal must implement
the java.io.Externalizable interface. When the
compiler need to generate a reference to a literal value, it calls the
object's writeExternal method. The
writeExternal method expects an argument that
implements the java.io.ObjectOutput interface.
The actual argument is a LitTable object owned
by the compiler. A writeExternal method
can call the standard ObjectOutput methods, such
as writeObject, writeInt,
and so on. The LitTable remembers the
values that were passed to it in these writeObject,
writeInt, etc calls, and creates an
“argument list” from those values. When the
writeExternal returns, the LitTable
will take the argument list, and look for a matching constructor in
the literal's class, using reflection. It will also look for methods named
make or set.
If found, the compiler will generate a call to the matching method.
Simple values in the argument list will be pushed on the JVM stack directly;
object references will use values generated using previous
calls to writeExternal.
Here are more details on how the compiler selects which method
to invoke to re-create an object.
If the compiler detects a recursive writeObject call
with the object as an argument while writeExternal
is called on the object, then the object cyclically references itself.
In that case, the compiler must take care to construct the object so
that the cycle gets reproduced at run-time.
This is done by first generating a call to the default constructor,
and saving the result in a static field. The argument list is evaluated,
with recursive references using the saved static field. Then we
call a method to properly initialize the object using the evaluated
argument list. The compiler looks for a method named set
with a parameter list matching the argument list. If there is no such
method, the compiler gives up. (In the future, we could use JavaBeans
introspection, or look for a Java2 ObjectStreamField to
determine how to set the properties.)
If an object does not have a cycle, the compiler first looks for a
matching static method named make. If it
does not find one, it looks a matching constructor. In the latter
case, the compiler must also check to see if there is
a zero-argument readResolve method, which Java 2
serialization uses to replace an object by a canonical object.
(Kawa does not require Java 2, but LitTable
does respect this Java 2 extension to serialization.)
As a last resort, the compiler looks for a default constructor
followed by set, as in the cycle case.
The writeExternal method of some classes
may generate a variable number of values.
Therefore, if a constructor or method takes an array parameter, then
LitTable will consider it as matching the
argument list if the latter contains an int
(written using writeInt) followed by that
number of arguments of the component type of the array.
(One could argue that this feature is an unneeded kludge, since
there is the alternative that writeExternal
could just call writeObject
with a single array argument, instead of calling
writeObject a variable number of times.
However, that would expose the internal array into the externalization
protocol, which is wrong.)
Advantages of this implementation include:
No explicit dependency on any class in the the gnu.expr
package. All a class needs to be used in literals is to implement
Externalizable, and provide a matching method
for creating object instances.
No need to write special code to handle literals.
Literals are re-created using efficient code at class initialization time.
Automatically handles cycles and duplicate references. (Standard Scheme does not support self-referential constants, but for example Common Lisp does. See section 25.1.4 “Similarity of Constants” in [CommonLisp2].)
Only requires standard JDK 1.1 features.