The optimizations presented on this page are essentially term re-writing. The follow-up course develops optimizations in a semantic style, based on calculation rather than re-writing.
This article demonstrates not only that context-sensitive transformations of tagless-final DSLs are possible, but also that such optimizations are modular, convenient and composable. They are also typed and type-preserving, which is clear not only to us but also to the compiler. As we extend the DSL with new expression forms (such as first-class functions, conditionals, exceptions, etc.) we may still be able to use the previously written optimizations as they are. Many optimizations, such as arithmetic expression simplifications, deal only with a subset of the language. The tagless-final approach lets us express them succinctly, abstracting from the irrelevant parts of the language. Not only are the optimizations clearer; they are also robust with respect to language extensions, applicable to any extended language.
We will distinctly see the advantages of the tagless-final DSL embedding compared to the ``deep embedding'', that is, representing the DSL as a data type (AST) in the host language. The extensibility is not the only advantage. Although optimizations are AST transformations, each optimization prefers its own AST form. (The abstract syntax tree may have several subtly different realizations, as we shall see later.) The tagless-final framework lets each optimization work with its own preferred AST view.
This article is written as a tutorial of the tagless-final optimization framework and uses a very simple example. The framework has been successfully used in a complex, practical application, of compiling and optimizing language-integrated queries. Although this tutorial uses OCaml, it may just as well be presented in Haskell.
Tagless-final, systematic optimizations in Haskell and OCaml: Tutorial
The extended version of the tutorial on optimizations in
the tagless-final style -- with a different running example: the
DSL of logic circuits. The tutorial has been given in Haskell and OCaml.
Finally, Safely-Extensible and Efficient Language-Integrated Query
The present framework has been used for real-life projects,
such as the compositional and extensible language-integrated query.
The user-composed query expression is optimized into the form
from which the efficient SQL (without nested subqueries) is generated.
Tagless-final optimizations, algebraically and semantically
The follow-up tutorial, stressing semantics of optimizations, their
correctness, and calculation rather than re-writing
type expr = Int of int | Add of expr * expr

This data type represents the ``deep'' embedding of the DSL in OCaml. This is not the road we take. Rather, we introduce the signature for the two, so far, basic forms of our language: integer literals and the addition operation:
module type SYM = sig
  type 'a repr
  val int: int -> int repr
  val add: int repr -> int repr -> int repr
end
Here, the (for now, abstract) type 'a repr stands for some
representation of language expressions, indexed by the expression
type. One may think of SYM as specifying the DSL syntax as a
context-free grammar: just read int repr as ``the start symbol'' and
val int and val add as two productions labeled int and add. We
do not need to know what 'a repr really is to write sample DSL
expressions, such as:
module Ex2(I:SYM) = struct
  open I
  let res = add (add (int 1) (int 2))
                (add (int 3) (int 4))
end
(Here and below, the name I is used for a generic interpreter,
an instance of SYM.)
To evaluate that expression we write an interpreter for it:
module Ru = struct
  type 'a repr = 'a
  let int x = x
  let add = (+)
end
The DSL operations are mapped directly to the corresponding OCaml
operations: Ru is a meta-circular interpreter for the tiny subset of OCaml.
Correspondingly, a DSL expression is represented by the OCaml value
it evaluates to. SYM hence can also be viewed as the signature
of the DSL interpreters -- or, in loftier terms, as the specification
of the denotational semantics for the language: 'a repr
defines the domain, and int and add give the meaning to DSL integer
literals and the addition in this domain. The denotation
for complex expressions is determined compositionally, from the denotations
of their sub-expressions. To evaluate the sample Ex2 we
interpret it with the Ru interpreter: let module M = Ex2(Ru) in M.res,
which gives the OCaml value 10, the denotation of Ex2
in Ru as the OCaml integer.
The interpreter Ru is not the only way to give the meaning to DSL
expressions. We may also interpret them as strings, so as to display them.
The denotation for an expression is hence its printed representation:
module Sh = struct
  type 'a repr = string
  let int = string_of_int
  let add x y = "(" ^ x ^ " + " ^ y ^ ")"
end
Interpreting the same Ex2 using Sh, as
let module M = Ex2(Sh) in M.res, now gives the string
"((1 + 2) + (3 + 4))".
Besides evaluating DSL expressions we may also want to transform them. A simple example is negating the expression. The negation transformation can also be written as an interpreter (after all, interpreting is the only thing we can do with tagless-final expressions).
module Neg(From:SYM) = struct
  type 'a repr = 'a From.repr
  let int x = From.int (-x)
  let add e1 e2 = From.add e1 e2
end
Neg interprets the DSL in terms of another interpreter, From
(which we will be calling just F).
On the surface of it, Neg is an interpreter transformer. Dually,
it can also be regarded as an expression transformer:
module Ex2Neg(F:SYM) = Ex2(Neg(F))
Ex2Neg has the same type as the original Ex2: given an interpreter
SYM it computes the denotation of its DSL expression in that
SYM 's domain. That is, Ex2Neg is a tagless-final representation
of a DSL expression -- namely, the negated one. It can be interpreted
with the existing Ru and Sh interpreters, or even the Neg interpreter
(i.e., it can be negated again):
let module M = Ex2Neg(Ru) in M.res
-10
let module M = Ex2Neg(Sh) in M.res
"((-1 + -2) + (-3 + -4))"
let module Ex2NegNeg(F:SYM) = Ex2Neg(Neg(F)) in
let module M = Ex2NegNeg(Ru) in M.res
10
(The evaluation result is shown, indented, underneath each expression.)
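Incidentally, OCaml itself can confirm the claim that Neg maps SYM interpreters to SYM interpreters. The following one-line check is ours, not part of the original development:

(* A check of ours: Neg of a SYM interpreter is again a SYM interpreter *)
module NegRuCheck : SYM = Neg(Ru)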
In the following we write more such DSL transformers -- interpreters that not only evaluate or print an expression but also re-write it, in generally context-sensitive, apparently non-compositional ways.
The sample optimization is simple: apply the algebraic laws of
addition x+0 = 0+x = x to eliminate the additions to zero. We assume for
simplicity that all integer literals are non-negative. It is an
exercise for the reader to remove that assumption.
The optimization is expressed as an interpretation in the domain of partially-known values, which track and propagate the available static information. In our case, this information is whether the value is statically known to be zero.
module SuppressZeroPE(F:SYM) = struct
  type 'a repr = {dynamic: 'a F.repr; known_zero : bool}
  let int x = {dynamic = F.int x; known_zero = (x = 0)}
  let add e1 e2 = match (e1,e2) with   (* add is not recursive! *)
    | ({known_zero=true},x) -> x
    | (x,{known_zero=true}) -> x
    | ({dynamic=x},{dynamic=y}) -> {dynamic=F.add x y; known_zero=false}
end
As typical of partially-known values, 'a repr has two components.
The dynamic part is the (transformed) expression to be interpreted
by the source F interpreter. Since this interpreter is abstract at
this point, nothing is known about 'a F.repr. The
second component of 'a repr tells if this partially-known value is
in fact fully known, as zero. We find this out when
interpreting DSL integer literals, which are known literally.
The interpretation of the addition propagates this knowledge
and applies the algebraic laws in the straightforward way.
Let us take a moment to convince ourselves of the correctness.
The inductive structure of interpreters makes structural
induction proofs straightforward. First we need a lemma that if e is
a value of the type 'a repr and e.known_zero is true, e
represents zero. The proof is trivial. The soundness can be stated as
the theorem that if E is a tagless-final encoding of an expression
and I is an interpreter in which the monoid law of addition
holds, then E(I).res and E(SuppressZeroPE(I)).res.dynamic
have the same value. We can also prove a stronger property: if e is
a value of the type 'a repr, it represents zero if and only if
e.known_zero is true. The theorem clearly holds for the base case
int x. In the inductive case add e1 e2 we apply the inductive
hypothesis to e1 and e2 and the assumption that literals are
non-negative. The stronger property entails the completeness of the
optimization: if e is a value of the type 'a repr then e.dynamic
represents an expression with no additions to zero. The reader is
encouraged to write the detailed proof, perhaps with the help of
their favorite proof assistant.
The following example will demonstrate the optimization:
module Ex3(I:SYM) = struct
  open I
  let res = add (add (int 3) (add (int 5) (int 0)))
                (add (int 1) (int 0))
end
Optimizing it and then getting the Sh interpreter to print out the result
produces
let module Ex3NoZerosPE(F:SYM) = Ex3(SuppressZeroPE(F)) in
let module M = Ex3NoZerosPE(Sh) in M.res
- : int SuppressZeroPE(Sh).repr =
{SuppressZeroPE(Sh).dynamic = "((3 + 5) + 1)"; known_zero = false}
The optimization has worked: zero summands have disappeared. The OCaml
output also shows the problem: what is printed is the partially known
value. These values are used during the optimization; at the end we
would like just the final result. There should be a way
to observe the optimization result and spare the user the
internals of the optimizer. The second problem is extensibility: if
we extend our DSL with, say, first-class functions, we likewise have
to extend SuppressZeroPE to interpret the new expression forms
(function abstraction and applications) in the 'a SuppressZeroPE(Sh).repr domain. These new interpretations are simple
homomorphisms. They are irrelevant for the zero-addition elimination and
are annoying boilerplate. More seriously, they break modularity: every time we
extend the language we have to change all optimizations, even those
that are not affected by the extension. Our optimization framework
overcomes these problems.
In the previous section we have seen that tagless-final DSL transformations are written as interpreter transformations. The transformed interpreter uses the suitable representation for DSL expressions, from which the optimization result is extracted, or `observed', at the end. It makes sense therefore to add the observation function to the interpreter signature:
module type SYMOBS = sig
  include SYM
  type 'a obs
  val observe: 'a repr -> 'a obs
end
In our Ru and Sh interpreters, the representation type is already suitable
for observation and so observe is just the identity. The following functor
will add such a trivial observation:
module AddObs(I:SYM) = struct
  include I
  type 'a obs = 'a repr
  let observe x = x
end
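For example, here is a usage sketch of our own, running Ex2 through the observation interface:

(* A usage sketch (ours): Ru extended with the trivial observation *)
module RuObs = AddObs(Ru)
let _ = let module M = Ex2(RuObs) in RuObs.observe M.res   (* 10 *)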
Let us recall again the SuppressZeroPE optimizer from the warm-up
section: SuppressZeroPE(F) interprets DSL expressions as values
of the data type
type 'a repr = {dynamic: 'a F.repr; known_zero : bool}
that includes 'a F.repr, the interpretation of the expression in
some F interpreter (which evaluates the optimized expression). In
addition, 'a repr contains extra data used during the
optimization. The zero-elimination optimization evaluates the DSL
expression and hence builds 'a repr bottom-up. The dynamic part is
the optimized F interpretation and is extracted at the end. The
following captures the pattern of two expression representations, one
with the extra data for the optimization:
module type Trans = sig
  type 'a from
  type 'a term
  val fwd : 'a from -> 'a term   (* reflection *)
  val bwd : 'a term -> 'a from   (* reification *)
end
The type 'a from is the representation of a DSL (sub)expression in
the From interpreter; one may think of it as the optimization result
so far. Since the From interpreter is abstract, nothing is known
about 'a from values and they cannot be inspected. The type 'a term represents the same DSL (sub)expression; it is also the
optimization result so far. Unlike 'a from, the values of 'a term
are known, at least to some extent, and can be inspected. These values
contain the working data for the optimization, its state. The
optimization process builds the 'a term values bottom up, inspecting
those for the already optimized sub-expressions. The next two sections
show the examples. The two representations are related, by the
operations fwd and bwd. The latter is used in particular at the
very end to extract the optimization result from the working data 'a term. The operation fwd builds the working data. The two operations
do not generally define a bijection: usually fwd is not surjective and
bwd is not injective. That is, although the composition bwd . fwd
should be the identity, fwd . bwd is typically not. One may think of
bwd as an `abstraction' function: it throws away the data used
internally during the optimization, the optimization state. Then fwd
is a concretization function, which initializes the optimization
state, to some default, empty value. The composition fwd . bwd is
not the identity because the thrown away optimization state is
typically richer than the default state.
The two types and their relationship remind one of
normalization-by-evaluation (NBE). In NBE, 'a from would be called an
opaque object representation; 'a term would then be a meta-language
representation, the domain of the (non-standard) evaluation. The
function fwd is then reflection and bwd is reification.
To complete the framework we define a default, generic optimizer:
given a particular instance of Trans (we reserve the name X for
such instances), the optimizer builds the 'a term bottom-up,
from which the 'a from value can be extracted, or observed.
module SYMT(X:Trans)(F:SYMOBS with type 'a repr = 'a X.from) = struct
  open X
  type 'a repr = 'a term
  type 'a obs = 'a F.obs
  let int x = fwd (F.int x)
  let add x y = fwd (F.add (bwd x) (bwd y))
  let observe x = F.observe (bwd x)
end
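Before writing any real optimization, SYMT can be tried out with a trivial instance of Trans. The instance below is our own, purely for illustration: its term type carries no optimization state at all.

(* A trivial Trans instance (ours): no optimization state.
   SYMT over it merely rebuilds the expression in F. *)
module TrivX = struct
  type 'a from = 'a Ru.repr
  type 'a term = 'a from
  let fwd x = x
  let bwd x = x
end
module IdOpt = SYMT(TrivX)(AddObs(Ru))
let _ = let module M = Ex2(IdOpt) in IdOpt.observe M.res   (* 10, as Ex2(Ru) *)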
The reader may have noticed that this optimizer is the silly identity
transformer: the optimization state initialized by fwd is not used
for anything and is later thrown away by bwd. Therefore, the
SYMT-optimized Ex2 is the same as the original Ex2: to be
precise, let module S = SYMT(X)(I) in let module M = Ex2(S) in
S.observe M.res will always give the same result as
let module M = Ex2(I) in M.res for any X and
interpreter I. Concrete optimizers are built as 'deltas' to
SYMT, overriding the interpretations of some DSL forms and actually
using the optimization state. Interpretations of the other forms,
irrelevant for the optimization, remain homomorphisms. The example is
next.
Defining an optimization pass within the framework means specifying
two things. First is the optimization state, the data that govern the
optimization. They are defined by writing an instance of Trans.
Next we need to tell how to use this optimization state and actually
do the optimization. It is specified as a `partial interpreter': an
interpreter for only those DSL forms that are affected by the
optimization. We will be calling that second part IDelta. It is
convenient to put the two parts inside a functor, since they are both
parameterized by the From interpreter, which interprets the
optimized expression. In the case of the zero-addition elimination, the
optimization pass looks as follows.
module SuppressZeroPass(F:SYM) = struct
  module X = struct
    type 'a from = 'a F.repr
    type 'a term =
      | Unk : 'a from -> 'a term
      | Zero : int term                (* if the expression is zero *)
    let fwd x = Unk x                  (* generic reflection *)
    let bwd : type a. a term -> a from = function
      | Unk x -> x
      | Zero -> F.int 0
  end
  open X
  module IDelta = struct
    let int n = if n = 0 then Zero else Unk (F.int n)
    let add : int term -> int term -> int term = fun x y ->
      match (x,y) with
      | (Zero,x) -> x
      | (x,Zero) -> x
      | (Unk x, Unk y) -> Unk (F.add x y)
  end
end
The type 'a term is the GADT with two variants: Zero for the
literal zero and Unk for a DSL expression whose zeroness is
statically unknown or forgotten. The GADT hence tells the minimum we
need to know to carry out the zero-addition elimination. The operation
fwd marks its argument as unknown, and bwd forgets all static
knowledge, producing the opaque 'a from value. This is the typical
pattern. It should be clear that 'a term, albeit a data type, is not
at all the deep DSL embedding (such as the expr data type from the
Primer section). For one, 'a term is not recursive. Second, it
makes explicit only the bare minimum -- in our case, if the expression
is literally zero or not.
The partial interpreter shows how the literal zeroness is determined
and used. The match expression in the interpretation of add
expresses the rules of optimization clearly and directly. Since our
language is so simple, IDelta is in fact the full interpreter. Later
on, we extend the DSL with more expression forms; the IDelta however
will remain the same.
The complete optimizer is built by combining the optimization pass with
the default optimizer SYMT and overriding the default interpretations by
those in IDelta. The result is the full, optimizing interpreter:
module SuppressZero(F:SYMOBS) = struct
  module OptM = SuppressZeroPass(F)
  include SYMT(OptM.X)(F)
  include OptM.IDelta                  (* overriding int and add in SYMT *)
end
Ex3 from the previous section demonstrates the optimization. The result
is shown underneath the expression, indented.
module SuppressZeroSh = SuppressZero(AddObs(Sh))
let module M = Ex3(SuppressZeroSh) in SuppressZeroSh.observe M.res
"((3 + 5) + 1)"
The next section shows a more interesting optimization and its composition
with the present zero-elimination pass.
The next optimization re-associates additions to the right, converting
expressions to the form (add 1 (add 2 (add 3 ...))). In this form, the
first argument of any addition is not itself an addition. Such
expressions can be compiled to a sequence of simple addition
instructions over a single accumulator (think of Z80). We later
compose this optimization with the zero-elimination of the previous
section.
The implementation follows the established pattern of defining an
instance of Trans, with the data type 'a term
representing the optimization state, and the partial interpreter
IDelta that builds and propagates this state.
module LinearizeAddPass(F:SYM) = struct
  module X = struct
    type 'a from = 'a F.repr
    type 'a term =
      | Unk : 'a from -> 'a term
        (* non-empty difference list *)
      | Add : (int from -> int from) * int from -> int term
    let fwd x = Unk x                  (* generic reflect *)
    let bwd : type a. a term -> a from = function
      | Unk x -> x
      | Add (f,n) -> f n
  end
  open X
  module IDelta = struct
    let int n = Add ((fun x -> x), F.int n)
    let add : int term -> int term -> int term = fun x y ->
      match (x,y) with
      | (Add(fx,nx), Add(fy,ny)) ->
          Add ((fun nil -> fx (F.add nx (fy nil))), ny)
      | (Unk x, Add(fy,ny)) ->
          Add ((fun nil -> F.add x (fy nil)), ny)
      | (Add(fx,nx), Unk y) ->
          Add ((fun nil -> fx (F.add nx nil)), y)
      | _ -> Unk (F.add (bwd x) (bwd y))
  end
end
Again, some may think that the data type 'a term is the deep DSL
embedding. It cannot be: it is not recursive. Further, the addition
is represented differently from the AST data type expr of the Primer
section: Add (f,n) : int term is not a node in a tree but a
non-empty difference list. The 'a term here is different from the
'a term in the previous section. Unlike the typical deep embedding
optimizations that traverse the same AST data type, tagless-final
optimizations employ whatever representation is the most convenient for
them. Different optimizations will likely use different expression
representations.
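For contrast, here is what the same linearization would look like over the deep embedding: a sketch of our own against the expr data type of the Primer section. Note that it is recursive, unlike IDelta.add above, and is committed to this one AST:

(* Our sketch: right-associating additions in the deep embedding *)
let rec linearize : expr -> expr = function
  | Int n -> Int n
  | Add (Int n, e)         -> Add (Int n, linearize e)
  | Add (Add (e1, e2), e3) -> linearize (Add (e1, Add (e2, e3)))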
Right-associated additions are isomorphic to a non-empty list of
summands. We encode it as a non-empty difference list, built
bottom-up. Again, the reader is encouraged to prove that
the code above really does so. The proof is an easy induction: after
all, the structural induction is apparent in the IDelta code, at least
from the fact that add is not recursive.
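To see the difference list at work, here is a worked micro-trace of our own, for add (add (int 1) (int 2)) (int 3) with F = Sh, writing id for fun x -> x:

(* A worked micro-trace (ours):
   int 1               = Add (id, "1")
   int 2               = Add (id, "2")
   add (int 1) (int 2) = Add ((fun nil -> F.add "1" nil), "2")
   int 3               = Add (id, "3")
   adding (int 3)      = Add ((fun nil -> F.add "1" (F.add "2" nil)), "3")
   bwd of the result   = F.add "1" (F.add "2" "3") = "(1 + (2 + 3))"
   The left-nested source has become right-nested. *)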
Taking Ex3 again as the example, we linearize it and pretty-print the
result:
module LinearizeAdd(F:SYMOBS) = struct
  module OptM = LinearizeAddPass(F)
  include SYMT(OptM.X)(F)
  include OptM.IDelta
end
module SLin = LinearizeAdd(AddObs(Sh))
let module M = Ex3(SLin) in SLin.observe M.res
"(3 + (5 + (0 + (1 + 0))))"
The result is indeed right-associated, but with zero summands. We have
a pass for that, which we tack on:
module SLinZero = SuppressZero(SLin)
let module M = Ex3(SLinZero) in SLinZero.observe M.res
"(3 + (5 + 1))"
Composing the LinearizeAdd optimization with the earlier defined
SuppressZero is as simple as the ordinary functional composition of
the transformers.
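For this particular example, the opposite order of composition happens to give the same result; in general, passes need not commute. The check is our own:

(* Our check: composing the passes in the opposite order *)
module SZeroLin = LinearizeAdd(SuppressZeroSh)
let _ = let module M = Ex3(SZeroLin) in SZeroLin.observe M.res
        (* "(3 + (5 + 1))" *)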
In the tagless-final approach, a DSL is defined by the
signature of its interpreters, like the SYM signature from the Primer.
The new DSL of higher-order functions and addition expression has
the following signature:
module type SYMHO = sig
  include SYMOBS
  val lam: ('a repr -> 'b repr) -> ('a->'b) repr
  val app: ('a->'b) repr -> ('a repr -> 'b repr)
end
It reuses the already defined simple DSL of arithmetic expressions,
and its observations, adding to it two new expression forms.
Here is a sample expression of the extended language, with
lambda-abstractions and applications:
module Ex5(I:SYMHO) = struct
  open I
  let res = observe @@ lam @@ fun x ->
    add (add x (add (int 5) (int 0)))
        (add
           (add (add (int 1) (int 0)) (int 7))
           (add
              (app (lam (fun y->y)) (int 4))
              (int 2)))
end
The inferred signature

module Ex5 : functor (I : SYMHO) -> sig val res : (int -> int) I.obs end

tells us that our DSL has grown beyond solely int expressions:
Ex5 represents an int->int function.
We need an Ru-like interpreter to evaluate it. We do not write it
from scratch: rather, we extend the Ru interpreter:
module RuHO = struct
  include AddObs(Ru)
  let lam f = f
  let app f x = f x
end
As a proper extension, it can handle all earlier examples in the
original, simple DSL; it also interprets the extended language:
let module M = Ex2(RuHO) in M.res
10
let module M = Ex5(RuHO) in M.res 6
25
RuHO, like any other tagless-final interpreter, is typed, evaluating an
'a repr DSL expression into the OCaml value of the same type. We did
not talk about types before as
the DSL had only int expressions. It has grown: Ex5 represents an
int->int expression, and is evaluated into the corresponding OCaml
function; the latter is then applied to 6. The evaluation result has
no tags and other wrappers, justifying the name of the approach.
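The interpreter ShHO used below is the analogous extension of Sh with lam and app. It was not shown in this section; here is a minimal sketch of our own, with counter-generated variable names and full parenthesization. (The interpreter actually used for the output below prints with precedences, as remarked after the example.)

(* A minimal sketch (ours) of ShHO: Sh extended with lam and app.
   Variable naming by a global counter and the full parenthesization
   are our simplifications. *)
module ShHO = struct
  include AddObs(Sh)
  let counter = ref 0
  let lam f =
    incr counter;
    let x = "x" ^ string_of_int !counter in
    "(L" ^ x ^ ". " ^ f x ^ ")"
  let app f x = "(" ^ f ^ " " ^ x ^ ")"
end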
The ShHO interpreter pretty-prints expressions:
let module M = Ex5(ShHO) in M.res
"Lx. (x + (5 + 0)) + (((1 + 0) + 7) + ((Ly. y) 4 + 2))"
(We re-wrote the interpreter with precedence pretty-printing, to
omit useless parentheses.)
Let us see how to optimize the extended expressions. First, we need
a new default interpreter (which is again obtained by extending
SYMT and reusing all of its code):
module SYMTHO(X:Trans)(F:SYMHO with type 'a repr = 'a X.from) = struct
  open X
  include SYMT(X)(F)
  let app x y = fwd (F.app (bwd x) (bwd y))
  let lam f = fwd (F.lam (fun x -> bwd (f (fwd x))))
end
There is no second step. The old optimization passes
SuppressZeroPass and LinearizeAddPass can be used as they are:
module SuppressZeroHO(F:SYMHO) = struct
  module OptM = SuppressZeroPass(F)
  include SYMTHO(OptM.X)(F)
  include OptM.IDelta
end
module SuppressZeroHOSh = SuppressZeroHO(ShHO)
let module M = Ex5(SuppressZeroHOSh) in M.res
"Lx. (x + 5) + ((1 + 7) + ((Ly. y) 4 + 2))"
Adding the next pass, to linearize the additions, is a simple
composition, as before:
module LinearizeAddHO(F:SYMHO) = struct
  module OptM = LinearizeAddPass(F)
  include SYMTHO(OptM.X)(F)
  include OptM.IDelta
end
module SLinZeroHO = LinearizeAddHO(SuppressZeroHOSh)
let module M = Ex5(SLinZeroHO) in M.res
"Lx. x + (5 + (1 + (7 + ((Ly. y) 4 + 2))))"
The additions are all associated to the right, and there are no zeros
in sight. Incidentally, Ex5(SLinZeroHO) has the type
sig val res : (int -> int) SLinZeroHO.obs end. The result, just like
the representation of the original Ex5, is indexed by the same type
int->int. The optimization has preserved the type. All
tagless-final optimizations have this property by the very construction.
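As a final check of our own, the same composed optimizer can feed the evaluator rather than the printer; the optimized function computes the same value as the unoptimized run earlier:

(* Our check: the optimized Ex5 evaluated rather than printed *)
module SLinZeroRuHO = LinearizeAddHO(SuppressZeroHO(RuHO))
let _ = let module M = Ex5(SLinZeroRuHO) in M.res 6    (* 25 *)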
The intended audience is users, researchers and students of functional programming. Experience with typed functional programming and a basic familiarity with Haskell or OCaml are assumed. The tutorial should thus be applicable across modern functional programming languages. Rather than a lecture, the tutorial is designed as an interactive session, explaining the technique on a series of progressively more complex examples and discussing the implementation points and problems with the audience.
Just as a math talk cannot avoid looking at proofs, in a programming tutorial we cannot help but stare at the code. Befitting its informal style, the tutorial was presented by showing, describing, explaining and discussing the code. Below are the code files, in Haskell and OCaml, with many comments, questions, remarks and quizzes.
AND x x -> x; complete Boolean circuit simplifications; converting to DNF;
handling sharing.

slides-PPL.pdf [82K]
A few introductory slides of that Haskell tutorial (in Japanese)
T4: Embedding and optimizing domain-specific languages in the
typed final style
The more extensive (3.5hrs) tutorial, presented at CUFP 2015
in Vancouver, BC, Canada on September 3, 2015
Functional Thursday #30
October 1, 2015. Taipei, Taiwan
The recorded lecture at the Functional Thursday Meetup Taipei
International Summer School on Metaprogramming
Robinson College, Cambridge, 8th to 12th August 2016
Three-lecture course using Haskell as the meta-language