The optimizations presented on this page are essentially term re-writing. The follow-up course develops optimizations in a semantic style, based on calculation rather than re-writing.
This article demonstrates not only that context-sensitive transformations of tagless-final DSLs are possible, but also that such optimizations are modular, convenient and composable. They are also typed and type-preserving, which is clear not only to us but also to the compiler. As we extend the DSL with new expression forms (such as first-class functions, conditionals, exceptions, etc.) we may still be able to use the previously written optimizations as they are. Many optimizations, such as arithmetic expression simplifications, deal only with a subset of the language. The tagless-final approach lets us express them succinctly, abstracting from the irrelevant parts of the language. Not only are the optimizations clearer; they are also robust with respect to language extensions, applicable to any extended language.
We will distinctly see the advantages of the tagless-final DSL embedding compared to the ``deep embedding'', that is, representing the DSL as a data type (AST) in the host language. The extensibility is not the only advantage. Although optimizations are AST transformations, each optimization prefers its own AST form. (The abstract syntax tree may have several subtly different realizations, as we shall see later.) The tagless-final framework lets each optimization work with its own preferred AST view.
This article is written as a tutorial of the tagless-final optimization framework and uses a very simple example. The framework has been successfully used in a complex, practical application, of compiling and optimizing language-integrated queries. Although this tutorial uses OCaml, it may just as well be presented in Haskell.
Tagless-final, systematic optimizations in Haskell and OCaml: Tutorial
The extended version of the tutorial on optimizations in
the tagless-final style -- with a different running example: the
DSL of logic circuits. The tutorial has been given in Haskell and OCaml.
Finally, Safely-Extensible and Efficient Language-Integrated Query
The present framework has been used for real-life projects,
such as the compositional and extensible language-integrated query.
The user-composed query expression is optimized into the form
from which the efficient SQL (without nested subqueries) is generated.
Tagless-final optimizations, algebraically and semantically
The follow-up tutorial, stressing semantics of optimizations, their
correctness, and calculation rather than re-writing
type expr = Int of int | Add of expr * expr

This data type represents the ``deep'' embedding of the DSL in OCaml. This is not the road we take. Rather, we introduce the signature for the two, so far, basic forms of our language: integer literals and the addition operation:
module type SYM = sig
  type 'a repr
  val int: int -> int repr
  val add: int repr -> int repr -> int repr
end
Here, the (for now, abstract) type 'a repr stands for some
representation of language expressions, indexed by the expression
type. One may think of SYM as specifying the DSL syntax as a
context-free grammar: just read int repr as ``the start symbol'' and
val int and val add as two productions labeled int and add. We
do not need to know what 'a repr really is to write sample DSL
expressions, such as:
module Ex2(I:SYM) = struct
  open I
  let res = add (add (int 1) (int 2))
                (add (int 3) (int 4))
end
(Here and below, the name I is used for a generic interpreter,
an instance of SYM.)
To evaluate that expression we write an interpreter for it:
module Ru = struct
  type 'a repr = 'a
  let int x = x
  let add = (+)
end
The DSL operations are mapped directly to the corresponding OCaml
operations: Ru is a meta-circular interpreter for the tiny subset of OCaml.
Correspondingly, a DSL expression is represented by the OCaml value
it evaluates to. SYM hence can also be viewed as the signature
of the DSL interpreters -- or, in loftier terms, as the specification
of the denotational semantics for the language: 'a repr
defines the domain, and int and add give the meaning to DSL integer
literals and the addition in this domain. The denotation
for complex expressions is determined compositionally, from the denotations
of their sub-expressions. To evaluate the sample Ex2 we
interpret it with the Ru interpreter: let module M = Ex2(Ru) in M.res,
which gives the OCaml value 10, the denotation of Ex2
in Ru as the OCaml integer.
The interpreter Ru is not the only way to give the meaning to DSL
expressions. We may also interpret them as strings, so as to display them.
The denotation for an expression is hence its printed representation:
module Sh = struct
  type 'a repr = string
  let int = string_of_int
  let add x y = "(" ^ x ^ " + " ^ y ^ ")"
end
Interpreting the same Ex2 using Sh, as
let module M = Ex2(Sh) in M.res, now gives the string
"((1 + 2) + (3 + 4))".
Besides evaluating DSL expressions we may also want to transform them. A simple example is negating the expression. The negation transformation can also be written as an interpreter (after all, interpreting is the only thing we can do with tagless-final expressions).
module Neg(From:SYM) = struct
  type 'a repr = 'a From.repr
  let int x = From.int (-x)
  let add e1 e2 = From.add e1 e2
end
Neg interprets the DSL in terms of another interpreter, From
(which we will be calling just F).
On the surface of it, Neg is an interpreter transformer. Dually,
it can also be regarded as an expression transformer:
module Ex2Neg(F:SYM) = Ex2(Neg(F))
Ex2Neg has the same type as the original Ex2: given an interpreter
SYM it computes the denotation of its DSL expression in that
SYM 's domain. That is, Ex2Neg is a tagless-final representation
of a DSL expression -- namely, the negated one. It can be interpreted
with the existing Ru and Sh interpreters, or even the Neg interpreter
(i.e., it can be negated again):
let module M = Ex2Neg(Ru) in M.res
-10
let module M = Ex2Neg(Sh) in M.res
"((-1 + -2) + (-3 + -4))"
let module Ex2NegNeg(F:SYM) = Ex2Neg(Neg(F)) in
let module M = Ex2NegNeg(Ru) in M.res
10
(The evaluation result is shown, indented, underneath each expression.)
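Incidentally, OCaml itself can confirm the claim that Neg maps SYM interpreters to SYM interpreters. The following one-line check is ours, not part of the original development:

(* A check of ours: Neg of a SYM interpreter is again a SYM interpreter *)
module NegRuCheck : SYM = Neg(Ru)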
In the following we write more such DSL transformers -- interpreters that not only evaluate or print an expression but also re-write it, in generally context-sensitive, apparently non-compositional ways.
The sample optimization is simple: apply the algebraic laws of
addition x+0 = 0+x = x to eliminate the additions to zero. We assume for
simplicity that all integer literals are non-negative. It is an
exercise for the reader to remove that assumption.
The optimization is expressed as an interpretation in the domain of partially-known values, which track and propagate the available static information. In our case, this information is whether the value is statically known to be zero.
module SuppressZeroPE(F:SYM) = struct
  type 'a repr = {dynamic: 'a F.repr; known_zero : bool}
  let int x = {dynamic = F.int x; known_zero = (x = 0)}
  let add e1 e2 = match (e1,e2) with   (* add is not recursive! *)
    | ({known_zero=true},x) -> x
    | (x,{known_zero=true}) -> x
    | ({dynamic=x},{dynamic=y}) -> {dynamic=F.add x y; known_zero=false}
end
As typical of partially-known values, 'a repr has two components.
The dynamic part is the (transformed) expression to be interpreted
by the source F interpreter. Since this interpreter is abstract at
this point, nothing is known about 'a F.repr. The
second component of 'a repr tells if this partially-known value is
in fact fully known, as zero. We find this out when
interpreting DSL integer literals, which are known literally.
The interpretation of the addition propagates this knowledge
and applies the algebraic laws in the straightforward way.
Let us take a moment to convince ourselves of the correctness.
The inductive structure of interpreters makes structural
induction proofs straightforward. First we need a lemma that if e is
a value of the type 'a repr and e.known_zero is true, e
represents zero. The proof is trivial. The soundness can be stated as
the theorem that if E is a tagless-final encoding of an expression
and I is an interpreter in which the monoid law of addition
holds, then E(I).res and E(SuppressZeroPE(I)).res.dynamic
have the same value. We can also prove a stronger property: if e is
a value of the type 'a repr, it represents zero if and only if
e.known_zero is true. The theorem clearly holds for the base case
int x. In the inductive case add e1 e2 we apply the inductive
hypothesis to e1 and e2 and the assumption that literals are
non-negative. The stronger property entails the completeness of the
optimization: if e is a value of the type 'a repr then e.dynamic
represents an expression with no additions to zero. The reader is
encouraged to write the detailed proof, perhaps with the help of
their favorite proof assistant.
The following example will demonstrate the optimization:
module Ex3(I:SYM) = struct
  open I
  let res = add (add (int 3) (add (int 5) (int 0)))
                (add (int 1) (int 0))
end
Optimizing it and then getting the Sh interpreter to print out the result
produces
let module Ex3NoZerosPE(F:SYM) = Ex3(SuppressZeroPE(F)) in
let module M = Ex3NoZerosPE(Sh) in M.res
- : int SuppressZeroPE(Sh).repr =
{SuppressZeroPE(Sh).dynamic = "((3 + 5) + 1)"; known_zero = false}
The optimization has worked: zero summands have disappeared. The OCaml
output also shows the problem: what is printed is the partially known
value. These values are used during the optimization; at the end we
would like just the final result. There should be a way
to observe the optimization result and spare the user the
internals of the optimizer. The second problem is extensibility: if
we extend our DSL with, say, first-class functions, we likewise have
to extend SuppressZeroPE to interpret the new expression forms
(function abstraction and applications) in the 'a SuppressZeroPE(Sh).repr domain. These new interpretations are simple
homomorphisms. They are irrelevant for the zero-addition elimination and
are annoying boilerplate. More seriously, they break modularity: every time we
extend the language we have to change all optimizations, even those
that are not affected by the extension. Our optimization framework
overcomes these problems.
In the previous section we have seen that tagless-final DSL transformations are written as interpreter transformations. The transformed interpreter uses the suitable representation for DSL expressions, from which the optimization result is extracted, or `observed', at the end. It makes sense therefore to add the observation function to the interpreter signature:
module type SYMOBS = sig
  include SYM
  type 'a obs
  val observe: 'a repr -> 'a obs
end
In our Ru and Sh interpreters, the representation type is already suitable
for observation and so observe is just the identity. The following functor
will add such a trivial observation:
module AddObs(I:SYM) = struct
  include I
  type 'a obs = 'a repr
  let observe x = x
end
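For example, here is a usage sketch of our own, running Ex2 through the observation interface:

(* A usage sketch (ours): Ru extended with the trivial observation *)
module RuObs = AddObs(Ru)
let _ = let module M = Ex2(RuObs) in RuObs.observe M.res   (* 10 *)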
Let us recall again the SuppressZeroPE optimizer from the warm-up
section: SuppressZeroPE(F) interprets DSL expressions as values
of the data type
type 'a repr = {dynamic: 'a F.repr; known_zero : bool}
that includes 'a F.repr, the interpretation of the expression in
some F interpreter (which evaluates the optimized expression). In
addition, 'a repr contains extra data used during the
optimization. The zero-elimination optimization evaluates the DSL
expression and hence builds 'a repr bottom-up. The dynamic part is
the optimized F interpretation and is extracted at the end. The
following captures the pattern of two expression representations, one
with the extra data for the optimization:
module type Trans = sig
  type 'a from
  type 'a term
  val fwd : 'a from -> 'a term   (* reflection *)
  val bwd : 'a term -> 'a from   (* reification *)
end
The type 'a from is the representation of a DSL (sub)expression in
the From interpreter; one may think of it as the optimization result
so far. Since the From interpreter is abstract, nothing is known
about 'a from values and they cannot be inspected. The type 'a term represents the same DSL (sub)expression; it is also the
optimization result so far. Unlike 'a from, the values of 'a term
are known, at least to some extent, and can be inspected. These values
contain the working data for the optimization, its state. The
optimization process builds the 'a term values bottom up, inspecting
those for the already optimized sub-expressions. The next two sections
show the examples. The two representations are related, by the
operations fwd and bwd. The latter is used in particular at the
very end to extract the optimization result from the working data 'a term. The operation fwd builds the working data. The two operations
do not generally define a bijection: usually fwd is not surjective and
bwd is not injective. That is, although the composition bwd . fwd
should be the identity, fwd . bwd is typically not. One may think of
bwd as an `abstraction' function: it throws away the data used
internally during the optimization, the optimization state. Then fwd
is a concretization function, which initializes the optimization
state, to some default, empty value. The composition fwd . bwd is
not the identity because the thrown away optimization state is
typically richer than the default state.
The two types and their relationship remind one of
normalization-by-evaluation (NBE). In NBE, 'a from would be called an
opaque object representation; 'a term would then be a meta-language
representation, the domain of the (non-standard) evaluation. The
function fwd is then reflection and bwd is reification.
To complete the framework we define a default, generic optimizer:
given a particular instance of Trans (we reserve the name X for
such instances), the optimizer builds the 'a term bottom-up,
from which the 'a from value can be extracted, or observed.
module SYMT(X:Trans)(F:SYMOBS with type 'a repr = 'a X.from) = struct
  open X
  type 'a repr = 'a term
  type 'a obs = 'a F.obs
  let int x = fwd (F.int x)
  let add x y = fwd (F.add (bwd x) (bwd y))
  let observe x = F.observe (bwd x)
end
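Before writing any real optimization, SYMT can be tried out with a trivial instance of Trans. The instance below is our own, purely for illustration: its term type carries no optimization state at all.

(* A trivial Trans instance (ours): no optimization state.
   SYMT over it merely rebuilds the expression in F. *)
module TrivX = struct
  type 'a from = 'a Ru.repr
  type 'a term = 'a from
  let fwd x = x
  let bwd x = x
end
module IdOpt = SYMT(TrivX)(AddObs(Ru))
let _ = let module M = Ex2(IdOpt) in IdOpt.observe M.res   (* 10, as Ex2(Ru) *)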
The reader may have noticed that this optimizer is the silly identity
transformer: the optimization state initialized by fwd is not used
for anything and is later thrown away by bwd. Therefore, the
SYMT-optimized Ex2 is the same as the original Ex2: to be
precise, let module S = SYMT(X)(I) in let module M = Ex2(S) in
S.observe M.res will always give the same result as
let module M = Ex2(I) in M.res for any X and
interpreter I. Concrete optimizers are built as 'deltas' to
SYMT, overriding the interpretations of some DSL forms and actually
using the optimization state. Interpretations of the other forms,
irrelevant for the optimization, remain homomorphisms. The example is
next.
Defining an optimization pass within the framework means specifying
two things. First is the optimization state, the data that govern the
optimization. They are defined by writing an instance of Trans.
Next we need to tell how to use this optimization state and actually
do the optimization. It is specified as a `partial interpreter': an
interpreter for only those DSL forms that are affected by the
optimization. We will be calling that second part IDelta. It is
convenient to put the two parts inside a functor, since they are both
parameterized by the From interpreter, which interprets the
optimized expression. In the case of the zero-addition elimination, the
optimization pass looks as follows.
module SuppressZeroPass(F:SYM) = struct
  module X = struct
    type 'a from = 'a F.repr
    type 'a term =
      | Unk : 'a from -> 'a term
      | Zero : int term                (* if the expression is zero *)
    let fwd x = Unk x                  (* generic reflection *)
    let bwd : type a. a term -> a from = function
      | Unk x -> x
      | Zero -> F.int 0
  end
  open X
  module IDelta = struct
    let int n = if n = 0 then Zero else Unk (F.int n)
    let add : int term -> int term -> int term = fun x y ->
      match (x,y) with
      | (Zero,x) -> x
      | (x,Zero) -> x
      | (Unk x, Unk y) -> Unk (F.add x y)
  end
end
The type 'a term is the GADT with two variants: Zero for the
literal zero and Unk for a DSL expression whose zeroness is
statically unknown or forgotten. The GADT hence tells the minimum we
need to know to carry out the zero-addition elimination. The operation
fwd marks its argument as unknown, and bwd forgets all static
knowledge, producing the opaque 'a from value. This is the typical
pattern. It should be clear that 'a term, albeit a data type, is not
at all the deep DSL embedding (such as the expr data type from the
Primer section). For one, 'a term is not recursive. Second, it
makes explicit only the bare minimum -- in our case, if the expression
is literally zero or not.
The partial interpreter shows how the literal zeroness is determined
and used. The match expression in the interpretation of add
expresses the rules of optimization clearly and directly. Since our
language is so simple, IDelta is in fact the full interpreter. Later
on, we extend the DSL with more expression forms; the IDelta however
will remain the same.
The complete optimizer is built by combining the optimization pass with
the default optimizer SYMT and overriding the default interpretations by
those in IDelta. The result is the full, optimizing interpreter:
module SuppressZero(F:SYMOBS) = struct
  module OptM = SuppressZeroPass(F)
  include SYMT(OptM.X)(F)
  include OptM.IDelta                  (* overriding int and add in SYMT *)
end
Ex3 from the previous section demonstrates the optimization. The result
is shown underneath the expression, indented.
module SuppressZeroSh = SuppressZero(AddObs(Sh))
let module M = Ex3(SuppressZeroSh) in SuppressZeroSh.observe M.res
"((3 + 5) + 1)"
The next section shows a more interesting optimization and its composition
with the present zero-elimination pass.
The next optimization re-associates additions to the right, converting
expressions to the form (add 1 (add 2 (add 3 ...))). In this form, the
first argument of any addition is not itself an addition. Such
expressions can be compiled to a sequence of simple addition
instructions over a single accumulator (think of Z80). We later
compose this optimization with the zero-elimination of the previous
section.
The implementation follows the established pattern of defining an
instance of Trans, with the data type 'a term
representing the optimization state, and the partial interpreter
IDelta that builds and propagates this state.
module LinearizeAddPass(F:SYM) = struct
  module X = struct
    type 'a from = 'a F.repr
    type 'a term =
      | Unk : 'a from -> 'a term
        (* non-empty difference list *)
      | Add : (int from -> int from) * int from -> int term
    let fwd x = Unk x                  (* generic reflect *)
    let bwd : type a. a term -> a from = function
      | Unk x -> x
      | Add (f,n) -> f n
  end
  open X
  module IDelta = struct
    let int n = Add ((fun x -> x), F.int n)
    let add : int term -> int term -> int term = fun x y ->
      match (x,y) with
      | (Add(fx,nx), Add(fy,ny)) ->
          Add ((fun nil -> fx (F.add nx (fy nil))), ny)
      | (Unk x, Add(fy,ny)) ->
          Add ((fun nil -> F.add x (fy nil)), ny)
      | (Add(fx,nx), Unk y) ->
          Add ((fun nil -> fx (F.add nx nil)), y)
      | _ -> Unk (F.add (bwd x) (bwd y))
  end
end
Again, some may think that the data type 'a term is the deep DSL
embedding. It cannot be: it is not recursive. Further, the addition
is represented differently from the AST data type expr of the Primer
section: Add (f,n) : int term is not a node in a tree but a
non-empty difference list. The 'a term here is different from the
'a term in the previous section. Unlike the typical deep embedding
optimizations that traverse the same AST data type, tagless-final
optimizations employ whatever representation is the most convenient for
them. Different optimizations will likely use different expression
representations.
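For contrast, here is what the same linearization would look like over the deep embedding: a sketch of our own against the expr data type of the Primer section. Note that it is recursive, unlike IDelta.add above, and is committed to this one AST:

(* Our sketch: right-associating additions in the deep embedding *)
let rec linearize : expr -> expr = function
  | Int n -> Int n
  | Add (Int n, e)         -> Add (Int n, linearize e)
  | Add (Add (e1, e2), e3) -> linearize (Add (e1, Add (e2, e3)))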
Right-associated additions are isomorphic to a non-empty list of
summands. We encode it as a non-empty difference list, built
bottom-up. Again, the reader is encouraged to prove that
the code above really does so. The proof is an easy induction: after
all, the structural induction is apparent in the IDelta code, at least
from the fact that add is not recursive.
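To see the difference list at work, here is a worked micro-trace of our own, for add (add (int 1) (int 2)) (int 3) with F = Sh, writing id for fun x -> x:

(* A worked micro-trace (ours):
   int 1               = Add (id, "1")
   int 2               = Add (id, "2")
   add (int 1) (int 2) = Add ((fun nil -> F.add "1" nil), "2")
   int 3               = Add (id, "3")
   adding (int 3)      = Add ((fun nil -> F.add "1" (F.add "2" nil)), "3")
   bwd of the result   = F.add "1" (F.add "2" "3") = "(1 + (2 + 3))"
   The left-nested source has become right-nested. *)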
Taking Ex3 again as the example, we linearize it and pretty-print the
result:
module LinearizeAdd(F:SYMOBS) = struct
  module OptM = LinearizeAddPass(F)
  include SYMT(OptM.X)(F)
  include OptM.IDelta
end
module SLin = LinearizeAdd(AddObs(Sh))
let module M = Ex3(SLin) in SLin.observe M.res
"(3 + (5 + (0 + (1 + 0))))"
The result is indeed right-associated, but with zero summands. We have
a pass for that, which we tack on:
module SLinZero = SuppressZero(SLin)
let module M = Ex3(SLinZero) in SLinZero.observe M.res
"(3 + (5 + 1))"
Composing the LinearizeAdd optimization with the earlier defined
SuppressZero is as simple as the ordinary functional composition of
the transformers.
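For this particular example, the opposite order of composition happens to give the same result; in general, passes need not commute. The check is our own:

(* Our check: composing the passes in the opposite order *)
module SZeroLin = LinearizeAdd(SuppressZeroSh)
let _ = let module M = Ex3(SZeroLin) in SZeroLin.observe M.res
        (* "(3 + (5 + 1))" *)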
In the tagless-final approach, a DSL is defined by the
signature of its interpreters, like the SYM signature from the Primer.
The new DSL of higher-order functions and addition expression has
the following signature:
module type SYMHO = sig
  include SYMOBS
  val lam: ('a repr -> 'b repr) -> ('a->'b) repr
  val app: ('a->'b) repr -> ('a repr -> 'b repr)
end
It reuses the already defined simple DSL of arithmetic expressions,
and its observations, adding to it two new expression forms.
Here is a sample expression of the extended language, with
lambda-abstractions and applications:
module Ex5(I:SYMHO) = struct
  open I
  let res = observe @@ lam @@ fun x ->
    add (add x (add (int 5) (int 0)))
        (add
           (add (add (int 1) (int 0)) (int 7))
           (add
              (app (lam (fun y->y)) (int 4))
              (int 2)))
end
The inferred signature

module Ex5 : functor (I : SYMHO) -> sig val res : (int -> int) I.obs end

tells us that our DSL has grown beyond solely int expressions:
Ex5 represents an int->int function.
We need an Ru-like interpreter to evaluate it. We do not write it
from scratch: rather, we extend the Ru interpreter:
module RuHO = struct
  include AddObs(Ru)
  let lam f = f
  let app f x = f x
end
As a proper extension, it can handle all earlier examples in the
original, simple DSL; it also interprets the extended language:
let module M = Ex2(RuHO) in M.res
10
let module M = Ex5(RuHO) in M.res 6
25
RuHO, like any other tagless-final interpreter, is typed, evaluating an
'a repr DSL expression into the OCaml value of the same type. We did
not talk about types before as
the DSL had only int expressions. It has grown: Ex5 represents an
int->int expression, and is evaluated into the corresponding OCaml
function; the latter is then applied to 6. The evaluation result has
no tags and other wrappers, justifying the name of the approach.
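The interpreter ShHO used below is the analogous extension of Sh with lam and app. It was not shown in this section; here is a minimal sketch of our own, with counter-generated variable names and full parenthesization. (The interpreter actually used for the output below prints with precedences, as remarked after the example.)

(* A minimal sketch (ours) of ShHO: Sh extended with lam and app.
   Variable naming by a global counter and the full parenthesization
   are our simplifications. *)
module ShHO = struct
  include AddObs(Sh)
  let counter = ref 0
  let lam f =
    incr counter;
    let x = "x" ^ string_of_int !counter in
    "(L" ^ x ^ ". " ^ f x ^ ")"
  let app f x = "(" ^ f ^ " " ^ x ^ ")"
end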
The ShHO interpreter pretty-prints expressions:
let module M = Ex5(ShHO) in M.res
"Lx. (x + (5 + 0)) + (((1 + 0) + 7) + ((Ly. y) 4 + 2))"
(We re-wrote the interpreter with precedence pretty-printing, to
omit useless parentheses.)
Let us see how to optimize the extended expressions. First, we need
a new default interpreter (which is again obtained by extending
SYMT and reusing all of its code):
module SYMTHO(X:Trans)(F:SYMHO with type 'a repr = 'a X.from) = struct
  open X
  include SYMT(X)(F)
  let app x y = fwd (F.app (bwd x) (bwd y))
  let lam f = fwd (F.lam (fun x -> bwd (f (fwd x))))
end
There is no second step. The old optimization passes
SuppressZeroPass and LinearizeAddPass can be used as they are:
module SuppressZeroHO(F:SYMHO) = struct
  module OptM = SuppressZeroPass(F)
  include SYMTHO(OptM.X)(F)
  include OptM.IDelta
end
module SuppressZeroHOSh = SuppressZeroHO(ShHO)
let module M = Ex5(SuppressZeroHOSh) in M.res
"Lx. (x + 5) + ((1 + 7) + ((Ly. y) 4 + 2))"
Adding the next pass, to linearize the additions, is a simple
composition, as before:
module LinearizeAddHO(F:SYMHO) = struct
  module OptM = LinearizeAddPass(F)
  include SYMTHO(OptM.X)(F)
  include OptM.IDelta
end
module SLinZeroHO = LinearizeAddHO(SuppressZeroHOSh)
let module M = Ex5(SLinZeroHO) in M.res
"Lx. x + (5 + (1 + (7 + ((Ly. y) 4 + 2))))"
The additions are all associated to the right, and there are no zeros
in sight. Incidentally, Ex5(SLinZeroHO) has the type
sig val res : (int -> int) SLinZeroHO.obs end. The result, just like
the representation of the original Ex5, is indexed by the same type
int->int. The optimization has preserved the type. All
tagless-final optimizations have this property by the very construction.
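As a final check of our own, the same composed optimizer can feed the evaluator rather than the printer; the optimized function computes the same value as the unoptimized run earlier:

(* Our check: the optimized Ex5 evaluated rather than printed *)
module SLinZeroRuHO = LinearizeAddHO(SuppressZeroHO(RuHO))
let _ = let module M = Ex5(SLinZeroRuHO) in M.res 6    (* 25 *)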
The intended audience is users, researchers and students of functional programming. Experience with typed functional programming and a basic familiarity with Haskell or OCaml are assumed. The tutorial should thus be applicable across modern functional programming languages. Rather than a lecture, the tutorial is designed as an interactive session, explaining the technique on a series of progressively more complex examples and discussing the implementation points and problems with the audience.
Just as a math talk cannot avoid looking at proofs, in a programming tutorial we cannot help but stare at the code. Befitting its informal style, the tutorial was presented by showing, describing, explaining and discussing the code. Below are the code files, in Haskell and OCaml, with many comments, questions, remarks and quizzes.
AND x x -> x; complete Boolean circuit simplifications; converting to DNF;
handling sharing.

slides-PPL.pdf [82K]
A few introductory slides of that Haskell tutorial (in Japanese)
T4: Embedding and optimizing domain-specific languages in the
typed final style
The more extensive (3.5hrs) tutorial, presented at CUFP 2015
in Vancouver, BC, Canada on September 3, 2015
Functional Thursday #30
October 1, 2015. Taipei, Taiwan
The recorded lecture at the Functional Thursday Meetup Taipei
International Summer School on Metaprogramming
Robinson College, Cambridge, 8th to 12th August 2016
Three-lecture course using Haskell as the meta-language