Simply-typed encodings: PCF considered as unexpectedly expressive programming language

We present unexpected macro-expressivity results: implementations of pairs, sums, lists, trees, objects, generally recursive data types, imperative programs in PCF: simply-typed lambda calculus with integers and fixpoint. These implementations are as `macros' (functions), requiring no global program transformations. They maintain type soundness, introducing no partiality beyond what may be inherent in emulated facilities. In fact, many realizations do not require fixpoint and hence hold already in System T. The implementations introduce no exponential blowup.

Introduction
Unions
Pairs
Sums
Sequences: lists and arrays
Trees
Heaps, closures, and imperative programs
Objects
General (with contra-variant occurrences) recursive types

Introduction

Simply-typed lambda calculus is simple and logical in its semantics, which helps in verification and assured program development. It is also computationally deficient: the pure calculus (without constants) cannot even represent integer predecessor. Needless to say, the calculus does not express lists, trees, let alone more complicated streams, objects, etc.

The common way of raising expressiveness is complicating types. System F, the polymorphic calculus, can represent integer arithmetic -- and via the Boehm-Berarducci (or Goedel) encoding, inductive data structures such as lists and trees. System F is strongly normalizing, maintaining connections with logic -- but its semantics is quite more complicated. Also, Boehm-Berarducci (let alone Goedel) encodings are inherently inefficient. One may also complicate the type system by adding recursive types, which lets us embed untyped lambda calculus and represent anything, and efficiently -- at the cost of even more complicated semantics.

A different direction is to maintain simple types but add the basic integer data type and successor, predecessor, branching-on-zero, fixpoint as primitives. The result is called PCF. Although its functions are generally partial, it does have a rather straightforward semantics: the famous Scott's LCF, among others. PCF is an extension of the famous Goedel System T, which admitted only primitive recursion on integers. PCF is computationally complete (e.g., it is a model of Kleene's partial recursive functions) and hence can, in principle, represent anything computable. One has to distinguish however an encoding of a program via a global transformation (continuations-passing transformation or an encoding into a Turing machine or partial recursive function) from a local encoding of operations as macros/procedures, in terms of PCF primitives. The present page concerns the latter: macro-expressiveness. It is not to be taken for granted in PCF and is in fact rather unexpected. Our encodings are type sound and do not introduce partiality beyond what is inherent in emulated operations (like taking the head of an empty list). They are also reasonably efficient, with no exponential blow-up. Many of our expressivity results hold already in System T.

PCF is a real programming language: it is a subset of a typed functional programming language such as OCaml. All encodings described on the present page have been implemented in OCaml and tested: see the accompanying complete code. We also use the OCaml notation to present the PCF code on this page.

The trigger for developing the PCF encodings were much appreciated discussions with Naoki Kobayashi in September 2011, in the context of program verification via Higher-Order Recursion Schemes (HORS). Then used version of HORS could not cope with polymorphic and recursive types; hence simply-typed encodings were sought. Many of the presented encodings were developed at that time.

References

Gordon Plotkin: LCF considered as a programming language
Theoretical Computer Science, 1977, 5(3), 223–255.

Beyond Church encoding: Boehm-Berarducci isomorphism of algebraic data types and polymorphic lambda-terms

Unions

Our starting point are non-discriminated unions: the ones that should be familiar from C. Like in C, they are used to pass around values of several types as if they all had the same type.

Given two types t1 and t2, their union, to be denoted as t1 <+> t2, is the type that is `big enough' for both t1 and t2. That is, there must exist total functions

    inj_t1t2 : t1 -> (t1 <+> t2)         prj_t1t2 : (t1 <+> t2) -> t1
    inj_t2t1 : t2 -> (t1 <+> t2)         prj_t2t1 : (t1 <+> t2) -> t2

such that prj_t1t2 . inj_t1t2 === id and prj_t2t1 . inj_t2t1 === id. Since the union is symmetric: t1 <+> t2 = t2 <+> t1, we shall be concerned only with one inj/prj pair. Below we demonstrate that for any two PCF types t1 and t2 there exists the PCF type t1 <+> t2 and the total inj_t1t2 and prj_t1t2 functions: the embedding/projection pair. No fixpoint is required: the encoding hence works in a strongly-normalizing simply-typed lambda calculus with integers (System T).

Recall that all types in PCF have the following form (in BNF)

    t ::= int | t -> t

The type t1 <+> t2 is also of this form. The simplest case is when t1 and t2 are the same. Their union is this type then: t <+> t = t (quite in the spirit of C), and inj_tt and prj_tt are the identity functions.

As the next case consider t1 is int and t2 is int->int. Obviously, int->int functions accommodate an integer, as the constant function. Thus

    type int <+> (int->int)  =  (int->int)
    let inj : int -> (int->int) = fun x -> fun _ -> x
    let prj : (int->int) -> int = fun f -> f 0

Clearly, inj and prj are total and their composition prj . inj is the identity. Generalizing:

    type int <+> (ta -> tb)  =  ta -> (int <+> tb)
    let inj : int -> (ta -> (int <+> tb)) = fun x -> fun _ -> inj_int_tb x
    let prj : (ta -> (int <+> tb)) -> int = fun f -> prj_int_tb (f ta_in) 
    
    type (ta -> tb) <+> int  =  ta -> (tb <+> int)
    let inj : (ta -> tb) -> (ta -> (tb <+> int)) = fun f -> inj_tb_int . f
    let prj : (ta -> (tb <+> int)) -> (ta -> tb) = fun f -> prj_tb_int . f

By ta_in above we mean a value of the type ta -- an arbitrary value. Recall, in PCF, all types are inhabitable: after all, PCF types are either integer, or functions accepting some arguments and returning an integer, inhabited by constant functions among others. Therefore, ta_in always exists.

The next inductive case is also straightforward:

    type (t -> t1) <+> (t -> t2)  =  t -> (t1 <+> t2)
    let inj : (t -> t1) -> (t -> (t1 <+> t2)) = fun f -> inj_t1_t2 . f
    let prj : (t -> (t1 <+> t2)) -> (t -> t1) = fun f -> prj_t1_t2 . f

The totality and prj-inj inverse are obvious, by induction.

The remaining, and the most difficult case (subsuming the previous one) is the union of two functions that take arguments of generally different types.

    type (ta -> t1) <+> (tb -> t2)  =  (ta <+> tb) -> (t1 <+> t2)
    let inj : (ta -> t1) -> ((ta <+> tb) -> (t1 <+> t2)) = fun f -> inj_t1_t2 . f . prj_ta_tb
    let prj : ((ta <+> tb) -> (t1 <+> t2)) -> (ta -> t1) = fun f -> prj_t1_t2 . f . inj_ta_tb

Again, the properties of inj and prj are inductively obvious.

In OCaml, the union interface can be represented as a signature

    module type union = sig
      type t1
      type t2
      type union
      val inj1 : t1 -> union
      val prj1 : union -> t1
      val inj2 : t2 -> union
      val prj2 : union -> t2
    end

The code contains the implementation, following the inductive rules above. The next section shows an example.

Version

The current version is November 2021

References

systemT-datatype.ml [5K]
The complete OCaml source code with tests

Pairs

The just presented encoding of non-discriminated unions induces the encoding on pairs. It requires no fixpoint and holds in System T.

Before getting to the encoding, we have to discuss what the result must be. What is a pair? One answer is the facility with the following interface:

    type ('a,'b) pair   (* abstract *)
    val pair : 'a -> 'b -> ('a,'b) pair
    val fst : ('a,'b) pair -> 'a
    val snd : ('a,'b) pair -> 'b

However, PCF has no polymorphism and type variables and no way to express even the types in that interface. Still, the above definition of pairs is not wrong, if we understand 'a and 'b as schematic variables and the interface as a schema (cf. axiom schemas in first-order logic). Upon such reading, the interface does not prescribe the single function pair that takes two arguments of arbitrary types. What is prescribed is that for any two specific types t1 and t2, there must be a type tpair and (total) functions pair_t1t2 : t1 -> t2 -> tpair, fst_t1t2 : tpair -> t1, snd : tpair -> t2 satisfying the obvious properties

    fst_t1t2 (pair_t1t2 x y) = x          snd_t1t2 (pair_t1t2 x y) = y

for any x of type t1 and y of type t2.

It may take a moment to realize that since the type of pair_t1t2 : t1 -> t2 -> tpair is specific to the particular types t1 and t2, so may be the term pair_t1t2 itself. In OCaml, Haskell, etc., pairs are polymorphic: there exists one single term (pairing function) that builds pairs of all possible types, uniformly. Such polymorphic pairs is just one implementation choice. The other choice is type-specific pairing: the representation and construction of a pair of t1 and t2 is not uniform but specific to the types t1 and t2. (The most well-known type-specific operation is equality: each type has its own way of comparing its values.)

Thus the interface of PCF pairs is expressed by the following OCaml signature:

    module type pairs = sig
      type t1
      type t2
      type tpair
      val pair : t1 -> t2 -> tpair
      val fst : tpair -> t1
      val snd : tpair -> t2
    end

(Not stated explicitly but important are the requirements of being total and satisfying the fst-pair and snd-pair equational laws above.) We demonstrate that an implementation of this interface exists for any PCF types t1 and t2.

The implementation is trivial. Recall the encoding of pairs in untyped lambda calculus:

    let pair x y = fun t -> if t = 0 then x else y

It won't work exactly like that in the typed setting because x and y may have different types. That is where the union comes in:

    let pair : t1 -> t2 -> (int -> (t1 <+> t2)) = fun x y ->
        fun t -> if t = 0 then inj_t1_t2 x else inj_t2_t1 y
    let fst : (int -> (t1 <+> t2)) -> t1 = fun p -> prj_t1_t2 (p 0)
    let snd : (int -> (t1 <+> t2)) -> t2 = fun p -> prj_t2_t1 (p 1)

Since the union t1 <+> t2 exists for any PCF types t1 and t2, so is the pairing. Thus defined pair, fst and snd are clearly total and satisfy the expected equational laws. In the following, we write the type int -> (t1 <+> t2) as t1 * t2.

In OCaml, the construction can be realized as a functor

    module Pair(U:union) = struct
      type t1 = U.t1
      type t2 = U.t2
      type tpair = int -> U.union
      let pair : t1 -> t2 -> tpair = fun x y -> 
        fun t -> if t = 0 then U.inj1 x else U.inj2 y
      let fst : tpair -> t1 = fun p -> p 0 |> U.prj1
      let snd : tpair -> t2 = fun p -> p 1 |> U.prj2
    end

Here is an example. First we build the union implementations for two int types

    module U_same_int = USame(struct type t = int end)

and the types int and int->int:

    module Ui_ii = Uint(U_same_int)(struct type ta = int let inhabitant = 0 end)

The pair of int and int->int is then

    module Pairi_ii = Pair(Ui_ii)

that is used as expected: Pairi_ii.(pair 0 succ |> fst) gives 0, and Pairi_ii.(pair 0 succ |> snd) is the successor, which can be verified by checking that Pairi_ii.(pair 0 succ |> snd) 10 evaluates to 11. The accompanying code has more examples.

Version

The current version is November 2021

References

systemT-datatype.ml [5K]
The complete OCaml source code with tests

Sums

Sums are discriminated unions. To make the non-discriminated union (described earlier) into discriminated we pair it with the discriminator -- quite like discriminated (tagged) unions in C. Like unions and pairs, sums are thus implementable already in System T.

The interface of sums is expressed by the following OCaml signature

    module type sums = sig
      type t1
      type t2
      type tsum
      val inl : t1 -> tsum
      val inr : t2 -> tsum
      val is_left  : tsum -> bool
      val is_right : tsum -> bool
      val left  : tsum -> t1
      val right : tsum -> t2
    end

(In PCF, a boolean is represented as an integer: 0 is false and non-zero is true -- like in C.) These functions must satisfy the equational laws

    is_left  (inl x) === true     left  (inl x) === x
    is_right (inr x) === true     right (inr x) === x     is_left === not . is_right

Strictly speaking, left and right operations do not have to be total. They are in our implementation: left (inr x) does not crash and does return some value (which is not specified however). The implementation is straightforward: pairing the union with the discriminator: 0 for left and 1 for the right alternative:

    module Sum(U:union)(P:pairs with type t1=int and type t2=U.union) = struct
      type t1 = U.t1
      type t2 = U.t2
      type tsum = P.tpair
      let inl : t1 -> tsum = fun x -> U.inj1 x |> P.pair 0
      let inr : t2 -> tsum = fun x -> U.inj2 x |> P.pair 1
      let is_left  : tsum -> bool = fun x -> P.fst x = 0
      let is_right : tsum -> bool = fun x -> P.fst x = 1
      let left  : tsum -> t1 = fun x -> P.snd x |> U.prj1
      let right : tsum -> t2 = fun x -> P.snd x |> U.prj2
    end

It is just as clear that the required properties of these operations are satisfied.

A particular instance of sums are option types. They can be realized simply as

    type toption  = int * t
    let none : toption      = pair 0 t_in
    let some : t -> toption = pair 1

where t_in is an arbitrary value of type t, which always exists as discussed earlier.

Version

The current version is November 2021

References

systemT-datatype.ml [5K]
The complete OCaml source code with tests

Simply-typed encodings: PCF considered as unexpectedly expressive programming language

Introduction

Unions

Pairs

Sums

Last updated December 3, 2021