The common way of raising expressiveness is complicating types. System F, the polymorphic calculus, can represent integer arithmetic -- and via the Boehm-Berarducci (or Goedel) encoding, inductive data structures such as lists and trees. System F is strongly normalizing, maintaining connections with logic -- but its semantics is quite more complicated. Also, Boehm-Berarducci (let alone Goedel) encodings are inherently inefficient. One may also complicate the type system by adding recursive types, which lets us embed untyped lambda calculus and represent anything, and efficiently -- at the cost of even more complicated semantics.
A different direction is to maintain simple types but add the basic integer data type and successor, predecessor, branching-on-zero, fixpoint as primitives. The result is called PCF. Although its functions are generally partial, it does have a rather straightforward semantics: the famous Scott's LCF, among others. PCF is an extension of the famous Goedel System T, which admitted only primitive recursion on integers. PCF is computationally complete (e.g., it is a model of Kleene's partial recursive functions) and hence can, in principle, represent anything computable. One has to distinguish however an encoding of a program via a global transformation (continuations-passing transformation or an encoding into a Turing machine or partial recursive function) from a local encoding of operations as macros/procedures, in terms of PCF primitives. The present page concerns the latter: macro-expressiveness. It is not to be taken for granted in PCF and is in fact rather unexpected. Our encodings are type sound and do not introduce partiality beyond what is inherent in emulated operations (like taking the head of an empty list). They are also reasonably efficient, with no exponential blow-up. Many of our expressivity results hold already in System T.
PCF is a real programming language: it is a subset of a typed functional programming language such as OCaml. All encodings described on the present page have been implemented in OCaml and tested: see the accompanying complete code. We also use the OCaml notation to present the PCF code on this page.
The trigger for developing the PCF encodings were much appreciated discussions with Naoki Kobayashi in September 2011, in the context of program verification via Higher-Order Recursion Schemes (HORS). Then used version of HORS could not cope with polymorphic and recursive types; hence simply-typed encodings were sought. Many of the presented encodings were developed at that time.
Given two types t1
and t2
, their union, to be denoted as t1 <+> t2
, is the type that is `big enough' for both t1
and t2
. That is,
there must exist total functions
inj_t1t2 : t1 -> (t1 <+> t2) prj_t1t2 : (t1 <+> t2) -> t1 inj_t2t1 : t2 -> (t1 <+> t2) prj_t2t1 : (t1 <+> t2) -> t2such that
prj_t1t2 . inj_t1t2 === id
and
prj_t2t1 . inj_t2t1 === id
. Since the union is symmetric: t1 <+> t2
= t2 <+> t1
, we shall be concerned only with one inj/prj
pair. Below we demonstrate that for any two PCF types t1
and t2
there exists the PCF type t1 <+> t2
and the total inj_t1t2
and
prj_t1t2
functions: the embedding/projection pair.
No fixpoint is required: the encoding hence
works in a strongly-normalizing simply-typed lambda calculus with
integers (System T).
Recall that all types in PCF have the following form (in BNF)
t ::= int | t -> tThe type
t1 <+> t2
is also of this form. The simplest case is when
t1
and t2
are the same. Their union is this type then:
t <+> t = t
(quite in the spirit of C), and inj_tt
and prj_tt
are the identity functions.
As the next case consider t1
is int
and t2
is
int->int
. Obviously, int->int
functions accommodate an
integer, as the constant function. Thus
type int <+> (int->int) = (int->int) let inj : int -> (int->int) = fun x -> fun _ -> x let prj : (int->int) -> int = fun f -> f 0Clearly,
inj
and prj
are total and their composition prj . inj
is the identity. Generalizing:
type int <+> (ta -> tb) = ta -> (int <+> tb) let inj : int -> (ta -> (int <+> tb)) = fun x -> fun _ -> inj_int_tb x let prj : (ta -> (int <+> tb)) -> int = fun f -> prj_int_tb (f ta_in) type (ta -> tb) <+> int = ta -> (tb <+> int) let inj : (ta -> tb) -> (ta -> (tb <+> int)) = fun f -> inj_tb_int . f let prj : (ta -> (tb <+> int)) -> (ta -> tb) = fun f -> prj_tb_int . fBy
ta_in
above we mean a value of the type ta
-- an arbitrary
value. Recall, in PCF, all types are inhabitable: after all, PCF types
are either integer, or functions accepting some arguments and
returning an integer, inhabited by constant functions among
others. Therefore, ta_in
always exists.
The next inductive case is also straightforward:
type (t -> t1) <+> (t -> t2) = t -> (t1 <+> t2) let inj : (t -> t1) -> (t -> (t1 <+> t2)) = fun f -> inj_t1_t2 . f let prj : (t -> (t1 <+> t2)) -> (t -> t1) = fun f -> prj_t1_t2 . fThe totality and prj-inj inverse are obvious, by induction.
The remaining, and the most difficult case (subsuming the previous one) is the union of two functions that take arguments of generally different types.
type (ta -> t1) <+> (tb -> t2) = (ta <+> tb) -> (t1 <+> t2) let inj : (ta -> t1) -> ((ta <+> tb) -> (t1 <+> t2)) = fun f -> inj_t1_t2 . f . prj_ta_tb let prj : ((ta <+> tb) -> (t1 <+> t2)) -> (ta -> t1) = fun f -> prj_t1_t2 . f . inj_ta_tbAgain, the properties of
inj
and prj
are inductively obvious.
In OCaml, the union interface can be represented as a signature
module type union = sig type t1 type t2 type union val inj1 : t1 -> union val prj1 : union -> t1 val inj2 : t2 -> union val prj2 : union -> t2 end
The code contains the implementation, following the inductive rules above. The next section shows an example.
Before getting to the encoding, we have to discuss what the result must be. What is a pair? One answer is the facility with the following interface:
type ('a,'b) pair (* abstract *) val pair : 'a -> 'b -> ('a,'b) pair val fst : ('a,'b) pair -> 'a val snd : ('a,'b) pair -> 'bHowever, PCF has no polymorphism and type variables and no way to express even the types in that interface. Still, the above definition of pairs is not wrong, if we understand
'a
and 'b
as
schematic variables and the interface as a schema (cf. axiom
schemas in first-order logic). Upon such reading, the interface does
not prescribe the single function pair
that takes two arguments of
arbitrary types. What is prescribed is that for any two specific types
t1
and t2
, there must be a type tpair
and (total) functions
pair_t1t2 : t1 -> t2 -> tpair
, fst_t1t2 : tpair -> t1
,
snd : tpair -> t2
satisfying the obvious properties
fst_t1t2 (pair_t1t2 x y) = x snd_t1t2 (pair_t1t2 x y) = yfor any
x
of type t1
and y
of type t2
.
It may take a moment to realize that since the type of
pair_t1t2 : t1 -> t2 -> tpair
is specific to the particular types
t1
and t2
, so may be the term pair_t1t2
itself. In OCaml,
Haskell, etc., pairs are polymorphic: there exists one
single term (pairing function) that builds pairs of all possible
types, uniformly.
Such polymorphic pairs is just one implementation choice. The
other choice is type-specific pairing: the representation and
construction of a pair of t1
and t2
is not uniform but specific to
the types t1
and t2
. (The most well-known type-specific operation
is equality: each type has its own way of comparing its values.)
Thus the interface of PCF pairs is expressed by the following OCaml signature:
module type pairs = sig type t1 type t2 type tpair val pair : t1 -> t2 -> tpair val fst : tpair -> t1 val snd : tpair -> t2 end(Not stated explicitly but important are the requirements of being total and satisfying the fst-pair and snd-pair equational laws above.) We demonstrate that an implementation of this interface exists for any PCF types
t1
and t2
.
The implementation is trivial. Recall the encoding of pairs in untyped lambda calculus:
let pair x y = fun t -> if t = 0 then x else yIt won't work exactly like that in the typed setting because
x
and
y
may have different types. That is where the union comes in:
let pair : t1 -> t2 -> (int -> (t1 <+> t2)) = fun x y -> fun t -> if t = 0 then inj_t1_t2 x else inj_t2_t1 y let fst : (int -> (t1 <+> t2)) -> t1 = fun p -> prj_t1_t2 (p 0) let snd : (int -> (t1 <+> t2)) -> t2 = fun p -> prj_t2_t1 (p 1)Since the union
t1 <+> t2
exists for any PCF types t1
and t2
, so
is the pairing. Thus defined pair
, fst
and snd
are clearly total
and satisfy the expected equational laws. In the following, we write
the type int -> (t1 <+> t2)
as t1 * t2
.
In OCaml, the construction can be realized as a functor
module Pair(U:union) = struct type t1 = U.t1 type t2 = U.t2 type tpair = int -> U.union let pair : t1 -> t2 -> tpair = fun x y -> fun t -> if t = 0 then U.inj1 x else U.inj2 y let fst : tpair -> t1 = fun p -> p 0 |> U.prj1 let snd : tpair -> t2 = fun p -> p 1 |> U.prj2 end
Here is an example. First we build the union
implementations for two
int
types
module U_same_int = USame(struct type t = int end)and the types
int
and int->int
:
module Ui_ii = Uint(U_same_int)(struct type ta = int let inhabitant = 0 end)The pair of
int
and int->int
is then
module Pairi_ii = Pair(Ui_ii)that is used as expected:
Pairi_ii.(pair 0 succ |> fst)
gives
0
, and Pairi_ii.(pair 0 succ |> snd)
is the successor, which can
be verified by checking that Pairi_ii.(pair 0 succ |> snd) 10
evaluates to 11
. The accompanying code has more examples.
The interface of sums is expressed by the following OCaml signature
module type sums = sig type t1 type t2 type tsum val inl : t1 -> tsum val inr : t2 -> tsum val is_left : tsum -> bool val is_right : tsum -> bool val left : tsum -> t1 val right : tsum -> t2 end(In PCF, a boolean is represented as an integer: 0 is
false
and
non-zero is true
-- like in C.) These functions must satisfy the
equational laws
is_left (inl x) === true left (inl x) === x is_right (inr x) === true right (inr x) === x is_left === not . is_rightStrictly speaking,
left
and right
operations do not have to be
total. They are in our implementation: left (inr x)
does not crash
and does return some value (which is not specified however). The
implementation is straightforward: pairing the union with the
discriminator: 0 for left and 1 for the right alternative:
module Sum(U:union)(P:pairs with type t1=int and type t2=U.union) = struct type t1 = U.t1 type t2 = U.t2 type tsum = P.tpair let inl : t1 -> tsum = fun x -> U.inj1 x |> P.pair 0 let inr : t2 -> tsum = fun x -> U.inj2 x |> P.pair 1 let is_left : tsum -> bool = fun x -> P.fst x = 0 let is_right : tsum -> bool = fun x -> P.fst x = 1 let left : tsum -> t1 = fun x -> P.snd x |> U.prj1 let right : tsum -> t2 = fun x -> P.snd x |> U.prj2 endIt is just as clear that the required properties of these operations are satisfied.
A particular instance of sums are option types. They can be realized simply as
type toption = int * t let none : toption = pair 0 t_in let some : t -> toption = pair 1where
t_in
is an arbitrary value of type t
, which always
exists as discussed earlier.