The generators in our story come from the formula yield = exceptions + non-determinism. We will mostly use the Haskell
implementation because of more revealing types;
porting to OCaml or other languages is straightforward. The basic
generator interface defines one (abstract) type GenT e m a
and two functions, for yielding and iterating -- along with the observation
function.
-- Basic interface
type GenT e m a -- abstract
instance Monad m => Monad (GenT e m)
instance MonadTrans (GenT e)
type LoopBody e m = e -> m ()
yield :: Monad m => e -> GenT e m ()
for_loop :: (MonadTrans t, Monad m, Monad (t m)) => GenT e (t m) () -> LoopBody e (t m) -> t m ()
observe :: (MonadTrans t, Monad m) => t m a -> m a
Gen e m a is the type of computations that yield
values of the type e in the monad m, eventually returning the
value of the type a. The particular implementation of generators used here
requires the monad m be MonadPlus. Likewise the monad transformer t
that appears in the signatures of for_loop and observe must be LogicT. Other implementations of the basic generator interface
lift these requirements.
The primitive generator yield (along with the fact that Gen e m is a monad)
lets us write other generators. In this article, we will use two.
One is the generator that enumerates values within a given range.
Compared to the familiar Enum a => enumFromTo :: a -> a -> [a],
the generator does not create any list, even lazily.
Rather, it yields elements as they are being enumerated:
enumFromToG :: (MonadPlus m, Enum a) => a -> a -> GenT a m ()
enumFromToG xbeg xend | fromEnum xbeg <= fromEnum xend =
yield xbeg >> enumFromToG (succ xbeg) xend
enumFromToG xbeg xend = return ()
The second sample generator yields labels from an
in-order traversal of a tree with integer-labeled leaves. We have
seen this generator many times in an earlier page on generators. type Label = Int
in_order2 :: (MonadIO m, MonadPlus m) => Tree -> GenT Label m ()
To perform the iteration, we connect a generator with the consumer of the generated values, that is, with a loop body. The sample loop:
sl1 = for_loop (enumFromToG 'a' 'e') $ \i -> lift . print $ icorresponds to the for-loop in Pascal
for i:='a' to 'e' do writeln(i)The second sample loop
for_loop (in_order2 (make_full_tree 3)) $ \i -> lift . print $ icannot be idiomatically expressed in Pascal. It can be written however, roughly in the same form, in CLU, which had generators. To execute the overall iterative computation we have to
observe it,
in a base monad such as IO.
The basic interface is an abstraction for simple generators, such as those
in CLU, which suffice for many cases. Parallel loops however need
more powerful generators, described by the advanced
interface below. The basic interface was the abstraction of the advanced one.
The advanced interface exposes the implementation of generators as
yield = exceptions + non-determinism.
-- Advanced interface
type GenT e m a = MonadPlus m => ErrorT e m a -- now the concrete type
yield :: MonadPlus m => e -> GenT e m ()
yield x = raise x `mplus` return ()
for_loop :: (LogicT t, MonadPlus (t m), Monad m) => GenT e (t m) () -> LoopBody e (t m) -> t m ()
for_loop gen body = iter Nothing $ catchError' gen >>= body
catchError' :: MonadPlus m => GenT e m () -> m e
iter:: (LogicT t, MonadPlus (t m), Monad m) => Maybe Int -> t m () -> t m ()
msplit :: (LogicT t, MonadPlus (t m), Monad m) =>
t m a -> t m (Maybe (a, t m a))
Since generators combine exceptions (the ErrorT monad) and non-determinism
(the LogicT monad), they inherit the operations of those monads.
Specifically, catchError', an instance of catch, obtains the value
yielded by a generator. The combinator iter, a particular case of
bagOfN, takes a non-deterministic computation t m () and
executes all of its choices, one-by-one. If the first argument
of iter is Just n, only the first n choices are executed and the rest
are discarded. Effectively, iter `runs' a non-deterministic computation,
actualizing all its non-deterministic choices. For example, desugaring
the first sample loop gives the following advanced code: iter Nothing $ do
i <- catchError' $ enumFromToG 'a' 'e'
lift . print $ i
The loop body non-deterministically obtains the value yielded by the
generator, and prints it. The iter Nothing line -- the loop header, so
to speak -- makes the body run through all choices, that is,
all generated values.Generators: yield = exceptions + non-determinism
The detailed description of the generators used on this page
GenLoop.hs [5K]
The complete code for this article
nl1 = observe $
for_loop (in_order2 (make_full_tree 3)) $ \i ->
for_loop (enumFromToG 'a' 'e') $ \j ->
lift . print $ (i,j)
The print-out confirms -- although we have never doubted -- that for each value
of i, j ranges over all its possible values.
Since the inner j-loop is within the i-loop body, the variable i
scopes over the entire inner loop, including its generator. That is,
the inner loop generator may refer to the outer-loop variable, making
loop dependence obvious:
nl2 = observe $
for_loop (in_order2 (make_full_tree 3)) $ \i ->
for_loop (enumFromToG i 3) $ \j ->
lift . print $ (i,j)
The binding scope of a loop variable matches the scope of iteration
of the corresponding loop.
The story picks up as we desugar the nested loop nl1 and rearrange the code
using equational reasoning. We obtain a different way of writing
nested loops, now using the advanced interface:
nl1' = observe $
iter Nothing $ do -- loop header
i <- catchError' (in_order2 (make_full_tree 3))
j <- catchError' (enumFromToG 'a' 'e')
lift . print $ (i,j) -- loop body
which we will write in the abbreviated form as nl1' = observe $
iter Nothing $ do
i <- e1
j <- e2
body
The binding and control scopes are a bit harder to see now.
Recall that the Haskell notation do
i <- e1
j <- e2
body
is a syntactic sugar fore1 >>= (\i -> e2 >>= (\j -> body))exposing that
i scopes over body and over e2 -- as was the
case in the original nl1. The repeated do-bindings corresponds to
what other languages call `nested let' or `repeated let': let i = e1 in
let j = e2 in
body
The control scope also becomes clear if we look at the evaluation context
of the first generator e1:
do
i <- []
j <- e2
body
The context includes the generator e2. The non-deterministic
expression e1 replicates its context; it plugs the first yielded
value into the first context replica, the second yielded value into
the second replica, etc. Each replica has the copy of e2, the generator
for the j-loop, which runs anew in each replicated copy. The body is
thus executed at each point in the Cartesian product of the domain of i and the domain of j. The loop nl1' indeed has the same behavior
as the original nl1. The binding scope of a loop variable and the
control scope -- the evaluation context of the generator --
still match.
Desugaring of nl2 re-iterates the point that the second generator
is within the binding scope of the loop variable i associated
with the first generator.
nl2' = observe $
iter Nothing $ do -- loop header
i <- catchError' (in_order2 (make_full_tree 3))
j <- catchError' (enumFromToG i 3)
lift . print $ (i,j) -- loop body
The binding scope is lexical and hence is easier to see than the
control scope. Since the two scopes match, we can tell which generator
holds control over which by examining the binding scopes of their
loop variables.
Although the desugared form of for_loop is less pretty to look at,
it is more insightful. There is only one binding form for loop
variable: do-binding, which is just the lambda-binding. Lambda
provides binding and generators provide iteration. The binding scope
of a loop variable matches the control scope of the iteration.
let- or do-bindings
with generators as binding expressions correspond to nested loops.
Many programming languages offer so-called parallel bindings,
such as the following OCaml form:let i = e1 and j = e2 in bodyAlthough
i still scopes over body it no longer scopes
over e2. If e2 contains an occurrence of i, it would
remain free. The following two expressions should help see the difference
in binding scope: let i = 1 in
let i = i + 1 in let j = i + i in (i,j)
let i = 1 in
let i = i + 1 and j = i + i in (i,j)
The former, with all nested lets, evaluates to (2,4); the latter
evaluates to (2,2).
The bindings are called parallel because e1 and e2 no longer have
any data dependency and may be evaluated in any order: e2 can be
evaluated without waiting for the value to be bound to i.
The expressions e1 and e2 may even be evaluated in parallel, and the
program should not -- ought not to -- notice.
Our story comes to a decisive point. Can we realize parallel bindings,
first in Haskell, to maintain not only data independence of e1 and e2 but also their control independence? Can we use parallel bindings
to implement parallel loops?
Parallel let such as let i = e1 and j = e2 in body
may be regarded, in the first approximation, as a syntactic sugar for (fun (i,j) -> body) (e1,e2). The Haskell implementation starts off easy: pair e1 e2 >>= \ (i,j) -> body. What remains is to determine the zip-like combinator to pair-up two generators. The naive implementation:
seq2G :: Monad m => m t -> m t1 -> m (t, t1)
seq2G e1 e2 = do
i <- e1
j <- e2
return (i,j)
alas is not satisfactory: e2 is in the evaluation context of e1, and
hence control-dependent on e1. Somehow we have to exclude e1 and e2 from the evaluation contexts of each other, we have to isolate
non-deterministic effects of e1 and e2 and prevent them from
affecting each other.
Fortunately, LogicT lets us run a non-deterministic computation
in an isolated `sand-box'.
msplit :: (LogicT t, MonadPlus (t m), Monad m) =>
t m a -> t m (Maybe (a, t m a))
That is, msplit e1 runs e1 in the sandbox and reports the outcome: Nothing if e1 fails or Just (x,e1') if e1 produced x;
the computation e1' represents the remaining non-deterministic choices.
Technically speaking, msplitreifies the non-determinism effect
as a Maybe (a, t m a) value. This sandbox isolation is exactly what is
needed for the parallel combinator: par2G :: (LogicT t, MonadPlus (t m), Monad m) =>
t m t1 -> t m t2 -> t m (t1, t2)
par2G e1 e2 = do
iR <- msplit e1
jR <- msplit e2
case (iR,jR) of
(Just (i,e1'), Just (j,e2')) -> return (i,j) `mplus` par2G e1' e2'
_ -> mzero
We determine the first choice of e1 and, separately, the first choice
of e2, and pair them up as the first choice of the par2G e1 e2.The story concludes with the parallel loop
pl1' = observe $
iter Nothing $ do -- loop header
(i,j) <- par2G (catchError' (in_order2 (make_full_tree 3)))
(catchError' (enumFromToG 'a' 'e'))
lift . print $ (i,j) -- loop body
Lexically, the two generator expressions are not in scope of either
loop variable binding. The print-out shows the two generators indeed
run in parallel. Neither of the two generators is in the control
scope of the other. Binding and control scopes match again.
Our pairing combinator par2G does not report which of the two generators
finished first. We describe a more general combinator below.
The Haskell and OCaml generators derived on this page effectively
run on the spaghetti stack, one may say, which they inherit from
the implementation of non-determinism. The Haskell and OCaml
generators hence can run side-by-side. Recall that
our generators can be represented as -- reified into -- a lazy stream
(see msplit). Writing a zipWith-like function to process two
streams in parallel is straightforward then.
We demonstrate the parallel composition of Haskell and OCaml generators on the famous ``same-fringe problem'', determining if two binary trees have the same fringe, that is, yield the same sequence of labels when traversed in a particular order. Crucially, we must stop the traversal as soon as the mismatch is found, returning it. To make clear the early traversal termination, we print the labels as we traverse the tree. The printing operation makes the traversal effectful and hence precludes lazy lists.
We solve the problem by defining the parallel loop -- the
zipWith-like combinator for streams. We show the Haskell code first.
To let the zipping function deal with the streams of unequal size, we
make its argument types to be Maybe a, with Nothing
signifying stream's end:
zipWithL :: (Monad m, LogicT t, MonadPlus (t m)) =>
(Maybe a -> Maybe b -> c) -> t m a -> t m b -> t m c
zipWithL f m1 m2 = do
r1 <- msplit m1
r2 <- msplit m2
case (r1,r2) of
(Nothing, Nothing) -> return $ f Nothing Nothing
(Just (v1,_),Nothing) -> return $ f (Just v1) Nothing
(Nothing,Just (v2,_)) -> return $ f Nothing (Just v2)
(Just (v1,t1),Just (v2,t2)) -> return (f (Just v1) (Just v2))
`mplus` zipWithL f t1 t2
The function same_fringe below takes two generator expressions and
runs them side-by-side, generating the stream of mismatches.
The one-element stream of (Nothing,Nothing) denotes the complete match. same_fringe :: (Monad m, LogicT t, MonadPlus (t m), Eq a) =>
ErrorT a (t m) () -> ErrorT a (t m) () -> t m (Maybe a, Maybe a)
same_fringe m1 m2 = zipWithL (,) (catchError' m1) (catchError' m2) >>= check
where
check r@(Nothing,_) = return r
check r@(_,Nothing) = return r
check r@(Just x,Just y) | x /= y = return r
check _ = mzero
The following function compares the in-order sequences of labels of
two trees, returning the first mismatch. sfringe_test :: MonadIO m => Tree -> Tree -> m (Maybe Label, Maybe Label)
sfringe_test t1 t2 = observe $ same_fringe (in_order2 t1) (in_order2 t2)
The generator in_order2, described earlier on this page,
prints the trace of the traversal. The trace reveals that the two
trees are indeed traversed side-by-side; the traversal stops
as soon as the first mismatch is found. We may also print, say,
the first three mismatches.The OCaml implementation is the same. We define a parallel loop over two generators:
let for_loop_2 p gen1 gen2 body : unit =
let rec loop = function
| (Nil,Nil) -> body None None
| (Cons (v1,_),Nil) -> body (Some v1) None
| (Nil,Cons (v2,_)) -> body None (Some v2)
| (Cons (v1,t1), Cons (v2,t2)) ->
body (Some v1) (Some v2); loop (t1(),t2())
in loop (msplit p gen1, msplit p gen2)
employing it to write the generator for mismatches of two other generators
gen1 and gen2: let pmismatch : ('a option * 'a option) stream Delimcc.prompt = new_prompt ()
let same_fringe p gen1 gen2 =
for_loop_2 p gen1 gen2 (fun x y ->
match (x,y) with
| (None,_) | (_,None) as r -> yield pmismatch r
| (Some x,Some y) as r when not (x = y) -> yield pmismatch r
| _ -> ())
We may run same_fringe as any other generator, to obtain the first
mismatch or the first n mismatches.test_gen.ml [7K]
The complete code for the OCaml examples.
oleg-at-okmij.org
Your comments, problem reports, questions are very welcome!
Converted from HSXML by HSXML->HTML