Haskell Tutorial for C Programmers - Section VI

Written by Eric Etheridge
version 2.0
last updated May 18, 2009

Page-Wise Table of Contents

Section VI: Extended Examples

Part I: Intro to Examples

This section covers the extended examples I have added to this tutorial. The entire set of examples can be downloaded as a .zip or .tar.bz2 archive.

Again, here are the source files and text for all examples for this tutorial, including all those in the sections and the large examples at the end, zipped using bzip2: bzip2 of sources, 28K, and zipped as a zip: zip of sources, 43K.

Lists of the relevant files are at the beginning of each part. As a reminder, for the examples in the sections, see the following:

ExampleSectionsCode.hs
ExampleSectionsTry.txt

The examples use concise notation where the final appearance is clear, but this does not mean that it will be immediately clear to Haskell novices. Haskell encourages lengthy statements that can become difficult to comprehend, and I have attempted to break these up into smaller components that can be easily identfied and labeled.

There are two extra pieces of syntax in the examples that I have not already covered in this tutorial. The first is the '$' operator. In some cases, I use the '$' operator to avoid deeply nested parentheses.

sumListedv1, sumListedv2 :: Num a => [a] -> [Int] -> a
sumListedv1 nums indices = sum (map (nums !!) indices)

sumListedv2 nums indices = sum $ map (nums !!) indices

'$' is a do-nothing operator; it's only point is to alter the precedence of the expressions on its left and its right. It takes two values as parameters and applies the first the the second. This is useless, mathematically, but it can let a programmer avoid parenthesizing lengthy expressions. I learned it from using HOpenGL, since many of the state altering function takes several lines of parameters.

The second piece of syntax is the effect of backquotes, '`', which are below '~' on many keyboards. When placed around a function which accepts two parameters, backquotes make that function act like an operator.

printIsMemberv1, printIsMemberv2 :: (Eq a, Show a) => [a] -> a -> IO ()
printIsMemberv1 items item = if elem item items
	then putStrLn $ (show item) ++ " is a member of the list."
	else putStrLn $ (show item) ++ " is NOT a member of the list."

printIsMemberv2 items item = if item `elem` items
	then putStrLn $ (show item) ++ " is a member of the list."
	else putStrLn $ (show item) ++ " is NOT a member of the list."

'elem' takes two parameters, a single item and a list of the same kind and returns True if the single item is present in the list, otherwise False. In the second definition ('printIsMemberv2'), backquotes are use to use `elem` like an operator between its two parameters.

I could have written my examples without these extra syntax niceties, but that would make them less like real code. If you've come this far, I believe that you can handle it.

The examples aren't in order of importance. Skip around if you like, or skim the source code for style hints. Some of the examples involve complicated topics, but they are hopefully documented well enough that passing familiarity is sufficient.

Part II: Calculating Pi

'pi' is a builtin constant in Haskell, as it is in many languages. This example demonstrates some Haskell techniques by calculating pi in various ways. The easiest way to use this example is to load the source into ghci and notice how long it takes each value 'pi1', 'pi2', and 'pi3' to evaluate the first time. The files:

ExamplePiCalc.hs

In Haskell, 'pi' is defined in the 'Float' class here:

Prelude: the 'Float' class

Much of the math was taken from this page:

Wikipedia's article "Computing Pi"

The values generated by the code differ slightly from the most accurate value possible. In two cases, the last digit is slightly off because the math used involves rounding the last digit. In the case of 'pi1', this is because greater accuracy is unattainable using that method without excessive computation.

Once one of the constants has been evaluated, it will not be reevaluated if requested a second time. You can see this in ghci by noting that 'pi1' takes several seconds or more to print the first time, but virtually no time after that. This is an example of lazy evaluation applied to constants.

Part III: MergeSort Variations

This example provides the functions 'mergeSort1', 'mergeSort2', and 'mergeSort3' as well as some simple test lists 'list1', 'list2', and 'list3'. The easiest way to use this example is to load the source into ghci and try the different sort functions on various lists. Also try the function 'sort' in Data.List for comparison. The files:

ExampleMergeSort.hs

This is not practical code, since the default 'sort' function is efficient for lists, but the variants illustrate a lot of the basics of Haskell. The default sort function can be reached in source code by putting "import Data.List" at the top and in GHCi and Hugs by using ":m Data.List". You have to load a module before a file in GHCi.

For those unfamiliar with merge sort, the basic idea is that you sort two halves of a list separately, then do a simple merge operation the makes one sorted list from the two sorted halves. How do you sort the halves? By running merge sort on them, too. This splitting continues until you get to a manageable number of items, in this case down to zero or one. Then you merge back together.

The versions of merge sort here do not vary that behavior. What is changed is some optimizations that try to avoid as much recalculation as possible. To split a list in half, you need its length. The first and simplest variant calculates that for every call, which arguably uses some redundant computation.

The second variant tries to avoid this redundancy by passing around the length of the list and dividing it by 2 at ever call. This would result in some slightly lopsided merges, but no real loss of efficiency there. The gains would probably outweigh the loss.

The third variant goes overboard and tries an optimization that is likely worse than the second variant. This is an example of a good idea which needs close examination. The third variant forms two halves by placing individual items alternately into two lists. This avoids recalculating the length with each call and avoids passing an integer and doing math with each call. However, about as much work is necessary to split the list in this manner as in the other methods, possibly more.

Part IV: Regular Expressions and Finite Automata: Overview

As I said before, Haskell's power only shines when it is used for larger projects. This makes it difficult to demonstrate Haskell's ability without writing code on topics that will be unfamiliar for some students. The smaller examples above and in the other sections of the tutorial are aimed at broad audiences, but such good examples are not necessarily available for more complex demonstrations.

At the risk of alienating some readers, I have written a complex example covering regular expressions, finite automata, Haskell's module system, and writing your own monads. This is a big example and it covers a lot of ground. The files involved are:

ExREtoFAconvertDFA.hs
ExREtoFAconvertNFA.hs
ExREtoFAconvertRE.hs
ExREtoFAconvertREv1.hs
ExREtoFAconvertREv2.hs
ExREtoFAconvertREv2monad.hs
ExREtoFAconvertREv3.hs
ExREtoFAconvertREv3monad.hs
ExREtoFAexampleTests.txt
ExREtoFAexecution.hs
ExREtoFAmain.hs
ExREtoFAtypes.hs

The code in this example takes a string and parses it according to a subset of the POSIX standard for regular expressions (REs), converting it into a structure suited for testing input. That structure is then converted to a non-deterministic finite automata (NFA) which will accept and reject equivalent inputs. Finally, that NFA is converted to a deterministic finite automata (DFA), although no great effort is made to minimize the number of states.

If you don't know what either regular expressions or finite automata are, these are covered in detail in any undergraduate computer science education. If you are currently in school, you will get detailed instruction on them, probably soon. If not, these are topics you can learn at your leisure. I do not yet know if this example can be understood without a complete knowledge of the material. Please send me feedback about that.

Whether or not you are familiar with these concepts, you can test the code with the examples in ExREtoFAexampleTests.txt. First you give a regular expression in POSIX notation as an input, then you repeatedly give strings to test against it. Use Ctrl-C to quit.

The statements are printed as soon as each operation is done, so if a statement fails to appear, that operation went into an infinite loop. This occurs in some cases with the first implementation of running an NFA and is difficult to prevent in any implementation of running a regular expression directly. For that reason, and because DFAs can be executed more quickly, in practice REs (and NFAs) are converted to DFAs before use. My implementation uses a counter to test for deeply recursing execution so I can return a value indicating failure.

Part V: Regular Expressions and Finite Automata: Types

The 'types.hs' file for this example provides the types used in the rest of the files. In addition, the module provides functions that validate members of those types and functions which compute useful information from members of those types. This is the only module which does not include any other modules from the example. Files:

ExREtoFAtypes.hs

The import statements in this file (and the rest) do not bring into scope everything from the standard modules they import. Instead, they only include the functions and types used in the code. Doing this helps track what functions are used where and prevents accidental clashes or similarities between names.

The comments in this file also include a summary of the POSIX rules implemented.

Part VI: Regular Expressions and Finite Automata: String to RE, and Monads

The base file for this part of the example includes and re-exports three functions. There is one for each version implemented of the method to convert a string to the POSIX regular expression that it represents. Files:

ExREtoFAconvertRE.hs
ExREtoFAconvertREv1.hs
ExREtoFAconvertREv2.hs
ExREtoFAconvertREv2monad.hs
ExREtoFAconvertREv3.hs
ExREtoFAconvertREv3monad.hs

The first method, implemented in 'convertREv1.hs', is a set of functions which work on the input string. They pass it around along with the pieces of regular expressions under construction. There is nothing wrong with this method, and in fact I only added the other methods as a final touch to the example. However, there is a lot of potential for mistakes when writing the code, because the precisely correct string must be passed forward, simulating using up the right amount of the input.

This 'passing forward' nature made the conversion suitable for reimplementation as a Monad. Versions 2 and 3 of this conversion show how you can make your own monad. You create the type, not an intuitive task when you need to carry a state. You then write the instance declaration, the accessor functions, and functions which compose those accessors.

Why use monads for this task? Note the types of the recursive functions:

convertSREouter1 :: String -> MyRegExp -> Maybe (MyRegExp, String)
convertSquareBlock1 :: String -> Maybe (MyRegExpItem, String)
convertNumberBlock1 :: String -> Maybe (Int, Int, String)

In each case, the inputs are returned, modified. This is a textbook case for creating a monad, or would be if any book existed with that hint. Using a monad lets us separate our changes to state, passed forward to the next function called, from our return values, passed upward to the calling function.

Since a String is passed forward in every case, that will be the monad's state, its internal value. The rest of the data will be normal return values. The functions have almost exactly the same structure, but the code is clearer.

The instance was very hard for me to write. There are two issues, the first being that the syntax for Instance declarations is tricky to find. The second is that, even armed with the type of (>>=), the functions are difficult to write. Let's analyze them. Here's what GHCi says about (>>=):

Prelude> :t (>>=)
(>>=) :: (Monad m) => m a -> (a -> m b) -> m b

At least we can tell that it has two parameters, one of type 'm a' and one of type '(a -> m b)'. The immediate question, if you can understand the types, is, "Why is the monad type passed a type variable as if it is a type constructor?" It seems like, even if you define a type that doesn't take a variable, if you define it as an instance of Monad, it has to take a type variable.

The truth is that a monad must take a type variable. This variable type is used to pass back results.

What is not mentioned anywhere is how to handle passing a state forward. A monad data type does not contain a state. If you use a state, your monad type must instead be a function which takes a state and then returns it. In this way, the function describes how to transform a state. For instance, the following code does not work:

newtype REmaker2 a = RE2 (String, a)

Where things fail is that you will find no way for the bind operation (>>=) to take the existing state as an input. This is not fixed by altering the bind method or the function to be bound, but only by altering the type of the monad itself so that a later instance takes the new state as a parameter. This is not intuitive for me and not something that I can do without the examples I listed.

You also have to wrap up your type for a Monad in a 'newtype' declaration (or a 'data' declaration) so that it works in an Instance declaration. Haskell will not let you define an instance for a type other than those.

When I was working with this example on monads, I used a lot of help from here:

http://www.engr.mun.ca/~theo/Misc/haskell_and_monads.htm

The third version is very similar to the second, but show how some extra access functions can make code simpler. Both files make use of a separate file to store their types. The export list for those sub-modules specifically fails to export the constructor for the monad type. This forces the functions outside that sub-module to only modify and use the monad using the exported accessor and run functions. This is done to make the data type abstract and keep code clean.

It is worth noting that even with the data type in a separate file, the code is longer when implemented using a Monad. The syntax of monads is slightly more verbose, and can create some awkwardness in smaller projects such as this. As a project scales in size, the gains increase when using monads in cases for which they are appropriate. Many of the gains come from lack of errors in carrying state forward, which is most of the point in the first place.

Part VII: Regular Expressions and Finite Automata: Conversion and Execution

The remaining conversion files for this example implement some standard algorithms for converting REs to NFAs and NFAs to DFAs. The code is complex and shows a lot of the power of Haskell. The majority of the source code is comments rather than code. In addition, the code was written as clearly as possible rather than focusing on brevity. Hopefully, this will help the reader follow along. Files:

ExREtoFAconvertDFA.hs
ExREtoFAconvertNFA.hs

The execution file for this example uses multiple methods to test input strings against the resulting structure from each point in the conversion process. The key distinctions are that some methods track the set of reachable states simultaneously, accepting when any reachable state would accept, and other methods make a separate recursive call for all branches. The branching method is prone to infinite loops. The fact that REs and NFAs branch, while the end result, DFAs, do not branch, illustrates why this conversion is usually performed. Files:

ExREtoFAexecution.hs

In the main program for this example, these execution methods are ordered and displayed. Extensive use is made of basic operations in the IO monad, and this file can serve as an example for that as well, particularly the need to flush buffers after writes which do not end in a newline. Files:

ExREtoFAmain.hs

Part VIII: Solving 2CNFs in Linear Time

The final example, currently, is an abbreviated demonstration of list comprehensions and the 'data' data type declaration. The program solves boolean equations in conjunctive normal form with two variables each, also called 2CNF. This is widely known as a special case of the NP-Hard CNF problem. The files:

Example2CNF.hs

The source code for this example indicates several ways to expand on it.

Part IX: In Closing

There are some extremely complex graph algorithms that I would like to provide as examples. For now, I'm done here. Thanks for reading all of this.

Eric Etheridge
[email protected]