Haskell Tutorial for C Programmers - Section III

Written by Eric Etheridge
version 2.0
last updated May 18, 2009

Page-Wise Table of Contents

Section III: Now Let's Really Write Functions

Part I: Did You Take That Break? Here Are Patterns

Hopefully you are now comfortable defining and using functions in Haskell using your choice of Hugs, GHC, or GHCi. Now it's time to talk about all the ways in which functions can be defined. If you are not ready, re-read some of the earlier material and set up one of the programs for using Haskell. The rest of this will only help you if you're trying it for yourself.

Each function has a type. Even functions with no parameters each have a type, i.e. global and local variables. It is not always necessary to write this type, as Haskell compilers can determine it. However, writing it is a good practice and sometimes a necessity. We are about to cover a lot of syntax, so after reading this section, it would be a good idea to read the Tour of the Haskell Syntax linked here:

http://cs.anu.edu.au/Student/comp1100/haskell/tourofsyntax.html

There are a few examples we can go through to make that page clearer. First, a simple function that walks down a list and sums its members:

sumAll :: Num a => [a] -> a
sumAll (x:xs) = x + sumAll xs
sumAll [] = 0

This is a recursive function. It takes a list of 'a' in Num and returns an 'a'. However, there seem to be two definitions for sumAll. And there are. This is how pattern matching works. The two definitions have different specifications for their parameters, and each time sumAll is called, whichever pattern matches the parameters will be evaluated. Let's look at each definition. The second definition is the clearest. '[]' is the empty list, and sumAll of an empty list is defined here as zero. The middle line has something odd about it, though. '(x:xs)' is listed as a parameter, as if we were trying to stick something on the front of a list. In essence we are, because this pattern takes apart its input. There are a few patterns which do this, and this feature of Haskell makes lists very easy to use. To restate, when '(x:xs)' is written as a parameter in a function definition, it will only match lists which have an item stuck on the front. In other words, it will match lists with at least one member. The choice of variable names 'x' and 'xs' is completely arbitrary, but since 'x' will be 'bound' to the first member of the list and 'xs' will be 'bound' to the remainder of the list, it is somewhat natural to write one 'x', and then the remaining 'xs' when describing the list.

When the pattern 'sumAll (x:xs)' matches the input, that is, when 'sumAll' is called with a list that has at least one value, the first definition will be evaluated. This will return the result of adding said first value to the result of calling sumAll on the rest of the list. Patterns are checked top-to-bottom, so when 'sumAll (x:xs)' fails to match, 'sumAll []' is checked. Since the only pattern that could fail to match 'sumAll (x:xs)' is an empty list, 'sumAll []' will definitely match, and it returns zero. This is the end condition for the recursion.

This sort of function is very common in Haskell. Pattern matching lets us avoid complicated 'switch' statements or 'if' blocks in favor of simply writing separate definitions for distinct inputs. This allows code to be clear and concise. Patterns can also be more specific. The fib example can be rewritten as follows:

fibPat :: (Num a, Num b) => a -> b
fibPat n = fibGenPat 0 1 n
	
fibGenPat :: (Num a, Num b) => b -> b -> a -> b
fibGenPat a _ 0 = a
fibGenPat a b n = fibGenPat b (a + b) (n - 1)

Again, the names are changed to avoid conflicts in the examples, adding 'Pat'. Here a literal ('0') is used to match a pattern, and that is fine, too. Also note the underscore ('_') in the first definition. The underscore matches anything, just like a variable name, but does not get 'bound' to its parameter. This can make code clearer when a parameter is not used. In this case, we do not care about the second parameter, since we are only matching against the third and returning the first.

Part II: After Patterns, Guards

Sometimes functions need more complicated work to choose between definitions. This can be done with guards, shown here:

showTime :: Int -> Int -> String
showTime hours minutes
	| hours == 0	= "12" ++ ":" ++ showMin ++ " am"
	| hours <= 11	= (show hours) ++ ":" ++ showMin ++ " am"
	| hours == 12	= (show hours) ++ ":" ++ showMin ++ " pm"
	| otherwise 	= (show (hours - 12)) ++ ":" ++ showMin ++ " pm"
	where
	showMin
		| minutes < 10	= "0" ++ show minutes
		| otherwise		= show minutes

Ignore the lines that start with 'where' for the moment. 'showTime' has only one definition clause, but it is broken up into three guards. Each guard has a boolean expression, and they are checked in order. The first which is found to evaluate to True will have its corresponding expression evaluated and returned. The function will be equal to that expression for the case that that guard is true. 'otherwise' is equal to True, and will therefore always be accepted if it is reached. 'otherwise' is a convenience, not a necessity. '++' is the list concatenation operator. It is used here for the following reason:

String :: [Char]

So all of that makes sense except for the 'where' stuff. Here I threw in something extra, and used a 'where' clause to define a local function, in this case a variable called 'showMin'. 'showMin' is a variable in the traditional sense, but it is also a function here in Haskell, so instead of an 'if' statement or 'case' statement, I used guards again to describe its two definitions.

In all, this function takes an hour (hopefully from 0 to 23) and minutes (hopefully from 0 to 59) and prints the time from them. 'showMin' is a local variable/function, defined in the where clause. Guards are used both to define 'showTime' and 'showMin'.

It is important to note that the variables defined in 'where' clauses and their cousins, 'let' clauses, are only in scope for the pattern in which they exist. So a function defined with multiple patterns can't use a 'where' clause to define a local variable across all of them.

Part III: 'If'

I mentioned 'if' statements just above, and Haskell has them, but they are always if-then-else. Haskell doesn't have much use for a function that doesn't return some kind of value, so 'if-then' doesn't work. That said, here's a simple example:

showMsg :: Int -> String
showMsg n = if n < 70 then "failing" else "passing"

Not much to it. Since 'showMsg' has a return type of String, the values in both the 'then' clause and the 'else' clause have to also be of that type. 'if' does not need to be the entire definition of a function. For instance:

showLen :: [a] -> String
showLen lst = (show (theLen)) ++ (if theLen == 1 then " item" else " items")
	where
	theLen = length lst

Part IV: Indention Syntax

You may have noticed that I use indention to indicate lines that are part of a block of source code. This is not simply good style, but it is also part of Haskell's syntax. Indentation denotes structure. Specifically, changing the indentation from one line to the next indicates that a block of code has begun or ended. Also, Haskell will not let you place the definitions for two functions one line after another. Instead it demands a blank line to indicate that the definition of a function has truly finished. This all enforces good style, and it greatly reduces the amount of junk syntax in the language, such as '{', '}', and ';'.

Part V: And Lambda Functions

Given a programmer's ability to define functions locally in Haskell, you might ask, "Is there an even more brief way of defining functions?" Unlike this tutorial, Haskell does let you finish quickly:

(\x y -> x * 4 + y) :: Num a => a -> a -> a

What is this you ask? It all starts with the '(\' right at the beginning. '\', you see, looks a little like the greek letter lambda, which happens to be the symbol used for Haskell itself:

http://www.haskell.org/

'Lambda calculus' is a branch of mathematics that deals with functions created on the fly. There's a lot more to it, such as its place in computation theory, etc. You can find out more about it other places. The point is that when you write '(\', you have started a 'lambda expression', which generally look a lot like the one above. It has '(\', then some variable names (usually short ones), a '->' arrow, and an expression which uses those variables. And of course a ')'. I'm sure you can figure out what this does. It defines a function and uses it right where it is defined. For example:

Prelude> map (\x -> "the " ++ show x) [1, 2, 3, 4, 5]
["the 1","the 2","the 3","the 4","the 5"]
Prelude>

I mention lambda functions because they come in handy, but also because they are used often in the more complicated examples. Of course, lambda functions cannot be recursive, since they would need a name in order to refer to themselves. So they are good for those cases where a function is needed only once, usually to define another function.

Part VI: Polymorphic Types and Type Constructors

The simplest Haskell program is the following:

main = return ()

The variable 'main' is a reserved word, and whatever its value is, the program executes. Here is the obligatory "Hello World" program:

main = putStrLn "Hello World"

And the type of 'main':

main :: IO ()

This is another way of saying "main is of type something or other". Well, that's how it looks anyway. What we have here are examples of two strange things at once. That happens a lot when you're learning Haskell, and that's why I'm writing such a long tutorial in the first place. '()' is a type. The only value of type '()' is written '()'. Another word for this type and its singular value is 'null'. So this type can be read "main is of type IO null". But why is it read that way? What does it mean to put one type after another? IO is not a type. The word 'IO' is only part of a type. 'IO a' is a type. 'IO' is a type constructor which takes a type as a parameter. Deep breath.

This is not the same thing as a class. When we were talking about classes, I said that functions were polymorphic, meaning that they could operate on values of any type provided that the proper functions were defined for it. You can create a type and make it a member of Num, as long as it has '+', '-', and/or '*' defined for it, as well as having equality defined. If you do that correctly, any function which has 'Num a =>' at the beginning of its type will accept your new type and everything will work fine. But 'IO' isn't a class, or a polymorphic function. This is something stranger. It is a polymorphic type.

Did that make sense? A type that takes another type as a parameter? Let's look at an example from the standard libraries:

data Maybe a = Nothing | Just a

This type can be found here:

http://www.haskell.org/ghc/docs/latest/html/libraries/base/Data-Maybe.html

This is a 'data' type definition. The values on the right are separated by '|', the pipe symbol, which can be read here as "or". This type says, "a value of type 'Maybe a' can be 'Nothing', or can be 'Just a'", that is 'Just' followed by a value of type 'a'. Here's an example using Maybe in pattern matching:

showPet :: Maybe (String, Int, String) -> String
showPet Nothing		= "none"
showPet (Just (name, age, species))	= "a " ++ species ++ " named " ++ name ++ ", aged " ++ (show age)

'showPet' has two patterns. The first matches a value of 'Nothing', the first 'data constructor' for Maybe. There are no variables after 'Nothing', just as there are no types listed after 'Nothing' and before the '|' in the type definition for 'Maybe a'. The second matches a value of 'Just', the second 'data constructor' for Maybe. 'Just' does have a tuple after it, just like in the type definition, and parentheses are used to group these two things in the pattern. The words 'Just' and 'Nothing' are arbitrarily chosen, although good choices. It is important that the first letter is capitalized. As you may have noticed, throughout Haskell, variables have lowercase first letters and types have uppercase first letters. This is a part of Haskell's syntax. Data constructors are not variables, and so the convention is extended to require their first letters to also be capitalized.

Part VII: The IO Monad

So let's get back to 'main' and its type, 'IO ()'. IO is a polymorphic type. But what is it? Unfortunately, I cannot show you its definition. 'IO a' is a part of the Haskell standard, and most of its implementation is 'under the hood'. Its implementation is also very large, and involves a lot of low-level code. 'IO' is a member of the class Monad, which means (very briefly) that it lets you write sequential-looking code using the 'do' notation which I'll show in minute. 'IO' is also short for Input/Output of course, and that means that the 'under the hood' functions of IO let you read and write the world state. So 'IO' is how Haskell interacts with the real world. Again:

main :: IO ()

Monads are odd things, and a function of type 'IO a' will perform an 'IO' operation and return a value of type 'a'. In this case, main will return a value of type '()', namely '()', or 'null'. So main is always "an IO operation that returns null". Here is short main function, with types. Note that it is never necessary to specify the type of main.

someFunc :: Int -> Int -> [Int]
someFunc .........

main = do
	putStr "prompt 1"
	a <- getLine
	putStr "prompt 2"
	b <- getLine
	putStrLn (show (someFunc (read a) (read b)))

Here's what's going on: The IO Monad binds blah blah blah that's all in the other tutorials and yet you're reading this. Let's try again.

Here's what's going on: All of that 'variable equal at all times' stuff I mentioned in previous sections doesn't work with I/O. After all, you have to call 'getLine' several times, and it doesn't always return the same value. You can't say that the function 'getLine' is "equal" to anything, since its value may be different every time it is referenced. This is in contrast to a value, like '4', which will always be the number 4. So IO is hard to do if you want a language that has true "equality" when you write an assignment. This has actually been the biggest stumbling block for functional language design since the idea of functional languages arose. Haskell has a solution.

Most functional languages break their 'functional model' to handle IO, but of course Haskell does something weirder. There's an obscure branch of mathematics, namely monads, that concerns state transformation functions. The authors of Haskell used it to let you write mathematical functions that denote changes in the world's state without breaking that nice equality stuff. [How's that for simple? See how I made it all make sense by appealing to a higher authority? Haskell works by magic! -Eric]

Briefly, the IO monad takes the state of the world, alters it with some function, and passes it forward to be the input for another IO function. In the example, the functions 'putStr', 'getLine', and 'putStrLn' are members of IO. 'Binding' them to 'main' by using the 'do' notation means that when 'main' is evaluated, they will be evaluated in order, just like you'd expect. Thus the main function of the program above will put a prompt on a screen and read in a string, and then do it again, then print something. The '(read x)' function takes a string and returns a number of whatever type is needed, assuming the string parses. The 'show' function will take the result from 'someFunc' and turn it into a string so that putStrLn can display it. I will return to this code and make it useful later.

The description of IO may sound like 'imperative' behavior, and it is, with a twist. Input and output need to happen in a certain sequence in a program, but a mathematical function's parts can be determined in any order as long as it eventually finishes. Sequence doesn't matter for the evaluation of a mathematical definition, and most Haskell code is written in that manner. So what about input and output? Can they be mathematically described? As it turns out, they can. Monad theory means that a function can be "equal" to the act of transforming a state. The mathematical framework for monads captures the idea of sequenced actions, and that let Haskell's creators give 'IO a' a type that could be used just like any other type. When a sequence of monad functions are evaluated, for instance when 'main' is evaluated, the monad functions are applied in sequence to the initial state. Before that evaluation, Haskell gets to treat functions of type 'IO a' just like anything else, i.e. as expressions waiting to be evaluated. That means that the 'IO a' type doesn't have to be a special, builtin part of Haskell, even though input and output have to be part of the standard libraries and implemented with system calls. It does mean that doing IO in Haskell means using a polymorphic type as well as a mathematical theory that is rather obtuse. That's why IO is so far down in this tutorial.

All Haskell functions carry around this mathematical framework ("equality") and unless otherwise noted, they are lazy. This means that the only thing that forces evaluation is binding an 'IO monad' function to 'main'. Nothing is ever evaluated unless it is going to be the return value of an evaluated 'monad' function, or unless it is needed to compute such a return value. That the 'IO monad' forces evaluation isn't really important, but it will explain some odd behavior if you start doing "out there" coding in Haskell. There are eager, or 'strict' functions in Haskell which, when evaluated, will first fully evaluate their parameters. These are usually marked in the documentation.

It is possible to use the IO monad, illustrated above, to write imperative-style programs in Haskell. That is a direct result of the behavior of the IO monad. Doing so would be inappropriate, because Haskell has much more power than that. I say that because I have not seen any other tutorial do so, and I think it is important.

If you write programs that call outside libraries, you'll deal with the IO monad a lot. Everything that deals with the rest of the computer is part of the IO monad, such as driver calls, network libraries, file access, threading, and system calls. There are other monads in Haskell's libraries, and you can also write your own. Writing your own monad for a major project will probably be the other hard thing you need to do to fully understand Haskell, after understanding 'foldr' and variants. It's pretty hard, but not because it's complicated. Writing your own monad is hard because there's so little to do that you'll have to work to understand why that little bit of code is all you need. This tutorial ends with an extended example that demonstrates how to write a monad from scratch, but it may not be necessary for a general Haskell program to learn that.

Part VIII: Dissecting the IO Example

Let's return to basic IO monad usage. Here's the example main program again:

someFunc :: Int -> Int -> [Int]
someFunc .........

main = do
	putStr "prompt 1"
	a <- getLine
	putStr "prompt 2"
	b <- getLine
	putStrLn (show (someFunc (read a) (read b)))

This is a general framework for learning Haskell. Aside from altering the number of parameters to 'someFunc' (perhaps zero), this is all you will really need for a main function until you feel comfortable with Haskell. It is good enough for most simple tasks, and you can use it to test out ideas in GHC by replacing the definition of 'someFunc' with whatever function you're trying to write. You won't need it for working in Hugs or GHCi, but you will if you compile with GHC. In Hugs and GHCi, you only need to make a source code file that includes someFunc's definition.

What I said earlier about the indentation technique removing extraneous syntax isn't quite true. '{', '}', and ';' do exist in Haskell. They are an alternative to the the whitespace-defining notation used here, and are sometimes but rarely preferable. The whitespace method is very simple, and the example shows most of its syntax. Blocks begin on tabbed or spaced lines and further indention is used for separate blocks. Using 'do' signals to the compiler to expect indentation based on whitespace unless you then use '{', '}', and ';'.

The '(read x)' items use the 'read' function found here:

Prelude: the 'read' function

'someFunc' is whatever you want to test. Since its return value is a parameter to 'show', 'someFunc' can be defined with a variety of return types, such as 'Int', '[Int]', 'String', or even '(String, [Int])'. The type of 'show' is given here:

Prelude: the 'show' function

'show' is a class method defined for members of the class 'Show'. This is just like how '+' and '*' are class methods defined for members of 'Num'. You can see those here:

Prelude, Section: the Num class

The types of the IO functions, specifically 'putStr', 'getLine', and 'putStrLn', are given here:

System.IO, Section: Special cases for standard input and output

and also here, in the Prelude, which is why they are in scope normally:

Prelude, Section: Simple I/O operations

As you can see from the documentation, when putStrLn is applied to a String, you get a value of type 'IO ()'. The 'null' means that no useful information is returned by the function. It altered the state of the IO monad by putting something on the screen, and no value comes back from it.

The '<-' arrow is used to bind the result of an IO operation to a variable. 'getLine' has a type of 'IO String'. The 'do' notation says that a monad function like 'getLine' can be prefaced by a variable name and the left arrow. That variable name comes into scope at that point in the function and when the monad function is evaluated, and its return value will be assigned to the variable. If a function is not prefaced by a left arrow and a variable, then its return value is ignored, although whatever that function did to the state carried by the monad still happens. For instance, if you do not bind the return value of 'getLine' to a variable, you don't store the line the user typed in, but it is still read in and buffers are messed with, etc., meaning that another getLine won't return it.

There is one exception to this 'ignoring' of return values not bound to variables. It is no accident that the last line in the sequence has the same return type as main. When using monads to compose sequences of operations, the last line in a function must have the same IO type as the function itself. When this is not natural, use 'return' with an appropriate value:

getName1 :: IO String
getName1 = do
	putStr "Please enter your name: "
	name <- getLine
	putStrLn "Thank you.  Please wait."
	return name

'putStrLn' has a type String -> IO (), but IO () doesn't match the type of 'getName', which is IO String. 'return name' is used to end the function with a function of the proper type, and to return the data we wanted to return. Without the reply message, this function can be written much more succintly:

getName2 :: IO String
getName2 = do
	putStr "Please enter your name: "
	getLine

As you can see, calling an IO function which returns a value is the same as binding that value to a variable and then returning that value.

This whole monad issue looks imperative, and in some ways, it is. Once you call 'someFunc', you get away from all that, and Haskell's laziness and equality become the norm. Is all of this necessary? Is it a good idea to have imperative-looking code calling lazy, functional code? In my opinion, it can be. You get to specify the order of execution for those operations that require it (such as IO), and you get powerful functional tools the rest of the time. You also get a hurt head. So take a break. I need one from writing all of this. The next section will be on advanced type declarations.