Haskell Tutorial for C Programmers - Section I

Written by Eric Etheridge
version 2.0
last updated May 18, 2009

Page-Wise Table of Contents

Section I: What the Heck is Going On?

Part I: Haskell's Oddity

To begin, Haskell has no update operator. If that sentence did not make sense, then please keep reading, because this tutorial was written with you in mind. By 'update operator', I mean that the following does not happen in normal Haskell:

int a
a := 4
print a
a := 5
print a

> 4
> 5

The above programming style, i.e. 'making a variable, putting data in it, using it, then replacing the data in it, using it again' does not happen in normal Haskell. Those of you who have used LISP or Scheme will be familiar with this concept, but I am sure that the rest of you are probably baffled. Here is how Haskell works, again in pseudo-code:

print a
	
int a
a = 5
	
> 5

int a
a = 5
	
print a
	
> 5

The order of these actions does not matter. There is also a reason that the first example used ':=' and the second example used '='. In 'imperative' languages, storing data is an operation, and it happens in a sequence. In 'funtional' languages like Haskell, the equal sign means an exact definition. In other words, each variable is equal to its value not only after the assignment statement is reached in sequence, but in fact at all points during execution.

Some of you may be saying, "That's nice, Eric, but what good is a language where everything is hardcoded? Wouldn't I have to define every variable with its correct value as I coded? Isn't 'computing' values the whole point of a 'computer'?" And you would be right; knowing results ahead of time would make computing weird. The 'redeeming' feature of Haskell is that you don't need to store data to return a result.

I am going to put many pauses in this tutorial because learning Haskell hurt a lot, at least for me. I needed breaks, and my brain hurt while I was trying to understand. Let's look at that statement again: you don't need to store data to return a result. I'll illustrate. Here is an example of a function in C:

int foo (int bar) {
	int result;
	result = bar * 10 + 4;
	return result;
}

The important part is the expression in the middle. It can also be written as follows:

int foo (int bar) {
	return bar * 10 + 4;
}

These are the same, but the second is shorter and clearer. With a function like this, you could state the following: "The value of foo(x) is equal to (x * 10 + 4)." Or, more simply, "foo(x) = x * 10 + 4". I know you're saying, "Most functions aren't that simple." That is true, but bear with me. Haskell has much more powerful tools for writing functions that most other languages, and a lot of complicated operations look this simple in Haskell. The key to using those tools will be changing the way you think from 'make data, then alter it' to 'define a function that would return the result, then apply to inputs'.

Part II: Input and Output

We'll come back to the frame of mind later. There is such a difference between Haskell and C/C++ that many concepts only make sense in conjunction with others. I need to cover the basics of several concepts before I can explain each of them fully.

Moving on, the question on everybody's mind is probably either, "How does I/O work?" or "What are these tools?" IO is one of the complicated parts of Haskell, and later I'll describe how it works. I will also describe a simple frame for programming with GHC. Until then, use GHCi or Hugs to try the examples. They have an interactive prompt where you type in expressions, like a function plus its parameters. Then they print the evaluated result. Also, variable bindings such as 'a = 4' hang around while you're using Hugs and GHCi, so my examples should work just fine. To write your own functions, you need to write a Haskell source file and load it first. Using GHC itself requires knowing some of Haskell's more complicated tools, so we'll put off learning those until we need GHC.

To use Hugs and GHCi with your own functions, you have to write a Haskell source file and load it into the interpreter. Generally, this works as follows:

Open a text editor and write Haskell code.
Save that code as a file with the extension '.hs', for instance, 'test.hs'.
With the source file in the current directory, run Hugs or GHCi.
In Hugs or GHCi, at the prompt type ':l test.hs'. That's a lowercase 'L'.
Source code that needs modules, say Data.Maybe, should include 'import Data.Maybe' at the top.

Note that the module 'Prelude' is automatically imported. The Prelude module covers most basic elements of the language.

Part III: Very Basic Intro to Types

Moving on again, let's talk about tools. Haskell's greatest strength lies in the power a programmer has to define useful functions easily and clearly. Here is our earlier example from C:

int foo (int bar) {
	return bar * 10 + 4;
}

In Haskell, to write this function named foo, you would write the following:

foo bar = bar * 10 + 4

That's all, except for the type:

foo :: Int -> Int

The type reads, "foo is of type Int to Int", meaning that it takes an Int and returns an Int. Together, you write:

foo :: Int -> Int
foo bar = bar * 10 + 4

Defining functions and types is the majority of the work in Haskell, and usually they require equal amounts of time. Haskell has a variety of ways to define functions. I feel that the tutorials mentioned previously do not adequately introduce these ways, so we will discuss them in detail later once we have some tools under our belt.

Part IV: Haskell's Lists and List Comprehensions

Those of you familiar with C know that pointers are the primary object in the language. For almost all imperative languages, the most useful structure is the Array, a randomly accessible sequence of values usually stored in order in memory. Haskell has arrays, but the most-used object in Haskell is a List. A list in Haskell is accessible only at the front, and is not stored in order in memory. While this may sound atrocious, Haskell has such weird abilities that it is more natural to use a list than an array, and often faster. Let us start with the C code to compute the fibonacci numbers (starting with zero):

int fib (int n) {
	int a = 0, b = 1, i, temp;
	for (i = 0; i < n; i++) {
		temp = a + b;
		a = b;
		b = temp;
	}
	return a;
}

This is fine for computing a particular value, but things get ugly when you want to create the sequence:

int * fibArray(int n) {
	int * fibs;
	fibs = (int *)malloc((sizeof int) * n);
	for (i = 0; i < n; i++) {
		fibs[i] = a;
		temp = a + b;
		a = b;
		b = temp;
	}
	return fibs;
}

When I say 'get ugly', I mean that something is included in that function which shouldn't be there: the size of the list. The fibonacci sequence is infinite, and the code above does not represent it, only a part of it. This doesn't sound so bad, unless you don't know how many values you need initially.

In Haskell, 'fib', the function to compute a single fibonacci value, can be written as follows:

fib :: Int -> Int
fib n = fibGen 0 1 n
	
fibGen :: Int -> Int -> Int -> Int
fibGen a b n = case n of
	0 -> a
	n -> fibGen b (a + b) (n - 1)

This is a slight improvement over the C code, but not much. Note that the type of fibGen is "Int to Int to Int to Int", meaning that it takes three Ints and returns an Int. More on that later. Also note that this uses a recursive function. Recursion is everywhere in Haskell. Most of your 'looping' functions will involve recursion instead.

The real improvement over C comes in defining the sequence:

fibs :: [Int]
fibs = 0 : 1 : [ a + b | (a, b) <- zip fibs (tail fibs)]

Don't be scared. Once you understand this function, you will understand at least half of the intracies of Haskell. Let's take it from the top. In Haskell, lists are written as follows:

[ 4, 2, 6, 7, 2 ]

This is the list of 4, then 2, then 6, etc. The ':' operator is used to compose a list by sticking a value on the front (left). For instance:

temp = 1 : [ 4, 2, 5 ]

is the list [ 1, 4, 2, 5 ].

That means that in the above code, fibs is a list of Int, and its first two values are zero and one. That's good so far. At least the first two values of 'fibs' will be right. The next part definitely looks weird. It's a tool found in Haskell called a 'list comprehension'. In Part II, I said that instead of making space and then filling it with the right values, you can define the right values. Here's that sentence, restated for list comprehensions: "You can define the values of the list rather than make space and then fill them in." List comprehensions work like so:

[ func x | x <- list, boolFunc x ]

This expression makes a new list. In the middle, there's a 'list', and it spits out values called x. These are the values of the list in order. If 'boolFunc x' is True, then x will get used in this new list. No boolFunc is in the 'fibs' example, but I include it here because it can also be extremely handy. Assuming 'boolFunc x' was true, 'func x' applies some function to the value of x, and the result is then put next in line in the final result. Here's an example of a list and its use in some list comprehensions, copied from using GHCi:

Prelude> let nums = [ 4, 2, 6, 8, 5, 11 ]
Prelude> [ x + 1 | x <- nums ]
[5,3,7,9,6,12]
Prelude> [ x * x | x <- nums, x < 7 ]
[16,4,36,25]
Prelude> [ 2 * x | x <- 9 : 1 : nums ]
[18,2,8,4,12,16,10,22]
Prelude> [ "String" | x <- nums, x < 5 ]
["String","String"]
Prelude>

Note that the order was preserved in each case. This is very important for our example. Also note that the type of the list comprehension was not necessarily the type of nums, nor did x actually have to be used in the function. Let's return to 'fibs'.

Part V: Making Sense of 'fibs', and Why Lazy Evaluation is Important

We were working on a definition for the list of Fibonacci numbers. Here is the example again:

fibs :: [Int]
fibs = 0 : 1 : [ a + b | (a, b) <- zip fibs (tail fibs)]

So what the heck is '(a, b)' and 'zip fibs (tail fibs)' and all that? Well, Haskell has a more expressive type system than most other languages. As in Python, '(a, b)' is a tuple, meaning two values stuck together. It's a convienent way to store and pass multiple values, much more so than structs. Just add parentheses and enough commas, and you pass the group of values around as you please. The only trick is that Haskell expects you to be consistent, and that means having the right type. The code adds 'a' and 'b' together to get a number in the Fibonacci sequence, so we know that'a' and 'b' are integers. Clearly, '(a, b)' is of type '(Int, Int)', which is stated as follows:

(a, b) :: (Int, Int)

We get these values labeled '(a, b)' from the list defined by 'zip fibs (tail fibs)'. Therefore 'zip fibs (tail fibs)' is of type '[(Int, Int)]', a list of 2-tuples of an Int and an Int. More clearly:

zip fibs (tail fibs) :: [(Int, Int)]

You can use GHCi and Hugs to print these types. The ':t' command, followed by a variable or function, will print its type. The following is at the GHCi prompt with the example file that includes the fibs function loaded:

*Main> :t zip fibs (tail fibs)
zip fibs (tail fibs) :: [(Int, Int)]
*Main>

So what is 'zip'? Its type and general meaning are given here:

Prelude, Section: Zipping and Unzipping Lists

'zip', as its name somewhat implies, takes two lists and 'zips' them together, returning a list of tuples, with the left member of each tuple being an item from the first (left) list, and likewise for the right.

Prelude> zip [ 1, 3, 6, 2 ] [ "duck", "duck", "duck", "goose" ]
[(1,"duck"),(3,"duck"),(6,"duck"),(2,"goose")]
Prelude>

And what about '(tail fibs)'? 'tail' is a pretty straightforward function: it chops off the first item of a list and returns the remainder. That statement can be slightly misleading. 'fibs' doesn't get altered by using 'tail' on it; as I said before, Haskell doesn't have an update operation. Instead, 'tail' just computes the proper result and returns it, rather than altering 'fibs' itself.

Prelude> tail [ 10, 20, 30, 40, 50 ]
[20,30,40,50]
Prelude>

Well, that makes it seem like 'zip fibs (tail fibs)' probably has the right type, but what is it?

fibs :: [Int]
fibs = 0 : 1 : [ a + b | (a, b) <- zip fibs (tail fibs)]

The first paramater to zip is 'fibs', which is the list defined by the expression! What the heck? Can you do that? Yes, you can. See, 'fibs' is the entire list, including the 0 and 1 at the beginning. So the first two tuples in the list created by the zip function will have a 0 and then a 1 on their left. So what is 'zip fibs (tail fibs)'? Well, the first value is definitely (0, 1). Why? Because the first item in fibs is 0, and the first item in (tail fibs) is 1, the second item in fibs. So what's the second value in zip fibs (tail fibs)? It's (1, 1). Where did the right hand 1 come from? It's the third value in fibs, which we just computed. The first value of zip fibs (tail fibs) is (0, 1), which is '(a, b)' in the list comprehension, and so the first value in that comprehension is 0 + 1, or 1. That is also the third value in fibs, etc.

Did you catch all that? The definition of fibs is evaluating itself while it is computing itself. So why didn't some sort of error happen because of undefined values? The trick to all of this is Haskell's laziness. Also, the evaluation is always one step behind the computation, so evaluation can always proceed exactly as far as needed for the next computation. Finally, this list is infinite. Of course, no computer can hold an infinite amount of data. So how much is really there? The answer is simple: until you read some values from fibs and print them, there's only the 0 and the 1, plus the function to generate more of the list. After you read some, fibs will be evaluated to that point, and no further. Since fibs is defined globally, it will remain defined in memory, making reading further values very quick. Try this with Hugs or GHCi and you see'll what I mean.

fibs !! 2
fibs !! 4
fibs !! 30
fibs !! 30
fibs !! 6
fibs !! 20
fibs !! 30
take 10 fibs

'!!' is the 'index' operator for lists in Haskell. It walks down the list and returns the nth item, zero-indexed like C/C++ and Python. 'take 10 fibs' will return the first 10 values of fibs. Be careful, fibs has infinite length. If you just type 'fibs', the output could go forever.

And why does the list only get evaluated as far as you print it? Haskell is 'lazy', meaning that it doesn't do work that it doesn't have to. C programmers know that the boolean operators '&&' and '||' are 'short-circuit', meaning that the right hand side is not evaluated unless it's needed. This allows for all kinds of neat tricks, like not dereferencing a null pointer. The entire language of Haskell has this short-circuit behavior, including the functions that you write yourself. This sounds strange, and will become even more important when we get to Haskell's tools.

This also brings us to one of the other odd things about Haskell: It is often easier to code the general definition for something than to write a function that generates a specific value. This is one of those things you have to get used to, and you will probably come back to it again. And again.

Well, give yourself a pat on the back. If you got all that, or at least you will after playing around in Hugs, then you understand about half of the common usages of Haskell. Ready for the other half? Maybe take a break, and play around a bit in Hugs or GHCi.