Haskell Tutorial for C Programmers - Section II

Written by Eric Etheridge
version 2.0
last updated May 18, 2009

Page-Wise Table of Contents

Section II: Towards Functions

Part I: The Order of Operations as a Programmer

A programming note for recursive functions and Haskell:

Concerning the fib / fibGen example here:

fib :: Int -> Int
fib n = fibGen 0 1 n

fibGen :: Int -> Int -> Int -> Int
fibGen a b n = case n of
	0 -> a
	n -> fibGen b (a + b) (n - 1)

When I was writing this example, I wrote the type of fib first, then the type and definition of fibGen, then finally the definition of fib.

For those of you who are not accustomed to writing recursive functions, Haskell programming often requires them. Often these recursive functions need subsidiary functions, which either 'frame' the main operation of recursion, or perform a simple task that keeps the main recursion function clean. In either case, the subsidiary functions can be written later, after the major recursive operation is clearly defined including end conditions.

In general, it is a good idea to concentrate on the most crucial aspect of a piece of code when programming, but Haskell's design greatly reinforces this. The definition of subsidiary functions, such as the 'setup' where fib calls fibGen with parameters '0 1 n', can wait until the function itself has been written. This is true even though the type of fib was obvious from the beginning. Likewise, Haskell makes writing trivial functions like that so quick that they can generally be ignored while thinking about the larger picture. These things are likely to change the way that you code, and probably in a good way.

Part II: Functions, But Really a Sidetrack to Types

As we move on, the other half of Haskell's general usage looms. This half is about functions.

So what is a function? As this tutorial's alignment indicates, we'll compare C/C++ to Haskell. In C, a function is a sequence of commands that have their own namespace, are called during execution and passed parameters, inherit the namespace of the scope in which they are written, and return a value to their caller. In Haskell, most of that is true, except of course functions in Haskell are not sequences of events, but expressions and definitions. There is a major difference between C and Haskell, however, and it concerns the amount of flexibility that functions have.

In C, functions take parameters and return a single value. We've already seen that Haskell has many ways to group values, like several other languages. The two most common of these are lists and tuples, and these can be the return type from a function. To sum them up, in Haskell lists are variable length and hold values of the same type, and tuples are fixed length and can hold values of different types. Here is an example of a function type that returns a tuple:

splitAt :: Int -> [a] -> ([a], [a])

'splitAt' takes an Int and a list and returns a tuple. The left value in the tuple is the first n values in the list, and the right value is the rest of the list, where n is the first parameter. This function is in the Prelude, and its description can be found here:

Prelude, Section: Sublists

We've already seen lists in a type:

fibs :: [Int]

Since the fibonacci numbers grow rapidly, but 'Int' is 32-bit, it would probably have been better to use 'Integer', Haskell's built-in infinite-precision integer storage.

fibs :: [Integer]

And this is a function type. The function takes zero parameters and returns a list of Integers. This isn't a trick of Haskell's syntax. 'fibs' really is a function that, when evaluated will return numbers comprising the fibonacci sequence. That kind of logic is what lets Haskell's compilers produce code which runs quickly and lets Haskell programmers write code efficiently.

Part III: More Types, Because Haskell Is 'Polymorphic'

It's time for a brief [not so brief] digression about types. As you've noticed, the trend seems to be to call everything a 'function'. And that's true. Take '4' for example. When you use a number '4' hardcoded into your code, it looks to you like the number 4. But what is it to Haskell? Type ':t 4' into Hugs or GHCi. What do you get? You get some weird junk:

Prelude> :t 4
4 :: (Num t) => t
Prelude>

That looks like it's a function that's taking a parameter. It's not, and the key is the '=>' arrow rather than the '->' arrow. The type given is read as follows: "four is of type 'a', where 'a' is in the class 'Num'." What's the class 'Num'? Well, it's the class that all numbers belong to. The real answer is that Haskell has something C doesn't: true polymorphism.

This is an important term and it needs some illustration. Most C++ programmers are familiar with the term 'overloading', which means that a function is defined for more than one set of parameter types. For instance, addition and multiplication in C/C++ are overloaded, allowing the following combinations to occur:

int a = 4, b = 5;
float x = 2.5, y = 7.0;
	
cout << a + b;	//9
cout << a + y;	//11
cout << y + a;	//11.0
cout << x + y;	//9.5
cout << b * a;	//20
cout << b * x;	//12.5
cout << y * b;	//35.0
cout << x * y;	//17.5

In C/C++, this is accomplished by defining all of the following overloaded definitions for '+' and '*':

operator+ (int, int);
operator+ (int, float);
operator+ (float, int);
operator+ (float, float);
operator* (int, int);
operator* (int, float);
operator* (float, int);
operator* (float, float);

The C compiler picks the appropriate type at compile time. The key distinction between polymorphism and overloading is that in C/C++, which uses overloading, each function above must be written separately. In C/C++, any function besides those above that uses either an int or a float must specify which one it expects, or must itself be overloaded. This bring us to the idea of classes.

For what types is '+' defined in C/C++? It is possible to overload the operator for new types defined by a user, but those new types will not be interchangeable with ints, floats, or other numeric types. What this means is that existing sort functions such as mergeSort and quickSort would need to be rewritten to sort values of the new type. In constrast, here is the type of mergeSort in Haskell:

mergeSort :: Ord a => [a] -> [a]

What is going on? Again, there are two parameters listed, not three. The first thing that appears to be a parameter is actually a class restriction. 'mergeSort', as you would expect, takes a list of objects of some type (type 'a'), and returns a list of objects of the same type. So why is the following type not sufficient?:

mergeSortBadType :: [a] -> [a]

The reason this is insufficient is that at some point in mergeSort the items in the list will need to be compared to each other. This will be done using a comparison operator such as '>', '<', '>=', or '<='. In Haskell, those operators are part of a class definition. The values for which '>' and so on are defined are those which are members of class 'Ord', so named because an 'order' can be determined for them. Many numeric types are of type Ord, as are characters (type 'Char') and strings (type 'String'). mergeSort needs to compare the items in its inputs, so mergeSort must clarify its type. Its parameter must be a list of objects for which '<' and so on are defined. It would also be okay to make the type more specific, but this is unnecessary and generally a bad technique.

And what about '4'? How come four is of type 'a', where 'a' is a member of class 'Num'? Can't it just be a Num? Or an Int? It can be an Int if we specifically say it is, like so:

a = (4 :: Int) + 2

Here '4' is an Int. That is how you specify the type of something inside of an expression. But without that, 4 is of type 'a', where 'a' is in class 'Num', or more simply, 4 is of type 'a' in class 'Num'. And that is important, because '+' is defined for all member types of class Num, meaning that '4' is definitely a legal parameter for this function:

doubleIt :: Num a => a -> a
doubleIt n = n + n

'-' and '*' are also defined for all member types of Num, so 4 is also allowed for this function:

fibPoly :: (Num a, Num b) => a -> b
fibPoly n = fibGenPoly 0 1 n

fibGenPoly :: (Num a, Num b) => b -> b -> a -> b
fibGenPoly a b n = case n of
	0 -> a
	n -> fibGenPoly b (a + b) (n - 1)

That is our first Haskell fib function, but with the types changed. The names have an added 'Poly' so that an error doesn't occur in the example files because of a reused name. The type of 'fibPoly' is read, "fibPoly is of type 'a' to 'b', where 'a' is a member of class Num and 'b' is a member of class Num." There is only one '=>' arrow because there is only ever one section of the type that describes class restrictions. The parentheses are required.

Why would we do this? Shouldn't we pick a single type for b rather than use a class? Here's an example. What if you worked on a group project, and two people need to calculate fibonacci numbers? And for reasons of their own, one needed an Int returned and the other needed an Integer? Or a Double? Would you write the code twice with different types? If you were using C you would. You'd have to. Using general type classes allows code reuse in a way that is impossible in other languages.

Also notice that in the initial call to 'fibGenPoly', the third parameter is 'n', the first parameter of 'fibPoly', and that the types of 'fibPoly' and 'fibGenPoly' seem to make note of this. The reason I wrote 'fibPoly' with a different return type from its parameter is that the following would be common:

fib :: Int -> Integer

We only need Int-sized storage of our counter input, but we may need Integer-sized storage of the result. Using two separate types allows this. Also, carefully check how types flow in 'fibGenPoly'. The math does not mix parameters of type 'a' and 'b', and a parameter of type 'b' is also used as the final return value. The types match not only externally but internally. Following types through code in this manner will be important for debugging.

Continuing onward, in the fib example we used 'tail'. Here is its type:

tail :: [a] -> [a]

In C, tail would have to be reimplemented for every type of list you used. That sounds like a slightly contrived problem, so what about '!!', the index operator? In most other languages, indexing a list is builtin operator, because it has to work for every kind of array. So it's not actually a function. And so on. Everything in C is either overloaded, built in, or works for only one type. There are a few exceptions, generally involving casting to or from '(void *)', but those are far outside the scope of this tutorial.

The point is, you're going to see 'Num a =>' at the beginning of type signatures, as well as 'a' and 'b' inside type signatures. Here, 'a' and 'b' are type variables, used by the compiler solely to determine proper types for compilation. Occassionally you will get messages such as 'can't determine type', or 'type mismatch'. The second means the you've done something wrong, but the first usually means that a type variable can't be pinned down to a single type for a function that you've written. This can happen for the simplest of reasons:

main = putStrLn (show 4)

Previous versions of GHC would not accept this. Here's why: 'putStrLn' takes a string and puts it on the screen. 4 has a 'polymorphic' type, i.e. it is a member of a type class, not defined as a specific type. 'show' takes anything that can be turned into a string (basically), and so it doesn't specify a type for '4' either. This leaves the compiler in a quandry, because no specific type is indicated anywhere, and it will complain. To resolve it, add the type definition like so:

main = putStrLn (show (4 :: Int))

Or Integer, or Double, or whatever. This will be handy when you try to test generalized functions, and you'll need it in a few other weird cases as well.

One last note. You can define the type of multiple functions simultaneously:

addOne, subtractOne :: Int -> Int

This can be handy.

Part IV: Functions Already

But we were talking about functions. As you may have noticed, it seems like anything can work as a parameter or return value for a function. This is absolutely true, as long as the types match. For instance, let's take a look at the extremely useful 'map' function:

map :: (a -> b) -> [a] -> [b]

By now you can probably read that, strange as it may be. "map is of type function a to b followed by a list of type a and returns a list of type b". It's taking a function as a parameter. Not only that, but a polymorphic function, with no type restrictions. And look at the other two items. The function it takes is from a to b, and then it takes a list of type a and returns a list of type b. With a name like 'map', it's pretty clear what should happen when you use it:

fooList :: [Int]
fooList = [3, 1, 5, 4]
	
bar :: Int -> Int
bar n = n - 2
	
*Main> map bar fooList
[1,-1,3,2]
*Main>

Nothing to it. In the past a type for at least 'fooList' or 'bar' would have been required or Hugs and GHC would have complained that the types could not be fully determined. 'map' is in the Prelude, and its description can be found here:

Prelude, Section: List Operations

The example using 'map' shows that you can write functions which take functions as parameters. This can be fun and very useful. Now let's try something stranger:

subEachFromTen :: [Int] -> [Int]
subEachFromTen = map (10 -)

What the heck? First, for this to work there do need to be parentheses around the '-' and the '10'. Second, what does this do? We'll take this one step at a time again. '(10 -)' is a function. It takes a number and returns ten minus that number. Use ':t' in Hugs or GHCi to find out its type:

*Main> :t (10 -)
(10 -) :: (Num t) => t -> t
*Main>

Second, 'map' takes a function as its first parameter. There's a reason that Haskell uses arrows to define types, rather than a parenthesized list. 'map', applied to '(10 -)' has the following type (again, check in Hugs and GHCi):

*Main> :t map (10 -)
map (10 -) :: (Num t) => [t] -> [t]
*Main>

It takes a list of Num members (all the same type of Num members, mind you) and returns a list of the same type (again, the same type member of Num). This is constrained to [Int] -> [Int] by subEachFromTen's type signature.

Applying the 'map' function to less than its full list of arguments like this is called 'partial evaluation'. You take a function, give it some of its parameters, and you've got a function that takes the rest of the parameters. You can even name this 'in-between' state, since it is just a function like anything else. Here is 'subEachFromTen' in action:

*Main> subEachFromTen [4, 6, 7, 11]
[6,4,3,-1]
*Main>

It does what you think it should, given how I named it. Remember that applying subEachFromTen to a list, even a named list, does not change that list, but merely returns the result.

Take some time now to play around with partial evaluation, in addition to the list functions mentioned before and list comprehensions. Remember that functions grab their parameters 'eagerly', so you have to put parentheses around any parameters that are composed of a function with its own parameters.