Mini-Wulf
A small Beowulf Cluster running FreeBSD
(With apologies to Verne Troyer and
The GIMP)
FreeBSD Beowulf cluster prototype/test system. Six PCs:
Master node: "Master"
Pentium 133
64M RAM
8GB hard drive
2 10bT Ethernet NICs
3Com
Addtron
Slave node: "Alpha"
Pentium 133
32M RAM
2 1.5GB hard drives
3Com 10bT NIC
Slave node: "Bravo"
K6/2 333
150M RAM
3GB hard drive
3Com 10/100bT NIC
Slave node: "Charlie"
Pentium 133
64M RAM
2GB hard drive
6GB hard drive
3Com 10bT NIC
Slave node: "Delta"
Pentium 133
32M RAM
1GB hard drive
3Com 10bT NIC
Slave node: "Echo"
Pentium 100
64M RAM
1GB hard drive
4GB hard drive
SMC 10bT NIC
Network:
Addtron 8-port 10bT ethernet hub
Cat 5 ethernet cables
Software:
OS: FreeBSD 4.7
http://www.freebsd.org
Message passing: MPICH
http://www-unix.mcs.anl.gov/mpi/mpich/
Message passing: LAM/MPI
http://www.lam-mpi.org
Model:
'Klingon Bird of Prey' cluster
http://phoenix.physast.uga.edu/klingon/

"Mini-Wulf" Beowulf cluster
The story:
Mini-wulf is my first attempt at building a true Beowulf
cluster computer. While I had experimented with using MPICH
on other workstations on our LAN before, I ran into file system
problems and network security issues, and abandoned it until I had
the hardware to build a cluster specifically for this use.
Purpose:
The reason for building mini-wulf is education and
experience. I'd only played around with MPI on a
Cray T3E a little
bit, and had never used a workstation cluster system before. My boss
asked me to spec out a Beowulf cluster system for running an atmospheric
model. I figured it would be a good idea to actually build a system
out of surplus hardware first to 'get my feet wet' before spending any
money on a more state-of-the-art cluster.
The hardware I used for mini-wulf consisted of systems and network gear
that had been taken offline during upgrades. My goal was to use only hardware
I had sitting around in my spares bins and obsolete systems, so mini-wulf
would be free in terms of hardware. Time, on the other hand...
Background:
A
Beowulf cluster is a collection of computers
on a separate local area network that can act as one large
parallel processing computer [see figure 1]. This is done using software that
implements message-passing between copies of the same program, which
is run on each of the computers ("nodes") on the network. Each node
works on a separate section of the problem, and when done they
send the results back to the master node, which assembles the results.
Message passing can be done on just about any networked computers, as
long as they can talk to each other. What differentiates a Beowulf
cluster from other clusters is that a Beowulf has its own local
area network for the nodes to communicate over, and the nodes
(except the master) are not ordinary workstations but machines
dedicated completely to being processor nodes for the cluster.

Figure 1.
Since they are on their own LAN, the processor nodes can be
configured differently in terms of network security settings than
regular workstations. This allows faster communications and less
overhead processing on the nodes. It also allows the cluster to
appear as one single computer to users. The users are only required
to log on to the master node to use the entire cluster. This master
node has all the user account information, compilers, disk storage,
and message-passing software that the cluster uses. Processor nodes
are little more than slave CPUs, and can be much more stripped down in
terms of storage.
There are different network topologies used for Beowulf
clusters, but the bottom line seems to be: the faster, the better.
As the number of nodes in a cluster grows, so does the load on the
local area network connecting them, and this LAN bandwidth can be
the limiting factor when the number of nodes exceeds a certain point.
In the 'typical' Beowulf, the master node has two network interface
cards (NICs), one attached to the external network (the internet),
and one attached to the cluster LAN. The external NIC is usually
allocated a static IP address assigned by the network administrator.
Internal NICs
also use static IPs, but since they are only for use on the Beowulf
LAN, they can be anything you like. Usually a set of IP addresses
from an RFC 1918 pool is chosen, since these IPs are non-routable
and therefore won't cause problems if a node is accidentally attached
to an outside network. The processor or slave nodes usually have only
one NIC. All the processor nodes and the inside NIC of the master node
are all connected together using a fast switch. Hubs can be used, but
will cause slow-downs due to collisions. Some switches can support
channel-bonding, which allows multiple NICs in processor nodes to
act as a single NIC of larger bandwidth. This is beyond the scope
of this article, but more information can be found on the web.
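For reference, the three RFC 1918 private pools mentioned above are:
10.0.0.0    - 10.255.255.255    (10/8)
172.16.0.0  - 172.31.255.255    (172.16/12)
192.168.0.0 - 192.168.255.255   (192.168/16)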
Building Mini-wulf:
Note: Mini's hardware has been changed a few times, so this
description no longer matches what currently comprises the cluster.
See the component listing at the top of this document for the current
setup.
Of the three computers I had at my disposal to build
Mini-wulf, I chose the Pentium 90 as the master node because it had
the largest hard drive and the most memory.
Please note that when building a Beowulf cluster you usually
use the fastest, most powerful workstation for the master, not
the weakest. The master node is where all the compiling takes place,
and it also carries a lot of NFS traffic, plus X if that's running.
My choice of the P90 for the master node was based only on storage
and physical memory, and since mini-wulf is more of an educational
machine than a production number cruncher, I could get away with
it.
I began construction by adding a second NIC to the P90.
One caveat when using old hardware: don't try to use two ISA
NICs in the same machine! My 3Com plug-and-play ISA boards did
not get along all that well; switching one to an SMC PCI NIC fixed
the problem. I installed FreeBSD 4.5 on the system and configured
the outside NIC for the machine's assigned IP. After I got
that working properly, I looked through the /var/log/messages
file, found the device name of the second NIC, and modified the
/etc/rc.conf file to assign it the RFC 1918 IP 192.168.1.1.
For the internal IP scheme, I elected to use the RFC 1918
pool of 192.168.1.x.
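For illustration, the relevant lines in the master's /etc/rc.conf end up looking
something like this (the device names and outside address are placeholders; use
whatever your hardware and your network administrator hand you):
hostname="master.your.domain"                            # registered outside name (placeholder)
ifconfig_xl0="inet A.B.C.D netmask 255.255.255.0"        # outside NIC, assigned IP (placeholder)
ifconfig_ed0="inet 192.168.1.1 netmask 255.255.255.0"    # inside NIC, cluster LAN
defaultrouter="A.B.C.1"                                  # site gateway (placeholder)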
I then installed FreeBSD on each of the two processor
nodes. These I assigned IPs of 192.168.1.10 and 192.168.1.11. I
left the numbering gap in case I wanted to add any auxiliary nodes
later, such as a name server or additional NFS server. Since I
didn't feel like typing in all those numbers every time I wanted
to access these other nodes, I assigned names to the boxes as
well. 192.168.1.1 I called 'master', 192.168.1.10 'alpha', and
192.168.1.11 'bravo'. Since these names aren't in any DNS databases,
I hard-coded them into each machine by adding them, along with their
IP numbers, to the
/etc/hosts
file.
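In addition to the usual loopback entry, each machine's /etc/hosts picked up
lines like these:
192.168.1.1     master
192.168.1.10    alpha
192.168.1.11    bravo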
At this point I hooked all the internal NICs to an Addtron
5-port 10bT hub, the outside master NIC to our network, and rebooted all three
boxes. All were then talking to each other (I could ssh to each
box from all the others). I created a login account for myself
on each of the nodes, using the same username and numeric user ID on each.
Synchronizing the clocks:
The next step was to get all the boxes' clocks synchronized.
According to some of the Beowulf documentation I found, synching
the clocks is important, and the easiest way to do it is have your
master node act as a network time protocol (NTP) server. First I
had the master node sync its clock to an internet NTP server
by modifying the
/etc/ntp.conf file
and HUPing the ntpd daemon. On each node the
/etc/ntp.conf file
has 'server 192.168.1.1' as its first line. This should cause
each node to sync its clock to the master node.
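A minimal sketch of the two files (the upstream server name here is only a
placeholder for whatever internet NTP server you point the master at):
On master, /etc/ntp.conf:
server ntp.example.edu
On each slave node, /etc/ntp.conf:
server 192.168.1.1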
NFS home directories:
The message-passing software I used expects the
home directories on all the processor nodes to be the same as on
the master. I accomplished this by NFS-mounting the master's home
directory on the nodes: I added an entry to the
/etc/exports
file on master to share out the /usr/home directory to the
processor nodes, and on the processor nodes I mounted
master:/usr/home as /usr/home.
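A sketch of what's involved (the export flags shown are one reasonable choice;
exports(5) lists the alternatives):
On master, /etc/rc.conf gets nfs_server_enable="YES", and /etc/exports gets a line like:
/usr/home -network 192.168.1 -mask 255.255.255.0
On each processor node, /etc/rc.conf gets nfs_client_enable="YES", and /etc/fstab gets:
master:/usr/home    /usr/home    nfs    rw    0    0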
Using ssh for remote shell execution:
The master node needs to be able to run programs on the
processor nodes without hassling with login passwords and the like.
To allow this, the nodes need to have some sort of remote shell
capability set up. I use ssh for all my remote shell work, because
it is encrypted and fairly secure. To allow remote logins without
a password, I used ssh-keygen to generate a public/private key pair
on the master node. Don't put in a passphrase when generating
this key. Copy the public key (usually .ssh/identity.pub)
to your keyring (usually .ssh/authorized_keys) on the master node,
which should automatically include it on the processor nodes
because of the shared home directories. Test this by ssh'ing
to another machine. The login should happen without asking
for a password. One caveat: I changed my .ssh/authorized_keys
entry to end with username@master.domain. My key was
generated using the public IP name of the machine, which the
internal LAN had problems with, since it saw the connection
coming from the master's internal NIC. If you get authorization
errors, this is a good place to look. On the other hand, it might
work just fine as it is; your mileage may vary.
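A minimal sketch of the key setup, run as your normal user on the master (these
are the protocol-1 file names this version of OpenSSH defaults to; protocol-2
keys use id_rsa/id_rsa.pub but land in the same authorized_keys file):
ssh-keygen                                         # hit Enter at the passphrase prompts
cat ~/.ssh/identity.pub >> ~/.ssh/authorized_keys
ssh alpha hostname                                 # should run without asking for a password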
Ssh adds encryption overhead to communications, and should
actually not be used on the internal LAN if you're really worried
about speed. Rsh should be configured to run on all the nodes
within the cluster, which involves enabling it in the /etc/inetd.conf
file, and authorizing machines in the /etc/hosts.equiv file.
Rsh comes disabled by default in FreeBSD because it is not terribly secure,
but since the cluster LAN is not open to the public, it should not
be a problem. If you're willing to take the processing hit, ssh
will work just
fine too. If you're clustering workstations on a regular network,
ssh is the way to go, since using rsh on an open network is risky.
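If you go the rsh route, the pieces look something like this (a sketch; the
stock /etc/inetd.conf already contains the shell and login lines, commented out):
In /etc/inetd.conf, uncomment:
shell   stream  tcp     nowait  root    /usr/libexec/rshd       rshd
login   stream  tcp     nowait  root    /usr/libexec/rlogind    rlogind
In /etc/hosts.equiv, list the cluster hosts, one per line:
master
alpha
bravo
Then 'killall -HUP inetd' so inetd rereads its configuration.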
Installing MPICH:
Now that the nodes are synched and talking to each other
in a trusting manner, it's time to actually install some message passing
software. The first package I installed was MPICH. I uncompressed and
untarred the package, then ran the configure script with the prefix
option to tell it where I wanted the package installed:
./configure --prefix=/usr/local/mpich-1.2.4
The configure script does various things while building the makefile,
including testing the ssh and rsh capabilities of the master node. This
is why remote shell access must already be working before you install MPICH,
and also why you need to be able to ssh from the master to itself without a password.
After configure
runs, it's time to run 'make' to actually build the package.
Finally, 'make install' (run as root) puts the package
in its final location. You then need to
tell MPICH what machines are available to run processes on. This is
accomplished by editing the machines.(os) file, in my case:
/usr/local/mpich-1.2.4/share/machines.freebsd. MPICH puts five copies
of the name of the master node in this file. Change it to a listing
of all the nodes, one per line (in this case, master, alpha, and bravo).
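After editing, my machines.freebsd file was simply:
master
alpha
bravo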
Now it's time to test the cluster to see if the nodes can talk to
each other via MPI. Run the tstmachines script in the sbin/ directory
under the mpich directory to verify this. It will help to use the -v option
to get more info. If this works, it's time to run a program on the cluster.
Under the distribution tree for mpich you'll find an examples directory.
Inside that, under the basic directory, you'll find the cpi program. This
program calculates the value of Pi, and is a good tool for verifying the
cluster is working properly. Run 'make cpi' in the basic directory to
build the executable. Run the program using the mpirun command. Here's
what I used to test mini-wulf:
mpirun -np 3 -nolocal cpi
Note: I put /usr/local/mpich-1.2.4/bin in my path before doing this, so the
machine could find mpirun. Also, the -nolocal flag was needed on my cluster
to keep it from trying to run all the processes on the master node. I don't
understand why this is, but it works for me.
Update: the -nolocal flag is only needed when the 'outside' name of the
node is included in the loopback line in /etc/hosts, which Linux does by
default. Change the loopback line to read '127.0.0.1 localhost.localdomain localhost'
and MPICH won't require the flag.
The -np flag tells mpirun how
many processors to use to run the program, 3 in this case. Here's the output
I got:
% mpirun -np 3 -nolocal cpi
Process 0 of 3 on (outside IP).rwic.und.edu
pi is approximately 3.1415926535899121, Error is 0.0000000000001190
wall clock time = 2.800307
Process 1 of 3 on alpha.rwic.und.edu
Process 2 of 3 on bravo.rwic.und.edu
Note: I changed cpi.c to add more loop cycles to the program to get a longer
run time. This makes the timing differences between node counts less
influenced by communication lag and overhead. It looks like we're actually using
all the processors, but let's try some other configurations just to make sure:
% mpirun -np 1 -nolocal cpi
Process 0 of 1 on (outside IP).rwic.und.edu
pi is approximately 3.1415926535897309, Error is 0.0000000000000622
wall clock time = 8.395115
%
% mpirun -np 2 -nolocal cpi
Process 0 of 2 on (outside IP).rwic.und.edu
pi is approximately 3.1415926535899850, Error is 0.0000000000001918
wall clock time = 4.197473
Process 1 of 2 on alpha.rwic.und.edu
Yep, looks like mpirun is calling on the specified number of CPUs to run the program,
and the time savings using more CPUs is what you'd expect. Using two CPUs runs the
program in 49.9% of the time it took one, and three runs it in 33.4%. This is a
beautiful 1/N progression for the runtime vs. number of CPUs, but don't expect it to
hold for more complex programs or huge numbers of cluster nodes.
Installing LAM/MPI:
Since MPICH has some issues with NFS, and the
Klingon Bird of Prey cluster
that Mini-Wulf is based on runs LAM, I decided to install the
LAM/MPI implementation
of MPI.
Running
./configure --prefix=/usr/local/lam_mpi --with-rsh=/usr/bin/ssh
revealed no problems, since LAM/MPI supports FreeBSD natively. The INSTALL file
did instruct me to run make with the '-i' option under FreeBSD, since BSD's
version of make doesn't always handle script result codes the way they'd like. The
usual 'make -i' and 'make -i install' followed. After adding /usr/local/lam_mpi/bin
to my path, I also built the examples via 'make -i examples'.
While the mighty P90 CPU chewed on this task, I started another shell and edited
the /usr/local/lam_mpi/etc/lam-bhost.def file, which contains a list of all the
processor nodes in the cluster. This defaults to just one, the node that LAM is
built on. I added the other two nodes in the cluster.
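After the edit, lam-bhost.def was just the three hostnames, one per line:
master
alpha
bravo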
After the examples were built and the lam-bhost.def file adjusted, it was time to test!
LAM runs a little differently than MPICH, in that it runs daemons on each node in
the cluster to facilitate the message-passing. This means that the LAM executables
must be on every node. I ran 'recon -v -a' to test the remote nodes, and got
errors when they wouldn't run the LAM program 'tkill' (which is what recon uses
to test the cluster). Since I hadn't shared out the /usr/local/lam_mpi directory
on the master node, the slaves couldn't find it. I debated doing the NFS share,
but for the moment just copied the directory to the remote nodes using scp. This
keeps NFS traffic down, although it would make cluster maintenance more labor-intensive.
(Note: I've since shared out the /usr/local/lam_mpi directory and NFS mounted
it on the slaves. This makes things a lot easier for upgrades later)
I also ended up having to set the LAMHOME environment variable to /usr/local/lam_mpi, since both recon and
lamboot were having trouble finding executables (although putting the $prefix/bin
directory in my PATH should have taken care of that; oh well, whatever works). After
that, I ran 'lamboot' and got the required output.
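For the record, the whole startup sequence on the master looks something like this
(csh syntax, since that's the shell in use here):
setenv LAMHOME /usr/local/lam_mpi
set path = ($LAMHOME/bin $path)
recon -v -a          # check that every host in lam-bhost.def answers
lamboot -v           # start the LAM daemons on all the nodes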
Now that LAM was actually running, it was time to actually run some parallel programs
to test the cluster. I went to the examples/pi directory, and fired off my old friend,
the cpi program. LAM syntax is a bit different to start:
mpirun C cpi
The output was huge! This version of cpi was a bit different from the one I had
tested earlier, so I copied that one (from the MPICH distro) to the local directory, compiled
it using mpicc, and ran that version using LAM. Here's the output:
> mpirun C cpi
Process 0 of 3 on (outside IP).rwic.und.edu
Process 1 of 3 on alpha.rwic.und.edu
Process 2 of 3 on bravo.rwic.und.edu
pi is approximately 3.1415926535899121, Error is 0.0000000000001190
wall clock time = 2.805890
So, it looks like the LAM version of MPI is running. It also compiles code written for
MPICH with no modifications, and runs the resultant executable in a very similar elapsed
time. Very satisfactory.
LAM requires one last step that MPICH doesn't: you have to shut down the LAM daemons on
all the nodes. This is accomplished via the 'lamhalt' command. There seems to be no man
page for this command, but you can do a man on the older 'wipe' command, which will give
you more info.
Status: June 28, 2002
At this point the cluster is functional, and can be used as is.
Most production clusters, however, need more than just the bare-bones of message-
passing. Mini-wulf at this point would be fine for a single or small number of
users running a small number of programs, but when you start adding lots of users
and/or having more programs running, management of the cluster becomes a chore.
Bigger clusters use tools for batch processing programs so that all programs get
a fair share of the CPU cycles of the cluster. Also, using good old 'adduser' on
each node to keep track of user accounts gets tedious.
Adding these tools to mini-wulf will be explored at a later date.
Status: July 1, 2002
Mini-wulf is currently off-line waiting for upgrades. I say "waiting for upgrades"
because it sounds better than "scavenged for parts." I had a need for a server at work, and
since Mini was available, I concatenated together some of its parts to make the server. The
slave nodes are basically intact, with one CPU downgraded from an AMD K6-2 400 to a Pentium 120,
but I'll need to find a new master node. I've got a box in mind, I just have to make time to
configure it.
Status: July 3, 2002
Mini-wulf is back in operation! After mucking about with a cranky 3Com 509 NIC, I got
the new master node configured and functional. The Bravo node is still running as a
Pentium 100, even though its CPU was downgraded only to a P120. More investigation is needed.
In any case, the overall performance of the cluster has suffered a bit. Here's the output
from a cpi run in the new configuration:
Process 0 of 3 on (outside ip).rwic.und.edu
Process 1 of 3 on alpha.rwic.und.edu
Process 2 of 3 on bravo.rwic.und.edu
pi is approximately 3.1415926535899121, Error is 0.0000000000001190
wall clock time = 3.301860
I thought that upgrading the weakest machine would help the overall performance, but as you
can see, the clock time for the program is about one second longer. This result is consistent
over several test runs. The new master node has only 32M of RAM, so there may be some disk
swap going on before the program is passed out and run on the LAN. More testing will be
conducted as I have time.
Status: July 5, 2002
Since the new master node was so pathetic, I just had to move stuff around again.
I shuffled NICs and made the old alpha node the new master, since it now had the largest
hard drive and most RAM. After fussing about with NFS mounts, rc.conf and hosts files,
I finally got everything running properly. I shared out the /usr/local/lam_mpi directory
to the slaves via NFS, since I had to end up rebuilding LAM due to damage I did during
the move. I also added set prompt = '%n@%m:%/%# ' to my .cshrc file,
since running the wrong commands on the wrong nodes is what messed me up in the first place.
Hint: never try to scp a directory onto itself; it corrupts all the files and generally
makes you unhappy. The cluster actually runs a bit faster now, interestingly enough:
Process 0 of 3 on (outside ip).rwic.und.edu
Process 1 of 3 on alpha.rwic.und.edu
Process 2 of 3 on bravo.rwic.und.edu
pi is approximately 3.1415926535899121, Error is 0.0000000000001190
wall clock time = 2.526050
I also ran the lam test suite, just to verify that the package was properly built and
installed. No problems at this point.
Status: July 16, 2002
I've been running the Pallas benchmark
on miniwulf for about a
week in various configurations. The results are rather interesting.
They indicate that the choice of MPI implementation, and even of hardware,
depends on your code and on how you use the cluster.
Status: September 3, 2002
Over the long weekend, I decided I wanted to try running some other
distributed computing clients on the slave nodes (
www.distributed.net). Since these clients were designed to run on single computers
attached to the internet, and miniwulf's slave nodes couldn't access the internet,
I had some adjusting to do. I decided to set up NAT (network address translation) on
the master node. This would allow the slave nodes with their unroutable IP numbers
to pass packets to the master, which would strip off the old IPs and use its own,
routable IP on the packets. When the packets return from the internet, natd uses
its tables to figure out which slave the packet originated from, and puts the internal
IP back on it. It's pretty slick, but under FreeBSD requires jumping through some hoops.
I had to build a custom kernel with ipfw firewall capability, and write a simple firewall
ruleset. It's ipfw that actually passes the packets off to natd. Also, the /etc/rc.conf
file has to have a few adjustments as well, such as enabling forwarding and activating
the firewall. Instructions for doing all this are available at
www.freebsd.org.
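Roughly, the pieces look like this (a sketch rather than my exact config; the
outside interface name xl0 is just an example):
Kernel config additions (then rebuild and install the kernel):
options IPFIREWALL
options IPDIVERT
/etc/rc.conf additions on the master:
gateway_enable="YES"        # turn on IP forwarding
firewall_enable="YES"
firewall_type="OPEN"        # or the path to your own ruleset file
natd_enable="YES"
natd_interface="xl0"        # the outside NIC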
The NAT routing makes the slave nodes think they're connected directly to the internet, but
blocks any outside hosts from accessing the nodes. The cluster is thus still fairly secure,
and can still run the MPI software without any problems.
Status: October 14, 2002
Mini-wulf continues to evolve. Since the distributed.net project was completed (at
least the RC5-64 section that I was interested in), I removed the client programs from the
nodes. I left the natd functionality intact, however, to allow easier upgrades and other
maintenance of the cluster nodes.
Mini-wulf has finally been used for the purpose for which it was built: education. I enrolled
in an online MPI programming course offered by the
Ohio Supercomputer Center. Mini-wulf has been very handy for
doing homework problems for this course. It's also interesting to note that programs run with
more processes than nodes (three being all mini-wulf has) work just fine. This just causes more
than one process to run on each node, but for simple programs that don't require huge
amounts of computing power, that's not a problem. Of course, a single computer with MPI
installed could also be used to run simple MPI programs to teach and demonstrate message
passing, but that would leave out all the fun of building the cluster :).
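For example, an invocation like this (the process count is arbitrary) just cycles
through the machines file, so each node picks up more than one process:
mpirun -np 8 -nolocal cpi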
Status: January 22, 2003
"Charlie" node added. Mini is finally a four node cluster! When an old samba server
underwent an upgrade (pronounced 'replacement'), I found myself with a fully functional
Pentium 133 based FreeBSD box. After a quick re-read of this document, I made the necessary
adjustments to the old box's network and NFS settings, plugged it into the Mini-wulf LAN,
powered up, and away it went! Benchmarked with the pi calculation program, Mini's crunching
abilities have increased by 25%. Not the 33% I would have expected, but perhaps I'm basing my
expectations on some dodgy math. It's still very gratifying that the cluster is so easy to
upgrade.
I have room left on the hub and power strip for one more system to be added to
Mini. However I don't know how likely it is that I'll do this. While Mini-wulf has been
great fun and very educational to build, computationally it gets its butt whupped by our
dual Pentium 3 Xeon system. I'm now starting the process of building our 'real'
Beowulf cluster that will have some serious MFLOPs and storage. Mini taught me many of
the things I needed to know to build the big cluster, but it will most likely be used
for 'hot storage' of old hardware from now on. It's always possible students may want to
use Mini for experimental purposes, but as a high-powered number cruncher it's just too
limited to be useful for big problem solving.
Status: January 24, 2003
I modified the pi calculation program to include a crude
MFLOP (million floating
point operations) calculator, just so I could do some simple benchmarking. Since the pi
program doesn't do any trig or other heavy math, the results should be used more as a
relative guide than an absolute measure. Here are the results using different numbers of nodes:
Number of nodes MFLOPS
--------------- ------
1 12.2
2 20.1
3 22.6
4 30.1
It should be remembered that the number three node is a Pentium 100, while the others are
133s. Even so, it's a bit bizarre that the addition of the third node only increased the
performance by about 12%.
The same code running on a dual 1.7GHz Pentium III Xeon system gave:
Number of nodes MFLOPS
--------------- ------
1 235.6
2 471.3
As you can see, a modern dual-processor computer beats Mini's crunching capability by about an
order of magnitude. I theorize the nice doubling of performance on the dually is because the
interprocess communication is taking place on the bus, rather than across a network.
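(For reference, MFLOPS is just the number of floating-point operations performed divided
by the wall-clock time in seconds, divided by 10^6. As a purely hypothetical example,
10^8 operations finished in 4 seconds works out to 25 MFLOPS.)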
Status: March 25, 2003
Upgraded cluster OS to latest FreeBSD security branch.
Status: March 26, 2003
Installed ATLAS linear algebra math
library.
Status: April 25, 2003
Ran Pi MFLOP benchmark again, this time for up to 20 processes (the cluster still
has only 4 nodes).
Number of processes MFLOPS
------------------- ------
1 12.2
2 23.7
3 27.4
4 36.6
5 25.0
6 30.1
7 32.0
8 36.6
9 30.1
10 33.4
11 31.3
12 34.1
13 32.6
14 35.1
15 30.9
16 32.7
17 34.1
18 36.1
19 30.7
20 32.4
Graphical version.
Status: May 12, 2003
Mini is up and operational again. During the previous week a critical server
failed, so I was forced to borrow the charlie node to fill in for it until a replacement
could be built.
Over the weekend I built Deuce, a two-node cluster running
Red Hat Linux 9.0. Since Zeus
will be running this OS, I wanted to get some clustering experience with it.
Status: May 23, 2003
Since Mini is now a four node cluster, I reran the
Pallas benchmark on it. Here
are the results. These are for MPICH running on a 10bT hub.
Status: August 5, 2003
Delta node added. Yet another Pentium 133 was retired from active service and was
added to Mini. This makes 5 nodes total, and fills the Addtron 10bT hub (and the power
strip) to capacity. This is likely the last node I'll add to Mini. I do have an 8-port
10bT hub I could use for the LAN, but the counter where I have the cluster installed is
running short on space. Since most of my Beowulf energies are being spent on Zeus
these days, Mini is mostly a curiosity for me.
I did run my Pi MFLOP benchmark on the new configuration, and
found a ~24% increase in maximum
MFLOPS over the 4-node configuration. Here are the results:
Number of processes MFLOPS
------------------- ------
1 12.1
2 24.2
3 22.6
4 29.6
5 37.6
6 36.2
7 42.4
8 30.1
9 33.8
10 37.6
11 41.4
12 45.1
13 32.6
14 35.1
15 37.6
16 40.1
17 42.6
18 33.8
19 35.7
20 37.6
Graphical Version
It's interesting to note the changes in the maximum performance, which show up at 12 processes on the
5-node cluster, but at 8 processes on the 4-node. It should be noted that I've switched back to the MPICH
implementation of MPI for this test, while the previous one was made using LAM/MPI. This could
certainly have an effect on the response of the cluster to different processing loads.
Status: August 13, 2003
Clusters must be some sort of disease, or perhaps addictive. I just couldn't leave well enough alone.
Another box became available, so I replaced the Bravo node with a K6/2 333. After a few abortive
attempts, I replaced the 3Com ISA NIC with a PCI version, and got it working. Before the dust had
settled, I renamed the old Bravo node to Echo, swapped out the 5-port hub for an 8-port, adjusted all
the /etc/hosts files (strange things happen if all your nodes don't know about each other) and
/usr/local/mpich-1.2.4/share/machines.freebsd, and ran the
MFLOP benchmark for 20 processes again. This resulted in a peak performance jump
of about 26% over the 5-node cluster configuration. Here's the output:
Number of processes MFLOPS
------------------- ------
1 12.2
2 24.4
3 36.4
4 40.1
5 50.1
6 45.2
7 42.6
8 48.1
9 54.1
10 50.1
11 55.1
12 45.2
13 48.9
14 52.7
15 56.4
16 53.4
17 56.8
18 45.2
19 47.6
20 50.2
Graphical Version
Changing the weak node from the second in line to the sixth has changed the shape of the repeating part
of the performance graph. I haven't tested the benchmark code on the new K6/2 node individually yet,
but it does look like it stacks up favorably to the Pentium 133s.
Looking back over this document I see that Mini is just
over one year old, has doubled in size and almost doubled in computational power. It's come a long way
since it began as an unsanctioned after-hours experiment with a few old computers I was going to
surplus and some obsolete network gear. I remember searching the web for hours, trying to figure out
what a Beowulf was and how it worked, and scratching for the little info available on FreeBSD 'wulfs amid
the comparative wealth on Linux clusters. When I first ran the Pi program on Mini, I was totally stoked.
I started this web page and sent the URL to my Boss, who wasn't exactly bubbling over with enthusiasm
("Let's not waste too much time on this."). Since that time, however, he's become a cluster convert,
and was willing to fund the building of Zeus. While neither of us thinks Beowulf
clusters will send big-iron supercomputer builders packing, they do give cash-strapped researchers
some decent computational horsepower for the algorithms they're suited to.
Status: September 2, 2003
A sad day. Miniwulf has been given its marching orders. The counter where the cluster stood
was needed for another project (a lab we were using to build stuff was needed again for teaching). Mini's
master node was also used as a DNS and NTP server, so it had its secondary NIC removed, configuration
adjusted, and was moved to a different machine room. The compute nodes were shut down and moved into
a storage area. It is possible that another master node could be built from one of the compute nodes,
and the cluster set up elsewhere. This will have to wait until I have sufficient free time and can
find space and power/network resources to run the cluster. Given the lack of computational power Mini
suffered from, the incentive to reassemble it is not terribly high.
So, 15 months. That's about how long Miniwulf was operational. It's been a fun and educational ride.
I took a final snapshot of the critical configuration files that went into building Mini.
Status: September 18, 2003
I've pretty much accepted the fact that Miniwulf will most likely never be reassembled.
I've raided the collection of compute nodes for replacement PCs and parts, and with the cost of
much more powerful PCs as low as they are, it just doesn't make sense to rebuild a Beowulf
that was based on Pentium 133s and 10bT ethernet. Mini's purpose was education: helping me
learn how to build, program, and manage a cluster, and now that that purpose is fulfilled,
it's time to move on.