Using the MPD System Daemons with the ch_p4mpd device



The new MPD system, together with its advantages in speed of startup and management of stdio, is described in detail in [(ref bgl00:mpd:pvmmpi00),(ref butler-lusk-gropp:mpd-parcomp)]. Here we briefly discuss the installation process.





Installation



To build mpich with the ch_p4mpd device, configure mpich with

    configure --with-device=ch_p4mpd -prefix=<installdir> <other options> 
It is particularly important to specify an install directory with the prefix argument (unless you want to use the default installation directory, which is /usr/local), since the ch_p4mpd device must be installed before use.

If you intend to run the MPD daemons as root, then you must configure with --enable-root as well. Then it will be possible for multiple users to use the same set of MPD daemons to start jobs.

After configuration, the usual

    make 
    make install 
will install mpich and the MPD executables in the <installdir>/bin directory, which should be added to your path.
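
As a minimal sketch of adding that directory to your path in a Bourne-style shell (csh users would set the path variable instead), the following defines a helper that prepends a directory only if it is not already present. The name add_to_path is a hypothetical helper, not something installed by mpich.

```shell
# Hypothetical helper, not part of mpich: prepend a directory to PATH
# unless it is already listed there.
add_to_path() {
    case ":$PATH:" in
        *":$1:"*) ;;              # already on the path, nothing to do
        *) PATH="$1:$PATH" ;;     # otherwise put it first
    esac
    export PATH
}
```

For an install directory of /usr/local/mpich, this would be called as add_to_path /usr/local/mpich/bin, typically from a shell startup file.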





Starting and Managing the MPD Daemons



Running MPI programs with the ch_p4mpd device assumes that the mpd daemon is running on each machine in your cluster. In this section we describe how to start and manage these daemons. The mpd and related executables are built when you build and install mpich after configuring with

   --with-device=ch_p4mpd -prefix=<prefix-directory> <other options> 
and are found in <prefix-directory>/bin, which you should ensure is in your path.

To start an MPD daemon, use the command

   mpd & 
(this is installed in the <prefix-directory>/bin directory; if you have not added that directory to your path, give the full pathname). This starts an MPD daemon on the current machine and sends any diagnostic information to standard output. Before running this command, you must have created a .mpd.conf file as discussed below.

To make real use of the MPD system, you need to start a daemon on each machine. You can do this by logging into each machine on which you want to run MPI programs and then running mpd with options that specify the host on which the first mpd is running and the port that it is using. For example, the following starts an mpd on host shakey and then, using ssh, starts another mpd on host terra. Note that the mpdtrace command is used to find the port that the first mpd is using. For concreteness, this example also assumes that mpich is installed in /usr/local/mpich.


   # Start the initial mpd 
   % mpd & 
   # Get the port  
   % mpdtrace 
   mpdtrace: shakey_39182:  lhs=shakey_39182  rhs=shakey_39182  rhs2=shakey_39182 gen=1 
   % ssh -n terra /usr/local/mpich/bin/mpd -h shakey -p 39182 & 
If you cannot use a remote shell command such as rsh or ssh, you can do the same by logging into each node and running the mpd command. In the above example, this might look like
   % rlogin terra 
   Password: xxxxxxxxxxxx 
   % /usr/local/mpich/bin/mpd -h shakey -p 39182 & 
   % logout 
If you do have a working remote shell program, you can use a shell loop to start the processes on the remote machines. For example, if you have a list of nodes in the file machines, then you could use (assuming that you are using csh):
    foreach host (`cat machines`)   
        ssh -n $host /usr/local/mpich/bin/mpd -h shakey -p 39182 & 
    end 
to start the MPD daemons on all of your machines.
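
The csh loop above hard-codes the port of the first mpd. As a rough sketch in POSIX sh rather than csh, the port could instead be read from the mpdtrace output, assuming that output has the form shown in the earlier example (mpdtrace: host_port: ...). The function names mpd_port_from_trace and start_remote_mpds are hypothetical helpers, not part of mpich.

```shell
# Hypothetical helpers; neither function is part of mpich.

# Read mpdtrace output on stdin and print the port of the first mpd listed.
# Assumes lines of the form "mpdtrace: <host>_<port>:  lhs=... rhs=...".
mpd_port_from_trace() {
    sed -n 's/^[[:space:]]*mpdtrace: [^ ]*_\([0-9][0-9]*\):.*/\1/p' | head -n 1
}

# Start an mpd on every host named in a file (one host per line),
# joining the ring through the mpd at first_host:port.
start_remote_mpds() {
    first_host=$1
    port=$2
    machines_file=$3
    mpd_path=$4    # full path to mpd on the remote hosts
    while read -r host; do
        [ -n "$host" ] || continue    # skip blank lines
        # -n keeps ssh from reading the machines file on stdin
        ssh -n "$host" "$mpd_path" -h "$first_host" -p "$port" &
    done < "$machines_file"
}
```

With the first mpd already running on shakey, this might be used as

    port=`mpdtrace | mpd_port_from_trace`
    start_remote_mpds shakey "$port" machines /usr/local/mpich/bin/mpd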

There is also a simple script, startdaemons, that can be used or modified for your environment, to do some of this automatically.

At any time you can see what mpds are running by using mpdtrace.

When daemons are started, they enter the ring one by one, and each new daemon must authenticate itself as it enters. Authentication is managed by a password stored in a configuration file called .mpd.conf in the user's home directory. (If the MPD system is configured with ENABLE_ROOT, then the configuration file should be /etc/mpd.conf.) The permissions on .mpd.conf must be set so that the file is writeable and readable only by the user. The format of a configuration file is as follows:

   # this is a comment 
   password=56rtG9 
Other keywords are available, corresponding to the arguments for mpd. In this file they are of primary benefit to those setting up systems in which the order in which the mpd's come up and the availability of certain ports can be guaranteed. If you are running the mpd's as a user then the above is all you need in your .mpd.conf file.
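
One way to create such a file with the required permissions is sketched below in POSIX sh. The helper name write_mpd_conf is hypothetical (it is not part of mpich); it writes a file in the format shown above and restricts it to owner-only read and write.

```shell
# Hypothetical helper, not part of mpich: write a .mpd.conf that only
# its owner can read or write.
write_mpd_conf() {
    conf_dir=$1    # directory that will hold .mpd.conf (normally $HOME)
    secret=$2      # the password shared by the mpds in the ring
    (
        umask 077  # files created in this subshell get no group/other access
        printf '# MPD configuration\npassword=%s\n' "$secret" > "$conf_dir/.mpd.conf"
    )
    # Enforce owner-only read/write even if the file already existed
    chmod 600 "$conf_dir/.mpd.conf"
}
```

For a user-level setup this would be called as write_mpd_conf "$HOME" followed by a password of your choosing.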

An mpd is identified by its host and a port. A number of commands are used to manage the ring of mpds:

mpdhelp
    prints this information.
mpdcleanup
    deletes Unix socket files /tmp/mpd.* if necessary.
mpdtrace
    causes each mpd in the ring to respond with a message identifying itself and its neighbors.
mpdringtest count
    sends a message around the ring "count" times and times it.
mpdshutdown mpd_id
    shuts down the specified mpd; mpd_id is specified as host_portnum.
mpdallexit
    causes all mpds to exit gracefully.
mpdlistjobs
    lists active jobs managed by mpds in ring.
mpdkilljob job_id
    aborts the specified job.
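
The host_portnum form expected by mpdshutdown matches the identifiers that mpdtrace reports (for example, shakey_39182). As a small sketch, a hypothetical shell helper named mpd_id (not an mpich command) can build that identifier from a host and a port:

```shell
# Hypothetical helper, not part of mpich: build an mpd identifier of the
# form host_portnum from a host name and a port number.
mpd_id() {
    printf '%s_%s\n' "$1" "$2"
}
```

It could then be used as mpdshutdown `mpd_id shakey 39182` to shut down the mpd listening on port 39182 on host shakey.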

Several options control the behavior of the daemons, allowing them to be run either by individual users or by root without conflicts. The current set of command-line options comprises the following:
-h <host to connect to>
-p <port to connect to>
-c
    allow console (the default)
-n
    don't allow console
-d <debug (0 or 1)>
-w <working directory>
-l <listener port>
    use this port instead of obtaining one dynamically (useful when running mpd as root)
-b
    background; daemonize (disconnects stdout, stderr)
-e
    don't let this mpd start processes, unless root
-t
    echo listener port at startup

The -n option allows multiple mpds to be run on a single host by disabling the console on the second and subsequent daemons.


