Debugging


Debugging parallel programs is notoriously difficult. Parallel programs are subject not only to the usual kinds of bugs but also to new kinds arising from timing and synchronization errors. Often the program "hangs," for example when a process waits for a message that is never sent or is sent with the wrong tag. Frustratingly, parallel bugs often disappear precisely when you add code to try to identify them. In this section we discuss several approaches to parallel debugging.



The printf Approach


Just as in sequential debugging, you often wish to trace interesting events in the program by printing trace messages. Usually you wish to identify a message by the rank of the process emitting it. This can be done explicitly by putting the rank in the trace message.

Call fflush(stdout) after each printf statement so that the output is forwarded to the root process without delay.
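
As a minimal sketch of this approach, the helper below prefixes each trace message with the rank of the emitting process and flushes the stream immediately. The name trace_to and its signature are hypothetical and not part of MPI or MPICH; the rank argument is assumed to have been obtained earlier with MPI_Comm_rank.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* trace_to is a hypothetical helper, not part of MPI or MPICH.
 * rank is assumed to come from MPI_Comm_rank(MPI_COMM_WORLD, &rank).
 * Returns the number of characters written before the line ending,
 * or -1 if an output error occurred. */
int trace_to(FILE *out, int rank, const char *fmt, ...)
{
    va_list ap;
    size_t len;
    int prefix, body;

    assert(out != NULL && fmt != NULL);

    /* Tag the message with the rank of the emitting process. */
    prefix = fprintf(out, "[%d] ", rank);

    va_start(ap, fmt);
    body = vfprintf(out, fmt, ap);
    va_end(ap);

    /* End the line unless the format string already did. */
    len = strlen(fmt);
    if (len == 0 || fmt[len - 1] != '\n')
        fputc('\n', out);

    /* Flush right away so the message is not held in a buffer. */
    fflush(out);

    return (prefix < 0 || body < 0) ? -1 : prefix + body;
}
```

A call such as trace_to(stdout, rank, "received %d bytes", nbytes) would then produce one line per event, each beginning with the rank in brackets.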



Error handlers


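The MPI Standard specifies a mechanism for installing one's own error handler, and specifies the behavior of two predefined ones, MPI_ERRORS_RETURN and MPI_ERRORS_ARE_FATAL. MPI_ERRORS_ARE_FATAL, the default, aborts the job when an MPI call fails; MPI_ERRORS_RETURN instead makes the failing call return an error code that the program can inspect.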



Starting processes manually


You can start each process in a parallel job by hand by setting the appropriate environment variables. Each process needs the following variables:

    1. MPICH_JOBID=some short unique string to identify the job
    2. MPICH_NPROC=total number of processes in the job
    3. MPICH_IPROC=rank of the current process
    4. MPICH_ROOT=host:port where the root process will live and listen

If you set these by hand then you can run each process in a debugger.
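
When starting processes by hand it is easy to forget one of the variables. The following sketch is a small diagnostic that a program could call at startup to report what it sees in its environment. The four variable names are the ones listed above; the functions report_var and check_mpich_env are hypothetical helpers and not part of MPICH.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* The variable names come from the list above. */
static const char *const mpich_vars[] = {
    "MPICH_JOBID", "MPICH_NPROC", "MPICH_IPROC", "MPICH_ROOT"
};

/* Hypothetical helper: print one variable and return 1 if it is
 * unset or empty, 0 otherwise. */
int report_var(FILE *out, const char *name, const char *value)
{
    assert(out != NULL && name != NULL);

    if (value == NULL || value[0] == '\0') {
        fprintf(out, "%s is not set\n", name);
        return 1;
    }
    fprintf(out, "%s=%s\n", name, value);
    return 0;
}

/* Hypothetical helper: report all four variables from the current
 * environment and return how many of them are missing. */
int check_mpich_env(FILE *out)
{
    size_t i;
    int missing = 0;

    for (i = 0; i < sizeof mpich_vars / sizeof mpich_vars[0]; i++)
        missing += report_var(out, mpich_vars[i], getenv(mpich_vars[i]));

    fflush(out);
    return missing;
}
```

Calling check_mpich_env(stderr) before MPI_Init and stopping if it returns a nonzero count gives an early, readable failure. The sketch only checks that the variables are present; it does not validate their values.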



Attaching a debugger to a running program


You can often attach the MSDEV debugger to a process that is already running on the local machine. Visual C++ .NET can also debug processes remotely. See the MSDEV help utility for details.
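
Attaching is only useful if the process is still at a known point when the debugger arrives. One common technique, which this guide does not itself describe, is to have the process wait on a flag that you clear from the debugger once attached. The sketch below assumes this approach; wait_for_debugger is a hypothetical helper, and the rank argument is assumed to come from MPI_Comm_rank.

```c
#include <assert.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical helper, not part of MPICH.
 * Spins until *keep_waiting becomes 0 (set it to 0 from the debugger
 * after attaching) or until max_seconds have elapsed.
 * Returns 0 if the flag was cleared, 1 if the wait timed out.
 * This is a busy wait, so it uses a full CPU while waiting. */
int wait_for_debugger(volatile int *keep_waiting, int rank, double max_seconds)
{
    time_t start = time(NULL);

    assert(keep_waiting != NULL);

    /* Announce which process is waiting, and flush so it is seen. */
    printf("[%d] waiting for debugger\n", rank);
    fflush(stdout);

    while (*keep_waiting) {
        if (difftime(time(NULL), start) >= max_seconds)
            return 1;   /* nobody attached in time; carry on */
    }
    return 0;
}
```

The flag is declared volatile so that the compiler rereads it on every iteration, which is what lets a change made from the debugger end the loop.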


