Debugging

Debugging parallel programs is notoriously difficult. Parallel programs are subject not only to the usual kinds of bugs but also to new kinds caused by timing and synchronization errors. Often the program "hangs," for example when a process waits for a message that is never sent or is sent with the wrong tag. Parallel bugs often disappear precisely when you add code to try to identify them, which is particularly frustrating. In this section we discuss several approaches to parallel debugging.

The printf Approach

Just as in sequential debugging, you often wish to trace interesting events in the program by printing trace messages. Usually you wish to identify a message by the rank of the process emitting it. This can be done explicitly by putting the rank in the trace message.

It is recommended that you call fflush(stdout) after your printf statements to ensure the output gets forwarded to the root without delay.
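
As an illustration, the two recommendations above (tag each message with the rank, then flush) can be combined in one small helper. This is only a sketch: trace_to is a hypothetical name, not part of MPI, and the rank is assumed to have been obtained earlier with MPI_Comm_rank(MPI_COMM_WORLD, &rank).

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical helper (not part of MPI): write "[rank N] <message>\n" to
 * the stream `out`, then flush so the line is forwarded without delay.
 * Returns the number of characters written, or a negative value on error. */
static int trace_to(FILE *out, int rank, const char *fmt, ...)
{
    va_list ap;
    int prefix, body;

    /* Identify the emitting process at the start of every line. */
    prefix = fprintf(out, "[rank %d] ", rank);
    if (prefix < 0)
        return prefix;

    /* Format the caller's message exactly as printf would. */
    va_start(ap, fmt);
    body = vfprintf(out, fmt, ap);
    va_end(ap);
    if (body < 0)
        return body;

    if (fputc('\n', out) == EOF)
        return -1;

    /* The flush recommended above for printf-style tracing. */
    fflush(out);
    return prefix + body + 1;
}
```

A call such as trace_to(stdout, rank, "received %d bytes", nbytes) would then produce one rank-tagged line per event. Emitting the whole message through a single helper also keeps the prefix and the flush from being forgotten at individual call sites.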

Error handlers

The MPI Standard specifies a mechanism for installing one's own error handler, and specifies the behavior of two predefined ones, MPI_ERRORS_RETURN and MPI_ERRORS_ARE_FATAL.

Starting processes manually

You can start each process in a parallel job by hand by setting the appropriate environment variables. Each process needs the following variables:

If you set these by hand then you can run each process in a debugger.

Attaching a debugger to a running program
