Debugging parallel programs is notoriously difficult. Parallel programs are subject not only to the usual kinds of bugs but also to new kinds caused by timing and synchronization errors. Often, the program ``hangs,'' for example when a process is waiting for a message that is never sent or is sent with the wrong tag. Parallel bugs often disappear precisely when you add code to try to identify them, which is particularly frustrating. In this section we discuss several approaches to parallel debugging.
Just as in sequential debugging, you often wish to trace interesting events in the program by printing trace messages. Usually you wish to identify a message by the rank of the process emitting it. This can be done explicitly by putting the rank in the trace message.
Call fflush(stdout) after your printf statements so that the output is forwarded to the root without delay instead of waiting in the process's output buffer.
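As a minimal sketch of rank-tagged tracing with an immediate flush, the two helpers below could be used. The names format_trace and trace are hypothetical, not part of MPI or of this distribution, and the rank is assumed to have been obtained beforehand with MPI_Comm_rank.

```c
#include <stdio.h>

/* Hypothetical helper: builds "[rank] text" in buf.
   Returns what snprintf returns: the length the full line needs,
   which exceeds size - 1 if the line was truncated. */
int format_trace(char *buf, size_t size, int rank, const char *text)
{
    return snprintf(buf, size, "[%d] %s", rank, text);
}

/* Hypothetical helper: prints one trace line tagged with the rank and
   flushes stdout right away so the line is not held in a buffer. */
void trace(int rank, const char *text)
{
    char line[256];

    format_trace(line, sizeof line, rank, text);
    printf("%s\n", line);
    fflush(stdout);
}
```

In an MPI program you might call MPI_Comm_rank(MPI_COMM_WORLD, &rank) once after MPI_Init and then write trace(rank, "entering solver") at each point of interest. Keeping the flush inside the helper means it cannot be forgotten at individual call sites.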
The MPI Standard specifies a mechanism for installing one's own error handler, and specifies the behavior of two predefined ones, MPI_ERRORS_RETURN and MPI_ERRORS_ARE_FATAL.
You can start each process in a parallel job by hand by setting the appropriate environment variables. Each process needs the following variables:
You can often attach the MSDEV debugger to a running process locally. Visual C++.NET has the ability to debug processes remotely. See the MSDEV help utility for details.