The Most Common Problems



This section describes some of the most common problems encountered when building and using mpich. See also the section Frequently Asked Questions, which covers some additional problems.

Permission Denied.
Q: When I use mpirun, I get the message Permission denied.


A: If you see something like this

    % mpirun -np 2 cpi  
    Permission denied. 
or
    % mpirun -np 2 cpi  
    socket: protocol failure in circuit setup 
when using the ch_p4 device, it probably means that you do not have permission to use rsh to start processes. The script tstmachines can be used to test this. Try
    tstmachines  
If this fails, you may need a .rhosts or /etc/hosts.equiv file (ask your system administrator), or you may need to use the p4 server (see Section Using the Secure Server). Another possible problem is the choice of the remote shell program; some systems have several. Check with your system administrator about which version of rsh or remsh you should be using. If you must use ssh, see the section on using ssh in the Installation Manual.

If your system policy allows a .rhosts file, do the following:

    1. Create a file .rhosts in your home directory
    2. Change the protection on it to user read/write only: chmod og-rwx .rhosts.
    3. Add one line to the .rhosts file for each processor that you want to use. The format is
    host username 
    
    For example, if your username is doe and you want to use machines a.our.org and b.our.org, your .rhosts file should contain
    a.our.org doe 
    b.our.org doe 
    
    Note the use of fully qualified host names (some systems require this).
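
The three steps above can be sketched as a small shell function. This is only an illustration: the function name make_rhosts and its argument order are made up here and are not part of mpich. Note that it overwrites the target file, so keep a copy of any existing .rhosts first.

```shell
# Hypothetical helper (not part of mpich): write a .rhosts-style file with
# one "host username" line per host.
# Usage: make_rhosts FILE USERNAME HOST...
make_rhosts() {
    rhosts_file=$1
    rhosts_user=$2
    shift 2
    # Step 1: create (or truncate) the file under a restrictive umask so it
    # is never readable by group or others, even briefly.
    ( umask 077 && : > "$rhosts_file" ) || return 1
    # Step 3: one line per machine, in the form "host username".
    for host in "$@"; do
        printf '%s %s\n' "$host" "$rhosts_user" >> "$rhosts_file" || return 1
    done
    # Step 2: user read/write only, in case the file already existed.
    chmod og-rwx "$rhosts_file"
}

# Example with the names used above:
# make_rhosts "$HOME/.rhosts" doe a.our.org b.our.org
```

Afterwards, running tstmachines again is a reasonable way to check whether the change had any effect.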

    On networks where the use of .rhosts files is not allowed, you should use the secure server to run on machines that are not trusted by the machine that you are initiating the job from.

    Finally, you may need to use a non-standard rsh command within mpich. mpich must be reconfigured with -rsh=command_name, and perhaps also with -rshnol if the remote shell command does not support the -l argument. Systems using Kerberos and/or AFS may need this. See the section in the Installation Guide on using the secure shell ssh.

    Another possible source of the "Permission denied." message is that you have used the su command to change your effective user id. On some systems the ch_p4 device will not work in this situation. Log in normally and try again.

Connection Refused.
This problem may be caused by Internet security settings on your system that restrict the number and frequency of interprocess connection operations. Check with your systems administrator. Linux users (depending on the Linux distribution) should try running the following commands:
   iptables --list 
   ipchains --list 
Look for any limits, restrictions on source or destination ports, or limits on SYN packets (the TCP packets used to establish connections). If you find such limits, study your security documentation and decide how you want to modify the security settings. We normally recommend that a cluster be placed behind a firewall rather than having each cluster node limit the use of TCP.
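
On a machine with many rules, a rough filter can narrow the listing to the lines worth reading. The sketch below is an assumption-laden convenience, not an mpich tool: the function name suspect_rules is made up, and the pattern only looks for the words "limit" and "syn" and for the port fields (spt:, dpt:, spts:, dpts:) that iptables --list typically prints. It can match unrelated text, such as a host name containing "syn", so treat the result as a starting point.

```shell
# Hypothetical filter (not part of mpich): read a rule listing on standard
# input and keep only lines that mention rate limits, SYN flags, or
# specific source/destination ports.
suspect_rules() {
    grep -i -E 'limit|syn|[sd]pts?:'
}

# Example (listing rules usually requires root):
# iptables --list | suspect_rules
```

An empty result does not prove that nothing is being restricted; it only means the listing contained none of these keywords.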

Also check the file /etc/inetd.conf to ensure that it allows more processes per minute for rsh. See the FAQ entry (Appendix Frequently Asked Questions) on "poll: protocol failure during circuit creation".

Missing symbols when linking.
The most common source of missing symbols is a failure of the mpich configure step to determine how to pass command line arguments to Fortran. Check the output of the configure step for any error messages or warnings about building the Fortran libraries. If you do not require Fortran, reconfigure mpich using the configure option --disable-f77 and remake mpich. If you need Fortran and cannot figure out how to make mpich work with Fortran, send a bug report to [email protected].
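
If the configure output was saved to a file, the Fortran-related warnings can be picked out with grep. This is a minimal sketch under stated assumptions: the function name fortran_problems and the file name configure.log are made up for illustration, and the exact wording of configure's messages may differ from the patterns used here.

```shell
# Hypothetical helper (not part of mpich): print the lines of a saved
# configure log that mention Fortran together with a warning or an error.
# Returns non-zero when no such line is found.
fortran_problems() {
    grep -i -E 'fortran|f77' "$1" | grep -i -E 'warning|error'
}

# Example, assuming the configure output was saved first:
# ./configure 2>&1 | tee configure.log
# fortran_problems configure.log
#
# If Fortran is not required, reconfigure without it and rebuild:
# ./configure --disable-f77
# make
```

Reading the full log is still worthwhile when the filter prints nothing, since a relevant message may be phrased differently.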

Another common problem with programs that mix Fortran and C is missing libraries. The mpich configure script attempts to determine the libraries that are necessary when linking C with Fortran, but may miss some. There are additional suggestions for this problem in Section Problems compiling or linking Fortran programs.

SIGSEGV.
Any message that mentions SIGSEGV is referring to a "segmentation violation" during program execution. This is usually due to an error in the user's program, such as an array overwrite or use of an uninitialized variable in referencing storage.



