/lib/dld.sl: Bind-on-reference call failed
/lib/dld.sl: Invalid argument
(This example is from HP-UX), or
ld.so: libc.so.2: not found
(This example is from SunOS 4.1; similar things happen on other systems).
A:
The problem here is that your program is using shared libraries, and the
libraries are not available on some of the machines that you are running on.
To fix this, relink your program without the shared libraries. To do this,
add the appropriate command-line options to the link step. For example, for
the HP system that produced the errors above, the fix is to add
-Wl,-Bimmediate to the link step. For Solaris, the appropriate option
is -Bstatic.
A:
We have seen this problem with installations using AFS. The remote shell
program, rsh, supplied with some AFS systems limits the number
of jobs that can use standard output. This seems to prevent some of the
processes from exiting as well, causing the job to hang. There are four
possible fixes:
2. Use the secure server (serv_p4). See the discussion in the Users
Guide.
3. Redirect all standard output to a file. The MPE routine
MPE_IO_Stdout_to_file may be used to do this.
4. Get a fixed rsh command. The likely source of the problem is an
incorrect usage of the select system call in the rsh command.
If the code is doing something like
    int mask; mask |= 1 << fd; select( fd+1, &mask, ... );
instead of
    fd_set mask; FD_ZERO(&mask); FD_SET(fd,&mask); select( fd+1, &mask, ... );
then the code is incorrect: the select call was changed many years ago to
allow more than 32 file descriptors, and the rsh program (or programmer!)
has not changed with the times.
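The fd_set usage described in item 4 can be sketched as a small, complete
C function. This is only an illustration, not code taken from rsh or mpich:
the name wait_readable and its return convention are hypothetical. It clears
the set with FD_ZERO before use and rejects descriptors that an fd_set
cannot represent.

```c
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>

/* Hypothetical helper (not part of rsh or mpich).
 * Waits up to timeout_sec seconds for fd to become readable.
 * Returns 1 if fd is readable, 0 on timeout, -1 on error. */
int wait_readable(int fd, int timeout_sec)
{
    fd_set readfds;
    struct timeval tv;
    int n;

    /* An fd_set can only hold descriptors in the range 0..FD_SETSIZE-1. */
    if (fd < 0 || fd >= FD_SETSIZE)
        return -1;

    FD_ZERO(&readfds);      /* clear the set; it starts out uninitialized */
    FD_SET(fd, &readfds);   /* mark the one descriptor we care about */

    tv.tv_sec = timeout_sec;
    tv.tv_usec = 0;

    /* First argument is the highest descriptor in any set, plus one. */
    n = select(fd + 1, &readfds, NULL, NULL, &tv);
    if (n < 0)
        return -1;

    return (n > 0 && FD_ISSET(fd, &readfds)) ? 1 : 0;
}
```

A caller would use it as, for example, wait_readable(fd, 5) to wait up to
five seconds. Unlike the int-mask version, this works for any descriptor
below FD_SETSIZE, not just the first 32.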
2. Q:
Not all processes start.
A:
This can happen when using the ch_p4 device and a system that has
extremely small limits on the number of remote shells you can have. Some
systems using "Kerberos" (a network security package) allow only three or
four remote shells; on these systems, the size of MPI_COMM_WORLD will
be limited to the same number (plus one if you are using the local host).
The only way around this is to try the secure server; this is documented
in the mpich installation guide. Note that you will have to start the
servers "by hand", because the chp4_servs script itself uses the remote
shell to start the servers.