RE: megatest test failure/MPI problem

From: Axel Kohlmeyer
Date: Wed Oct 15 2008 - 14:45:16 CDT

On Wed, 15 Oct 2008, Meij, Henk wrote:

HM> under openmpi those all point to one binary file (see below).
HM> what's puzzling is i specify the host file on the command line, it's
HM> like pgm is looking elsewhere. following your lead, i build a
HM> simple LSF job trying to submit the pgm program via a LSF submission
HM> and get the same error. in this approach, LSF preps the hosts file.
HM> and that should all work right out of the box, it is for other jobs.
HM> i can also invoke mpirun_ssh from command line using other programs
HM> like amber.

HM> is there anybody on this list familiar with openmpi and the
HM> megatest? the archive only mentions mvapich and infiniband setups.

i _am_ using OpenMPI and don't have any problems, but i also
don't have a mpirun_ssh, so it may be that you have a customized
installation. i just recently installed the current version
of openmpi (1.2.7) on multiple clusters and compiled, installed
and tested charm++ and namd on top of it without ever seeing
the errors that you see. it just works on our machines.
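
for what it's worth, a quick way to see which launcher a shell actually picks up, and which binary it finally resolves to, is sketched below. this is only a sketch, not something from our setup: resolve_launcher is a hypothetical helper name, and it assumes a GNU userland where readlink -f is available.

```shell
# resolve_launcher NAME
# hypothetical helper: print the real file behind the first NAME found in PATH.
resolve_launcher() {
    name="$1"
    # command -v prints the path the shell would execute for an external command
    path=$(command -v "$name") || {
        echo "$name: not found in PATH" >&2
        return 1
    }
    # readlink -f (GNU coreutils) follows every symlink in the chain
    readlink -f "$path"
}
```

calling "resolve_launcher mpirun" and "resolve_launcher mpirun_ssh" on the head node and on a compute node should print the same orterun path if both names are symlinks into one consistent installation.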

since megatest is part of charm++, perhaps you should ask the
charm++ developers for help?


p.s.: i now see that you are running as root. this is something
you should _never_ _ever_ do. LAM/MPI actually will refuse to
run parallel jobs as root.
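
if you want a similar safety net for your own job scripts, a minimal sketch of a guard could look like the following. this is not how LAM/MPI implements its check; is_root_uid and run_unprivileged are hypothetical names made up for illustration.

```shell
# is_root_uid UID
# succeed if UID is 0, i.e. the superuser
is_root_uid() {
    [ "$1" -eq 0 ]
}

# run_unprivileged CMD [ARGS...]
# hypothetical wrapper: refuse to start CMD when the caller is root,
# otherwise run it unchanged
run_unprivileged() {
    if is_root_uid "$(id -u)"; then
        echo "refusing to run '$1' as root; switch to a regular user" >&2
        return 1
    fi
    "$@"
}
```

with this in place, "run_unprivileged mpirun --mca btl tcp,self -np 8 ./pgm" starts the job for a regular user and stops with an error message for root.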


[akohlmey_at_delta ~]$ ls -l /cmm/pkg/openmpi-1.2.7/bin/
total 304
lrwxrwxrwx 1 akohlmey cmm 12 Aug 28 22:11 mpic++ -> opal_wrapper
lrwxrwxrwx 1 akohlmey cmm 12 Aug 28 22:11 mpicc -> opal_wrapper
lrwxrwxrwx 1 akohlmey cmm 12 Aug 28 22:11 mpiCC -> opal_wrapper
lrwxrwxrwx 1 akohlmey cmm 12 Aug 28 22:11 mpicxx -> opal_wrapper
lrwxrwxrwx 1 akohlmey cmm 7 Aug 28 22:11 mpiexec -> orterun
lrwxrwxrwx 1 akohlmey cmm 12 Aug 28 22:11 mpif77 -> opal_wrapper
lrwxrwxrwx 1 akohlmey cmm 12 Aug 28 22:11 mpif90 -> opal_wrapper
lrwxrwxrwx 1 akohlmey cmm 7 Aug 28 22:11 mpirun -> orterun
-rwxr-xr-x 1 akohlmey cmm 155835 Aug 28 22:11 ompi_info
-rwxr-xr-x 1 akohlmey cmm 19651 Aug 28 22:10 opal_wrapper
-rwxr-xr-x 1 akohlmey cmm 26771 Aug 28 22:10 orted
-rwxr-xr-x 1 akohlmey cmm 94060 Aug 28 22:10 orterun

[akohlmey_at_delta megatest]$ pwd

[akohlmey_at_delta megatest]$ make
./../../bin/charmc -c
./../../bin/charmc -o megatest.o megatest.C
./../../bin/charmc -c


./../../bin/charmc -o callback.o callback.C
./../../bin/charmc -o pgm megatest.o groupring.o nodering.o
varsizetest.o varraystest.o groupcast.o nodecast.o synctest.o fib.o
arrayring.o tempotest.o packtest.o queens.o migration.o marshall.o
priomsg.o priotest.o rotest.o statistics.o templates.o inherit.o
reduction.o bitvector.o immediatering.o callback.o -language charm++

[akohlmey_at_delta megatest]$ mpirun --mca btl tcp,self -np 8 ./pgm
Charm++> Running on MPI version: 2.0 multi-thread support: 0/0
Megatest is running on 8 nodes 8 processors.
test 0: initiated [callback (olawlor)]
test 0: completed (0.00 sec)
test 1: initiated [immediatering (gengbin)]
test 1: completed (0.26 sec)
test 2: initiated [bitvector (jbooth)]
test 2: completed (0.00 sec)


test 42: initiated [multi groupring (milind)]
test 42: completed (0.06 sec)
test 43: initiated [all-at-once]
test 43: completed (0.28 sec)
All tests completed, exiting
End of program
[akohlmey_at_delta megatest]$

HM> -Henk
HM>
HM> [root_at_swallowtail NAMD]# ls -l /share/apps/openmpi-1.2/bin/mpirun*
HM> lrwxrwxrwx 1 root root 7 Aug 6 2007 /share/apps/openmpi-1.2/bin/mpirun -> orterun
HM> lrwxrwxrwx 1 root root 7 Jan 8 2008 /share/apps/openmpi-1.2/bin/mpirun_ssh -> orterun
HM> [root_at_swallowtail NAMD]# file /share/apps/openmpi-1.2/bin/orterun
HM> /share/apps/openmpi-1.2/bin/orterun: ELF 64-bit LSB executable, AMD x86-64, version 1 (SYSV), for GNU/Linux 2.4.0, dynamically linked (uses shared libs), not stripped

