RE: megatest test failure/MPI problem

From: Axel Kohlmeyer (akohlmey_at_cmm.chem.upenn.edu)
Date: Wed Oct 15 2008 - 14:45:16 CDT

On Wed, 15 Oct 2008, Meij, Henk wrote:

HM> under openmpi those all point to one binary file (see below).
HM> what's puzzling is i specify the host file on the command line, it's
HM> like pgm is looking elsewhere. following your lead, i built a
HM> simple LSF job to submit the pgm program via LSF
HM> and got the same error. in this approach, LSF preps the hosts file,
HM> and that should all work right out of the box; it does for other jobs.
HM> i can also invoke mpirun_ssh from the command line with other programs
HM> like amber.

HM> is there anybody on this list familiar with openmpi and the
HM> megatest? the archive only mentions mvapich and infiniband setups.

i _am_ using OpenMPI and don't have any problems, but i also
don't have an mpirun_ssh, so it may be that you have a customized
installation. i just recently installed the current version
of openmpi (1.2.7) on multiple clusters and compiled, installed
and tested charm++ and namd on top of it without ever seeing
the errors that you see. it just works on our machines.
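One quick way to check the "customized installation" question on a given machine is to see which binary each launcher name on the PATH finally resolves to. This is a minimal sketch, assuming a POSIX shell and GNU coreutils readlink (for -f); the function name resolve_launcher is hypothetical and not part of Open MPI or charm++:

```shell
#!/bin/sh
# Hypothetical helper (not part of Open MPI): print the absolute path of the
# file that a launcher name on the PATH ultimately resolves to, following
# all symlinks. Assumes GNU coreutils readlink for the -f option.
resolve_launcher() {
  launcher_path=$(command -v "$1") || { echo "$1: not found in PATH" >&2; return 1; }
  readlink -f "$launcher_path"
}

# Print where each launcher name ends up; a missing name is reported on stderr.
for name in mpirun mpirun_ssh; do
  printf '%s -> ' "$name"
  resolve_launcher "$name" || echo "(not found)"
done
```

If both names end at the same orterun inside the Open MPI tree you expect, they are the same executable; a path into a different tree would hint at a mixed or customized setup.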

since megatest is part of charm++, perhaps you should ask the
charm++ developers for help?

cheers,
   axel.

p.s.: i now see that you are running as root. this is something
you should _never_ _ever_ do. LAM/MPI actually will refuse to
run parallel jobs as root.

p.p.s.:

[akohlmey_at_delta ~]$ ls -l /cmm/pkg/openmpi-1.2.7/bin/
total 304
lrwxrwxrwx 1 akohlmey cmm 12 Aug 28 22:11 mpic++ -> opal_wrapper
lrwxrwxrwx 1 akohlmey cmm 12 Aug 28 22:11 mpicc -> opal_wrapper
lrwxrwxrwx 1 akohlmey cmm 12 Aug 28 22:11 mpiCC -> opal_wrapper
lrwxrwxrwx 1 akohlmey cmm 12 Aug 28 22:11 mpicxx -> opal_wrapper
lrwxrwxrwx 1 akohlmey cmm 7 Aug 28 22:11 mpiexec -> orterun
lrwxrwxrwx 1 akohlmey cmm 12 Aug 28 22:11 mpif77 -> opal_wrapper
lrwxrwxrwx 1 akohlmey cmm 12 Aug 28 22:11 mpif90 -> opal_wrapper
lrwxrwxrwx 1 akohlmey cmm 7 Aug 28 22:11 mpirun -> orterun
-rwxr-xr-x 1 akohlmey cmm 155835 Aug 28 22:11 ompi_info
-rwxr-xr-x 1 akohlmey cmm 19651 Aug 28 22:10 opal_wrapper
-rwxr-xr-x 1 akohlmey cmm 26771 Aug 28 22:10 orted
-rwxr-xr-x 1 akohlmey cmm 94060 Aug 28 22:10 orterun

[akohlmey_at_delta megatest]$ pwd
/home/akohlmey/compile/charm/mpi-linux-x86_64-mpicxx/tests/charm++/megatest

[akohlmey_at_delta megatest]$ make
./../../bin/charmc -c megatest.ci
./../../bin/charmc -o megatest.o megatest.C
./../../bin/charmc -c groupring.ci

[...]

./../../bin/charmc -o callback.o callback.C
./../../bin/charmc -o pgm megatest.o groupring.o nodering.o
varsizetest.o varraystest.o groupcast.o nodecast.o synctest.o fib.o
arrayring.o tempotest.o packtest.o queens.o migration.o marshall.o
priomsg.o priotest.o rotest.o statistics.o templates.o inherit.o
reduction.o bitvector.o immediatering.o callback.o -language charm++

[akohlmey_at_delta megatest]$ mpirun --mca btl tcp,self -np 8 ./pgm
Charm++> Running on MPI version: 2.0 multi-thread support: 0/0
Megatest is running on 8 nodes 8 processors.
test 0: initiated [callback (olawlor)]
test 0: completed (0.00 sec)
test 1: initiated [immediatering (gengbin)]
test 1: completed (0.26 sec)
test 2: initiated [bitvector (jbooth)]
test 2: completed (0.00 sec)

[...]

test 42: initiated [multi groupring (milind)]
test 42: completed (0.06 sec)
test 43: initiated [all-at-once]
test 43: completed (0.28 sec)
All tests completed, exiting
End of program
[akohlmey_at_delta megatest]$

HM>
HM> -Henk
HM> [root_at_swallowtail NAMD]# ls -l /share/apps/openmpi-1.2/bin/mpirun*
HM> lrwxrwxrwx 1 root root 7 Aug 6 2007 /share/apps/openmpi-1.2/bin/mpirun -> orterun
HM> lrwxrwxrwx 1 root root 7 Jan 8 2008 /share/apps/openmpi-1.2/bin/mpirun_ssh -> orterun
HM>
HM> [root_at_swallowtail NAMD]# file /share/apps/openmpi-1.2/bin/orterun
HM> /share/apps/openmpi-1.2/bin/orterun: ELF 64-bit LSB executable, AMD x86-64, version 1 (SYSV), for GNU/Linux 2.4.0, dynamically linked (uses shared libs), not stripped
HM>

-- 
=======================================================================
Axel Kohlmeyer   akohlmey_at_cmm.chem.upenn.edu   http://www.cmm.upenn.edu
   Center for Molecular Modeling   --   University of Pennsylvania
Department of Chemistry, 231 S.34th Street, Philadelphia, PA 19104-6323
tel: 1-215-898-1582,  fax: 1-215-573-6233,  office-tel: 1-215-898-5425
=======================================================================
If you make something idiot-proof, the universe creates a better idiot.

This archive was generated by hypermail 2.1.6 : Wed Feb 29 2012 - 15:48:28 CST