Re: megatest test failure/MPI problem

From: Axel Kohlmeyer (akohlmey_at_cmm.chem.upenn.edu)
Date: Wed Oct 15 2008 - 10:50:14 CDT

On Wed, 15 Oct 2008, Meij, Henk wrote:

henk,

please have a look at what is in mpirun_ssh.
this looks like some kind of wrapper script
and i'd expect that the error message you are
seeing originates from there.

can you run some "hello world"-style MPI program
the same way without errors?
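for instance, a minimal test along these lines (a sketch, assuming the openmpi wrappers and the same nodelist file you use below):

cat > hello.c <<'EOF'
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rank, size;
    MPI_Init(&argc, &argv);                /* start the MPI runtime */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);  /* this process's rank */
    MPI_Comm_size(MPI_COMM_WORLD, &size);  /* total number of ranks */
    printf("hello from rank %d of %d\n", rank, size);
    MPI_Finalize();
    return 0;
}
EOF
/share/apps/openmpi-1.2/bin/mpicc hello.c -o hello
/share/apps/openmpi-1.2/bin/mpirun -np 4 -machinefile nodelist.txt ./hello

if this already fails the same way, the problem is in the mpi installation and not in charm++ or namd.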

cheers,
   axel.

HM> cluster: redhat linux AS4 x86_64 with 2.6.9-34 kernel
HM> namd: 2.6 source, trying to compile linux-amd64-MPI with gcc
HM> mpi: 2 flavors: topspin (infiniband libs that came with the cluster) and openmpi (1.2, compiled with gigE and infiniband support).
HM>
HM> i'm trying to pass the megatest and detail my steps below. when i get to invoking pgm i run into a problem that i do not encounter when invoking other programs. it seems basic but i cannot find a way out. (i'm invoking mpirun directly as i'm running LSF 6.2.)
HM>
HM> -Henk
HM>
HM>
HM> pwd
HM> /share/apps/NAMD
HM> tar zxvf /share/apps/src/fftw-linux-amd64.tar.gz
HM> vi fftw/linux-amd64/arch/Linux-amd64.fftw # fix path
HM> tar zxvf /share/apps/src/tcl-linux-amd64.tar.gz
HM> vi tcl/linux-amd64/arch/Linux-amd64.tcl # fix path
HM> tar zxvf /share/apps/src/NAMD_2.6_Source.tar.gz
HM> cd NAMD_2.6_Source/
HM> no edits in arch/Linux-amd64-MPI.arch
HM> cd charm-5.9/
HM> vi src/arch/mpi-linux-amd64/conv-mach.sh # point to Topspin's or Openmpi's mpirun
HM> /usr/local/topspin/mpi/mpich/bin/mpiCC -show 2>/dev/null | cut -d' ' -f1 # returns g++
HM> /share/apps/openmpi-1.2/bin/mpiCC -show 2>/dev/null | cut -d' ' -f1 # returns g++
HM> # no changes in src/arch/common/
HM> ./build charm++ mpi-linux-amd64
HM> # charm++ built successfully.
HM> cd mpi-linux-amd64/tests/charm++/megatest/
HM> make # no errors
HM>
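btw, the ldd output below shows pgm linked against more than one mpi at the same time (topspin's libmpich.so alongside open mpi's libmpi/libopen-rte/libopen-pal), which usually means mixed compiler wrappers or paths at build time. a minimal sanity check before rebuilding, assuming the openmpi install is the one you want (paths taken from your transcript):

export PATH=/share/apps/openmpi-1.2/bin:$PATH
export LD_LIBRARY_PATH=/share/apps/openmpi-1.2/lib:$LD_LIBRARY_PATH
which mpiCC    # should print /share/apps/openmpi-1.2/bin/mpiCC
mpiCC -show    # inspect the full link line, not just the first word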
HM> # first attempt: Topspin libs in LD_LIBRARY_PATH, some libraries not found
HM> [root_at_swallowtail NAMD]# echo $LD_LIBRARY_PATH
HM> /opt/lsfhpc/6.2/linux2.6-glibc2.3-x86_64/lib:/usr/local/topspin/mpi/mpich/lib64
HM> [root_at_swallowtail megatest]# ldd pgm
HM> libmpich.so => /usr/local/topspin/mpi/mpich/lib64/libmpich.so (0x0000002a95557000)
HM> libdl.so.2 => /lib64/libdl.so.2 (0x0000003684000000)
HM> libmpi_cxx.so.0 => not found
HM> libmpi.so.0 => /opt/lam/gnu/lib/libmpi.so.0 (0x0000002a97797000)
HM> libopen-rte.so.0 => not found
HM> libopen-pal.so.0 => not found
HM> librt.so.1 => /lib64/tls/librt.so.1 (0x0000003689000000)
HM> libnuma.so.1 => /usr/lib64/libnuma.so.1 (0x0000002a9790f000)
HM> libnsl.so.1 => /lib64/libnsl.so.1 (0x0000003686d00000)
HM> libutil.so.1 => /lib64/libutil.so.1 (0x0000003688600000)
HM> libm.so.6 => /lib64/tls/libm.so.6 (0x00000034d3600000)
HM> libstdc++.so.6 => /usr/lib64/libstdc++.so.6 (0x00000034d3800000)
HM> libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x0000003687b00000)
HM> libpthread.so.0 => /lib64/tls/libpthread.so.0 (0x0000003684400000)
HM> libc.so.6 => /lib64/tls/libc.so.6 (0x0000003683b00000)
HM> libg2c.so.0 => /usr/lib64/libg2c.so.0 (0x00000039aa100000)
HM> libvapi.so => /usr/local/topspin/mpi/mpich/lib64/libvapi.so (0x0000002a97a17000)
HM> libmosal.so => /usr/local/topspin/mpi/mpich/lib64/libmosal.so (0x0000002a97b37000)
HM> /lib64/ld-linux-x86-64.so.2 (0x0000003683900000)
HM>
HM> # second attempt with OpenMPI
HM> [root_at_swallowtail megatest]# echo $LD_LIBRARY_PATH
HM> /opt/lsfhpc/6.2/linux2.6-glibc2.3-x86_64/lib:/share/apps/openmpi-1.2/lib
HM> [root_at_swallowtail megatest]# ldd ./pgm
HM> libmpich.so => /usr/local/topspin/mpi/mpich/lib64/libmpich.so (0x0000002a95576000)
HM> libdl.so.2 => /lib64/libdl.so.2 (0x0000003684000000)
HM> libmpi_cxx.so.0 => /share/apps/openmpi-1.2/lib/libmpi_cxx.so.0 (0x0000002a97797000)
HM> libmpi.so.0 => /share/apps/openmpi-1.2/lib/libmpi.so.0 (0x0000002a978ba000)
HM> libopen-rte.so.0 => /share/apps/openmpi-1.2/lib/libopen-rte.so.0 (0x0000002a97a4e000)
HM> libopen-pal.so.0 => /share/apps/openmpi-1.2/lib/libopen-pal.so.0 (0x0000002a97ba7000)
HM> librt.so.1 => /lib64/tls/librt.so.1 (0x0000003689000000)
HM> libnuma.so.1 => /usr/lib64/libnuma.so.1 (0x0000002a97d03000)
HM> libnsl.so.1 => /lib64/libnsl.so.1 (0x0000003686d00000)
HM> libutil.so.1 => /lib64/libutil.so.1 (0x0000003688600000)
HM> libm.so.6 => /lib64/tls/libm.so.6 (0x00000034d3600000)
HM> libstdc++.so.6 => /usr/lib64/libstdc++.so.6 (0x00000034d3800000)
HM> libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x0000003687b00000)
HM> libpthread.so.0 => /lib64/tls/libpthread.so.0 (0x0000003684400000)
HM> libc.so.6 => /lib64/tls/libc.so.6 (0x0000003683b00000)
HM> libg2c.so.0 => /usr/lib64/libg2c.so.0 (0x00000039aa100000)
HM> libvapi.so => /usr/local/topspin/mpi/mpich/lib64/libvapi.so (0x0000002a97e0b000)
HM> libmosal.so => /usr/local/topspin/mpi/mpich/lib64/libmosal.so (0x0000002a97f2b000)
HM> /lib64/ld-linux-x86-64.so.2 (0x0000003683900000)
HM>
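the mixed link is visible directly in both listings: libmpich.so from topspin resolves alongside open mpi's libmpi/libmpi_cxx/libopen-rte/libopen-pal (and in the first listing, libmpi.so.0 even came from /opt/lam). a quick filter to spot this:

ldd ./pgm | egrep 'libmpi|open-rte|open-pal'   # entries from more than one mpi install indicate a mixed link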
HM> # run pgm on an infiniband-enabled node; machinefile contains 4 lines with node name 'compute-1-1'
HM> # using OpenMPI
HM>
HM> [root_at_swallowtail megatest]# /share/apps/openmpi-1.2/bin/mpirun_ssh -np 4 \
HM>     -machinefile /share/apps/NAMD/NAMD_2.6_Source/charm-5.9/mpi-linux-amd64/tests/charm++/megatest/nodelist.txt \
HM>     /share/apps/NAMD/NAMD_2.6_Source/charm-5.9/mpi-linux-amd64/tests/charm++/megatest/pgm
HM> Can't read MPIRUN_HOST
HM> Can't read MPIRUN_HOST
HM> Can't read MPIRUN_HOST
HM> Can't read MPIRUN_HOST
HM> [root_at_swallowtail megatest]# cat \
HM>     /share/apps/NAMD/NAMD_2.6_Source/charm-5.9/mpi-linux-amd64/tests/charm++/megatest/nodelist.txt
HM> compute-1-1
HM> compute-1-1
HM> compute-1-1
HM> compute-1-1
HM>
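one more thing to try: "Can't read MPIRUN_HOST" does not look like an open mpi message to me (it resembles the mvapich/topspin launcher environment), so cross-check with open mpi's standard launcher, which ships as mpirun/orterun in 1.2. a sketch using the same files:

cd /share/apps/NAMD/NAMD_2.6_Source/charm-5.9/mpi-linux-amd64/tests/charm++/megatest
/share/apps/openmpi-1.2/bin/mpirun -np 4 -machinefile nodelist.txt ./pgm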

-- 
=======================================================================
Axel Kohlmeyer   akohlmey_at_cmm.chem.upenn.edu   http://www.cmm.upenn.edu
   Center for Molecular Modeling   --   University of Pennsylvania
Department of Chemistry, 231 S.34th Street, Philadelphia, PA 19104-6323
tel: 1-215-898-1582,  fax: 1-215-573-6233,  office-tel: 1-215-898-5425
=======================================================================
If you make something idiot-proof, the universe creates a better idiot.
