RE: megatest test failure/MPI problem

From: Meij, Henk (hmeij_at_wesleyan.edu)
Date: Wed Oct 15 2008 - 13:26:52 CDT

Under openmpi those all point to one binary (see below). What's puzzling is that I specify the host file on the command line, yet pgm seems to be looking for it elsewhere. Following your lead, I built a simple LSF job that submits pgm and got the same error. In that approach LSF preps the hosts file, and that should work right out of the box -- it does for other jobs. I can also invoke mpirun_ssh from the command line with other programs, like amber.
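
Something like the following one-liner is what I mean by a simple LSF job (this is only a sketch: the queue name and output file are placeholders, and host selection is left to LSF / the site's MPI integration):

# hedged sketch of the LSF submission -- queue/output names are placeholders
bsub -q normal -n 4 -o pgm.%J.out \
  /share/apps/openmpi-1.2/bin/mpirun -np 4 \
  /share/apps/NAMD/NAMD_2.6_Source/charm-5.9/mpi-linux-amd64/tests/charm++/megatest/pgm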

Is there anybody on this list familiar with openmpi and the megatest? The archive only mentions mvapich and infiniband setups.

-Henk

[root_at_swallowtail NAMD]# ls -l /share/apps/openmpi-1.2/bin/mpirun*
lrwxrwxrwx 1 root root 7 Aug 6 2007 /share/apps/openmpi-1.2/bin/mpirun -> orterun
lrwxrwxrwx 1 root root 7 Jan 8 2008 /share/apps/openmpi-1.2/bin/mpirun_ssh -> orterun

[root_at_swallowtail NAMD]# file /share/apps/openmpi-1.2/bin/orterun
/share/apps/openmpi-1.2/bin/orterun: ELF 64-bit LSB executable, AMD x86-64, version 1 (SYSV), for GNU/Linux 2.4.0, dynamically linked (uses shared libs), not stripped
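
As a sanity check along the lines Axel suggests in the quoted message below, a trivial MPI "hello world" built with the same OpenMPI wrappers would look something like this (file and host-list names are only illustrative):

# write a minimal MPI hello world (illustrative sketch)
cat > hello.c <<'EOF'
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    printf("hello from rank %d of %d\n", rank, size);
    MPI_Finalize();
    return 0;
}
EOF

# build and run it with the openmpi-1.2 wrappers shown above
/share/apps/openmpi-1.2/bin/mpicc hello.c -o hello
ldd ./hello    # should pull in the openmpi-1.2 libs and not the topspin libmpich.so
/share/apps/openmpi-1.2/bin/mpirun -np 4 -machinefile nodelist.txt ./hello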

> -----Original Message-----
> From: Axel Kohlmeyer [mailto:akohlmey_at_cmm.chem.upenn.edu]
> Sent: Wednesday, October 15, 2008 11:50 AM
> To: Meij, Henk
> Cc: namd-l_at_ks.uiuc.edu
> Subject: Re: namd-l: megatest test failure/MPI problem
>
> On Wed, 15 Oct 2008, Meij, Henk wrote:
>
> henk,
>
> please have a look at what is in mpirun_ssh.
> this looks like some kind of wrapper script and i'd expect
> that the error message you are seeing originates from there.
>
> can you run some "hello world"-style MPI program the same way
> without errors?
>
> cheers,
> axel.
>
>
> HM> cluster: redhat linux AS4 x86_64 with 2.6.9-34 kernel
> HM> namd: 2.6 source, trying to compile linux-amd64-MPI with gcc
> HM> mpi: 2 flavors (topspin infiniband libs came with cluster), openmpi (1.2 compiled with gigE and Infiniband libs).
> HM>
> HM> i'm trying to pass the megatest and detail my steps below. when i get to invoking pgm i run into a problem that i do not encounter when invoking other programs. seems basic but i cannot find a way out. (invoking mpirun directly, as i'm running LSF 6.2.)
> HM>
> HM> -Henk
> HM>
> HM>
> HM> pwd
> HM> /share/apps/NAMD
> HM> tar zxvf /share/apps/src/fftw-linux-amd64.tar.gz
> HM> vi fftw/linux-amd64/arch/Linux-amd64.fftw        # fix path
> HM> tar zxvf /share/apps/src/tcl-linux-amd64.tar.gz
> HM> vi tcl/linux-amd64/arch/Linux-amd64.fftw         # fix path
> HM> tar zxvf /share/apps/src/NAMD_2.6_Source.tar.gz
> HM> cd NAMD_2.6_Source/
> HM> # no edits in arch/Linux-amd64-MPI.arch
> HM> cd charm-5.9/
> HM> vi src/arch/mpi-linux-amd64/conv-mach.sh         # point to Topspin's or Openmpi's mpirun
> HM> /usr/local/topspin/mpi/mpich/bin/mpiCC -show 2>/dev/null | cut -d' ' -f1    # returns g++
> HM> /share/apps/openmpi-1.2/bin/mpiCC -show 2>/dev/null | cut -d' ' -f1         # returns g++
> HM> # no changes in src/arch/common/
> HM> ./build charm++ mpi-linux-amd64                  # charm++ built successfully
> HM> cd mpi-linux-amd64/tests/charm++/megatest/
> HM> make                                             # no errors
> HM>
> HM> # first attempt, missing libs using Topspin
> HM> [root_at_swallowtail NAMD]# echo $LD_LIBRARY_PATH
> HM> /opt/lsfhpc/6.2/linux2.6-glibc2.3-x86_64/lib:/usr/local/topspin/mpi/mpich/lib64
> HM> [root_at_swallowtail megatest]# ldd pgm
> HM>     libmpich.so => /usr/local/topspin/mpi/mpich/lib64/libmpich.so (0x0000002a95557000)
> HM>     libdl.so.2 => /lib64/libdl.so.2 (0x0000003684000000)
> HM>     libmpi_cxx.so.0 => not found
> HM>     libmpi.so.0 => /opt/lam/gnu/lib/libmpi.so.0 (0x0000002a97797000)
> HM>     libopen-rte.so.0 => not found
> HM>     libopen-pal.so.0 => not found
> HM>     librt.so.1 => /lib64/tls/librt.so.1 (0x0000003689000000)
> HM>     libnuma.so.1 => /usr/lib64/libnuma.so.1 (0x0000002a9790f000)
> HM>     libnsl.so.1 => /lib64/libnsl.so.1 (0x0000003686d00000)
> HM>     libutil.so.1 => /lib64/libutil.so.1 (0x0000003688600000)
> HM>     libm.so.6 => /lib64/tls/libm.so.6 (0x00000034d3600000)
> HM>     libstdc++.so.6 => /usr/lib64/libstdc++.so.6 (0x00000034d3800000)
> HM>     libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x0000003687b00000)
> HM>     libpthread.so.0 => /lib64/tls/libpthread.so.0 (0x0000003684400000)
> HM>     libc.so.6 => /lib64/tls/libc.so.6 (0x0000003683b00000)
> HM>     libg2c.so.0 => /usr/lib64/libg2c.so.0 (0x00000039aa100000)
> HM>     libvapi.so => /usr/local/topspin/mpi/mpich/lib64/libvapi.so (0x0000002a97a17000)
> HM>     libmosal.so => /usr/local/topspin/mpi/mpich/lib64/libmosal.so (0x0000002a97b37000)
> HM>     /lib64/ld-linux-x86-64.so.2 (0x0000003683900000)
> HM>
> HM> # second attempt with OpenMPI
> HM> [root_at_swallowtail megatest]# echo $LD_LIBRARY_PATH
> HM> /opt/lsfhpc/6.2/linux2.6-glibc2.3-x86_64/lib:/share/apps/openmpi-1.2/lib
> HM> [root_at_swallowtail megatest]# ldd ./pgm
> HM>     libmpich.so => /usr/local/topspin/mpi/mpich/lib64/libmpich.so (0x0000002a95576000)
> HM>     libdl.so.2 => /lib64/libdl.so.2 (0x0000003684000000)
> HM>     libmpi_cxx.so.0 => /share/apps/openmpi-1.2/lib/libmpi_cxx.so.0 (0x0000002a97797000)
> HM>     libmpi.so.0 => /share/apps/openmpi-1.2/lib/libmpi.so.0 (0x0000002a978ba000)
> HM>     libopen-rte.so.0 => /share/apps/openmpi-1.2/lib/libopen-rte.so.0 (0x0000002a97a4e000)
> HM>     libopen-pal.so.0 => /share/apps/openmpi-1.2/lib/libopen-pal.so.0 (0x0000002a97ba7000)
> HM>     librt.so.1 => /lib64/tls/librt.so.1 (0x0000003689000000)
> HM>     libnuma.so.1 => /usr/lib64/libnuma.so.1 (0x0000002a97d03000)
> HM>     libnsl.so.1 => /lib64/libnsl.so.1 (0x0000003686d00000)
> HM>     libutil.so.1 => /lib64/libutil.so.1 (0x0000003688600000)
> HM>     libm.so.6 => /lib64/tls/libm.so.6 (0x00000034d3600000)
> HM>     libstdc++.so.6 => /usr/lib64/libstdc++.so.6 (0x00000034d3800000)
> HM>     libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x0000003687b00000)
> HM>     libpthread.so.0 => /lib64/tls/libpthread.so.0 (0x0000003684400000)
> HM>     libc.so.6 => /lib64/tls/libc.so.6 (0x0000003683b00000)
> HM>     libg2c.so.0 => /usr/lib64/libg2c.so.0 (0x00000039aa100000)
> HM>     libvapi.so => /usr/local/topspin/mpi/mpich/lib64/libvapi.so (0x0000002a97e0b000)
> HM>     libmosal.so => /usr/local/topspin/mpi/mpich/lib64/libmosal.so (0x0000002a97f2b000)
> HM>     /lib64/ld-linux-x86-64.so.2 (0x0000003683900000)
> HM>
> HM> # run pgm on an infiniband-enabled node; create a file with 4 lines of node name 'compute-1-1'
> HM> # using OpenMPI
> HM>
> HM> [root_at_swallowtail megatest]# /share/apps/openmpi-1.2/bin/mpirun_ssh -np 4 -machinefile /share/apps/NAMD/NAMD_2.6_Source/charm-5.9/mpi-linux-amd64/tests/charm++/megatest/nodelist.txt /share/apps/NAMD/NAMD_2.6_Source/charm-5.9/mpi-linux-amd64/tests/charm++/megatest/pgm
> HM> Can't read MPIRUN_HOST
> HM> Can't read MPIRUN_HOST
> HM> Can't read MPIRUN_HOST
> HM> Can't read MPIRUN_HOST
> HM> [root_at_swallowtail megatest]# cat /share/apps/NAMD/NAMD_2.6_Source/charm-5.9/mpi-linux-amd64/tests/charm++/megatest/nodelist.txt
> HM> compute-1-1
> HM> compute-1-1
> HM> compute-1-1
> HM> compute-1-1
> HM>
> HM>
> HM>
> HM>
> HM>
>
> --
> =======================================================================
> Axel Kohlmeyer   akohlmey_at_cmm.chem.upenn.edu   http://www.cmm.upenn.edu
> Center for Molecular Modeling -- University of Pennsylvania
> Department of Chemistry, 231 S.34th Street, Philadelphia, PA 19104-6323
> tel: 1-215-898-1582, fax: 1-215-573-6233, office-tel: 1-215-898-5425
> =======================================================================
> If you make something idiot-proof, the universe creates a better idiot.
>
