RE: megatest test failure/MPI problem

From: Meij, Henk (hmeij_at_wesleyan.edu)
Date: Wed Oct 22 2008 - 10:00:05 CDT

Hi All,

Just wanted to follow up with the resolution of my problem below, and to thank Axel Kohlmeyer for pointing me in the right direction. The problem was that I was mixing the OpenMPI and Topspin MPI flavors.
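
For anyone hitting the same thing, a quick way to spot a mixed-flavor build is to check which MPI wrapper the build picked up and which runtime libraries the binary actually resolves (a minimal sketch; the paths are just my install locations):

which mpiCC mpirun
echo $LD_LIBRARY_PATH
# a clean build should resolve every MPI library from a single tree;
# seeing both libmpich.so (Topspin) and libmpi.so.0 (OpenMPI) in the
# output means the two flavors got mixed
ldd ./pgm | grep -Ei 'mpi|open-rte|open-pal'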

I cleaned up my environment and successfully compiled pgm and namd2 against Topspin for my InfiniBand nodes, passed megatest, and ran some test programs via my LSF scheduler.
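
In case it helps, the test runs went through LSF roughly like this (a sketch only: the queue name is hypothetical, the machinefile is built from LSF's LSB_HOSTS slot list rather than hard-coded, and the launcher flags assume an MPICH/OpenMPI-style mpirun):

#!/bin/bash
# submit with:  bsub < run_megatest.sh
#BSUB -q hpc                  # hypothetical queue name
#BSUB -n 4                    # MPI ranks
#BSUB -o megatest.%J.out

# turn LSF's slot list into a machinefile for mpirun
MACHINEFILE=/tmp/machines.$LSB_JOBID
for h in $LSB_HOSTS; do echo $h; done > $MACHINEFILE

# use the mpirun that matches the MPI flavor the binary was built against
mpirun -np 4 -machinefile $MACHINEFILE ./pgm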

Like Axel, I may redo this in the near future and compile another version against OpenMPI built on the OFED libraries, so it can run over different switches; my Topspin stack only supports the Cisco switch.
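
If I do, the OpenMPI rebuild would look roughly like this (a sketch only; it assumes OFED is installed under /usr, uses the --with-openib option of the 1.2 series, and the prefix is just my naming convention):

cd openmpi-1.2
./configure --prefix=/share/apps/openmpi-1.2-ofed \
            --with-openib=/usr CC=gcc CXX=g++
make all install
# then rebuild charm++ and NAMD with conv-mach.sh pointing at this tree's mpicc/mpiCC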

Thanks for the support.

-Henk

=====
Wesleyan University
860.685.3783

________________________________
From: Meij, Henk
Sent: Wednesday, October 15, 2008 11:15 AM
To: 'namd-l_at_ks.uiuc.edu'
Subject: megatest test failure/MPI problem

cluster: Red Hat Linux AS4 x86_64 with 2.6.9-34 kernel
namd: 2.6 source, trying to compile linux-amd64-MPI with gcc
mpi: 2 flavors: Topspin (the InfiniBand libs that came with the cluster) and OpenMPI (1.2, compiled with GigE and InfiniBand libs).

I'm trying to pass megatest and detail my steps below. When I get to invoking pgm I run into a problem that I do not encounter when invoking other programs. It seems basic, but I cannot find a way out. (I'm invoking mpirun directly, as I'm running LSF 6.2.)

-Henk

pwd
/share/apps/NAMD
tar zxvf /share/apps/src/fftw-linux-amd64.tar.gz
vi fftw/linux-amd64/arch/Linux-amd64.fftw # fix path
tar zxvf /share/apps/src/tcl-linux-amd64.tar.gz
vi tcl/linux-amd64/arch/Linux-amd64.tcl # fix path
tar zxvf /share/apps/src/NAMD_2.6_Source.tar.gz
cd NAMD_2.6_Source/
# no edits in arch/Linux-amd64-MPI.arch
cd charm-5.9/
vi src/arch/mpi-linux-amd64/conv-mach.sh # point to Topspin's or OpenMPI's mpirun (see the sketch after these build steps)
/usr/local/topspin/mpi/mpich/bin/mpiCC -show 2>/dev/null | cut -d' ' -f1 # returns g++
/share/apps/openmpi-1.2/bin/mpiCC -show 2>/dev/null | cut -d' ' -f1 # returns g++
# no changes in src/arch/common/
./build charm++ mpi-linux-amd64
# charm++ built successfully.
cd mpi-linux-amd64/tests/charm++/megatest/
make # no errors
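
For reference, the "fix path" edits above amount to something like the lines below (a sketch only; I am assuming the stock variable names FFTDIR, TCLDIR, CMK_CC and CMK_CXX that ship in those files, and the paths are my install locations):

# arch/Linux-amd64.fftw
FFTDIR=/share/apps/NAMD/fftw/linux-amd64

# arch/Linux-amd64.tcl
TCLDIR=/share/apps/NAMD/tcl/linux-amd64

# charm-5.9/src/arch/mpi-linux-amd64/conv-mach.sh
# the compiler wrappers are what actually pick the MPI flavor;
# point them at exactly one MPI tree, e.g. Topspin:
CMK_CC="/usr/local/topspin/mpi/mpich/bin/mpicc "
CMK_CXX="/usr/local/topspin/mpi/mpich/bin/mpiCC "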

# first attempt, with Topspin in LD_LIBRARY_PATH: the OpenMPI libs come up missing
[root_at_swallowtail NAMD]# echo $LD_LIBRARY_PATH
/opt/lsfhpc/6.2/linux2.6-glibc2.3-x86_64/lib:/usr/local/topspin/mpi/mpich/lib64
[root_at_swallowtail megatest]# ldd pgm
        libmpich.so => /usr/local/topspin/mpi/mpich/lib64/libmpich.so (0x0000002a95557000)
        libdl.so.2 => /lib64/libdl.so.2 (0x0000003684000000)
        libmpi_cxx.so.0 => not found
        libmpi.so.0 => /opt/lam/gnu/lib/libmpi.so.0 (0x0000002a97797000)
        libopen-rte.so.0 => not found
        libopen-pal.so.0 => not found
        librt.so.1 => /lib64/tls/librt.so.1 (0x0000003689000000)
        libnuma.so.1 => /usr/lib64/libnuma.so.1 (0x0000002a9790f000)
        libnsl.so.1 => /lib64/libnsl.so.1 (0x0000003686d00000)
        libutil.so.1 => /lib64/libutil.so.1 (0x0000003688600000)
        libm.so.6 => /lib64/tls/libm.so.6 (0x00000034d3600000)
        libstdc++.so.6 => /usr/lib64/libstdc++.so.6 (0x00000034d3800000)
        libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x0000003687b00000)
        libpthread.so.0 => /lib64/tls/libpthread.so.0 (0x0000003684400000)
        libc.so.6 => /lib64/tls/libc.so.6 (0x0000003683b00000)
        libg2c.so.0 => /usr/lib64/libg2c.so.0 (0x00000039aa100000)
        libvapi.so => /usr/local/topspin/mpi/mpich/lib64/libvapi.so (0x0000002a97a17000)
        libmosal.so => /usr/local/topspin/mpi/mpich/lib64/libmosal.so (0x0000002a97b37000)
        /lib64/ld-linux-x86-64.so.2 (0x0000003683900000)

# second attempt, with OpenMPI in LD_LIBRARY_PATH: everything resolves, but both libmpich.so (Topspin) and libmpi.so.0 (OpenMPI) are linked in
[root_at_swallowtail megatest]# echo $LD_LIBRARY_PATH
/opt/lsfhpc/6.2/linux2.6-glibc2.3-x86_64/lib:/share/apps/openmpi-1.2/lib
[root_at_swallowtail megatest]# ldd ./pgm
        libmpich.so => /usr/local/topspin/mpi/mpich/lib64/libmpich.so (0x0000002a95576000)
        libdl.so.2 => /lib64/libdl.so.2 (0x0000003684000000)
        libmpi_cxx.so.0 => /share/apps/openmpi-1.2/lib/libmpi_cxx.so.0 (0x0000002a97797000)
        libmpi.so.0 => /share/apps/openmpi-1.2/lib/libmpi.so.0 (0x0000002a978ba000)
        libopen-rte.so.0 => /share/apps/openmpi-1.2/lib/libopen-rte.so.0 (0x0000002a97a4e000)
        libopen-pal.so.0 => /share/apps/openmpi-1.2/lib/libopen-pal.so.0 (0x0000002a97ba7000)
        librt.so.1 => /lib64/tls/librt.so.1 (0x0000003689000000)
        libnuma.so.1 => /usr/lib64/libnuma.so.1 (0x0000002a97d03000)
        libnsl.so.1 => /lib64/libnsl.so.1 (0x0000003686d00000)
        libutil.so.1 => /lib64/libutil.so.1 (0x0000003688600000)
        libm.so.6 => /lib64/tls/libm.so.6 (0x00000034d3600000)
        libstdc++.so.6 => /usr/lib64/libstdc++.so.6 (0x00000034d3800000)
        libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x0000003687b00000)
        libpthread.so.0 => /lib64/tls/libpthread.so.0 (0x0000003684400000)
        libc.so.6 => /lib64/tls/libc.so.6 (0x0000003683b00000)
        libg2c.so.0 => /usr/lib64/libg2c.so.0 (0x00000039aa100000)
        libvapi.so => /usr/local/topspin/mpi/mpich/lib64/libvapi.so (0x0000002a97e0b000)
        libmosal.so => /usr/local/topspin/mpi/mpich/lib64/libmosal.so (0x0000002a97f2b000)
        /lib64/ld-linux-x86-64.so.2 (0x0000003683900000)

# run pgm on an InfiniBand-enabled node; create a machinefile with 4 lines, each containing the node name 'compute-1-1'
# using OpenMPI

[root_at_swallowtail megatest]# /share/apps/openmpi-1.2/bin/mpirun_ssh -np 4 \
    -machinefile /share/apps/NAMD/NAMD_2.6_Source/charm-5.9/mpi-linux-amd64/tests/charm++/megatest/nodelist.txt \
    /share/apps/NAMD/NAMD_2.6_Source/charm-5.9/mpi-linux-amd64/tests/charm++/megatest/pgm
Can't read MPIRUN_HOST
Can't read MPIRUN_HOST
Can't read MPIRUN_HOST
Can't read MPIRUN_HOST
[root_at_swallowtail megatest]# cat /share/apps/NAMD/NAMD_2.6_Source/charm-5.9/mpi-linux-amd64/tests/charm++/megatest/nodelist.txt
compute-1-1
compute-1-1
compute-1-1
compute-1-1
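
For the record (per the resolution at the top of this message), the failure above came from mixing flavors: pgm was pulling in Topspin's libmpich alongside the OpenMPI libraries (see the ldd output), and the "Can't read MPIRUN_HOST" message appears to come from the Topspin/MVAPICH side rather than from OpenMPI. With a binary built consistently against OpenMPI, I'd expect a plain launch along these lines to work (a sketch only):

cd /share/apps/NAMD/NAMD_2.6_Source/charm-5.9/mpi-linux-amd64/tests/charm++/megatest
/share/apps/openmpi-1.2/bin/mpirun -np 4 -machinefile nodelist.txt ./pgm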
