NAMD not starting calculations...

From: Jimmy Tang (jtang_at_tchpc.tcd.ie)
Date: Thu Feb 16 2006 - 08:00:34 CST

Hi,

I've been having some problems building and running a custom build of NAMD
on our Opteron cluster.

I've followed the instructions on NAMD's website and edited the
relevant files for charm and NAMD (linked against the libraries provided
on the website). Both charm and NAMD compile without errors.

When I run namd2 on a test case that one of our users wants to run, NAMD
seems to stop just before it starts its calculations (phase8).

I was wondering if anyone has successfully compiled charm/namd2 (the
current versions) against an InfiniBand setup running MVAPICH on the
amd64 platform?

Anyway here is some information on how I built namd2 on our cluster...

(our system defaults to a custom compiled mvapich install using
pathscale compilers which was also used to build charm/namd2)

==============================================================================
charm++ -- conv-mach.sh

CMK_REAL_COMPILER=`mpicxx -show 2>/dev/null | cut -d' ' -f1 `
case "$CMK_REAL_COMPILER" in
g++) CMK_AMD64="-m64 -fPIC" ;;
esac

CMK_CPP_CHARM="/lib/cpp -P"
CMK_CPP_C="mpicc -E"
CMK_CC="mpicc $CMK_AMD64 "
CMK_CXX="mpicxx $CMK_AMD64 "
CMK_CXXPP="mpicxx -E $CMK_AMD64 "

CMK_SYSLIBS="-lmpich "
CMK_LIBS="-lckqt $CMK_SYSLIBS "
CMK_LD_LIBRARY_PATH="-Wl,-rpath,$CHARMLIBSO/"

CMK_NATIVE_CC="mpicc $CMK_AMD64 "
CMK_NATIVE_LD="mpicc $CMK_AMD64 "
CMK_NATIVE_CXX="mpicxx $CMK_AMD64 "
CMK_NATIVE_LDXX="mpicxx $CMK_AMD64 "
CMK_NATIVE_LIBS=""

# fortran compiler
CMK_CF77="f77"
CMK_CF90="f90"
#CMK_F90LIBS="-L/usr/absoft/lib -L/opt/absoft/lib -lf90math -lfio -lU77 -lf77math "
CMK_F90LIBS="-lf90math -lfio -lU77 -lf77math "
CMK_F77LIBS="-lg2c "
CMK_MOD_NAME_ALLCAPS=1
CMK_MOD_EXT="mod"
CMK_F90_USE_MODDIR=1
CMK_F90_MODINC="-p"

CMK_QT='generic64'
CMK_RANLIB="ranlib"
==============================================================================
followed by...

./build charm++ mpi-linux-amd64 --no-build-shared -O
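
A note on the conv-mach.sh above: its case statement only sets CMK_AMD64
when the compiler behind mpicxx is g++. The sketch below replays that
logic for two compiler names. It assumes that a PathScale-backed mpicxx
reports pathCC as its first word, and amd64_flags is a hypothetical
helper written for illustration, not part of charm++. Whether the empty
flags have any bearing on the hang is not established here.

```shell
# Sketch: the compiler detection from conv-mach.sh, applied to a given name.
# amd64_flags is a hypothetical helper, not part of charm++.
amd64_flags() {
    case "$1" in
        g++) echo "-m64 -fPIC" ;;   # same flags conv-mach.sh uses for g++
        *)   echo "" ;;             # any other compiler: nothing is added
    esac
}

# On the cluster the name would come from the MPI wrapper itself:
#   real=$(mpicxx -show 2>/dev/null | cut -d' ' -f1)
# Here two names are tried directly; pathCC is an assumption.
for real in g++ pathCC; do
    printf '%s -> [%s]\n' "$real" "$(amd64_flags "$real")"
done
```

Running `mpicxx -show` by hand on the build host shows which compiler and
flags the wrapper really uses.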

For namd2 I edited the config files to point to the correct locations of
the downloaded libraries and ran...

./config tcl fftw Linux-amd64-MPI

after which I executed the test case via an interactive qsub...

mpiexec -verbose -np 4 /scratch/namd2 run.namd

which gives the following...

==============================================================================
        some of the output
==============================================================================
mpiexec: resolve_exe: using absolute exe "./namd2".
mpiexec: concurrent_init: old master died, reusing his fifo as master.
mpiexec: wait_task_start: start evt 2 task 0 on iitac309.ib.tchpc.tcd.ie.
mpiexec: wait_task_start: start evt 3 task 1 on iitac309.ib.tchpc.tcd.ie.
mpiexec: wait_task_start: start evt 4 task 2 on iitac308.ib.tchpc.tcd.ie.
mpiexec: wait_task_start: start evt 5 task 3 on iitac308.ib.tchpc.tcd.ie.
mpiexec: All 4 tasks started.
read_ib_startup_ports: waiting for checkins
read_ib_startup_ports: version 3 startup
read_ib_startup_ports: rank 0 checked in, 3 left
read_ib_startup_ports: rank 1 checked in, 2 left
read_ib_startup_ports: rank 2 checked in, 1 left
read_ib_startup_ports: rank 3 checked in, 0 left
read_ib_startup_ports: barrier start
read_ib_startup_ports: barrier done
wait_tasks: waiting for iitac309.ib.tchpc.tcd.ie iitac309.ib.tchpc.tcd.ie iitac308.ib.tchpc.tcd.ie iitac308.ib.tchpc.tcd.ie
Info: NAMD 2.6b1 for Linux-amd64-MPI
Info:
Info: Please visit http://www.ks.uiuc.edu/Research/namd/
Info: and send feedback or bug reports to namd_at_ks.uiuc.edu
Info:
Info: Please cite Kale et al., J. Comp. Phys. 151:283-312 (1999)
Info: in all publications reporting results obtained with NAMD.
Info:
Info: Based on Charm++/Converse 50900 for mpi-linux-amd64
Info: Built Thu Feb 16 11:23:46 GMT 2006 by root on login01.ib.tchpc.tcd.ie
Info: Sending usage information to NAMD developers via UDP. Sent data is:
Info: 1 NAMD 2.6b1 Linux-amd64-MPI 4 iitac309.tchpc.tcd.ie jtang
Info: Running on 4 processors.
Info: 140813 kB of memory in use.
Info: Configuration file is run.namd
TCL: Suspending until startup complete.
Warning: The parameter fullElectFrequency now defaults to nonbondedFreq (10) rather than stepsPerCycle.
Info: SIMULATION PARAMETERS:
Info: TIMESTEP 1
Info: NUMBER OF STEPS 1000
Info: STEPS PER CYCLE 10
Info: PERIODIC CELL BASIS 1 62.23 0 0
Info: PERIODIC CELL BASIS 2 0 62.23 0
Info: PERIODIC CELL BASIS 3 0 0 62.23
...
....not needed info....
...
Info: RELATIVE IMPRECISION IN VDWB TABLE FORCE: 1.43481e-15 AT 8.94078
Info: Entering startup phase 8 with 161182 kB of memory in use.
Info: Finished startup with 164625 kB of memory in use.
...
.... the program stops here ....
...
==============================================================================

The test case does work: I tested it with the downloadable precompiled
binaries and it gets past phase 8 with no problems, so it's not the input
files causing the problem.

One thing I did notice was that namd2 would run at 100% CPU usage and
seemed to update the FFTW_NAMD_2.6b1_Linux-amd64-MPI.txt file every so
often.
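
One way to see where a process that sits at 100% CPU is spending its
time is to attach a debugger and print a backtrace. The sketch below
assumes gdb is installed on the compute node and that attaching to the
process is permitted there. dump_stacks is a hypothetical helper name.

```shell
# Hypothetical helper: print a backtrace of every thread of a running process.
# Assumes gdb is available and the caller may attach to the given PID.
dump_stacks() {
    pid="$1"
    # -batch exits after running the commands, -ex supplies one command,
    # -p attaches to an already running process.
    gdb -batch -ex "thread apply all bt" -p "$pid" 2>&1
}

# Example (run on the node where namd2 is spinning):
#   dump_stacks "$(pgrep -n namd2)"
```

Taking the backtrace a few times in a row shows whether the process
stays in the same call or is moving between calls.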

Any advice on how to compile namd2/charm against an InfiniBand setup
would be greatly appreciated.

Thanks,
Jim.

-- 
Jimmy Tang
Trinity Centre for High Performance Computing,
Lloyd Building, Trinity College Dublin.
http://www.tchpc.tcd.ie/
