NAMD fails on Altix

From: Margaret Kahn (Margaret.Kahn_at_anu.edu.au)
Date: Wed Jan 11 2006 - 19:50:44 CST

One of our users on the SGI Altix (Itanium processors, using MPT) is
seeing his NAMD jobs fail after about 4 hours on 8 processors. The
traceback from the failure is as follows:

received signal SIGSEGV(11)

MPI: --------stack traceback-------
Internal Error: Can't read/write file "/dev/mmtimer", (errno = 22)
MPI: Intel(R) Debugger for Itanium(R)-based Applications, Version 8.1-14, Build 20051006
MPI: Reading symbolic information from /opt/namd-2.5/bin/namd2...done
MPI: Attached to process id 29931 ....
MPI: stopped at [0xa000000000010641]
MPI: >0 0xa000000000010641
MPI: #1 0x200000000418ccc0 in __libc_waitpid(...) in /lib/tls/libc.so.6.1
MPI: #2 0x20000000001ba700 in MPI_SGI_stacktraceback(...) in /opt/mpt-1.12/lib/libmpi.so
MPI: #3 0x20000000001bb3e0 in slave_sig_handler(...) in /opt/mpt-1.12/lib/libmpi.so
MPI: #4 0xa0000000000107e0
MPI: #5 0x40000000008e5e70 in _Z11CWeb_ReducePv(...) in /opt/namd-2.5/bin/namd2
MPI: #6 0x4000000000892390 in CmiHandleMessage(...) in /opt/namd-2.5/bin/namd2
MPI: #7 0x4000000000892a60 in CsdScheduleForever(...) in /opt/namd-2.5/bin/namd2
MPI: #8 0x4000000000892960 in CsdScheduler(...) in /opt/namd-2.5/bin/namd2
MPI: #9 0x40000000000b7ff0 in _ZN7BackEnd7suspendEv(...) in /opt/namd-2.5/bin/namd2
MPI: #10 0x40000000004c3660 in _ZN9ScriptTcl7suspendEv(...) in /opt/namd-2.5/bin/namd2
MPI: #11 0x40000000004c39e0 in _ZN9ScriptTcl13runControllerEi(...) in /opt/namd-2.5/bin/namd2
MPI: #12 0x40000000004c6e50 in _ZN9ScriptTcl7Tcl_runEPvP10Tcl_InterpiPPc(...) in /opt/namd-2.5/bin/namd2
MPI: #13 0x2000000003c19d20 in TclInvokeStringCommand(...) in /usr/lib/libtcl8.4.so
MPI: #14 0x2000000003ca0650 in TclEvalObjvInternal(...) in /usr/lib/libtcl8.4.so
MPI: #15 0x2000000003cc9e10 in TclExecuteByteCode(...) in /usr/lib/libtcl8.4.so
MPI: #16 0x2000000003cd4510 in TclCompEvalObj(...) in /usr/lib/libtcl8.4.so
MPI: #17 0x2000000003ca20a0 in Tcl_EvalObjEx(...) in /usr/lib/libtcl8.4.so
MPI: #18 0x2000000003ca8860 in Tcl_ForeachObjCmd(...) in /usr/lib/libtcl8.4.so
MPI: #19 0x2000000003ca0650 in TclEvalObjvInternal(...) in /usr/lib/libtcl8.4.so
MPI: #20 0x2000000003ca0eb0 in Tcl_EvalEx(...) in /usr/lib/libtcl8.4.so
MPI: #21 0x2000000003c385f0 in Tcl_FSEvalFile(...) in /usr/lib/libtcl8.4.so
MPI: #22 0x2000000003c7a950 in Tcl_EvalFile(...) in /usr/lib/libtcl8.4.so
MPI: #23 0x40000000004c3330 in _ZN9ScriptTcl3runEPc(...) in /opt/namd-2.5/bin/namd2
MPI: #24 0x40000000000ac930 in main(...) in /opt/namd-2.5/bin/namd2
MPI: #25 0x20000000040ad850 in __libc_start_main(...) in /lib/tls/libc.so.6.1
MPI: #26 0x40000000000a6bc0 in _start(...) in /opt/namd-2.5/bin/namd2

This first occurred with namd-2.5. We have since installed namd-2.6b1
and built a separate tcl8.3 library (we previously had only tcl8.4),
but the job still fails at the same place.

We would appreciate any suggestions on how to go about solving this
problem.

  Thanks in advance,

    Margaret Kahn

This archive was generated by hypermail 2.1.6 : Wed Feb 29 2012 - 15:41:31 CST