unexpected end with NAMD on quad-dual core opteron machine

From: JIMENEZ Ralph (rjimenez_at_jilau1.Colorado.EDU)
Date: Thu Sep 29 2005 - 12:59:42 CDT

Hi everyone:

  I'm an amateur who has started running NAMD recently. The version is
NAMD 2.6b1 for Linux-amd64 (pre-compiled binaries). My job (a ~100 aa
protein in a solvent box, with periodic boundary conditions; 11878 atoms)
quits prematurely at ~34K steps, without an explicit error message. I'm not
sure how to interpret the end of the log file (below). It looks like a
problem distributing the load among processors. The computer has four
dual-core Opterons, so in principle I think 8 CPUs should be available.

This job was started with 4 CPUs (charmrun namd2 ++local +p4 <config>). One
namd2 process was left hanging indefinitely when the other three quit.
In general, NAMD doesn't seem to work with more than 4 CPUs on this machine.

Can anyone provide me with some leads? Please let me know if I should
provide more information...

 Thanks,
Ralph Jimenez

LDB: LOAD: AVG -156.723 MAX 481.61 MSGS: TOTAL 52 MAXC 15 MAXP 3 None
Stack Traceback:
  [0] /lib64/tls/libc.so.6 [0x3817a2e410]
  [1] _ZN10Rebalancer13refine_togridERA3_A3_NS_6pcpairEdP13processorInfoP11computeInfo+0x4c [0x6505ac]
  [2] _ZN10Rebalancer6refineEv+0x270 [0x64ef18]
  [3] _ZN10Rebalancer11multirefineEv+0x1dc [0x64eb44]
  [4] _ZN10RefineOnlyC9EP11computeInfoP9patchInfoP13processorInfoiii+0x83 [0x655a3b]
  [5] _ZN10RefineOnlyC1EP11computeInfoP9patchInfoP13processorInfoiii+0x13 [0x655aab]
  [6] _ZN10NamdCentLB8StrategyEPN6BaseLB7LDStatsEi+0x482 [0x60f3fa]
  [7] _ZN9CentralLB11LoadBalanceEv+0x215 [0x7231c5]
  [8] _ZN17CkIndex_CentralLB22_call_LoadBalance_voidEPvP9CentralLB+0x1c [0x7272ac]
  [9] CkDeliverMessageFree+0x30 [0x6cfe38]
  [10] _Z15_processHandlerPvP11CkCoreState+0x44a [0x6d24ca]
  [11] CmiHandleMessage+0x26 [0x73b1ae]
  [12] CsdScheduleForever+0x4b [0x73b30b]
  [13] CsdScheduler+0x1c [0x73c98c]
  [14] _ZN7BackEnd7suspendEv+0xe [0x4a1536]
  [15] _ZN9ScriptTcl7Tcl_runEPvP10Tcl_InterpiPPc+0x164 [0x656c5c]
  [16] TclInvokeStringCommand+0x91 [0x758d78]
  [17] TclExecuteByteCode+0x856 [0x77365f]
  [18] Tcl_EvalObjEx+0x2bb [0x75978b]
  [19] Tcl_ForObjCmd+0xb6 [0x75eb8d]
  [20] /usr/local/bin/namd2 [0x78ebc8]
  [21] Tcl_EvalEx+0x176 [0x78f20b]
  [22] Tcl_EvalFile+0x134 [0x786c14]
  [23] _ZN9ScriptTcl3runEPc+0x1c [0x656294]
  [24] main+0x222 [0x49dae2]
  [25] __libc_start_main+0xdb [0x3817a1c4bb]
  [26] _ZStlsISt11char_traitsIcEERSt13basic_ostreamIcT_ES5_c+0x5a [0x49a4aa]
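
One detail in the log excerpt above that may be worth checking: the last LDB line reports a negative average load (AVG -156.723), just before the traceback enters the load balancer. A load derived from timings would not normally be expected to be negative, though that reading is an assumption and not something the log states. The following is a minimal Python sketch, not part of NAMD, that scans a saved log for LDB lines with a negative average. It assumes the line format matches the one shown above; the function name suspicious_ldb_lines is hypothetical.

```python
import re

# Matches the load-balancer summary line in the format seen in this log, e.g.
# LDB: LOAD: AVG -156.723 MAX 481.61 MSGS: TOTAL 52 MAXC 15 MAXP 3 None
# (Assumption: other NAMD versions may print this line differently.)
LDB_LINE = re.compile(
    r"LDB: LOAD: AVG\s+(-?\d+(?:\.\d+)?)\s+MAX\s+(-?\d+(?:\.\d+)?)"
)


def suspicious_ldb_lines(lines):
    """Return (line_number, avg, max) for each LDB line with a negative AVG."""
    hits = []
    for number, line in enumerate(lines, start=1):
        match = LDB_LINE.search(line)
        if match is None:
            continue
        avg = float(match.group(1))
        peak = float(match.group(2))
        if avg < 0:
            hits.append((number, avg, peak))
    return hits
```

To use it on a saved log, pass the open file object, for example suspicious_ldb_lines(open("namd.log")), where namd.log is a placeholder file name. If negative averages show up well before step ~34K, that would suggest the problem builds up over several load-balancing steps instead of appearing only at the crash.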
