Re: unexpected end with NAMD on quad-dual core opteron machine

From: Gengbin Zheng (gzheng_at_ks.uiuc.edu)
Date: Thu Sep 29 2005 - 13:39:22 CDT

Hi,

I noticed the value
AVG -156.723
in the LDB line of your output. The average load across all processors
should never be negative, so I suspect the timer was returning bad timing
data.
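
To see whether earlier load-balancing steps reported impossible values too, you could scan the whole log for LDB lines. Here is a minimal sketch in Python, assuming every such line has the same "LDB: LOAD: AVG ... MAX ..." layout as the one in your log and uses plain decimal numbers. The function name find_bad_loads is just something I made up for illustration; it is not part of NAMD or Charm++.

```python
import re

# Assumed layout, taken from the single line in the posted log:
#   LDB: LOAD: AVG -156.723 MAX 481.61 MSGS: TOTAL 52 MAXC 15 MAXP 3 None
# Only plain decimals are handled; scientific notation is not.
NUMBER = r"(-?\d+(?:\.\d+)?)"
LDB_RE = re.compile(r"LDB: LOAD: AVG\s+" + NUMBER + r"\s+MAX\s+" + NUMBER)


def find_bad_loads(lines):
    """Return (line_number, avg, max) for LDB lines with impossible loads."""
    bad = []
    for number, line in enumerate(lines, start=1):
        match = LDB_RE.search(line)
        if match is None:
            continue  # not a load-balancer summary line
        avg = float(match.group(1))
        peak = float(match.group(2))
        # Loads are measured times, so a negative value, or an average
        # larger than the maximum, cannot come from a sane timer.
        if avg < 0 or peak < 0 or avg > peak:
            bad.append((number, avg, peak))
    return bad


sample = ["LDB: LOAD: AVG -156.723 MAX 481.61 MSGS: TOTAL 52 MAXC 15 MAXP 3 None"]
print(find_bad_loads(sample))  # [(1, -156.723, 481.61)]
```

To run it on a real log, pass the open file object, for example find_bad_loads(open("run.log")), where run.log stands for whatever file your NAMD output was redirected to. If bad values show up well before step ~34K, that would point at the timing data rather than at this particular load-balancing step.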

Gengbin

JIMENEZ Ralph wrote:

>Hi everyone:
>
> I'm an amateur who has started running NAMD recently. The version is
>NAMD 2.6b1 for Linux-amd64 (pre-compiled binaries). My job (an ~ 100 aa
>protein in a solvent box, with periodic conditions; 11878 atoms) quits
>prematurely at ~ 34K steps, without an explicit error message. I'm not
>sure how to interpret the end of the log file (below). It looks like a
>problem distributing the load amongst processors. The computer has quad
>dual-core opterons, so in principle I think 8 CPUs should be available.
>
>This job was started with 4 CPUs (charmrun namd2 ++local +p4 <config>). One
>namd2 process was left hanging indefinitely when the other three quit.
>In general, NAMD doesn't seem to work with > 4 CPUs on this machine.
>
>Can anyone provide me with some leads? Please let me know if I should
>provide more information...
>
> Thanks,
>Ralph Jimenez
>
>
>LDB: LOAD: AVG -156.723 MAX 481.61 MSGS: TOTAL 52 MAXC 15 MAXP 3 None
>Stack Traceback:
> [0] /lib64/tls/libc.so.6 [0x3817a2e410]
> [1] _ZN10Rebalancer13refine_togridERA3_A3_NS_6pcpairEdP13processorInfoP11computeInfo+0x4c [0x6505ac]
> [2] _ZN10Rebalancer6refineEv+0x270 [0x64ef18]
> [3] _ZN10Rebalancer11multirefineEv+0x1dc [0x64eb44]
> [4] _ZN10RefineOnlyC9EP11computeInfoP9patchInfoP13processorInfoiii+0x83 [0x655a3b]
> [5] _ZN10RefineOnlyC1EP11computeInfoP9patchInfoP13processorInfoiii+0x13 [0x655aab]
> [6] _ZN10NamdCentLB8StrategyEPN6BaseLB7LDStatsEi+0x482 [0x60f3fa]
> [7] _ZN9CentralLB11LoadBalanceEv+0x215 [0x7231c5]
> [8] _ZN17CkIndex_CentralLB22_call_LoadBalance_voidEPvP9CentralLB+0x1c [0x7272ac]
> [9] CkDeliverMessageFree+0x30 [0x6cfe38]
> [10] _Z15_processHandlerPvP11CkCoreState+0x44a [0x6d24ca]
> [11] CmiHandleMessage+0x26 [0x73b1ae]
> [12] CsdScheduleForever+0x4b [0x73b30b]
> [13] CsdScheduler+0x1c [0x73c98c]
> [14] _ZN7BackEnd7suspendEv+0xe [0x4a1536]
> [15] _ZN9ScriptTcl7Tcl_runEPvP10Tcl_InterpiPPc+0x164 [0x656c5c]
> [16] TclInvokeStringCommand+0x91 [0x758d78]
> [17] TclExecuteByteCode+0x856 [0x77365f]
> [18] Tcl_EvalObjEx+0x2bb [0x75978b]
> [19] Tcl_ForObjCmd+0xb6 [0x75eb8d]
> [20] /usr/local/bin/namd2 [0x78ebc8]
> [21] Tcl_EvalEx+0x176 [0x78f20b]
> [22] Tcl_EvalFile+0x134 [0x786c14]
> [23] _ZN9ScriptTcl3runEPc+0x1c [0x656294]
> [24] main+0x222 [0x49dae2]
> [25] __libc_start_main+0xdb [0x3817a1c4bb]
> [26] _ZStlsISt11char_traitsIcEERSt13basic_ostreamIcT_ES5_c+0x5a [0x49a4aa]
>
>
>
