Re: namd at ranger(tacc)

From: Sándor Kovács (skovacs_at_wustl.edu)
Date: Thu Feb 24 2011 - 16:46:41 CST

Hi Lei,

I too have just started running new NAMD jobs on Ranger using the
scripts found at /share/home/00288/tg455591/NAMD_scripts/.
I have had no trouble starting and running these, but one did exit
prematurely yesterday with the following error (excerpted from the log
file):

SMD 2860000 34.4911 -32.609 37.017 -198.135 0 0
WRITING COORDINATES TO DCD FILE AT STEP 2860000
WRITING COORDINATES TO RESTART FILE AT STEP 2860000
FATAL ERROR: Cannot open file 'OFMO_CsmABCL_UP_SOLV_runSMD512_1.restart.coor' in PDB::write.: Interrupted system call
------------- Processor 0 Exiting: Called CmiAbort ------------
Reason: FATAL ERROR: Cannot open file 'OFMO_CsmABCL_UP_SOLV_runSMD512_1.restart.coor' in PDB::write.: Interrupted system call

[0] Stack Traceback:
   [0:0] _Z8NAMD_errPKc+0xa3 [0x4e8f45]
   [0:1] _ZN3PDB5writeEPKcS1_+0x150 [0x97bb2a]
   [0:2] _ZN6Output10coordinateEiiP6VectorP11FloatVectorR7Lattice+0x2c6 [0x941bac]
   [0:3] _ZN24CkIndex_CollectionMaster39_call_receivePositions_CollectVectorMsgEPvP16CollectionMaster+0x141 [0x4fcfd7]
   [0:4] _Z15_processHandlerPvP11CkCoreState+0x55b [0xa4e743]
   [0:5] CsdScheduler+0x424 [0xb18288]
   [0:6] _ZN7BackEnd7suspendEv+0xb [0x4f5a27]
   [0:7] _ZN9ScriptTcl7Tcl_runEPvP10Tcl_InterpiPPc+0x11d [0x9af9ed]
   [0:8] TclInvokeStringCommand+0x91 [0xb50ed8]
   [0:9] /share/home/00288/tg455591/NAMD_2.7b3_Linux-x86_64-ibverbs-Ranger/namd2 [0xb86d28]
   [0:10] Tcl_EvalEx+0x176 [0xb8736b]
   [0:11] Tcl_EvalFile+0x134 [0xb7ed74]
   [0:12] _ZN9ScriptTcl3runEPc+0x13 [0x9ae861]
   [0:13] main+0x259 [0x4ed489]
   [0:14] __libc_start_main+0xdb [0x3a47a1c3fb]
   [0:15] _ZNSt8ios_base4InitD1Ev+0x42 [0x4e80aa]
Fatal error on PE 0> FATAL ERROR: Cannot open file 'OFMO_CsmABCL_UP_SOLV_runSMD512_1.restart.coor' in PDB::write.: Interrupted system call
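
For anyone who wants to pull the same kind of excerpt out of a long log, here is a minimal Python sketch. It is only an illustration: the function name extract_fatal_errors and the choice of two lines of trailing context are assumptions of mine, not part of NAMD or of the Ranger scripts.

```python
import re

# Hypothetical helper (not part of NAMD): finds fatal error reports in log text.
FATAL_RE = re.compile(r"FATAL ERROR:")


def extract_fatal_errors(log_text, context=2):
    """Return each line containing 'FATAL ERROR:' plus up to `context` lines after it.

    The trailing context is kept because the message may be wrapped
    across several lines in the log.
    """
    lines = log_text.splitlines()
    hits = []
    for i, line in enumerate(lines):
        if FATAL_RE.search(line):
            hits.append("\n".join(lines[i:i + 1 + context]))
    return hits
```

Reading the log with open(path).read() and passing the text to this function would return one entry per reported fatal error, so repeated reports of the same failure show up as separate entries.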

I too would be grateful for any help in explaining these issues
(so that they can be avoided in future runs).

Thanks,
Sándor

On Feb 24, 2011, at 12:56 PM, Lei Shi wrote:

> Has anyone else run into problems launching new NAMD jobs on
> Ranger (TACC) in the last two days, using the NAMD build described at (http://www.ks.uiuc.edu/Research/namd/wiki/index.cgi?NamdAtTexas
> )?
> My jobs failed quickly (the simulation system and qsub script have
> been working for months). The error message is below, and it
> does not tell me much:
> ----------
> TACC: Starting up job 1833721
> TACC: Setting up parallel environment for MVAPICH ssh-based mpirun.
> TACC: Setup complete. Running job script.
> TACC: starting parallel tasks...
>
> Child exited abnormally!
> Killing remote processes...DONE
> TACC: MPI job exited with code: 1
> TACC: Shutting down parallel environment.
> TACC: Shutdown complete. Exiting.
> ---------
>
> I suspect there have been some recent changes to the "parallel
> environment", which I am not able to detect myself. Could whoever
> is in charge of tg455591 help (e.g., by running some tests)?
>
> Many Thanks!
> Lei
>
