Re: ABF Simulation Crash from TCL script

From: Jérôme Hénin (jhenin_at_ifr88.cnrs-mrs.fr)
Date: Wed Mar 31 2010 - 04:55:00 CDT

Hi Patrick,

Unfortunately, you have encountered one of a few mysterious NAMD/Tcl
bugs for which I do not know a fix or workaround. It seems to stem
from some sort of memory corruption, and I have never been able to
locate the faulty code either in the Tcl interpreter or in NAMD
itself.

Now, this would only happen when using the deprecated ABF code that
was written for NAMD 2.6. In any case, switching to the newer
"collective variables module" is strongly recommended. Are you sure
that you are using the version of the tutorial which is currently
online? That one should use the colvars module, and not suffer from
Tcl-related limitations. Note that the text and files suffer from
small inconsistencies (due to the transition from Tcl-ABF to colvars),
but we are working hard right now to fix them. Hopefully, an updated
version should go online very soon.
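
For orientation, here is a rough sketch of how the Tcl-ABF settings quoted
below might translate to a colvars configuration file. It is an
illustration under stated assumptions, not the tutorial's actual input:
the file name distance.in and the variable name ionPairDistance are
hypothetical, and the keywords should be checked against the colvars
documentation for your NAMD version. The boundary force constant
(forceConst), output frequency and output file name are left out on
purpose, because how they are specified depends on the version.

```text
# distance.in (hypothetical file name): colvars configuration sketch
colvar {
  # hypothetical label, referenced by the abf block below
  name ionPairDistance
  # bin width, was "abf dxi"
  width 0.2
  # was "abf ximin" and "abf ximax"
  lowerBoundary 0.05
  upperBoundary 15.95
  # distance between the centers of mass of the two groups,
  # was "abf coordinate distance-com"
  distance {
    # was "abf abf1" and "abf abf2"
    group1 { atomNumbersRange 1-25 }
    group2 { atomNumbersRange 26-40 }
  }
}

abf {
  colvars ionPairDistance
  # was "abf fullSamples"
  fullSamples 4000
}
```

Such a file would be loaded from the NAMD configuration with
"colvars on" and "colvarsConfig distance.in", replacing the "source"
and "abf ..." lines. Note that with these values the range spans 79.5
bins of width 0.2, so the boundaries may need a small adjustment to give
a whole number of bins.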

Best,
Jerome

On 31 March 2010 00:10, Patrick Yee <pyee_at_nd.edu> wrote:
> Hi NAMD users,
>
> I'm trying to find the PMF of an ion pair in water using the ABF
> method. I'm having problems with jobs crashing when I try running ABF
> simulations. The jobs seem to crash when I submit them to a node with
> multiple cores. However, if I try running NAMD as a background process
> on my workstation, the job runs fine, though it takes about 10 times
> as long to complete. Does anyone have any suggestions or know of
> potential problems with what I'm doing?
>
> The abf.tcl script as well as the other ABF scripts are the same as
> those found on http://www.ks.uiuc.edu/~char/tutorials/ABF/. The
> equilibration of the system is fine, but adding the following causes
> crashes when I submit jobs.
>
> source               ../../Tutorial-ABF/abf-1.8/abf.tcl
> abf coordinate       distance-com
> abf abf1             {1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25}
> abf abf2             {26 27 28 29 30 31 32 33 34 35 36 37 38 39 40}
> abf ximin            0.05
> abf ximax            15.95
> abf dxi              0.2
> abf fullSamples      4000
> abf forceConst       12.0
> abf writeXiFreq      10000
> abf outfile          abf_Bmim_Tf2N.abf
>
> The error I get in my log file is:
>
> TCL: can't use floating-point value as operand of "*"
> FATAL ERROR: can't use floating-point value as operand of "*"
>    while executing
> "expr {$retval + $term * $term}"
>    (procedure "veclength2" line 4)
>    invoked from within
> "veclength2 $v"
>    (procedure "veclength" line 2)
>    invoked from within
> "veclength $dr"
>    (in namespace eval "::ABF::ABFcoord" script line 4)
>    invoked from within
> "namespace eval ABFcoord {
>
>        set dr  [vecsub $coords($group2) $coords($group1)]
>        set r   [veclength $dr]
>        set nv  [vecnorm $dr] ;# unity vector group1 -> ..."
>    (procedure "ABFapply" line 5)
>    invoked from within
> "ABFapply $F"
>    (in namespace eval "::ABF" script line 87)
>    invoked from within
> "namespace eval ::ABF {
>
> # First timestep : we don't have forces
> if { $timestep == 0 } {
>
>        # must not be equal to $timestep - 1
>        set timeStored -2
>    ..."
>    (procedure "calcforces" line 2)
>    invoked from within
> "calcforces"
> Stack Traceback:
>  [0] CmiAbort+0x4f  [0x7eb405]
>  [1] _Z8NAMD_diePKc+0x62  [0x4b31e2]
>  [2] _ZN15GlobalMasterTcl9calculateEv+0x295  [0x65e723]
>  [3] _ZN12GlobalMaster11processDataEPiS0_P6VectorS2_S2_S0_S0_S2_S0_S0_S2_+0x71  [0x6524cd]
>  [4] _ZN18GlobalMasterServer11callClientsEv+0x43e  [0x6564aa]
>  [5] _ZN18GlobalMasterServer8recvDataEP20ComputeGlobalDataMsg+0x6f7  [0x655b1b]
>  [6] _ZN10ComputeMgr21recvComputeGlobalDataEP20ComputeGlobalDataMsg+0x12  [0x52039a]
>  [7] _ZN18CkIndex_ComputeMgr48_call_recvComputeGlobalData_ComputeGlobalDataMsgEPvP10ComputeMgr+0xf  [0x520385]
>  [8] CkDeliverMessageFree+0x21  [0x786a6b]
>  [9] _Z15_processHandlerPvP11CkCoreState+0x455  [0x786075]
>  [10] CsdScheduleForever+0xa2  [0x7f18a2]
>  [11] CsdScheduler+0x1c  [0x7f14a0]
>  [12] _ZN7BackEnd7suspendEv+0xb  [0x4bab01]
>  [13] _ZN9ScriptTcl3runEPc+0x15b  [0x6fbae5]
>  [14] main+0x21b  [0x4b69c3]
>  [15] __libc_start_main+0xf4  [0x32b121d994]
>  [16] _ZNSt8ios_base4InitD1Ev+0x3a  [0x4b2b5a]
>
>
