Re: ABF Simulation

From: Tyler Luchko (tluchko_at_ualberta.ca)
Date: Mon Jun 26 2006 - 19:45:02 CDT

Hello,

Has the issue with the unknown floating-point error been resolved? I
have been running into the same problem using TclForces rather than
ABF. I have, however, been using Jerome's vectors.tcl. I also
added the {}s to this file as suggested previously in this thread.
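
For reference, the braced form of expr discussed in this thread can be sketched as below. This is only an illustration: the proc is assumed to resemble veclength from vectors.tcl in general shape, and it is not a copy of the distributed file.

```tcl
# Sketch of a vector-length routine with braced expr arguments.
# Assumption: this mirrors the general shape of veclength in vectors.tcl;
# the real file may differ in detail.
proc veclength {v} {
    set retval 0.0
    foreach term $v {
        # With braces, expr receives the expression unsubstituted and reads
        # $retval and $term itself, instead of re-parsing a string that Tcl
        # has already substituted.
        set retval [expr {$retval + $term * $term}]
    }
    return [expr {sqrt($retval)}]
}

puts [veclength {3.0 4.0 0.0}]
```

The unbraced equivalent, expr $retval + $term * $term, goes through an extra round of string substitution, which is what the braces are meant to avoid.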

The error is completely unpredictable (it occurs anywhere from the first
few time steps to beyond step 10000) and happens wherever there is an
[expr ...] (not just in vectors.tcl). A sample error looks like:

Info: Pairlistdist is too small for 74 computes during timestep 8589.
Info: 74 pairlist warnings in past 1 steps.
TCL: unknown floating-point error, errno = 4
FATAL ERROR: unknown floating-point error, errno = 4
     while executing
"expr {sqrt($retval)}"
     (procedure "veclength" line 6)
     invoked from within
"veclength $pos2D"
     (procedure "calcforces" line 68)
     invoked from within
"calcforces"

I have also used a catch statement to catch the error in one part of
the code and print the offending variables. These, however, look
normal.
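
A minimal sketch of this kind of guard is shown below. The name safe_veclength and the message format are hypothetical, chosen for illustration only; inside a NAMD script, output would likely go through NAMD's own print command rather than puts.

```tcl
# Hypothetical wrapper: safe_veclength and its log format are illustrative
# and are not part of NAMD or vectors.tcl.
proc safe_veclength {v} {
    if {[catch {
        set sum 0.0
        foreach term $v {
            set sum [expr {$sum + $term * $term}]
        }
        set len [expr {sqrt($sum)}]
    } msg]} {
        # Print the input that was in play when expr failed, then re-raise
        # so the caller still sees the original error.
        puts stderr "expr failed: $msg; input vector = {$v}"
        error $msg
    }
    return $len
}

puts [safe_veclength {3.0 4.0 0.0}]
```

Re-raising after printing keeps the failure fatal while still recording the operands that triggered it.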

The final interesting fact is that the problem is platform
dependent. On our AMD64 cluster running Scientific Linux everything
is fine. On another cluster running CentOS with Xeon 3.0 GHz CPUs I
see the problems stated above.

Any suggestions would be greatly appreciated.

Thank you,

Tyler

________________________________________________________________
Tyler Luchko
Ph.D. Candidate
Department of Physics, University of Alberta
Theory and Modeling, National Institute for Nanotechnology
Edmonton, Alberta, Canada
tluchko_at_ualberta.ca
780-492-5519 <- NEW

On 2-Nov-05, at 7:28 AM, Lionel Perrin wrote:

> Hi Jerome,
>
> thanks a lot once more, the simulation is now running with your
> modified vectors.tcl.
> I will tell you the outcome of this try.
>
> all the best,
>
> Lionel
>
> On Fri 28/10/2005 at 11:05, Jérôme Hénin wrote:
>> Lionel, Jim,
>>
>> I discussed the issue with a few core Tcl developers. This seems to be
>> related to corrupted data ending up in the variables used by ABF. As one
>> of these people put it: "something out-of-thread is fouling your state".
>> Jim, do you have any idea where this could come from?
>>
>> You'll notice that these errors appear in the vector routines defined in
>> vectors.tcl, which is borrowed from VMD. In that file, expr statements are
>> not protected by curly braces. The Tcl guys recommend that the arguments
>> to expr always be enclosed in braces. Doing this might prevent crashes,
>> but if the data is really corrupted, I suppose some erroneous behavior is
>> to be expected anyway.
>>
>> Lionel, I attach a version of vectors.tcl with curly braces inserted
>> where applicable, so that you can try and see if it solves anything.
>>
>> Jerome
>>
>>
>> On Wednesday 26 October 2005 17:52, Lionel Perrin wrote:
>>> Jerome,
>>>
>>> I have changed the indexes and restarted the simulation,
>>> after 1.6ns of propagation, I got a floating-point error :
>>>
>>> WRITING EXTENDED SYSTEM TO RESTART FILE AT STEP 1618000
>>> WRITING COORDINATES TO DCD FILE AT STEP 1618000
>>> WRITING COORDINATES TO RESTART FILE AT STEP 1618000
>>> FINISHED WRITING RESTART COORDINATES
>>> WRITING VELOCITIES TO RESTART FILE AT STEP 1618000
>>> FINISHED WRITING RESTART VELOCITIES
>>> LDB: LOAD: AVG 12.143 MAX 12.3366 MSGS: TOTAL 78 MAXC 12 MAXP 4 None
>>> LDB: LOAD: AVG 12.143 MAX 12.3366 MSGS: TOTAL 78 MAXC 12 MAXP 4 Refine
>>> TCL: unknown floating-point error, errno = 4
>>> FATAL ERROR: unknown floating-point error, errno = 4
>>> while executing
>>> "expr -$i"
>>> (procedure "vecinvert" line 4)
>>> invoked from within
>>> "vecinvert $F2"
>>> (in namespace eval "::ABF::ABFcoord" script line 12)
>>> invoked from within
>>> "namespace eval ABFcoord {
>>>
>>> set dr [vecsub $coords($abf2) $coords($abf1)]
>>> set r [veclength $dr]
>>> set nv [vecnorm $dr] ;# unity vector abf1 -> abf2
>>>
>>> ..."
>>> (procedure "ABFapply" line 5)
>>> invoked from within
>>> "ABFapply $F"
>>> (in namespace eval "::ABF" script line 84)
>>> invoked from within
>>> "namespace eval ::ABF {
>>>
>>> # First timestep : we don't have forces
>>> if { $timestep == 0 } {
>>>
>>> # must not be equal to $timestep - 1
>>> set timeStored -2
>>> ..."
>>> (procedure "calcforces" line 2)
>>> invoked from within
>>> "calcforces"
>>>
>>> here is my ABF input section (the rest of my input has remained
>>> unchanged with respect to my previous trials) :
>>>
>>> source /usr/local/NAMD_2.6b1_Linux-i686/lib/abf/abf.tcl
>>> abf coordinate distance
>>> abf abf1 2472
>>> abf abf2 3233
>>> abf ximin 3.0
>>> abf ximax 13.0
>>> abf dxi 0.1
>>> abf dsmooth 0.2
>>> abf forceconst 10.0
>>> abf fullsamples 500
>>> abf outfile 1J8A_abf.abf
>>> abf historyfile 1J8A_abf.his
>>> abf outputfreq 5000
>>> abf writexifreq 1000
>>> abf distfile 1J8A_abf.dis
>>>
>>> sincerely,
>>>
>>> Lionel
>>>
>>> On Fri 21/10/2005 at 11:03, Lionel Perrin wrote:
>>>> Jerome,
>>>>
>>>> Thank you for the answer. I knew about the index issue between NAMD and
>>>> VMD, but I did not pay attention that time! :-S
>>>> I have restarted the simulation with the "good" indexes and will tell
>>>> you the outcome of the ABF simulation.
>>>>
>>>> thanks a lot,
>>>>
>>>> Lionel
>>>>
>>>> On Wed 19/10/2005 at 11:11, Jérôme Hénin wrote:
>>>>> Lionel,
>>>>>
>>>>> I believe that part of the problem lies in the atom indexes you pass to
>>>>> ABF when defining the reaction coordinate. Your files indicate that you
>>>>> are using atom indexes given by VMD, which start at 0. However, NAMD
>>>>> uses the PDB convention for atom indexes, which starts at 1, so the
>>>>> atoms you pass to ABF are not the ones you intended. This is a very
>>>>> common pitfall that probably deserves to be more publicized in the
>>>>> community of NAMD+VMD users.
>>>>>
>>>>> I don't have a clear idea of why you get that Tcl floating-point error,
>>>>> though. Since one of the atoms is a hydrogen, it might be more prone to
>>>>> numerical instability than a heavy atom, but this is not really a
>>>>> satisfactory explanation to me. And since you are not constraining
>>>>> protein hydrogens, issues involving constraints can be ruled out.
>>>>>
>>>>> If anyone else experiences similar crashes, please tell me about it!
>>>>>
>>>>> Jerome
>>>>>
>>>>> On Thursday 06 October 2005 16:13, Lionel Perrin wrote:
>>>>>> Chris,
>>>>>>
>>>>>> The two atoms defining the reaction coordinate belong to two
>>>>>> distinct molecules: X1 corresponds to the Cgamma of an aspartate and
>>>>>> X2 corresponds to the C atom of the amidine function of a ligand.
>>>>>> Hence, these two centers are not "chemically" bonded.
>>>>>>
>>>>>> In this case, should I apply a force constant at the border of the
>>>>>> reaction coordinate and/or an external restraint?
>>>>>>
>>>>>> Lionel
>>>>>>
>>>>>> On Wed 05/10/2005 at 14:58, Chris Chipot wrote:
>>>>>>> Lionel,
>>>>>>>
>>>>>>> could you tell us what atoms are involved in your reaction
>>>>>>> coordinate ? Are these atoms chemically bonded to constrained
>>>>>>> degrees of freedom ?
>>>>>>>
>>>>>>>
>>>>>>> Chris Chipot
> --
> !------------------------------------------!
> Lionel PERRIN
> Chargé de Recherche au CNRS
> DSV/DBJC/SBFM, URA 2096 du CNRS
> CEA-Saclay, build. 528, office #215
> tel : (+33) (-0)1 69 08 96 81
> fax : (+33) (-0)1 69 08 40 07
> 91191 Gif-sur-Yvette Cedex France
> !------------------------------------------!
>
