From: Giovanni Bellesia (gbellesia_at_chem.ucsb.edu)
Date: Mon Jun 26 2006 - 21:29:24 CDT
Hi Tyler,
I think I had a similar problem a few months ago using a TclForces script
in my configuration file.
A while ago I posted to this list the solution Jim proposed to me after he
was unable to reproduce a random floating-point error occurring in my
simulations.
Following his advice, I simply started using the TCP version of NAMD (or,
equivalently, the +netpoll option on the command line), and everything has
been running error-free since then (I am still using that script in various
simulations running for hundreds of ns).
It seems to be a hardware-related problem involving the Tcl interpreter.
Hope this helps,
Giovanni
> Hello,
>
> Has the issue with the unknown floating-point error been resolved? I
> have been running into the same problem using TclForces rather than
> ABF. I have, however, been using Jerome's vectors.tcl. I also added
> the braces ({}) to this file as suggested earlier in this thread.
>
> The error is completely unpredictable (it occurs anywhere from the
> first few time steps to step 10000 and beyond) and happens wherever
> there is an [expr ...] call (not just in vectors.tcl). A sample error
> looks like this:
>
> Info: Pairlistdist is too small for 74 computes during timestep 8589.
> Info: 74 pairlist warnings in past 1 steps.
> TCL: unknown floating-point error, errno = 4
> FATAL ERROR: unknown floating-point error, errno = 4
> while executing
> "expr {sqrt($retval)}"
> (procedure "veclength" line 6)
> invoked from within
> "veclength $pos2D"
> (procedure "calcforces" line 68)
> invoked from within
> "calcforces"
>
> I have also used a catch statement in one part of the code to trap
> the error and print the offending variables. These, however, look normal.
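A minimal Tcl sketch of the catch-and-report approach described above. The helper name `checked_sqrt` is hypothetical (it is not part of vectors.tcl or TclForces): the braced expr is evaluated inside catch, and on failure the input value and the Tcl error message are printed before the error is re-raised.

```tcl
# Hypothetical helper (not from vectors.tcl): evaluate sqrt inside catch so
# that, if expr fails, the offending input is printed alongside the error.
proc checked_sqrt {x} {
    if {[catch {expr {sqrt($x)}} result]} {
        # Report the input and the Tcl error message, then re-raise the error.
        puts stderr "checked_sqrt: sqrt failed for x = $x: $result"
        return -code error $result
    }
    return $result
}

puts [checked_sqrt 16.0]   ;# prints 4.0
```

Calling `checked_sqrt -1.0` prints the input and Tcl's domain-error message to stderr and then raises the same error, so the caller still sees the failure.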
>
> The final interesting fact is that the problem is platform-dependent.
> On our AMD64 cluster running Scientific Linux everything is fine. On
> another cluster running CentOS on 3.0 GHz Xeon CPUs, I see the
> problems described above.
>
> Any suggestions would be greatly appreciated.
>
> Thank you,
>
> Tyler
>
> ________________________________________________________________
> Tyler Luchko
> Ph.D. Candidate
> Department of Physics, University of Alberta
> Theory and Modeling, National Institute for Nanotechnology
> Edmonton, Alberta, Canada
> tluchko_at_ualberta.ca
> 780-492-5519 <- NEW
>
>
> On 2-Nov-05, at 7:28 AM, Lionel Perrin wrote:
>
>> Hi Jerome,
>>
>> Thanks a lot once more; the simulation is now running with your
>> modified vectors.tcl. I will let you know the outcome of this run.
>>
>> all the best,
>>
>> Lionel
>>
>> On Fri 28/10/2005 at 11:05, Jérôme Hénin wrote:
>>> Lionel, Jim,
>>>
>>> I discussed the issue with a few core Tcl developers. This seems to be
>>> related to corrupted data ending up in the variables used by ABF. As
>>> one of them put it: "something out-of-thread is fouling your state".
>>> Jim, do you have any idea where this could come from?
>>>
>>> You'll notice that these errors appear in the vector routines defined
>>> in vectors.tcl, which is borrowed from VMD. In that file, expr
>>> statements are not protected by curly braces. The Tcl guys recommend
>>> that the arguments to expr always be enclosed in braces. Doing this
>>> might prevent crashes, but if the data is really corrupted, I suppose
>>> some erroneous behavior is to be expected anyway.
>>> Lionel, I attach a version of vectors.tcl with curly braces inserted
>>> where applicable, so that you can try it and see if it solves anything.
>>>
>>> Jerome
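A minimal sketch of the braced style recommended above. The helper bodies are written for illustration and are not copied from VMD's vectors.tcl or from the file Jerome attached. With braces, expr receives the expression unsubstituted and reads the variable values itself, so Tcl can byte-compile the expression and no value goes through a second round of string substitution.

```tcl
# Illustrative versions of two vectors.tcl-style helpers with every
# expr argument enclosed in braces (not the actual VMD source).
proc vecinvert {v} {
    set ret {}
    foreach i $v {
        # Braced: expr itself substitutes $i, instead of the Tcl parser
        # substituting it first and expr re-parsing the resulting string
        # (as happens in the unbraced form: expr -$i).
        lappend ret [expr {-$i}]
    }
    return $ret
}

proc veclength {v} {
    # Accumulate the sum of squares, then take the square root.
    set retval 0.0
    foreach term $v {
        set retval [expr {$retval + $term * $term}]
    }
    return [expr {sqrt($retval)}]
}

puts [vecinvert {1 2 3}]   ;# prints -1 -2 -3
puts [veclength {3 4}]     ;# prints 5.0
```

As Jerome notes, bracing is good practice but would not by itself repair genuinely corrupted data.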
>>>
>>>
>>> On Wednesday 26 October 2005 17:52, Lionel Perrin wrote:
>>>> Jerome,
>>>>
>>>> I have changed the indexes and restarted the simulation. After 1.6 ns
>>>> of propagation, I got a floating-point error:
>>>>
>>>> WRITING EXTENDED SYSTEM TO RESTART FILE AT STEP 1618000
>>>> WRITING COORDINATES TO DCD FILE AT STEP 1618000
>>>> WRITING COORDINATES TO RESTART FILE AT STEP 1618000
>>>> FINISHED WRITING RESTART COORDINATES
>>>> WRITING VELOCITIES TO RESTART FILE AT STEP 1618000
>>>> FINISHED WRITING RESTART VELOCITIES
>>>> LDB: LOAD: AVG 12.143 MAX 12.3366 MSGS: TOTAL 78 MAXC 12 MAXP 4 None
>>>> LDB: LOAD: AVG 12.143 MAX 12.3366 MSGS: TOTAL 78 MAXC 12 MAXP 4 Refine
>>>> TCL: unknown floating-point error, errno = 4
>>>> FATAL ERROR: unknown floating-point error, errno = 4
>>>> while executing
>>>> "expr -$i"
>>>> (procedure "vecinvert" line 4)
>>>> invoked from within
>>>> "vecinvert $F2"
>>>> (in namespace eval "::ABF::ABFcoord" script line 12)
>>>> invoked from within
>>>> "namespace eval ABFcoord {
>>>>
>>>> set dr [vecsub $coords($abf2) $coords($abf1)]
>>>> set r [veclength $dr]
>>>> set nv [vecnorm $dr] ;# unity vector abf1 -> abf2
>>>>
>>>> ..."
>>>> (procedure "ABFapply" line 5)
>>>> invoked from within
>>>> "ABFapply $F"
>>>> (in namespace eval "::ABF" script line 84)
>>>> invoked from within
>>>> "namespace eval ::ABF {
>>>>
>>>> # First timestep : we don't have forces
>>>> if { $timestep == 0 } {
>>>>
>>>> # must not be equal to $timestep - 1
>>>> set timeStored -2
>>>> ..."
>>>> (procedure "calcforces" line 2)
>>>> invoked from within
>>>> "calcforces"
>>>>
>>>> Here is my ABF input section (the rest of my input is unchanged from
>>>> my previous trials):
>>>>
>>>> source /usr/local/NAMD_2.6b1_Linux-i686/lib/abf/abf.tcl
>>>> abf coordinate distance
>>>> abf abf1 2472
>>>> abf abf2 3233
>>>> abf ximin 3.0
>>>> abf ximax 13.0
>>>> abf dxi 0.1
>>>> abf dsmooth 0.2
>>>> abf forceconst 10.0
>>>> abf fullsamples 500
>>>> abf outfile 1J8A_abf.abf
>>>> abf historyfile 1J8A_abf.his
>>>> abf outputfreq 5000
>>>> abf writexifreq 1000
>>>> abf distfile 1J8A_abf.dis
>>>>
>>>> Sincerely,
>>>>
>>>> Lionel
>>>>
>>>> On Fri 21/10/2005 at 11:03, Lionel Perrin wrote:
>>>>> Jerome,
>>>>>
>>>>> Thank you for the answer. I knew about the index issue between NAMD
>>>>> and VMD but did not pay attention this time! :-S
>>>>> I have restarted the simulation with the correct indexes and will let
>>>>> you know the outcome of the ABF simulation.
>>>>>
>>>>> Thanks a lot,
>>>>>
>>>>> Lionel
>>>>>
>>>>> On Wed 19/10/2005 at 11:11, Jérôme Hénin wrote:
>>>>>> Lionel,
>>>>>>
>>>>>> I believe that part of the problem lies in the atom indexes you pass
>>>>>> to ABF when defining the reaction coordinate. Your files indicate
>>>>>> that you are using atom indexes given by VMD, which start at 0.
>>>>>> However, NAMD uses the PDB convention for atom indexes, which starts
>>>>>> at 1, so the atoms you pass to ABF are not the ones you intended.
>>>>>> This is a very common pitfall that probably deserves more publicity
>>>>>> in the community of NAMD+VMD users.
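A minimal sketch of the 0-based to 1-based conversion described above. The helper name `vmd_to_namd_indices` and the example values are hypothetical: each index taken from VMD is shifted by one before being passed to ABF.

```tcl
# Hypothetical helper: convert 0-based VMD atom indexes to the 1-based
# (PDB-style) numbering that NAMD expects.
proc vmd_to_namd_indices {vmdIndices} {
    set namdIndices {}
    foreach idx $vmdIndices {
        lappend namdIndices [expr {$idx + 1}]
    }
    return $namdIndices
}

# Illustrative values: VMD index 2471 corresponds to NAMD atom 2472.
puts [vmd_to_namd_indices {2471 3232}]   ;# prints 2472 3233
```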
>>>>>>
>>>>>> I don't have a clear idea of why you get that Tcl floating-point
>>>>>> error, though. Since one of the atoms is a hydrogen, it might be more
>>>>>> prone to numerical instability than a heavy atom, but I don't find
>>>>>> this a satisfactory explanation. And since you are not constraining
>>>>>> protein hydrogens, issues involving constraints can be ruled out.
>>>>>>
>>>>>> If anyone else experiences similar crashes, please tell me about it!
>>>>>>
>>>>>> Jerome
>>>>>>
>>>>>> On Thursday 06 October 2005 16:13, Lionel Perrin wrote:
>>>>>>> Chris,
>>>>>>>
>>>>>>> The two atoms defining the reaction coordinate belong to two
>>>>>>> distinct molecules: X1 is the Cgamma of an aspartate and X2 is the
>>>>>>> C atom of the amidine group of a ligand. Hence, these two centers
>>>>>>> are not chemically bonded.
>>>>>>>
>>>>>>> In this case, should I apply a force constant at the boundaries of
>>>>>>> the reaction coordinate and/or an external restraint?
>>>>>>>
>>>>>>> Lionel
>>>>>>>
>>>>>>> On Wed 05/10/2005 at 14:58, Chris Chipot wrote:
>>>>>>>> Lionel,
>>>>>>>>
>>>>>>>> Could you tell us which atoms are involved in your reaction
>>>>>>>> coordinate? Are these atoms chemically bonded to constrained
>>>>>>>> degrees of freedom?
>>>>>>>>
>>>>>>>>
>>>>>>>> Chris Chipot
>> --!------------------------------------------!
>> Lionel PERRIN
>> Chargé de Recherche au CNRS
>> DSV/DBJC/SBFM, URA 2096 du CNRS
>> CEA-Saclay, build. 528, office #215
>> tel : (+33) (-0)1 69 08 96 81
>> fax : (+33) (-0)1 69 08 40 07
>> 91191 Gif-sur-Yvette Cedex France
>> !------------------------------------------!
>>
>
>
This archive was generated by hypermail 2.1.6 : Wed Feb 29 2012 - 15:42:16 CST