Re: ABF Simulation

From: Tyler Luchko (tluchko_at_ualberta.ca)
Date: Thu Jun 29 2006 - 00:11:42 CDT

Hi Giovanni,

This did the trick. The TCP version is very slow but using +netpoll
gives me the same performance I was seeing before.

Thanks,

Tyler

On 26-Jun-06, at 8:29 PM, Giovanni Bellesia wrote:

> Hi Tyler,
> I think I had a similar problem a few months ago using a tclforces
> script in my configuration file.
> I posted somewhere on this list the solution Jim proposed to me
> after he was not able to reproduce a random floating point error
> occurring in my simulations.
> Following his advice, I simply started using the tcp version of
> namd or equivalently the option +netpoll on the command line and
> things are running error-free since then (still using that script
> in various simulations running for hundreds of ns).
> It seems it is a hardware-related problem concerning the tcl
> interpreter.
>
> Hope this can help
>
> Giovanni
>> Hello,
>>
>> Has the issue with the unknown floating-point error been
>> resolved? I have been running into the same problem using
>> TclForces rather than ABF. I have, however, been using Jerome's
>> vectors.tcl. I also added the {}s to this file as suggested
>> previously in this thread.
>>
>> The error is completely unpredictable (occurs anywhere from the
>> first few time steps until 10000+) and happens wherever there is
>> an [expr ...] (not just vectors.tcl). A sample error looks like:
>>
>> Info: Pairlistdist is too small for 74 computes during timestep 8589.
>> Info: 74 pairlist warnings in past 1 steps.
>> TCL: unknown floating-point error, errno = 4
>> FATAL ERROR: unknown floating-point error, errno = 4
>> while executing
>> "expr {sqrt($retval)}"
>> (procedure "veclength" line 6)
>> invoked from within
>> "veclength $pos2D"
>> (procedure "calcforces" line 68)
>> invoked from within
>> "calcforces"
>>
>> I have also used a catch statement to catch the error in one part
>> of the code and print the offending variables. These, however,
>> look normal.
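
A minimal sketch of the catch approach described above. The proc name
safe_veclength is hypothetical and this is not the code from the original
script; it only assumes that the vector length is computed as the square
root of a sum of squares, as the traceback suggests. If expr fails, the
values in scope are printed before the error is re-raised.

```tcl
# Hypothetical helper (not from the original script): wrap the failing
# expr in catch so the offending values can be printed before re-raising.
proc safe_veclength {v} {
    set retval 0.0
    foreach term $v {
        set retval [expr {$retval + $term * $term}]
    }
    if {[catch {expr {sqrt($retval)}} result]} {
        # Print the inputs that were in scope when expr failed.
        puts "expr failed: $result"
        puts "  vector = $v"
        puts "  retval = $retval"
        error $result
    }
    return $result
}

puts [safe_veclength {3.0 4.0 0.0}]
```

If the printed values look normal, as reported here, that points away
from the script's arithmetic and toward something outside the script.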
>>
>> The final interesting fact is that the problem is platform
>> dependent. On our AMD64 cluster running Scientific Linux
>> everything is fine. On another cluster running CentOS with Xeon
>> 3.0 GHz CPUs I see the problems stated above.
>>
>> Any suggestions would be greatly appreciated.
>>
>> Thank you,
>>
>> Tyler
>>
>> ________________________________________________________________
>> Tyler Luchko
>> Ph.D. Candidate
>> Department of Physics, University of Alberta
>> Theory and Modeling, National Institute for Nanotechnology
>> Edmonton, Alberta, Canada
>> tluchko_at_ualberta.ca
>> 780-492-5519 <- NEW
>>
>>
>> On 2-Nov-05, at 7:28 AM, Lionel Perrin wrote:
>>
>>> Hi Jerome,
>>>
>>> thanks a lot once more, the simulation is now running with your
>>> modified vectors.tcl.
>>> I will tell you the outcome of this try.
>>>
>>> all the best,
>>>
>>> Lionel
>>>
>>> On Fri 28/10/2005 at 11:05, Jérôme Hénin wrote:
>>>> Lionel, Jim,
>>>>
>>>> I discussed the issue with a few core Tcl developers. This seems to be
>>>> related to corrupted data ending up in the variables used by ABF. As one
>>>> of these people put it: "something out-of-thread is fouling your state".
>>>> Jim, do you have any idea where this could come from?
>>>>
>>>> You'll notice that these errors appear in the vector routines defined in
>>>> vector.tcl, which is borrowed from VMD. In that file, expr statements are
>>>> not protected by curly braces. The Tcl guys recommend that the arguments
>>>> to expr always be enclosed in braces. Doing this might prevent crashes,
>>>> but if the data is really corrupted, I suppose some erroneous behavior is
>>>> to be expected anyway.
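
For illustration, a sketch of the difference between unbraced and braced
expr arguments. The proc names below are hypothetical; the unbraced
variant only mirrors the "expr -$i" form that appears in the traceback
and is not copied from the actual file.

```tcl
# Unbraced: Tcl substitutes $i into the argument first, then expr parses
# the resulting text a second time (double substitution).
proc vecinvert_unbraced {v} {
    set ret {}
    foreach i $v {
        lappend ret [expr -$i]
    }
    return $ret
}

# Braced: expr receives the expression unmodified and reads $i itself,
# so the value is never re-parsed from its string form.
proc vecinvert_braced {v} {
    set ret {}
    foreach i $v {
        lappend ret [expr {-$i}]
    }
    return $ret
}

puts [vecinvert_braced {1.0 -2.0 3.5}]
```

Both forms give the same result for well-formed numbers; the braced form
avoids the extra round of substitution and parsing.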
>>>>
>>>> Lionel, I attach a version of vectors.tcl with curly braces inserted
>>>> where applicable, so that you can try and see if it solves anything.
>>>>
>>>> Jerome
>>>>
>>>>
>>>> On Wednesday 26 October 2005 17:52, Lionel Perrin wrote:
>>>>> Jerome,
>>>>>
>>>>> I have changed the indexes and restarted the simulation,
>>>>> after 1.6ns of propagation, I got a floating-point error :
>>>>>
>>>>> WRITING EXTENDED SYSTEM TO RESTART FILE AT STEP 1618000
>>>>> WRITING COORDINATES TO DCD FILE AT STEP 1618000
>>>>> WRITING COORDINATES TO RESTART FILE AT STEP 1618000
>>>>> FINISHED WRITING RESTART COORDINATES
>>>>> WRITING VELOCITIES TO RESTART FILE AT STEP 1618000
>>>>> FINISHED WRITING RESTART VELOCITIES
>>>>> LDB: LOAD: AVG 12.143 MAX 12.3366 MSGS: TOTAL 78 MAXC 12 MAXP 4 None
>>>>> LDB: LOAD: AVG 12.143 MAX 12.3366 MSGS: TOTAL 78 MAXC 12 MAXP 4 Refine
>>>>> TCL: unknown floating-point error, errno = 4
>>>>> FATAL ERROR: unknown floating-point error, errno = 4
>>>>> while executing
>>>>> "expr -$i"
>>>>> (procedure "vecinvert" line 4)
>>>>> invoked from within
>>>>> "vecinvert $F2"
>>>>> (in namespace eval "::ABF::ABFcoord" script line 12)
>>>>> invoked from within
>>>>> "namespace eval ABFcoord {
>>>>>
>>>>> set dr [vecsub $coords($abf2) $coords($abf1)]
>>>>> set r [veclength $dr]
>>>>> set nv [vecnorm $dr] ;# unity vector abf1 -> abf2
>>>>>
>>>>> ..."
>>>>> (procedure "ABFapply" line 5)
>>>>> invoked from within
>>>>> "ABFapply $F"
>>>>> (in namespace eval "::ABF" script line 84)
>>>>> invoked from within
>>>>> "namespace eval ::ABF {
>>>>>
>>>>> # First timestep : we don't have forces
>>>>> if { $timestep == 0 } {
>>>>>
>>>>> # must not be equal to $timestep - 1
>>>>> set timeStored -2
>>>>> ..."
>>>>> (procedure "calcforces" line 2)
>>>>> invoked from within
>>>>> "calcforces"
>>>>>
>>>>> here is my ABF input section (the rest of my input has remained
>>>>> unchanged with respect to my previous trials) :
>>>>>
>>>>> source /usr/local/NAMD_2.6b1_Linux-i686/lib/abf/abf.tcl
>>>>> abf coordinate distance
>>>>> abf abf1 2472
>>>>> abf abf2 3233
>>>>> abf ximin 3.0
>>>>> abf ximax 13.0
>>>>> abf dxi 0.1
>>>>> abf dsmooth 0.2
>>>>> abf forceconst 10.0
>>>>> abf fullsamples 500
>>>>> abf outfile 1J8A_abf.abf
>>>>> abf historyfile 1J8A_abf.his
>>>>> abf outputfreq 5000
>>>>> abf writexifreq 1000
>>>>> abf distfile 1J8A_abf.dis
>>>>>
>>>>> Sincerely,
>>>>>
>>>>> Lionel
>>>>>
>>>>> On Fri 21/10/2005 at 11:03, Lionel Perrin wrote:
>>>>>> Jerome,
>>>>>>
>>>>>> Thank you for the answer, I knew about the index issue between NAMD
>>>>>> and VMD but I did not pay attention that time ! :-S
>>>>>> I have restarted the simulation with the "good" indexes and will tell
>>>>>> you the outcome of the ABF simulation.
>>>>>>
>>>>>> Thanks a lot,
>>>>>>
>>>>>> Lionel
>>>>>>
>>>>>> On Wed 19/10/2005 at 11:11, Jérôme Hénin wrote:
>>>>>>> Lionel,
>>>>>>>
>>>>>>> I believe that part of the problem lies in the atom indexes you pass
>>>>>>> to ABF when defining the reaction coordinate. Your files indicate that
>>>>>>> you are using atom indexes given by VMD, which start at 0. However,
>>>>>>> NAMD uses the PDB convention for atom indexes, which starts at 1, so
>>>>>>> the atoms you pass to ABF are not the ones you intended. This is a
>>>>>>> very common pitfall that probably deserves to be more publicized in
>>>>>>> the community of NAMD+VMD users.
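
A small sketch of the conversion implied above, assuming the offset is
exactly one (0-based VMD index to 1-based atom ID). The proc name
vmd_index_to_namd_id and the sample index are hypothetical and are not
taken from any of the files discussed in this thread.

```tcl
# Hypothetical helper: convert a 0-based VMD atom index to the 1-based
# atom ID expected when defining the ABF reaction coordinate.
proc vmd_index_to_namd_id {index} {
    if {![string is integer -strict $index] || $index < 0} {
        error "expected a non-negative integer VMD index, got '$index'"
    }
    return [expr {$index + 1}]
}

# Illustrative value only: an atom shown by VMD with index 99
# would be passed to ABF as atom 100.
puts [vmd_index_to_namd_id 99]
```

Rejecting negative or non-integer input keeps a typo in the index from
silently selecting a different atom.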
>>>>>>>
>>>>>>> I don't have a clear idea of why you get that Tcl floating-point
>>>>>>> error, though. Since one of the atoms is a hydrogen, it might be more
>>>>>>> prone to numerical instability than a heavy atom, but this is not
>>>>>>> really a satisfactory explanation to me. And since you are not
>>>>>>> constraining protein hydrogens, issues involving constraints can be
>>>>>>> ruled out.
>>>>>>>
>>>>>>> If anyone else experiences similar crashes, please tell me about it!
>>>>>>>
>>>>>>> Jerome
>>>>>>>
>>>>>>> On Thursday 06 October 2005 16:13, Lionel Perrin wrote:
>>>>>>>> Chris,
>>>>>>>>
>>>>>>>> The two atoms defining the reaction coordinate belong to two
>>>>>>>> distinct molecules: X1 corresponds to the Cgamma of an aspartate and
>>>>>>>> X2 corresponds to the C atom of the amidine function of a ligand.
>>>>>>>> Hence, these two centers are not "chemically" bonded.
>>>>>>>>
>>>>>>>> In this case, should I apply a force constant at the border of the
>>>>>>>> reaction coordinate and/or an external restraint ?
>>>>>>>>
>>>>>>>> Lionel
>>>>>>>>
>>>>>>>> On Wed 05/10/2005 at 14:58, Chris Chipot wrote:
>>>>>>>>> Lionel,
>>>>>>>>>
>>>>>>>>> could you tell us what atoms are involved in your reaction
>>>>>>>>> coordinate ? Are these atoms chemically bonded to constrained
>>>>>>>>> degrees of freedom ?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Chris Chipot
>>> --!------------------------------------------!
>>> Lionel PERRIN
>>> Chargé de Recherche au CNRS
>>> DSV/DBJC/SBFM, URA 2096 du CNRS
>>> CEA-Saclay, build. 528, office #215
>>> tel : (+33) (-0)1 69 08 96 81
>>> fax : (+33) (-0)1 69 08 40 07
>>> 91191 Gif-sur-Yvette Cedex France
>>> !------------------------------------------!
>>>
>>
>>
>
