Re: floating point reproduceability

From: Axel Kohlmeyer (akohlmey_at_gmail.com)
Date: Fri Apr 19 2013 - 03:02:11 CDT

On Fri, Apr 19, 2013 at 6:28 AM, Thomas Brian <thomasbrianxlii_at_gmail.com> wrote:
> Thanks Norman,
> I was thinking along similar lines regarding floating point order of
> operations. It would be nice if you could force each patch to be added to
> the force totals in the same order. I wonder how much slower it would run.
> I don't know the details of how it gets programmed but I would think keeping
> the order of additions the same would produce a modest increase in run
> times. I am curious if it is just a matter of changing a parallel style for
> loop over the patch grid to a serial one, or something simple.

no. there are two fundamental problems with making MD trajectories exactly
reproducible.

a) floating point math is not associative, i.e. the result of a sum
depends on the order of the additions (and to make matters worse, on
x86 the regular floating point unit has 80-bit registers and does all
operations in 80-bit unless you enforce rounding to 64-bit or 32-bit
after every step). so not only would you have to change the code in all
kinds of places (neighbor list construction, force kernels, domain
decomposition) to enforce a consistent order, but also you have to
disable all kinds of compiler optimizations (vectorization, loop
unrolling, strength reduction, precomputation of invariant operations,
keeping numbers in registers) which will obviously result in a
*massive* performance loss. i would estimate that this would be at
least one order of magnitude. the only way out of this is to build an
MD code with fixed point math, which is non-trivial.
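
a minimal sketch of the order dependence in python (this is not NAMD
code; the three values and the 2**32 scale factor are arbitrary choices
made for illustration). it accumulates the same numbers in two different
orders, once as doubles and once as fixed point integers:

```python
# illustration only: the values and SCALE are arbitrary, not taken from NAMD.
SCALE = 2**32  # fixed point resolution: one integer unit equals 2**-32


def accumulate(values):
    """Add values one by one, left to right (no compensated summation)."""
    total = values[0]
    for v in values[1:]:
        total = total + v
    return total


def to_fixed(x):
    """Convert a float to a fixed point integer with SCALE units per 1.0."""
    return round(x * SCALE)


forces = [0.1, 0.2, 0.3]
reordered = [0.3, 0.2, 0.1]

float_a = accumulate(forces)      # (0.1 + 0.2) + 0.3
float_b = accumulate(reordered)   # (0.3 + 0.2) + 0.1

fixed_a = accumulate([to_fixed(f) for f in forces])
fixed_b = accumulate([to_fixed(f) for f in reordered])

print(float_a == float_b)  # False: the double precision sums differ in the last bit
print(fixed_a == fixed_b)  # True: integer addition is exact, so order does not matter
```

the price of the fixed point variant is a limited range and resolution,
which would have to be chosen carefully for every accumulated quantity.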

b) but even if you address problem a), there is still the problem that
MD is based on numerically solving coupled non-linear differential
equations, i.e. something that is in essence a chaotic system. even
the tiniest change will grow exponentially and thus make your
calculation irreversible and not bitwise reproducible unless you stick
with plain NVE time integration without any external interaction (e.g.
a thermostat, variable cell, random manipulations).
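
the exponential growth can be illustrated with a toy model. the sketch
below uses the logistic map as a stand-in for a chaotic system (it is
not an MD integrator; the starting value, the perturbation of 1e-12 and
the number of steps are arbitrary assumptions):

```python
def logistic_trajectory(x0, steps, r=4.0):
    """Iterate x -> r*x*(1-x), a textbook chaotic map for r = 4."""
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs


steps = 100
traj_a = logistic_trajectory(0.3, steps)
traj_b = logistic_trajectory(0.3 + 1e-12, steps)  # tiny perturbation of the start

diffs = [abs(a - b) for a, b in zip(traj_a, traj_b)]

early = max(diffs[:6])   # first few steps: the trajectories still agree closely
late = max(diffs[50:])   # later steps: the difference is as large as the values

print(f"max difference, steps 0-5:    {early:.3e}")
print(f"max difference, steps 50-100: {late:.3e}")
```

a difference in the last bit of a force sum plays the same role in an MD
run as the 1e-12 perturbation does here.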

the fundamental problem here is that even assuming that individual MD
trajectories are reproducible is a mistake. MD results are only
*meant* to be reproducible in the form of ensemble averages (or time
averages for systems in equilibrium, or averages over equivalent
trajectories).
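
to illustrate this with a toy model (the logistic map as an assumed
stand-in for a chaotic system, not an MD simulation; starting values and
trajectory length are arbitrary): two trajectories that disagree step by
step can still give practically the same time average.

```python
def time_average(x0, steps, r=4.0):
    """Return (time average of x, final x) for the map x -> r*x*(1-x)."""
    x = x0
    total = 0.0
    for _ in range(steps):
        x = r * x * (1.0 - x)
        total += x
    return total / steps, x


steps = 20000
avg_a, last_a = time_average(0.3, steps)
avg_b, last_b = time_average(0.3 + 1e-12, steps)  # tiny perturbation of the start

print(f"final values:  {last_a:.6f} vs {last_b:.6f}")  # typically unrelated
print(f"time averages: {avg_a:.4f} vs {avg_b:.4f}")    # close to each other
```

the individual values are not reproducible, but the averages should
agree to within their statistical error.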

axel.

> If you wish to know what I am doing it relates to the effect of rare events
> on the ordering of a line of water molecules. Instead of waiting for these
> rare events I manually impinge an extremely large random force on one
> molecule and compare it to a case with a normally sized random force.
> Thanks for your help.
>
>
> On Thu, Apr 18, 2013 at 2:15 AM, Norman Geist
> <norman.geist_at_uni-greifswald.de> wrote:
>>
>> Hi Thomas,
>>
>>
>>
>> It’s not unusual that the results of parallel codes differ in some of the last
>> digits. This happens when the distributed work is brought back together. At
>> this point, the order in which the child processes finish changes the
>> rounding errors, because of the maximum machine precision of 64 bits, or
>> the precision that has been chosen by the programmer for particular variables.
>>
>>
>>
>> Simplified example, if I can hold only 2 digits behind the decimal point, also
>> during multiplying (notice the order of incoming results):
>>
>>
>>
>> Case 1: Child1=1.29 Child2=1.01 Child3=0.03 -> product=0.039 = 0.04
>>
>> Case 2: Child3=0.03 Child2=1.01 Child1=1.29 -> product=0.0387 = 0.04
>>
>>
>>
>> IMHO, this is the way a computer works and the compiler can’t do anything
>> here. The question is whether you can call this bad precision.
>>
>> Additionally, NAMD uses, as far as I know, only 32-bit (single) precision for
>> most of the work to save time, as they think it doesn’t make a difference
>> (even the DCD is single precision). Maybe we can help you if you explain
>> what your actual doubt or problem is.
>>
>>
>>
>> Regards
>>
>>
>>
>> Norman Geist.
>>
>>
>>
>> From: owner-namd-l_at_ks.uiuc.edu [mailto:owner-namd-l_at_ks.uiuc.edu] On Behalf
>> Of Thomas Brian
>> Sent: Thursday, April 18, 2013 00:09
>> To: namd-l_at_ks.uiuc.edu
>> Subject: namd-l: floating point reproduceability
>>
>>
>>
>> Hi,
>>
>> Question on reproducibility of results. Does anyone know if
>> reproducibility of results on different processors can be improved, for
>> instance, by changing gcc compilation options, or perhaps by some NAMD
>> options?
>>
>>
>>
>> I have compiled the mpi version according to the readme file with default
>> options for linux-x86_32-g++.
>>
>>
>>
>> I run on a system with some Intel Nehalem E5520 cpus, and some Intel
>> Westmere X5650 cpus. Results are identical across machines if NAMD is run
>> on one thread. Results are different if using more than one. This suggests
>> floating point differences from order of operations? Any way to get around
>> this, aside from only running one thread?
>>
>>
>>
>> Thanks,
>>
>> Thomas
>
>

--
Dr. Axel Kohlmeyer  akohlmey_at_gmail.com  http://goo.gl/1wk0
International Centre for Theoretical Physics, Trieste. Italy.

This archive was generated by hypermail 2.1.6 : Tue Dec 31 2013 - 23:23:09 CST