Re: floating point reproduceability

From: Norman Geist (norman.geist_at_uni-greifswald.de)
Date: Fri Apr 19 2013 - 02:09:33 CDT

Hi Thomas,

 

I don’t want to discourage you, but I think keeping this order constant will
cause a dramatic slowdown. Additionally, the differences you see must be so
small that I don’t see the problem. You can still compare values, even if
they are not exactly the same. And please remember, if you are using the DCD
for your measurements, that these coordinates are 32-bit anyway.
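
A minimal sketch of what comparing values that are "not exactly the same" could look like, and of what a 32-bit write does to a 64-bit number. The function names, the sample coordinate and the tolerance of 1e-6 are assumptions made for illustration; none of them come from NAMD or the DCD format.

```python
import math
import struct

def to_float32(x):
    """Round a 64-bit Python float to the nearest 32-bit float and back."""
    return struct.unpack("f", struct.pack("f", x))[0]

def close_enough(a, b, rel_tol=1e-6):
    """Tolerance-based comparison; rel_tol is an assumed threshold."""
    return math.isclose(a, b, rel_tol=rel_tol)

coord = 12.3456789012345      # hypothetical coordinate in double precision
stored = to_float32(coord)    # what survives a single-precision write

print("exact match:      ", coord == stored)
print("within tolerance: ", close_enough(coord, stored))
```

Single precision carries roughly 7 significant decimal digits, so a relative tolerance much tighter than about 1e-7 would be asking for more than 32-bit coordinates can hold.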

 

Best regards

 

Norman Geist.

 

From: Thomas Brian [mailto:thomasbrianxlii_at_gmail.com]
Sent: Friday, 19 April 2013 05:54
To: Norman Geist
Subject: Re: namd-l: floating point reproduceability

 

Thanks Norman,

I was thinking along similar lines regarding floating point order of
operations. It would be nice if you could force each patch to be added to
the force totals in the same order. I wonder how much slower it would run.
I don't know the details of how it is programmed, but I would think keeping
the order of additions the same would produce only a modest increase in run
times. I am curious whether it is just a matter of changing a parallel-style
for loop over the patch grid to a serial one, or something similarly simple.
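
A rough sketch of the idea of adding contributions in a fixed order, assuming each partial force arrives tagged with a patch ID. The IDs, the values and the function names are hypothetical and say nothing about how NAMD actually accumulates forces; the values are chosen so that the rounding is easy to see.

```python
def sum_in_arrival_order(contribs):
    """Add partial results in whatever order they happen to arrive."""
    total = 0.0
    for _patch_id, force in contribs:
        total += force
    return total

def sum_in_fixed_order(contribs):
    """Add partial results sorted by patch ID, independent of arrival order."""
    total = 0.0
    for _patch_id, force in sorted(contribs, key=lambda c: c[0]):
        total += force
    return total

# (patch_id, partial force): the same four contributions, two arrival orders.
run_a = [(0, 1e16), (1, 1.0), (2, -1e16), (3, 1.0)]
run_b = [(2, -1e16), (0, 1e16), (1, 1.0), (3, 1.0)]

print(sum_in_arrival_order(run_a), sum_in_arrival_order(run_b))  # differ
print(sum_in_fixed_order(run_a), sum_in_fixed_order(run_b))      # identical
```

The fixed-order version is reproducible, not more accurate, and it can only start adding once every contribution is available. That waiting is the kind of cost a fixed ordering would introduce in a parallel run.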

 

If you wish to know what I am doing, it relates to the effect of rare events
on the ordering of a line of water molecules. Instead of waiting for these
rare events, I manually apply an extremely large random force to one
molecule and compare it to a case with a normally sized random force.
Thanks for your help.

-Thomas

 

On Thu, Apr 18, 2013 at 2:15 AM, Norman Geist
<norman.geist_at_uni-greifswald.de> wrote:

Hi Thomas,

 

It’s not unusual for the results of parallel codes to differ in some of the
last digits. This happens when the distributed work is brought back
together. At this point, the order in which the child processes finish
changes the rounding errors, because of the machine's maximum precision of
64 bits, or the precision that the programmer has chosen for particular
variables.

 

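Simplified example, assuming I can hold only 2 digits after the decimal
point, including during multiplication (notice the order of incoming results):

Case 1: Child1=1.29 Child2=1.01 Child3=0.03 -> 1.29*1.01=1.3029, held as 1.30; 1.30*0.03 -> Product=0.039, stored as 0.04

Case 2: Child3=0.03 Child2=1.01 Child1=1.29 -> 0.03*1.01=0.0303, held as 0.03; 0.03*1.29 -> Product=0.0387, stored as 0.04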
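
The two cases above can be replayed with Python's decimal module, rounding to 2 decimal places after every multiplication. This is only a sketch: the helper name and the rounding mode are assumptions, and the third run uses a hypothetical value of 0.70 in place of 0.03, because with 0.03 both orders happen to end at the same stored value even though the intermediate products differ.

```python
from decimal import Decimal, ROUND_HALF_EVEN

TWO_PLACES = Decimal("0.01")

def product_in_order(values):
    """Multiply left to right, keeping only 2 decimals after each step."""
    acc = Decimal(values[0])
    for v in values[1:]:
        acc = (acc * Decimal(v)).quantize(TWO_PLACES, rounding=ROUND_HALF_EVEN)
    return acc

# Values from the example in the mail, in both arrival orders.
print(product_in_order(["1.29", "1.01", "0.03"]))
print(product_in_order(["0.03", "1.01", "1.29"]))

# Hypothetical third value 0.70: here the arrival order changes the result.
print(product_in_order(["1.29", "1.01", "0.70"]))
print(product_in_order(["0.70", "1.01", "1.29"]))
```

Real hardware rounds in binary and to far more digits, but the mechanism is the same: every intermediate result is rounded, so a different order of operations can leave a different last digit.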

 

IMHO, this is simply how a computer works, and the compiler can’t do
anything about it. The question is whether you can call this bad precision.

Additionally, NAMD uses, as far as I know, only 32-bit (single) precision
for most of the work to save time, on the assumption that it doesn’t make a
difference (even the DCD is single precision). Maybe we can help you if you
explain what your actual concern or problem is.

 

Regards

 

Norman Geist.

 

From: owner-namd-l_at_ks.uiuc.edu [mailto:owner-namd-l_at_ks.uiuc.edu] On behalf
of Thomas Brian
Sent: Thursday, 18 April 2013 00:09
To: namd-l_at_ks.uiuc.edu
Subject: namd-l: floating point reproduceability

 

Hi,

A question on the reproducibility of results. Does anyone know if the
reproducibility of results on different processors can be improved, for
instance, by changing gcc compilation options, or perhaps by some NAMD options?

 

I have compiled the MPI version according to the readme file with default
options for linux-x86_32-g++.

 

I run on a system with some Intel Nehalem E5520 CPUs and some Intel
Westmere X5650 CPUs. Results are identical across machines if NAMD is run
on one thread. Results are different if using more than one. Does this
suggest floating point differences from the order of operations? Is there
any way to get around this, aside from only running one thread?

 

Thanks,

Thomas

 
