Re: floating point reproducibility

From: Norman Geist (norman.geist_at_uni-greifswald.de)
Date: Thu Apr 18 2013 - 02:15:11 CDT

Hi Thomas,

 

It's not unusual for the results of parallel codes to differ in the last few
digits. This happens when the distributed work is brought back together. At
that point, the order in which the child processes finish determines the
order in which their partial results are combined. Every floating-point
operation is rounded, either to the machine's maximum precision of 64 bit or
to the precision the programmer has chosen for particular variables, so a
different order of operations gives a different rounding error.

 

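A simplified example, assuming I can hold only 2 digits after the decimal
point, also during multiplication (notice the order of the incoming results):

Case 1: Child1=1.29 Child2=1.01 Child3=0.03
        1.29*1.01 = 1.3029 -> 1.30; 1.30*0.03 = 0.039 -> Product = 0.04

Case 2: Child3=0.03 Child2=1.01 Child1=1.29
        0.03*1.01 = 0.0303 -> 0.03; 0.03*1.29 = 0.0387 -> Product = 0.04

The values before the last rounding step already differ (0.039 vs. 0.0387).
Here both still round to 0.04, but with other inputs or more digits such a
difference shows up in the last digits of the final result.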
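
The same effect can be shown in ordinary 64-bit arithmetic. The following is
only a minimal sketch, not NAMD code: the three values stand in for
hypothetical partial results from child processes, and combine() is an
illustrative helper that adds them in whatever order they arrive.

```python
from itertools import permutations


def combine(parts):
    """Add partial results one by one, in the order they arrive."""
    total = 0.0
    for p in parts:
        total += p
    return total


# Hypothetical partial results from three child processes.
partials = (0.1, 0.2, 0.3)

# Try every possible arrival order and record the combined result.
results = {order: combine(order) for order in permutations(partials)}
distinct = sorted(set(results.values()))

for order, value in results.items():
    print(order, "->", repr(value))
print("distinct results:", len(distinct))
```

The inputs are identical in every run; only the order of the additions
changes, and that alone is enough to change the last digit of the sum.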

 

IMHO, this is simply how a computer works, and the compiler can't do
anything about it. The question is whether you can call this bad precision.

Additionally, as far as I know, NAMD uses only 32-bit (single) precision for
most of the work to save time, on the assumption that it doesn't make a
difference (even the DCD output is single precision). Maybe we can help you
if you explain what your actual doubt or problem is.
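
To give a feeling for how much single precision costs, here is a small
sketch that simulates 32-bit accumulation in Python by rounding to float32
after every addition. It is an illustration only and says nothing about how
NAMD actually organizes its arithmetic; to_float32(), sum_float32() and the
input values are made up for this example.

```python
import struct


def to_float32(x):
    """Round a Python float (64 bit) to the nearest 32-bit float value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def sum_float32(values):
    """Accumulate in simulated single precision, rounding after each add."""
    total = 0.0
    for v in values:
        total = to_float32(total + to_float32(v))
    return total


# Hypothetical per-step contributions.
values = [0.1] * 1000

single = sum_float32(values)
double = sum(values)

print("single:", repr(single))
print("double:", repr(double))
print("difference:", abs(single - double))
```

Both sums are close to 100, but the single-precision one carries far fewer
correct digits, so order-dependent rounding becomes visible much earlier.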

 

Regards

 

Norman Geist.

 

From: owner-namd-l_at_ks.uiuc.edu [mailto:owner-namd-l_at_ks.uiuc.edu] On Behalf
Of Thomas Brian
Sent: Thursday, 18 April 2013 00:09
To: namd-l_at_ks.uiuc.edu
Subject: namd-l: floating point reproducibility

 

Hi,

A question on the reproducibility of results. Does anyone know whether the
reproducibility of results on different processors can be improved, for
instance by changing gcc compilation options, or perhaps by some NAMD options?

 

I have compiled the MPI version according to the README file with the
default options for linux-x86_32-g++.

 

I run on a system with some Intel Nehalem E5520 CPUs and some Intel
Westmere X5650 CPUs. Results are identical across machines if NAMD is run
on one thread. Results differ if more than one thread is used. Does this
suggest floating-point differences caused by the order of operations? Is
there any way around this, aside from running on only one thread?

 

Thanks,

Thomas
