From: Hailey Bureau (hailey.bureau_at_gmail.com)
Date: Tue Sep 10 2013 - 10:35:24 CDT
Whew! Thank you all for your replies!
I was trying to reproduce data because I am testing two different in-house codes that *should* be doing the same thing. I was able to reproduce vac & implicit data, but I became worried when I couldn't do the same using explicit solvent, even though I was using one CPU (not a GPU), the same seed values, the same binary, the same operating system, and the same CPU model.
After reading & rereading your replies, I am now less convinced that I will see reproducible data. I do understand that it is the statistics that matter in MD simulations, but I was still trying to see if I could reproduce data as a check on what's going on between the codes I am testing. It still bothers me that I do see reproducible data in vac and implicit solvent but not in explicit solvent; for now I am chalking it up to the fact that explicit-solvent systems are simply much more complex and require many more calculations, which leaves more room for numerical differences. However, I'm still trying to reconcile this in my head :)
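As a concrete illustration of that last point, here is a minimal C sketch (just a toy, not taken from either in-house code) of the underlying issue: floating-point addition is not associative, so any change in the order in which a long force sum is accumulated can change the last bits of the result. An explicit-solvent run accumulates vastly more terms per step than a vacuum or implicit-solvent run, so it offers many more such opportunities.

    #include <stdio.h>

    /* Summing the same three doubles in two different orders gives two
     * different answers, because floating-point addition is not associative.
     * The more terms a code accumulates, the more chances a change in
     * evaluation order has to show up in the final digits. */
    int main(void)
    {
        double a = 1.0e16, b = -1.0e16, c = 1.0;

        double left  = (a + b) + c;   /* (0.0) + 1.0             = 1.0 */
        double right = a + (b + c);   /* b + c rounds to -1e16, so 0.0 */

        printf("(a + b) + c = %.17g\n", left);
        printf("a + (b + c) = %.17g\n", right);
        return 0;
    }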
Thanks for your time and patience!
-Hailey
On Sep 9, 2013, at 2:46 PM, Kenno Vanommeslaeghe <kvanomme_at_rx.umaryland.edu> wrote:
> On 09/09/2013 02:02 PM, Axel Kohlmeyer wrote:
>> - and you need to enforce a consistent rounding mode that does not
>> allow denormalized numbers. otherwise you can still have some
>> randomness creep in when you have numbers that are very, very close to
>> but not quite zero (like 1e-300). this one, however, is extremely
>> unlikely. but then again, if you run with a very large number of
>> atoms...
>
> That shouldn't matter. As long as the CPU is built from binary logic gates and not buggy or defective, a calculation with a denormalized result (or even a signed zero) should always yield the same denormalized result / signed zero. Again provided that the CPU model is the same.
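As an aside, the two settings mentioned above are easy to inspect; a minimal C sketch (assuming a C99 compiler with <fenv.h> on x86/Linux, compiled with something like cc -std=c99 file.c -lm; nothing NAMD- or CHARMM-specific) would be:

    #include <stdio.h>
    #include <float.h>
    #include <fenv.h>

    /* Report the current rounding mode, then check whether a subnormal
     * (denormalized) result survives or gets flushed to zero, which is
     * what a flush-to-zero / denormals-are-zero setting would do. */
    int main(void)
    {
        int mode = fegetround();
        printf("rounding mode: %s\n",
               mode == FE_TONEAREST  ? "to nearest"  :
               mode == FE_TOWARDZERO ? "toward zero" :
               mode == FE_UPWARD     ? "upward"      :
               mode == FE_DOWNWARD   ? "downward"    : "unknown");

        volatile double tiny = DBL_MIN;    /* smallest normal double   */
        volatile double sub  = tiny / 2.0; /* subnormal unless flushed */
        printf("DBL_MIN / 2 = %g (%s)\n", sub,
               sub == 0.0 ? "flushed to zero" : "kept as a denormal");
        return 0;
    }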
>
>> - and if you run on a multi-core CPU, you have to set processor
>> affinity so that the process doesn't get bounced around to a different
>> CPU core.
>
> Same thing, shouldn't matter as long as all the cores have the same FPUs. Maybe one day heterogeneous CPUs will invalidate this assumption, but for now, we can perfectly get reproducible trajectories with a single core CHARMM job bouncing around between cores.
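For completeness, pinning a process to one core is a one-liner at the command line on most Linux systems (taskset -c 0 ...), and a minimal Linux-only C sketch using glibc's sched_setaffinity (again, nothing specific to either MD code) looks like this:

    #define _GNU_SOURCE
    #include <sched.h>
    #include <stdio.h>
    #include <unistd.h>

    /* Restrict the calling process to core 0 so the scheduler cannot
     * bounce it between cores.  As noted above, this should not be
     * needed for reproducibility when all cores share the same FPU
     * design, but it is how the affinity constraint would be enforced. */
    int main(void)
    {
        cpu_set_t set;
        CPU_ZERO(&set);
        CPU_SET(0, &set);                  /* allow core 0 only */

        if (sched_setaffinity(0, sizeof(set), &set) != 0) {
            perror("sched_setaffinity");
            return 1;
        }
        printf("pid %d pinned to core 0\n", (int)getpid());
        return 0;
    }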
>
>> - and you have to make certain, that there are no actions that are
>> triggered by the amount of time passed or other unexpected
>> things like signal handlers etc.
>
> This would require these time-triggered actions / signal handlers to be tricky enough to change the order in which the simulation's associative math is done (or to consume random numbers from the same PRNG as the simulation). I'm pretty sure this is not the case for CHARMM, but I'm less sure about NAMD.
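A toy C sketch of that second mechanism (a shared PRNG being consumed by something other than the integrator; the C library rand() just stands in for whatever generator the MD code actually uses):

    #include <stdio.h>
    #include <stdlib.h>

    /* If anything else draws from the same generator the "simulation"
     * uses, every subsequent random number the simulation sees is
     * shifted, and the run diverges from the reference trajectory. */
    static void print_three(const char *label)
    {
        printf("%s:", label);
        for (int i = 0; i < 3; ++i)
            printf(" %d", rand());
        printf("\n");
    }

    int main(void)
    {
        srand(12345);
        print_three("reference run   ");

        srand(12345);
        (void)rand();             /* an unrelated handler steals one draw */
        print_three("with interloper ");
        return 0;
    }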
>
>> at this point, writing an MD kernel in fixed point math almost seems
>> like the easier undertaking... ;-)
>
> Let's say that in an imaginary world where reproducibility of the exact trajectories were important, the assumptions in my previous message would not be reasonable/convenient, so one would be forced either to write MD kernels in fixed point, or to use something like STREFLOP and eliminate all sources of differences in associative math. Which one would be *easier* is not a trivial question. As I understand from reading very old papers, it took a while for the MD community to master the numerical analysis aspects of what they were doing and satisfactorily mitigate loss-of-significance issues. This whole exercise would need to be repeated in a fixed-point context, and it would presumably be even more difficult.
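To make the loss-of-significance point concrete, here is a small C sketch (a generic textbook example, not anything from CHARMM or STREFLOP) comparing a naive running sum against compensated (Kahan) summation; the analogous error analysis would have to be redone from scratch in a fixed-point setting:

    #include <stdio.h>

    /* Add ten million small terms to one large term.  The naive running
     * sum rounds every small contribution away; the compensated (Kahan)
     * sum carries the lost low-order bits along in a correction term. */
    int main(void)
    {
        const int n = 10000000;
        double big = 1.0e16, small = 1.0;

        double naive = big;
        for (int i = 0; i < n; ++i)
            naive += small;               /* each 1.0 is rounded away */

        double sum = big, comp = 0.0;
        for (int i = 0; i < n; ++i) {
            double y = small - comp;      /* re-inject what was lost  */
            double t = sum + y;
            comp = (t - sum) - y;         /* what this add just lost  */
            sum = t;
        }

        printf("expected : %.1f\n", big + (double)n * small);
        printf("naive    : %.1f\n", naive);
        printf("kahan    : %.1f\n", sum);
        return 0;
    }

(This only behaves as intended when the compiler is not allowed to reassociate floating-point math, i.e. no -ffast-math.)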
>
> But I presume that's part of the reason why you said "almost"? ;-)
>