Re: Mindboggling Problem Related To NAMD2 Code

From: Axel Kohlmeyer (akohlmey_at_cmm.chem.upenn.edu)
Date: Sun Jun 01 2008 - 08:01:34 CDT

On Sun, 1 Jun 2008, Rahul wrote:

RM> Hi all,

rahul,

RM> I am a college student currently looking at ways to GPU-accelerate
RM> NAMD2. I am working with the source code of NAMD 2.6, and I am going
RM> to use NVIDIA CUDA to create a GPU-accelerated version.

have you had a look at the NAMD homepage recently?
and specifically, have you seen the link to:
http://www.ks.uiuc.edu/Research/gpu/

a GPU-accelerated non-bonded kernel for NAMD already exists.

[...]
RM> Including lines within the loop one-by-one in the function shows that
RM> the error occurs when the declaration and assignment statements of
RM> variables vdw_a, vdw_b, vdw_c and vdw_d are included in the function.
RM> I have been trying to solve this for the last 2 weeks without any
RM> progress.

without having looked at the code: have you considered the impact of
optimization on the code? depending on the level of optimization,
instructions may be reordered, which leads to small differences
when summing over floating point numbers. the very nature of
MD results in an exponential divergence of trajectories as soon as
the numbers are no longer binary identical.
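to make the two points above concrete, here is a minimal C++ sketch (not NAMD code). `sum_lanes` imitates, in plain scalar code, how a vectorizing compiler might split one accumulation loop into independent partial sums; the lane split and the test values are assumptions chosen for illustration. `max_divergence` uses the logistic map as a toy stand-in for an MD integrator (an assumption, not the real dynamics) to show a one-ulp difference growing until the two runs are unrelated. it assumes strict IEEE double arithmetic (no `-ffast-math`, SSE2 rather than x87 extended precision).

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Plain left-to-right summation, in the order the source loop is written.
double sum_sequential(const std::vector<double>& v) {
    double s = 0.0;
    for (double x : v) s += x;
    return s;
}

// Illustrative model of vectorized code: element i goes into partial sum
// (i % lanes), like one partial sum per SIMD slot, and the partial sums are
// combined at the end. Same terms, different order of additions.
// Assumes IEEE doubles without -ffast-math.
double sum_lanes(const std::vector<double>& v, std::size_t lanes) {
    std::vector<double> partial(lanes, 0.0);
    for (std::size_t i = 0; i < v.size(); ++i) partial[i % lanes] += v[i];
    double s = 0.0;
    for (double p : partial) s += p;
    return s;
}

// Toy chaotic map (logistic map, r = 4) standing in for an MD integrator.
// The second run starts one ulp away from the first; returns the largest
// |x - y| observed over `steps` iterations.
double max_divergence(double x0, int steps) {
    double x = x0;
    double y = std::nextafter(x0, 1.0);  // one ulp above x0
    double worst = 0.0;
    for (int i = 0; i < steps; ++i) {
        x = 4.0 * x * (1.0 - x);
        y = 4.0 * y * (1.0 - y);
        worst = std::fmax(worst, std::fabs(x - y));
    }
    return worst;
}
```

with v = {1e16, 1.0, -1e16, 1.0} the sequential sum gives 1 (the first 1.0 is rounded away when added to 1e16), while the two-lane sum gives 2 (the two 1.0 terms are added to each other first). both are legitimate roundings of the same terms, and a difference of that kind is exactly what the chaotic dynamics then amplify.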

RM> I have been working on a platform that consists of an Intel Core2 Quad
RM> CPU with a clock speed of 2.4 GHz, running Red Hat Linux with gcc 4.
RM> The problem persists even after using gcc 3 for the compilation. The
RM> interesting thing is that the same problem does not occur on another
RM> machine that has an amd64 architecture.

using the exact same compiler/flags? single/multi-core?
multi-core cpus used with SSE3 and high-level optimization
have the highest chance of not yielding binary identical results,
since processes get kicked around between the cores, resulting in some
"numerical noise". shuffling some code around can affect
how much a compiler is able to optimize it. sometimes people
change code in ways that don't really change the algorithm,
but "guide" the compiler toward the right optimizations.

 
RM> Does anyone have any ideas on how to solve this problem?

not sure whether there is a solution that would not seriously impact
the execution speed. but it doesn't really matter, as long as
the trajectory samples the same statistical mechanical ensemble.

cheers,
     axel.

RM>
RM> --------------------------------------------------------------------------------------------------------------------------------------------------------------
RM>
RM> Steps To Be Taken To Reproduce The Results
RM>
RM> 1. Pre-process src/ComputeNonbondedStd.C, src/ComputeNonbondedLES.C,
RM> src/ComputeNonbondedFEP.C and src/ComputeNonbondedPProf.C using gcc -E
RM>
RM> 2. Enclose one of the loops following a "#pragma ivdep" in a function,
RM> and pass the requisite arguments. Replace the loop with a function call.
RM> Ensure that the changes to any of the variables within the function
RM> are reflected after exit from that function. Some of the arguments
RM> might have to be passed as pointers or references to ensure this.
RM>
RM> 3. Now re-make NAMD and run it with the same input. Compare the output
RM> with the reference solution.
RM>
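step 2 above is where results can go silently wrong if an accumulator is passed by value instead of by reference. a minimal C++ sketch of the difference; the function names (`pair_loop_by_value`, `pair_loop_by_ref`), the parameters (`r2`, `A`, `B`, `energy`) and the simplified Lennard-Jones-style term are all hypothetical and are not the actual NAMD kernel:

```cpp
#include <cstddef>

// Hypothetical stand-in for one extracted "#pragma ivdep" loop: accumulates
// A/r^12 - B/r^6 over n squared pair distances r2[i].

// Wrong: `energy` is a copy, so the caller never sees the accumulated value.
void pair_loop_by_value(const double* r2, std::size_t n,
                        double A, double B, double energy) {
    for (std::size_t i = 0; i < n; ++i) {
        double inv_r6 = 1.0 / (r2[i] * r2[i] * r2[i]);  // 1 / r^6
        energy += A * inv_r6 * inv_r6 - B * inv_r6;
    }
    // the update to the local copy is discarded on return
}

// Right: `energy` is a reference, so the update survives the call.
void pair_loop_by_ref(const double* r2, std::size_t n,
                      double A, double B, double& energy) {
    for (std::size_t i = 0; i < n; ++i) {
        double inv_r6 = 1.0 / (r2[i] * r2[i] * r2[i]);  // 1 / r^6
        energy += A * inv_r6 * inv_r6 - B * inv_r6;
    }
}
```

even with every output passed correctly, moving a loop into a separate function can change what the compiler inlines and vectorizes, so the result may still differ from the reference in the last bits, which is the effect axel describes in his reply.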

-- 
=======================================================================
Axel Kohlmeyer   akohlmey_at_cmm.chem.upenn.edu   http://www.cmm.upenn.edu
   Center for Molecular Modeling   --   University of Pennsylvania
Department of Chemistry, 231 S.34th Street, Philadelphia, PA 19104-6323
tel: 1-215-898-1582,  fax: 1-215-573-6233,  office-tel: 1-215-898-5425
=======================================================================
If you make something idiot-proof, the universe creates a better idiot.

This archive was generated by hypermail 2.1.6 : Wed Feb 29 2012 - 15:47:52 CST