Re: Mindboggling Problem Related To NAMD2 Code

From: Peter Freddolino (
Date: Sun Jun 01 2008 - 09:56:21 CDT

Hi Rahul,

on the topic of making a GPU-accelerated version of namd, please note
that similar efforts are currently underway in the Schulten group (see
the Presentation and Publication sections on their site). You are of course perfectly
welcome to do this on your own as well, but depending on whether you
want to do this as a learning experience, for your own benefit (having a
faster version of namd), or to contribute to the community, you may want
to coordinate with people here rather than starting off on your own. For
example, I can tell you that there already is a good CUDA implementation
of the short range nonbonded interaction calculations (see the papers on
the site noted above).

Now, for the specific problem you're running into... while I can't say
for sure what's going on because I don't know exactly where you've set
up your function and when you're calling it, I can tell you that the key
to using ComputeNonbondedBase2.h is that it is really being used just to
supplement ComputeNonbondedBase.h by providing different versions for
the code of the nonbonded calculation loop depending on what
preprocessor macros are defined. If you want to split it off into a
function instead, you'll need to either actually make it compile to
different functions for normal, modified, and excluded interactions, or
include that information in the function call and have the function act
accordingly. Also, several variables needed in ComputeNonbondedBase2.h
are set in the enclosing loop of ComputeNonbondedBase.h; you may want to
verify that these are properly being passed through in your function.
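
To make the two options concrete, here is a minimal sketch in C++. It is
not taken from the NAMD source: PairKind, PairParams, Accum, pair_loop and
pair_loop_dispatch are all hypothetical names, and the "energy" is a toy
placeholder rather than a real nonbonded term. The point is only to show
(a) one compiled function per interaction type versus (b) passing the type
in the call, and that anything the enclosing loop sets up has to travel
into the function explicitly, with results coming back by reference.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-ins; none of these names come from the NAMD source.
enum class PairKind { Normal, Modified, Excluded };

// Values that the enclosing loop would normally have set up already.
struct PairParams {
    double scale14;  // toy scaling factor applied to "modified" pairs
    double cutoff2;  // squared cutoff distance
};

// Results the caller must still see after the function returns.
struct Accum {
    double energy = 0.0;
    int    pairs  = 0;
};

// Option (a): one function per interaction type, chosen at compile time.
// This mirrors including the same loop body under different macros.
template <PairKind K>
void pair_loop(const std::vector<double>& r2, const PairParams& p, Accum& out) {
    assert(p.cutoff2 > 0.0);
    for (double d2 : r2) {
        if (d2 >= p.cutoff2) continue;          // outside the cutoff
        double e = 1.0 / d2;                    // placeholder, not a real term
        if (K == PairKind::Modified) e *= p.scale14;
        if (K == PairKind::Excluded) e = 0.0;   // toy rule for excluded pairs
        out.energy += e;                        // visible to caller via reference
        ++out.pairs;
    }
}

// Option (b): one entry point, with the interaction type passed in the call.
void pair_loop_dispatch(PairKind k, const std::vector<double>& r2,
                        const PairParams& p, Accum& out) {
    switch (k) {
        case PairKind::Normal:   pair_loop<PairKind::Normal>(r2, p, out);   break;
        case PairKind::Modified: pair_loop<PairKind::Modified>(r2, p, out); break;
        case PairKind::Excluded: pair_loop<PairKind::Excluded>(r2, p, out); break;
    }
}
```

If the extracted function silently uses the "normal" branch for every
call, or if one of the values in PairParams is left at a stale or default
value, the code will still compile and run but the forces will drift from
the reference, which is at least consistent with the symptom you describe.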


Rahul wrote:
> Hi all,
> I am a college student currently looking at ways to GPU-accelerate
> NAMD2. I am working with the source code of NAMD 2.6, and I am going
> to use NVIDIA CUDA to create a GPU-accelerated version.
> A cursory look at the function profile of the code indicates that most
> of the execution time is spent on computation of nonbonded forces. The
> basic computations are conveniently enclosed in a file called
> src/ComputeNonbondedBase2.h, which consists of a loop over all the
> atoms. If these computations are to be ported to a GPU, I need to
> create, according to CUDA convention, a global kernel function that
> will carry out these operations. Before even starting to use CUDA, if
> I try to enclose the loop in a function, and run it on the CPU, it
> should work out fine, ideally. However, when I enclose this loop in a
> function, I get results (output coordinates for the same input) that
> are somewhat different from the reference values. The error increases
> with the number of timesteps, and is very significant.
> Including lines within the loop one-by-one in the function shows that
> the error occurs when the declaration and assignment statements of
> variables vdw_a, vdw_b, vdw_c and vdw_d are included in the function.
> I have been trying to solve this for the last 2 weeks without any
> progress.
> I have been working on a platform that consists of an Intel Core2 Quad
> CPU having a clock speed of 2.4 GHz, using Red Hat Linux, having gcc4.
> The problem persists even after using gcc3 for the compilation. The
> interesting thing is that the same problem does not occur on another
> machine that has an amd64 architecture.
> Does anyone have any ideas on how to solve this problem?
> --------------------------------------------------------------------------------------------------------------------------------------------------------------
> Steps To Be Taken To Reproduce The Results
> 1. Pre-process src/ComputeNonbondedStd.C, src/ComputeNonbondedLES.C,
> src/ComputeNonbondedFEP.C and src/ComputeNonbondedPProf.C using gcc -E
> 2. Enclose one of the loops following a "#pragma ivdep" in a function,
> and pass the requisite arguments. Replace the loop by a function call.
> Ensure that the changes to any of the variables within the function
> are reflected after exit from that function. Some of the arguments
> might have to be passed as pointers or references to ensure this.
> 3. Now re-make NAMD and run it with the same input. Compare the output
> with the reference solution.
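
For the comparison in step 3, a small helper along these lines may be
useful. It is only a sketch: max_abs_diff is a hypothetical name, and it
assumes the output and reference coordinates have already been read into
two flat arrays of the same length. It reports the largest absolute
deviation between them.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: largest absolute difference between two coordinate
// arrays of equal length (e.g. x0, y0, z0, x1, y1, z1, ...).
double max_abs_diff(const std::vector<double>& out,
                    const std::vector<double>& ref) {
    assert(out.size() == ref.size());
    double worst = 0.0;
    for (std::size_t i = 0; i < out.size(); ++i) {
        double d = std::fabs(out[i] - ref[i]);
        if (d > worst) worst = d;
    }
    return worst;
}
```

Since the reported error grows with the number of timesteps, running this
check after a single timestep may make it easier to tell a genuine logic
difference from accumulated drift.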
