Re: Yet another NAMD speed concerns

From: David Hardy (dhardy_at_ks.uiuc.edu)
Date: Sat Apr 25 2009 - 12:28:46 CDT

The amount of computational work that you have for the nonbonded
short-range interactions (which is the dominant part of the overall
computation) scales as the cube of your cutoff distance (the volume
of a sphere around each atom), so using a cutoff distance of 15
rather than the more commonly accepted value of 12 almost doubles the
amount of computation required per timestep, since (15/12)^3 = 1.95.
Reducing the cutoff to 12 will also give you better parallel
scaling by increasing the amount of concurrent work available to
NAMD.
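
To make the arithmetic concrete, here is a minimal Python sketch. It
assumes only the cubic scaling described above; the function name is
made up for illustration and is not part of NAMD.

```python
# Relative short-range nonbonded cost, assuming the work scales with the
# volume of the cutoff sphere, i.e. with the cube of the cutoff distance.

def relative_nonbonded_cost(cutoff, reference_cutoff):
    """Return cost(cutoff) / cost(reference_cutoff) under cubic scaling."""
    return (cutoff / reference_cutoff) ** 3

# Cutoff of 15 compared with the more common value of 12.
ratio = relative_nonbonded_cost(15.0, 12.0)
print(f"{ratio:.2f}")  # 1.95
```

This is only an estimate of the pairwise work per timestep; it says
nothing about communication cost or the PME part of the calculation.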

PME uses FFT for calculating the long-range electrostatic
contributions. FFT performance is best when the grid size is a power
of 2, or a power of 2 times a single small odd prime such as 3. Your
current grid size of 125 = 5^3 fits neither pattern. So better choices
for your PME performance would be to set the grid size to either 128
or 96.
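
As a rough illustration of that rule, the following Python sketch
checks whether a grid size is a power of 2, optionally times one
factor of 3, and lists the candidates near your current value. The
helper names are hypothetical, and the rule coded here is just the one
stated above, not a statement about what any particular FFT library
accepts.

```python
def is_fft_friendly(n):
    """True if n is a power of 2, optionally times a single factor of 3."""
    if n < 1:
        return False
    if n % 3 == 0:
        n //= 3  # allow at most one factor of 3
    while n % 2 == 0:
        n //= 2  # strip all factors of 2
    return n == 1

def fft_friendly_sizes(low, high):
    """All grid sizes in [low, high] that satisfy is_fft_friendly."""
    return [n for n in range(low, high + 1) if is_fft_friendly(n)]

print(is_fft_friendly(125))         # False, since 125 = 5**3
print(fft_friendly_sizes(90, 130))  # [96, 128]

# Grid spacing along the 108-long cell vector for each candidate.
for size in fft_friendly_sizes(90, 130):
    print(size, round(108 / size, 3))
```

The last loop is there because a smaller grid means a coarser mesh
(108/96 = 1.125 versus 108/128 = 0.844 along the longest cell vector),
so it is worth checking that the spacing is still acceptable for your
accuracy needs before picking the smaller size.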

-Dave

On Apr 25, 2009, at 11:43 AM, Gianluca Interlandi wrote:

> I have a system of ca. 100000 atoms and I run on 32 CPUs with
> InfiniBand. It takes 8 hours for 1 ns. I'm not sure the system
> scales to 256 CPUs; I haven't tried. The more CPUs you use, the more
> communication there will be, and this becomes the bottleneck. The
> problem is latency, i.e., the time it takes for a message
> to be prepared and sent over the network. MD uses domain
> decomposition, where the system is subdivided into domains. The more
> CPUs you use, the smaller each subdomain will be and the more time a
> CPU will spend waiting for messages to be sent instead of
> computing.
>
> A comment about your parameters. I think you can choose a cutoff of
> 12 and you can also set "pairlistdist 14.0".
>
> Gianluca
>
> On Sat, 25 Apr 2009, DimitryASuplatov wrote:
>
>> Hello,
>>
>> I am running a simulation of my protein in water (cell = 15 angstroms
>> from the protein). The system contains 110349 atoms in total.
>> I have compiled NAMD 2.6 on a Xeon InfiniBand cluster using the amd64
>> installation procedure. I've also used icc and ifort with the -xSSSE3
>> flag.
>>
>> My force field parameters are
>> =====================
>> exclude scaled1-4
>> 1-4scaling 1.0
>> cutoff 15.
>> switching on
>> switchdist 10.
>>
>>
>> My integrator parameters are
>> =====================
>> timestep 2.0 ;# 2fs/step
>> rigidBonds all ;# needed for 2fs steps
>> nonbondedFreq 1
>> fullElectFrequency 4
>> stepspercycle 20
>>
>>
>> I use
>> ====
>> langevin on
>> langevinPiston on
>>
>> cellBasisVector1 108 0. 0.
>> cellBasisVector2 0. 102 0.
>> cellBasisVector3 0. 0 105.
>> cellOrigin 63.87940979 13.3578186035 32.2524185181
>>
>> PME yes
>> PMEGridSizeX 125
>> PMEGridSizeY 125
>> PMEGridSizeZ 125
>>
>> MY PROBLEM IS THAT MY SYSTEM IS TOO SLOW
>> ======================================================
>> Utilization of 256 x Xeon 3 GHz CPUs requires 41 hours for 10 ns
>> !!!!!!!!!!!!!!!!!!!
>> ======================================================
>>
>> I consider my system to be of a typical size. 100k atoms for 10 ns is
>> normal these days. But it is very hard to calculate with NAMD even on
>> modern clusters.
>>
>> 1/ Is this speed normal for NAMD, or did I do something wrong?
>> 2/ Can I use 4 fs for the timestep with these parameters?
>> 3/ What do you do to shortcut your calculations?
>>
>> Thanks
>> SDA
>>
>>
>
> -----------------------------------------------------
> Gianluca Interlandi, PhD gianluca_at_u.washington.edu
> +1 (206) 685 4435
> +1 (206) 714 4303
> http://artemide.bioeng.washington.edu/
>
> Postdoc at the Department of Bioengineering
> at the University of Washington, Seattle WA U.S.A.
> -----------------------------------------------------

-- 
David J. Hardy, Ph.D.
Theoretical and Computational Biophysics
Beckman Institute, University of Illinois
dhardy_at_ks.uiuc.edu
http://www.ks.uiuc.edu/~dhardy/
