Re: which portions of NAMD are CUDA accelerated?

From: Paul Rigor (uci-ics) (prigor_at_ics.uci.edu)
Date: Fri Dec 04 2009 - 21:59:12 CST

Hmm... I'm not sure how to answer that, but I ran the latest version of NAMD
(2.7b2), both the CUDA and non-CUDA builds, on the following (non-beefy)
hardware configuration and simulation system. Two processes sharing a single
GPU ran about 2.3X faster than the two CPU-only processes. I'm not sure
you can extrapolate the fraction of time spent on non-bonded forces from that,
but it sure does cut the compute time by more than half. For kicks, I also ran
the same simulation on a beefy Sun X4150 server. It finishes in about a third
less wall-clock time than the dinky desktop, though it uses 8 cores to do so.
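For what it's worth, Amdahl's law gives a rough lower bound. If a fraction f of the CPU-only runtime is moved to the GPU and that part runs s times faster, the overall speedup is S = 1/((1 - f) + f/s), so f = (1 - 1/S)/(1 - 1/s). The sketch below plugs in the two desktop wall-clock times from the MD results in the stats section. It assumes the offloaded work is only the non-bonded part, that nothing else differs between the two builds, and that CPU and GPU work do not overlap. The GPU factor s is a hypothetical knob; s = infinity gives the minimum fraction.

```python
import math


def amdahl_speedup(f: float, s: float) -> float:
    """Overall speedup when a fraction f of the runtime is sped up by a factor s."""
    return 1.0 / ((1.0 - f) + f / s)


def offloaded_fraction(overall: float, s: float = math.inf) -> float:
    """Invert Amdahl's law: the fraction f implied by an observed overall speedup.

    With s = inf (offloaded work becomes free) this is the smallest f that
    could explain the observed speedup.
    """
    return (1.0 - 1.0 / overall) / (1.0 - 1.0 / s)


cpu_only_wall = 36608.851562  # Desktop, CPU-only, +p2 ++local
cuda_wall = 15890.967773      # Desktop, CUDA-enabled, +p2 ++local

observed = cpu_only_wall / cuda_wall
f_min = offloaded_fraction(observed)
print(f"observed speedup: {observed:.2f}x")
print(f"non-bonded share of CPU-only runtime: at least {f_min:.0%}")
for s in (5, 10, 20):  # hypothetical GPU-vs-CPU speedups for the kernel itself
    print(f"  if the GPU kernel were {s}x faster: f ~ {offloaded_fraction(observed, s):.0%}")
```

Since two processes shared one GPU and NAMD may overlap CPU and GPU work, treat these fractions as ballpark figures rather than a measured profile.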

For a larger simulation system (NAMD 2.7b2-TCP), I'm actually using six of
these Sun servers for a system with 5X the number of atoms. I'm not liking the
amount of time spent passing messages. For example, minimizing 40K atoms
(5000 steps) on a single Sun server, I get

WallClock: 1469.357788s CPUTime: 589.791321s

but on 6 nodes running 8 processes each, I get

WallClock: 1135.021240s CPUTime: 199.707642s

I don't know what the ideal number of nodes (and cores per node) is for a
given number of atoms to achieve optimal performance. In any case, I'm looking
forward to setting up a GPU cluster with Nehalem CPUs and 2x GPU devices!
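The minimization timings above can be turned into a strong-scaling efficiency and a rough "busy" fraction (CPU time over wall-clock time). This is only a back-of-the-envelope sketch: it assumes the single-server run used all 8 cores, that NAMD's reported CPUTime is for one process rather than summed over all processes, and it takes "40K atoms" at face value. The atoms-per-process figure is plain arithmetic, not a tuning recommendation.

```python
from dataclasses import dataclass

ATOMS = 40_000  # approximate system size quoted in the post ("40K atoms")


@dataclass
class MinimizationRun:
    label: str
    nodes: int
    procs: int      # total NAMD processes across all nodes
    wall_s: float   # NAMD "WallClock" (seconds)
    cpu_s: float    # NAMD "CPUTime" (seconds); ASSUMED to be per process

    def busy_fraction(self) -> float:
        # Low values mean the process spent most of its wall time waiting,
        # e.g. on messages over the 1GbE network.
        return self.cpu_s / self.wall_s

    def atoms_per_proc(self) -> float:
        return ATOMS / self.procs


# ASSUMPTION: the single-server baseline ran 8 processes (one per core).
base = MinimizationRun("1 Sun server", 1, 8, 1469.357788, 589.791321)
cluster = MinimizationRun("6 Sun servers", 6, 48, 1135.021240, 199.707642)

speedup = base.wall_s / cluster.wall_s
efficiency = speedup / (cluster.nodes / base.nodes)  # ideal would be 1.0

for run in (base, cluster):
    print(f"{run.label}: {run.atoms_per_proc():.0f} atoms/process, "
          f"busy {run.busy_fraction():.0%} of wall time")
print(f"speedup {speedup:.2f}x on {cluster.nodes}x the nodes "
      f"-> strong-scaling efficiency {efficiency:.0%}")
```

On these numbers, six times the hardware buys only about a 1.3x speedup (roughly 22% efficiency), which is consistent with communication over 1GbE eating most of the gain.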

Sorry for the digression!

Cheers,
Paul

===STATS BELOW!===

*===Beefy hardware===*
Sun X4150 (1333 MHz FSB)
Gentoo Linux
2x Quad-core Intel Xeon CPU E5450 @ 3.00GHz
16GB DDR2 (667 MHz)
(The cluster nodes are connected via 1GbE through a switch with a 10GbE
backplane... sorry, no InfiniBand interconnect!)

*===Not-so-beefy hardware, but CUDA-equipped===*
Dell Vostro 220 mini
Fedora Core 11
Intel® G45 Express Chipset
Intel Core2 Duo CPU E7400 @ 2.80GHz
4GB DDR2 RAM (800 MHz)
NVIDIA GeForce GTX 260 (192 PE, 896MB GDDR3 RAM)

*===MD (5 million steps; 1 ns duration; already minimized; water box 5 Å,
NVT)===*
Info: SUMMARY OF PARAMETERS:
Info: 307 BONDS
Info: 769 ANGLES
Info: 1254 DIHEDRAL
Info: 81 IMPROPER
Info: 6 CROSSTERM
Info: 190 VDW
Info: 0 VDW_PAIRS
Info: TIME FOR READING PSF FILE: 0.0717082
Info: TIME FOR READING PDB FILE: 0.0138652
Info:
Info: Reading from binary file xxx.restart.coor
Info: ****************************
Info: STRUCTURE SUMMARY:
Info: 8389 ATOMS
Info: 5964 BONDS
Info: 4429 ANGLES
Info: 2913 DIHEDRALS
Info: 176 IMPROPERS
Info: 66 CROSSTERMS
Info: 0 EXCLUSIONS
Info: 7845 RIGID BONDS
Info: 17322 DEGREES OF FREEDOM
Info: 2976 HYDROGEN GROUPS
Info: TOTAL MASS = 51577.3 amu
Info: TOTAL CHARGE = 1.02818e-06 e
Info: MASS DENSITY = 0.914731 g/cm^3
Info: ATOM DENSITY = 0.0895954 atoms/A^3

*===Results of MD===*
Desktop, CUDA-enabled, +p2 ++local
WallClock: 15890.967773 CPUTime: 15880.965820 Memory: 14.010666 MB

Desktop, CPU-only, +p2 ++local
WallClock: 36608.851562 CPUTime: 36044.246094 Memory: 20.907639 MB

Sun X4150 server, CPU-only, +p8 ++local
WallClock: 10322.159180 CPUTime: 9856.663086 Memory: 12.163300 MB
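The Sun-vs-desktop comparison becomes clearer when wall-clock time is set next to CPU core-seconds (wall time x cores). A small sketch, assuming every core was busy for the whole run and deliberately leaving the GTX 260's time out of the core-second count:

```python
# Wall-clock seconds and CPU cores for each MD run listed above.
RUNS = {
    "desktop_cuda": (15890.967773, 2),  # Core2 Duo E7400 + one GTX 260
    "desktop_cpu": (36608.851562, 2),   # Core2 Duo E7400 only
    "sun_cpu": (10322.159180, 8),       # 2x quad-core Xeon E5450
}


def core_seconds(name: str) -> float:
    """Wall time multiplied by CPU cores; GPU time is deliberately not counted."""
    wall, cores = RUNS[name]
    return wall * cores


sun_wall = RUNS["sun_cpu"][0]
cuda_wall = RUNS["desktop_cuda"][0]
time_saved = 1 - sun_wall / cuda_wall
core_ratio = core_seconds("sun_cpu") / core_seconds("desktop_cuda")

print(f"Sun server: {time_saved:.0%} less wall time than the CUDA desktop")
print(f"but {core_ratio:.1f}x the CPU core-seconds")
for name in RUNS:
    print(f"{name:13s} {core_seconds(name):9.0f} core-s")
```

In other words, the server wins on turnaround time while the single GPU does the same job with far fewer CPU core-seconds.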

On Thu, Dec 3, 2009 at 6:29 PM, Biff Forbush <biff.forbush_at_yale.edu> wrote:

> Is there an estimate of how much of the total calculation time is taken by
> the real-space part of the non-bonded forces with the CPU alone? Recognizing
> that this will be machine- and problem-size-dependent, is the answer
> available for benchmark examples?
> regards,
> biff
>
>
> Axel Kohlmeyer wrote:
>
>> On Wed, 2009-12-02 at 18:00 -0800, Paul Rigor (uci) wrote:
>>
>>
>>> Hi,
>>>
>>>
>>> Was wondering if there's a breakdown of the portions of NAMD that
>>> are currently CUDA-accelerated?
>>>
>>>
>>
>> very simple: the calculation of the real space part of the non-bonded
>> forces.
>>
>> cheers,
>> axel.
>>
>>
>>
>>> Thanks!
>>> Paul
>>>
>>> --
>>> Paul Rigor Pre-doctoral BIT Fellow and Graduate Student Institute for
>>> Genomics and Bioinformatics Donald Bren School of Information and Computer
>>> Sciences University of California, Irvine
>>> http://www.ics.uci.edu/~prigor
>>>
>>>
>>>
>>
>

-- 
Paul Rigor
Pre-doctoral BIT Fellow and Graduate Student
Institute for Genomics and Bioinformatics
Donald Bren School of Information and Computer Sciences
University of California, Irvine
http://www.ics.uci.edu/~prigor

This archive was generated by hypermail 2.1.6 : Wed Feb 29 2012 - 05:22:35 CST