Re: Running NAMD on Forge (CUDA)

From: Aron Broom (broomsday_at_gmail.com)
Date: Thu Jul 12 2012 - 13:50:59 CDT

hmmm,

are other people also using those GPUs?

What are the benchmark timings that NAMD reports after ~1000 steps? Using
the total wall-clock time for such a short simulation can be misleading,
since it includes the setup time.

I often run a system of ~100,000 atoms, and I generally see an order of
magnitude improvement in speed compared to the same number of cores without
the GPUs. I would test the non-CUDA precompiled build on your Forge system
and see how it compares; the problem might be something other than CUDA.
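
For a quick apples-to-apples check, that could be as simple as launching the
same input with the non-CUDA ibverbs binary (myrun.conf standing in for your
actual config file):

charmrun +p12 -machinefile $PBS_NODEFILE namd2 +idlepoll myrun.conf

i.e. the same launch line as your CUDA run, minus +devices.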

~Aron

On Thu, Jul 12, 2012 at 2:41 PM, Gianluca Interlandi <
gianluca_at_u.washington.edu> wrote:

> Hi Aron,
>
> Thanks for the explanations. I don't know whether I'm doing everything
> right. I don't see any speed advantage running on the CUDA cluster (Forge)
> versus running on a non-CUDA cluster.
>
> I did the following benchmarks on Forge (the system has 127,000 atoms and
> ran for 1000 steps):
>
> np 1: 506 sec
> np 2: 281 sec
> np 4: 163 sec
> np 6: 136 sec
> np 12: 218 sec
>
> On the other hand, running the same system on 16 cores of Trestles (AMD
> Magny Cours) takes 129 sec. It seems that I'm not really making good use of
> SUs by running on the CUDA cluster. Or, maybe I'm doing something wrong?
> I'm using the ibverbs-smp-CUDA pre-compiled version of NAMD 2.9.
>
> Thanks,
>
> Gianluca
>
>
> On Tue, 10 Jul 2012, Aron Broom wrote:
>
>> if it is truly just one node, you can use the multicore-CUDA version and
>> avoid the MPI charmrun stuff. Still, it boils down to much the same thing
>> I think. If you do what you've done below, you are running one job with
>> 12 CPU cores and all GPUs. If you don't specify the +devices, NAMD will
>> automatically find the available GPUs, so I think the main benefit of
>> specifying them is when you are running more than one job and don't want
>> the jobs sharing GPUs.
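>>
>> On a single node that could be as simple as (with namd2 here being the
>> multicore-CUDA binary and myrun.conf a stand-in for your config file):
>>
>> namd2 +p12 +idlepoll myrun.conf
>>
>> i.e. no charmrun or machinefile needed.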
>>
>> I'm not sure you'll see great scaling across 6 GPUs for a single job,
>> but that would be great if you did.
>>
>> ~Aron
>>
>> On Tue, Jul 10, 2012 at 1:14 PM, Gianluca Interlandi <
>> gianluca_at_u.washington.edu>
>> wrote:
>> Hi,
>>
>> I have a question concerning running NAMD on a CUDA cluster.
>>
>> NCSA Forge has, for example, 6 CUDA devices and 16 CPU cores per node.
>> If I want to use all 6 CUDA devices in a node, how many processes is it
>> recommended to spawn? Do I need to specify "+devices"?
>>
>> So, if for example I want to spawn 12 processes, do I need to specify:
>>
>> charmrun +p12 -machinefile $PBS_NODEFILE +devices 0,1,2,3,4,5 namd2 +idlepoll
>>
>> Thanks,
>>
>> Gianluca
>>
>> -----------------------------------------------------
>> Gianluca Interlandi, PhD gianluca_at_u.washington.edu
>> +1 (206) 685 4435
>> http://artemide.bioeng.washington.edu/
>>
>> Research Scientist at the Department of Bioengineering
>> at the University of Washington, Seattle WA U.S.A.
>> -----------------------------------------------------
>>
>>
>>
>>
>> --
>> Aron Broom M.Sc
>> PhD Student
>> Department of Chemistry
>> University of Waterloo
>>
>>
>>
>>
> -----------------------------------------------------
> Gianluca Interlandi, PhD gianluca_at_u.washington.edu
> +1 (206) 685 4435
> http://artemide.bioeng.washington.edu/
>
> Research Scientist at the Department of Bioengineering
> at the University of Washington, Seattle WA U.S.A.
> -----------------------------------------------------
>

-- 
Aron Broom M.Sc
PhD Student
Department of Chemistry
University of Waterloo
