Re: Running NAMD on Forge (CUDA)

From: Aron Broom (broomsday_at_gmail.com)
Date: Thu Jul 12 2012 - 15:05:59 CDT

What are your simulation parameters:

timestep (and also any multiple-timestepping settings)
cutoff (and also the pairlist distance and PME grid spacing)
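
For reference, these would normally be set in the NAMD configuration file. A
minimal sketch with purely illustrative values (not Gianluca's actual settings):

  timestep            2.0   ;# fs; illustrative only
  nonbondedFreq       1     ;# multiple timestepping: nonbonded every step
  fullElectFrequency  2     ;# full electrostatics every 2 steps
  cutoff              12.0  ;# Angstrom
  switchdist          10.0
  pairlistdist        13.5
  PME                 yes
  PMEGridSpacing      1.0   ;# Angstrom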

Have you tried giving it just 1 or 2 GPUs alone (using the +devices flag)? If
not, that is something I would try. I don't know the configuration of the
motherboard, but if those 6 GPUs share memory bandwidth, then using more of
them will actually tend to slow down your simulation when using NAMD.
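
As a sketch of that test with the ibverbs-smp-CUDA binary (sim.conf is a
hypothetical configuration file name), restricting a run to one or two GPUs
would look roughly like:

  charmrun +p4 -machinefile $PBS_NODEFILE namd2 +idlepoll +devices 0 sim.conf
  charmrun +p4 -machinefile $PBS_NODEFILE namd2 +idlepoll +devices 0,1 sim.conf

Comparing the benchmark times of these runs against the 6-GPU run would show
whether the extra devices are helping at all.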

On Thu, Jul 12, 2012 at 3:53 PM, Gianluca Interlandi <gianluca_at_u.washington.edu> wrote:

> With multicore I get 96 sec per 1000 steps from the benchmark timing. It's
> slightly faster, but still not that much faster than running on 16 CPU cores.
>
> Gianluca
>
>
> On Thu, 12 Jul 2012, Aron Broom wrote:
>
>> have you tried the multicore build? I wonder if the prebuilt smp one is
>> just not working for you.
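
For reference, the multicore-CUDA build mentioned here runs on a single node
without charmrun; a minimal sketch (sim.conf is a hypothetical config name):

  namd2 +p12 +idlepoll +devices 0,1,2,3,4,5 sim.conf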
>>
>> On Thu, Jul 12, 2012 at 3:21 PM, Gianluca Interlandi <gianluca_at_u.washington.edu> wrote:
>> are other people also using those GPUs?
>>
>>
>> I don't think so since I reserved the entire node.
>>
>> What are the benchmark timings that you are given after ~1000 steps?
>>
>>
>> The benchmark time with 6 processes is 101 sec for 1000 steps. This is only
>> slightly faster than Trestles, where I get 109 sec for 1000 steps running on
>> 16 CPUs. So, yes, 6 GPUs on Forge are much faster than 6 cores on Trestles,
>> but in terms of SUs it makes no difference, since on Forge I still have to
>> reserve the entire node (16 cores).
>>
>> Gianluca
>>
>> is some setup time.
>>
>> I often run a system of ~100,000 atoms, and I generally see an order of
>> magnitude improvement in speed compared to the same number of cores without
>> the GPUs. I would test the non-CUDA precompiled code on your Forge system
>> and see how that compares; it might be the fault of something other than CUDA.
>>
>> ~Aron
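
A sketch of that comparison using a non-CUDA multicore build on one Forge node
(binary and config names are illustrative):

  namd2 +p16 sim.conf

If this CPU-only run is also slow relative to Trestles, the bottleneck is
probably not CUDA-specific.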
>>
>> On Thu, Jul 12, 2012 at 2:41 PM, Gianluca Interlandi <gianluca_at_u.washington.edu> wrote:
>> Hi Aron,
>>
>> Thanks for the explanations. I don't know whether I'm doing everything
>> right. I don't see any speed advantage running on the CUDA cluster (Forge)
>> versus running on a non-CUDA cluster.
>>
>> I did the following benchmarks on Forge (the system has 127,000 atoms and
>> ran for 1000 steps):
>>
>> np 1: 506 sec
>> np 2: 281 sec
>> np 4: 163 sec
>> np 6: 136 sec
>> np 12: 218 sec
>>
>> On the other hand, running the same system on 16 cores of Trestles (AMD
>> Magny-Cours) takes 129 sec. It seems that I'm not really making good use of
>> SUs by running on the CUDA cluster. Or maybe I'm doing something wrong? I'm
>> using the ibverbs-smp-CUDA pre-compiled version of NAMD 2.9.
>>
>> Thanks,
>>
>> Gianluca
>>
>> On Tue, 10 Jul 2012, Aron Broom wrote:
>>
>> if it is truly just one node, you can use the multicore-CUDA version and
>> avoid the MPI/charmrun stuff. Still, it boils down to much the same thing, I
>> think. If you do what you've done below, you are running one job with 12 CPU
>> cores and all GPUs. If you don't specify +devices, NAMD will automatically
>> find the available GPUs, so I think the main benefit of specifying them is
>> when you are running more than one job and don't want the jobs sharing GPUs.
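
As an illustration of that point, two independent jobs on the same node could
be pinned to disjoint GPU sets, e.g. with the multicore-CUDA build and
hypothetical config names:

  namd2 +p6 +idlepoll +devices 0,1,2 jobA.conf > jobA.log &
  namd2 +p6 +idlepoll +devices 3,4,5 jobB.conf > jobB.log &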
>>
>> I'm not sure you'll see great scaling across 6 GPUs for a single job, but
>> that would be great if you did.
>>
>> ~Aron
>>
>> On Tue, Jul 10, 2012 at 1:14 PM, Gianluca Interlandi <gianluca_at_u.washington.edu> wrote:
>> Hi,
>>
>> I have a question concerning running NAMD on a CUDA cluster.
>>
>> NCSA Forge has, for example, 6 CUDA devices and 16 CPU cores per node. If I
>> want to use all 6 CUDA devices in a node, how many processes is it
>> recommended to spawn? Do I need to specify "+devices"?
>>
>> So, if for example I want to spawn 12 processes, do I need to specify:
>>
>> charmrun +p12 -machinefile $PBS_NODEFILE +devices 0,1,2,3,4,5 namd2 +idlepoll
>>
>> Thanks,
>>
>> Gianluca
>>
> -----------------------------------------------------
> Gianluca Interlandi, PhD    gianluca_at_u.washington.edu
>                             +1 (206) 685 4435
>                             http://artemide.bioeng.washington.edu/
>
> Research Scientist at the Department of Bioengineering
> at the University of Washington, Seattle WA U.S.A.
> -----------------------------------------------------
>

-- 
Aron Broom M.Sc
PhD Student
Department of Chemistry
University of Waterloo

This archive was generated by hypermail 2.1.6 : Mon Dec 31 2012 - 23:21:46 CST