Re: Running NAMD on Forge (CUDA)

From: Gianluca Interlandi (gianluca_at_u.washington.edu)
Date: Thu Jul 12 2012 - 14:53:44 CDT

With the multicore build I get 96 sec per 1000 steps from the Benchmark time.
That is slightly faster, but still not much faster than running on 16 CPU
cores.
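
For reference, the multicore-CUDA binary is launched directly, without charmrun;
a launch using a full 16-core, 6-GPU Forge node might look roughly like this (a
sketch only; the config file name is a placeholder):

     namd2 +p16 +idlepoll +devices 0,1,2,3,4,5 mysim.namd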

Gianluca

On Thu, 12 Jul 2012, Aron Broom wrote:

> have you tried the multicore build?  I wonder if the prebuilt smp one is just not
> working for you.
>
> On Thu, Jul 12, 2012 at 3:21 PM, Gianluca Interlandi <gianluca_at_u.washington.edu>
> wrote:
>       are other people also using those GPUs?
>
> I don't think so since I reserved the entire node.
>
>       What are the benchmark timings that you are given after ~1000 steps?
>
> The benchmark time with 6 processes is 101 sec for 1000 steps. This is only
> slightly faster than Trestles, where I get 109 sec for 1000 steps running on
> 16 CPUs. So, yes, 6 GPUs on Forge are much faster than 6 cores on Trestles,
> but in terms of SUs it makes no difference, since on Forge I still have to
> reserve the entire node (16 cores).
>
> Gianluca
>
> is some setup time.
>
> I often run a system of ~100,000 atoms, and I generally see an order of
> magnitude improvement in speed compared to the same number of cores without
> the GPUs.  I would test the non-CUDA precompiled code on your Forge system
> and see how that compares; it might be the fault of something other than CUDA.
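>
> For instance, a comparison run with the non-CUDA precompiled build could be
> launched roughly like this (a sketch only; the config file name is a
> placeholder):
>
>     charmrun +p16 -machinefile $PBS_NODEFILE namd2 myconfig.namd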
>
> ~Aron
>
> On Thu, Jul 12, 2012 at 2:41 PM, Gianluca Interlandi
> <gianluca_at_u.washington.edu>
> wrote:
>       Hi Aron,
>
>       Thanks for the explanations. I don't know whether I'm doing everything
>       right. I don't see any speed advantage running on the CUDA cluster
>       (Forge) versus running on a non-CUDA cluster.
>
>       I did the following benchmarks on Forge (the system has 127,000 atoms
>       and ran for 1000 steps):
>
>       np 1:  506 sec
>       np 2:  281 sec
>       np 4:  163 sec
>       np 6:  136 sec
>       np 12: 218 sec
>
>       On the other hand, running the same system on 16 cores of Trestles
>       (AMD Magny Cours) takes 129 sec. It seems that I'm not really making
>       good use of SUs by running on the CUDA cluster. Or maybe I'm doing
>       something wrong? I'm using the ibverbs-smp-CUDA pre-compiled version
>       of NAMD 2.9.
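>
>       For reference, a launch of the ibverbs-smp-CUDA build for the np 6
>       case might look roughly like this (a sketch only; the config file
>       name is a placeholder):
>
>       charmrun +p6 -machinefile $PBS_NODEFILE namd2 +idlepoll +devices 0,1,2,3,4,5 mysim.namd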
>
>       Thanks,
>
>            Gianluca
>
>       On Tue, 10 Jul 2012, Aron Broom wrote:
>
>             if it is truly just one node, you can use the multicore-CUDA
>             version and avoid the MPI charmrun stuff.  Still, it boils down
>             to much the same thing, I think.  If you do what you've done
>             below, you are running one job with 12 CPU cores and all GPUs.
>             If you don't specify +devices, NAMD will automatically find the
>             available GPUs, so I think the main benefit of specifying them
>             is when you are running more than one job and don't want the
>             jobs sharing GPUs.
>
>             I'm not sure you'll see great scaling across 6 GPUs for a
>             single job, but that would be great if you did.
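>
>             To make the +devices point concrete, two independent jobs sharing
>             the same node could be kept off each other's GPUs roughly like
>             this (a sketch only; config file names are placeholders):
>
>             charmrun +p8 -machinefile $PBS_NODEFILE namd2 +idlepoll +devices 0,1,2 jobA.namd
>             charmrun +p8 -machinefile $PBS_NODEFILE namd2 +idlepoll +devices 3,4,5 jobB.namd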
>
>             ~Aron
>
>             On Tue, Jul 10, 2012 at 1:14 PM, Gianluca Interlandi
>             <gianluca_at_u.washington.edu>
>             wrote:
>                   Hi,
>
>                   I have a question concerning running NAMD on a CUDA
>                   cluster.
>
>                   NCSA Forge has, for example, 6 CUDA devices and 16 CPU
>                   cores per node. If I want to use all 6 CUDA devices in a
>                   node, how many processes is it recommended to spawn? Do I
>                   need to specify "+devices"?
>
>                   So, if for example I want to spawn 12 processes, do I need
>                   to specify:
>
>                   charmrun +p12 -machinefile $PBS_NODEFILE +devices 0,1,2,3,4,5 namd2 +idlepoll
>
>                   Thanks,
>
>                        Gianluca
>
> --
> Aron Broom M.Sc
> PhD Student
> Department of Chemistry
> University of Waterloo
>
>
>

-----------------------------------------------------
Gianluca Interlandi, PhD gianluca_at_u.washington.edu
                     +1 (206) 685 4435
                     http://artemide.bioeng.washington.edu/

Research Scientist at the Department of Bioengineering
at the University of Washington, Seattle WA U.S.A.
-----------------------------------------------------
