From: Norman Geist (norman.geist_at_uni-greifswald.de)
Date: Mon Apr 02 2012 - 01:43:07 CDT
Hi,
I have never observed what you describe; the precompiled builds were always
as fast as self-compiled ones. I also got a 6% speedup just by switching to
2.9b1 on my 6 Tesla C2050s, which I attribute to the energy evaluation now
being done on the GPU.
What comes to mind:
1. Something went wrong when compiling. Check the timing of a self-compiled
2.8 build; note that on one node the precompiled 2.9b2 build was faster
than yours.
2. A little more of the work is done on the GPU now; check whether the
settings in your script hurt this (e.g. a low outputEnergies value that
eats up your PCIe bandwidth etc.), as in the sketch below.
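As a minimal sketch for point 2 (outputEnergies, outputTiming and dcdfreq
are real NAMD config parameters; the values here are only illustrative
assumptions, not a recommendation):

   # hypothetical excerpt from a NAMD configuration file
   # with the CUDA builds, energies are only copied back from the GPU on
   # steps where they must be printed, so a very small interval forces
   # frequent device-to-host transfers over PCIe
   outputEnergies   100     # print energies every 100 steps, not every step
   outputTiming     100     # keep timing output equally sparse
   dcdfreq          1000    # trajectory writes are comparatively cheap

Raising outputEnergies from 1 to a few hundred steps costs you nothing but
log granularity and avoids needless device-to-host traffic.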
More off topic: check your overall scaling. Even with 2.8 you only get a
speedup of about 6 instead of 8 going from 4 to 32 cores. That is not great;
it should scale better even with SDR Infiniband.
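(From the numbers quoted below: 0.231 s/step on 4 cores versus 0.039 s/step
on 32 cores gives 0.231 / 0.039 ≈ 5.9x for an 8x increase in cores, i.e.
roughly 74% parallel efficiency.)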
Let us know.
Norman Geist.
> -----Original Message-----
> From: owner-namd-l_at_ks.uiuc.edu [mailto:owner-namd-l_at_ks.uiuc.edu] On
> Behalf Of Nicholas M Glykos
> Sent: Sunday, 1 April 2012 10:41
> To: Thomas Albers
> Cc: namd-l_at_ks.uiuc.edu
> Subject: Re: namd-l: NAMD 2.9b2-cuda does not scale well compared to
> NAMD 2.8
>
> .. which is even more interesting given that when we moved from the
> Linux-x86_64-CUDA executable to 2.9's Linux-x86_64-multicore-CUDA we saw
> speed-ups of the order of ~40%. For a tiny system (7400 atoms) on a
> single node based on AMD FX-8150 + GTX 570 the
> Linux-x86_64-multicore-CUDA executable is 45% faster than the
> 'CVS-2012-02-26 for Linux-x86_64-CUDA'. Hmmm ....
>
> On Sat, 31 Mar 2012, Thomas Albers wrote:
>
> > Hello!
> >
> > We have a cluster consisting of 8 AMD Phenom II X4 machines with GTX
> > 460 video cards, linked by SDR Infiniband, and we find that the CUDA
> > version of NAMD 2.9b2 scales worse than NAMD 2.8 and runs slower. I
> > compiled the program myself since UIUC offers only the
> > Linux-x86_64-ibverbs-smp-CUDA binary for download, not the
> > Linux-x86_64-ibverbs-CUDA version that would be suitable for us.
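> >
> > For reference, this is roughly the build procedure I used (a sketch;
> > the Charm++ version and paths are from memory and may differ on your
> > system):
> >
> >   # build Charm++ with ibverbs support first
> >   cd charm-6.4.0
> >   ./build charm++ net-linux-x86_64 ibverbs --with-production
> >   cd ..
> >   # then configure and build NAMD against that Charm++ arch, with CUDA
> >   ./config Linux-x86_64-g++ --charm-arch net-linux-x86_64-ibverbs --with-cuda
> >   cd Linux-x86_64-g++
> >   make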
> >
> > Some timing results, all with the F1ATPase benchmark:
> > NAMD 2.9b2 Linux-x86_64-ibverbs, 32 cores: 0.160 s/step
> > NAMD 2.8 Linux-x86_64-ibverbs, 32 cores: 0.160 s/step
> > NAMD 2.9b2, compiled w/ gcc 4.5.3, 32 cores: 0.160 s/step
> >
> > (One node:)
> > NAMD 2.9b2 Linux-x86_64-multicore-CUDA, 4 cores: 0.238 s/step
> > NAMD 2.8 Linux-x86_64-ibverbs-CUDA, 4 cores: 0.231 s/step
> > NAMD 2.9b2, compiled w/ gcc 4.5.3, 4 cores: 0.251 s/step
> >
> > NAMD 2.9b2, compiled w/ gcc 4.5.3, 4 cores: 0.251 s/step
> > NAMD 2.9b2, compiled w/ gcc 4.5.3, 8 cores: 0.173 s/step
> > NAMD 2.9b2, compiled w/ gcc 4.5.3, 16 cores: 0.104 s/step
> > NAMD 2.9b2, compiled w/ gcc 4.5.3, 32 cores: 0.065 s/step
> > NAMD 2.8 Linux-x86_64-ibverbs-CUDA, 32 cores: 0.039 s/step
> >
> > What is interesting is that timing results on one node are comparable
> > between versions, and that the non-CUDA version also does not seem to
> > be affected. It's only the CUDA version of NAMD 2.9 that shows this
> > odd scaling behavior. What is going on?
> >
> > Thomas
> >
>
> --
>
>
> Nicholas M. Glykos, Department of Molecular Biology and Genetics,
> Democritus University of Thrace, University Campus, Dragana, 68100
> Alexandroupolis, Greece, Tel/Fax (office) +302551030620, Ext.77620,
> Tel (lab) +302551030615, http://utopia.duth.gr/~glykos/