Re: Benchmarks for GTX 690's and 590's

From: Dr. Eddie (eackad_at_gmail.com)
Date: Wed Nov 28 2012 - 10:35:53 CST

Any architecture. I'm sorry, my question was poorly phrased.

I'm curious whether anyone has achieved around 0.077 days/ns for a system
of about 20,000 particles. If so, what number of cores and what
architecture did you use?
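
The two figures in NAMD's "Info: Benchmark time" lines are linked by the
timestep. The timestep is not stated in this thread, but all four benchmark
lines quoted below are consistent with a 1 fs timestep. Under that
assumption, the 0.077 days/ns target corresponds to roughly 0.0066 s of wall
time per step. Here is a minimal Python sketch of the conversion, checked
against those posted numbers. The function names are illustrative and are
not part of NAMD:

```python
# Sketch: relate NAMD "s/step" and "days/ns" benchmark figures.
# ASSUMPTION (not stated in the thread): a 1 fs timestep, which is what the
# posted numbers are consistent with. Function names are hypothetical.

SECONDS_PER_DAY = 86400.0
FS_PER_NS = 1.0e6  # 1 ns = 1,000,000 fs


def days_per_ns(sec_per_step: float, timestep_fs: float = 1.0) -> float:
    """Wall-clock seconds per MD step -> days of wall time per simulated ns."""
    steps_per_ns = FS_PER_NS / timestep_fs
    return sec_per_step * steps_per_ns / SECONDS_PER_DAY


def sec_per_step(target_days_per_ns: float, timestep_fs: float = 1.0) -> float:
    """Inverse conversion: target days/ns -> required wall seconds per step."""
    steps_per_ns = FS_PER_NS / timestep_fs
    return target_days_per_ns * SECONDS_PER_DAY / steps_per_ns


# (label, s/step, days/ns) copied from the "Info: Benchmark time" lines
BENCHMARKS = [
    ("NAMD 2.9, 2x GTX 690, 20 CPUs", 0.00808809, 0.0936121),
    ("NAMD 2.9, 2x GTX 690, 16 CPUs", 0.00663649, 0.0768112),
    ("NAMD 2.8, GTX 590, 12 CPUs", 0.0314254, 0.363719),
    ("NAMD 2.8, GTX 590 + GTX 580, 12 CPUs", 0.0216837, 0.250969),
]

if __name__ == "__main__":
    best = min(posted for _, _, posted in BENCHMARKS)
    for label, sps, posted in BENCHMARKS:
        computed = days_per_ns(sps)
        # Ratio > 1 means that run needs more days per ns than the fastest one
        print(f"{label}: computed {computed:.6f} days/ns (posted {posted}), "
              f"{posted / best:.2f}x the fastest run")
    # Wall time per step that a CPU-only run would have to reach
    print(f"target for 0.0768112 days/ns: {sec_per_step(0.0768112):.6f} s/step")
```

To apply this to the CPU-only question, one would time a short CPU-only run
of the same system and compare its s/step against that target. If the
simulation actually uses a different timestep (for example 2 fs), pass it as
`timestep_fs`.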

On Wed, Nov 28, 2012 at 10:11 AM, Axel Kohlmeyer <akohlmey_at_gmail.com> wrote:

> On Wed, Nov 28, 2012 at 5:04 PM, Norman Geist
> <norman.geist_at_uni-greifswald.de> wrote:
> > Of which clock rate, which architecture, which memory bandwidth, which
> > interconnect, which configuration...?
> >
> > This is one of the most annoying things in HPC: please don't compare in
> > "cores", as they can differ by hundreds of percent within a few
> > generations. A good example is the new Sandy Bridge Xeon CPUs, which are
> > twice as fast as the earlier generation due to doubled floating-point
> > performance and heavily increased memory bandwidth.
>
> Or about as fast, if you have an application that doesn't require much
> memory bandwidth and doesn't vectorize. In that case, the non-SSE
> floating-point performance of a Sandy Bridge CPU is equivalent to that
> of a Westmere/Nehalem CPU at the same clock. ;-)
>
> axel.
>
> >
> > Why do you need this comparison?
> >
> > To get such a comparison easily, you would need a CPU cluster and just
> > benchmark it. But that would only compare that particular CPU
> > architecture and configuration. To really say something, you would need
> > to convert these results into absolute values such as flop/s that can be
> > compared across any hardware. This requires determining what fraction of
> > the theoretical performance of the GPUs and CPUs NAMD can actually use in
> > each case, which in turn requires a model that describes the
> > software<->hardware dependencies and the losses due to bottlenecks such
> > as memory bandwidth and the network. With that, you would be able to
> > determine the flop/s actually used on a CPU cluster. But you still
> > couldn't compare to the GPUs, as you have to describe the losses there,
> > too, so you extend the model with metrics for PCIe. Once that is done,
> > you are finally able to compare the power of GPUs to any CPU out there,
> > as you can approximate what fraction of its theoretical performance a
> > given hardware configuration will be able to use.
> > It's a lot of work; I've done it already and it took some time.
> >
> > Norman Geist.
> >
> > From: owner-namd-l_at_ks.uiuc.edu [mailto:owner-namd-l_at_ks.uiuc.edu] On
> > Behalf Of Dr. Eddie
> > Sent: Wednesday, 28 November 2012 16:12
> > To: Aron Broom
> > Cc: namd-l_at_ks.uiuc.edu
> > Subject: Re: namd-l: Benchmarks for GTX 690's and 590's
> >
> > Does anyone have a guess as to how many processors (cores) would be
> > needed to get around 0.0768112 days/ns with CPUs alone for a
> > 20,000-particle system?
> >
> > Thanks,
> >
> > Eddie
> >
> > On Sun, Nov 25, 2012 at 8:11 PM, Dr. Eddie <eackad_at_gmail.com> wrote:
> >
> > I think it is the GPUs. The CPUs have the same clock speed. Per GPU, the
> > 600 series has triple the number of CUDA cores of the 580 class, so the
> > scaling seems to be almost linear in the number of CUDA cores.
> >
> > Eddie
> >
> > On Sun, Nov 25, 2012 at 1:25 PM, Aron Broom <broomsday_at_gmail.com> wrote:
> >
> > Neat, that seems like a fair speedup!
> >
> > Do you think most of that is a result of the GPUs, or are there also big
> > differences in the CPUs? I know NAMD 2.9 also gave some minor boosts to
> > GPU performance, but certainly nothing on the order of what you're seeing
> > there.
> >
> > ~Aron
> >
> > On Sun, Nov 25, 2012 at 12:41 PM, Dr. Eddie <eackad_at_gmail.com> wrote:
> >
> > Hi all,
> >
> > If you are interested, I have some benchmarks of NAMD 2.9 on NVIDIA GTX
> > 690s and 590s:
> >
> > The GTX 690s are two video cards per node (thus 4 GPUs per node, as each
> > GTX 690 is a dual-GPU card) on a dual 16-core AMD processor board (thus
> > 32 cores). For 19777 particles I get:
> >
> > Info: Benchmark time: 20 CPUs 0.00808809 s/step 0.0936121 days/ns 223.324 MB memory
> >
> > With 16 CPUs:
> >
> > Info: Benchmark time: 16 CPUs 0.00663649 s/step 0.0768112 days/ns 206.988 MB memory
> >
> > This compares with NAMD 2.8 on a dual 12-core system with a GTX 590:
> >
> > Info: Benchmark time: 12 CPUs 0.0314254 s/step 0.363719 days/ns 17.9045 MB memory
> >
> > And with NAMD 2.8 on a dual 12-core system with a GTX 590 plus a GTX 580
> > (3 GPUs):
> >
> > Info: Benchmark time: 12 CPUs 0.0216837 s/step 0.250969 days/ns 17.9739 MB memory
> >
> > It seems that with 3 or more GPUs, 4 CPUs per GPU is optimal.
> >
> > I had tested NAMD 2.8 with other CPU counts and 12 was always best. I
> > don't know whether it has to do with exceeding the number of cores on a
> > physical processor or something else.
> >
> > Hope this helps!
> >
> > Eddie
> >
> > --
> > Aron Broom M.Sc
> > PhD Student
> > Department of Chemistry
> > University of Waterloo
> >
> >
> > --
> > Eddie
> >
> > --
> > Eddie
>
> --
> Dr. Axel Kohlmeyer akohlmey_at_gmail.com http://goo.gl/1wk0
> International Centre for Theoretical Physics, Trieste. Italy.
>

-- 
Eddie

This archive was generated by hypermail 2.1.6 : Mon Dec 31 2012 - 23:22:19 CST