From: Axel Kohlmeyer (akohlmey_at_gmail.com)
Date: Wed Nov 28 2012 - 10:11:27 CST
On Wed, Nov 28, 2012 at 5:04 PM, Norman Geist
<norman.geist_at_uni-greifswald.de> wrote:
> Of which clock rate, which architecture, which memory bandwidth, which
> interconnect, which configuration …?
>
> This is one of the most annoying things in HPC. Please don’t compare in
> “cores”, as they can differ by hundreds of percent within a few generations. A
> good example is the new Sandy Bridge Xeon CPUs, which are twice as fast as
> the earlier generation due to doubled floating point performance and
> heavily increased memory bandwidth.
or about as fast, if you have an application that doesn't require much
memory bandwidth and doesn't vectorize. the non-SSE floating point
performance of a sandybridge CPU is equivalent to a westmere/nehalem
of the same clock in that case. ;-)
axel.
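
To put rough numbers on the point above, here is a minimal sketch of theoretical peak double-precision throughput per core. The flops-per-cycle figures are assumptions taken from the commonly quoted values for these architectures (128-bit SSE on westmere/nehalem, 256-bit AVX on sandybridge, one add plus one multiply issued per cycle), and the 2.6 GHz clock is only a placeholder, not a figure from this thread.

```python
# Assumed double-precision flops per cycle per core (not measured values).
FLOPS_PER_CYCLE = {
    ("westmere", "scalar"): 2,     # one add + one multiply per cycle
    ("westmere", "simd"): 4,       # 128-bit SSE: 2 adds + 2 multiplies
    ("sandybridge", "scalar"): 2,  # same as westmere without vectorization
    ("sandybridge", "simd"): 8,    # 256-bit AVX: 4 adds + 4 multiplies
}


def peak_gflops(arch, mode, cores, clock_ghz):
    """Theoretical peak in GFLOP/s: flops/cycle x cores x clock (GHz)."""
    return FLOPS_PER_CYCLE[(arch, mode)] * cores * clock_ghz


def sandybridge_speedup(mode, cores=1, clock_ghz=2.6):
    """Ratio of sandybridge to westmere peak at the same clock and core count."""
    new = peak_gflops("sandybridge", mode, cores, clock_ghz)
    old = peak_gflops("westmere", mode, cores, clock_ghz)
    return new / old


for mode in ("simd", "scalar"):
    print(mode, sandybridge_speedup(mode))
```

Under these assumptions the vectorized ratio is 2 and the scalar ratio is 1, which is the "twice as fast" versus "about as fast" distinction made above. Real applications land somewhere in between, depending on how much of the code vectorizes and how memory-bound it is.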
>
>
>
> For what reason do you need this comparison?
>
>
>
> To get such a comparison easily, you would need a CPU cluster and just
> benchmark it. But this would only cover that particular CPU architecture and
> configuration. To really say something, you would need to convert the
> results into absolute values like flop/s that can be compared across any
> hardware. This requires determining what part of the theoretical performance
> of the GPUs and CPUs you can really use in which case with NAMD, which in turn
> requires a model that describes the software<->hardware dependencies and the
> losses due to bottlenecks like memory bandwidth and network. With that you can
> determine the flop/s actually used on a CPU cluster. But you still can’t
> compare to the GPUs, as you have to describe the losses there, too. So extend
> the model with metrics for PCIe. Once that is done as well, you are finally
> able to compare the power of GPUs to any CPU out there, as you can approximate
> which hardware configuration will be able to use what part of its theoretical
> performance. It’s a lot of work; I’ve done it already and it took some time.
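
The model described above is not shown in the thread. As a rough illustration of the general idea only, here is a minimal roofline-style sketch (a standard textbook bound, not Norman's model): attainable throughput is capped either by peak compute or by the slowest data path. The function names and all numbers are hypothetical placeholders.

```python
def attainable_gflops(peak_gflops, data_paths):
    """Roofline-style bound on usable throughput.

    data_paths: iterable of (bandwidth_gb_s, flops_per_byte) tuples, one per
    data path (e.g. memory, PCIe, network). Each path can feed at most
    bandwidth x flops_per_byte GFLOP/s of work.
    """
    bounds = [bw * intensity for bw, intensity in data_paths]
    return min([peak_gflops] + bounds)


def usable_fraction(peak_gflops, data_paths):
    """Share of the theoretical peak that the bound allows."""
    return attainable_gflops(peak_gflops, data_paths) / peak_gflops


# Hypothetical device: 100 GFLOP/s peak, 40 GB/s memory, 8 GB/s PCIe.
# The flops-per-byte values are placeholders, not NAMD measurements.
paths = [(40.0, 0.5), (8.0, 5.0)]
print(attainable_gflops(100.0, paths), usable_fraction(100.0, paths))
```

In this toy case the memory path is the limiting factor, so only a fifth of the peak is usable. A model of the kind described above would need measured bandwidths and per-path data volumes for NAMD in place of these placeholders.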
>
>
>
> Norman Geist.
>
>
>
> From: owner-namd-l_at_ks.uiuc.edu [mailto:owner-namd-l_at_ks.uiuc.edu] On Behalf
> Of Dr. Eddie
> Sent: Wednesday, 28 November 2012 16:12
> To: Aron Broom
> Cc: namd-l_at_ks.uiuc.edu
> Subject: Re: namd-l: Benchmarks for GTX 690's and 590's
>
>
>
> Anyone have a guess as to how many processors (cores) would be needed to get
> around 0.0768112 days/ns with cpu's alone using 20 000 particles?
>
> Thanks,
>
> Eddie
>
>
>
> On Sun, Nov 25, 2012 at 8:11 PM, Dr. Eddie <eackad_at_gmail.com> wrote:
>
> I think it is the gpu's. The cpu's are the same clock speed. The number of
> cuda cores in the 600 series is triple that of the 580 class. So the
> scaling seems to be almost linear with the number of cuda cores.
>
> Eddie
>
>
>
> On Sun, Nov 25, 2012 at 1:25 PM, Aron Broom <broomsday_at_gmail.com> wrote:
>
> neat, that seems like a fair speedup!
>
> Do you think most of that is a result of the GPUs, or are there also big
> differences in the CPUs? I know NAMD 2.9 gave some minor boosts to GPU
> performance also, but certainly nothing on the order that you're seeing
> there.
>
> ~Aron
>
>
>
> On Sun, Nov 25, 2012 at 12:41 PM, Dr. Eddie <eackad_at_gmail.com> wrote:
>
> Hi all,
>
> If you are interested, I have some benchmarks of namd 2.9 on nvidia gtx 690's
> and 590's:
>
>
>
> The gtx 690's are two video cards per node (thus 4 gpu's per node) on a dual
> 16-core AMD processor board (thus 32 cores). For 19777 particles I get:
>
>
>
> Info: Benchmark time: 20 CPUs 0.00808809 s/step 0.0936121 days/ns 223.324 MB
> memory
>
>
>
> 16 cpu's
>
> Info: Benchmark time: 16 CPUs 0.00663649 s/step 0.0768112 days/ns 206.988 MB
> memory
>
>
>
> This compares with namd 2.8 on a dual 12-core system with a gtx 590:
>
> Info: Benchmark time: 12 CPUs 0.0314254 s/step 0.363719 days/ns 17.9045 MB
> memory
>
>
>
> This compares with namd 2.8 on a dual 12-core system with a gtx 590 and a
> gtx 580 (3 gpu's):
>
> Info: Benchmark time: 12 CPUs 0.0216837 s/step 0.250969 days/ns 17.9739 MB
> memory
>
>
>
> It seems that beyond 3 gpu's, 4 cpu's per gpu is the optimum.
>
> I had tested namd 2.8 with other cpu numbers and 12 always was best. I don't
> know if it is something about exceeding the number of cores on a physical
> processor or something else.
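
The benchmark lines quoted above can be cross-checked and compared with a short script. This is a minimal sketch: the s/step values are copied from the quoted output, the 1 fs timestep is an inference from the ratio of days/ns to s/step in those lines (it is not stated in the thread), and the labels are just descriptive names.

```python
SECONDS_PER_DAY = 86400.0


def days_per_ns(s_per_step, timestep_fs=1.0):
    """Convert wall-clock seconds per step into days per simulated nanosecond."""
    steps_per_ns = 1.0e6 / timestep_fs  # 1 ns = 1e6 fs
    return s_per_step * steps_per_ns / SECONDS_PER_DAY


# s/step values from the quoted "Info: Benchmark time" lines.
benchmarks = {
    "2x gtx 690, 20 CPUs, namd 2.9": 0.00808809,
    "2x gtx 690, 16 CPUs, namd 2.9": 0.00663649,
    "gtx 590, 12 CPUs, namd 2.8": 0.0314254,
    "gtx 590 + gtx 580, 12 CPUs, namd 2.8": 0.0216837,
}


def speedup(baseline, candidate):
    """How many times faster the candidate run is than the baseline run."""
    return benchmarks[baseline] / benchmarks[candidate]


best = "2x gtx 690, 16 CPUs, namd 2.9"
for name, s_per_step in benchmarks.items():
    print(f"{name}: {days_per_ns(s_per_step):.6f} days/ns, "
          f"{speedup(name, best):.2f}x slower than the best run")
```

With these inputs the 16-CPU gtx 690 run comes out roughly 4.7 times faster than the single gtx 590 run and roughly 3.3 times faster than the gtx 590 plus gtx 580 run. The runs differ in namd version and host cpu as well as in gpu, so the ratio is not a pure gpu-to-gpu comparison.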
>
>
>
> Hope this helps!
>
> Eddie
>
>
>
> --
> Aron Broom M.Sc
> PhD Student
> Department of Chemistry
> University of Waterloo
>
>
>
>
>
> --
> Eddie
>
>
>
>
>
> --
> Eddie
--
Dr. Axel Kohlmeyer akohlmey_at_gmail.com http://goo.gl/1wk0
International Centre for Theoretical Physics, Trieste. Italy.
This archive was generated by hypermail 2.1.6 : Mon Dec 31 2012 - 23:22:19 CST