Re: Benchmarks for GTX 690's and 590's

From: Norman Geist
Date: Wed Nov 28 2012 - 10:04:50 CST

Of which clock rate, which architecture, which memory bandwidth, which
interconnect, which configuration ...?

This is one of the most annoying things in HPC. Please don't compare in
"cores", as they can differ by hundreds of percent within a few generations. A
good example is the new Sandy Bridge Xeon CPUs, which are twice as fast as
the earlier generation due to doubled floating-point performance and
greatly increased memory bandwidth.


Why do you need this comparison?


To get such a comparison easily, you would need a CPU cluster and could just
benchmark. But this would only compare this particular CPU architecture and
configuration. To really say anything, you would need to convert these
results into absolute values like flop/s that can be compared across any
hardware. This requires determining what part of the theoretical performance
of the GPUs and CPUs you can really use in which case with NAMD. That in turn
requires a model to describe the software<->hardware dependencies and the
losses due to bottlenecks like memory bandwidth and network. Now you would be
able to determine the flop/s really used on a CPU cluster. But you still
can't compare to the GPUs, as you have to describe the losses there, too. So
extend the model with metrics for PCIe. Once that is done as well, you are
finally able to compare the power of GPUs to any CPU out there, as you can
approximate what hardware configuration will be able to use what part of its
theoretical performance. It's a lot of work; I've done it already and it took
some time.
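
As a rough illustration of the bookkeeping described above, here is a minimal
Python sketch. It is not the model referred to in this message: the class and
function names are hypothetical, the flop count per step is assumed to be
known from elsewhere, and all numbers are placeholders rather than
measurements.

```python
from dataclasses import dataclass


@dataclass
class Hardware:
    """One hardware configuration in the (hypothetical) model."""
    name: str
    peak_flops: float       # theoretical peak in flop/s, from the vendor spec
    usable_fraction: float  # share of peak left after bottleneck losses, 0..1


def measured_usable_fraction(flop_per_step, seconds_per_step, peak_flops):
    """Fraction of theoretical peak actually achieved in a benchmark run."""
    achieved_flops = flop_per_step / seconds_per_step
    return achieved_flops / peak_flops


def predicted_seconds_per_step(flop_per_step, hw):
    """Estimate s/step on hardware whose usable fraction has been modelled."""
    return flop_per_step / (hw.peak_flops * hw.usable_fraction)


if __name__ == "__main__":
    # Placeholder inputs only: none of these numbers come from the thread.
    flop_per_step = 1.0e9
    fraction = measured_usable_fraction(flop_per_step, 0.01, peak_flops=1.0e12)
    other = Hardware("hypothetical node", peak_flops=2.0e12,
                     usable_fraction=fraction)
    print(f"usable fraction: {fraction:.3f}")
    print(f"predicted s/step: "
          f"{predicted_seconds_per_step(flop_per_step, other):.5f}")
```

The hard part is the one the sketch leaves out: deriving usable_fraction for
each configuration from memory bandwidth, network and PCIe behaviour instead
of simply copying it from one machine to another.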


Norman Geist.


From: [] On behalf of
Dr. Eddie
Sent: Wednesday, 28 November 2012 16:12
To: Aron Broom
Subject: Re: namd-l: Benchmarks for GTX 690's and 590's


Anyone have a guess as to how many processors (cores) would be needed to get
around 0.0768112 days/ns with cpu's alone using 20 000 particles?




On Sun, Nov 25, 2012 at 8:11 PM, Dr. Eddie wrote:

I think it is the gpu's. The cpu's are the same clock speed. The 600 series
has triple the number of cuda cores of the 580 class. So the scaling seems
to be almost linear with the number of cuda cores.



On Sun, Nov 25, 2012 at 1:25 PM, Aron Broom wrote:

neat, that seems like a fair speedup!

Do you think most of that is a result of the GPUs, or are there also big
differences in the CPUs? I know NAMD 2.9 gave some minor boosts to GPU
performance also, but certainly nothing on the order that you're seeing.



On Sun, Nov 25, 2012 at 12:41 PM, Dr. Eddie wrote:

Hi all,

If you are interested, I have some benchmarks of namd 2.9 on nvidia gtx 690's
and 590's:


The gtx 690's are two video cards per node (thus 4 gpu's per node) on a dual
16-core AMD processor board (thus 32 cores). For 19777 particles I get:


Info: Benchmark time: 20 CPUs 0.00808809 s/step 0.0936121 days/ns 223.324 MB


16 cpu's

Info: Benchmark time: 16 CPUs 0.00663649 s/step 0.0768112 days/ns 206.988 MB


This compares with namd 2.8 on a dual 12-core system with a gtx 590:

Info: Benchmark time: 12 CPUs 0.0314254 s/step 0.363719 days/ns 17.9045 MB


This compares with namd 2.8 on a dual 12-core system with a gtx 590 and a
gtx 580 (3 gpu's):

Info: Benchmark time: 12 CPUs 0.0216837 s/step 0.250969 days/ns 17.9739 MB


It seems that from 3 gpu's upward, 4 cpu's per gpu is the optimum.

With namd 2.8 I had tested other cpu counts, and 12 was always best. I don't
know if it has something to do with exceeding the number of cores on a
physical processor or something else.
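
For anyone who wants to redo the arithmetic, the benchmark lines above can be
parsed and compared with a short Python sketch like the following. The
function names are made up for illustration, and the 1 fs timestep is an
inference from the ratio of days/ns to s/step in the quoted lines, not
something stated in the thread.

```python
import re

# Matches the "Info: Benchmark time" lines exactly as quoted in this thread.
BENCH = re.compile(
    r"Info: Benchmark time: (\d+) CPUs ([\d.]+) s/step ([\d.]+) days/ns ([\d.]+) MB"
)


def parse_benchmark(line):
    """Extract cpu count, s/step, days/ns and memory from one benchmark line."""
    m = BENCH.search(line)
    if m is None:
        raise ValueError(f"not a benchmark line: {line!r}")
    return {
        "cpus": int(m.group(1)),
        "s_per_step": float(m.group(2)),
        "days_per_ns": float(m.group(3)),
        "mem_mb": float(m.group(4)),
    }


def days_per_ns(s_per_step, timestep_fs=1.0):
    """Convert s/step to days/ns; the 1 fs default is an inferred assumption."""
    steps_per_ns = 1.0e6 / timestep_fs
    return s_per_step * steps_per_ns / 86400.0


# Lines copied from the message above.
RUNS = {
    "2x gtx 690, 20 cpus": "Info: Benchmark time: 20 CPUs 0.00808809 s/step 0.0936121 days/ns 223.324 MB",
    "2x gtx 690, 16 cpus": "Info: Benchmark time: 16 CPUs 0.00663649 s/step 0.0768112 days/ns 206.988 MB",
    "gtx 590, 12 cpus": "Info: Benchmark time: 12 CPUs 0.0314254 s/step 0.363719 days/ns 17.9045 MB",
    "gtx 590 + 580, 12 cpus": "Info: Benchmark time: 12 CPUs 0.0216837 s/step 0.250969 days/ns 17.9739 MB",
}

if __name__ == "__main__":
    parsed = {label: parse_benchmark(line) for label, line in RUNS.items()}
    best = parsed["2x gtx 690, 16 cpus"]["days_per_ns"]
    for label, run in parsed.items():
        print(f"{label}: {run['days_per_ns']:.4f} days/ns, "
              f"{run['days_per_ns'] / best:.2f}x the time of the best run")
```

With these numbers the 16-cpu run on the 690's comes out roughly 4.7 times
faster than the single 590 and roughly 3.3 times faster than the 590 plus
580. The runs differ in namd version (2.9 vs 2.8) and in host cpu's, so the
ratios are not a pure gpu comparison.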


Hope this helps!


Aron Broom M.Sc
PhD Student
Department of Chemistry
University of Waterloo
