Re: no. of CPUs for optimal GTX-690 performance

From: Norman Geist (norman.geist_at_uni-greifswald.de)
Date: Wed Nov 07 2012 - 01:56:25 CST

Hi,

Just a few things came to mind:

1. I'm pairing each GPU (Tesla C2050) with one 6-core Xeon E5649 at 2.53 GHz,
which gives good GPU utilization for a sufficiently large system (but with
fullElectFrequency 4).

2. The Sandy Bridge series doubled floating-point performance per core (AVX):
16/8 single/double-precision flops/cycle, versus 8/4 for the previous generation.

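3. Kepler GPUs have roughly doubled peak performance compared to Fermi: three
times as many cores, running at a lower clock (Kepler drops Fermi's separate
shader clock, which ran at twice the core clock).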
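
Points 2 and 3 are peak-throughput arithmetic: peak FLOPS = cores x clock x
FLOPs per core per cycle. The sketch below checks the "doubled" claims under
assumptions that are not stated in the thread: 4 (Westmere/SSE) versus 8
(Sandy Bridge/AVX) double-precision FLOPs/cycle/core, and published reference
shader clocks for a GTX 580 (Fermi, 512 cores at 1544 MHz) and a GTX 680
(Kepler, 1536 cores at 1006 MHz), with one fused multiply-add (2 FLOPs) per
CUDA core per cycle. These are theoretical peaks, not NAMD throughput.

```python
def peak_gflops(cores: int, clock_mhz: float, flops_per_cycle: int) -> float:
    """Theoretical peak in GFLOPS: cores x clock x FLOPs per core per cycle."""
    return cores * clock_mhz * flops_per_cycle / 1000.0


# CPU side (double precision): same 6 cores and 2.53 GHz clock, only FLOPs/cycle differs.
# Assumed: 4 DP FLOPs/cycle for SSE (Westmere), 8 for 256-bit AVX (Sandy Bridge).
westmere = peak_gflops(cores=6, clock_mhz=2530, flops_per_cycle=4)
sandy_bridge = peak_gflops(cores=6, clock_mhz=2530, flops_per_cycle=8)

# GPU side (single precision): one FMA = 2 FLOPs per CUDA core per cycle.
# Assumed reference shader clocks: GTX 580 (Fermi) 1544 MHz, GTX 680 (Kepler) 1006 MHz.
fermi = peak_gflops(cores=512, clock_mhz=1544, flops_per_cycle=2)
kepler = peak_gflops(cores=1536, clock_mhz=1006, flops_per_cycle=2)

cpu_ratio = sandy_bridge / westmere
gpu_ratio = kepler / fermi
print(f"CPU: {westmere:.1f} -> {sandy_bridge:.1f} GFLOPS DP (x{cpu_ratio:.2f})")
print(f"GPU: {fermi:.1f} -> {kepler:.1f} GFLOPS SP (x{gpu_ratio:.2f})")
```

With these reference clocks, the 3x core count is partly offset by Kepler's
lower clock (1006 vs 1544 MHz), giving roughly 1.95x on the GPU side.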

All this means that you would be pairing doubled CPU power with doubled GPU
power, so I would say one 6-core Sandy Bridge per Kepler GPU should give the
same ratio as mine (both doubled). This neglects the Tesla feature of using
PCIe bi-directionally thanks to its two DMA engines (which NAMD does not use).
PCIe 3.0 is surely necessary, as the need for data transfer will increase
heavily.
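
The PCIe 3.0 point can be quantified: raw per-direction bandwidth = transfer
rate per lane x encoding efficiency x lanes. PCIe 2.0 runs at 5 GT/s with
8b/10b encoding and PCIe 3.0 at 8 GT/s with 128b/130b, so an x16 link roughly
doubles. A minimal sketch; it ignores packet and protocol overhead, so real
transfer rates will be lower.

```python
def pcie_gbytes_per_s(gt_per_s: float, payload_bits: int, line_bits: int, lanes: int) -> float:
    """Raw per-direction bandwidth in GB/s after line encoding (protocol overhead ignored)."""
    efficiency = payload_bits / line_bits   # e.g. 8b/10b -> 0.8
    gbits = gt_per_s * efficiency * lanes   # one bit per transfer per lane
    return gbits / 8.0                      # bits -> bytes


gen2_x16 = pcie_gbytes_per_s(5.0, 8, 10, 16)     # PCIe 2.0, 8b/10b
gen3_x16 = pcie_gbytes_per_s(8.0, 128, 130, 16)  # PCIe 3.0, 128b/130b
print(f"PCIe 2.0 x16: {gen2_x16:.2f} GB/s per direction")
print(f"PCIe 3.0 x16: {gen3_x16:.2f} GB/s per direction")
```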

Good luck

Norman Geist.

From: owner-namd-l_at_ks.uiuc.edu [mailto:owner-namd-l_at_ks.uiuc.edu] On
Behalf Of Giacomo Fiorin
Sent: Tuesday, November 6, 2012 21:46
To: mpurdy_at_virginia.edu
Cc: NAMD list
Subject: Re: namd-l: no. of CPUs for optimal GTX-690 performance

One additional complication with Sandy Bridge processors is Turbo Boost. You
got equal speed with 4 cores and with 7 cores, so things were not going so
well. Many people have dealt with this problem when benchmarking, and have
posted different ways to disable it online. (Ajasja: how does your setup
scale without the GPU?)
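
Before benchmarking on Linux, it can help to check whether Turbo Boost is
active at all. A minimal, read-only sketch, assuming the kernel exposes either
the intel_pstate `no_turbo` file or the acpi-cpufreq `boost` file (which one
exists depends on kernel version and cpufreq driver). Actually disabling turbo
needs root (writing these files) or a BIOS setting and is not shown here.

```python
from pathlib import Path
from typing import Optional

# Assumed Linux sysfs locations; which one exists depends on the cpufreq driver.
PSTATE_NO_TURBO = Path("/sys/devices/system/cpu/intel_pstate/no_turbo")  # 1 = turbo off
ACPI_BOOST = Path("/sys/devices/system/cpu/cpufreq/boost")               # 1 = boost on


def turbo_enabled() -> Optional[bool]:
    """Return True/False if the turbo state can be read, None if neither file exists."""
    if PSTATE_NO_TURBO.exists():
        return PSTATE_NO_TURBO.read_text().strip() == "0"
    if ACPI_BOOST.exists():
        return ACPI_BOOST.read_text().strip() == "1"
    return None


print(f"Turbo Boost enabled: {turbo_enabled()}")
```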

In any case, the main problem is most often the limited bandwidth between
CPU and GPU, as Ajasja and Aron already said. The motherboard that you're
planning to use is a good choice; the one you're currently testing on may
not be (which one is it?). Also, without knowing which Opterons you have or
the PCIe bus speed, the comparison you made with the ThinkPad is not
informative.

That said, I don't think it's worth going beyond 1 CPU for every GPU.
First, it will be hard to find suitable motherboards. Second, and most
important, 12-16 CPU cores plus 2 GPUs all exchanging data will probably
already clog up the PCIe bus. I agree with Ajasja that hyperthreading may be
useless, and actually harmful if you're sharing the bandwidth (that would be
24-32 logical cores, again all sharing the same bus).

As for which CPU, I would vote for fewer cores but a higher clock (e.g. Xeon
E5-2640 or E5-2667), if you're planning to use them with a GPU.
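
To make the "fewer cores, higher clock" trade-off concrete, the sketch below
ranks the CPUs named in this thread by per-core base clock and also prints
their aggregate peak double-precision throughput. The base clocks are Intel's
published figures with Turbo Boost ignored, and 8 DP FLOPs/cycle/core for
Sandy Bridge with AVX is an assumption. Which metric matters more when feeding
a GPU is the judgment call made above; the arithmetic does not settle it.

```python
from typing import NamedTuple

DP_FLOPS_PER_CYCLE = 8  # assumed: Sandy Bridge with 256-bit AVX


class Cpu(NamedTuple):
    name: str
    cores: int
    base_ghz: float  # published base clock, Turbo Boost ignored


def peak_gflops(cpu: Cpu) -> float:
    """Aggregate theoretical DP peak: cores x clock x FLOPs/cycle."""
    return cpu.cores * cpu.base_ghz * DP_FLOPS_PER_CYCLE


candidates = [
    Cpu("Xeon E5-2640", 6, 2.5),
    Cpu("Xeon E5-2667", 6, 2.9),
    Cpu("Core i7-3930K", 6, 3.2),
    Cpu("Xeon E5-2650", 8, 2.0),
]

# Rank by per-core clock, the metric that favours "fewer but faster cores".
ranked = sorted(candidates, key=lambda c: c.base_ghz, reverse=True)
for cpu in ranked:
    print(f"{cpu.name:14s} {cpu.cores} cores @ {cpu.base_ghz} GHz, "
          f"peak {peak_gflops(cpu):6.1f} GFLOPS DP")
```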

Giacomo

On Tue, Nov 6, 2012 at 2:01 PM, Michael Purdy <mdp3w_at_virginia.edu> wrote:

Hello, I am running NAMD simulations (multicore-CUDA) on a ThinkPad with
dual Core i7-2760QM CPUs and a Quadro 2000M, running Debian. For a 150k-atom
system I get performance like this:

Benchmark time: 4 CPUs 0.287062 s/step 1.66124 days/ns 387.641 MB memory
Benchmark time: 7 CPUs 0.289229 s/step 1.67378 days/ns 428.574 MB memory
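
The two benchmark lines can be cross-checked with days/ns = (s/step) x
(steps per ns) / 86400. Assuming a 2 fs timestep (500,000 steps per ns; not
stated in the message, but it reproduces the reported days/ns), the sketch
below converts the timings and computes the speedup from 4 to 7 CPUs, which
comes out slightly below 1, i.e. the flat scaling Giacomo points out.

```python
SECONDS_PER_DAY = 86_400
TIMESTEP_FS = 2.0                        # assumed; reproduces the reported days/ns
STEPS_PER_NS = 1_000_000 / TIMESTEP_FS   # 1 ns = 1e6 fs


def days_per_ns(seconds_per_step: float) -> float:
    """Wall-clock days needed to simulate one nanosecond."""
    return seconds_per_step * STEPS_PER_NS / SECONDS_PER_DAY


def speedup_and_efficiency(t_base: float, n_base: int, t_new: float, n_new: int):
    """Speedup of the larger run over the smaller one, and efficiency vs. ideal scaling."""
    speedup = t_base / t_new
    ideal = n_new / n_base
    return speedup, speedup / ideal


# s/step from the two benchmark lines in the message
t4, t7 = 0.287062, 0.289229
print(f"4 CPUs: {days_per_ns(t4):.5f} days/ns")
print(f"7 CPUs: {days_per_ns(t7):.5f} days/ns")
s, eff = speedup_and_efficiency(t4, 4, t7, 7)
print(f"speedup 4->7 CPUs: {s:.3f}, efficiency: {eff:.2f}")
```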

Things are going well, so we purchased a GTX-690, which we installed in a
workstation with two dual-core Opterons. That is evidently far short of the
CPU cores we need to get the most out of the 2 GPUs and 3072 CUDA cores.
Performance was just slightly better than on the ThinkPad:

Benchmark time: 4 CPUs ~0.2 s/step ~1.4 days/ns

We would like to build a new workstation to get the most out of the GTX-690,
and I'd like to know how many CPU cores we need. I'm considering two Core
i7-3930K (6-core/12-thread) or two Xeon E5-2650 (8-core/16-thread). Will
either of these be a good match for the GTX-690, or will I still be running
short on CPUs? The current plan is to build this on an Asus Z9PE-D8 WS board.

Michael

This archive was generated by hypermail 2.1.6 : Tue Dec 31 2013 - 23:22:43 CST