Re: no. of CPUs for optimal GTX-690 performance

From: Ajasja Ljubetič
Date: Wed Nov 07 2012 - 04:34:10 CST

I can't find the benchmarks right now, but if I remember
correctly, hyper-threading was not much use. Under Linux, there was no
speedup when using 8 threads instead of 4 (either with or without the
GPU). Under Windows, there was a gain of about 5-10% when using
hyper-threading.

I think Giacomo's advice is very sound: I'd go for higher clock speeds as

Best regards,

On 6 November 2012 21:46, Giacomo Fiorin wrote:

> One additional thing that complicates things for Sandy Bridge processors
> is the Turbo Boost. You had equal speed between 4 cores and 7 cores, so
> things were not going so well. Many people have dealt with this problem
> for benchmarking purposes, and posted different solutions online to disable
> it. (Ajasja: how are your scalings without GPU?)
>
> In any case, the main problem is most often the limited bandwidth between
> CPU and GPU, like Ajasja and Aron already said. The motherboard that
> you're planning to use is a good choice; the one you're currently running
> tests on may not be: what is it? Also, without knowing which Opterons you
> had or the PCI-e bus speed, the comparison you made with the ThinkPad is
> not informative.
>
> That said, I don't think it's worth going beyond 1 CPU for every GPU.
> First, it will be hard to find suitable motherboards. Second and most
> important, 12-16 CPU cores plus 2 GPUs all exchanging data on the same bus
> will probably already clog up the PCI-e bus. I agree with Ajasja that
> hyper-threading may be useless, and actually harmful if you're sharing the
> bandwidth (that would be 24-32 CPU threads, again all sharing the same bus).
>
> As for which CPU, I would vote for fewer cores but a higher clock (e.g.
> Xeon 2640 or 2667), if you're planning to use them with a GPU.
>
> Giacomo
>
> On Tue, Nov 6, 2012 at 2:01 PM, Michael Purdy wrote:
>> Hello, I am running NAMD simulations (multicore-CUDA) on a ThinkPad with
>> dual Core i7-2760QM CPUs and a Quadro 2000M running Debian. For a 150k atom
>> system I get performance like this:
>>
>> Benchmark time: 4 CPUs 0.287062 s/step 1.66124 days/ns 387.641 MB memory
>> Benchmark time: 7 CPUs 0.289229 s/step 1.67378 days/ns 428.574 MB memory
>>
>> Things are going well, so we purchased a GTX-690, which we installed in a
>> workstation with two dual-core Opterons. That is evidently far short of
>> the CPU cores we need to get the most out of the 2 GPUs and 3072 CUDA
>> cores. Performance was just slightly better than on the ThinkPad:
>>
>> Benchmark time: 4 CPUs ~0.2 s/step ~1.4 days/ns
>>
>> We would like to build a new workstation to get the most out of the
>> GTX-690 and I'd like to know how many CPU cores we need. I'm considering
>> two Core i7-3930k (6-core/12-thread) or two Xeon E5-2650
>> (8-core/16-thread). Will either of these be a good match for the GTX-690, or
>> will I still be running short on CPUs? The current plan is to build
>> this on an Asus Z9PE-D8 WS board.
>>
>> Michael
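
For anyone comparing runs like the ones quoted above, here is a minimal Python sketch that parses "Benchmark time:" lines of the form shown in this thread and reports the speedup relative to the first run. The helper names (parse_benchmarks, days_per_ns, speedup) are made up for illustration, and the 2 fs timestep is an assumption that is not stated anywhere in the thread.

```python
import re

SECONDS_PER_DAY = 86400.0

# Matches lines of the form quoted in this thread, for example:
# "Benchmark time: 4 CPUs 0.287062 s/step 1.66124 days/ns 387.641 MB memory"
BENCH_RE = re.compile(
    r"Benchmark time:\s+(\d+)\s+CPUs\s+([\d.]+)\s+s/step\s+([\d.]+)\s+days/ns"
)


def parse_benchmarks(text):
    """Return a list of (n_cpus, s_per_step, days_per_ns) tuples."""
    return [(int(n), float(s), float(d)) for n, s, d in BENCH_RE.findall(text)]


def days_per_ns(s_per_step, timestep_fs=2.0):
    """Convert s/step to days/ns. The 2 fs default timestep is an assumption."""
    steps_per_ns = 1.0e6 / timestep_fs  # 1 ns = 1e6 fs
    return s_per_step * steps_per_ns / SECONDS_PER_DAY


def speedup(base_s_per_step, s_per_step):
    """Speedup relative to a baseline run (values above 1 mean faster)."""
    return base_s_per_step / s_per_step


# The two ThinkPad lines quoted in the original message.
log = """
Benchmark time: 4 CPUs 0.287062 s/step 1.66124 days/ns 387.641 MB memory
Benchmark time: 7 CPUs 0.289229 s/step 1.67378 days/ns 428.574 MB memory
"""

runs = parse_benchmarks(log)
base_s_per_step = runs[0][1]

for n_cpus, s_step, reported in runs:
    print(
        f"{n_cpus} CPUs: {s_step:.6f} s/step, "
        f"{days_per_ns(s_step):.5f} days/ns computed "
        f"({reported} reported), "
        f"speedup vs first run {speedup(base_s_per_step, s_step):.3f}"
    )
```

With a 2 fs timestep the computed days/ns agree with the reported values, and the speedup for the 7-CPU run comes out slightly below 1, which is the "equal speed between 4 cores and 7 cores" behaviour discussed above.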
