Re: no. of CPUs for optimal GTX-690 performance

From: Aron Broom (broomsday_at_gmail.com)
Date: Wed Nov 07 2012 - 02:16:46 CST

Although I also wonder about planning for the future. That is, in 6 months
or whenever, will the next NAMD release have a binary that is completely
ported to the GPU, apart from the very infrequent file I/O? If so, those nice
CPUs would be largely wasted.

Just a thought. I guess it depends on their cost, but I'd imagine a nice
workstation CPU is ~the cost of another GTX-690.

~Aron

On Wed, Nov 7, 2012 at 2:56 AM, Norman Geist <norman.geist_at_uni-greifswald.de> wrote:

> Hi,
>
> just a few things came to my mind:
>
> 1. I'm using each GPU (Tesla C2050) with one Xeon E5649 6-core
> 2.53GHz, which leads to nice utilization of the GPU with a system that's big
> enough (but with fullElectFrequency 4)
>
> 2. The Sandy Bridge series has doubled floating-point performance,
> with 8/4 single/double precision flops/cycle.
>
> 3. Kepler GPUs have doubled performance due to 3 times the cores
> at half the clock rate compared to Fermi.
>
> All this means that you are going to pair doubled CPU power with doubled
> GPU power. Therefore I would say one 6-core Sandy Bridge per Kepler GPU
> should give the same ratio as mine (both doubled performance), neglecting
> the Tesla feature of being able to use PCIe bidirectionally thanks to its
> two DMA engines (not used by NAMD). PCIe 3 is surely necessary, as the need
> for data transfer will increase heavily.
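
To put the PCIe 3 point above in rough numbers, here is a minimal sketch of the theoretical one-direction bandwidth of a PCIe 2 versus a PCIe 3 link. It assumes an x16 link and counts only line-encoding overhead, so real transfers will achieve less; the table and function names are illustrative and not part of NAMD or any library.

```python
# Per-lane signalling rate (GT/s) and line-encoding efficiency per PCIe generation.
PCIE_GENERATIONS = {
    2: (5.0, 8 / 10),     # PCIe 2.0: 5 GT/s, 8b/10b encoding
    3: (8.0, 128 / 130),  # PCIe 3.0: 8 GT/s, 128b/130b encoding
}


def link_bandwidth_gb_s(generation, lanes=16):
    """Theoretical one-direction bandwidth in GB/s, ignoring protocol overhead.

    The x16 default is an assumption; check the actual slot wiring on the board.
    """
    rate_gt_s, efficiency = PCIE_GENERATIONS[generation]
    return rate_gt_s * efficiency * lanes / 8  # 8 bits per byte


gen2 = link_bandwidth_gb_s(2)
gen3 = link_bandwidth_gb_s(3)
print(f"PCIe 2 x16: {gen2:.2f} GB/s")
print(f"PCIe 3 x16: {gen3:.2f} GB/s")
print(f"ratio: {gen3 / gen2:.2f}")
```

Under these assumptions the newer link offers a little under twice the bandwidth, which is in line with the "doubled CPU, doubled GPU" reasoning above.
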
>
> Good luck
>
> Norman Geist.
>
> *From:* owner-namd-l_at_ks.uiuc.edu [mailto:owner-namd-l_at_ks.uiuc.edu] *On
> Behalf Of *Giacomo Fiorin
> *Sent:* Tuesday, 6 November 2012 21:46
> *To:* mpurdy_at_virginia.edu
> *Cc:* NAMD list
> *Subject:* Re: namd-l: no. of CPUs for optimal GTX-690 performance
>
> One additional thing that complicates matters for Sandy Bridge processors
> is Turbo Boost. You got the same speed with 4 cores and with 7 cores, so
> things were not going so well. Many people have dealt with this problem
> for benchmarking purposes, and posted different solutions online to disable
> it. (Ajasja: how does your scaling look without the GPU?)
>
> In any case, the main problem is most often the limited bandwidth between
> CPU and GPU, as Ajasja and Aron already said. The motherboard that
> you're planning to use is a good choice; the one you're currently running
> tests on may not be: what is it? Also, without knowing which Opterons you
> had or the PCI-e bus speed, the comparison you made with the ThinkPad is
> not informative.
>
> That said, I don't think it's worth going beyond 1 CPU for every GPU.
> First, it will be hard to find suitable motherboards. Second and most
> important, 12-16 CPU cores plus 2 GPUs all exchanging data on the same bus
> will probably already clog up the PCI-e bus. I agree with Ajasja that
> hyperthreading may be useless, and actually harmful if you're sharing the
> bandwidth (that would be 24-32 logical CPU cores, again all sharing the
> same bus).
>
> As for which CPU, I would vote for fewer cores but a higher clock (e.g.
> Xeon 2640 or 2667), if you're planning to use them with a GPU.
>
> Giacomo
>
> On Tue, Nov 6, 2012 at 2:01 PM, Michael Purdy <mdp3w_at_virginia.edu> wrote:
>
> Hello, I am running NAMD simulations (multicore-CUDA) on a ThinkPad with
> dual Core i7-2760QM CPUs and a Quadro 2000M running Debian. For a 150k atom
> system I get performance like this:
>
> Benchmark time: 4 CPUs 0.287062 s/step 1.66124 days/ns 387.641 MB memory
> Benchmark time: 7 CPUs 0.289229 s/step 1.67378 days/ns 428.574 MB memory
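
As a quick check on the two benchmark lines above, here is a minimal sketch that converts NAMD's s/step figure to days/ns and compares the 4-CPU and 7-CPU runs. The 2 fs timestep is an assumption (it is not stated in the post, but it reproduces the quoted days/ns values), and the helper names are illustrative only.

```python
SECONDS_PER_DAY = 86_400
FS_PER_NS = 1_000_000


def days_per_ns(seconds_per_step, timestep_fs=2.0):
    """Convert wall-clock seconds per step to days per simulated nanosecond.

    timestep_fs=2.0 is an assumed value inferred from the quoted figures.
    """
    steps_per_ns = FS_PER_NS / timestep_fs
    return seconds_per_step * steps_per_ns / SECONDS_PER_DAY


def speedup(t_base, t_new):
    """Ratio of step times; values above 1 mean the new run is faster."""
    return t_base / t_new


four = days_per_ns(0.287062)   # 4 CPUs, from the benchmark line
seven = days_per_ns(0.289229)  # 7 CPUs, from the benchmark line
gain = speedup(0.287062, 0.289229)

print(f"4 CPUs: {four:.5f} days/ns")
print(f"7 CPUs: {seven:.5f} days/ns")
print(f"speedup 4 -> 7 CPUs: {gain:.3f}")
```

A speedup below 1 here means the three extra cores bought nothing, which is consistent with the run being limited by the GPU or the bus rather than by CPU count.
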
>
> Things are going well so we purchased a GTX-690 which we installed in a
> workstation with two dual core Opterons, which is evidently far short of
> the CPU cores we need to get the most out of the 2 GPUs and 3072 CUDA cores.
> Performance was just slightly better than the ThinkPad:
>
> Benchmark time: 4 CPUs ~0.2 s/step ~1.4 days/ns
>
> We would like to build a new workstation to get the most out of the
> GTX-690 and I'd like to know how many CPU cores we need. I'm considering
> two Core i7-3930k (6-core/12-thread) or two Xeon E5-2650
> (8-core/16-thread). Will either of these be a good match for the GTX-690 or
> will I still be running short on CPUs? The current plan is to build
> this on an Asus Z9PE-D8 WS board.
>
> Michael

-- 
Aron Broom M.Sc
PhD Student
Department of Chemistry
University of Waterloo

This archive was generated by hypermail 2.1.6 : Mon Dec 31 2012 - 23:22:13 CST