Re: GTX-660 Ti benchmark

From: Michael Galloway (gallowaymd_at_ornl.gov)
Date: Tue Sep 18 2012 - 09:43:02 CDT

interesting discussion. i too have a new single-gpu node similar to
yours, and i'd be interested in the details of benchmarking on this node as
well.

thanks for the interesting thread :-)

--- michael

On 09/18/2012 10:38 AM, Guanglei Cui wrote:
> Hi Aron and Norman,
>
> Thanks for the additional insights. I guess this explains why I saw
> slightly better performance on my Quadro 4000 than on the M2090.
>
> I guess for small-scale operations (as opposed to larger supercomputing
> centers), spending money on two M2090 cards doesn't make much sense.
> One additional question: for two M2090 cards in a single node (12
> cores), what's the best way of using them? In my experience, using
> both simultaneously doesn't seem to improve the namd2.9 (CUDA and
> multicore) performance very much.
>
> Regards,
> Guanglei
>
> On Tue, Sep 18, 2012 at 2:36 AM, Norman Geist
> <norman.geist_at_uni-greifswald.de> wrote:
>> Hello,
>>
>> Just some comments:
>>
>> NVIDIA’s workstation series is called Quadro, so it’s just wrong to call the
>> professional HPC Tesla series workstation cards, and also to confuse them
>> with consumer hardware. The workstation cards are also consumer hardware;
>> the Tesla cards are non-consumer hardware.
>>
>> So:
>>
>> GTX – consumer – gaming
>> Quadro – consumer – workstation
>> Tesla – professional – HPC
>>
>> But I agree with the other points you mentioned. Of course the gaming
>> cards have higher clocks and therefore better performance, as they are
>> meant for gaming and people don’t care about power consumption and heat
>> emission. Also, ECC slows the Tesla a little. But a professional
>> computing centre can’t use these highly clocked gaming cards without heavy
>> cooling, and they lack administration features. Of course, for just a few
>> nodes or a workstation it’s OK to stay with consumer hardware, but in a
>> professional setting they are not the best choice, IMHO.
>>
>> Regards
>>
>> Norman Geist.
>>
>> From: owner-namd-l_at_ks.uiuc.edu [mailto:owner-namd-l_at_ks.uiuc.edu] On Behalf
>> Of Aron Broom
>> Sent: Tuesday, 18 September 2012 03:57
>> To: Guanglei Cui
>> Cc: namd-l_at_ks.uiuc.edu
>> Subject: Re: namd-l: GTX-660 Ti benchmark
>>
>> guanglei,
>>
>> just a quick point to make about cards: keep in mind that the very expensive
>> workstation cards aren't actually any faster than the consumer counterparts.
>> For instance, a GTX580 vs. an M2090, the 580 has the same number of cores
>> and actually faster clock and memory speeds. The M2090 has more memory and
>> that memory has error correcting code, hence the extra bucks. For the
>> Kepler series (I'm not sure the workstation cards are out yet?) the consumer
>> cards will also be faster than the workstation ones at least in terms of
>> single precision, but I think it's supposed to be the reverse for double
>> precision.
>>
>> ~Aron
>>
>> On Mon, Sep 17, 2012 at 4:35 PM, Guanglei Cui <amber.mail.archive_at_gmail.com>
>> wrote:
>>
>> Hi Jason and Thomas,
>>
>> Thanks very much for your input. This is very useful, as I was
>> struggling to gauge my expectations on the GPU workstation we have
>> since I have no comparison. It seems Jason may have a similar hardware
>> setup. The OS installed here is CentOS 5.8. I'm not sure if this
>> matters.
>>
>> Thomas, if your timing was from 1GPU/1CPU, I'd be thoroughly upset
>> 'cause that is almost twice as fast as I could get on a much more
>> expensive card. Would you be able to share additional information on
>> your OS and any configurations that matter?
>>
>> Regards,
>> Guanglei
>>
>>
>> On Sun, Sep 16, 2012 at 6:08 PM, Roberts, Jason <Jason.Roberts_at_mh.org.au>
>> wrote:
>>> Hi Guanglei,
>>>
>>> We are running a 2U rack (2x Xeon E5645, 4xM2090) and although I don't
>>> have the same setup I ran the Apoa1 benchmark allocating 6 cores and 1 M2090
>>> (./namd2 +idlepoll +p6 +devices 0 apoa1.namd > apoa1_6.out). The default
>>> benchmark gave 0.049 s/step. I changed the outputEnergies and outputTiming
>>> values to 1000 and extended the run to 10000 steps and got 0.038 s/step.
>>>
>>> If I run the last simulation with 1 core and 1 GPU (./namd2 +idlepoll +p1
>>> +devices 0 apoa1.namd > apoa1_1.out) I get 0.122 s/step.
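
As a rough aid for comparing the s/step figures quoted in this message, the sketch below converts them to ns/day. It assumes the "timestep 1" mentioned for Apoa1 means 1 fs per step; the function name ns_per_day and the labels are made up for illustration and are not part of NAMD.

```python
SECONDS_PER_DAY = 86_400
FS_PER_NS = 1_000_000  # 1 ns = 10^6 fs


def ns_per_day(seconds_per_step, timestep_fs=1.0):
    """Convert wall-clock seconds per MD step into simulated ns per day.

    timestep_fs is an assumption here: 1 fs, matching "timestep 1" in the thread.
    """
    steps_per_day = SECONDS_PER_DAY / seconds_per_step
    return steps_per_day * timestep_fs / FS_PER_NS


# Timings quoted in the message above (6 cores or 1 core, one M2090).
timings = {
    "6 cores + 1 M2090, default benchmark": 0.049,
    "6 cores + 1 M2090, outputs every 1000 steps": 0.038,
    "1 core + 1 M2090": 0.122,
}

for label, s_per_step in timings.items():
    print(f"{label}: {s_per_step:.3f} s/step = {ns_per_day(s_per_step):.2f} ns/day")
```

With a different timestep (for example 2 fs), pass it as the second argument; the s/step value itself does not change.
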
>>>
>>> Hope this helps.
>>>
>>> PS, if anyone is interested, I ran multiple simultaneous runs with
>>> different combinations of CPU and GPU allocations and obtained the following
>>> results:
>>>
>>> Apoa1 (10,000 steps, timestep = 1, outputs every 1000 steps)
>>> 1 run  (12xThreads, 4xM2090)        = 0.015 s/step
>>> 1 run  (24xThreads, 4xM2090)        = 0.016 s/step
>>> 2 runs (6xThreads, 2xM2090) each    = 0.027 s/step
>>> 2 runs (12xThreads, 4xM2090 shared) = 0.026 s/step
>>> 4 runs (3xThreads, 1xM2090) each    = 0.051 s/step
>>> 4 runs (6xThreads, 4xM2090 shared)  = 0.046 s/step
>>> 8 runs (3xThreads, 4xM2090 shared)  = 0.088 s/step
>>>
>>> (Hyperthreading is ON)
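
One way to read the simultaneous-run timings above is as total node throughput. The sketch below is only that, a sketch: it assumes each s/step figure is the per-run timing (the post says "each" for some rows but not for the "shared" rows), and the function name and labels are illustrative.

```python
# (label, number of concurrent runs, s/step reported per run)
results = [
    ("1 run, 12 threads, 4 M2090", 1, 0.015),
    ("1 run, 24 threads, 4 M2090", 1, 0.016),
    ("2 runs, 6 threads + 2 M2090 each", 2, 0.027),
    ("2 runs, 12 threads, 4 M2090 shared", 2, 0.026),
    ("4 runs, 3 threads + 1 M2090 each", 4, 0.051),
    ("4 runs, 6 threads, 4 M2090 shared", 4, 0.046),
    ("8 runs, 3 threads, 4 M2090 shared", 8, 0.088),
]


def aggregate_steps_per_second(runs, seconds_per_step):
    """Total MD steps per second summed over all concurrent runs.

    Assumes seconds_per_step is the timing of each individual run.
    """
    return runs / seconds_per_step


for label, runs, s_per_step in results:
    total = aggregate_steps_per_second(runs, s_per_step)
    print(f"{label}: {total:.1f} steps/s in total")

# Configuration with the highest total throughput under the assumption above.
best = max(results, key=lambda r: aggregate_steps_per_second(r[1], r[2]))
print("highest aggregate throughput:", best[0])
```

Under that reading, several independent runs deliver more total steps per second than a single run spread over all four cards, although each individual run finishes more slowly.
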
>>>
>>> Cheers,
>>>
>>> Jason A. Roberts
>>> Senior Medical Scientist
>>> National Enterovirus Reference Laboratory
>>> WHO Poliomyelitis Regional Reference Laboratory
>>> VIDRL, 10 Wreckyn Street,
>>> North Melbourne, Australia, 3051
>>> Phone: +613 9342 2607
>>> Fax: +613 9342 2665
>>> email: polio_at_mh.org.au (lab enquiries)
>>> web site: www.vidrl.org.au
>>>
>>> Date: Fri, 14 Sep 2012 09:50:41 -0400
>>> From: Guanglei Cui <amber.mail.archive_at_gmail.com>
>>> Subject: Re: namd-l: GTX-660 Ti benchmark
>>>
>>> Hi,
>>>
>>> I'm curious what kind of performance I should expect from an M2090 card
>>> (Intel Xeon X5670, CentOS 5.8). With 1 CPU and 1GPU, I get 0.11 s/step on
>>> Apoa1 (2000 steps, timestep 1) using the namd2.9 multicore CUDA binary from
>>> the NAMD website. I suspect this is a reasonable speed. I wonder if someone
>>> would kindly point out what a reasonable expectation is for this type of
>>> setup, and how to achieve that. Thanks very much.
>>>
>>> Guanglei
>>>
>>> On Thu, Sep 13, 2012 at 11:10 PM, Wenyu Zhong <wenyuzhong_at_gmail.com>
>>> wrote:
>>>> Sorry, a correction.
>>>>
>>>> The power consumption with i5_at_3.7G+660ti running apoa1 is about 200w,
>>>> and with i5_at_3.7G+2*460 is about 260w.
>>>>
>>>> Wenyu
>>>
>>>
>>> --
>>> Guanglei Cui
>>>
>>>
>>
>> --
>> Guanglei Cui
>>
>>
>>
>>
>> --
>> Aron Broom M.Sc
>> PhD Student
>> Department of Chemistry
>> University of Waterloo
>
>
