Re: GTX-660 Ti benchmark

From: Norman Geist (norman.geist_at_uni-greifswald.de)
Date: Tue Sep 18 2012 - 01:36:06 CDT

Next message: Norman Geist: "AW: periodicity error"
Previous message: Andrey: "Re: psfgen and CHARMM19 explicit exclusions"
In reply to: Aron Broom: "Re: GTX-660 Ti benchmark"
Next in thread: Guanglei Cui: "Re: GTX-660 Ti benchmark"

Hello,

Just some comments:

Nvidia's workstation series is called Quadro, so it is wrong to call the
professional HPC Tesla series workstation cards, and also wrong to confuse
them with consumer hardware. The workstation cards are also consumer
hardware; the Tesla cards are non-consumer hardware.

So:

GTX - consumer - gaming

Quadro - consumer - workstation

Tesla - professional - HPC

But I agree with the other points you mentioned. Of course the gaming
cards have higher clocks and therefore better performance, as they are
meant for gaming, where people don't care about power consumption and heat
emission. ECC also slows the Tesla down a little. But a professional
computing centre can't use these overclocked gaming cards without heavy
cooling, and they lack administration features. Of course, for just a few
nodes or a workstation it's OK to stay with consumer hardware, but in a
professional setting they are not the best choice, IMHO.

Regards

Norman Geist.

From: owner-namd-l_at_ks.uiuc.edu [mailto:owner-namd-l_at_ks.uiuc.edu] On Behalf
Of Aron Broom
Sent: Tuesday, September 18, 2012 03:57
To: Guanglei Cui
Cc: namd-l_at_ks.uiuc.edu
Subject: Re: namd-l: GTX-660 Ti benchmark

Guanglei,

Just a quick point to make about cards: keep in mind that the very expensive
workstation cards aren't actually any faster than their consumer counterparts.
For instance, comparing a GTX580 with an M2090, the 580 has the same number of
cores and actually faster clock and memory speeds. The M2090 has more memory,
and that memory has error-correcting code, hence the extra bucks. For the
Kepler series (I'm not sure the workstation cards are out yet?) the consumer
cards will also be faster than the workstation ones, at least in terms of
single precision, but I think it's supposed to be the reverse for double
precision.

~Aron

On Mon, Sep 17, 2012 at 4:35 PM, Guanglei Cui <amber.mail.archive_at_gmail.com>
wrote:

Hi Jason and Thomas,

Thanks very much for your input. This is very useful, as I was
struggling to gauge my expectations on the GPU workstation we have
since I have no comparison. It seems Jason may have a similar hardware
setup. The OS installed here is CentOS 5.8. I'm not sure if this
matters.

Thomas, if your timing was from 1GPU/1CPU, I'd be thoroughly upset
'cause that is almost twice as fast as I could get on a much more
expensive card. Would you be able to share additional information on
your OS and any configurations that matter?

Regards,
Guanglei

On Sun, Sep 16, 2012 at 6:08 PM, Roberts, Jason <Jason.Roberts_at_mh.org.au>
wrote:
> Hi Guanglei,
>
> We are running a 2U rack (2x Xeon E5645, 4xM2090) and although I don't
> have the same setup I ran the Apoa1 benchmark allocating 6 cores and 1 M2090
> (./namd2 +idlepoll +p6 +devices 0 apoa1.namd > apoa1_6.out). The default
> benchmark gave 0.049 s/step. I changed the outputEnergies and outputTiming
> values to 1000 and extended the run to 10000 steps and got 0.038 s/step.
>
> If I run the last simulation with 1 core and 1 GPU (./namd2 +idlepoll +p1
> +devices 0 apoa1.namd > apoa1_1.out) I get 0.122 s/step.
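
To make the comparison in the message above concrete, here is a small Python sketch that turns the three quoted single-M2090 timings into speedup ratios. The helper name `speedup` and the dictionary labels are made up for illustration; only the s/step values come from the message. Note that the second ratio mixes two changes (less frequent output and a longer run), so it should not be read as the effect of either one alone.

```python
# ApoA1 timings quoted above for one M2090, in seconds per step.
timings = {
    "6 cores, default benchmark": 0.049,
    "6 cores, outputs every 1000 steps, 10000 steps": 0.038,
    "1 core, outputs every 1000 steps, 10000 steps": 0.122,
}


def speedup(slow, fast):
    """Return how many times faster `fast` is than `slow` (both in s/step)."""
    return slow / fast


# Effect of giving the same GPU 6 CPU cores instead of 1.
cores_gain = speedup(
    timings["1 core, outputs every 1000 steps, 10000 steps"],
    timings["6 cores, outputs every 1000 steps, 10000 steps"],
)

# Combined effect of less frequent output and a longer run on 6 cores.
output_gain = speedup(
    timings["6 cores, default benchmark"],
    timings["6 cores, outputs every 1000 steps, 10000 steps"],
)

print(f"1 -> 6 cores on one M2090: {cores_gain:.2f}x")
print(f"default -> modified benchmark: {output_gain:.2f}x")
```

On these numbers, going from 1 to 6 cores is roughly a 3.2x improvement, which fits the 0.11 s/step reported elsewhere in this thread for a 1 CPU / 1 GPU run.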
>
> Hope this helps.
>
> PS, if anyone is interested, I ran multiple simultaneous runs with
> different combinations of CPU and GPU allocations and obtained the following
> results:
>
> Apoa1 (10,000 steps, timestep = 1, outputs every 1000 steps)
> 1 run (12xThreads 4xM2090) = 0.015 s/step
> 1 run (24xThreads 4xM2090) = 0.016 s/step
> 2 runs (6xThreads, 2xM2090) each = 0.027 s/step
> 2 runs (12xThreads, 4xM2090 shared) = 0.026 s/step
> 4 runs (3xThreads, 1xM2090) each = 0.051 s/step
> 4 runs (6xThreads, 4xM2090 shared) = 0.046 s/step
> 8 runs (3xThreads, 4xM2090 shared) = 0.088 s/step
>
> (Hyperthreading is ON)
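
One way to read the table above is in terms of aggregate throughput across all simultaneous runs. The Python sketch below does that conversion under two assumptions that are not stated in the message: that "timestep = 1" means 1 fs (NAMD's timestep unit), and that each s/step figure is the per-run timing. The function name `ns_per_day` and the labels are illustrative only.

```python
SECONDS_PER_DAY = 86400
TIMESTEP_FS = 1.0  # assumption: "timestep = 1" means 1 fs


def ns_per_day(s_per_step, timestep_fs=TIMESTEP_FS):
    """Convert seconds per step into simulated nanoseconds per day."""
    steps_per_day = SECONDS_PER_DAY / s_per_step
    return steps_per_day * timestep_fs * 1e-6  # 1 fs = 1e-6 ns


# (simultaneous runs, allocation per run, s/step) as listed in the table.
# Assumption: each s/step value is the timing of a single run.
results = [
    (1, "12 threads, 4x M2090", 0.015),
    (1, "24 threads, 4x M2090", 0.016),
    (2, "6 threads, 2x M2090 each", 0.027),
    (2, "12 threads, 4x M2090 shared", 0.026),
    (4, "3 threads, 1x M2090 each", 0.051),
    (4, "6 threads, 4x M2090 shared", 0.046),
    (8, "3 threads, 4x M2090 shared", 0.088),
]

aggregate = []
for runs, label, s_per_step in results:
    total = runs * ns_per_day(s_per_step)
    aggregate.append((total, runs, label))
    print(f"{runs} run(s), {label}: {total:.2f} ns/day in total")

best_total, best_runs, best_label = max(aggregate)
print(f"highest aggregate: {best_runs} run(s), {best_label}")
```

If those assumptions hold, the single fastest run (0.015 s/step) corresponds to about 5.8 ns/day, while eight simultaneous runs sharing the four cards add up to about 7.9 ns/day, so packing several runs onto the node gives more total sampling at the cost of slower individual trajectories.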
>
> Cheers,
>
> Jason A. Roberts
> Senior Medical Scientist
> National Enterovirus Reference Laboratory
> WHO Poliomyelitis Regional Reference Laboratory
> VIDRL, 10 Wreckyn Street,
> North Melbourne, Australia, 3051
> Phone: +613 9342 2607
> Fax: +613 9342 2665
> email: polio_at_mh.org.au (lab enquiries)
> web site: www.vidrl.org.au
>
> Date: Fri, 14 Sep 2012 09:50:41 -0400
> From: Guanglei Cui <amber.mail.archive_at_gmail.com>
> Subject: Re: namd-l: GTX-660 Ti benchmark
>
> Hi,
>
> I'm curious what kind of performance I should expect from a M2090 card
> (Intel Xeon X5670, CentOS 5.8). With 1 CPU and 1GPU, I get 0.11 s/step on
> Apoa1 (2000 steps, timestep 1) using the namd2.9 multicore CUDA binary from
> the NAMD website. I suspect this is a reasonable speed. I wonder if someone
> would kindly point out what a reasonable expectation is for this type of
> setup, and how to achieve that. Thanks very much.
>
> Guanglei
>
> On Thu, Sep 13, 2012 at 11:10 PM, Wenyu Zhong <wenyuzhong_at_gmail.com>
> wrote:
>> Sorry, a correction.
>>
>> The power consumption with i5@3.7G+660ti running apoa1 is about 200 W,
>> and with i5@3.7G+2*460 is about 260 W.
>>
>> Wenyu
>
>
>
> --
> Guanglei Cui
>
>

--
Guanglei Cui
-- 
Aron Broom M.Sc
PhD Student
Department of Chemistry
University of Waterloo


This archive was generated by hypermail 2.1.6 : Mon Dec 31 2012 - 23:22:05 CST