From: Guanglei Cui (amber.mail.archive_at_gmail.com)
Date: Mon Sep 17 2012 - 15:35:34 CDT
Hi Jason and Thomas,
Thanks very much for your input. This is very useful, as I was
struggling to gauge my expectations on the GPU workstation we have
since I have no comparison. It seems Jason may have a similar hardware
setup. The OS installed here is CentOS 5.8. I'm not sure if this
matters.
Thomas, if your timing was from 1 GPU/1 CPU, I'd be thoroughly upset
because that is almost twice as fast as I could get on a much more
expensive card. Would you be able to share additional information on
your OS and any configuration settings that matter?
Regards,
Guanglei
On Sun, Sep 16, 2012 at 6:08 PM, Roberts, Jason <Jason.Roberts_at_mh.org.au> wrote:
> Hi Guanglei,
>
> We are running a 2U rack (2x Xeon E5645, 4x M2090). Although I don't have the same setup, I ran the Apoa1 benchmark allocating 6 cores and 1 M2090 (./namd2 +idlepoll +p6 +devices 0 apoa1.namd > apoa1_6.out). The default benchmark gave 0.049 s/step. After changing the outputEnergies and outputTiming values to 1000 and extending the run to 10000 steps, I got 0.038 s/step.
>
> If I run the last simulation with 1 core and 1 GPU (./namd2 +idlepoll +p1 +devices 0 apoa1.namd > apoa1_1.out) I get 0.122 s/step.
>
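To put the three timings above on a common scale, here is a minimal Python sketch (not part of the original message) that converts s/step to ns/day and computes the speedup relative to the 1-core run. It assumes a 1 fs timestep, as stated elsewhere in this thread for the Apoa1 runs; the function names and labels are illustrative only.

```python
SECONDS_PER_DAY = 86_400.0


def ns_per_day(seconds_per_step, timestep_fs=1.0):
    """Convert a wall-clock cost per step into simulated ns per day.

    timestep_fs is an assumption (1 fs); adjust it to match the config used.
    """
    steps_per_day = SECONDS_PER_DAY / seconds_per_step
    return steps_per_day * timestep_fs * 1e-6  # 1 ns = 1e6 fs


def speedup(baseline_s_per_step, other_s_per_step):
    """How many times faster `other` is than `baseline`."""
    return baseline_s_per_step / other_s_per_step


# Timings quoted in the message above (s/step), all on 1x M2090.
timings = {
    "6 cores, default benchmark": 0.049,
    "6 cores, 10000 steps, output every 1000": 0.038,
    "1 core, 10000 steps, output every 1000": 0.122,
}

baseline = timings["1 core, 10000 steps, output every 1000"]
for label, s_per_step in timings.items():
    print(f"{label}: {ns_per_day(s_per_step):.2f} ns/day, "
          f"{speedup(baseline, s_per_step):.2f}x vs 1 core")
```

The gap between the two 6-core figures shows why output frequency and run length are worth fixing before comparing numbers across machines.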
> Hope this helps.
>
> PS, if anyone is interested, I ran multiple simultaneous runs with different combinations of CPU and GPU allocations and obtained the following results:
>
> Apoa1 (10,000 steps, timestep = 1, output every 1000 steps)
> 1 run  (12x Threads, 4x M2090)        = 0.015 s/step
> 1 run  (24x Threads, 4x M2090)        = 0.016 s/step
> 2 runs (6x Threads, 2x M2090) each    = 0.027 s/step
> 2 runs (12x Threads, 4x M2090 shared) = 0.026 s/step
> 4 runs (3x Threads, 1x M2090) each    = 0.051 s/step
> 4 runs (6x Threads, 4x M2090 shared)  = 0.046 s/step
> 8 runs (3x Threads, 4x M2090 shared)  = 0.088 s/step
>
> (Hyperthreading is ON)
>
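One way to compare the configurations in the table above is total throughput across all simultaneous runs. The sketch below (not part of the original message) assumes each listed s/step figure is the per-run cost, so that n concurrent runs together complete n / (s/step) steps per second; that reading of the table is an assumption, and the names are illustrative.

```python
# (label, number of simultaneous runs, s/step per run) from the table above.
runs = [
    ("1 run, 12 threads, 4x M2090", 1, 0.015),
    ("1 run, 24 threads, 4x M2090", 1, 0.016),
    ("2 runs, 6 threads + 2x M2090 each", 2, 0.027),
    ("2 runs, 12 threads, 4x M2090 shared", 2, 0.026),
    ("4 runs, 3 threads + 1x M2090 each", 4, 0.051),
    ("4 runs, 6 threads, 4x M2090 shared", 4, 0.046),
    ("8 runs, 3 threads, 4x M2090 shared", 8, 0.088),
]


def aggregate_steps_per_second(n_runs, seconds_per_step):
    """Total steps/s over all concurrent runs (assumes per-run timings)."""
    return n_runs / seconds_per_step


# Configuration with the highest combined throughput under this assumption.
best = max(runs, key=lambda r: aggregate_steps_per_second(r[1], r[2]))

for label, n_runs, s_per_step in runs:
    total = aggregate_steps_per_second(n_runs, s_per_step)
    print(f"{label}: {total:.1f} steps/s in total")
print("Highest aggregate throughput:", best[0])
```

Under that assumption, a single large run minimizes time to solution for one job, while several smaller concurrent runs can complete more total steps on the same node.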
> Cheers,
>
> Jason A. Roberts
> Senior Medical Scientist
> National Enterovirus Reference Laboratory
> WHO Poliomyelitis Regional Reference Laboratory
> VIDRL, 10 Wreckyn Street,
> North Melbourne, Australia, 3051
> Phone: +613 9342 2607
> Fax: +613 9342 2665
> email: polio_at_mh.org.au (lab enquiries)
> web site: www.vidrl.org.au
>
> Date: Fri, 14 Sep 2012 09:50:41 -0400
> From: Guanglei Cui <amber.mail.archive_at_gmail.com>
> Subject: Re: namd-l: GTX-660 Ti benchmark
>
> Hi,
>
> I'm curious what kind of performance I should expect from an M2090 card (Intel Xeon X5670, CentOS 5.8). With 1 CPU and 1 GPU, I get 0.11 s/step on Apoa1 (2000 steps, timestep 1) using the NAMD 2.9 multicore CUDA binary from the NAMD website. I suspect this is a reasonable speed. I wonder if someone would kindly point out what a reasonable expectation is for this type of setup, and how to achieve that. Thanks very much.
>
> Guanglei
>
> On Thu, Sep 13, 2012 at 11:10 PM, Wenyu Zhong <wenyuzhong_at_gmail.com> wrote:
>> Sorry, a correction.
>>
>> The power consumption with i5 @ 3.7 GHz + 660 Ti running apoa1 is about 200 W,
>> and with i5 @ 3.7 GHz + 2x 460 is about 260 W.
>>
>> Wenyu
This archive was generated by hypermail 2.1.6 : Mon Dec 31 2012 - 23:22:05 CST