Fwd: Three GPU cards on shared-mem motherboard

From: Francesco Pietra (chiendarret_at_gmail.com)
Date: Wed May 30 2012 - 12:33:47 CDT

Forgot the list previously. Sorry
f.

---------- Forwarded message ----------
From: Francesco Pietra <chiendarret_at_gmail.com>
Date: Wed, May 30, 2012 at 7:28 PM
Subject: Re: namd-l: Three GPU cards on shared-mem motherboard
To: Vincent Leroux <vincent.leroux_at_loria.fr>

On Wed, May 30, 2012 at 6:14 PM, Vincent Leroux <vincent.leroux_at_loria.fr> wrote:
> Hi,
>
> A GTX690 basically is two GTX680 chips on a single board. No surprise, this
> is twice as expensive. You may have a hard time finding one on the market.
> In addition, I am not sure you can put more than two on a single
> motherboard, this may be technically impossible. And if you have two you
> have to make sure the motherboard design leaves enough space between the two
> cards, if they are too close the top one will probably die very quickly.

Actually, the GA-890FXA-UD5 I have puts my two GTX-580 cards close
to one another. However, the Antec TwelveHundred case is very
efficient (have a look at the design) and the two cards never go above
85 degrees centigrade. I regularly check the cards with cuda-memtest
and no problems have come up. The boost from the second 580 was
tremendous. I can't give figures, being short of time.
>
> The Quadros or the Teslas are indeed optimized for double precision and
> failsafe operation,

As I wrote, I have no interest in double precision.

> but they are very expensive, and AFAIK they are based
> off the previous generations of nVidia GPUs, so even if the GTX680/690
> performance suffers from not being double precision-optimized it may still
> be more efficient at the present time for MD simulations.
>
> But in any case, are you sure your problem will be PCI bandwidth rather than
> CPU? While I agree that generally AMD CPUs offer a better performance/value
> ratio, I am unsure a single 6-core CPU will be enough, even if the GPU does
> most of the job...

It has often been competently posted here that two CPU cores per GPU
card are enough. Do two GTX-580 make four cards? If so you may be right.
And I can't demonstrate that I am getting all that the GTX-580 could
do.

> In addition, you will probably need to have very large
> systems

Had I small systems, I would be all multi-CPU.

> so that the simulation will scale well across two or more GTX680
> units. I would suggest you build a system with a single GTX680 card.
I have two GTX-580. Should I go to a single GTX-680? I am not mad (as yet).
Cheers
francesco

> If
> performance is good enough (and please post results on the mailing list, I
> too would be interested) and you have more money, you may want to build the
> same system again rather than adding another GTX680...
>
> Regards
> VL
>
>
>
> On 30/05/2012 17:33, Francesco Pietra wrote:
>>
>> Putting together what you (very interestingly) said, could you suggest
>> consumer motherboards that support PCI-E 3.0 and the socket you
>> suggest? I know NAMD, I have much work begun with that code, and which
>> awaits development. Don't need higher than single precision, as it
>> will be a long way before ab-initio multireference code will be made
>> available on GPUs (also, because of the increasing prices for
>> electricity in the country where I live, I have set aside all
>> multi-CPU servers with large power sources), and I have no plans for
>> DFT-MD, which would not fit the multireference species of my interest
>> (such as, simply, triplet oxygen). Thus, GTX-690 (not so much GTX-680,
>> as far as I can see from gaming benchmarks, for what they can tell) is
>> alluring. I tried in the past GROMACS (for which amd64 Debian on my
>> computers provides packages) but I found it difficult to go on,
>> particularly as to the parameterization of new, unusual molecules.
>>
>> Thanks
>> francesco pietra
>>
>> On Wed, May 30, 2012 at 4:14 PM, Aron Broom<broomsday_at_gmail.com>  wrote:
>>>
>>> I haven't seen GTX600 series benchmarks for NAMD, those would be very nice.
>>> These new consumer 600 series cards have substantially less double precision
>>> computational power than the 500 series, but a lot more single precision.
>>> From my understanding, NAMD and OpenMM/GROMACS do all single precision on
>>> the GPU, so you might see a tremendous speedup on 600 series, but that is
>>> quite speculative.
>>>
>>> OpenCL is ~equivalent, or even faster than CUDA for OpenMM/GROMACS.  OpenCL
>>> used to be much slower, but there have been a lot of improvements.  That
>>> being said, I don't know that Radeon cards are necessarily faster than
>>> nVidia cards for OpenMM/GROMACS.
>>>
>>> Two points I would make, maybe they aren't relevant to your particular
>>> situation, but in case others find this thread:
>>>
>>> 1) If you are going to get the 600 series GTX cards, and you want to use
>>> NAMD, you should really get a motherboard that supports PCI-E 3.0 rather
>>> than 2.0.  This is because the new GTX680 actually has only 256-bit
>>> bandwidth compared to 384-bit for the GTX580, but PCI-E 3.0 allows double
>>> the transfer rate compared with 2.0, so you actually come out ahead.  For
>>> AMBER or OpenMM/GROMACS I'm not sure this is that critical, but for NAMD,
>>> because the tasks are split between the CPU and GPU, you need communication
>>> every step, and so that bandwidth is likely the limiting performance factor.
>>>
>>> 2) If you are making a machine for doing MD, and trying to save money, I
>>> would never buy intel CPUs, you generally pay for a lot of features (like
>>> hyperthreading) that you will not be using.  I would instead see if you can
>>> get a hold of any of the older AMD Thuban 6-core chips, they used to sell
>>> the 3.2GHz one for ~$150-200.  I think the newer Bulldozer AMD chips aren't
>>> doing that well, so I'm not sure I would recommend those.
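
The PCI-E 2.0 versus 3.0 comparison in point 1 can be put into numbers with a rough sketch. It assumes the published signalling rates and line encodings (5 GT/s with 8b/10b for 2.0, 8 GT/s with 128b/130b for 3.0) and gives theoretical one-direction figures only; real transfers will be slower, and this says nothing about how NAMD actually uses the link. The function and table names are made up for illustration. Note also that the 256-bit and 384-bit figures are, as far as I know, the cards' on-board memory bus widths, which are separate from the PCI-E link.

```python
# Theoretical one-direction PCIe bandwidth, ignoring protocol overhead.
# Rates and encodings are the published per-lane figures for each generation;
# the names below are illustrative, not part of any library.
PCIE_GENERATIONS = {
    "2.0": {"gigatransfers_per_s": 5.0, "payload_bits": 8, "line_bits": 10},
    "3.0": {"gigatransfers_per_s": 8.0, "payload_bits": 128, "line_bits": 130},
}


def pcie_bandwidth_gb_per_s(generation: str, lanes: int) -> float:
    """Return the theoretical bandwidth in GB/s for one direction."""
    gen = PCIE_GENERATIONS[generation]
    # Fraction of transferred bits that carry payload after line encoding.
    efficiency = gen["payload_bits"] / gen["line_bits"]
    # GT/s * efficiency = usable Gbit/s per lane; divide by 8 for GB/s.
    per_lane = gen["gigatransfers_per_s"] * efficiency / 8.0
    return per_lane * lanes


if __name__ == "__main__":
    for generation in ("2.0", "3.0"):
        for lanes in (16, 8, 4):
            bw = pcie_bandwidth_gb_per_s(generation, lanes)
            print(f"PCIe {generation} x{lanes}: {bw:.2f} GB/s per direction")
```

Under these assumptions an x16 slot goes from 8 GB/s on 2.0 to a little under 16 GB/s on 3.0, so "double" is close but not exact, and a 3.0 slot wired at x8 is roughly level with a 2.0 slot at x16.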
>>>
>>> ~Aron
>>>
>>>
>>> On Wed, May 30, 2012 at 3:45 AM, Francesco Pietra<chiendarret_at_gmail.com>
>>> wrote:
>>>>
>>>>
>>>> Norman:
>>>> Thanks indeed. Because of the poor economic situation of the country
>>>> where I am currently based, for the moment I have to stick to the
>>>> consumer board. Possibly only upgrading to the Intel socket LGA2011
>>>> board with Core i7-3930K or i7-3960X with 6 physical CPUs and four
>>>> memory controllers instead of two for AMD in the board GA-890FXA-UD5
>>>> I have now. That is, if more memory bandwidth is needed for GPUs faster
>>>> than the two GTX-580 I have.
>>>>
>>>> However, are there benchmarks with NAMD (or other MD code) that show
>>>> GTX-680 or GTX-690 sufficiently faster than GTX-580 to justify the money?
>>>>
>>>> Thanks
>>>>
>>>> francesco pietra
>>>>
>>>> On Wed, May 30, 2012 at 8:55 AM, Norman Geist
>>>> <norman.geist_at_uni-greifswald.de>  wrote:
>>>>>
>>>>> Hi Francesco,
>>>>>
>>>>> I just wanted to share what I know about the Radeon cards. As far as I
>>>>> know, they do _NOT_ support CUDA, only OpenCL, which can run on both
>>>>> vendors' hardware. NAMD is written in CUDA so it cannot run on non-Nvidia
>>>>> cards. ACML for example is written in OpenCL. There were benchmarks that
>>>>> showed that OpenCL is faster on ATI cards than on Nvidia cards, but still
>>>>> CUDA is faster than OpenCL.
>>>>>
>>>>> So I think you won't be able to run NAMD on ATI cards.
>>>>>
>>>>> You may also be interested in the machines from FluiDyna that support
>>>>> up to 8 GPU cards.
>>>>> You may also find a motherboard that fits your needs better than
>>>>> this consumer/gamer hardware.
>>>>>
>>>>> Best wishes
>>>>>
>>>>> Norman Geist.
>>>>>
>>>>>> -----Original Message-----
>>>>>> From: owner-namd-l_at_ks.uiuc.edu [mailto:owner-namd-l_at_ks.uiuc.edu] On
>>>>>> Behalf Of Francesco Pietra
>>>>>> Sent: Wednesday, 30 May 2012 00:00
>>>>>> To: Axel Kohlmeyer; NAMD
>>>>>> Subject: Re: namd-l: Three GPU cards on shared-mem motherboard
>>>>>>
>>>>>> To the extent that a reader may be interested here in consumer mainboards:
>>>>>>
>>>>>> After much looking around, it came out that consumer mainboards are
>>>>>> limited to two real x16 2.0. The 990FXA-GD80, with declared four x16
>>>>>> 2.0, is the only exception I was able to find, albeit a suspicious
>>>>>> one.
>>>>>>
>>>>>> Thus, I am inclined to stick with the GA-890FXA-UD5 I have, which
>>>>>> performs quite well in MD/CUDA with two GTX-580 (the new Gigabyte that
>>>>>> replaces this one is still at two x16 2.0). What I would like to do is
>>>>>> to replace the two GTX-580 with faster cards. I can find a good
>>>>>> arrangement for that. An uneasy decision, however, unless someone comes
>>>>>> out here with classical molecular dynamics benchmarks for recent GPU
>>>>>> cards.
>>>>>>
>>>>>> From tests for gaming, most often carried out on OpenCL rather than
>>>>>> CUDA, the Radeon HD 7970 wins over GTX-580 by a factor of two, and
>>>>>> even more on GTX-680 in LuxMark's OpenCL-driven ray-tracing test. In
>>>>>> other game tests the difference is modest:
>>>>>> http://techreport.com/articles.x/22653/7
>>>>>>
>>>>>> Even the very expensive GTX-690 is outperformed by Radeon HD 7970 in
>>>>>> LuxMark's OpenCL-driven ray-tracing test:
>>>>>> http://www.hardwareluxx.de/index.php...i.html?start=5.
>>>>>>
>>>>>> What would be needed at this point is a benchmark for Radeon HD 7970
>>>>>> with CUDA/MD.
>>>>>>
>>>>>> At any event, whether the memory bandwidth of my GA-890FXA-UD5 is
>>>>>> enough for two HD 7970, or an Intel socket LGA2011 board is needed
>>>>>> with Core i7-3930K or i7-3960X (6 physical CPUs) and four memory
>>>>>> controllers instead of two for AMD, is another issue that I am also
>>>>>> unable to settle.
>>>>>>
>>>>>> I would be very grateful for comments on these points. Doubling the
>>>>>> speed of the simulation (as it occurred when I replaced the GTX-470
>>>>>> with GTX-580) is worth the money.
>>>>>>
>>>>>> francesco pietra
>>>>>>
>>>>>>
>>>>>> On Mon, May 28, 2012 at 6:20 PM, Axel Kohlmeyer<akohlmey_at_gmail.com>
>>>>>> wrote:
>>>>>>>
>>>>>>> On Mon, May 28, 2012 at 12:03 PM, Francesco Pietra
>>>>>>> <chiendarret_at_gmail.com>  wrote:
>>>>>>>
>>>>>>>>> When referring to NAMD, I wanted to imply (badly, I admit) performance
>>>>>>>>> boost by the third GPU.
>>>>>>>
>>>>>>>
>>>>>>> as i was mentioning before. that is near impossible to predict.
>>>>>>>
>>>>>>>>> The  PCI specification is described by the manufacturer as follows
>>>>>>>>>
>>>>>>>>> -- PCI Express slots version: 2.0.
>>>>>>>>>
>>>>>>>>> -- PCI slots: 1.
>>>>>>>>>
>>>>>>>>> -- PCI express x1 slots: 1.
>>>>>>>>>
>>>>>>>>> -- PCI express x16 slots: 4.
>>>>>>>
>>>>>>>
>>>>>>> that doesn't mean anything. labeling slots as x16
>>>>>>> only means that you can stick an x16 wide card
>>>>>>> into it. each of these slots can be wired with 16,
>>>>>>> 8, 4, 2 or 1 lane. also, some boards claim they
>>>>>>> have all 16-lane slots, but then two slots are
>>>>>>> connected to a little bridge chip. resulting in
>>>>>>> two cards each having to share the bandwidth.
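
The point above, that an x16-sized slot may be wired with fewer lanes, can be checked on a running Linux box rather than from the spec sheet. What follows is a minimal sketch, assuming a kernel that exposes the `current_link_width` and `max_link_width` attributes under `/sys/bus/pci/devices`; the function name `read_pcie_links` is made up for illustration. `max_link_width` is what the card itself can do and `current_link_width` is what was actually negotiated, so an x16 card reporting 8 sits in a slot (or behind a bridge) that gives it 8 lanes. It will not reveal whether two slots share bandwidth through a bridge chip.

```python
from pathlib import Path


def read_pcie_links(sysfs_root="/sys/bus/pci/devices"):
    """Map PCI address -> (negotiated lanes, maximum lanes the device supports).

    Devices that do not expose both attributes are skipped.
    """
    links = {}
    root = Path(sysfs_root)
    if not root.is_dir():
        # Not Linux, or sysfs not mounted: report nothing rather than fail.
        return links
    for dev in sorted(root.iterdir()):
        current = dev / "current_link_width"
        maximum = dev / "max_link_width"
        if not (current.is_file() and maximum.is_file()):
            continue
        try:
            links[dev.name] = (
                int(current.read_text().strip()),
                int(maximum.read_text().strip()),
            )
        except (OSError, ValueError):
            # Unreadable or non-numeric attribute: skip this device.
            continue
    return links


if __name__ == "__main__":
    for address, (now, best) in read_pcie_links().items():
        note = "" if now >= best else "  <- running below the device maximum"
        print(f"{address}: x{now} of x{best}{note}")
```

Run it with the GPUs under load if possible, since some cards lower the link speed when idle; the lane count is what matters for the question of "real" x16 slots.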
>>>>>>>
>>>>>>>>> Whether these are real x16 2.x, or not, is beyond my understanding. I
>>>>>>>
>>>>>>>
>>>>>>> without that information, you can't judge.
>>>>>>> contact the vendor or find somebody that
>>>>>>> has time to research it.
>>>>>>>
>>>>>>>>> can only compare with the corresponding description for the mainboard
>>>>>>>>> I am currently using: GA-890FXA-UD5:
>>>>>>>>>
>>>>>>>>> 2 x PCI Express x16, running at x16 (PCIEX16_1, PCIEX16_2).
>>>>>>>>>
>>>>>>>>> 1 x PCI Express x16 slot, running at x8 (PCIEX8).
>>>>>>>>>
>>>>>>>>> 1 x PCI Express x16 slot, running at x4 (PCIEX4).
>>>>>>>>>
>>>>>>>>> 2 x PCI Express x1 slots.
>>>>>>>>>  (All PCI Express slots conform to the PCI Express 2.0)
>>>>>>>>>
>>>>>>>>> 1 x PCI slot.
>>>>>>>>>
>>>>>>>>> With this latter mainboard, adding a second GTX-580 gave the expected
>>>>>>>>> acceleration. Data for PCIs of the two mainboards being comparable, I
>>>>>>>>> would expect that a third GTX-580 on the 990.. motherboard should do
>>>>>>>>> its job well. Is this naive extrapolation a sound one?
>>>>>>>
>>>>>>>
>>>>>>> no. you usually overload the memory bandwidth of the CPU
>>>>>>> with the third GPU and thus you won't get the full speedup.
>>>>>>> how much speedup you'll get depends on the individual
>>>>>>> characteristics of your input.
>>>>>>>
>>>>>>> axel.
>>>>>>>
>>>>>>>>>
>>>>>>>>> Thanks indeed for further advice
>>>>>>>>>
>>>>>>>>> francesco pietra
>>>>>>>>>
>>>>>>> --
>>>>>>> Dr. Axel Kohlmeyer
>>>>>>> akohlmey_at_gmail.com  http://goo.gl/1wk0
>>>>>>>
>>>>>>> College of Science and Technology
>>>>>>> Temple University, Philadelphia PA, USA.
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>>
>>>
>>> --
>>> Aron Broom M.Sc
>>> PhD Student
>>> Department of Chemistry
>>> University of Waterloo
>>>
>>
>>
>

This archive was generated by hypermail 2.1.6 : Mon Dec 31 2012 - 23:21:35 CST