Re: Three GPU cards on shared-mem motherboard

From: Francesco Pietra (chiendarret_at_gmail.com)
Date: Wed May 30 2012 - 10:33:14 CDT

Putting together what you (very interestingly) said, could you suggest
consumer motherboards that support PCI-E 3.0 and the socket you
suggest? I know NAMD and have much work already begun with that code,
awaiting development. I don't need more than single precision, as it
will be a long time before ab-initio multireference code is made
available on GPUs (also, because of the increasing prices for
electricity in the country where I live, I have set aside all
multi-CPU servers with large power sources), and I have no plans for
DFT-MD, which would not fit the multireference species of my interest
(such as, simply, triplet oxygen). Thus, GTX-690 (not so much GTX-680,
as far as I can see from gaming benchmarks, for what they can tell) is
alluring. In the past I tried GROMACS (for which amd64 Debian on my
computers provides packages), but I found it difficult to go on,
particularly with the parameterization of new, unusual molecules.

Thanks
francesco pietra

On Wed, May 30, 2012 at 4:14 PM, Aron Broom <broomsday_at_gmail.com> wrote:
> I haven't seen GTX600 series benchmarks for NAMD, those would be very nice.
> These new consumer 600 series cards have substantially less double precision
> computational power than the 500 series, but a lot more single precision.
> From my understanding, NAMD and OpenMM/GROMACS do all single precision on
> the GPU, so you might see a tremendous speedup on 600 series, but that is
> quite speculative.
>
> OpenCL is ~equivalent to, or even faster than, CUDA for OpenMM/GROMACS.  OpenCL
> used to be much slower, but there have been a lot of improvements.  That
> being said, I don't know that Radeon cards are necessarily faster than
> nVidia cards for OpenMM/GROMACS.
>
> Two points I would make, maybe they aren't relevant to your particular
> situation, but in case others find this thread:
>
> 1) If you are going to get the 600 series GTX cards, and you want to use
> NAMD, you should really get a motherboard that supports PCI-E 3.0 rather
> than 2.0.  This is because the new GTX680 actually has only a 256-bit
> memory bus compared to 384-bit for the GTX580, but PCI-E 3.0 allows double
> the transfer rate compared with 2.0, so you actually come out ahead.  For
> AMBER or OpenMM/GROMACS I'm not sure this is that critical, but for NAMD,
> because the tasks are split between the CPU and GPU, you need communication
> every step, and so that bandwidth is likely the limiting performance factor.
>
> 2) If you are building a machine for MD and trying to save money, I
> would never buy Intel CPUs; you generally pay for a lot of features (like
> hyperthreading) that you will not be using.  I would instead see if you can
> get hold of any of the older AMD Thuban 6-core chips; they used to sell
> the 3.2GHz one for ~$150-200.  I think the newer AMD Bulldozer chips aren't
> doing that well, so I'm not sure I would recommend those.
>
> ~Aron
>
>
> On Wed, May 30, 2012 at 3:45 AM, Francesco Pietra <chiendarret_at_gmail.com>
> wrote:
>>
>> Norman:
>> Thanks indeed. Because of the poor economic situation of the country
>> where I am currently based, for the moment I have to stick with a
>> consumer board, possibly only upgrading to an Intel socket LGA2011
>> board with a Core i7-3930K or i7-3960X, with 6 physical cores and four
>> memory channels instead of the two for AMD on the GA-890FXA-UD5 board
>> I have now. That is, if more memory bandwidth is needed for GPUs faster
>> than the two GTX-580 I have.
>>
>> However, are there benchmarks with NAMD (or other MD code) that show
>> the GTX-680 or GTX-690 to be sufficiently faster than the GTX-580 to
>> justify the money?
>>
>> Thanks
>>
>> francesco pietra
>>
>> On Wed, May 30, 2012 at 8:55 AM, Norman Geist
>> <norman.geist_at_uni-greifswald.de> wrote:
>> > Hi Francesco,
>> >
>> > I just wanted to share what I know about the Radeon cards. As far as I
>> > know, they do _NOT_ support CUDA, only OpenCL, which can run on both
>> > vendors' hardware. NAMD's GPU code is written in CUDA, so it cannot run
>> > on non-Nvidia cards. ACML, for example, is written in OpenCL. There were
>> > benchmarks that showed that OpenCL is faster on ATI cards than on Nvidia
>> > cards, but CUDA is still faster than OpenCL.
>> >
>> > So I think you won't be able to run NAMD on ATI cards.
>> >
>> > You may also be interested in the machines from FluiDyna, which support
>> > up to 8 GPU cards.
>> > You may also find a motherboard that fits your needs better than
>> > this consumer/gamer hardware.
>> >
>> > Best wishes
>> >
>> > Norman Geist.
>> >
>> >> -----Original Message-----
>> >> From: owner-namd-l_at_ks.uiuc.edu [mailto:owner-namd-l_at_ks.uiuc.edu] On
>> >> Behalf Of Francesco Pietra
>> >> Sent: Wednesday, 30 May 2012 00:00
>> >> To: Axel Kohlmeyer; NAMD
>> >> Subject: Re: namd-l: Three GPU cards on shared-mem motherboard
>> >>
>> >> To the extent that a reader may be interested here in consumer
>> >> mainboards:
>> >>
>> >> After much looking around, it turned out that consumer mainboards are
>> >> limited to two real x16 2.0 slots. The 990FXA-GD80, with four declared
>> >> x16 2.0 slots, is the only exception I was able to find, albeit a
>> >> suspicious one.
>> >>
>> >> Thus, I am inclined to stick with the GA-890FXA-UD5 I have, which
>> >> performs quite well in MD/CUDA with two GTX-580 (the new Gigabyte that
>> >> replaces this one is still at two x16 2.0). What I would like to do is
>> >> replace the two GTX-580 with faster cards. I can find a good
>> >> arrangement for that. It is an uneasy decision, however, unless someone
>> >> comes forward here with classical molecular dynamics benchmarks for
>> >> recent GPU cards.
>> >>
>> >> In gaming tests, most often carried out with OpenCL rather than
>> >> CUDA, the Radeon HD 7970 beats the GTX-580 by a factor of two, and
>> >> the GTX-680 by even more, in LuxMark's OpenCL-driven ray-tracing test.
>> >> In other game tests the difference is modest:
>> >> http://techreport.com/articles.x/22653/7
>> >>
>> >> Even the very expensive GTX-690 is outperformed by Radeon HD 7970 in
>> >> LuxMark's OpenCL-driven ray-tracing test:
>> >> http://www.hardwareluxx.de/index.php...i.html?start=5.
>> >>
>> >> What would be needed at this point is a benchmark for Radeon HD 7970
>> >> with CUDA/MD.
>> >>
>> >> In any event, whether the memory bandwidth of my GA-890FXA-UD5 is
>> >> enough for two HD 7970, or whether an Intel socket LGA2011 board is
>> >> needed with a Core i7-3930K or i7-3960X (6 physical cores) and four
>> >> memory channels instead of the two for AMD, is another question that
>> >> I am also unable to answer.
>> >>
>> >> I would be very grateful for comments on these points. Doubling the
>> >> speed of the simulation (as happened when I replaced the GTX-470
>> >> with a GTX-580) is worth the money.
>> >>
>> >> francesco pietra
>> >>
>> >>
>> >> On Mon, May 28, 2012 at 6:20 PM, Axel Kohlmeyer <akohlmey_at_gmail.com>
>> >> wrote:
>> >> > On Mon, May 28, 2012 at 12:03 PM, Francesco Pietra
>> >> > <chiendarret_at_gmail.com> wrote:
>> >> >
>> >> >>> When referring to NAMD, I wanted to imply (badly, I admit) a
>> >> >>> performance boost by the third GPU.
>> >> >
>> >> > as i was mentioning before. that is near impossible to predict.
>> >> >
>> >> >>> The  PCI specification is described by the manufacturer as follows
>> >> >>>
>> >> >>> -- PCI Express slots version: 2.0.
>> >> >>>
>> >> >>> -- PCI slots: 1.
>> >> >>>
>> >> >>> -- PCI express x1 slots: 1.
>> >> >>>
>> >> >>> -- PCI express x16 slots: 4.
>> >> >
>> >> > that doesn't mean anything. labeling slots as x16
>> >> > only means that you can stick an x16 wide card
>> >> > into it. each of these slots can be wired with 16,
>> >> > 8, 4, 2 or 1 lane. also, some boards claim they
>> >> > have all 16-lane slots, but then two slots are
>> >> > connected to a little bridge chip. resulting in
>> >> > two cards each having to share the bandwidth.
>> >> >
>> >> >>> Whether these are real x16 2.x, or not, is beyond my understanding. I
>> >> >
>> >> > without that information, you can't judge.
>> >> > contact the vendor or find somebody that
>> >> > has time to research it.
>> >> >
>> >> >>> can only compare with the corresponding description for the
>> >> >>> mainboard I am currently using: GA-890FXA-UD5:
>> >> >>>
>> >> >>> 2 x PCI Express x16, running at x16 (PCIEX16_1, PCIEX16_2).
>> >> >>>
>> >> >>> 1 x PCI Express x16 slot, running at x8 (PCIEX8).
>> >> >>>
>> >> >>> 1 x PCI Express x16 slot, running at x4 (PCIEX4).
>> >> >>>
>> >> >>> 2 x PCI Express x1 slots.
>> >> >>>  (All PCI Express slots conform to the PCI Express 2.0)
>> >> >>>
>> >> >>> 1 x PCI slot.
>> >> >>>
>> >> >>> With this latter mainboard, adding a second GTX-580 gave the expected
>> >> >>> acceleration. The PCI data for the two mainboards being comparable, I
>> >> >>> would expect that a third GTX-580 on the 990.. motherboard should do
>> >> >>> its job well. Is this naive extrapolation a sound one?
>> >> >
>> >> > no. you usually overload the memory bandwidth of the CPU
>> >> > with the third GPU and thus you won't get the full speedup.
>> >> > how much speedup you'll get depends on the individual
>> >> > characteristics of your input.
>> >> >
>> >> > axel.
>> >> >
>> >> >>>
>> >> >>> Thanks indeed for further advice
>> >> >>>
>> >> >>> francesco pietra
>> >> >>>
>> >> > --
>> >> > Dr. Axel Kohlmeyer
>> >> > akohlmey_at_gmail.com  http://goo.gl/1wk0
>> >> >
>> >> > College of Science and Technology
>> >> > Temple University, Philadelphia PA, USA.
>> >
>> >
>>
>
>
>
> --
> Aron Broom M.Sc
> PhD Student
> Department of Chemistry
> University of Waterloo
>
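
For readers comparing the slot configurations discussed in this thread
(x16 slots wired with 16, 8 or 4 lanes, PCI-E 2.0 versus 3.0), here is a
rough sketch of the theoretical one-direction bandwidth per slot. The
per-lane figures (5 GT/s with 8b/10b encoding for 2.0, 8 GT/s with
128b/130b encoding for 3.0) are taken from the PCI-E specifications, not
from anything measured in this thread, and the function name is only an
illustration. Real throughput will be lower, and says nothing about
whether NAMD is actually limited by it.

```python
# Theoretical one-direction PCI Express bandwidth per slot.
# Assumption: per-lane rates and encodings as given in the PCI-E 2.0 and
# 3.0 specifications; protocol overhead and bridge chips are ignored.

# generation -> (giga-transfers per second per lane, encoding efficiency)
PCIE_GENERATIONS = {
    "2.0": (5.0, 8 / 10),     # 8b/10b encoding
    "3.0": (8.0, 128 / 130),  # 128b/130b encoding
}


def slot_bandwidth_gb_s(generation: str, lanes: int) -> float:
    """Return the theoretical bandwidth in GB/s (one direction) of a slot
    electrically wired with the given number of lanes."""
    rate_gt_s, efficiency = PCIE_GENERATIONS[generation]
    # Each transfer carries one bit per lane; divide by 8 to get bytes.
    return rate_gt_s * efficiency * lanes / 8


if __name__ == "__main__":
    for gen in ("2.0", "3.0"):
        for lanes in (16, 8, 4):
            bw = slot_bandwidth_gb_s(gen, lanes)
            print(f"PCI-E {gen} x{lanes}: {bw:.2f} GB/s")
```

On these numbers a slot that is physically x16 but wired x8 at 2.0 gets
half the bandwidth of a true x16 2.0 slot, and a x16 3.0 slot gets a
little under twice that of x16 2.0.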

This archive was generated by hypermail 2.1.6 : Mon Dec 31 2012 - 23:21:35 CST