Re: Three GPU cards on shared-mem motherboard

From: Francesco Pietra (chiendarret_at_gmail.com)
Date: Thu May 31 2012 - 00:53:33 CDT

On Thu, May 31, 2012 at 2:50 AM, Aron Broom <broomsday_at_gmail.com> wrote:
> I'm not aware of any motherboards that have 3 PCI-E 3.0 slots yet, there
> appear to be some with 2.

As I have abandoned the idea of installing three dual-GPU cards on a
consumer motherboard (I never received an answer to my request for the
actual lane wiring of the four-x16-slot motherboard that I mentioned),
I would be interested in boards with two x16 PCIe 3.0 slots, with a view
to installing two GTX-690. Which brands?

It is unfortunate that the Radeon cards cannot be used with CUDA.
They are so much less expensive than nvidia cards, while performing
even better with game software/OpenCL.

francesco.

> I haven't looked very hard, I imagine in a year
> or so it will be more common.
>
> To reiterate something someone else mentioned, I believe the 690, much like
> the 590, is just two 680 cards together.  It also, therefore, needs two
> PCI-E slots for the one card.  In the case of the 500 series cards, I found
> the 570 performed at about 85% of the 580 on a 6-cpu-core system with a 100k
> atom system being simulated in NAMD, but the price was ~66%.  I'm not sure
> if this will be true for the 670.
>
> As a final note, there has been some attention on the fact that the latency
> (which particular latency I'm not sure) on the 600 series is higher, but
> that for games that is more than counterbalanced by the additional cores.
> It's not clear to me that this would be the case necessarily for MD, so I
> would really urge trying to find some avid gamer with a 670 or 680 and
> getting them to do a quick benchmark before investing in those cards.
>
> ~Aron
>
>
> On Wed, May 30, 2012 at 1:33 PM, Francesco Pietra <chiendarret_at_gmail.com>
> wrote:
>>
>> Forgot the list previously. Sorry
>> f.
>>
>> ---------- Forwarded message ----------
>> From: Francesco Pietra <chiendarret_at_gmail.com>
>> Date: Wed, May 30, 2012 at 7:28 PM
>> Subject: Re: namd-l: Three GPU cards on shared-mem motherboard
>> To: Vincent Leroux <vincent.leroux_at_loria.fr>
>>
>>
>> On Wed, May 30, 2012 at 6:14 PM, Vincent Leroux <vincent.leroux_at_loria.fr>
>> wrote:
>> > Hi,
>> >
>> > A GTX690 basically is two GTX680 chips on a single board. No surprise,
>> > this is twice as expensive. You may have a hard time finding one on the
>> > market. In addition, I am not sure you can put more than two on a single
>> > motherboard, this may be technically impossible. And if you have two you
>> > have to make sure the motherboard design leaves enough space between the
>> > two cards, if they are too close the top one will probably die very
>> > quickly.
>>
>> Actually, the GA-890FXA-UD5 I have puts my two GTX-580 close
>> to one another. However, the Antec TwelveHundred case is very
>> efficient (have a look at the design) and the two cards never go above
>> 85 °C. I regularly check the cards with cuda-memtest and no
>> problems have turned up. The boost from the second 580 was tremendous.
>> I can't give figures for lack of time.
>> >
>> > The Quadros or the Teslas are indeed optimized for double precision and
>> > failsafe operation,
>>
>> As I wrote, I have no need for double precision.
>>
>> > but they are very expensive, and AFAIK they are based
>> > off the previous generations of nVidia GPUs, so even if the GTX680/690
>> > performance suffers from not being double precision-optimized it may
>> > still be more efficient at the present time for MD simulations.
>> >
>> > But in any case, are you sure your problem will be PCI bandwidth rather
>> > than CPU? While I agree that generally AMD CPUs offer a better
>> > performance/value ratio, I am unsure a single 6-core CPU will be enough,
>> > even if the GPU does most of the job...
>>
>> It has often been competently posted here that two CPU cores per GPU card
>> are enough. Do two GTX-580 make four cards? If so you may be right.
>> And I can't demonstrate that I am getting all that the GTX-580 could
>> do.
>>
>>
>> > In addition, you will probably need to have very large
>> > systems
>>
>> If I had small systems, I would go all multi-CPU.
>>
>> > so that the simulation will scale well across two or more GTX680
>> > units. I would suggest you build a system with a single GTX680 card.
>>
>> I have two GTX-580. Should I go to a single GTX-680? I am not mad (as
>> yet).
>> Cheers
>> francesco
>>
>>
>> > If performance is good enough (and please post results on the mailing
>> > list, I too would be interested) and you have more money, you may want
>> > to build the same system again rather than adding another GTX680...
>> >
>> > Regards
>> > VL
>> >
>> >
>> >
>> > On 30/05/2012 17:33, Francesco Pietra wrote:
>> >>
>> >> Putting together what you (very interestingly) said, could you suggest
>> >> consumer motherboards that support PCI-E 3.0 and the socket you
>> >> suggest? I know NAMD, I have much work begun with that code, and which
>> >> awaits development. Don't need higher than single precision, as it
>> >> will be a long way before ab-initio multireference code will be made
>> >> available on GPUs (also, because of the increasing prices for
>> >> electricity in the country where I live, I have set aside all
>> >> multi-CPU servers with large power sources), and I have no plans for
>> >> DFT-MD, which would not fit the multireference species of my interest
>> >> (such as, simply, triplet oxygen). Thus, GTX-690 (not so much GTX-680,
>> >> as far as I can see from gaming benchmarks, for what they can tell) is
>> >> alluring. In the past I tried GROMACS (for which amd64 Debian on my
>> >> computers provides packages) but I found it difficult to go on,
>> >> particularly as to the parameterization of new, unusual molecules.
>> >>
>> >> Thanks
>> >> francesco pietra
>> >>
>> >> On Wed, May 30, 2012 at 4:14 PM, Aron Broom<broomsday_at_gmail.com>
>> >>  wrote:
>> >>>
>> >>> I haven't seen GTX600 series benchmarks for NAMD, those would be very
>> >>> nice. These new consumer 600 series cards have substantially less
>> >>> double precision computational power than the 500 series, but a lot
>> >>> more single precision. From my understanding, NAMD and OpenMM/GROMACS
>> >>> do all single precision on the GPU, so you might see a tremendous
>> >>> speedup on 600 series, but that is quite speculative.
>> >>>
>> >>> OpenCL is ~equivalent to, or even faster than, CUDA for OpenMM/GROMACS.
>> >>> OpenCL used to be much slower, but there have been a lot of
>> >>> improvements. That being said, I don't know that Radeon cards are
>> >>> necessarily faster than nVidia cards for OpenMM/GROMACS.
>> >>>
>> >>> Two points I would make, maybe they aren't relevant to your particular
>> >>> situation, but in case others find this thread:
>> >>>
>> >>> 1) If you are going to get the 600 series GTX cards, and you want to
>> >>> use NAMD, you should really get a motherboard that supports PCI-E 3.0
>> >>> rather than 2.0.  This is because the new GTX680 actually has only a
>> >>> 256-bit memory bus compared to 384-bit for the GTX580, but PCI-E 3.0
>> >>> allows double the transfer rate compared with 2.0, so you actually
>> >>> come out ahead.  For AMBER or OpenMM/GROMACS I'm not sure this is that
>> >>> critical, but for NAMD, because the tasks are split between the CPU
>> >>> and GPU, you need communication every step, and so that bandwidth is
>> >>> likely the limiting performance factor.
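
The "double the transfer rate" figure in point 1 can be checked from the published link parameters. A rough sketch in Python: the per-lane rates (5 GT/s and 8 GT/s) and line encodings (8b/10b and 128b/130b) are the PCI Express 2.0 and 3.0 values, while the function and table names are only illustrative. These are theoretical one-direction figures before protocol overhead, and they say nothing about how much of that bandwidth NAMD actually uses.

```python
# Per-lane signalling rate (GT/s) and line-encoding efficiency for each
# PCI Express generation discussed in the thread.
PCIE_GENERATIONS = {
    "2.0": (5.0, 8 / 10),     # 8b/10b encoding
    "3.0": (8.0, 128 / 130),  # 128b/130b encoding
}

def pcie_bandwidth_gb_per_s(generation, lanes):
    """Theoretical one-direction bandwidth in GB/s, before protocol overhead."""
    rate_gt_per_s, efficiency = PCIE_GENERATIONS[generation]
    # Each transfer carries one bit per lane; divide by 8 to get bytes.
    return rate_gt_per_s * efficiency * lanes / 8

for gen in ("2.0", "3.0"):
    for lanes in (16, 8, 4):
        bandwidth = pcie_bandwidth_gb_per_s(gen, lanes)
        print(f"PCIe {gen} x{lanes}: {bandwidth:.2f} GB/s")
```

By this arithmetic a 3.0 slot wired with 8 lanes is roughly on par with a 2.0 slot wired with 16, which matters when a board's "x16" slots are not all fully wired.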
>> >>>
>> >>> 2) If you are making a machine for doing MD, and trying to save money,
>> >>> I would never buy intel CPUs, you generally pay for a lot of features
>> >>> (like hyperthreading) that you will not be using.  I would instead see
>> >>> if you can get a hold of any of the older AMD Thuban 6-core chips, they
>> >>> used to sell the 3.2GHz one for ~$150-200.  I think the newer bulldozer
>> >>> AMD chips aren't doing that well, so I'm not sure I would recommend
>> >>> those.
>> >>>
>> >>> ~Aron
>> >>>
>> >>>
>> >>> On Wed, May 30, 2012 at 3:45 AM, Francesco
>> >>> Pietra<chiendarret_at_gmail.com>
>> >>> wrote:
>> >>>>
>> >>>>
>> >>>> Norman:
>> >>>> Thanks indeed. Because of the poor economic situation of the country
>> >>>> where I am currently based, for the moment I have to stick to the
>> >>>> consumer board. Possibly only upgrading to an Intel socket LGA2011
>> >>>> board with Core i7-3930K or i7-3960X, with 6 physical cores and four
>> >>>> memory controllers instead of the two for AMD in the GA-890FXA-UD5
>> >>>> board I have now. That is, if more memory bandwidth is needed for GPUs
>> >>>> faster than the two GTX-580 I have.
>> >>>>
>> >>>> However, are there benchmarks with NAMD (or other MD code) that show
>> >>>> GTX-680 or GTX-690 sufficiently faster than GTX-580 to justify the money?
>> >>>>
>> >>>> Thanks
>> >>>>
>> >>>> francesco pietra
>> >>>>
>> >>>> On Wed, May 30, 2012 at 8:55 AM, Norman Geist
>> >>>> <norman.geist_at_uni-greifswald.de>  wrote:
>> >>>>>
>> >>>>> Hi Francesco,
>> >>>>>
>> >>>>> I just wanted to share what I know about the Radeon cards. As far as
>> >>>>> I know, they do _NOT_ support CUDA, only OpenCL, which can run on both
>> >>>>> vendors' hardware. NAMD is written in CUDA so it cannot run on
>> >>>>> non-Nvidia cards. ACML for example is written in OpenCL. There were
>> >>>>> benchmarks that showed that OpenCL is faster on ATI cards than on
>> >>>>> Nvidia cards, but still CUDA is faster than OpenCL.
>> >>>>>
>> >>>>> So I think you won't be able to run NAMD on ATI cards.
>> >>>>>
>> >>>>> You may also be interested in the machines from FluiDyna that
>> >>>>> support up to 8 GPU cards.
>> >>>>> You may also find a motherboard that fits your needs better
>> >>>>> than this consumer/gamer hardware.
>> >>>>>
>> >>>>> Best wishes
>> >>>>>
>> >>>>> Norman Geist.
>> >>>>>
>> >>>>>> -----Original Message-----
>> >>>>>> From: owner-namd-l_at_ks.uiuc.edu [mailto:owner-namd-l_at_ks.uiuc.edu] On
>> >>>>>> Behalf Of Francesco Pietra
>> >>>>>> Sent: Wednesday, 30 May 2012 00:00
>> >>>>>> To: Axel Kohlmeyer; NAMD
>> >>>>>> Subject: Re: namd-l: Three GPU cards on shared-mem motherboard
>> >>>>>>
>> >>>>>> To the extent that a reader may be interested here in consumer
>> >>>>>> mainboards:
>> >>>>>>
>> >>>>>> After much looking around, it turned out that consumer mainboards are
>> >>>>>> limited to two real x16 2.0 slots. The 990FXA-GD80, with a declared
>> >>>>>> four x16 2.0, is the only exception I was able to find, albeit a
>> >>>>>> suspicious one.
>> >>>>>>
>> >>>>>> Thus, I am inclined to stick with the GA-890FXA-UD5 I have, which
>> >>>>>> performs quite well in MD/CUDA with two GTX-580 (the new Gigabyte
>> >>>>>> that replaces this one is still at two x16 2.0). What I would like to
>> >>>>>> do is replace the two GTX-580 with faster cards. I can find a good
>> >>>>>> arrangement for that. An uneasy decision, however, unless someone
>> >>>>>> comes out here with classical molecular dynamics benchmarks for recent
>> >>>>>> GPU cards.
>> >>>>>>
>> >>>>>> From tests for gaming, most often carried out on OpenCL rather than
>> >>>>>> CUDA, the Radeon HD 7970 wins over GTX-580 by a factor of two, and
>> >>>>>> even more over GTX-680 in LuxMark's OpenCL-driven ray-tracing test.
>> >>>>>> In other game tests the difference is modest:
>> >>>>>> http://techreport.com/articles.x/22653/7
>> >>>>>>
>> >>>>>> Even the very expensive GTX-690 is outperformed by Radeon HD 7970 in
>> >>>>>> LuxMark's OpenCL-driven ray-tracing test:
>> >>>>>> http://www.hardwareluxx.de/index.php...i.html?start=5.
>> >>>>>>
>> >>>>>> What would be needed at this point is a benchmark for Radeon HD 7970
>> >>>>>> with CUDA/MD.
>> >>>>>>
>> >>>>>> In any event, whether the memory bandwidth of my GA-890FXA-UD5 is
>> >>>>>> enough for two HD 7970, or an Intel socket LGA2011 board is needed
>> >>>>>> with Core i7-3930K or i7-3960X (6 physical cores) and four memory
>> >>>>>> controllers instead of two for AMD, is another issue that I am also
>> >>>>>> unable to settle.
>> >>>>>>
>> >>>>>> I would be very grateful for comments on these points. Doubling the
>> >>>>>> speed of the simulation (as happened when I replaced the GTX-470
>> >>>>>> with GTX-580) is worth the money.
>> >>>>>>
>> >>>>>> francesco pietra
>> >>>>>>
>> >>>>>>
>> >>>>>> On Mon, May 28, 2012 at 6:20 PM, Axel Kohlmeyer<akohlmey_at_gmail.com>
>> >>>>>> wrote:
>> >>>>>>>
>> >>>>>>> On Mon, May 28, 2012 at 12:03 PM, Francesco Pietra
>> >>>>>>> <chiendarret_at_gmail.com>  wrote:
>> >>>>>>>
>> >>>>>>>>> When referring to NAMD, I wanted to imply (badly, I admit) performance
>> >>>>>>>>> boost by the third GPU.
>> >>>>>>>
>> >>>>>>>
>> >>>>>>> as i was mentioning before. that is near impossible to predict.
>> >>>>>>>
>> >>>>>>>>> The  PCI specification is described by the manufacturer as
>> >>>>>>>>> follows
>> >>>>>>>>>
>> >>>>>>>>> -- PCI Express slots version: 2.0.
>> >>>>>>>>>
>> >>>>>>>>> -- PCI slots: 1.
>> >>>>>>>>>
>> >>>>>>>>> -- PCI express x1 slots: 1.
>> >>>>>>>>>
>> >>>>>>>>> -- PCI express x16 slots: 4.
>> >>>>>>>
>> >>>>>>>
>> >>>>>>> that doesn't mean anything. labeling slots as x16
>> >>>>>>> only means that you can stick an x16 wide card
>> >>>>>>> into it. each of these slots can be wired with 16,
>> >>>>>>> 8, 4, 2 or 1 lane. also, some boards claim they
>> >>>>>>> have all 16-lane slots, but then two slots are
>> >>>>>>> connected to a little bridge chip. resulting in
>> >>>>>>> two cards each having to share the bandwidth.
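
The point about slot labels versus actual wiring can be checked on a running Linux machine, since the kernel reports the negotiated link width of each PCIe device. A minimal sketch in Python, assuming a kernel that exposes the `current_link_width` and `max_link_width` attributes under `/sys/bus/pci/devices`; the function name is only illustrative. It shows what each card negotiated, but it cannot reveal two slots sharing bandwidth behind a bridge chip: for that the board documentation or the vendor is still needed.

```python
from pathlib import Path

def read_pcie_links(sysfs_root="/sys/bus/pci/devices"):
    """Return {device address: (current width, maximum width)} for PCIe devices."""
    links = {}
    root = Path(sysfs_root)
    if not root.is_dir():
        return links  # not Linux, or sysfs not mounted
    for dev in sorted(root.iterdir()):
        current = dev / "current_link_width"
        maximum = dev / "max_link_width"
        if not (current.is_file() and maximum.is_file()):
            continue  # device does not expose link-width attributes
        try:
            links[dev.name] = (int(current.read_text()), int(maximum.read_text()))
        except (OSError, ValueError):
            continue  # unreadable or non-numeric attribute: skip the device
    return links

if __name__ == "__main__":
    for address, (current, maximum) in read_pcie_links().items():
        note = "" if current == maximum else "  <-- below the device's maximum width"
        print(f"{address}: x{current} of x{maximum}{note}")
```

A GPU that reports x8 or x4 here while sitting in a slot labelled x16 is in one of the partially wired slots described above.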
>> >>>>>>>
>> >>>>>>>>> Whether these are real x16 2.x, or not, is beyond my
>> >>>>>>>>> understanding. I
>> >>>>>>>
>> >>>>>>> without that information, you can't judge.
>> >>>>>>> contact the vendor or find somebody that
>> >>>>>>> has time to research it.
>> >>>>>>>
>> >>>>>>>>> can only compare with the corresponding description for the mainboard
>> >>>>>>>>> I am currently using: GA-890FXA-UD5:
>> >>>>>>>>>
>> >>>>>>>>> 2 x PCI Express x16, running at x16 (PCIEX16_1, PCIEX16_2).
>> >>>>>>>>>
>> >>>>>>>>> 1 x PCI Express x16 slot, running at x8 (PCIEX8).
>> >>>>>>>>>
>> >>>>>>>>> 1 x PCI Express x16 slot, running at x4 (PCIEX4).
>> >>>>>>>>>
>> >>>>>>>>> 2 x PCI Express x1 slots.
>> >>>>>>>>>  (All PCI Express slots conform to the PCI Express 2.0)
>> >>>>>>>>>
>> >>>>>>>>> 1 x PCI slot.
>> >>>>>>>>>
>> >>>>>>>>> With this latter mainboard, adding a second GTX-580 gave the expected
>> >>>>>>>>> acceleration. Data for PCIs of the two mainboards being comparable, I
>> >>>>>>>>> would expect that a third GTX-580 on the 990.. motherboard should do
>> >>>>>>>>> its job well. Is this naive extrapolation a sound one?
>> >>>>>>>
>> >>>>>>>
>> >>>>>>> no. you usually overload the memory bandwidth of the CPU
>> >>>>>>> with the third GPU and thus you won't get the full speedup.
>> >>>>>>> how much speedup you'll get depends on the individual
>> >>>>>>> characteristics of your input.
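
Since the speedup from a third GPU depends on the input, the practical way to answer the question is to time the same system with one, two and three GPUs and compare. A small sketch in Python of that arithmetic; the timings below are HYPOTHETICAL numbers invented for illustration, not measurements from this thread, and the function name is made up.

```python
def scaling(seconds_per_step):
    """Map GPU count -> (speedup, parallel efficiency) relative to one GPU."""
    baseline = seconds_per_step[1]
    results = {}
    for gpus, step_time in sorted(seconds_per_step.items()):
        speedup = baseline / step_time
        results[gpus] = (speedup, speedup / gpus)
    return results

# HYPOTHETICAL timings in s/step, invented purely to show the calculation.
example = {1: 0.120, 2: 0.065, 3: 0.055}
for gpus, (speedup, efficiency) in scaling(example).items():
    print(f"{gpus} GPU(s): speedup {speedup:.2f}, efficiency {efficiency:.0%}")
```

An efficiency that drops sharply between two and three GPUs would be the signature of the saturation described above.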
>> >>>>>>>
>> >>>>>>> axel.
>> >>>>>>>
>> >>>>>>>>>
>> >>>>>>>>> Thanks indeed for further advice
>> >>>>>>>>>
>> >>>>>>>>> francesco pietra
>> >>>>>>>>>
>> >>>>>>> --
>> >>>>>>> Dr. Axel Kohlmeyer
>> >>>>>>> akohlmey_at_gmail.com  http://goo.gl/1wk0
>> >>>>>>>
>> >>>>>>> College of Science and Technology
>> >>>>>>> Temple University, Philadelphia PA, USA.
>> >>>>>
>> >>>>>
>> >>>>>
>> >>>>
>> >>>
>> >>>
>> >>>
>> >>> --
>> >>> Aron Broom M.Sc
>> >>> PhD Student
>> >>> Department of Chemistry
>> >>> University of Waterloo
>> >>>
>> >>
>> >>
>> >
>>
>
>
>
> --
> Aron Broom M.Sc
> PhD Student
> Department of Chemistry
> University of Waterloo
>

This archive was generated by hypermail 2.1.6 : Tue Dec 31 2013 - 23:22:02 CST