Re: Suggestions while building a GPU-machine (CUDA) for NAMD use!

From: Norman Geist (norman.geist_at_uni-greifswald.de)
Date: Wed May 29 2013 - 00:27:44 CDT

And almost all of them have, at the very least, no idea about HPC.

Norman Geist.

> -----Original Message-----
> From: owner-namd-l_at_ks.uiuc.edu [mailto:owner-namd-l_at_ks.uiuc.edu]
> On Behalf Of Axel Kohlmeyer
> Sent: Tuesday, May 28, 2013 11:40
> To: Aditya Ranganathan
> Cc: Francesco Pietra; Aron Broom; NAMD
> Subject: Re: namd-l: Suggestions while building a GPU-machine (CUDA)
> for NAMD use!
>
> On Tue, May 28, 2013 at 11:04 AM, Aditya Ranganathan
> <aditya.sia_at_gmail.com> wrote:
> > Francesco, we aren't considering the GTX 680 for a server motherboard.
>
> why not? we have been running a couple of GTX 580s on such a mainboard
> for quite a while and they work extremely well.
>
> > My question was with regard to the almost 3X difference in price
> > between the GTX 680 (which we have installed on one of our lab's
> > older servers) and the Tesla C2075, which is a server-grade GPU. I
> > was curious to know whether the 3X increase in price also corresponds
> > to a similar increase in computing potency, and the answer seems to
> > be 'no'. I hope the investment is worthwhile in terms of reliability
> > and scalability over multiple GPUs.
>
> vendors will always push "server grade" hardware, but keep in mind
> that most vendor "experts" that you deal with don't have the faintest
> clue about what they are selling. all they are telling you is what
> they were told by the hardware vendors (in this case nvidia), based
> on the principle of "more expensive must be better". also, while there
> will be no official confirmation on this, vendors are usually under
> pressure to sell "server grade" hardware or else they may lose their
> "preferred partner" status, i.e. won't get as deep discounts from the
> hardware vendor as their competition.
>
> *never* trust what a vendor, let alone a sales person, tells you. they
> are in the business to make money and they make more money off the
> more expensive hardware. you have to look at things a bit differently:
> at the 3x price difference, you can have your GPU break twice without
> getting a replacement and still come out ahead (since you can
> purchase a faster/better one), or you can buy and run more hardware
> for the same money. the reliability factor matters if you run a huge
> number of GPUs and the average failure rate is so high that the cost
> of having a person swap broken GPUs for working ones becomes too
> high. if you want a couple of workstations that sit somewhere in a
> corner, it doesn't matter. i buy "server type" machines, because
> they are run in a data center and the extra cost is easily
> offset by me not having to walk over to the data center for every
> little piece of maintenance work, since the server boards have remote
> management via IPMI, so i can reboot, power on/off, configure the
> bios, install, etc. from anywhere i have a decent internet connection,
> even halfway around the globe. that saves money, because i am more
> productive this way (and my salary is decent). if i had a pile of
> (cheap) students working for me that i could send over to do these
> kind of tasks at any time, the bottom line would be different and it
> would pay to go for (much) cheaper hardware, even if it fails more
> often.
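>
> for illustration, a minimal sketch of that kind of remote management
> with ipmitool (hostname, user and password below are placeholders;
> assumes the board's BMC speaks standard IPMI-over-LAN):
>
>   # is the node powered on?
>   ipmitool -I lanplus -H bmc01.example.org -U admin -P secret \
>       chassis power status
>   # hard power cycle a hung node without walking to the data center
>   ipmitool -I lanplus -H bmc01.example.org -U admin -P secret \
>       chassis power cycle
>   # serial-over-LAN console, e.g. to watch the BIOS or an installer
>   ipmitool -I lanplus -H bmc01.example.org -U admin -P secret \
>       sol activate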
>
> in short: knowledge is power, and in a business where hype and
> marketing dominate over facts even more so.
>
> axel.
>
> >
> > I hope its clearer now.
> >
> > Thanks and Regards
> >
> > Aditya
> >
> >
> > On Tue, May 28, 2013 at 1:24 PM, Francesco Pietra
> <chiendarret_at_gmail.com>
> > wrote:
> >>
> >> Hi Aditya:
> >>
> >> OK, server motherboard. Expensive, but confirming that to have PCIE
> >> 3.0 one cannot stick to consumer motherboards, for the time being.
> >> Anyway, why do you mention a consumer GTX 680 for a server
> >> motherboard? Ask hardware experts (I am not one). It seems a bit of
> >> a mismatch. In any event, be careful as to the 680 brand. I have
> >> posted before that changing brand gave me a lot of problems.
> >>
> >> Also, ask hardware experts whether the price of such a server GPU
> >> is worthwhile. Would it not be better to go for a small CPU cluster?
> >> With GPUs you are still quite limited. For example, no QM/MM. And
> >> GPUs are fragile tools whose failures are hard to interpret (see my
> >> previous post).
> >>
> >> regards
> >> francesco
> >>
> >>
> >> On Tue, May 28, 2013 at 9:23 AM, Aditya Ranganathan
> <aditya.sia_at_gmail.com>
> >> wrote:
> >>>
> >>> The motherboard that we are planning to install the GPUs on would
> >>> be an Intel Workstation/Server Motherboard W2600CR2 with Intel E5
> >>> 2620 6-core Xeon processors. It's not a consumer motherboard. Does
> >>> that sound reasonable? I'm a novice at this, so it would help to
> >>> get more insight into it.
> >>>
> >>>
> >>> On Tue, May 28, 2013 at 11:25 AM, Francesco Pietra
> >>> <chiendarret_at_gmail.com> wrote:
> >>>>
> >>>>
> >>>> Sorry, I forgot the forum.
> >>>>
> >>>> On Tue, May 28, 2013 at 7:39 AM, Francesco Pietra
> >>>> <chiendarret_at_gmail.com> wrote:
> >>>>>
> >>>>> Hi Aron:
> >>>>> Thanks for the illustration of C-2075.
> >>>>>
> >>>>> In my opinion/experience the key point is the PCIE 2.0 of
> >>>>> current consumer mainboards. I would like to know why such a
> >>>>> bottleneck was not corrected. GPUs at PCIE 3.0 (or stated to be
> >>>>> so) have been available for a rather long time. Why have
> >>>>> mainboards not been brought to PCIE 3.0? Unless Aditya has found
> >>>>> the right mainboard. We will see.
> >>>>>
> >>>>> With NAMD, I had two Zotac GTX 580 under AMD CPUs, no problems
> >>>>> in a couple of years. Then I increased the MD speed with two MSI
> >>>>> GTX 680 under an Intel CPU. In both cases, 6 CPU cores per two
> >>>>> GTX, mainboard PCIE 2.0; nonetheless, not much improvement from
> >>>>> the second GTX. The MSI played correctly, with a twofold increase
> >>>>> of speed with respect to the 580 on a 200,000 atom system. After
> >>>>> a few months, problems (not understood) arose with one of the MSI
> >>>>> cards. It plays games, but it hangs under NAMD on computations
> >>>>> longer than 100,000 steps at ts = 1 fs, especially so when
> >>>>> coupled to the other MSI. Exchanging the two MSI cards between
> >>>>> their sockets always pointed to failure of the same MSI, so I
> >>>>> concluded that the mainboard is not responsible. I am in a fight
> >>>>> with the vendor, who claims that such GTXs are for games. I asked
> >>>>> for a replacement with an equivalent Zotac, paying the price
> >>>>> difference. The fight is still ongoing.
> >>>>>
> >>>>> As to server mainboards, I have only old information, concerning
> >>>>> a board from Supermicro, which, however, had a single socket at
> >>>>> PCIE 3.0. Moreover, as is well known, most MD code does not gain
> >>>>> from server hardware, or not enough to justify the price
> >>>>> difference.
> >>>>>
> >>>>> francesco
> >>>>>
> >>>>>
> >>>>> On Mon, May 27, 2013 at 7:50 PM, Aron Broom <broomsday_at_gmail.com>
> >>>>> wrote:
> >>>>>>
> >>>>>> Hi Francesco,
> >>>>>>
> >>>>>> You're right, the memory isn't really much of a selling point
> for
> >>>>>> current MD (it once was an issue in AMBER, but I think that has
> been reduced
> >>>>>> greatly).
> >>>>>>
> >>>>>> Really the main point was that if the C-2075 was a price option
> and
> >>>>>> being compared against the 680 purely for performance, then one
> might want
> >>>>>> to also consider the titan as part of the price/performance
> comparison.
> >>>>>>
> >>>>>> ~Aron
> >>>>>>
> >>>>>>
> >>>>>> On Mon, May 27, 2013 at 9:47 AM, Francesco Pietra
> >>>>>> <chiendarret_at_gmail.com> wrote:
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> On Mon, May 27, 2013 at 3:09 PM, Aron Broom
> <broomsday_at_gmail.com>
> >>>>>>> wrote:
> >>>>>>>>
> >>>>>>>> as Axel suggested, in terms of just performance, the C-2075
> will be
> >>>>>>>> about the same as a GTX480 in most cases. So from just a
> performance
> >>>>>>>> standpoint the GTX680 will generally be better.
> >>>>>>>>
> >>>>>>>> I'm not sure how much a C-2075 costs currently, but if as you
> say
> >>>>>>>> you are getting a PCIe 3.0 board, why not buy a Titan? You'll
> have even
> >>>>>>>> better performance than the 680 and huge memory (6GB).
> >>>>>>>
> >>>>>>>
> >>>>>>> Aron:
> >>>>>>> How much memory is used by a 680 GPU on a consumer motherboard
> >>>>>>> (i.e., PCIE 2.0) for a protein system of common size in
> >>>>>>> explicit water, i.e., 200,000 atoms? Either single or multiple
> >>>>>>> GPU. More than some hundred MB? If more than that, how did you
> >>>>>>> manage to accomplish that with NAMD 2.9 / CUDA 4.0?
> >>>>>>>
> >>>>>>> thanks
> >>>>>>> francesco
> >>>>>>>
> >>>>>>>
> >>>>>>>> Of course the memory quality issues compared to a K20x that
> Axel
> >>>>>>>> brought up still exist, but if performance is your only
> concern...
> >>>>>>>>
> >>>>>>>> ~Aron
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> On Mon, May 27, 2013 at 5:58 AM, Axel Kohlmeyer
> <akohlmey_at_gmail.com>
> >>>>>>>> wrote:
> >>>>>>>>>
> >>>>>>>>> On Mon, May 27, 2013 at 11:35 AM, Aditya Ranganathan
> >>>>>>>>> <aditya.sia_at_gmail.com> wrote:
> >>>>>>>>> > @Francesco, we are planning to buy a board with PCI-Express
> >>>>>>>>> > 3.0 support. @Axel: Thanks, Axel, for the comprehensive
> >>>>>>>>> > walkthrough on this issue. We aim to build this machine
> >>>>>>>>> > solely for performing classical MD simulations using NAMD.
> >>>>>>>>> > Reliability and scaling-up issues of a GeForce card like
> >>>>>>>>> > the GTX 680 were cited as a possible disadvantage by our
> >>>>>>>>> > computer vendor while he was suggesting the Tesla as an
> >>>>>>>>> > option.
> >>>>>>>>>
> >>>>>>>>> > I haven't been able to find any clear benchmarks for the
> >>>>>>>>> > Tesla C-2075 yet. Most of the benchmarks seem to revolve
> >>>>>>>>> > around the Kepler series of cards. If anyone is aware of
> >>>>>>>>> > NAMD benchmarks on the Tesla C-2075, please point me to
> >>>>>>>>> > them.
> >>>>>>>>>
> >>>>>>>>> look for benchmarks of the C-2050. the C-2075 is a tad
> >>>>>>>>> faster; they are two different revisions of the fermi chip.
> >>>>>>>>> the difference is similar to what a GTX 480 is to a GTX 580
> >>>>>>>>> (which are the corresponding consumer models). mind you, with
> >>>>>>>>> the fermi generation, the consumer cards were more similar to
> >>>>>>>>> the tesla cards than they are now. only the GeForce TITAN has
> >>>>>>>>> a similar (or better?) relationship to the tesla K20. even
> >>>>>>>>> the recently released GTX 780 has been deliberately "crippled"
> >>>>>>>>> to massively reduce double precision floating point
> >>>>>>>>> performance.
> >>>>>>>>>
> >>>>>>>>> the problem with the C-2075 is that it uses an already
> >>>>>>>>> outdated architecture (with GPUs, architectures change fast),
> >>>>>>>>> which is as different from a kepler chip as perhaps an intel
> >>>>>>>>> pentium 4 is from a current (ivy bridge) based intel i7 cpu.
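> >>>>>>>>>
> >>>>>>>>> if you want numbers for your own hardware rather than
> >>>>>>>>> published ones, the standard apoa1 benchmark (~92k atoms,
> >>>>>>>>> available from the namd website) is easy to run yourself; a
> >>>>>>>>> minimal sketch with a multicore CUDA build of namd 2.9 (paths
> >>>>>>>>> and core/gpu counts below are placeholders):
> >>>>>>>>>
> >>>>>>>>>   # 8 cpu cores, first gpu only; +idlepoll is recommended for CUDA builds
> >>>>>>>>>   ./namd2 +p8 +idlepoll +devices 0 apoa1/apoa1.namd > apoa1_1gpu.log
> >>>>>>>>>   grep "Benchmark time" apoa1_1gpu.log   # prints s/step and days/ns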
> >>>>>>>>>
> >>>>>>>>> axel.
> >>>>>>>>>
> >>>>>>>>> >
> >>>>>>>>> >
> >>>>>>>>> >
> >>>>>>>>> > On Mon, May 27, 2013 at 2:27 PM, Axel Kohlmeyer
> >>>>>>>>> > <akohlmey_at_gmail.com> wrote:
> >>>>>>>>> >>
> >>>>>>>>> >> On Mon, May 27, 2013 at 10:14 AM, Aditya Ranganathan
> >>>>>>>>> >> <aditya.sia_at_gmail.com> wrote:
> >>>>>>>>> >> > Hello All,
> >>>>>>>>> >> >
> >>>>>>>>> >> > We are pondering investing in a GPU-based machine for
> >>>>>>>>> >> > running all-atom NAMD simulations. Currently, we are
> >>>>>>>>> >> > stuck with a dilemma over the choice of card for CUDA
> >>>>>>>>> >> > computing. We already have a GTX 680, which gives us
> >>>>>>>>> >> > about 3 ns/day for a 100,000 atom system using a single
> >>>>>>>>> >> > GPU card and 8 CPU cores.
> >>>>>>>>> >> >
> >>>>>>>>> >> > Now, we are planning to build a GPU machine with 4 GPU
> >>>>>>>>> >> > cards (either the Tesla C-2075, 6 GB GDDR5, or the NVIDIA
> >>>>>>>>> >> > GTX 680). The base system would consist of a 6-core Intel
> >>>>>>>>> >> > Xeon E5 2620 processor, 64 GB DDR3 RAM and a 2 TB hard
> >>>>>>>>> >> > drive.
> >>>>>>>>> >> >
> >>>>>>>>> >> > Has anyone in the community used the Tesla series of
> >>>>>>>>> >> > cards with NAMD and compared its benchmarks (scalability
> >>>>>>>>> >> > etc.) with an entry-level card like the GTX 680? The cost
> >>>>>>>>> >> > of the Tesla is almost 3 times that of the GTX 680. Does
> >>>>>>>>> >> > its performance justify its price?
> >>>>>>>>> >>
> >>>>>>>>> >> the GTX 680 is not exactly "entry" level (more upper mid
> >>>>>>>>> >> level) and you can't compare GPUs like that. you basically
> >>>>>>>>> >> have different "chip families" and different "chip
> >>>>>>>>> >> generations". the GTX 680 is based on the "Kepler"
> >>>>>>>>> >> generation, as are the Tesla K10 and Tesla K20; the C2075,
> >>>>>>>>> >> however, is based on the previous generation called
> >>>>>>>>> >> "Fermi". Now GeForce cards are usually spec'd rather
> >>>>>>>>> >> aggressively and for use in video games and not for
> >>>>>>>>> >> reliability in computing (which doesn't mean they are
> >>>>>>>>> >> unreliable, only that the vendors take a higher risk to
> >>>>>>>>> >> lower production costs and raise game performance).
> >>>>>>>>> >>
> >>>>>>>>> >> Also, on GeForce cards certain functionality is not
> >>>>>>>>> >> available (for example, ECC memory configuration) or is
> >>>>>>>>> >> only available in a very limited way (for example, double
> >>>>>>>>> >> precision floating point math). Also, support through the
> >>>>>>>>> >> nvidia-smi utility is limited. On the other hand, Tesla
> >>>>>>>>> >> GPUs do have all of these benefits and also use "certified"
> >>>>>>>>> >> and tested hardware components, often have more RAM, and
> >>>>>>>>> >> have better warranty deals. All of this, and the fact that
> >>>>>>>>> >> they are produced and sold in smaller quantities, results
> >>>>>>>>> >> in higher costs.
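> >>>>>>>>> >>
> >>>>>>>>> >> As a quick check of what a given card actually exposes,
> >>>>>>>>> >> nvidia-smi shows the difference; a minimal sketch (the
> >>>>>>>>> >> device index 0 is an assumption):
> >>>>>>>>> >>
> >>>>>>>>> >>   # Tesla: ECC mode and error counters; GeForce: mostly "N/A"
> >>>>>>>>> >>   nvidia-smi -q -d ECC -i 0
> >>>>>>>>> >>   # temperature and utilization, usually available on both card types
> >>>>>>>>> >>   nvidia-smi -q -d TEMPERATURE,UTILIZATION -i 0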
> >>>>>>>>> >>
> >>>>>>>>> >> So whether the Tesla GPUs are worth the price or not
> >>>>>>>>> >> depends on what you are looking for in a GPU. Classical MD
> >>>>>>>>> >> can function very well with just limited double precision
> >>>>>>>>> >> performance, since most of the force calculation can be
> >>>>>>>>> >> done in single precision with only a small loss of accuracy
> >>>>>>>>> >> (and would otherwise similarly be offloaded to SSE and AVX
> >>>>>>>>> >> vector instructions). Also, the performance of classical MD
> >>>>>>>>> >> is often as much dominated by memory bandwidth (looking up
> >>>>>>>>> >> pairs of particles through the neighbor lists) as by
> >>>>>>>>> >> compute performance. the fastest GeForce-type GPUs often
> >>>>>>>>> >> outperform the fastest Tesla cards of the same generation
> >>>>>>>>> >> in classical MD due to their higher clocks and higher
> >>>>>>>>> >> memory bandwidth. However, if you would also run
> >>>>>>>>> >> applications that are dependent on double precision
> >>>>>>>>> >> floating point, or prefer a low risk and better management
> >>>>>>>>> >> and are willing to pay the extra price for that, then the
> >>>>>>>>> >> Tesla would be it.
> >>>>>>>>> >>
> >>>>>>>>> >> mind you, the Tesla K10 is a special beast in this zoo,
> since it
> >>>>>>>>> >> is
> >>>>>>>>> >> effectively a pimped up GeForce GTX690.
> >>>>>>>>> >>
> >>>>>>>>> >> > Any suggestions from the community would be greatly
> >>>>>>>>> >> > appreciated.
> >>>>>>>>> >>
> >>>>>>>>> >> multi-gpu machines are tricky business. you have to pay
> >>>>>>>>> >> great attention to the chipset and how many full-width
> >>>>>>>>> >> PCI-e slots are supported. for a 4-GPU machine, you usually
> >>>>>>>>> >> need two CPUs and two southbridges (two GPUs per socket).
> >>>>>>>>> >> some boards have only one southbridge and then support more
> >>>>>>>>> >> full-width PCI-e slots via PCIe bridge chips. those add a
> >>>>>>>>> >> little latency and - when you use all GPUs at the same time
> >>>>>>>>> >> - two GPUs have to share the bandwidth. since the host to
> >>>>>>>>> >> GPU bandwidth affects NAMD performance, you have to test
> >>>>>>>>> >> whether in that case a single 4-GPU machine or two machines
> >>>>>>>>> >> with 2 GPUs each are the better option (probably the
> >>>>>>>>> >> latter). also you should make sure that the CPU memory
> >>>>>>>>> >> bandwidth is not crippled (they come in different speeds).
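> >>>>>>>>> >>
> >>>>>>>>> >> a quick way to see what link each gpu actually negotiated
> >>>>>>>>> >> is lspci; a minimal sketch (the bus ids below are
> >>>>>>>>> >> placeholders and will differ on your board):
> >>>>>>>>> >>
> >>>>>>>>> >>   lspci | grep -i nvidia            # find the gpu bus ids
> >>>>>>>>> >>   sudo lspci -vv -s 02:00.0 | grep LnkSta
> >>>>>>>>> >>   # "Speed 5GT/s" means PCIe 2.0, "8GT/s" PCIe 3.0;
> >>>>>>>>> >>   # "Width x16" is a full-width link, "x8" an electrically narrower one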
> >>>>>>>>> >>
> >>>>>>>>> >> in short, there is no clear cut answer. many things depend
> on
> >>>>>>>>> >> what
> >>>>>>>>> >> *else* you want to do with the machine and there are many
> >>>>>>>>> >> personal
> >>>>>>>>> >> opinions that people are not 100% agreed upon. if you ask
> >>>>>>>>> >> simply, is
> >>>>>>>>> >> the performance of a tesla worth 3x the price (or more in
> the
> >>>>>>>>> >> case of
> >>>>>>>>> >> a K20), my personal opinion is "not at all", but i might
> still
> >>>>>>>>> >> buy
> >>>>>>>>> >> one, in case i come across an application and workflow
> that
> >>>>>>>>> >> benefits
> >>>>>>>>> >> from it.
> >>>>>>>>> >>
> >>>>>>>>> >> axel.
> >>>>>>>>> >> >
> >>>>>>>>> >> >
> >>>>>>>>> >> > Regards
> >>>>>>>>> >> >
> >>>>>>>>> >> > Srivastav Ranganathan
> >>>>>>>>> >> > Research Scholar
> >>>>>>>>> >> > IIT Bombay,
> >>>>>>>>> >> > Mumbai, India
> >>>>>>>>> >>
> >>>>>>>>> >>
> >>>>>>>>> >>
> >>>>>>>>> >> --
> >>>>>>>>> >> Dr. Axel Kohlmeyer akohlmey_at_gmail.com http://goo.gl/1wk0
> >>>>>>>>> >> International Centre for Theoretical Physics, Trieste.
> Italy.
> >>>>>>>>> >
> >>>>>>>>> >
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> --
> >>>>>>>>> Dr. Axel Kohlmeyer akohlmey_at_gmail.com http://goo.gl/1wk0
> >>>>>>>>> International Centre for Theoretical Physics, Trieste. Italy.
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> --
> >>>>>>>> Aron Broom M.Sc
> >>>>>>>> PhD Student
> >>>>>>>> Department of Chemistry
> >>>>>>>> University of Waterloo
> >>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> --
> >>>>>> Aron Broom M.Sc
> >>>>>> PhD Student
> >>>>>> Department of Chemistry
> >>>>>> University of Waterloo
> >>>>>
> >>>>>
> >>>>
> >>>
> >>
> >
>
>
>
> --
> Dr. Axel Kohlmeyer akohlmey_at_gmail.com http://goo.gl/1wk0
> International Centre for Theoretical Physics, Trieste. Italy.

This archive was generated by hypermail 2.1.6 : Tue Dec 31 2013 - 23:23:16 CST