Re: Suggestions while building a GPU-machine (CUDA) for NAMD use!

From: Axel Kohlmeyer (akohlmey_at_gmail.com)
Date: Tue May 28 2013 - 04:40:00 CDT

On Tue, May 28, 2013 at 11:04 AM, Aditya Ranganathan
<aditya.sia_at_gmail.com> wrote:
> Francesco, we aren't considering the GTX680 for a server motherboard.

why not? we have run a couple of GTX 580s on such a mainboard for quite a
while and they work extremely well.

> My
> question was with regards to an almost 3X difference in prices of GTX680
> (which we have installed on one of our lab's older servers) and the Tesla
> C2075 which is a server-grade GPU. I was curious to know if the 3X increase
> in price also corresponds to a similar increase in computing potency and the
> answer seems to be 'no'. I hope the investment is worthwhile in terms of
> reliability and scalability over multiple GPUs.

vendors will always push "server grade" hardware, but keep in mind
that most vendor "experts" you deal with don't have the faintest
clue about what they are selling. all they are telling you is what
they were told by the hardware vendor (in this case nvidia), based
on the principle of "more expensive must be better". also, while there
will be no official confirmation of this, vendors are usually under
pressure to sell "server grade" hardware or else they may lose their
"preferred partner" status, i.e. won't get as deep discounts from the
hardware vendor as their competition.

*never* trust what a vendor, let alone a sales person, tells you. they
are in the business of making money, and they make more money off the
more expensive hardware. you have to look at it a bit differently:
at the 3x price difference, your GPU can break twice without you
getting a replacement and you will still come out ahead (since you can
purchase a faster/better one), or you can buy and run more hardware
for the same money (see the back-of-the-envelope sketch below). the
reliability factor matters if you run a huge number of GPUs and the
average failure rate is so high that the cost of having a person swap
broken GPUs for working ones becomes too high. for a couple of
workstations that sit in a corner somewhere, it doesn't matter. i buy
"server type" machines because they run in a data center and the extra
cost is easily offset by me not having to walk over to the data center
for every little piece of maintenance work: the server boards have
remote management via IPMI, so i can reboot, power on/off, configure
the bios, install, etc. from anywhere with a decent internet
connection, even halfway around the globe (a minimal example is
sketched below). that saves money, because i am more productive this
way (and my salary is decent). if i had a pile of (cheap) students
working for me that i could send over to do these kinds of tasks at
any time, the bottom line would be different and it would pay to go
for (much) cheaper hardware, even if it fails more often.
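
to make the 3x argument concrete, here is a quick back-of-the-envelope
sketch in python. the prices are assumptions for illustration only, not
quotes; plug in whatever your vendor actually charges:

# back-of-the-envelope comparison: one "server grade" GPU vs. several
# consumer GPUs at roughly 1/3 the price. prices below are assumptions
# for illustration, not actual quotes.

TESLA_PRICE = 2400.0    # assumed price of one Tesla C2075 (USD)
GEFORCE_PRICE = 800.0   # assumed price of one GTX 680 (USD), ~1/3 of the Tesla

budget = TESLA_PRICE
consumer_cards = int(budget // GEFORCE_PRICE)  # cards the same money buys
print("the tesla budget buys %d consumer cards" % consumer_cards)

# even if all but one of those cards die and are simply thrown away
# (no warranty replacement at all), you still have a working GPU for
# the same money -- i.e. you can absorb this many failures:
print("failures you can absorb: %d" % (consumer_cards - 1))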
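
and this is roughly what the IPMI remote management buys you. the sketch
below just wraps the stock ipmitool utility from python; hostname and
credentials are placeholders, and it assumes ipmitool is installed and
the board's BMC is reachable over the network:

# minimal remote management sketch: query or power-cycle a compute node
# through its BMC with ipmitool, from anywhere with network access.
import subprocess

BMC_HOST = "node01-ipmi.example.org"   # hypothetical BMC address
BMC_USER = "admin"                     # hypothetical credentials
BMC_PASS = "secret"

def ipmi(*args):
    # run one ipmitool command against the node's BMC over the lanplus interface
    cmd = ["ipmitool", "-I", "lanplus",
           "-H", BMC_HOST, "-U", BMC_USER, "-P", BMC_PASS] + list(args)
    return subprocess.check_output(cmd).decode()

if __name__ == "__main__":
    print(ipmi("chassis", "power", "status"))    # is the node powered on?
    # ipmi("chassis", "power", "cycle")          # hard reset a hung node
    # ipmi("chassis", "power", "on")             # power it back up afterwards

serial-over-LAN ("ipmitool ... sol activate") then gets you at the bios
setup and the boot console without ever leaving your desk.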

in short: knowledge is power, and all the more so in a business where
hype and marketing dominate over facts.

axel.

>
> I hope it's clearer now.
>
> Thanks and Regards
>
> Aditya
>
>
> On Tue, May 28, 2013 at 1:24 PM, Francesco Pietra <chiendarret_at_gmail.com>
> wrote:
>>
>> Hi Aditya:
>>
>> OK, server motherboard. Expensive, but confirming that to have PCIE 3.0
>> one cannot stick to consumer motherboards, for the time being. Anyway, why
>> do you mention a consumer GTX 680 for a server motherboard? Ask hardware
>> experts (I am not one). It seems a bit of a mismatch. In any event, be
>> careful as to the 680 brand. I have posted before that changing brand gave
>> me a lot of problems.
>>
>> Also, ask hardware experts whether the price of such a server GPU is
>> worthwhile. Would it not be better to go for a small CPU cluster? With GPUs
>> you are still quite limited. For example, no QM/MM. And GPUs are fragile,
>> and hard-to-diagnose, tools (see my previous post).
>>
>> regards
>> francesco
>>
>>
>> On Tue, May 28, 2013 at 9:23 AM, Aditya Ranganathan <aditya.sia_at_gmail.com>
>> wrote:
>>>
>>> The motherboard that we are planning to install the GPUs on would be an
>>> Intel Workstation/Server Motherboard W2600 CR2 with Intel E5 2620 6 Core
>>> Xeon processors. It's not a consumer motherboard. Does that sound reasonable?
>>> I'm a novice at this, so it might help to get more insight into this.
>>>
>>>
>>> On Tue, May 28, 2013 at 11:25 AM, Francesco Pietra
>>> <chiendarret_at_gmail.com> wrote:
>>>>
>>>>
>>>> Sorry, I forgot the forum.
>>>>
>>>> On Tue, May 28, 2013 at 7:39 AM, Francesco Pietra
>>>> <chiendarret_at_gmail.com> wrote:
>>>>>
>>>>> Hi Aron:
>>>>> Thanks for the illustration of C-2075.
>>>>>
>>>>> In my opinion/experience the very point is the PCIE 2.0 of current consumer
>>>>> mainboards. I would like to know why such a bottleneck was not corrected.
>>>>> GPUs at PCIE 3.0 (or stated to be so) have been available for a rather long
>>>>> time. Why have mainboards not been brought to PCIE 3.0? Unless Aditya has
>>>>> found the right mainboard. We will see.
>>>>>
>>>>> With NAMD, I had two Zotac GTX 580s under AMD CPUs, with no problems in a
>>>>> couple of years. Then I increased the MD speed with two MSI GTX 680s under
>>>>> Intel CPUs. In both cases, 6 CPU cores per two GTXs, mainboard PCIE 2.0;
>>>>> nonetheless, not much improvement from the second GTX. The MSI played
>>>>> correctly, with a twofold increase in speed with respect to the 580 on a
>>>>> 200,000-atom system. After a few months, problems (not understood) arose with
>>>>> one of the MSIs. It plays games, but it hangs under NAMD on computations
>>>>> longer than 100,000 steps at ts = 1 fs, especially so when coupled to the
>>>>> other MSI. Exchanging the two MSIs between their sockets always pointed to
>>>>> failure of the same MSI, so I concluded that the mainboard is not
>>>>> responsible. I am in a fight with the vendor, who claims that such GTXs are
>>>>> for games. I asked for a replacement with a Zotac equivalent, paying the
>>>>> price difference. The fight continues.
>>>>>
>>>>> As to server mainboards, I have old information concerning a board
>>>>> from Supermicro, which, however, had a single socket at PCIE 3.0. Moreover,
>>>>> as is well known, most MD code does not gain from server hardware, or not
>>>>> so much as to justify the price difference.
>>>>>
>>>>> francesco
>>>>>
>>>>>
>>>>> On Mon, May 27, 2013 at 7:50 PM, Aron Broom <broomsday_at_gmail.com>
>>>>> wrote:
>>>>>>
>>>>>> Hi Francesco,
>>>>>>
>>>>>> You're right, the memory isn't really much of a selling point for
>>>>>> current MD (it once was an issue in AMBER, but I think that has been reduced
>>>>>> greatly).
>>>>>>
>>>>>> Really the main point was that if the C-2075 was a price option and
>>>>>> being compared against the 680 purely for performance, then one might want
>>>>>> to also consider the titan as part of the price/performance comparison.
>>>>>>
>>>>>> ~Aron
>>>>>>
>>>>>>
>>>>>> On Mon, May 27, 2013 at 9:47 AM, Francesco Pietra
>>>>>> <chiendarret_at_gmail.com> wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Mon, May 27, 2013 at 3:09 PM, Aron Broom <broomsday_at_gmail.com>
>>>>>>> wrote:
>>>>>>>>
>>>>>>>> as Axel suggested, in terms of just performance, the C-2075 will be
>>>>>>>> about the same as a GTX480 in most cases. So from just a performance
>>>>>>>> standpoint the GTX680 will generally be better.
>>>>>>>>
>>>>>>>> I'm not sure how much a C-2075 costs currently, but if as you say
>>>>>>>> you are getting a PCIe 3.0 board, why not buy a Titan? You'll have even
>>>>>>>> better performance than the 680 and huge memory (6GB).
>>>>>>>
>>>>>>>
>>>>>>> Aron:
>>>>>>> How much memory is used by a 680 GPU on a consumer motherboard
>>>>>>> (i.e., PCIE 2.0) for a protein system of common size in explicit water,
>>>>>>> i.e., 200,000 atoms? Either single or multiple GPUs. More than a few
>>>>>>> hundred MB? If more than that, how did you manage to accomplish that with
>>>>>>> NAMD2.9-CUDA4.0?
>>>>>>>
>>>>>>> thanks
>>>>>>> francesco
>>>>>>>
>>>>>>>
>>>>>>>> Of course the memory quality issues compared to a K20x that Axel
>>>>>>>> brought up still exist, but if performance is your only concern...
>>>>>>>>
>>>>>>>> ~Aron
>>>>>>>>
>>>>>>>>
>>>>>>>> On Mon, May 27, 2013 at 5:58 AM, Axel Kohlmeyer <akohlmey_at_gmail.com>
>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>> On Mon, May 27, 2013 at 11:35 AM, Aditya Ranganathan
>>>>>>>>> <aditya.sia_at_gmail.com> wrote:
>>>>>>>>> > @Francesco, we are planning to buy a PCI-express 3.0 supported
>>>>>>>>> > board. @Alex: Thanks Alex for the comprehensive walkthrough on this
>>>>>>>>> > issue. We aim at building this machine solely for performing
>>>>>>>>> > classical MD simulations using NAMD. Reliability and scaling-up
>>>>>>>>> > issues of a GeForce card like the GTX680 are what was cited as a
>>>>>>>>> > possible disadvantage by our computer vendor while he was suggesting
>>>>>>>>> > the Tesla as an option.
>>>>>>>>>
>>>>>>>>> > I haven't been able to get any clear benchmarks for the TESLA
>>>>>>>>> > C-2075 as of
>>>>>>>>> > yet. Most of the benchmarks seem to revolve around the Kepler
>>>>>>>>> > series of
>>>>>>>>> > cards. If anyone is aware of those, please lead me to the NAMD
>>>>>>>>> > benchmarks on
>>>>>>>>> > TESLA C-2075.
>>>>>>>>>
>>>>>>>>> look for benchmarks of the C-2050. the C-2075 is a tad faster. they
>>>>>>>>> are two different revisions of the fermi chip. the difference is
>>>>>>>>> similar to what a GTX 480 is to a GTX 580 (which are the corresponding
>>>>>>>>> consumer models). mind you, with the fermi generation, the consumer
>>>>>>>>> cards were more similar to the tesla cards than they are now. only the
>>>>>>>>> GeForce TITAN has a similar (or better?) relationship to the tesla K20.
>>>>>>>>> even the recently released GTX 780 has been deliberately "crippled" to
>>>>>>>>> massively reduce double precision floating point performance.
>>>>>>>>>
>>>>>>>>> the problem with the C-2075 is that it is using an already outdated
>>>>>>>>> architecture (with GPUs, architectures change fast) which is as
>>>>>>>>> different from a kepler chip as perhaps an intel pentium 4 is from a
>>>>>>>>> current (ivy bridge) based intel i7 cpu.
>>>>>>>>>
>>>>>>>>> axel.
>>>>>>>>>
>>>>>>>>> >
>>>>>>>>> >
>>>>>>>>> >
>>>>>>>>> > On Mon, May 27, 2013 at 2:27 PM, Axel Kohlmeyer
>>>>>>>>> > <akohlmey_at_gmail.com> wrote:
>>>>>>>>> >>
>>>>>>>>> >> On Mon, May 27, 2013 at 10:14 AM, Aditya Ranganathan
>>>>>>>>> >> <aditya.sia_at_gmail.com> wrote:
>>>>>>>>> >> > Hello All,
>>>>>>>>> >> >
>>>>>>>>> >> > We are pondering over investing on a GPU based machine for
>>>>>>>>> >> > running NAMD
>>>>>>>>> >> > simulations (all-atom). Currently, we are stuck with a dilemma
>>>>>>>>> >> > over the
>>>>>>>>> >> > choice of card for CUDA computing. We already have a GTX 680
>>>>>>>>> >> > which gives
>>>>>>>>> >> > us
>>>>>>>>> >> > about 3ns/day for a 100000 atom system using a single GPU card
>>>>>>>>> >> > and 8 cpu
>>>>>>>>> >> > cores.
>>>>>>>>> >> >
>>>>>>>>> >> > Now, we are planning to build a GPU machine with 4 GPU cards
>>>>>>>>> >> > (either
>>>>>>>>> >> > Tesla
>>>>>>>>> >> > C-2075C, 6GB GDDR5 or the NVIDIA GTX 680). The base system
>>>>>>>>> >> > would
>>>>>>>>> >> > consists of
>>>>>>>>> >> > a 6-core Intel Xeon E5 2620 processor, 64GB DDR3 RAM and a 2TB
>>>>>>>>> >> > Hard
>>>>>>>>> >> > Drive.
>>>>>>>>> >> >
>>>>>>>>> >> > Has anyone in the community used the Tesla series of cards
>>>>>>>>> >> > with NAMD and
>>>>>>>>> >> > compared its benchmarks (scalability etc) with a entry level
>>>>>>>>> >> > card like
>>>>>>>>> >> > GTX
>>>>>>>>> >> > 680. The cost of the Tesla is almost 3 times that of the
>>>>>>>>> >> > GTX680. Does
>>>>>>>>> >> > its
>>>>>>>>> >> > performance justify its price?
>>>>>>>>> >>
>>>>>>>>> >> GTX 680 is not exactly "entry" level (more upper mid level) and
>>>>>>>>> >> you
>>>>>>>>> >> can't compare GPUs like that. you basically have different "chip
>>>>>>>>> >> families" and different "chip generations" GTX 680 is based on
>>>>>>>>> >> the
>>>>>>>>> >> "Kepler" generation, as are the Tesla K10 and Tesla K20, the
>>>>>>>>> >> C2075
>>>>>>>>> >> however is based on the previous generation called "Fermi". Now
>>>>>>>>> >> GeForce cards are usually spec'd rather aggressively and for use
>>>>>>>>> >> in
>>>>>>>>> >> video games and not for reliability in computing (which doesn't
>>>>>>>>> >> mean
>>>>>>>>> >> they are unreliable, only that the vendors take a higher risk
>>>>>>>>> >> for
>>>>>>>>> >> lowering production costs and raising game performance).
>>>>>>>>> >>
>>>>>>>>> >> Also on GeForce cards certain functionality is not available
>>>>>>>>> >> (for
>>>>>>>>> >> example ECC memory configuration) or only in a very limited way
>>>>>>>>> >> (for
>>>>>>>>> >> example double precision floating point math). Also, support
>>>>>>>>> >> through
>>>>>>>>> >> the nvidia-smi utility is limited. On the other hand, Tesla GPUs
>>>>>>>>> >> do
>>>>>>>>> >> have all of these benefits and also use "certified" and tested
>>>>>>>>> >> hardware components, often more RAM and have better warranty
>>>>>>>>> >> deals.
>>>>>>>>> >> All of this and the fact that they are produced and sold in
>>>>>>>>> >> smaller
>>>>>>>>> >> quantities result in higher costs.
>>>>>>>>> >>
>>>>>>>>> >> So whether the Tesla GPUs are worth the price or not depends on
>>>>>>>>> >> what
>>>>>>>>> >> you are looking for in a GPU. Classical MD can function very
>>>>>>>>> >> well with
>>>>>>>>> >> just limited double precision performance, since most of the
>>>>>>>>> >> force
>>>>>>>>> >> calculation can be done in single precision with only a small
>>>>>>>>> >> loss of
>>>>>>>>> >> accuracy (and would otherwise be similarly offloaded to SSE and AVX
>>>>>>>>> >> vector instructions). Also the performance of classical MD is
>>>>>>>>> >> often as
>>>>>>>>> >> much dominated by memory bandwidth (looking up pairs of
>>>>>>>>> >> particles
>>>>>>>>> >> through the neighbor lists) as it is by compute
>>>>>>>>> >> performance. the
>>>>>>>>> >> fastest GeForce type GPUs often outperform the fastest Tesla
>>>>>>>>> >> cards of
>>>>>>>>> >> the same generation in classical MD due to their higher clocks
>>>>>>>>> >> and
>>>>>>>>> >> higher memory bandwidth. However, if you would also run
>>>>>>>>> >> applications
>>>>>>>>> >> that are dependent on double precision floating point, or prefer
>>>>>>>>> >> a low
>>>>>>>>> >> risk and better management and are willing to pay the extra
>>>>>>>>> >> price for that,
>>>>>>>>> >> then the Tesla would be it.
>>>>>>>>> >>
>>>>>>>>> >> mind you, the Tesla K10 is a special beast in this zoo, since it
>>>>>>>>> >> is
>>>>>>>>> >> effectively a pimped up GeForce GTX690.
>>>>>>>>> >>
>>>>>>>>> >> > Any suggestions from the community would be greatly
>>>>>>>>> >> > appreciated.
>>>>>>>>> >>
>>>>>>>>> >> multi-gpu machines are tricky business. you have to pay great
>>>>>>>>> >> attention to the chipset and how many full-width PCI-e slots are
>>>>>>>>> >> supported. for a 4-GPU machine, you usually need two CPUs and
>>>>>>>>> >> two
>>>>>>>>> >> southbridges (two GPUs per socket). some boards have only one
>>>>>>>>> >> southbridge and then support more full width PCI-e slots via
>>>>>>>>> >> PCIe
>>>>>>>>> >> bridge chips. those add a little latency and - when you use all
>>>>>>>>> >> GPUs
>>>>>>>>> >> at the same time - two GPUs have to share the bandwidth. since
>>>>>>>>> >> the
>>>>>>>>> >> host to GPU bandwidth affects NAMD performance, you have to test
>>>>>>>>> >> whether in that case a single 4 GPU machine or two machines with
>>>>>>>>> >> 2
>>>>>>>>> >> GPUs each are the better option (probably the latter). also you
>>>>>>>>> >> should
>>>>>>>>> >> make sure that the CPU memory bandwidth is not crippled (they
>>>>>>>>> >> come in
>>>>>>>>> >> different speeds).
>>>>>>>>> >>
>>>>>>>>> >> in short, there is no clear cut answer. many things depend on
>>>>>>>>> >> what
>>>>>>>>> >> *else* you want to do with the machine and there are many
>>>>>>>>> >> personal
>>>>>>>>> >> opinions that people are not 100% agreed upon. if you ask
>>>>>>>>> >> simply, is
>>>>>>>>> >> the performance of a tesla worth 3x the price (or more in the
>>>>>>>>> >> case of
>>>>>>>>> >> a K20), my personal opinion is "not at all", but i might still
>>>>>>>>> >> buy
>>>>>>>>> >> one, in case i come across an application and workflow that
>>>>>>>>> >> benefits
>>>>>>>>> >> from it.
>>>>>>>>> >>
>>>>>>>>> >> axel.
>>>>>>>>> >> >
>>>>>>>>> >> >
>>>>>>>>> >> > Regards
>>>>>>>>> >> >
>>>>>>>>> >> > Srivastav Ranganathan
>>>>>>>>> >> > Research Scholar
>>>>>>>>> >> > IIT Bombay,
>>>>>>>>> >> > Mumbai, India
>>>>>>>>> >>
>>>>>>>>> >>
>>>>>>>>> >>
>>>>>>>>> >> --
>>>>>>>>> >> Dr. Axel Kohlmeyer akohlmey_at_gmail.com http://goo.gl/1wk0
>>>>>>>>> >> International Centre for Theoretical Physics, Trieste. Italy.
>>>>>>>>> >
>>>>>>>>> >
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Dr. Axel Kohlmeyer akohlmey_at_gmail.com http://goo.gl/1wk0
>>>>>>>>> International Centre for Theoretical Physics, Trieste. Italy.
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Aron Broom M.Sc
>>>>>>>> PhD Student
>>>>>>>> Department of Chemistry
>>>>>>>> University of Waterloo
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Aron Broom M.Sc
>>>>>> PhD Student
>>>>>> Department of Chemistry
>>>>>> University of Waterloo
>>>>>
>>>>>
>>>>
>>>
>>
>

--
Dr. Axel Kohlmeyer  akohlmey_at_gmail.com  http://goo.gl/1wk0
International Centre for Theoretical Physics, Trieste. Italy.
