Re: Suggestions while building a GPU-machine (CUDA) for NAMD use!

From: Aditya Ranganathan (aditya.sia_at_gmail.com)
Date: Tue May 28 2013 - 04:04:18 CDT

Francesco, we aren't considering the GTX 680 for a server motherboard. My
question was about the almost 3X difference in price between the GTX 680
(which we have installed on one of our lab's older servers) and the Tesla
C2075, which is a server-grade GPU. I was curious to know whether the 3X
increase in price also corresponds to a similar increase in computing
power, and the answer seems to be 'no'. I hope the investment is worthwhile
in terms of reliability and scalability over multiple GPUs.
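
As a rough way of framing the comparison, here is a back-of-the-envelope
sketch in Python; the dollar prices and the C2075 throughput are placeholder
assumptions (only the 3 ns/day GTX 680 figure comes from our own runs), so
treat it as an illustration rather than a benchmark:

    # placeholder prices (USD) and throughput (ns/day, ~100k-atom system)
    cards = {
        "GTX 680":     (500.0, 3.0),   # assumed price; 3 ns/day measured on our box
        "Tesla C2075": (1500.0, 2.0),  # assumed ~3x price and Fermi-class throughput
    }

    for name, (price, ns_per_day) in cards.items():
        print(f"{name:12s} {ns_per_day / price * 1000:.1f} ns/day per 1000 USD")

On these placeholder numbers the GeForce comes out well ahead per unit of
cost, which matches the answer above.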

I hope it's clearer now.

Thanks and Regards

Aditya

On Tue, May 28, 2013 at 1:24 PM, Francesco Pietra <chiendarret_at_gmail.com>wrote:

> Hi Aditya:
>
> OK, a server motherboard. Expensive, but it confirms that, for the time
> being, one cannot stick to consumer motherboards to get PCIe 3.0. Anyway,
> why do you mention a consumer GTX 680 for a server motherboard? Ask
> hardware experts (I am not one); it seems a bit of a mismatch. At any
> rate, be careful about the brand of the 680. I have posted before that
> changing brands gave me a lot of problems.
>
> Also, ask hardware experts whether the price of such a server GPU is
> worthwhile. Would it not be better to go for a small CPU cluster? With
> GPUs you are still much limited; for example, no QM/MM. And GPUs are
> fragile, and hard to diagnose, tools (see my previous post).
>
> regards
> francesco
>
>
> On Tue, May 28, 2013 at 9:23 AM, Aditya Ranganathan <aditya.sia_at_gmail.com>wrote:
>
>> The motherboard we are planning to install the GPUs on would be an Intel
>> Workstation/Server Motherboard W2600 CR2 with Intel Xeon E5-2620 6-core
>> processors. It's not a consumer motherboard. Does that sound reasonable?
>> I'm a novice at this, so it might help to get more insight into this.
>>
>>
>> On Tue, May 28, 2013 at 11:25 AM, Francesco Pietra <chiendarret_at_gmail.com
>> > wrote:
>>
>>>
>>> Sorry, I forgot the forum.
>>>
>>> On Tue, May 28, 2013 at 7:39 AM, Francesco Pietra <chiendarret_at_gmail.com
>>> > wrote:
>>>
>>>> Hi Aron:
>>>> Thanks for the illustration of C-2075.
>>>>
>>>> In my opinion/experience, the key point is the PCIe 2.0 of current
>>>> consumer mainboards. I would like to know why such a bottleneck has not
>>>> been corrected. GPUs at PCIe 3.0 (or stated to be so) have been available
>>>> for quite a long time; why have mainboards not been brought to PCIe 3.0?
>>>> Unless Aditya has found the right mainboard. We will see.
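>>>>
>>>> A quick way to see what a board actually negotiates, as a rough sketch
>>>> assuming nvidia-smi is installed and the driver exposes its PCIe query
>>>> fields, would be:
>>>>
>>>>     import subprocess
>>>>
>>>>     # ask the driver for each GPU's current PCIe generation and link width
>>>>     out = subprocess.check_output([
>>>>         "nvidia-smi",
>>>>         "--query-gpu=name,pcie.link.gen.current,pcie.link.width.current",
>>>>         "--format=csv,noheader",
>>>>     ]).decode()
>>>>     print(out)  # e.g. "GeForce GTX 680, 2, 16" on a PCIe 2.0 x16 slot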
>>>>
>>>> With NAMD, I had two Zotac GTX 580s under AMD CPUs, with no problems over
>>>> a couple of years. I then increased the MD speed with two MSI GTX 680s
>>>> under an Intel CPU. In both cases there were 6 CPU cores per two GTXs on a
>>>> PCIe 2.0 mainboard; nonetheless, there was not much improvement from the
>>>> second GTX. The MSIs performed correctly, with a twofold increase in speed
>>>> with respect to the 580s on a 200,000-atom system. After a few months,
>>>> problems (not understood) arose with one of the MSIs. It plays games, but
>>>> it hangs under NAMD on computations longer than 100,000 steps at ts = 1 fs,
>>>> especially when coupled to the other MSI. Exchanging the two MSIs between
>>>> their sockets always pointed to failure of the same MSI, so I concluded
>>>> that the mainboard is not responsible. I am in a fight with the vendor,
>>>> who claims that such GTXs are for games. I asked for a replacement with
>>>> the Zotac equivalent, paying the price difference. Still fighting.
>>>>
>>>> As to server mainboards, I have only old information, concerning a board
>>>> from Supermicro which, however, had a single PCIe 3.0 socket. Moreover, as
>>>> is well known, most MD code does not gain from server hardware, or not
>>>> enough to justify the price difference.
>>>>
>>>> francesco
>>>>
>>>>
>>>> On Mon, May 27, 2013 at 7:50 PM, Aron Broom <broomsday_at_gmail.com>wrote:
>>>>
>>>>> Hi Francesco,
>>>>>
>>>>> You're right, the memory isn't really much of a selling point for
>>>>> current MD (it once was an issue in AMBER, but I think that has been
>>>>> reduced greatly).
>>>>>
>>>>> Really, the main point was that if the C-2075 is an option at its price
>>>>> and is being compared against the 680 purely on performance, then one
>>>>> might also want to consider the Titan as part of the price/performance
>>>>> comparison.
>>>>>
>>>>> ~Aron
>>>>>
>>>>>
>>>>> On Mon, May 27, 2013 at 9:47 AM, Francesco Pietra <
>>>>> chiendarret_at_gmail.com> wrote:
>>>>>
>>>>>>
>>>>>>
>>>>>> On Mon, May 27, 2013 at 3:09 PM, Aron Broom <broomsday_at_gmail.com>wrote:
>>>>>>
>>>>>>> as Axel suggested, in terms of just performance, the C-2075 will be
>>>>>>> about the same as a GTX 480 in most cases. So from just a performance
>>>>>>> standpoint, the GTX 680 will generally be better.
>>>>>>>
>>>>>>> I'm not sure how much a C-2075 costs currently, but if, as you say, you
>>>>>>> are getting a PCIe 3.0 board, why not buy a Titan? You'll have even
>>>>>>> better performance than the 680 and huge memory (6GB).
>>>>>>>
>>>>>>
>>>>>> Aron:
>>>>>> How much memory is used by a 680 GPU on a consumer motherboard (i.e.,
>>>>>> PCIe 2.0) for a protein system of common size in explicit water, i.e.,
>>>>>> 200,000 atoms? Either single or multiple GPUs. More than a few hundred
>>>>>> MB? If more than that, how did you manage to accomplish that with
>>>>>> NAMD 2.9 / CUDA 4.0?
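>>>>>>
>>>>>> The only way I can think of to check the actual usage myself, assuming
>>>>>> nvidia-smi reports memory use for these GeForce cards, would be to poll
>>>>>> it while NAMD runs, along the lines of:
>>>>>>
>>>>>>     import subprocess, time
>>>>>>
>>>>>>     # poll used GPU memory once per minute during a running simulation
>>>>>>     for _ in range(10):
>>>>>>         used = subprocess.check_output(
>>>>>>             ["nvidia-smi", "--query-gpu=memory.used", "--format=csv,noheader"]
>>>>>>         ).decode().strip()
>>>>>>         print(used)  # e.g. "350 MiB"
>>>>>>         time.sleep(60)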
>>>>>>
>>>>>> thanks
>>>>>> francesco
>>>>>>
>>>>>>
>>>>>> Of course the memory quality issues compared to a K20x that Axel
>>>>>>> brought up still exist, but if performance is your only concern...
>>>>>>>
>>>>>>> ~Aron
>>>>>>>
>>>>>>>
>>>>>>> On Mon, May 27, 2013 at 5:58 AM, Axel Kohlmeyer <akohlmey_at_gmail.com>wrote:
>>>>>>>
>>>>>>>> On Mon, May 27, 2013 at 11:35 AM, Aditya Ranganathan
>>>>>>>> <aditya.sia_at_gmail.com> wrote:
>>>>>>>> > @Francesco, we are planning to buy a PCI-Express 3.0 supported
>>>>>>>> > board. @Axel: Thanks, Axel, for the comprehensive walkthrough on
>>>>>>>> > this issue. We aim to build this machine solely for performing
>>>>>>>> > classical MD simulations using NAMD. Reliability and scaling-up
>>>>>>>> > issues of a GeForce card like the GTX 680 are what our computer
>>>>>>>> > vendor cited as a possible disadvantage while he was suggesting
>>>>>>>> > the Tesla as an option.
>>>>>>>> >
>>>>>>>> > I haven't been able to get any clear benchmarks for the Tesla
>>>>>>>> > C-2075 as of yet. Most of the benchmarks seem to revolve around
>>>>>>>> > the Kepler series of cards. If anyone is aware of any, please
>>>>>>>> > point me to the NAMD benchmarks on the Tesla C-2075.
>>>>>>>>
>>>>>>>> look for benchmarks of the C-2050; the C-2075 is a tad faster. they
>>>>>>>> are two different revisions of the fermi chip. the difference is
>>>>>>>> similar to what a GTX 480 is to a GTX 580 (which are the corresponding
>>>>>>>> consumer models). mind you, with the fermi generation, the consumer
>>>>>>>> cards were more similar to the tesla cards than they are now. only the
>>>>>>>> GeForce TITAN has a similar (or better?) relationship to the tesla
>>>>>>>> K20. even the recently released GTX 780 has been deliberately
>>>>>>>> "crippled" to massively reduce double precision floating point
>>>>>>>> performance.
>>>>>>>>
>>>>>>>> the problem with the C-2075 is that it uses an already outdated
>>>>>>>> architecture (with GPUs, architectures change fast), which is about as
>>>>>>>> different from a kepler chip as an intel pentium 4 is from a current
>>>>>>>> (ivy bridge) intel i7 cpu.
>>>>>>>>
>>>>>>>> axel.
>>>>>>>>
>>>>>>>> >
>>>>>>>> >
>>>>>>>> >
>>>>>>>> > On Mon, May 27, 2013 at 2:27 PM, Axel Kohlmeyer <
>>>>>>>> akohlmey_at_gmail.com> wrote:
>>>>>>>> >>
>>>>>>>> >> On Mon, May 27, 2013 at 10:14 AM, Aditya Ranganathan
>>>>>>>> >> <aditya.sia_at_gmail.com> wrote:
>>>>>>>> >> > Hello All,
>>>>>>>> >> >
>>>>>>>> >> > We are pondering investing in a GPU-based machine for running
>>>>>>>> >> > NAMD simulations (all-atom). Currently, we are stuck with a
>>>>>>>> >> > dilemma over the choice of card for CUDA computing. We already
>>>>>>>> >> > have a GTX 680, which gives us about 3 ns/day for a 100,000-atom
>>>>>>>> >> > system using a single GPU card and 8 CPU cores.
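>>>>>>>> >> >
>>>>>>>> >> > For reference, ns/day can be worked out from the seconds-per-step
>>>>>>>> >> > that NAMD prints in its "Benchmark time" lines; a small sketch of
>>>>>>>> >> > the arithmetic (the 2 fs timestep and s/step value below are only
>>>>>>>> >> > illustrative):
>>>>>>>> >> >
>>>>>>>> >> >     # illustrative numbers: 2 fs timestep, 0.0576 s/step taken
>>>>>>>> >> >     # from the "Info: Benchmark time:" line of a NAMD log
>>>>>>>> >> >     timestep_fs = 2.0
>>>>>>>> >> >     seconds_per_step = 0.0576
>>>>>>>> >> >     ns_per_day = timestep_fs * 1e-6 / seconds_per_step * 86400
>>>>>>>> >> >     print(round(ns_per_day, 1))   # -> 3.0 ns/day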
>>>>>>>> >> >
>>>>>>>> >> > Now, we are planning to build a GPU machine with 4 GPU cards
>>>>>>>> >> > (either the Tesla C-2075 with 6GB GDDR5, or the NVIDIA GTX 680).
>>>>>>>> >> > The base system would consist of a 6-core Intel Xeon E5-2620
>>>>>>>> >> > processor, 64GB DDR3 RAM and a 2TB hard drive.
>>>>>>>> >> >
>>>>>>>> >> > Has anyone in the community used the Tesla series of cards with
>>>>>>>> >> > NAMD and compared their benchmarks (scalability etc.) with an
>>>>>>>> >> > entry-level card like the GTX 680? The cost of the Tesla is
>>>>>>>> >> > almost 3 times that of the GTX 680. Does its performance justify
>>>>>>>> >> > its price?
>>>>>>>> >>
>>>>>>>> >> GTX 680 is not exactly "entry" level (more upper mid level), and
>>>>>>>> >> you can't compare GPUs like that. you basically have different
>>>>>>>> >> "chip families" and different "chip generations". GTX 680 is based
>>>>>>>> >> on the "Kepler" generation, as are the Tesla K10 and Tesla K20; the
>>>>>>>> >> C2075, however, is based on the previous generation called "Fermi".
>>>>>>>> >> Now, GeForce cards are usually spec'd rather aggressively and for
>>>>>>>> >> use in video games, not for reliability in computing (which doesn't
>>>>>>>> >> mean they are unreliable, only that the vendors take a higher risk
>>>>>>>> >> to lower production costs and raise game performance).
>>>>>>>> >>
>>>>>>>> >> Also, on GeForce cards certain functionality is not available (for
>>>>>>>> >> example, ECC memory configuration) or is available only in a very
>>>>>>>> >> limited way (for example, double precision floating point math).
>>>>>>>> >> Also, support through the nvidia-smi utility is limited. On the
>>>>>>>> >> other hand, Tesla GPUs do have all of these benefits and also use
>>>>>>>> >> "certified" and tested hardware components, often have more RAM,
>>>>>>>> >> and have better warranty deals. All of this, and the fact that they
>>>>>>>> >> are produced and sold in smaller quantities, results in higher
>>>>>>>> >> costs.
>>>>>>>> >>
>>>>>>>> >> So whether the Tesla GPUs are worth the price or not depends on
>>>>>>>> >> what you are looking for in a GPU. Classical MD can function very
>>>>>>>> >> well with only limited double precision performance, since most of
>>>>>>>> >> the force calculation can be done in single precision with only a
>>>>>>>> >> small loss of accuracy (and would otherwise similarly be offloaded
>>>>>>>> >> to SSE and AVX vector instructions). Also, the performance of
>>>>>>>> >> classical MD is often dominated as much by memory bandwidth
>>>>>>>> >> (looking up pairs of particles through the neighbor lists) as by
>>>>>>>> >> compute performance. The fastest GeForce-type GPUs often outperform
>>>>>>>> >> the fastest Tesla cards of the same generation in classical MD due
>>>>>>>> >> to their higher clocks and higher memory bandwidth. However, if you
>>>>>>>> >> would also run applications that depend on double precision
>>>>>>>> >> floating point, or prefer low risk and better management and are
>>>>>>>> >> willing to pay the extra price for that, then the Tesla would be
>>>>>>>> >> it.
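>>>>>>>> >>
>>>>>>>> >> to give a feel for the size of that loss, a toy sketch (not namd's
>>>>>>>> >> actual kernels, just an inverse-square term evaluated in single vs.
>>>>>>>> >> double precision with numpy):
>>>>>>>> >>
>>>>>>>> >>     import numpy as np
>>>>>>>> >>
>>>>>>>> >>     # pair distances typical of a nonbonded cutoff range (angstrom)
>>>>>>>> >>     r = np.linspace(2.0, 12.0, 1000)
>>>>>>>> >>     f64 = 1.0 / r.astype(np.float64) ** 2  # double precision reference
>>>>>>>> >>     f32 = 1.0 / r.astype(np.float32) ** 2  # single precision, as on the GPU
>>>>>>>> >>     # maximum relative error is on the order of 1e-7
>>>>>>>> >>     print(np.max(np.abs((f64 - f32) / f64)))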
>>>>>>>> >>
>>>>>>>> >> mind you, the Tesla K10 is a special beast in this zoo, since it is
>>>>>>>> >> effectively a pimped-up GeForce GTX 690.
>>>>>>>> >>
>>>>>>>> >> > Any suggestions from the community would be greatly appreciated.
>>>>>>>> >>
>>>>>>>> >> multi-GPU machines are tricky business. you have to pay great
>>>>>>>> >> attention to the chipset and how many full-width PCI-e slots are
>>>>>>>> >> supported. for a 4-GPU machine, you usually need two CPUs and two
>>>>>>>> >> southbridges (two GPUs per socket). some boards have only one
>>>>>>>> >> southbridge and then support more full-width PCI-e slots via PCIe
>>>>>>>> >> bridge chips; those add a little latency and - when you use all
>>>>>>>> >> GPUs at the same time - two GPUs have to share the bandwidth. since
>>>>>>>> >> the host-to-GPU bandwidth affects NAMD performance, you have to
>>>>>>>> >> test whether in that case a single 4-GPU machine or two machines
>>>>>>>> >> with 2 GPUs each are the better option (probably the latter). also,
>>>>>>>> >> you should make sure that the CPU memory bandwidth is not crippled
>>>>>>>> >> (they come in different speeds).
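>>>>>>>> >>
>>>>>>>> >> a minimal sketch of such a test, assuming a multicore CUDA build of
>>>>>>>> >> namd2 on the PATH (+p, +idlepoll and +devices are standard NAMD
>>>>>>>> >> options; the apoa1.namd input is just the usual benchmark,
>>>>>>>> >> substitute your own):
>>>>>>>> >>
>>>>>>>> >>     import subprocess
>>>>>>>> >>
>>>>>>>> >>     # run the same input on different GPU subsets of one box and
>>>>>>>> >>     # compare the s/step from the "Info: Benchmark time:" lines
>>>>>>>> >>     for devices in ("0,1", "0,1,2,3"):
>>>>>>>> >>         log = "bench_dev" + devices.replace(",", "") + ".log"
>>>>>>>> >>         with open(log, "w") as f:
>>>>>>>> >>             subprocess.run(["namd2", "+p12", "+idlepoll",
>>>>>>>> >>                             "+devices", devices, "apoa1.namd"],
>>>>>>>> >>                            stdout=f)
>>>>>>>> >>         for line in open(log):
>>>>>>>> >>             if "Benchmark time" in line:
>>>>>>>> >>                 print(devices, line.strip())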
>>>>>>>> >>
>>>>>>>> >> in short, there is no clear-cut answer. many things depend on what
>>>>>>>> >> *else* you want to do with the machine, and there are many personal
>>>>>>>> >> opinions that people are not 100% agreed upon. if you ask simply
>>>>>>>> >> whether the performance of a tesla is worth 3x the price (or more
>>>>>>>> >> in the case of a K20), my personal opinion is "not at all", but i
>>>>>>>> >> might still buy one, in case i come across an application and
>>>>>>>> >> workflow that benefits from it.
>>>>>>>> >>
>>>>>>>> >> axel.
>>>>>>>> >> >
>>>>>>>> >> >
>>>>>>>> >> > Regards
>>>>>>>> >> >
>>>>>>>> >> > Srivastav Ranganathan
>>>>>>>> >> > Research Scholar
>>>>>>>> >> > IIT Bombay,
>>>>>>>> >> > Mumbai, India
>>>>>>>> >>
>>>>>>>> >>
>>>>>>>> >>
>>>>>>>> >> --
>>>>>>>> >> Dr. Axel Kohlmeyer akohlmey_at_gmail.com http://goo.gl/1wk0
>>>>>>>> >> International Centre for Theoretical Physics, Trieste. Italy.
>>>>>>>> >
>>>>>>>> >
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Dr. Axel Kohlmeyer akohlmey_at_gmail.com http://goo.gl/1wk0
>>>>>>>> International Centre for Theoretical Physics, Trieste. Italy.
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Aron Broom M.Sc
>>>>>>> PhD Student
>>>>>>> Department of Chemistry
>>>>>>> University of Waterloo
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Aron Broom M.Sc
>>>>> PhD Student
>>>>> Department of Chemistry
>>>>> University of Waterloo
>>>>>
>>>>
>>>>
>>>
>>
>

This archive was generated by hypermail 2.1.6 : Tue Dec 31 2013 - 23:23:16 CST