Re: Suggestions while building a GPU-machine (CUDA) for NAMD use!

From: Francesco Pietra (chiendarret_at_gmail.com)
Date: Tue May 28 2013 - 00:55:38 CDT

Sorry, I forgot to send this to the forum.

On Tue, May 28, 2013 at 7:39 AM, Francesco Pietra <chiendarret_at_gmail.com> wrote:

> Hi Aron:
> Thanks for the illustration of C-2075.
>
> In my opinion/experience, the real issue is the PCIe 2.0 bus of current
> consumer mainboards. I would like to know why such a bottleneck has not been
> corrected. GPUs with PCIe 3.0 (or claimed to be so) have been available for
> quite some time. Why have mainboards not been brought up to PCIe 3.0? Unless
> Aditya has found the right mainboard. We will see.
>
> With NAMD, I ran two Zotac GTX 580s under AMD CPUs for a couple of years
> with no problems. I then increased the MD speed with two MSI GTX 680s under
> an Intel CPU. In both cases it was 6 CPU cores per two GTXs on a PCIe 2.0
> mainboard; nonetheless, there was not much improvement from the second GTX.
> The MSI cards behaved correctly, with a twofold increase in speed with
> respect to the 580s on a 200,000-atom system. After a few months, problems
> (not understood) arose with one of the MSI cards. It plays games, but it
> hangs under NAMD on computations longer than 100,000 steps at ts = 1 fs,
> especially when coupled to the other MSI. Swapping the two MSI cards
> between their slots always pointed to failure of the same MSI, so I
> concluded that the mainboard is not responsible. I am in a dispute with the
> vendor, who claims that such GTXs are for games. I asked for a replacement
> with an equivalent Zotac, paying the price difference. The dispute is still
> ongoing.
>
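> A minimal sketch of how one might isolate a flaky card: drive the same short
> NAMD run on one GPU at a time and check which device fails (assuming a
> multicore CUDA build of namd2 on the PATH and a hypothetical input file
> md.namd):
>
>     # Run the same short NAMD job on each GPU individually to see
>     # which card fails. Assumes a multicore CUDA build of namd2 and
>     # a hypothetical input file "md.namd".
>     import subprocess
>
>     for gpu in (0, 1):                      # physical device indices
>         log = "gpu%d.log" % gpu
>         with open(log, "w") as out:
>             ret = subprocess.call(
>                 ["namd2", "+p6", "+idlepoll", "+devices", str(gpu), "md.namd"],
>                 stdout=out, stderr=subprocess.STDOUT)
>         print("GPU %d -> exit code %d (see %s)" % (gpu, ret, log))
>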
> As to server mainboards, I only have old information, concerning a
> Supermicro board which, however, had a single PCIe 3.0 slot. Moreover, as
> is well known, most MD code does not gain from server hardware, or not
> enough to justify the price difference.
>
> francesco
>
>
> On Mon, May 27, 2013 at 7:50 PM, Aron Broom <broomsday_at_gmail.com> wrote:
>
>> Hi Francesco,
>>
>> You're right, the memory isn't really much of a selling point for current
>> MD (it once was an issue in AMBER, but I think that has been reduced
>> greatly).
>>
>> Really, the main point was that if the C-2075 is an option at its price and
>> is being compared against the 680 purely for performance, then one might
>> also want to consider the Titan as part of the price/performance comparison.
>>
>> ~Aron
>>
>>
>> On Mon, May 27, 2013 at 9:47 AM, Francesco Pietra <chiendarret_at_gmail.com> wrote:
>>
>>>
>>>
>>> On Mon, May 27, 2013 at 3:09 PM, Aron Broom <broomsday_at_gmail.com> wrote:
>>>
>>>> as Axel suggested, in terms of just performance, the C-2075 will be
>>>> about the same as a GTX480 in most cases. So from just a performance
>>>> standpoint the GTX680 will generally be better.
>>>>
>>>> I'm not sure how much a C-2075 costs currently, but if, as you say, you
>>>> are getting a PCIe 3.0 board, why not buy a Titan? You'll have even better
>>>> performance than the 680 and huge memory (6GB).
>>>>
>>>
>>> Aron:
>>> How much memory is used by a 680 GPU on a consumer motherboard (i.e.,
>>> PCIe 2.0) for a protein system of common size in explicit water, i.e.,
>>> 200,000 atoms, with either a single or multiple GPUs? More than a few
>>> hundred MB? If more than that, how did you manage to accomplish that with
>>> NAMD 2.9 / CUDA 4.0?
>>>
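>>> One way to answer this empirically is to poll the card while a job is
>>> running; a minimal sketch, assuming nvidia-smi is on the PATH and the
>>> driver is recent enough to support its --query-gpu interface:
>>>
>>>     # Sample GPU memory use once per second while a NAMD job runs.
>>>     # Assumes nvidia-smi is on the PATH and supports --query-gpu.
>>>     import subprocess, time
>>>
>>>     for _ in range(10):                 # take ten one-second samples
>>>         out = subprocess.check_output(
>>>             ["nvidia-smi", "--query-gpu=index,memory.used,memory.total",
>>>              "--format=csv,noheader"])
>>>         print(out.decode().strip())
>>>         time.sleep(1)
>>>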
>>> thanks
>>> francesco
>>>
>>>
>>>> Of course the memory quality issues compared to a K20x that Axel
>>>> brought up still exist, but if performance is your only concern...
>>>>
>>>> ~Aron
>>>>
>>>>
>>>>> On Mon, May 27, 2013 at 5:58 AM, Axel Kohlmeyer <akohlmey_at_gmail.com> wrote:
>>>>
>>>>> On Mon, May 27, 2013 at 11:35 AM, Aditya Ranganathan
>>>>> <aditya.sia_at_gmail.com> wrote:
>>>>> > @Francesco, we are planning to buy a PCI-express 3.0 supported board.
>>>>> > @Axel: Thanks, Axel, for the comprehensive walkthrough on this issue.
>>>>> > We aim at building this machine solely for performing classical MD
>>>>> > simulations using NAMD. The reliability and scaling-up issues of a
>>>>> > GeForce card like the GTX 680 are what our computer vendor cited as a
>>>>> > possible disadvantage while he was suggesting the Tesla as an option.
>>>>>
>>>>> > I haven't been able to get any clear benchmarks for the Tesla C-2075 as
>>>>> > of yet. Most of the benchmarks seem to revolve around the Kepler series
>>>>> > of cards. If anyone is aware of them, please point me to the NAMD
>>>>> > benchmarks on the Tesla C-2075.
>>>>>
>>>>> look for benchmarks of the C-2050; the C-2075 is a tad faster. they are
>>>>> two different revisions of the fermi chip. the difference is similar to
>>>>> what a GTX 480 is to a GTX 580 (which are the corresponding consumer
>>>>> models). mind you, with the fermi generation the consumer cards were
>>>>> more similar to the tesla cards than they are now. only the GeForce
>>>>> TITAN has a similar (or better?) relationship to the tesla K20; even
>>>>> the recently released GTX 780 has been deliberately "crippled" to
>>>>> massively reduce double precision floating point performance.
>>>>>
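>>>>> For comparing cards on a given system, the ns/day figure can be pulled
>>>>> straight from the NAMD log; a minimal sketch, assuming a log file named
>>>>> namd.log and the usual "Benchmark time: ... days/ns ..." lines printed
>>>>> by namd2:
>>>>>
>>>>>     # Extract the "days/ns" figures from a NAMD log and convert them
>>>>>     # to ns/day. Assumes a log file named "namd.log" containing the
>>>>>     # standard "Info: Benchmark time: ... days/ns ..." lines.
>>>>>     with open("namd.log") as log:
>>>>>         for line in log:
>>>>>             if "Benchmark time:" in line:
>>>>>                 tokens = line.split()
>>>>>                 days_per_ns = float(tokens[tokens.index("days/ns") - 1])
>>>>>                 print("%.2f ns/day" % (1.0 / days_per_ns))
>>>>>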
>>>>> the problem with the C-2075 is that it uses an already outdated
>>>>> architecture (with GPUs, architectures change fast), which is perhaps as
>>>>> different from a kepler chip as an intel pentium 4 is from a current
>>>>> (ivy bridge based) intel i7 cpu.
>>>>>
>>>>> axel.
>>>>>
>>>>> >
>>>>> >
>>>>> >
>>>>> > On Mon, May 27, 2013 at 2:27 PM, Axel Kohlmeyer <akohlmey_at_gmail.com> wrote:
>>>>> >>
>>>>> >> On Mon, May 27, 2013 at 10:14 AM, Aditya Ranganathan
>>>>> >> <aditya.sia_at_gmail.com> wrote:
>>>>> >> > Hello All,
>>>>> >> >
>>>>> >> > We are pondering investing in a GPU-based machine for running NAMD
>>>>> >> > simulations (all-atom). Currently, we are stuck with a dilemma over
>>>>> >> > the choice of card for CUDA computing. We already have a GTX 680,
>>>>> >> > which gives us about 3 ns/day for a 100,000-atom system using a
>>>>> >> > single GPU card and 8 CPU cores.
>>>>> >> >
>>>>> >> > Now, we are planning to build a GPU machine with 4 GPU cards (either
>>>>> >> > the Tesla C-2075, 6GB GDDR5, or the NVIDIA GTX 680). The base system
>>>>> >> > would consist of a 6-core Intel Xeon E5 2620 processor, 64GB DDR3 RAM
>>>>> >> > and a 2TB hard drive.
>>>>> >> >
>>>>> >> > Has anyone in the community used the Tesla series of cards with NAMD
>>>>> >> > and compared its benchmarks (scalability etc.) with an entry-level
>>>>> >> > card like the GTX 680? The cost of the Tesla is almost 3 times that
>>>>> >> > of the GTX 680. Does its performance justify its price?
>>>>> >>
>>>>> >> GTX 680 is not exactly "entry" level (more upper mid-level), and you
>>>>> >> can't compare GPUs like that. you basically have different "chip
>>>>> >> families" and different "chip generations". the GTX 680 is based on
>>>>> >> the "Kepler" generation, as are the Tesla K10 and Tesla K20; the
>>>>> >> C2075, however, is based on the previous generation called "Fermi".
>>>>> >> Now, GeForce cards are usually spec'd rather aggressively for use in
>>>>> >> video games and not for reliability in computing (which doesn't mean
>>>>> >> they are unreliable, only that the vendors take a higher risk in order
>>>>> >> to lower production costs and raise game performance).
>>>>> >>
>>>>> >> Also, on GeForce cards certain functionality is not available (for
>>>>> >> example ECC memory configuration) or is available only in a very
>>>>> >> limited way (for example double precision floating point math), and
>>>>> >> support through the nvidia-smi utility is limited. On the other hand,
>>>>> >> Tesla GPUs do have all of these benefits, use "certified" and tested
>>>>> >> hardware components, often have more RAM, and come with better
>>>>> >> warranty deals. All of this, plus the fact that they are produced and
>>>>> >> sold in smaller quantities, results in higher costs.
>>>>> >>
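>>>>> >> (Whether a given card exposes ECC at all is easy to check from the
>>>>> >> command line; a minimal sketch, assuming nvidia-smi is on the PATH.
>>>>> >> On GeForce cards the ECC section typically just reports "N/A".)
>>>>> >>
>>>>> >>     # Print the ECC section of the nvidia-smi report for all GPUs.
>>>>> >>     # Assumes nvidia-smi is on the PATH.
>>>>> >>     import subprocess
>>>>> >>
>>>>> >>     print(subprocess.check_output(["nvidia-smi", "-q", "-d", "ECC"]).decode())
>>>>> >>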
>>>>> >> So whether the Tesla GPUs are worth the price or not depends on what
>>>>> >> you are looking for in a GPU. Classical MD can function very well with
>>>>> >> just limited double precision performance, since most of the force
>>>>> >> calculation can be done in single precision with only a small loss of
>>>>> >> accuracy (and would otherwise similarly be offloaded to SSE and AVX
>>>>> >> vector instructions). Also, the performance of classical MD is often
>>>>> >> as much dominated by memory bandwidth (looking up pairs of particles
>>>>> >> through the neighbor lists) as by compute performance. the fastest
>>>>> >> GeForce-type GPUs often outperform the fastest Tesla cards of the same
>>>>> >> generation in classical MD due to their higher clocks and higher
>>>>> >> memory bandwidth. However, if you would also run applications that
>>>>> >> depend on double precision floating point, or prefer lower risk and
>>>>> >> better management and are willing to pay the extra price, then the
>>>>> >> Tesla would be it.
>>>>> >>
>>>>> >> mind you, the Tesla K10 is a special beast in this zoo, since it is
>>>>> >> effectively a pimped up GeForce GTX690.
>>>>> >>
>>>>> >> > Any suggestions from the community would be greatly appreciated.
>>>>> >>
>>>>> >> multi-gpu machines are tricky business. you have to pay great
>>>>> >> attention to the chipset and how many full-width PCI-e slots are
>>>>> >> supported. for a 4-GPU machine, you usually need two CPUs and two
>>>>> >> southbridges (two GPUs per socket). some boards have only one
>>>>> >> southbridge and then support more full-width PCI-e slots via PCIe
>>>>> >> bridge chips. those add a little latency and - when you use all GPUs
>>>>> >> at the same time - two GPUs have to share the bandwidth. since the
>>>>> >> host-to-GPU bandwidth affects NAMD performance, you have to test
>>>>> >> whether in that case a single 4-GPU machine or two machines with 2
>>>>> >> GPUs each is the better option (probably the latter). you should also
>>>>> >> make sure that the CPU memory bandwidth is not crippled (they come in
>>>>> >> different speeds).
>>>>> >>
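>>>>> >> A minimal sketch of such a test, assuming a multicore CUDA build of
>>>>> >> namd2 and a hypothetical benchmark input file bench.namd: run the same
>>>>> >> job on different device subsets and compare the resulting "Benchmark
>>>>> >> time" lines in the logs.
>>>>> >>
>>>>> >>     # Run the same NAMD benchmark on different GPU subsets to see how
>>>>> >>     # sharing a PCIe bridge affects throughput. Assumes a multicore
>>>>> >>     # CUDA build of namd2 and a hypothetical input file "bench.namd".
>>>>> >>     import subprocess
>>>>> >>
>>>>> >>     for devices in ("0,1", "0,1,2,3"):
>>>>> >>         log = "bench_dev%s.log" % devices.replace(",", "-")
>>>>> >>         with open(log, "w") as out:
>>>>> >>             subprocess.call(
>>>>> >>                 ["namd2", "+p12", "+idlepoll", "+devices", devices,
>>>>> >>                  "bench.namd"],
>>>>> >>                 stdout=out, stderr=subprocess.STDOUT)
>>>>> >>         print("devices %s -> see %s" % (devices, log))
>>>>> >>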
>>>>> >> in short, there is no clear-cut answer. many things depend on what
>>>>> >> *else* you want to do with the machine, and there are many personal
>>>>> >> opinions on which people do not 100% agree. if you simply ask whether
>>>>> >> the performance of a tesla is worth 3x the price (or more in the case
>>>>> >> of a K20), my personal opinion is "not at all", but i might still buy
>>>>> >> one in case i come across an application and workflow that benefits
>>>>> >> from it.
>>>>> >>
>>>>> >> axel.
>>>>> >> >
>>>>> >> >
>>>>> >> > Regards
>>>>> >> >
>>>>> >> > Srivastav Ranganathan
>>>>> >> > Research Scholar
>>>>> >> > IIT Bombay,
>>>>> >> > Mumbai, India
>>>>> >>
>>>>> >>
>>>>> >>
>>>>> >> --
>>>>> >> Dr. Axel Kohlmeyer akohlmey_at_gmail.com http://goo.gl/1wk0
>>>>> >> International Centre for Theoretical Physics, Trieste. Italy.
>>>>> >
>>>>> >
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Dr. Axel Kohlmeyer akohlmey_at_gmail.com http://goo.gl/1wk0
>>>>> International Centre for Theoretical Physics, Trieste. Italy.
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Aron Broom M.Sc
>>>> PhD Student
>>>> Department of Chemistry
>>>> University of Waterloo
>>>>
>>>
>>>
>>
>>
>> --
>> Aron Broom M.Sc
>> PhD Student
>> Department of Chemistry
>> University of Waterloo
>>
>
>

This archive was generated by hypermail 2.1.6 : Wed Dec 31 2014 - 23:21:14 CST