Re: Suggestions while building a GPU-machine (CUDA) for NAMD use!

From: Aditya Ranganathan (aditya.sia_at_gmail.com)
Date: Tue May 28 2013 - 02:23:39 CDT

The motherboard on which we are planning to install the GPUs would be an Intel
Workstation/Server Motherboard W2600 CR2 with Intel Xeon E5-2620 6-core
processors. It's not a consumer motherboard. Does that sound reasonable? I'm
a novice at this, so more insight into this would help.

On Tue, May 28, 2013 at 11:25 AM, Francesco Pietra <chiendarret_at_gmail.com> wrote:

>
> Sorry, I forgot the forum.
>
> On Tue, May 28, 2013 at 7:39 AM, Francesco Pietra <chiendarret_at_gmail.com> wrote:
>
>> Hi Aron:
>> Thanks for the explanation of the C-2075.
>>
>> In my opinion/experience, the crux is the PCIe 2.0 of current consumer
>> mainboards. I would like to know why such a bottleneck has not been
>> corrected. GPUs at PCIe 3.0 (or stated to be so) have been available for
>> rather a long time. Why have mainboards not been brought to PCIe 3.0?
>> Perhaps Aditya has found the right mainboard. We will see.
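To give a sense of scale for the PCIe 2.0 versus 3.0 question, the sketch below computes the theoretical x16 bandwidth per direction from each generation's signalling rate and line encoding, plus a rough per-step transfer time for a 200,000-atom system. The payload model (coordinates sent to the GPU and forces returned, three float32 values per atom each way, not overlapped) is an assumption for illustration only, not a description of what NAMD actually transfers, and the function names are hypothetical.

```python
# Theoretical PCIe bandwidth per direction and a rough per-step transfer estimate.
# Assumption (illustration only): each MD step sends 3 float32 coordinates per
# atom to the GPU and receives 3 float32 force components per atom back.

# Per-lane signalling rate (GT/s) and line-encoding efficiency per generation.
PCIE_GENERATIONS = {
    2: (5.0, 8 / 10),     # PCIe 2.0: 5 GT/s, 8b/10b encoding
    3: (8.0, 128 / 130),  # PCIe 3.0: 8 GT/s, 128b/130b encoding
}


def pcie_bandwidth_bytes_per_s(generation: int, lanes: int = 16) -> float:
    """Theoretical one-direction bandwidth after line encoding, ignoring packet overhead."""
    gt_per_s, efficiency = PCIE_GENERATIONS[generation]
    return gt_per_s * 1e9 * efficiency * lanes / 8


def transfer_ms_per_step(n_atoms: int, generation: int, lanes: int = 16) -> float:
    """Milliseconds to send coordinates in and forces out at peak bandwidth,
    assuming the two transfers do not overlap."""
    bytes_each_way = n_atoms * 3 * 4  # 3 components x 4-byte float32
    return 2 * bytes_each_way / pcie_bandwidth_bytes_per_s(generation, lanes) * 1e3


for gen in (2, 3):
    print(f"PCIe {gen}.0 x16: {pcie_bandwidth_bytes_per_s(gen) / 1e9:.2f} GB/s, "
          f"200,000 atoms: {transfer_ms_per_step(200_000, gen):.3f} ms/step")
```

This gives about 8 GB/s for PCIe 2.0 x16 and about 15.75 GB/s for PCIe 3.0 x16. Whether a transfer of this size matters depends on how it compares with the time each step spends computing; a code that moves data in many smaller pieces may also be limited by per-transfer latency rather than bandwidth.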
>>
>> With NAMD, I had two Zotac GTX 580 under AMD CPUs, with no problems for a
>> couple of years. Then I increased the MD speed with two MSI GTX 680 under
>> Intel CPUs. In both cases, 6 CPU cores for the two GTX and a PCIe 2.0
>> mainboard; nonetheless, not much improvement from the second GTX. The MSI
>> performed correctly, with a twofold increase in speed with respect to the
>> 580 on a 200,000-atom system. After a few months, problems (not
>> understood) arose with one of the MSI. It plays games, but it hangs under
>> NAMD on runs longer than 100,000 steps at ts = 1 fs, especially when
>> coupled to the other MSI. Swapping the two MSI between their slots always
>> points to failure of the same MSI, so I concluded that the mainboard is
>> not responsible. I am in a dispute with the vendor, who claims that such
>> GTXs are for games. I asked for a replacement with an equivalent Zotac,
>> paying the price difference. Still in dispute.
>>
>> As to server mainboards, I have old information concerning a board from
>> Supermicro which, however, had a single PCIe 3.0 slot. Moreover, as is
>> well known, most MD codes do not gain from server hardware, or not enough
>> to justify the price difference.
>>
>> francesco
>>
>>
>> On Mon, May 27, 2013 at 7:50 PM, Aron Broom <broomsday_at_gmail.com> wrote:
>>
>>> Hi Francesco,
>>>
>>> You're right, the memory isn't really much of a selling point for
>>> current MD (it once was an issue in AMBER, but I think that has been
>>> reduced greatly).
>>>
>>> Really the main point was that if the C-2075 was an option price-wise and
>>> was being compared against the 680 purely for performance, then one might
>>> want to also consider the Titan as part of the price/performance comparison.
>>>
>>> ~Aron
>>>
>>>
>>> On Mon, May 27, 2013 at 9:47 AM, Francesco Pietra <chiendarret_at_gmail.com> wrote:
>>>
>>>>
>>>>
>>>> On Mon, May 27, 2013 at 3:09 PM, Aron Broom <broomsday_at_gmail.com> wrote:
>>>>
>>>>> As Axel suggested, in terms of just performance, the C-2075 will be
>>>>> about the same as a GTX 480 in most cases. So from a pure performance
>>>>> standpoint, the GTX 680 will generally be better.
>>>>>
>>>>> I'm not sure how much a C-2075 costs currently, but if, as you say, you
>>>>> are getting a PCIe 3.0 board, why not buy a Titan? You'll have even better
>>>>> performance than the 680 and huge memory (6GB).
>>>>>
>>>>
>>>> Aron:
>>>> How much memory is used by a 680 GPU on a consumer motherboard
>>>> (i.e., PCIe 2.0) for a protein system of common size in explicit water,
>>>> i.e., 200,000 atoms? Either single or multiple GPU. More than a few
>>>> hundred MB? If more than that, how did you manage to accomplish that with
>>>> NAMD 2.9 with CUDA 4.0?
>>>>
>>>> thanks
>>>> francesco
>>>>
>>>>
>>>>> Of course the memory quality issues compared to a K20X that Axel
>>>>> brought up still exist, but if performance is your only concern...
>>>>>
>>>>> ~Aron
>>>>>
>>>>>
>>>>> On Mon, May 27, 2013 at 5:58 AM, Axel Kohlmeyer <akohlmey_at_gmail.com> wrote:
>>>>>
>>>>>> On Mon, May 27, 2013 at 11:35 AM, Aditya Ranganathan
>>>>>> <aditya.sia_at_gmail.com> wrote:
>>>>>> > @Francesco, we are planning to buy a board that supports PCI Express 3.0.
>>>>>> > @Axel: Thanks for the comprehensive walkthrough on this issue. We aim
>>>>>> > at building this machine solely for performing classical MD
>>>>>> > simulations using NAMD. Reliability and scaling issues of a GeForce
>>>>>> > card like the GTX 680 are what our computer vendor cited as possible
>>>>>> > disadvantages when he suggested the Tesla as an option.
>>>>>>
>>>>>> > I haven't been able to find any clear benchmarks for the Tesla C-2075
>>>>>> > yet. Most of the benchmarks seem to revolve around the Kepler series of
>>>>>> > cards. If anyone knows of any, please point me to NAMD benchmarks on
>>>>>> > the Tesla C-2075.
>>>>>>
>>>>>> look for benchmarks of the C-2050; the C-2075 is a tad faster. they are
>>>>>> two different revisions of the fermi chip. the difference is similar to
>>>>>> what a GTX 480 is to a GTX 580 (which are the corresponding consumer
>>>>>> models). mind you, with the fermi generation, the consumer cards were
>>>>>> more similar to the tesla cards than they are now. only the GeForce
>>>>>> TITAN has a similar (or better?) relationship to the tesla K20. even
>>>>>> the recently released GTX 780 has been deliberately "crippled" to
>>>>>> massively reduce double precision floating point performance.
>>>>>>
>>>>>> the problem with the C-2075 is that it uses an already outdated
>>>>>> architecture (with GPUs, architectures change fast), which is as
>>>>>> different from a kepler chip as perhaps an intel pentium 4 is from a
>>>>>> current (ivy bridge) based intel i7 cpu.
>>>>>>
>>>>>> axel.
>>>>>>
>>>>>> >
>>>>>> >
>>>>>> >
>>>>>> > On Mon, May 27, 2013 at 2:27 PM, Axel Kohlmeyer <akohlmey_at_gmail.com> wrote:
>>>>>> >>
>>>>>> >> On Mon, May 27, 2013 at 10:14 AM, Aditya Ranganathan
>>>>>> >> <aditya.sia_at_gmail.com> wrote:
>>>>>> >> > Hello All,
>>>>>> >> >
>>>>>> >> > We are considering investing in a GPU-based machine for running
>>>>>> >> > NAMD simulations (all-atom). Currently, we face a dilemma over the
>>>>>> >> > choice of card for CUDA computing. We already have a GTX 680, which
>>>>>> >> > gives us about 3 ns/day for a 100,000-atom system using a single GPU
>>>>>> >> > card and 8 CPU cores.
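Throughput figures like "3 ns/day" are easier to compare across runs and machines when converted to wall-clock time per step (NAMD's own benchmark lines report s/step and days/ns). A minimal conversion sketch follows; the 2 fs timestep in the example is an assumption, since the message does not state which timestep was used.

```python
# Convert between MD throughput in ns/day and wall-clock seconds per step.
SECONDS_PER_DAY = 86_400
FS_PER_NS = 1_000_000


def seconds_per_step(throughput_ns_day: float, timestep_fs: float) -> float:
    """Wall-clock seconds per MD step for a given throughput and timestep."""
    steps_per_day = throughput_ns_day * FS_PER_NS / timestep_fs
    return SECONDS_PER_DAY / steps_per_day


def ns_per_day(sec_per_step: float, timestep_fs: float) -> float:
    """Inverse conversion: throughput in ns/day from seconds per step."""
    steps_per_day = SECONDS_PER_DAY / sec_per_step
    return steps_per_day * timestep_fs / FS_PER_NS


# 3 ns/day on the 100,000-atom system; the 2 fs timestep is an assumption.
print(f"{seconds_per_step(3.0, 2.0) * 1e3:.1f} ms/step")  # 57.6 ms/step
```

With a 1 fs timestep the same 3 ns/day would correspond to 28.8 ms/step, and NAMD's days/ns figure is simply the reciprocal of ns/day.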
>>>>>> >> >
>>>>>> >> > Now, we are planning to build a GPU machine with 4 GPU cards
>>>>>> >> > (either the Tesla C-2075, 6GB GDDR5, or the NVIDIA GTX 680). The base
>>>>>> >> > system would consist of a 6-core Intel Xeon E5-2620 processor, 64GB
>>>>>> >> > DDR3 RAM and a 2TB hard drive.
>>>>>> >> >
>>>>>> >> > Has anyone in the community used the Tesla series of cards with
>>>>>> >> > NAMD and compared its benchmarks (scalability, etc.) with an
>>>>>> >> > entry-level card like the GTX 680? The cost of the Tesla is almost 3
>>>>>> >> > times that of the GTX 680. Does its performance justify its price?
>>>>>> >>
>>>>>> >> GTX 680 is not exactly "entry" level (more upper mid level) and you
>>>>>> >> can't compare GPUs like that. you basically have different "chip
>>>>>> >> families" and different "chip generations". GTX 680 is based on the
>>>>>> >> "Kepler" generation, as are the Tesla K10 and Tesla K20; the C2075,
>>>>>> >> however, is based on the previous generation called "Fermi". Now,
>>>>>> >> GeForce cards are usually spec'd rather aggressively and for use in
>>>>>> >> video games, not for reliability in computing (which doesn't mean
>>>>>> >> they are unreliable, only that the vendors take a higher risk to
>>>>>> >> lower production costs and raise game performance).
>>>>>> >>
>>>>>> >> Also, on GeForce cards certain functionality is not available (for
>>>>>> >> example ECC memory configuration) or is available only in a very
>>>>>> >> limited way (for example double precision floating point math). Also,
>>>>>> >> support through the nvidia-smi utility is limited. On the other hand,
>>>>>> >> Tesla GPUs do have all of these benefits and also use "certified" and
>>>>>> >> tested hardware components, often more RAM and have better warranty
>>>>>> >> deals. All of this and the fact that they are produced and sold in
>>>>>> >> smaller quantities result in higher costs.
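The double-precision gap mentioned above can be put in numbers with simple spec-sheet arithmetic: peak FP32 throughput is cores x clock x 2 (one fused multiply-add per core per cycle), FP64 is a fixed fraction of that, and peak memory bandwidth is bus width x per-pin data rate. The values below are NVIDIA's published base specifications for these cards and are worth double-checking against the official datasheets; `GpuSpec` is a hypothetical helper, and real MD codes reach only a fraction of these peaks.

```python
# Peak arithmetic throughput and memory bandwidth from spec-sheet numbers.
from dataclasses import dataclass


@dataclass
class GpuSpec:
    name: str
    cores: int          # CUDA cores
    clock_ghz: float    # base / processor clock
    fp64_ratio: float   # FP64 rate as a fraction of FP32
    bus_bits: int       # memory bus width in bits
    mem_gbps: float     # effective memory data rate per pin (Gbit/s)

    def fp32_gflops(self) -> float:
        return self.cores * self.clock_ghz * 2  # FMA counts as 2 flops

    def fp64_gflops(self) -> float:
        return self.fp32_gflops() * self.fp64_ratio

    def bandwidth_gb_s(self) -> float:
        return self.bus_bits / 8 * self.mem_gbps


GPUS = [
    GpuSpec("GeForce GTX 680", 1536, 1.006, 1 / 24, 256, 6.008),
    GpuSpec("Tesla C2075", 448, 1.15, 1 / 2, 384, 3.0),
    GpuSpec("Tesla K20", 2496, 0.706, 1 / 3, 320, 5.2),
]

for g in GPUS:
    print(f"{g.name:16s} FP32 {g.fp32_gflops():7.1f}  FP64 {g.fp64_gflops():6.1f} GFLOPS  "
          f"{g.bandwidth_gb_s():6.1f} GB/s")
```

On these spec values the GTX 680 has roughly three times the FP32 peak and about a third more memory bandwidth than the C2075, but only a quarter of its FP64 peak, which is the trade-off Axel describes.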
>>>>>> >>
>>>>>> >> So whether the Tesla GPUs are worth the price or not depends on what
>>>>>> >> you are looking for in a GPU. Classical MD can function very well with
>>>>>> >> just limited double precision performance, since most of the force
>>>>>> >> calculation can be done in single precision with only a small loss of
>>>>>> >> accuracy (and would otherwise similarly be offloaded to SSE and AVX
>>>>>> >> vector instructions). Also, the performance of classical MD is often
>>>>>> >> as much dominated by memory bandwidth (looking up pairs of particles
>>>>>> >> through the neighbor lists) as by compute performance. the fastest
>>>>>> >> GeForce type GPUs often outperform the fastest Tesla cards of the same
>>>>>> >> generation in classical MD due to their higher clocks and higher
>>>>>> >> memory bandwidth. However, if you would also run applications that
>>>>>> >> depend on double precision floating point, or prefer a low risk and
>>>>>> >> better management and are willing to pay the extra price for that,
>>>>>> >> then the Tesla would be it.
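The point that most of the force calculation can be done in single precision with only a small loss of accuracy can be checked on a toy example. The sketch below sums 1,000 Lennard-Jones pair forces once in double precision and once with every input and intermediate rounded to IEEE float32 (emulated with Python's `struct` module), then reports the relative difference. The sigma, epsilon and distance values are arbitrary illustration choices, not parameters from any force field, and the helper names are hypothetical.

```python
# Single- vs double-precision evaluation of a sum of Lennard-Jones pair forces.
import random
import struct


def f32(x: float) -> float:
    """Round a Python float (IEEE binary64) to the nearest IEEE binary32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def identity(x: float) -> float:
    return x


def lj_force(r: float, sigma: float, eps: float, rnd=identity) -> float:
    """Scalar LJ force 24*eps/r * (2*(sigma/r)**12 - (sigma/r)**6), positive = repulsive.
    rnd is applied after every arithmetic operation."""
    sr = rnd(sigma / r)
    sr2 = rnd(sr * sr)
    sr6 = rnd(rnd(sr2 * sr2) * sr2)
    sr12 = rnd(sr6 * sr6)
    prefactor = rnd(rnd(24.0 * eps) / r)
    return rnd(prefactor * rnd(rnd(2.0 * sr12) - sr6))


def total_force(distances, sigma: float, eps: float, rnd=identity) -> float:
    """Accumulate pair forces one by one, rounding the running sum with rnd."""
    acc = 0.0
    for r in distances:
        acc = rnd(acc + lj_force(r, sigma, eps, rnd))
    return acc


rng = random.Random(42)
sigma, eps = 3.4, 0.24  # arbitrary illustration values
# Distances beyond the potential minimum (about 1.12 * sigma): every term is attractive.
distances = [rng.uniform(1.2 * sigma, 2.5 * sigma) for _ in range(1000)]

ref = total_force(distances, sigma, eps)  # float64 throughout
single = total_force([f32(r) for r in distances], f32(sigma), f32(eps), rnd=f32)
rel_err = abs(single - ref) / abs(ref)
print(f"float64: {ref:.9e}  float32: {single:.9e}  relative error: {rel_err:.1e}")
```

Because all terms here share the same sign there is no cancellation, so the relative error stays small; sums with heavy cancellation, or long accumulations over many steps, are where single precision needs more care.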
>>>>>> >>
>>>>>> >> mind you, the Tesla K10 is a special beast in this zoo, since it is
>>>>>> >> effectively a pimped-up GeForce GTX 690.
>>>>>> >>
>>>>>> >> > Any suggestions from the community would be greatly appreciated.
>>>>>> >>
>>>>>> >> multi-gpu machines are tricky business. you have to pay great
>>>>>> >> attention to the chipset and how many full width PCI-e slots are
>>>>>> >> supported. for a 4-GPU machine, you usually need two CPUs and two
>>>>>> >> southbridges (two GPUs per socket). some boards have only one
>>>>>> >> southbridge and then support more full width PCI-e slots via PCIe
>>>>>> >> bridge chips. those add a little latency and, when you use all GPUs
>>>>>> >> at the same time, two GPUs have to share the bandwidth. since the
>>>>>> >> host to GPU bandwidth affects NAMD performance, you have to test
>>>>>> >> whether in that case a single 4-GPU machine or two machines with 2
>>>>>> >> GPUs each are the better option (probably the latter). also, you
>>>>>> >> should make sure that the CPU memory bandwidth is not crippled (they
>>>>>> >> come in different speeds).
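On the Xeon E5-2600 series mentioned in this thread, the PCIe 3.0 lanes are provided by the processor itself, 40 per socket, so the "two GPUs per socket" rule of thumb can be checked with simple lane arithmetic. A minimal sketch, assuming every GPU wants a dedicated x16 link; `RESERVED_LANES` is a hypothetical allowance for other devices, and the bridge case only models the average share of a common uplink.

```python
# Lane budget for a multi-GPU workstation (illustrative sketch).
# Assumption: Xeon E5-2600 (Sandy Bridge-EP) CPUs provide 40 PCIe 3.0 lanes each.
import math

LANES_PER_GPU = 16      # a full-width x16 link per GPU
LANES_PER_SOCKET = 40   # Xeon E5-2600 series
RESERVED_LANES = 0      # hypothetical: lanes kept for network, storage, etc.


def gpus_per_socket(lanes_per_socket: int = LANES_PER_SOCKET,
                    reserved: int = RESERVED_LANES) -> int:
    """How many dedicated x16 GPU links one socket can feed."""
    return (lanes_per_socket - reserved) // LANES_PER_GPU


def sockets_needed(n_gpus: int) -> int:
    """Sockets needed so that every GPU gets its own x16 link (no PCIe bridge)."""
    return math.ceil(n_gpus / gpus_per_socket())


def lanes_per_gpu_behind_bridge(uplink_lanes: int, n_gpus: int) -> float:
    """Average uplink share per GPU when GPUs behind one bridge transfer at the same time."""
    return uplink_lanes / n_gpus


print(gpus_per_socket())                   # 2
print(sockets_needed(4))                   # 2
print(lanes_per_gpu_behind_bridge(16, 2))  # 8.0
```

For 4 GPUs this yields the two sockets Axel mentions; behind a bridge, two GPUs sharing one x16 uplink see on average the bandwidth of an x8 link when both transfer at once.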
>>>>>> >>
>>>>>> >> in short, there is no clear-cut answer. many things depend on what
>>>>>> >> *else* you want to do with the machine and there are many personal
>>>>>> >> opinions that people do not fully agree upon. if you ask simply, is
>>>>>> >> the performance of a tesla worth 3x the price (or more in the case of
>>>>>> >> a K20), my personal opinion is "not at all", but i might still buy
>>>>>> >> one, in case i come across an application and workflow that benefits
>>>>>> >> from it.
>>>>>> >>
>>>>>> >> axel.
>>>>>> >> >
>>>>>> >> >
>>>>>> >> > Regards
>>>>>> >> >
>>>>>> >> > Srivastav Ranganathan
>>>>>> >> > Research Scholar
>>>>>> >> > IIT Bombay,
>>>>>> >> > Mumbai, India
>>>>>> >>
>>>>>> >>
>>>>>> >>
>>>>>> >> --
>>>>>> >> Dr. Axel Kohlmeyer akohlmey_at_gmail.com http://goo.gl/1wk0
>>>>>> >> International Centre for Theoretical Physics, Trieste. Italy.
>>>>>> >
>>>>>> >
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Aron Broom M.Sc
>>>>> PhD Student
>>>>> Department of Chemistry
>>>>> University of Waterloo
>>>>>
>>>>
>>>>
>>>
>>>
>>>
>>
>>
>

This archive was generated by hypermail 2.1.6 : Tue Dec 31 2013 - 23:23:16 CST