Re: Suggestions while building a GPU-machine (CUDA) for NAMD use!

From: Francesco Pietra (chiendarret_at_gmail.com)
Date: Tue May 28 2013 - 02:54:47 CDT

Hi Aditya:

OK, a server motherboard. Expensive, but it confirms that, for the time
being, one cannot get PCIE 3.0 by sticking to consumer motherboards. Anyway,
why do you mention a consumer GTX 680 for a server motherboard? It seems a
bit of a mismatch; ask hardware experts (I am not one). In any event, be
careful about the brand of the 680. I have posted before that changing brand
gave me a lot of problems.
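
As a rough illustration of what is at stake between PCIE 2.0 and 3.0, the
sketch below computes the theoretical one-way bandwidth of a slot from the
published per-lane transfer rates and line encodings. The function names are
made up for this example, and the even split between GPUs that share one
upstream link (the bridge-chip case described further down in the thread) is
a simplifying assumption, not a measurement.

```python
# Theoretical PCIe bandwidth per direction, from per-lane transfer rate
# and line-encoding efficiency. Real throughput is lower (protocol overhead).

# generation -> (transfer rate in GT/s per lane, encoding efficiency)
PCIE_GENERATIONS = {
    "2.0": (5.0, 8 / 10),      # 8b/10b encoding
    "3.0": (8.0, 128 / 130),   # 128b/130b encoding
}


def slot_bandwidth_gb_s(generation, lanes=16):
    """Theoretical one-way bandwidth of a slot in GB/s (1 GB = 1e9 bytes)."""
    rate_gt_s, efficiency = PCIE_GENERATIONS[generation]
    # Each transfer carries one bit per lane; divide by 8 to get bytes.
    return rate_gt_s * efficiency * lanes / 8


def per_gpu_bandwidth_gb_s(generation, lanes=16, gpus_sharing=1):
    """Bandwidth per GPU when several GPUs share one upstream link.

    Assumption: the link is split evenly while all GPUs transfer at once.
    """
    return slot_bandwidth_gb_s(generation, lanes) / gpus_sharing


if __name__ == "__main__":
    for gen in ("2.0", "3.0"):
        print(f"PCIe {gen} x16: {slot_bandwidth_gb_s(gen):.2f} GB/s")
    shared = per_gpu_bandwidth_gb_s("3.0", gpus_sharing=2)
    print(f"PCIe 3.0 x16 shared by 2 GPUs: {shared:.2f} GB/s each")
```

Under these assumptions, two GPUs sharing one PCIE 3.0 x16 link each get
slightly less than a dedicated PCIE 2.0 x16 slot, so the slot layout matters
as much as the PCIE generation.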

Also, ask hardware experts whether such a server-GPU machine is worth the
price. Would it not be better to go for a small CPU cluster? With GPUs you
are still quite limited; for example, no QM/MM. And GPUs are fragile tools
whose failures are hard to interpret (see my previous post).
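
The "is it worth the price" question can be put in numbers with a small
sketch like the one below. It uses only the two figures quoted in this
thread (about 3 ns/day on one GTX 680, and a Tesla costing almost 3 times as
much); prices are expressed relative to the GTX 680, and the function names
and the speedup in the example are hypothetical.

```python
def required_ns_per_day(baseline_ns_per_day, price_ratio):
    """Throughput the pricier card needs to match the baseline card's
    cost per ns/day, given how many times more it costs."""
    return baseline_ns_per_day * price_ratio


def relative_value(speedup, price_ratio):
    """Throughput per unit of money relative to the baseline card.

    A result above 1 means the pricier card is the better buy on
    price/performance alone; below 1 means it is the worse buy.
    """
    return speedup / price_ratio


if __name__ == "__main__":
    # Figures from the thread: ~3 ns/day on a GTX 680, Tesla at ~3x the price.
    print(required_ns_per_day(3.0, 3.0))
    # Hypothetical case: the pricier card is exactly as fast as the baseline.
    print(relative_value(1.0, 3.0))
```

This ignores reliability, ECC memory, warranty and double precision, which
are the points on which the pricier card is actually sold.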

regards
francesco

On Tue, May 28, 2013 at 9:23 AM, Aditya Ranganathan <aditya.sia_at_gmail.com>wrote:

> The motherboard on which we are planning to install the GPUs would be an
> Intel Workstation/Server Motherboard W2600 CR2 with Intel E5 2620 6 Core Xeon
> processors. It's not a consumer motherboard. Does that sound reasonable? I'm
> a novice at this, so it might help to get more insight into this.
>
>
> On Tue, May 28, 2013 at 11:25 AM, Francesco Pietra <chiendarret_at_gmail.com>wrote:
>
>>
>> Sorry, I forgot the forum.
>>
>> On Tue, May 28, 2013 at 7:39 AM, Francesco Pietra <chiendarret_at_gmail.com>wrote:
>>
>>> Hi Aron:
>>> Thanks for the illustration of C-2075.
>>>
>>> In my opinion/experience the key point is the PCIE 2.0 of current consumer
>>> mainboards. I would like to know why such a bottleneck has not been
>>> corrected. GPUs at PCIE 3.0 (or stated to be so) have been available for a
>>> rather long time. Why have mainboards not been brought to PCIE 3.0? Unless
>>> Aditya has found the right mainboard. We will see.
>>>
>>> With NAMD, I had two Zotac GTX 580 under AMD CPUs, with no problems in a
>>> couple of years. Then I increased the MD speed with two MSI GTX 680 under
>>> an Intel CPU. In both cases, 6 CPUs per two GTX, mainboard PCIE 2.0;
>>> nonetheless, not much improvement from the second GTX. The MSI cards worked
>>> correctly, with a twofold increase in speed with respect to the 580 on a
>>> 200,000-atom system. After a few months, problems (not understood) arose
>>> with one of the MSI cards. It plays games, but it hangs under NAMD on
>>> computations longer than 100,000 steps at ts = 1 fs, especially when
>>> coupled to the other MSI. Exchanging the two MSI cards between their slots
>>> always points to failure of the same MSI, so I concluded that the mainboard
>>> is not responsible. I am in a dispute with the vendor, who claims that such
>>> GTXs are for games. I asked for replacement with a Zotac equivalent, paying
>>> the price difference. The dispute continues.
>>>
>>> As to server mainboards, I have old information, concerning a board from
>>> Supermicro, which, however, had a single slot at PCIE 3.0. Moreover, as
>>> is well known, most MD code does not gain from server hardware, or not
>>> enough to justify the price difference.
>>>
>>> francesco
>>>
>>>
>>> On Mon, May 27, 2013 at 7:50 PM, Aron Broom <broomsday_at_gmail.com> wrote:
>>>
>>>> Hi Francesco,
>>>>
>>>> You're right, the memory isn't really much of a selling point for
>>>> current MD (it once was an issue in AMBER, but I think that has been
>>>> reduced greatly).
>>>>
>>>> Really the main point was that if the C-2075 was a price option and
>>>> being compared against the 680 purely for performance, then one might want
>>>> to also consider the titan as part of the price/performance comparison.
>>>>
>>>> ~Aron
>>>>
>>>>
>>>> On Mon, May 27, 2013 at 9:47 AM, Francesco Pietra <chiendarret_at_gmail.com> wrote:
>>>>
>>>>>
>>>>>
>>>>> On Mon, May 27, 2013 at 3:09 PM, Aron Broom <broomsday_at_gmail.com>wrote:
>>>>>
>>>>>> as Axel suggested, in terms of just performance, the C-2075 will be
>>>>>> about the same as a GTX480 in most cases. So from just a performance
>>>>>> standpoint the GTX680 will generally be better.
>>>>>>
>>>>>> I'm not sure how much a C-2075 costs currently, but if as you say you
>>>>>> are getting a PCIe 3.0 board, why not buy a Titan? You'll have even better
>>>>>> performance than the 680 and huge memory (6GB).
>>>>>>
>>>>>
>>>>> Aron:
>>>>> How much memory is used by a 680 GPU on a consumer motherboard
>>>>> (i.e., PCIE 2.0) for a protein system of common size in explicit water,
>>>>> i.e., 200,000 atoms? Either single or multiple GPUs. More than a few
>>>>> hundred MB? If more than that, how did you manage to accomplish that with
>>>>> NAMD2.9-CUDA4.0?
>>>>>
>>>>> thanks
>>>>> francesco
>>>>>
>>>>>
>>>>>> Of course the memory quality issues compared to a K20x that Axel
>>>>>> brought up still exist, but if performance is your only concern...
>>>>>>
>>>>>> ~Aron
>>>>>>
>>>>>>
>>>>>> On Mon, May 27, 2013 at 5:58 AM, Axel Kohlmeyer <akohlmey_at_gmail.com>wrote:
>>>>>>
>>>>>>> On Mon, May 27, 2013 at 11:35 AM, Aditya Ranganathan
>>>>>>> <aditya.sia_at_gmail.com> wrote:
>>>>>>> > @Francesco, we are planning to buy a board that supports PCI-express 3.0.
>>>>>>> > @Axel: Thanks for the comprehensive walkthrough on this issue. We aim at
>>>>>>> > building this machine solely for performing classical MD simulations
>>>>>>> > using NAMD. Reliability and scaling-up issues of a GeForce card like the
>>>>>>> > GTX680 are what our computer vendor cited as a possible disadvantage
>>>>>>> > while suggesting the Tesla as an option.
>>>>>>>
>>>>>>> > I haven't been able to find any clear benchmarks for the TESLA C-2075 as
>>>>>>> > yet. Most of the benchmarks seem to revolve around the Kepler series of
>>>>>>> > cards. If anyone is aware of any, please point me to NAMD benchmarks on
>>>>>>> > the TESLA C-2075.
>>>>>>>
>>>>>>> look for benchmarks of C-2050. the C-2075 is a tad faster. they are two
>>>>>>> different revisions of the fermi chip. the difference is similar to
>>>>>>> what a GTX 480 is to a GTX 580 (which are the corresponding consumer
>>>>>>> models). mind you, with the fermi generation, the consumer cards were
>>>>>>> more similar to the tesla cards than they are now. only the GeForce
>>>>>>> TITAN has a similar (or better?) relationship to the tesla K20. even
>>>>>>> the recently released GTX 780 has been deliberately "crippled" to
>>>>>>> massively reduce double precision floating point performance.
>>>>>>>
>>>>>>> the problem with the C-2075 is that it is using an already outdated
>>>>>>> architecture (with GPUs, architectures change fast) which is as
>>>>>>> different from a kepler chip as perhaps an intel pentium 4 is from a
>>>>>>> current (ivy bridge) based intel i7 cpu.
>>>>>>>
>>>>>>> axel.
>>>>>>>
>>>>>>> >
>>>>>>> >
>>>>>>> >
>>>>>>> > On Mon, May 27, 2013 at 2:27 PM, Axel Kohlmeyer <akohlmey_at_gmail.com> wrote:
>>>>>>> >>
>>>>>>> >> On Mon, May 27, 2013 at 10:14 AM, Aditya Ranganathan
>>>>>>> >> <aditya.sia_at_gmail.com> wrote:
>>>>>>> >> > Hello All,
>>>>>>> >> >
>>>>>>> >> > We are pondering investing in a GPU-based machine for running NAMD
>>>>>>> >> > simulations (all-atom). Currently, we are stuck with a dilemma over
>>>>>>> >> > the choice of card for CUDA computing. We already have a GTX 680,
>>>>>>> >> > which gives us about 3 ns/day for a 100,000-atom system using a
>>>>>>> >> > single GPU card and 8 CPU cores.
>>>>>>> >> >
>>>>>>> >> > Now, we are planning to build a GPU machine with 4 GPU cards (either
>>>>>>> >> > Tesla C-2075C, 6GB GDDR5 or the NVIDIA GTX 680). The base system
>>>>>>> >> > would consist of a 6-core Intel Xeon E5 2620 processor, 64GB DDR3
>>>>>>> >> > RAM and a 2TB hard drive.
>>>>>>> >> >
>>>>>>> >> > Has anyone in the community used the Tesla series of cards with NAMD
>>>>>>> >> > and compared its benchmarks (scalability etc.) with an entry-level
>>>>>>> >> > card like the GTX 680? The cost of the Tesla is almost 3 times that
>>>>>>> >> > of the GTX680. Does its performance justify its price?
>>>>>>> >>
>>>>>>> >> GTX 680 is not exactly "entry" level (more upper mid level) and you
>>>>>>> >> can't compare GPUs like that. you basically have different "chip
>>>>>>> >> families" and different "chip generations". GTX 680 is based on the
>>>>>>> >> "Kepler" generation, as are the Tesla K10 and Tesla K20; the C2075,
>>>>>>> >> however, is based on the previous generation called "Fermi". Now
>>>>>>> >> GeForce cards are usually spec'd rather aggressively and for use in
>>>>>>> >> video games and not for reliability in computing (which doesn't mean
>>>>>>> >> they are unreliable, only that the vendors take a higher risk for
>>>>>>> >> lowering production costs and raising game performance).
>>>>>>> >>
>>>>>>> >> Also on GeForce cards certain functionality is not available (for
>>>>>>> >> example ECC memory configuration) or only in a very limited way (for
>>>>>>> >> example double precision floating point math). Also, support through
>>>>>>> >> the nvidia-smi utility is limited. On the other hand, Tesla GPUs do
>>>>>>> >> have all of these benefits and also use "certified" and tested
>>>>>>> >> hardware components, often have more RAM and have better warranty
>>>>>>> >> deals. All of this and the fact that they are produced and sold in
>>>>>>> >> smaller quantities result in higher costs.
>>>>>>> >>
>>>>>>> >> So whether the Tesla GPUs are worth the price or not depends on what
>>>>>>> >> you are looking for in a GPU. Classical MD can function very well with
>>>>>>> >> just limited double precision performance, since most of the force
>>>>>>> >> calculation can be done in single precision with only a small loss of
>>>>>>> >> accuracy (and would otherwise similarly be offloaded to SSE and AVX
>>>>>>> >> vector instructions). Also the performance of classical MD is often as
>>>>>>> >> much dominated by memory bandwidth (looking up pairs of particles
>>>>>>> >> through the neighbor lists) as it is by compute performance. the
>>>>>>> >> fastest GeForce type GPUs often outperform the fastest Tesla cards of
>>>>>>> >> the same generation in classical MD due to their higher clocks and
>>>>>>> >> higher memory bandwidth. However, if you would also run applications
>>>>>>> >> that are dependent on double precision floating point, or prefer a low
>>>>>>> >> risk and better management and are willing to pay the extra price for
>>>>>>> >> that, then the Tesla would be it.
>>>>>>> >>
>>>>>>> >> mind you, the Tesla K10 is a special beast in this zoo, since it is
>>>>>>> >> effectively a pimped up GeForce GTX690.
>>>>>>> >>
>>>>>>> >> > Any suggestions from the community would be greatly appreciated.
>>>>>>> >>
>>>>>>> >> multi-gpu machines are tricky business. you have to pay great
>>>>>>> >> attention to the chipset and how many full width PCI-e slots are
>>>>>>> >> supported. for a 4-GPU machine, you usually need two CPUs and two
>>>>>>> >> southbridges (two GPUs per socket). some boards have only one
>>>>>>> >> southbridge and then support more full width PCI-e slots via PCIe
>>>>>>> >> bridge chips. those add a little latency and - when you use all GPUs
>>>>>>> >> at the same time - two GPUs have to share the bandwidth. since the
>>>>>>> >> host to GPU bandwidth affects NAMD performance, you have to test
>>>>>>> >> whether in that case a single 4 GPU machine or two machines with 2
>>>>>>> >> GPUs each are the better option (probably the latter). also you should
>>>>>>> >> make sure that the CPU memory bandwidth is not crippled (they come in
>>>>>>> >> different speeds).
>>>>>>> >>
>>>>>>> >> in short, there is no clear cut answer. many things depend on what
>>>>>>> >> *else* you want to do with the machine and there are many personal
>>>>>>> >> opinions that people do not 100% agree upon. if you ask simply, is
>>>>>>> >> the performance of a tesla worth 3x the price (or more in the case of
>>>>>>> >> a K20), my personal opinion is "not at all", but i might still buy
>>>>>>> >> one, in case i come across an application and workflow that benefits
>>>>>>> >> from it.
>>>>>>> >>
>>>>>>> >> axel.
>>>>>>> >> >
>>>>>>> >> >
>>>>>>> >> > Regards
>>>>>>> >> >
>>>>>>> >> > Srivastav Ranganathan
>>>>>>> >> > Research Scholar
>>>>>>> >> > IIT Bombay,
>>>>>>> >> > Mumbai, India
>>>>>>> >>
>>>>>>> >>
>>>>>>> >>
>>>>>>> >> --
>>>>>>> >> Dr. Axel Kohlmeyer akohlmey_at_gmail.com http://goo.gl/1wk0
>>>>>>> >> International Centre for Theoretical Physics, Trieste. Italy.
>>>>>>> >
>>>>>>> >
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Dr. Axel Kohlmeyer akohlmey_at_gmail.com http://goo.gl/1wk0
>>>>>>> International Centre for Theoretical Physics, Trieste. Italy.
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Aron Broom M.Sc
>>>>>> PhD Student
>>>>>> Department of Chemistry
>>>>>> University of Waterloo
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Aron Broom M.Sc
>>>> PhD Student
>>>> Department of Chemistry
>>>> University of Waterloo
>>>>
>>>
>>>
>>
>
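
The throughput figures quoted in this thread (ns/day, and runs of a given
number of steps at a given timestep) can be related with a short conversion
sketch. The function names are made up for this example, and combining the
3 ns/day figure with a 1 fs timestep is only an illustration: the thread does
not say which timestep that benchmark used.

```python
FS_PER_NS = 1_000_000       # femtoseconds in one nanosecond
SECONDS_PER_DAY = 86_400


def steps_per_second(ns_per_day, timestep_fs):
    """MD steps completed per wall-clock second at a given throughput."""
    steps_per_day = ns_per_day * FS_PER_NS / timestep_fs
    return steps_per_day / SECONDS_PER_DAY


def wall_hours(n_steps, ns_per_day, timestep_fs):
    """Wall-clock hours needed to run n_steps at a given throughput."""
    return n_steps / steps_per_second(ns_per_day, timestep_fs) / 3600


if __name__ == "__main__":
    # Illustrative only: 3 ns/day combined with an assumed 1 fs timestep.
    print(f"{steps_per_second(3.0, 1.0):.1f} steps/s")
    print(f"{wall_hours(100_000, 3.0, 1.0):.2f} h for 100,000 steps")
```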

This archive was generated by hypermail 2.1.6 : Tue Dec 31 2013 - 23:23:16 CST