Re: AW: 2CPU+1GPU vs 1CPU+2GPU

From: Axel Kohlmeyer (akohlmey_at_gmail.com)
Date: Tue Feb 14 2012 - 08:19:59 CST

On Tue, Feb 14, 2012 at 7:14 AM, Norman Geist
<norman.geist_at_uni-greifswald.de> wrote:
> Hi Nicholas,
>
> yes, that's what I tried to point out. Nobody seems to really know. But most
> people use ECC on CPUs in HPC too; there must be a reason for that. In case

this is because there are no alternatives. there is no market
for data center hardware without ECC. please keep in mind that
the bulk of data center hardware is sold not to researchers,
but to companies, and they *do* need reliability as close to
perfect as they can afford.

> of NAMD I would guess it's a question of system size, following your
> explanations. For really small systems it could matter; a bigger one
> can offset errors better. ECC corrects flipped bits, and in binary a
> single flip can cause a tiny or a dramatic change. The question is
> whether they can change results. It makes a difference whether a byte
> reads 00000001 (1) or 10000001 (129), and these flips can occur
> anywhere in RAM, including on the force field parameters stored there.

i congratulate you on falling victim to the recent FUD
tactics of computer sales droids (and indirectly of nvidia,
since most of those guys don't know what they are talking
about and just repeat what they are told).

yes, bit flips can have a dramatic effect, but i consider them
negligible compared to all the *systematic* errors that you
are including in your calculations without worrying: there are
the truncation errors from cutoffs, the errors from using a
multiple time step integrator (and from using discrete time
steps in the first place), and with GPUs you have a more
significant truncation error from doing the math in single
precision. compared to that, the worry about bit flips is
small change.
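
to make both points concrete, here is a toy C example (purely
illustrative, and nothing to do with NAMD's actual code): one
flipped bit can change a stored value a little or a lot, while
plain single precision summation drifts systematically even
with perfect memory:

    #include <stdio.h>
    #include <stdint.h>
    #include <string.h>

    int main(void) {
        /* random error: one flipped high bit turns 1 into 129 */
        unsigned char b = 0x01;
        b ^= 0x80;
        printf("byte after bit flip: %u\n", b);

        /* the same kind of flip in a float's exponent field
           turns 1.0 into infinity */
        float f = 1.0f;
        uint32_t u;
        memcpy(&u, &f, sizeof f);
        u ^= UINT32_C(1) << 30;
        memcpy(&f, &u, sizeof f);
        printf("float after bit flip: %g\n", f);

        /* systematic error: the single precision sum drifts
           visibly from the double precision reference, with
           no bit flips involved at all */
        float fs = 0.0f;
        double ds = 0.0;
        for (int i = 0; i < 10000000; i++) {
            fs += 0.1f;
            ds += 0.1;
        }
        printf("float sum: %.1f  double sum: %.1f\n", fs, ds);
        return 0;
    }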

on top of that, in my personal experience (and i do run
a machine with 32 Tesla C2050s, and have been using
several GeForce and also older C1060 and S1070 GPUs)
the typical scenario is that either you have very, very rare
random bit flips (i have not seen an ECC error flagged on
our hardware for a very long time) or you have a damaged
memory cell, and *that* you can catch quickly, e.g. by
running cuda_memtest for one iteration (which is what
people did with older hardware).
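
the idea behind such tests is simple. here is a minimal
host-RAM sketch in C of the "moving inversions" pattern test
(cuda_memtest does this and much more on the GPU itself, so
use that for real testing):

    #include <stdio.h>
    #include <stdlib.h>
    #include <stdint.h>

    /* fill a buffer with a pattern, read it back, then repeat
       with the complement; a stuck bit fails one of the two
       passes immediately. volatile keeps the compiler from
       optimizing the read-back away. */
    static size_t test_pattern(volatile uint32_t *buf, size_t n,
                               uint32_t pat) {
        size_t errors = 0;
        for (size_t i = 0; i < n; i++) buf[i] = pat;
        for (size_t i = 0; i < n; i++)
            if (buf[i] != pat) errors++;
        return errors;
    }

    int main(void) {
        size_t n = (64u * 1024 * 1024) / sizeof(uint32_t); /* 64 MB */
        uint32_t *buf = malloc(n * sizeof *buf);
        if (!buf) return 1;
        size_t e = test_pattern(buf, n, 0x55555555u)
                 + test_pattern(buf, n, 0xAAAAAAAAu);
        printf("%zu errors found\n", e);
        free(buf);
        return 0;
    }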

however, to run GPUs reliably like this, you need proper
cooling, and *there* is the biggest risk in my personal
opinion, since by default GPUs use a fan control regimen
that keeps the noise down, but not the heat.

i have come up with a little hack that can operate our
GPUs at over 20 C lower core temperature and thus
massively increases their reliability.
http://sites.google.com/site/akohlmey/random-hacks/nvidia-gpu-coolness
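
independent of that hack, it pays to keep an eye on the core
temperature. a minimal watcher using the NVML library could
look like the sketch below (my own toy example, not part of
the hack above; the 85 C threshold is arbitrary, and you need
to link with -lnvidia-ml):

    #include <stdio.h>
    #include <unistd.h>
    #include <nvml.h>

    int main(void) {
        nvmlDevice_t dev;
        if (nvmlInit() != NVML_SUCCESS) return 1;
        if (nvmlDeviceGetHandleByIndex(0, &dev) != NVML_SUCCESS)
            return 1;
        /* poll GPU 0 every 10 seconds for one hour */
        for (int i = 0; i < 360; i++) {
            unsigned int t;
            if (nvmlDeviceGetTemperature(dev, NVML_TEMPERATURE_GPU,
                                         &t) == NVML_SUCCESS) {
                /* 85 C is an arbitrary warning threshold */
                printf("gpu 0: %u C%s\n", t,
                       t > 85 ? "  <-- running hot!" : "");
            }
            sleep(10);
        }
        nvmlShutdown();
        return 0;
    }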

now let's move on to the original poster's
question at hand.

i.e. purchasing a small amount of workstation
type hardware. first of all, i am surprised to
see an M-series GPU offered. those are
passively cooled and thus can only be used
in a properly certified case with proper ventilation
and in a temperature controlled environment,
so i would worry whether you can operate that
machine at all. a C-series or a GeForce type GPU
sounds more suitable to me. for a workstation, in
the current situation, i would go for a couple of
3GB GeForce GTX 580s, buy a spare GPU on the
side for each box right away, and use the money
saved to crank up the RAM in the machines, so
you can use them for analysis as well. it is the
best bang for the buck. just don't take a vendor
overclocked version and don't take the cheapest
variant. those cards have such a speed advantage
over any Tesla based offering that it is worth it,
and you are not running hundreds of them, so the
somewhat higher probability of hardware issues
will not cost you too much time.

if, however, you consider this too much of a risk,
then i would recommend not purchasing a GPU
at all, but getting a 4-way 8-core AMD Opteron
machine (resist the temptation of going to
higher core counts, it is not really worth the
money). you can put together an extremely powerful
and affordable NAMD workstation with this hardware,
and if you want to be really stingy, you can save
on the memory (i.e. get 32GB RAM) as well and
perhaps get even more than the two machines
that were mentioned. that would give you the
advantage of not having to worry about the GPUs
at all and of not suffering from any of the current
limitations of the GPU kernels in NAMD.

cheers,
    axel.

> I don't know the project you mentioned, but if it is distributed
> computing, I would have implemented error correction there (in the
> simplest case, duplicate computation on different nodes), as they
> surely did too, because results could otherwise be manipulated.
>
> Feel free to correct me.
>
> Cheers
>
> Norman Geist.
>
>
>> -----Original Message-----
>> From: owner-namd-l_at_ks.uiuc.edu [mailto:owner-namd-l_at_ks.uiuc.edu]
>> On Behalf Of Nicholas M Glykos
>> Sent: Tuesday, February 14, 2012 11:04
>> To: Marcel UJI (IMAP)
>> Cc: Norman Geist; namd-l_at_ks.uiuc.edu
>> Subject: Re: AW: namd-l: 2CPU+1GPU vs 1CPU+2GPU
>>
>>
>> Dear Marcel, Norman, List,
>>
>> I'll play devil's advocate, bear with me. Measuring (and demonstrating)
>> memory errors with memtest does nothing to answer the important
>> question:
>> Do these errors change the average long-term behaviour (and derived
>> quantities) of the simulations, or do they just add (as white noise)
>> another source of chaotic behaviour to an already chaotic system? I
>> would argue that if the memory errors are truly random, then they
>> cannot be correlated with the aim of any given simulation and, thus,
>> cannot be held responsible for things working out "incredibly great"
>> or otherwise. If I were to offer an example in support of this
>> thesis, I would probably quote the results obtained on folding
>> simulations by the Shaw group (the Science 2010 paper) using the
>> Anton machine, which to my knowledge (please do correct me if I'm
>> wrong) does not use ECC memory. Although I'm not advocating the
>> incorporation of avoidable errors in calculations, I do feel that
>> solid evidence for the effect of these errors on the MD-derived
>> quantities is missing.
>>
>> My twocents,
>> Nicholas
>>
>>
>>
>> On Tue, 14 Feb 2012, Marcel UJI (IMAP) wrote:
>>
>> > Yes, I have found other sources with similar results (see
>> > http://www.cs.stanford.edu/people/ihaque/talks/resilience-2010.pdf),
>> > so I think I will finally go for those Tesla cards.
>> >
>> > Thank you all for your help!
>> >
>> > Marcel
>> >
>> > On 14/02/12 08:18, Norman Geist wrote:
>> > >
>> > > Hi,
>> > >
>> > > I just wanted to add that I was pretty surprised when I first saw
>> > > the ECC error counters on my Tesla C2050. In fact it's the total
>> > > of double-bit errors, and I never investigated their occurrence,
>> > > but I would only go without ECC with some bellyache, because
>> > > everything that doesn't work or behaves strangely in your
>> > > simulations, or even what works incredibly well, can be due to
>> > > memory-error artifacts. That might sound a little overdone, but it
>> > > is possible. For what else, except reliability, was ECC developed?
>> > > But I'm really not sure what influence those errors can really
>> > > have; at least with ECC you have one thing less to check when
>> > > problems occur.
>> > >
>> > > Best wishes
>> > >
>> > > Norman Geist.
>> > >
>> > > *From:* owner-namd-l_at_ks.uiuc.edu [mailto:owner-namd-l_at_ks.uiuc.edu]
>> > > *On Behalf Of* Ajasja Ljubetic
>> > > *Sent:* Monday, February 13, 2012 16:09
>> > > *Cc:* Marcel UJI (IMAP); namd-l_at_ks.uiuc.edu
>> > > *Subject:* Re: namd-l: 2CPU+1GPU vs 1CPU+2GPU
>> > >
>> > >     One final thing. I've done some benchmarking with an AMD
>> > >     6-core desktop and a GTX 570, and it ends up being about equal
>> > >     to (slightly faster than) a 6-core Xeon with an M2070. You can
>> > >     buy a 3GB GTX 580 for a fraction of the price of an M-series
>> > >     card, and an AMD CPU (particularly the 3 GHz 6-core Thubans)
>> > >     will be close to half the price of the Intel. While I'm sure
>> > >     the Intel chip is generally superior to the AMD one, it
>> > >     doesn't seem to be a factor when running NAMD. So I would say
>> > >     buy two desktops and save yourself money and also gain
>> > >     performance. I know there is the lack of ECC memory with the
>> > >     GTX series, but I'm really not convinced that is a big issue
>> > >     for MD (maybe someone on the list has a different opinion).
>> > >
>> > > I've been running my simulations on several GTX 560 Ti cards for
>> > > half a year now and it works great! So I would back up this advice.
>> > >
>> > > Best regards,
>> > >
>> > > Ajasja
>> > >
>> > >     ~Aron
>> > >
>> > >     On Mon, Feb 13, 2012 at 6:44 AM, Nicholas M Glykos
>> > >     <glykos_at_mbg.duth.gr> wrote:
>> > >
>> > >     You will (hopefully) hear from Axel on this, but:
>> > >
>> > >
>> > >     > as it would give more speed for our NAMD based simulations
>> > >
>> > >     Is this an assumption or the result of benchmarking the two
>> > >     hardware configurations with your intended system sizes? For
>> > >     small (atom-wise) systems, you shouldn't expect much
>> > >     improvement by increasing the number of GPUs (and for tiny
>> > >     systems the 1CPU+2GPU may not scale at all).
>> > >
>> > >     My twocents,
>> > >     Nicholas
>> > >
>> > >
>> > >     --
>> > >
>> > >     Nicholas M. Glykos, Department of Molecular Biology
>> > >     and Genetics, Democritus University of Thrace, University Campus,
>> > >     Dragana, 68100 Alexandroupolis, Greece, Tel/Fax (office)
>> > >     +302551030620, Ext.77620, Tel (lab) +302551030615,
>> > >     http://utopia.duth.gr/~glykos/
>> > >
>> > >     --
>> > >     Aron Broom M.Sc
>> > >     PhD Student
>> > >     Department of Chemistry
>> > >     University of Waterloo
>> > >
>> >
>> >
>>
>> --
>>
>> Nicholas M. Glykos, Department of Molecular Biology
>> and Genetics, Democritus University of Thrace, University Campus,
>> Dragana, 68100 Alexandroupolis, Greece, Tel/Fax (office) +302551030620,
>> Ext.77620, Tel (lab) +302551030615, http://utopia.duth.gr/~glykos/
>
>

-- 
Dr. Axel Kohlmeyer
akohlmey_at_gmail.com  http://goo.gl/1wk0
College of Science and Technology
Temple University, Philadelphia PA, USA.
