AW: AW: 2CPU+1GPU vs 1CPU+2GPU

From: Norman Geist (norman.geist_at_uni-greifswald.de)
Date: Tue Feb 14 2012 - 06:14:22 CST

Hi Nicholas,

yes, that's what I tried to point out. Nobody seems to really know. But most
people in HPC also use ECC memory on the CPU side, so there must be a reason
for that. In the case of NAMD I would guess, following your explanation, that
it is a question of system size: for really small systems a flipped bit could
matter, while a bigger system can better offset it. ECC corrects flipped
bits, and in a binary representation a flip can cause a small or a dramatic
change, depending on which bit is hit. The question is whether such flips can
change results. I think it makes a difference whether a number is
00000001 (1) or 10000001 (129), to show just a single byte, and these flips
can occur anywhere in RAM, including in the force field parameters that are
stored there.
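
To make the byte example concrete, here is a minimal sketch in Python. It is
only an illustration: the helper names flip_bit_in_byte and
flip_bit_in_double are made up for this mail, and it assumes values are
stored as IEEE 754 doubles, which says nothing about how NAMD actually lays
out its data in memory.

```python
import struct


def flip_bit_in_byte(value: int, bit: int) -> int:
    """Flip one bit (0 = least significant) of an 8-bit unsigned value."""
    return (value ^ (1 << bit)) & 0xFF


def flip_bit_in_double(x: float, bit: int) -> float:
    """Flip one bit (0..63) in the IEEE 754 binary64 encoding of x."""
    # Reinterpret the double as a 64-bit unsigned integer.
    (raw,) = struct.unpack("<Q", struct.pack("<d", x))
    # Toggle the chosen bit and reinterpret the result as a double again.
    (flipped,) = struct.unpack("<d", struct.pack("<Q", raw ^ (1 << bit)))
    return flipped


# The byte example from above: 00000001 becomes 10000001.
print(flip_bit_in_byte(0b00000001, 7))  # 129

# The same single flip applied to different bits of the double 1.0.
for bit in (0, 51, 52, 62, 63):
    print(bit, flip_bit_in_double(1.0, bit))
```

For 1.0, flipping the lowest mantissa bit changes the value only in the last
decimal place, flipping bit 52 halves it, flipping bit 63 changes the sign,
and flipping bit 62 turns it into infinity. So the position of the flip
decides whether the error is negligible or catastrophic.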

I don't know the project you mentioned, but if it is distributed computing,
I would have implemented error checking there (in the simplest form,
computing each result twice on different nodes), as they surely did too,
because results can be manipulated.
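
As a rough sketch of that "compute it twice" idea, again in Python: the
names result_digest and run_with_double_check are hypothetical, and the two
runner callables merely stand in for two different nodes. The sketch assumes
the computation is bitwise reproducible, which parallel floating-point code
often is not, because the order of summation can differ between runs.

```python
import hashlib
import struct
from typing import Callable, List, Sequence

# A "runner" stands in for one node: it takes a work unit and returns results.
Runner = Callable[[Sequence[float]], List[float]]


def result_digest(values: Sequence[float]) -> str:
    """Hash the exact bit patterns, so a single flipped bit changes the digest."""
    packed = struct.pack(f"<{len(values)}d", *values)
    return hashlib.sha256(packed).hexdigest()


def run_with_double_check(work_unit: Sequence[float],
                          runner_a: Runner,
                          runner_b: Runner) -> List[float]:
    """Compute the same work unit on two runners and accept it only if both agree."""
    result_a = runner_a(work_unit)
    result_b = runner_b(work_unit)
    if result_digest(result_a) != result_digest(result_b):
        raise ValueError("results disagree; the work unit should be recomputed")
    return result_a
```

A mismatch does not say which node was wrong, only that the work unit cannot
be trusted; a third run or a majority vote would be needed to decide.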

Feel free to correct me.

Cheers

Norman Geist.

> -----Original Message-----
> From: owner-namd-l_at_ks.uiuc.edu [mailto:owner-namd-l_at_ks.uiuc.edu] On
> behalf of Nicholas M Glykos
> Sent: Tuesday, 14 February 2012 11:04
> To: Marcel UJI (IMAP)
> Cc: Norman Geist; namd-l_at_ks.uiuc.edu
> Subject: Re: AW: namd-l: 2CPU+1GPU vs 1CPU+2GPU
>
>
> Dear Marcel, Norman, List,
>
> I'll play devil's advocate, bear with me. Measuring (and demonstrating)
> memory errors with memtest does nothing to answer the important question:
> Do these errors change the average long-term behaviour (and derived
> quantities) of the simulations, or do they just add (as white noise)
> another source of chaotropic behaviour to an already chaotic system? I
> would argue that if the memory errors are truly random, then they cannot
> be correlated with the aim of any given simulation and, thus, cannot be
> held responsible for things working out "incredibly great" or otherwise.
> If I were to offer an example in support of this thesis, I would probably
> quote the results obtained on folding simulations by the Shaw group (the
> Science 2010 paper) using the Anton machine, which to my knowledge (please
> do correct me if I'm wrong) does not use ECC memory. Although I'm not
> advocating the incorporation of avoidable errors in calculations, I do
> feel that solid evidence for the effect of these errors on the MD-derived
> quantities is missing.
>
> My two cents,
> Nicholas
>
>
>
> On Tue, 14 Feb 2012, Marcel UJI (IMAP) wrote:
>
> > Yes, I have found other sources with similar results (see
> > http://www.cs.stanford.edu/people/ihaque/talks/resilience-2010.pdf), so
> > I think I will finally go for those Tesla cards.
> >
> > Thank you all for your help!
> >
> > Marcel
> >
> > On 14/02/12 08:18, Norman Geist wrote:
> > >
> > > Hi,
> > >
> > >
> > >
> > > I just wanted to add that I was pretty surprised when I first saw the
> > > ECC error counters on my Tesla C2050. Well, in fact it is the total of
> > > double-bit errors, and I never investigated when they occur, but I
> > > would only go without ECC with some unease, because anything that
> > > doesn't work or behaves strangely in your simulations, or even
> > > anything that works incredibly well, could be an artifact of memory
> > > errors. That might sound a little overdone, but it is possible. What
> > > else was ECC developed for, if not reliability? I'm really not sure
> > > what influence those errors actually have, but with ECC you have one
> > > thing less to check when problems occur.
> > >
> > >
> > >
> > > Best wishes
> > >
> > >
> > >
> > > Norman Geist.
> > >
> > >
> > >
> > > *From:* owner-namd-l_at_ks.uiuc.edu [mailto:owner-namd-l_at_ks.uiuc.edu]
> > > *On behalf of* Ajasja Ljubetic
> > > *Sent:* Monday, 13 February 2012 16:09
> > > *Cc:* Marcel UJI (IMAP); namd-l_at_ks.uiuc.edu
> > > *Subject:* Re: namd-l: 2CPU+1GPU vs 1CPU+2GPU
> > >
> > >
> > >
> > > One final thing. I've done some benchmarking with an AMD 6-core
> > > desktop and a GTX-570, and it ends up being about equal to (slightly
> > > faster than) a 6-core Xeon with an M2070. You can buy a 3GB
> > > GTX580 for a fraction of the price of an M series card, and an AMD
> > > CPU (particularly the 3 GHz 6-core Thubans) will be close to half
> > > the price of the Intel. While I'm sure the Intel chip is
> > > generally superior to the AMD one, it doesn't seem to be a factor
> > > when running NAMD. So I would say buy two desktops and save
> > > yourself money and also gain performance. I know there is the
> > > lack of ECC memory with the GTX series, but I'm really not
> > > convinced that is a big issue for MD (maybe someone on the list
> > > has a different opinion).
> > >
> > >
> > >
> > > I've been running my simulations on several GTX 560 Ti cards for half
> > > a year now and it works great! So I would back up this advice.
> > >
> > >
> > >
> > > Best regards,
> > >
> > > Ajasja
> > >
> > >
> > >
> > > ~Aron
> > >
> > >
> > >
> > >
> > >
> > > On Mon, Feb 13, 2012 at 6:44 AM, Nicholas M Glykos
> > > <glykos_at_mbg.duth.gr> wrote:
> > >
> > >
> > >
> > > You will (hopefully) hear from Axel on this, but :
> > >
> > >
> > > > as it would give more speed for our NAMD based simulations
> > >
> > > Is this an assumption or the result of benchmarking the two hardware
> > > configurations with your intended system sizes? For small (atom-wise)
> > > systems, you shouldn't expect much improvement from increasing the
> > > number of GPUs (and for tiny systems the 1CPU+2GPU may not scale at
> > > all).
> > >
> > > My two cents,
> > > Nicholas
> > >
> > >
> > > --
> > >
> > >
> > > Nicholas M. Glykos, Department of Molecular Biology
> > > and Genetics, Democritus University of Thrace, University Campus,
> > > Dragana, 68100 Alexandroupolis, Greece, Tel/Fax (office)
> > > +302551030620,
> > > Ext.77620, Tel (lab) +302551030615,
> > > http://utopia.duth.gr/~glykos/
> > >
> > >
> > >
> > > --
> > > Aron Broom M.Sc
> > > PhD Student
> > > Department of Chemistry
> > > University of Waterloo
> > >
> > >
> > >
> >
> >
>
> --
>
>
> Nicholas M. Glykos, Department of Molecular Biology
> and Genetics, Democritus University of Thrace, University Campus,
> Dragana, 68100 Alexandroupolis, Greece, Tel/Fax (office)
> +302551030620,
> Ext.77620, Tel (lab) +302551030615, http://utopia.duth.gr/~glykos/
