Re: namd-cuda-intel vs. namd-intel

From: Axel Kohlmeyer (akohlmey_at_cmm.chem.upenn.edu)
Date: Fri May 15 2009 - 10:41:39 CDT

On Fri, 2009-05-15 at 11:25 -0400, Dow_Hurst wrote:

dow,

> I apologize for posting misleading info as all errors were mine and
> certainly not his. However, this does raise concerns over long
> computations and the possibility of small RAM errors.

the same is true for most desktop-type machines that are used for
simulations. there is some good news, at least for now, in the fact
that MD is somewhat of a "conservative problem", i.e. if there were a
random fluctuation in a force or position, it would be
just one more fluctuation amongst others. if the fluctuation were too
large, the job would crash due to floating point overflows. GPUs are
a bit trickier in this case, since there is no operating system kernel
running on them that watches over the compute "task" and can detect when
something went wrong. with the current size of clusters and problems,
such errors will become increasingly common, but not prohibitively bad.
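
to illustrate the point about overflows, here is a minimal sketch of a
host-side sanity check on forces copied back from an accelerator. this is
not how NAMD does it; the function name forces_look_sane and the threshold
max_magnitude are hypothetical and chosen only for illustration.

```python
import math


def forces_look_sane(forces, max_magnitude=1.0e4):
    """Return True if every force component is finite and not absurdly large.

    `forces` is an iterable of (fx, fy, fz) tuples. `max_magnitude` is an
    assumed, problem-dependent threshold, not a value taken from any MD code.
    """
    for fx, fy, fz in forces:
        for component in (fx, fy, fz):
            # NaN or inf indicates an overflow or a corrupted value.
            if not math.isfinite(component):
                return False
            # A finite but huge value is treated as suspicious as well.
            if abs(component) > max_magnitude:
                return False
    return True
```

a small bit flip that stays below the threshold passes this check, which
matches the argument above: it is just one more fluctuation. only the
large errors are caught.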

for the long run, however, one will have to think about fault-tolerant
setups, where "tasklets" or "compute objects" are 'tried and committed'
rather than just launched. i am pretty certain that the charm++
developers are already working towards that goal. ;)
their framework is certainly more suitable to this than the more
typical minimalistic MPI parallelization in many other codes.
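
the 'tried and committed' idea can be sketched roughly as follows. this is
a toy illustration under my own assumptions, not charm++ code: the names
try_and_commit, compute, validate and max_attempts are all hypothetical.

```python
import math


def try_and_commit(compute, validate, state, max_attempts=3):
    """Run `compute(state)` and return its result only if `validate` accepts it.

    A rejected result is discarded and the task is retried, so a transient
    hardware error never reaches the committed state.
    """
    for _ in range(max_attempts):
        candidate = compute(state)   # "try": result is provisional
        if validate(candidate):
            return candidate         # "commit": result is accepted
    raise RuntimeError("task failed validation %d times" % max_attempts)


if __name__ == "__main__":
    attempts = {"count": 0}

    def flaky_step(x):
        # Simulate one corrupted result followed by a good one.
        attempts["count"] += 1
        return float("nan") if attempts["count"] == 1 else x + 1.0

    print(try_and_commit(flaky_step, math.isfinite, 1.0))
```

the validation step is the hard part in practice. a finiteness test only
catches gross errors; running the task twice and comparing would catch
more, at twice the cost.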

cheers,
   axel.

> Best wishes,
> Dow
>

-- 
=======================================================================
Axel Kohlmeyer   akohlmey_at_cmm.chem.upenn.edu   http://www.cmm.upenn.edu
   Center for Molecular Modeling   --   University of Pennsylvania
Department of Chemistry, 231 S.34th Street, Philadelphia, PA 19104-6323
tel: 1-215-898-1582,  fax: 1-215-573-6233,  office-tel: 1-215-898-5425
=======================================================================
If you make something idiot-proof, the universe creates a better idiot.
