Re: unusual, periodic crash in Linux FC3/GM/MPI/CHARM/NAMD

From: Dan Strahs (dstrahs_at_pace.edu)
Date: Thu Feb 09 2006 - 18:41:29 CST

Thanks for the response.

1) The dcd/veldcd files are each ~680 Mbyte. That is well below the
2 Gbyte file size limit on some systems, so probably not.

2) These are the largest files generated on this system; other runs were
either shorter (only 2ns) or on a different OS (RedHat 9). Size could thus
be a factor.

3) The system is large: ~130,000 atoms. I only have 10 processors, so NAMD
runs at an average of 0.591374 sec/step; thus 2179310 steps takes
~1,288,787 seconds, or ~21500 minutes or ~360 hours or ~15 days. These
don't appear to be significant numbers (powers of 2 or anything like that).

On the other hand, this is not the longest simulation on the cluster:
under the previous 32-bit OS, the cluster ran at a slower 0.795292
sec/step; thus 2000000 steps took 1,590,584 seconds or ~18.4 days.
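
For anyone who wants to redo the arithmetic above, here is a minimal Python
sketch. It converts sec/step and step count into wall-clock time and tests
whether a number sits close to a power of 2. The function names and the 1%
tolerance are my own assumptions for illustration; nothing here comes from
NAMD itself.

```python
import math


def run_duration(sec_per_step, steps):
    """Return the total wall-clock time of a run in several units."""
    seconds = sec_per_step * steps
    return {
        "seconds": seconds,
        "minutes": seconds / 60,
        "hours": seconds / 3600,
        "days": seconds / 86400,
    }


def near_power_of_two(value, tolerance=0.01):
    """True if value is within a relative tolerance of some power of 2.

    The 1% default tolerance is an arbitrary choice for this sketch.
    """
    if value <= 0:
        return False
    nearest = 2.0 ** round(math.log2(value))
    return abs(value - nearest) / nearest <= tolerance


if __name__ == "__main__":
    # Figures quoted in this message: the crashing run and the older 32-bit run.
    runs = {
        "crashing run": (0.591374, 2179310),
        "older 32-bit run": (0.795292, 2000000),
    }
    for label, (sec_per_step, steps) in runs.items():
        duration = run_duration(sec_per_step, steps)
        print(label, {unit: round(value, 1) for unit, value in duration.items()})
        print("  steps near a power of 2:", near_power_of_two(steps))
        print("  seconds near a power of 2:", near_power_of_two(duration["seconds"]))
```

The same check can be pointed at file sizes in bytes or at frame counts if
those turn out to be more suspicious than the step count.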

If you have any more ideas, please throw them my way!

Dan Strahs

On Thu, 9 Feb 2006 jonathan_at_ibt.unam.mx wrote:

> Hello. Seems to me that the fact that you can't run NAMD for longer than 2 ns is
> either a file size issue or a process length permission issue.
> How big are the 2.17931 ns DCD files? Are their sizes coincidentally close to an
> integer value (e.g. 2.0 GB)? If so, you may be looking at a file size limit
> inherent to your operating system (which I doubt, since I'm running FC4 x86_64
> and I haven't encountered such a problem). Have you previously generated files
> that big?
> On the other hand, how long does it take NAMD to run with your system for
> 2.17931 ns? Is it suspiciously near an integer number of hours (e.g. 20 h)? If
> so, you might have a restriction on the amount of time your user can run a
> process for. Have you previously run a process for longer than that?
> These are my two cents. HTH.
>
> J. Valencia
>
> ----------------------------------------------------------------
> This message was sent from the Webmail server of the Instituto de Biotecnologia.
>
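
The two limits raised in the quoted message (maximum file size and maximum
process time) can be read directly from a running process. The following is a
minimal Python sketch, assuming a Unix-like system where the standard
`resource` module is available; the helper names are hypothetical. It reports
only the limits of the process that runs it, so it would need to be started
the same way as the NAMD job, on a compute node, to be meaningful. It also
cannot see wall-clock limits imposed by a batch queue.

```python
import resource


def describe_limit(value):
    """Render a raw rlimit value, mapping RLIM_INFINITY to 'unlimited'."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


def current_limits():
    """Return soft and hard limits for file size and CPU time."""
    wanted = (
        ("file_size_bytes", resource.RLIMIT_FSIZE),
        ("cpu_time_seconds", resource.RLIMIT_CPU),
    )
    limits = {}
    for name, which in wanted:
        soft, hard = resource.getrlimit(which)
        limits[name] = {
            "soft": describe_limit(soft),
            "hard": describe_limit(hard),
        }
    return limits


if __name__ == "__main__":
    for name, pair in current_limits().items():
        print(f"{name}: soft={pair['soft']} hard={pair['hard']}")
```

If both come back as "unlimited", a per-process file size or CPU time limit
is unlikely to explain the crash, although other limits could still apply.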

This archive was generated by hypermail 2.1.6 : Wed Feb 29 2012 - 15:43:17 CST