Re: NAMD breaks with wrong timestep

From: Brian Radak (bradak_at_anl.gov)
Date: Fri Oct 21 2016 - 10:41:46 CDT

I think you are hitting the limit of 32-bit signed integers somewhere
(2147483647). The code does not consistently use unsigned integers where
they would be applicable, probably because the step isn't really used for
anything other than checking the output frequency.
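
As a rough check of that arithmetic, here is a small Python sketch. It is
not NAMD's actual code: the helper name wrap_int32 is made up for
illustration, and it assumes the step counter behaves like a 32-bit
two's-complement integer. The input numbers are taken from the log quoted
below.

```python
# Assumption: the step counter is stored as a 32-bit signed integer.
INT32_MIN = -2**31
INT32_MAX = 2**31 - 1  # 2147483647


def wrap_int32(value: int) -> int:
    """Return the value a 32-bit two's-complement counter would hold.

    Hypothetical helper for illustration only; not part of NAMD.
    """
    return (value - INT32_MIN) % 2**32 + INT32_MIN


first_step = 2_100_000_000  # TS on the ENERGY line of the quoted log
run_steps = 50_000_000      # "TCL: Running for 50000000 steps"

final_step = first_step + run_steps
print(final_step)              # 2150000000
print(final_step > INT32_MAX)  # True
print(wrap_int32(final_step))  # -2144967296
```

The wrapped value is the same negative step number that appears in the
"WRITING ... AT STEP" lines of the log, which is consistent with an
overflow of the final step of the run.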

It might not be satisfying, but you can probably solve this by using the
"firstTimestep" command to reset the count.

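To see how much headroom a reset would buy, the following Python sketch
counts how many runs of a given length fit below the 32-bit signed limit
for a given starting step. The function name is hypothetical, and
restarting the count at 0 is only an assumed choice of value.

```python
INT32_MAX = 2**31 - 1  # 2147483647


def chunks_before_overflow(first_timestep: int, run_steps: int) -> int:
    """Number of whole runs of run_steps that end at or below INT32_MAX.

    Hypothetical helper for illustration only; not part of NAMD.
    """
    if run_steps <= 0:
        raise ValueError("run_steps must be positive")
    return max(0, (INT32_MAX - first_timestep) // run_steps)


# Starting step from the quoted log: no further 50000000-step run fits.
print(chunks_before_overflow(2_100_000_000, 50_000_000))  # 0

# Assumed reset of the count to 0: 42 such runs fit before the limit.
print(chunks_before_overflow(0, 50_000_000))  # 42
```
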
HTH,

Brian

On 10/21/2016 08:23 AM, Götz, Alexander wrote:
>
> Hello everybody,
>
>
> I am currently having some trouble with NAMD 2.10 and my simulation
> systems. The systems (I have three nearly identical membrane
> systems) have all run for 2.1 µs in 21 chunks of 100 ns (all-atom
> CHARMM36, 2 fs integration timestep). Everything worked perfectly fine
> until chunk 22. Whenever I try to start chunk 22 for any of my systems
> I get the following error in the NAMD output:
>
>
> TCL: Running for 50000000 steps
> ETITLE: TS BOND ANGLE DIHED IMPRP
> ELECT VDW BOUNDARY MISC KINETIC
> TOTAL TEMP POTENTIAL TOTAL3
> TEMPAVG PRESSURE GPRESSURE VOLUME
> PRESSAVG GPRESSAVG
>
> ENERGY: 2100000000 1866.2220 9802.6548 6896.3270
> 78.1218 -103131.8273 3410.1396 0.0000
> 0.0000 26331.7058 -54746.6563 301.5845
> -81078.3620 -54571.7237 301.5845 -263.9492
> -261.2014 396913.4892 -263.9492 -261.2014
>
> OPENING EXTENDED SYSTEM TRAJECTORY FILE
> WRITING EXTENDED SYSTEM TO OUTPUT FILE AT STEP -2144967296
> CLOSING EXTENDED SYSTEM TRAJECTORY FILE
> WRITING COORDINATES TO OUTPUT FILE AT STEP -2144967296
> COORDINATE DCD FILE /<path removed by the author>/ WAS NOT CREATED
> The last position output (seq=-2) takes 0.006 seconds, 1399.262 MB
> of memory in use
> WRITING VELOCITIES TO OUTPUT FILE AT STEP -2144967296
> The last velocity output (seq=-2) takes 0.004 seconds, 1400.191 MB
> of memory in use
> ====================================================
>
> WallClock: 4.380243 CPUTime: 4.380243 Memory: 1400.191406 MB
> [Partition 0][Node 0] End of program
>
> I am quite confused about this, because I changed nothing in my
> NAMD configuration files except for the file numbering of the restart
> and output files, and these are fine (checked by 3 different
> people). To me the problem seems to be related to the generation
> of the DCD file. On the cluster side, the file system
> (IBM GPFS) should be fine, because other jobs with the same
> configuration are working and there has not been any maintenance that
> could be related to the observed problems. In addition, chunk 21 of
> one of the systems worked properly while chunk 22 of the other two
> systems failed at the same time. It looks a little bit like 22 is a
> magic number?
>
>
> Furthermore, the negative step number in the output, which is not in
> line with the run steps, is also quite mysterious to me. I hope
> somebody has a tip or a solution for me, because I have checked nearly
> everything that has come to my mind so far.
>
>
> Best Regards
>
>
> Alex
>
>
> *--------------------------------------------------------*
> *Alexander Götz, M.Sc.*
> Technische Universität München // Fakultät für Physik
> Lehrstuhl für Bioelektronik E.14
> Maximus-von-Imhof Forum 4 (room P059)
> 85350 Freising, Germany
> T: +49 8161 71-3540
>

-- 
Brian Radak
Postdoctoral Appointee
Leadership Computing Facility
Argonne National Laboratory
9700 South Cass Avenue, Bldg. 240
Argonne, IL 60439-4854
(630) 252-8643
brian.radak_at_anl.gov
