Re: feature request: external initiated clean exit

From: Peter Freddolino (petefred_at_umich.edu)
Date: Fri Aug 21 2015 - 20:04:42 CDT

Hi Daniel,
I understand. Have you tried writing the restarts more frequently and benchmarking the result? If your simulations are running that slowly, the performance hit is likely to be negligible, even for a big simulation.
Best,
Peter
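
[Editorial sketch, not part of the original message.] One rough way to frame the benchmark suggested above is to compare the wall time spent writing restarts with the work that would be lost on an unplanned stop. The Python below is a minimal sketch under stated assumptions: the function name `restart_tradeoff` and all numbers (50,000 steps per day, 30 seconds per restart write) are hypothetical placeholders, not measurements from this thread, and it assumes an interruption arrives at a uniformly random time within a restart interval. Real values would have to come from timing your own NAMD runs.

```python
def restart_tradeoff(steps_per_day, restart_interval_steps, write_seconds):
    """Estimate the cost of a restart interval.

    steps_per_day: pure compute throughput, excluding restart writes (assumed).
    restart_interval_steps: steps between restart writes (the restartfreq value).
    write_seconds: wall time needed to write one restart (must be measured).

    Returns (overhead_fraction, avg_lost_hours).
    """
    seconds_per_step = 86400.0 / steps_per_day
    compute_seconds = restart_interval_steps * seconds_per_step
    # Fraction of each compute-plus-write cycle that goes to writing the restart.
    overhead_fraction = write_seconds / (compute_seconds + write_seconds)
    # Assumption: a stop at a random time loses half an interval on average.
    avg_lost_hours = 0.5 * compute_seconds / 3600.0
    return overhead_fraction, avg_lost_hours


# Hypothetical numbers for illustration only.
for interval in (10000, 5000, 1000):
    overhead, lost = restart_tradeoff(50000, interval, 30.0)
    print(f"restartfreq {interval}: overhead {overhead:.2%}, "
          f"average loss {lost:.2f} h")
```

If the measured overhead stays small at shorter intervals, writing restarts more often costs little compared with the progress it protects.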

> On Aug 21, 2015, at 8:21 PM, Daniel Möller <daniel.moeller3_at_uni-greifswald.de> wrote:
>
> Hi,
> thank you for your answer. I'm already writing a restart every 10,000 steps, but some of our simulation systems are really big for our small HPC system(s), so I don't get many restarts per day. If we have to stop a machine (because of things outside our control and time management), I don't want to lose that progress. (These large simulations take time to write a restart, time which I need for the simulation.) That is why I got the idea to ask about this here.
> (Yes, we could upgrade our HPC system (and we are trying to), but that takes time and enough money.)
>
>
> Sincerely
>
> Daniel Möller
>
>
>
> -----Original Message-----
> From: owner-namd-l_at_ks.uiuc.edu [mailto:owner-namd-l_at_ks.uiuc.edu] On Behalf Of Peter Freddolino
> Sent: Saturday, August 22, 2015 00:10
> To: Axel Kohlmeyer
> Cc: NAMD list; Daniel Möller
> Subject: Re: namd-l: feature request: external initiated clean exit
>
>
>> On Aug 21, 2015, at 6:03 PM, Axel Kohlmeyer <akohlmey_at_gmail.com> wrote:
>>
>>
>>
>> On Fri, Aug 21, 2015 at 5:44 PM, Peter Freddolino <petefred_at_umich.edu> wrote:
>> Dear Daniel,
>> Have you looked into namd’s use of restart files? With appropriate configuration they regularly produce files that can be used for clean restarts.
>> Why would you have an ‘exit’ file and not just send a termination signal to the namd process?
>>
>> hi peter,
>>
>> FYI, restart files in plane-wave DFT calculations can be huge compared to classical MD, and thus writing them frequently has much more impact on performance. also, due to the high algorithmic load of such calculations, you usually run with many more CPU cores. thus the loss of computer time when a job crashes without a usable last restart can be significant. for very large calculations across some 1000s of CPU cores, it can quickly take on the order of 30 minutes to write a restart. that leads to a different "restart culture"... ;-)
>>
>> axel.
>
> Hi Axel,
> Thanks for the lesson — good to know that the technical issues at play are different.
>
> So especially in light of that, Daniel, I should say that writing restart files for classical MD is typically *not* performance intensive as long as you don’t do it too often. I usually see intervals of 1000-10000 steps, and sometimes even shorter ones. You can set this with the restartfreq parameter (and related entries) in NAMD.
> Best,
> Peter
>
>

This archive was generated by hypermail 2.1.6 : Tue Dec 27 2016 - 23:21:16 CST