From: René Hafner TUK (hamburge_at_physik.uni-kl.de)
Date: Wed Mar 24 2021 - 09:57:26 CDT
I use NAMD 2.14.
However, when I force a crash with 2 replicas, both logfiles end with an
error/end message, while with 4 replicas at least one replica logfile
has no error/end message written at the end.
Therefore I suspect that at least one zombie process is still left.
I want to check this by adding "top -b > file.txt" to my submission
script after the "charmrun namd2..." line, but I need to wait until a
suitable node becomes available again.
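A minimal sketch of that diagnostic step, assuming bash and the procps `top` and `ps` tools on the compute node. One caveat: in batch mode `top` keeps writing frames until it is killed, so a bare `top -b > file.txt` would block the script; `-n 1` limits it to a single frame. The function name `snapshot_leftovers` and the log file name are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: after the replica run returns, record which of this user's
# processes are still alive on the batch node.
# snapshot_leftovers is a hypothetical helper name.

snapshot_leftovers() {
  local log_file=$1
  {
    echo "=== $(date) on $(uname -n) ==="
    # One batch-mode frame of top; without -n it would never return.
    top -b -n 1
    # A more targeted list: STAT "Z" marks zombies, "D" marks processes
    # in uninterruptible sleep, which cannot be killed until they wake.
    ps -u "$(id -u)" -o pid,ppid,pgid,stat,etime,args
  } >> "$log_file" 2>&1
}
```

It would be called on the line right after the charmrun call, e.g. `snapshot_leftovers "leftover_procs_${SLURM_JOB_ID}.txt"`, so the file shows whether any namd2 process outlived charmrun.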
On 3/24/2021 3:40 PM, Vermaas, Josh wrote:
> Hi Rene,
> Is this 2.13 or 2.14? I seem to recall that 2.13 (or maybe it was
> 2.12?) *didn’t* kill the other replicas when one replica received a
> termination signal, so you might legitimately be running into an
> issue where zombie namd processes are left roaming around on Slurm.
> I typically do not do anything special to clean up after a job
> crashes, since it is supposed to take itself down cleanly.
> From: <owner-namd-l_at_ks.uiuc.edu> on behalf of René Hafner TUK
> Reply-To: "namd-l_at_ks.uiuc.edu" <namd-l_at_ks.uiuc.edu>, René Hafner TUK
> Date: Wednesday, March 24, 2021 at 9:22 AM
> To: "namd-l_at_ks.uiuc.edu" <namd-l_at_ks.uiuc.edu>
> Subject: namd-l: how to properly end NAMD replica job on slurm batch
> Dear NAMD Maintainers,
> I work on a cluster with the SLURM batch system.
> I am currently testing replica simulations and run into the following
> issue: when the replica simulation ends with an error, or when I cancel
> the job via scancel (since I am only testing...), the node gets "closed"
> with the error "*kill task failed*".
> (It then takes intervention by the cluster admins to reopen/reboot the
> node, but that's local policy, I guess.)
> Have you ever experienced this?
> Is there a way to safely end the replica runs even when an error occurs?
> Do I have to collect the process IDs and kill the replica runs myself
> before the submission script (containing the call to charmrun namd2...) ends?
> Kind regards
> Dipl.-Phys. René Hafner
> TU Kaiserslautern
-- 
Dipl.-Phys. René Hafner
TU Kaiserslautern
Germany
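As for collecting process IDs yourself: a minimal sketch, not an official NAMD or Slurm recipe, of a wrapper that launches the run in its own process group and makes sure the whole group is gone before the batch script exits, whether the run ends normally, with an error, or because the script received SIGTERM (e.g. from scancel). Assumptions: Linux with util-linux `setsid`, and all replica processes running on the batch node inside that group; processes started on other nodes are not covered. Slurm normally signals every process of a cancelled job itself, and "kill task failed" usually means some process did not exit within the cluster's `UnkillableStepTimeout` (for example one stuck in uninterruptible I/O), which no script can fix, so this only tidies up local leftovers. The names `run_and_reap`, `reap_group`, `CHILD_PGID` and `GRACE_SECONDS` are hypothetical.

```shell
#!/usr/bin/env bash
# Minimal sketch, not an official NAMD or Slurm recipe.
# run_and_reap, reap_group, CHILD_PGID and GRACE_SECONDS are hypothetical.
# Assumes util-linux `setsid` and that every replica process runs on this
# node inside the launched command's process group.

GRACE_SECONDS=${GRACE_SECONDS:-10}   # seconds between SIGTERM and SIGKILL
CHILD_PGID=""

reap_group() {
  [ -n "$CHILD_PGID" ] || return 0
  # Ask every process in the group to stop ...
  kill -TERM -- "-$CHILD_PGID" 2>/dev/null
  local i
  for ((i = 0; i < GRACE_SECONDS; i++)); do
    # ... and stop waiting as soon as no member of the group is left.
    kill -0 -- "-$CHILD_PGID" 2>/dev/null || return 0
    sleep 1
  done
  # Force whatever is still alive after the grace period.
  kill -KILL -- "-$CHILD_PGID" 2>/dev/null
  return 0
}

run_and_reap() {
  # In a non-interactive script a background job shares the script's
  # process group, so setsid turns the child into a new group leader
  # without forking: its PID is also the new process group ID.
  setsid "$@" &
  CHILD_PGID=$!
  # On SIGTERM/SIGINT (e.g. from scancel): clean up, exit with 143 (128+15).
  trap 'reap_group; exit 143' TERM INT
  wait "$CHILD_PGID"
  local status=$?
  # Normal or error exit: remove anything the main process left behind.
  reap_group
  trap - TERM INT
  return "$status"
}
```

In the submission script the charmrun line would become `run_and_reap charmrun namd2 ...`, with "..." standing for the original arguments. The function returns the launched command's exit status, so error handling after it keeps working.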
This archive was generated by hypermail 2.1.6 : Fri Dec 31 2021 - 23:17:11 CST