How to properly end a NAMD replica job on a SLURM batch system

From: René Hafner TUK (hamburge_at_physik.uni-kl.de)
Date: Wed Mar 24 2021 - 08:13:04 CDT

Dear NAMD Maintainers,

I work on a cluster with a SLURM batch system.

I am currently testing replica simulations and am experiencing the
following issue: when a replica simulation ends with an error, or when I
cancel the job via scancel (since I am only testing), the node gets
"closed" with the error "*kill task failed*". (It then takes intervention
by the cluster admins to reopen/reboot the node, but that's local policy,
I guess.)

Have you ever experienced this?

Is there a way to safely end the replica runs even when an error occurs?

Do I have to collect the process IDs and kill the replica runs myself
before the submission script (containing the call to charmrun namd2 ...)
ends?

Kind regards
René

-- 
Dipl.-Phys. René Hafner
TU Kaiserslautern
Germany

This archive was generated by hypermail 2.1.6 : Fri Dec 31 2021 - 23:17:11 CST