namd leaves zombie processes on nodes?

From: JC Gumbart (gumbart_at_physics.gatech.edu)
Date: Mon May 23 2016 - 10:01:45 CDT

Hi all,

We’ve run into a new issue on our cluster. Jobs killed by the scheduler (Torque) often don’t die (although sometimes they do!) and instead keep running. They keep producing output as normal, but the scheduler no longer sees them. What our IT people can’t figure out is why this started happening only after a recent maintenance period - they said they didn’t change anything that should have affected this.

We’re running them with the command "mpirun -np $NP -env MV2_ENABLE_AFFINITY=0 namd2 $CONFFILE &> $LOGFILE".

Any suggestions?

Thanks!
JC
