From: JC Gumbart (gumbart_at_physics.gatech.edu)
Date: Mon May 23 2016 - 10:01:45 CDT
We’ve run into a new issue on our cluster here. Jobs killed by the scheduler (Torque) often don’t die (although sometimes they do!) and instead keep running. They continue to produce output as normal, but the scheduler no longer sees them. What our IT people can’t figure out is why this started only after a recent maintenance period; they said they didn’t change anything that should have affected this.
We’re running them using the command "mpirun -np $NP -env MV2_ENABLE_AFFINITY=0 namd2 $CONFFILE &> $LOGFILE"
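For anyone wanting to check for the same symptom, here is a rough sketch of how leftover processes might be spotted on a compute node. The function name list_namd_procs is made up for illustration, and it assumes the executable shows up in the process table as namd2, as in the command above. It only filters a process listing; it does not explain why the processes survive.

```shell
# Hypothetical helper (not part of NAMD or Torque): reads lines of the form
# "PID PPID COMMAND" on stdin and prints PID and PPID for every process
# whose command name is exactly "namd2".
list_namd_procs() {
    awk '$3 == "namd2" { print $1, $2 }'
}
```

On a node, this could be fed with something like "ps -eo pid=,ppid=,comm= | list_namd_procs" and the result compared against what qstat reports for that node. A PPID of 1 would suggest the launcher has already exited and the process was reparented, though that is only a hint and depends on how the node's init system handles orphans.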