Re: 16 total processes killed (some possibly by mpirun during cleanup)

From: Norman Geist (norman.geist_at_uni-greifswald.de)
Date: Tue Nov 03 2015 - 00:25:15 CST

Assuming you are redirecting stdout and stderr to log files, similar to:

mpirun […] namd2 job.in 2> job.e > job.out

you should look at the end of those files to find the reason why NAMD stopped. The message from mpirun about killed processes doesn’t really tell you anything, as it simply informs you that the job has been cancelled.
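As a rough sketch of that check, assuming the file names job.out and job.e from the command above: the helper name show_log_end and the search pattern are only illustrative and are not something NAMD or mpirun provides.

```shell
# Hypothetical helper: print the tail of each log file, plus any lines
# that look like an error. The keyword list is a guess; adjust as needed.
show_log_end() {
    for f in "$@"; do
        if [ -f "$f" ]; then
            echo "==> last lines of $f"
            tail -n 20 "$f"
            echo "==> possible error lines in $f"
            grep -n -i -E 'error|fatal|abort|killed' "$f" | tail -n 10
        else
            echo "==> $f not found"
        fi
    done
}

# job.out and job.e are the log file names used in the mpirun example
show_log_end job.out job.e
```

With 20 runs in separate directories, the same function can be called once per directory to see whether all runs stop with the same final message.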

 

Another possible reason is a walltime limit on the cluster you are using: the batch system cancels jobs that run longer than their allowed time.

Norman Geist

From: owner-namd-l_at_ks.uiuc.edu [mailto:owner-namd-l_at_ks.uiuc.edu] On behalf of Shalton Evans
Sent: Monday, 2 November 2015 21:34
To: namd-l_at_ks.uiuc.edu
Subject: namd-l: 16 total processes killed (some possibly by mpirun during cleanup)

Good Day All,

I am attempting a Monte Carlo approach to a dynamics run: I am executing 20 copies of the same script and its required files, each in a different directory. At some point, however, the runs fail one by one with the message "16 total processes killed (some possibly by mpirun during cleanup)." The dynamics runs fail after days of computer time, and the only reason I can think of is that they are running for too long.

Is there anyone that knows why this is happening? Help would be appreciated.

-Shalton
