From: Axel Kohlmeyer (akohlmey_at_gmail.com)
Date: Wed Nov 09 2011 - 11:00:04 CST
On Wed, Nov 9, 2011 at 11:07 AM, PAUL NEWMAN <paulclizana_at_gmail.com> wrote:
> Dear Namd Users,
>
> Recently I have been facing a problem when doing SMD with free energy
> calculations implemented in NAMD. After running for several hours, the job
> ends with the error below. When I restart the job it runs fine, but the same
> error appears again after several hours. This simulation is run on KRAKEN.
> Has anyone experienced this error? Any suggestion will be highly appreciated.
>
> LDB: TIME 52693.6 LOAD: AVG 0.705515 MAX 0.830861 PROXIES: TOTAL 7258 MAXPE
> 7 MAXPATCH 8 None 0.37042
> LDB: TIME 52693.6 LOAD: AVG 0.705515 MAX 0.830861 PROXIES: TOTAL 8369 MAXPE
> 11 MAXPATCH 13 RefineTorusLB 0.37042
> LDB: ============== END OF LOAD BALANCING =============== 52693.6
>
> SMD 11504200 49.2737 68.7159 249.946 0 0 46.7116
> MOMENTUM: 11504200 P: -964.237 105.36 350.516 L: -15268.9 -7461.61 151699
> ENERGY: 11504200 4992.2957 13176.3849 7768.3395
> 0.0000 -1760450.2512 205942.4450 0.0000 0.0000
> 308764.7515 -1219806.0345 317.5730 -1528570.7861
> -1218952.0103 318.0227 57.1181 88.8474
> 4928733.5188 127.8666 127.8696
>
> Assertion failed in file
> /ptmp/ulib/mpt/nightly/5.0/060910/mpich2/src/mpid/cray/src/adi/req.c at line
> 398: ((((typeptr)))->ref_count) >= 0
> aborting job:
> (null)
> [NID 12206] 2011-11-09 02:58:32 Apid 7667477: initiated application
> termination
this looks like something went wrong inside the MPI library: the
failed assertion is in the cray mpich2 sources (req.c, line 398).
i would suggest first contacting the user support staff and having
them identify what this assertion checks and what condition triggers it.
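when filing the support ticket it may help to send the exact assertion
text together with the last timestep that was reported. a minimal sketch
for pulling both out of a log is below. it is not part of NAMD; the
function name summarize_crash is hypothetical, and the script assumes the
log looks like the output quoted above (ENERGY: lines followed by the
step number, and an "Assertion failed" message that ends at a blank line
or at "aborting job").

```python
import re


def summarize_crash(log_text):
    """Return the last ENERGY step and the last assertion message in a log.

    Assumes the layout seen in the quoted output; other logs may differ.
    """
    last_step = None
    assertion_parts = []
    in_assertion = False

    for raw in log_text.splitlines():
        line = raw.strip()

        # ENERGY lines carry the timestep as their first numeric field.
        match = re.match(r"ENERGY:\s+(\d+)", line)
        if match:
            last_step = int(match.group(1))

        if line.startswith("Assertion failed"):
            # Start collecting; the message may be wrapped over several lines.
            in_assertion = True
            assertion_parts = [line]
        elif in_assertion:
            if not line or line.startswith("aborting job"):
                in_assertion = False
            else:
                assertion_parts.append(line)

    assertion = " ".join(assertion_parts) if assertion_parts else None
    return {"last_step": last_step, "assertion": assertion}
```

calling summarize_crash(open("job.log").read()) (the file name is only a
placeholder) returns a dict with the keys last_step and assertion, which
can be pasted into the ticket as is.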
cheers,
axel.
> Application 7667477 exit codes: 1
> Application 7667477 exit signals: Killed
> Application 7667477 resources: utime 18897942, stime 538534
>
> --
> Cheers,
>
> Paul
>
>
>
--
Dr. Axel Kohlmeyer akohlmey_at_gmail.com http://goo.gl/1wk0
College of Science and Technology
Temple University, Philadelphia PA, USA.