Problem running NAMD on ranger@TACC

From: shayan_at_msu.edu
Date: Tue Feb 17 2009 - 07:01:27 CST

Hello NAMD-list,

I am running NAMD on ranger@TACC on a system containing 2 million atoms, using the script from ~tg455591/NAMD_scripts/runbatch. It used to run without problems, but recently it started exiting unexpectedly after a few steps, with errors like:

WRITING EXTENDED SYSTEM TO RESTART FILE AT STEP 244000
LDB: TIME 7267.75 LOAD: AVG 1.35357 MAX 1.40105  PROXIES: TOTAL 1838 MAXPE 13 MAXPATCH 7 None 0.478391
LDB: TIME 7267.76 LOAD: AVG 1.35357 MAX 1.38049  PROXIES: TOTAL 1838 MAXPE 13 MAXPATCH 7 Refine 0.478391
Abort signaled by rank 136: [i139-411.ranger.tacc.utexas.edu:136] Got completion with error IBV_WC_WR_FLUSH_ERR, code=5, dest rank=152

Exit code -3 signaled from i139-411.ranger.tacc.utexas.edu
Killing remote processes...MPI process terminated unexpectedly
DONE
TACC: MPI job exited with code: 1
TACC: Shutting down parallel environment.
TACC: Shutdown complete. Exiting.

At other times the error read "Got completion with error IBV_WC_RETRY_EXC_ERR"
or "Got completion with error IBV_WC_LOC_PROT_ERR".

I have also received the following error a few times:

WRITING COORDINATES TO DCD FILE AT STEP 796000
LDB: TIME 22116.6 LOAD: AVG 1.18 MAX 1.23874  PROXIES: TOTAL 1966 MAXPE 13 MAXPATCH 7 None 0.44839
LDB: TIME 22116.6 LOAD: AVG 1.18 MAX 1.20339  PROXIES: TOTAL 1968 MAXPE 13 MAXPATCH 7 Refine 0.44839
MPI process terminated unexpectedly
Exit code -5 signaled from i141-406.ranger.tacc.utexas.edu
Killing remote processes...DONE
TACC: MPI job exited with code: 1
TACC: Shutting down parallel environment.
TACC: Shutdown complete. Exiting.
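
For anyone comparing similar failures, here is a minimal sketch of how the error types and node names could be tallied across several job output files, to check whether the same nodes keep appearing. The two patterns are assumptions based only on the log lines quoted above, and the function name `summarize` is hypothetical; other MPI stacks may format these messages differently.

```python
import re
from collections import Counter

# Patterns assumed from the log lines quoted in this message only.
IBV_ERROR = re.compile(r"Got completion with error (IBV_WC_\w+)")
EXIT_CODE = re.compile(r"Exit code (-?\d+) signaled from (\S+)")


def summarize(log_text):
    """Count InfiniBand completion errors and the hosts that signaled an exit."""
    errors = Counter()
    hosts = Counter()
    for line in log_text.splitlines():
        match = IBV_ERROR.search(line)
        if match:
            errors[match.group(1)] += 1
        match = EXIT_CODE.search(line)
        if match:
            hosts[match.group(2)] += 1
    return errors, hosts
```

Passing the concatenated text of the job output files to `summarize` returns two counters, one keyed by error name and one keyed by host name, which could then be sent to the cluster staff.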

Although I am not sure whether this is a NAMD issue, any suggestions would be greatly appreciated.
I have also contacted the Ranger staff, but they haven't been able to help so far.

Best Wishes,
Shayantani

Shayantani Mukherjee
Department of Biochemistry and Molecular Biology
Michigan State University
East Lansing, Michigan
E-mail: shayan_at_msu.edu
