Problem with running on multiple nodes

From: Karteek Bejagam (karteek4_at_vt.edu)
Date: Wed May 31 2017 - 11:33:48 CDT

Hello NAMD users,

I have a system with 100,000 atoms.
It runs fine on a single node with 24 cores.
However, when I run it on multiple nodes, it crashes with the following error:

##################
LDB: ============= START OF LOAD BALANCING ============== 6.80246
LDB: Largest compute 1637 load 0.031749 is 2.9% of average load 1.083487
LDB: Average compute 0.001368 is 0.1% of average load 1.083487
LDB: TIME 6.81261 LOAD: AVG 1.08349 MAX 1.4512 PROXIES: TOTAL 2496 MAXPE 40 MAXPATCH 5 None MEM: 406.734 MB
LDB: TIME 6.83671 LOAD: AVG 1.08349 MAX 1.24366 PROXIES: TOTAL 2496 MAXPE 40 MAXPATCH 5 TorusLB MEM: 406.934 MB
--------------------------------------------------------------------------
mpirun noticed that process rank 8 with PID 142379 on node nr060 exited on
signal 11 (Segmentation fault).

###################
Here is the relevant part of my job script:

module load gcc/4.7.2 openmpi/1.8.5 namd/2.10
charmrun namd2 +p$PBS_NP equi.namd > output.log
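
When a run works on one node but fails across several, it can help to log what the scheduler actually allocated before the launch line. Below is a minimal sketch of such a check. It assumes a PBS/Torque scheduler that exports PBS_NODEFILE, which is an assumption: the excerpt above only shows PBS_NP. The helper name summarize_nodefile is hypothetical and not part of the original script.

```shell
#!/bin/bash
# Hypothetical helper (not in the original script): print each host listed in
# a PBS-style nodefile together with the number of cores assigned on it.
summarize_nodefile() {
    # A PBS nodefile has one hostname per line, repeated once per core.
    sort "$1" | uniq -c | awk '{ print $2, $1 }'
}

# Assumption: the scheduler exports PBS_NODEFILE; skip quietly if it does not.
if [ -n "${PBS_NODEFILE:-}" ] && [ -r "$PBS_NODEFILE" ]; then
    echo "Allocated hosts (name, cores):"
    summarize_nodefile "$PBS_NODEFILE"
    echo "PBS_NP=${PBS_NP:-unset}"
fi
```

Placed before the charmrun line, this would show in the job output whether the cores counted by PBS_NP are spread over the expected nodes.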

Thanks in advance,
Karteek
