NAMD2.7 scaling problems

From: DimitryASuplatov (genesup_at_gmail.com)
Date: Wed Oct 28 2009 - 04:06:13 CDT

Hello,

I am trying to run a 39,728-atom system using NAMD 2.7b1.

It runs without a problem on 1, 4, 8, 16, 32, and 64 CPUs, but when I try to
launch it on 128 or 256 CPUs it fails with this error:

MPI process terminated unexpectedly
Exit code -5 signaled from node-04-05
Killing remote processes...Signal 15 received.
Signal 15 received.
Signal 15 received.
Signal 15 received.
DONE
 
which is most likely related to mpirun rather than to the NAMD executable.

With the same setup, NAMD 2.6 runs perfectly on 256 CPUs.

NAMD 2.7 was compiled as suggested in the note.txt file, using the tips
from the NamdWiki related to MPICH and InfiniBand.

Does this happen because my system is too small to scale to 256 CPUs? But
then why does NAMD 2.6 work fine? I thought that 2.7 had better scalability.
Could it be caused by an incorrect installation? What could be the
problem?
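
To put the system size in perspective, here is a rough sketch of the atoms-per-CPU arithmetic for the CPU counts mentioned above. The function name is purely illustrative, and the even split of atoms across CPUs is an assumption; NAMD's actual patch-based decomposition is not that simple.

```python
# Illustrative only: assumes atoms are split evenly across CPUs,
# which is a simplification of how NAMD actually distributes work.
ATOMS = 39728
CPU_COUNTS = [1, 4, 8, 16, 32, 64, 128, 256]


def atoms_per_cpu(atoms: int, cpus: int) -> float:
    """Average number of atoms per CPU under an even split."""
    if cpus <= 0:
        raise ValueError("cpus must be positive")
    return atoms / cpus


if __name__ == "__main__":
    for n in CPU_COUNTS:
        print(f"{n:>4} CPUs: {atoms_per_cpu(ATOMS, n):8.1f} atoms per CPU")
```

At 256 CPUs this works out to roughly 155 atoms per CPU.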

Thank you very much for your time.

SDA
