NAMD: 64-processor job runs, but 128-processor job fails. Why?

From: Sangamesh B (forum.san_at_gmail.com)
Date: Tue Aug 12 2008 - 12:27:18 CDT

Hi all,

I have installed NAMD 2.6 on a 33-node Rocks 4.3 cluster (dual-processor,
quad-core Intel Xeon nodes; 264 cores in total).

NAMD is built with MVAPICH2-1.0.3 and Intel 10 compilers.

Scaling is good from 8 to 16, 16 to 32, and 32 to 64 cores. But when a
128-core job is submitted, it fails:
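For reference, here is a minimal shell sketch of the core arithmetic behind these numbers. It assumes one MPI rank per core, packed 8 ranks per node; the actual placement depends on the contents of ./machfile. The helper names total_cores and nodes_needed are hypothetical, for illustration only.

```shell
#!/bin/sh
# Core/node arithmetic for the cluster described above.
# Assumption: ranks are placed one per core, 8 per node; the real
# placement is whatever ./machfile tells mpirun.

# total_cores NODES SOCKETS_PER_NODE CORES_PER_SOCKET
total_cores() {
  echo $(( $1 * $2 * $3 ))
}

# nodes_needed RANKS RANKS_PER_NODE  (integer division, rounded up)
nodes_needed() {
  echo $(( ($1 + $2 - 1) / $2 ))
}

echo "total cores: $(total_cores 33 2 4)"
for np in 8 16 32 64 128; do
  echo "-np $np -> $(nodes_needed "$np" 8) node(s)"
done
```

Under that assumption, the 64-core run spans 8 nodes and the failing 128-core run spans 16 of the 33 nodes.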

#mpirun -machinefile ./machfile -np 128 /data/apps/namd26_mvapich2/Linux-mvapich2/namd2 ./apoa1.namd | tee namd_128cores
Charm++> Running on MPI version: 2.0 multi-thread support: 0/0
rank 65 in job 4 master_host_name_50238 caused collective abort of all ranks
  exit status of rank 65: killed by signal 9

The input is the standard apoa1 benchmark (apoa1.tar.gz), available on the
NAMD website.

According to the benchmark results published on the site, it runs and scales
up to 256 processors.

But in my case, it does not even run on 128 cores.

Other applications, such as Amber 9 and GROMACS, run on up to 256
processors, which suggests the problem is not with MVAPICH2.

So, what went wrong?

Thanks,
Sangamesh
