From: Sangamesh B (forum.san_at_gmail.com)
Date: Tue Aug 12 2008 - 10:14:03 CDT
I've installed NAMD 2.6 on a 33-node Rocks 4.3 cluster (dual-processor,
quad-core Intel Xeon; 264 cores in total).
NAMD is built with MVAPICH2-1.0.3 and Intel 10 compilers.
Scaling is good from 8 to 16, 16 to 32, and 32 to 64 cores. But when a
128-core job is submitted, the job fails:
#mpirun -machinefile ./machfile -np 128
/data/apps/namd26_mvapich2/Linux-mvapich2/namd2 ./apoa1.namd | tee
Charm++> Running on MPI version: 2.0 multi-thread support: 0/0
rank 65 in job 4 master_host_name_50238 caused collective abort of all
exit status of rank 65: killed by signal 9
The input file is the standard benchmark file which is available on the NAMD
website, i.e. apoa1.tar.gz.
The benchmark results given on the site say that it runs and scales up to
256 processors.
But in my case, it does not even run on 128 cores.
Other applications, such as Amber 9 and Gromacs, work on up to 256
processors, which suggests there is no problem with MVAPICH2 itself.
So, what went wrong?