NAMD 2.7b2 very slow on multiple nodes

From: Laurent Chaloin (laurent.chaloin_at_cpbs.cnrs.fr)
Date: Tue May 25 2010 - 16:30:51 CDT

Dear NAMD-users and developers,

I have installed NAMD_CVS from source, compiled with either g++ or
mpicc, on an IBM blade computer (8 nodes + one injector).
Charm-6.1.3 was compiled with either g++ or mpicc, and the
corresponding NAMD binary as well, according to the release notes
recommendations.
For instance, with mpicc as the compiler, for Charm-6.1.3:
./build charm++ mpi-linux-x86_64 mpicxx smp -j16 -O2 -DCMK_OPTIMIZE
The tests were successful (Hello).
For NAMD:
First, in the arch directory, I created the base/arch/fftw/tcl files:
NAMD_ARCH = Linux-x86_64-MPI-mpicc
CHARMARCH = mpi-linux-x86_64-smp-mpicxx
CXX = mpiCC
CXXOPTS = -O3 -m64 -fexpensive-optimizations -ffast-math
CC = mpicc
COPTS = -O3 -m64 -fexpensive-optimizations -ffast-math
./config Linux-x86_64-MPI-mpicc --charm-arch mpi-linux-x86_64-smp-mpicxx
then make in the newly created directory.
The binary built OK with a few warnings (no errors), e.g. for src/flipbinpdb.c:62:
warning: incompatible implicit declaration of built-in function
Then I copied the whole NAMD directory to each node (I did not compile on
each node).

NOW the problem: when I run a test MD run to measure performance (60000
atoms with PBC and PME, 1 fs timestep), the best speed is obtained on
only two nodes with 16 CPUs (each node has 2 quad-core processors), at
1 day/ns (seems OK). As soon as I increase the number of nodes or CPUs,
the run slows down dramatically: 3 days/ns with 32 CPUs (4 nodes used)
and even worse with 5, 6 or 8 nodes (up to 10 days/ns).
Does it mean that I made a mistake in the compilation, or did I forget an
important option?
Or does it mean that my network connection (10-gigabit) is not fast enough
for MPI?
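
To put numbers on the slowdown, here is a small Python sketch (not part of
NAMD) that turns the timings above into speedup and parallel efficiency
relative to the 16-CPU run. The function name scaling_table is only
illustrative, and pairing the 10 days/ns figure with 8 nodes (64 CPUs) is an
assumption, since that number was the worst case seen over 5, 6 or 8 nodes.

```python
def scaling_table(timings, base_cpus):
    """Return (cpus, speedup, efficiency) rows relative to base_cpus.

    timings maps a CPU count to the measured cost in days per ns.
    Ideal scaling would give efficiency 1.0 at every CPU count.
    """
    base_cost = timings[base_cpus]
    rows = []
    for cpus in sorted(timings):
        speedup = base_cost / timings[cpus]        # >1 means faster than baseline
        efficiency = speedup / (cpus / base_cpus)  # speedup per unit of added hardware
        rows.append((cpus, speedup, efficiency))
    return rows


# Timings from the runs described above, in days/ns.
# Assumption: the "up to 10 days/ns" figure is assigned to 8 nodes = 64 CPUs.
timings = {16: 1.0, 32: 3.0, 64: 10.0}

for cpus, speedup, efficiency in scaling_table(timings, base_cpus=16):
    print(f"{cpus:3d} CPUs: speedup {speedup:.2f}x, efficiency {efficiency:.1%}")
```

A speedup below 1 when going from 16 to 32 CPUs means the extra nodes make
the run slower in absolute terms, not merely less efficient.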

Then I tested the g++-built namd binary with charmrun: same problem, fast
with 2 nodes, and very slow when I used more nodes.

I then tried to compile NAMD-2.7b2 without smp (because of the message
printed when you start namd: "Running on MPI version: 2.1 multi-thread support:
MPI_THREAD_FUNNELED (max supported: MPI_THREAD_SINGLE)"; why, then, is SMP
proposed in the smart-build script?), but the benchmark was as bad as
before (although it was slightly better without smp).
I also tried the new version of Charm++, 6.2.0, but it is not
working.
Any idea, suggestion, or strong recommendation would be much
appreciated before I jump on the machine! (According to the benchmarks
on the NAMD website it is more or less the opposite, i.e. the more nodes
used, the faster it is!)
Thanks,
Laurent

-- 
______________________________________________
Dr. CHALOIN Laurent
Equipe de biophysique et bioinformatique 
Centre d'études d'agents Pathogènes et Biotechnologies pour la Santé
CNRS - UMR 5236 - Université Montpellier 1 et 2
Institut de Biologie, 4 bd Henri IV - CS 69033
34965 Montpellier - cedex 2
Tel: 04 67 60 02 31
Fax: 04 67 60 44 20

This archive was generated by hypermail 2.1.6 : Wed Feb 29 2012 - 15:55:49 CST