NAMD 2.7b2 very slow on multiple nodes

From: Laurent Chaloin
Date: Tue May 25 2010 - 16:30:51 CDT

Dear NAMD-users and developers,

I have installed NAMD_CVS from source and compiled it with either g++ or
mpicc on an IBM blade machine (8 nodes + one injector).
Charm-6.1.3 has been compiled with either g++ or mpicc, and the
corresponding NAMD binary as well, following the release notes.
For instance, with mpicc as the compiler, Charm-6.1.3 was built with:
./build charm++ mpi-linux-x86_64 mpicxx smp -j16 -O2 -DCMK_OPTIMIZE
The tests (Hello) were successful.
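For reference, I ran the Hello test from the Charm++ build tree roughly as
follows (the exact test directory name may differ in your tree; for an MPI
build, charmrun simply wraps mpirun):

cd mpi-linux-x86_64-smp-mpicxx/tests/charm++/simplearrayhello
make
./charmrun +p8 ./hello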
First, in the arch directory, I created the base/arch/fftw/tcl files; the
.arch file contains:
NAMD_ARCH = Linux-x86_64-MPI-mpicc
CHARMARCH = mpi-linux-x86_64-smp-mpicxx
CXX = mpiCC
CXXOPTS = -O3 -m64 -fexpensive-optimizations -ffast-math
CC = mpicc
COPTS = -O3 -m64 -fexpensive-optimizations -ffast-math
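The companion fftw and tcl files follow the usual NAMD arch format; roughly
as below (library locations and versions here are placeholders, not my
actual paths):

Linux-x86_64-MPI-mpicc.fftw:
FFTDIR=/usr/local/fftw
FFTINCL=-I$(FFTDIR)/include
FFTLIB=-L$(FFTDIR)/lib -lsrfftw -lsfftw
FFTFLAGS=-DNAMD_FFTW
FFT=$(FFTFLAGS) $(FFTINCL)

Linux-x86_64-MPI-mpicc.tcl:
TCLDIR=/usr/local/tcl
TCLINCL=-I$(TCLDIR)/include
TCLLIB=-L$(TCLDIR)/lib -ltcl8.5 -ldl
TCLFLAGS=-DNAMD_TCL
TCL=$(TCLFLAGS) $(TCLINCL)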
./config Linux-x86_64-MPI-mpicc --charm-arch mpi-linux-x86_64-smp-mpicxx
then ran make in the newly created directory.
The binary built fine, with a few warnings but no errors, e.g.:
src/flipbinpdb.c:62: warning: incompatible implicit declaration of built-in function
I then copied the whole NAMD directory to each node (I did not compile on
each node).

NOW the problem: when I run an MD benchmark to test performance (60,000
atoms with PBC and PME, 1 fs timestep), the best speed is obtained when
running on only two nodes with 16 CPUs (each node consists of two
quad-cores), giving 1 day/ns (seems OK). As soon as I increase the number
of nodes or CPUs, the speed drops dramatically: 3 days/ns with 32 CPUs
(4 nodes used) and even more with 5, 6 or 8 nodes (up to 10 days/ns).
Does this mean that I made a mistake in the compilation, or that I forgot
an important option?
Or does it mean that my network connection (10-gigabit) is not fast enough
for MPI?
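For completeness, the runs were launched roughly like this (the machinefile
"hosts" and config file "bench.conf" are placeholder names; if I understand
the SMP build correctly, +ppn sets the worker threads per MPI rank):

# MPI non-SMP build: one rank per core
mpirun -np 32 -machinefile hosts ./namd2 bench.conf
# MPI SMP build: one rank per node, 7 worker threads + 1 comm thread each
mpirun -np 4 -machinefile hosts ./namd2 +ppn 7 bench.conf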

I then tested the g++-built namd binary with charmrun: same problem, fast
with 2 nodes and very slow as soon as I used more nodes.
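That (net) build was launched with charmrun and a nodelist file, roughly as
follows (host and file names are placeholders):

# nodelist file
group main
host blade1
host blade2

./charmrun +p16 ++nodelist nodelist ./namd2 bench.conf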

I then tried to compile NAMD-2.7b2 without smp, since the message printed
when namd starts says "Running on MPI version: 2.1 multi-thread support:
MPI_THREAD_FUNNELED (max supported: MPI_THREAD_SINGLE)" (then why is SMP
proposed in the smart-build script?). The benchmark was as bad as before
(although it was slightly better without smp).
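For that non-smp build, Charm++ was rebuilt with the same flags minus smp,
and NAMD reconfigured against the matching arch:

./build charm++ mpi-linux-x86_64 mpicxx -j16 -O2 -DCMK_OPTIMIZE
./config Linux-x86_64-MPI-mpicc --charm-arch mpi-linux-x86_64-mpicxx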
I also tried the new version of Charm++, 6.2.0, but it is not
Any idea, suggestion, or recommendation will be more than appreciated
before I jump on the machine! (After looking at the benchmarks on the NAMD
website, I would expect more or less the opposite, i.e. the more nodes
used, the faster it runs!)

Dr. CHALOIN Laurent
Equipe de biophysique et bioinformatique 
Centre d'études d'agents Pathogènes et Biotechnologies pour la Santé
CNRS - UMR 5236 - Université Montpellier 1 et 2
Institut de Biologie, 4 bd Henri IV - CS 69033
34965 Montpellier - cedex 2
Tel: 04 67 60 02 31
Fax: 04 67 60 44 20
