NAMD 2.7b1 running & performance issues

From: Marius Micluta (marius_at_biochim.ro)
Date: Mon May 25 2009 - 06:13:07 CDT

Dear NAMD experts,

I installed NAMD 2.7b1 on a small Bull HPC cluster (four compute nodes, each
with two quad-core Xeon @ 2.5 GHz CPUs and 8 GB RAM, plus a management node with
two quad-core Xeon @ 2.0 GHz CPUs and 8 GB RAM, all running Red Hat Enterprise
Linux 5 and interconnected by Gigabit Ethernet) and tested it with the ApoA1
benchmark. Using the precompiled Linux-x86_64-TCP binaries, the performance
seemed rather modest: about 3.5 days/ns on all 32 cores of the compute nodes,
compared with 11 days/ns on an old four-node commodity PC cluster, each node
with a single Pentium 4 @ 2.66 GHz CPU. I therefore compiled an SMP and an MPI
version with the Intel compilers and ran the benchmark again. The MPI version
was somewhat faster (2.5 days/ns), but the real improvement came with the SMP
version: 18 hours/ns.
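
To put these figures on a common scale, here is a small arithmetic sketch in
Python. It only converts the timings quoted above to hours/ns and divides them;
the labels and the helper name `speedup` are made up for illustration, and it
assumes all four numbers refer to the same ApoA1 benchmark.

```python
# Timings quoted in this message, converted to hours per nanosecond.
HOURS_PER_DAY = 24.0

timings_hours_per_ns = {
    "old P4 cluster": 11.0 * HOURS_PER_DAY,
    "TCP binary": 3.5 * HOURS_PER_DAY,
    "MPI build": 2.5 * HOURS_PER_DAY,
    "SMP build": 18.0,
}


def speedup(baseline, candidate, timings=timings_hours_per_ns):
    """Return how many times faster `candidate` is than `baseline`."""
    return timings[baseline] / timings[candidate]


if __name__ == "__main__":
    for label in timings_hours_per_ns:
        ratio = speedup("old P4 cluster", label)
        print(f"{label:15s} {timings_hours_per_ns[label]:7.1f} h/ns "
              f"({ratio:.2f}x vs old P4 cluster)")
```

Calling `speedup("TCP binary", "SMP build")` gives the ratio between the
precompiled TCP binaries and my SMP build.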

Unfortunately, the SMP version does not run reliably. With the same command
line, the simulation sometimes runs fine, but in many cases charmrun freezes
after displaying "cpu topology info is being gathered!". I can see the namd2
process on all four compute nodes, but while on two or three nodes the CPU
load approaches 100% on all 8 cores, on the others it is near zero. Moreover,
when I kill the charmrun process on the management node, only the namd2
processes on the nodes where the threads requested with the ++ppn parameter
seem to have started are killed.
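
One way to record the per-core load on a node instead of eyeballing it is to
compare two snapshots of /proc/stat. The following is only a sketch of that
idea, not something taken from NAMD or Charm++; the function names are
hypothetical, and it assumes the usual Linux layout of the per-core lines
(user, nice, system, idle, iowait, irq, softirq, steal).

```python
import time


def parse_proc_stat(text):
    """Map 'cpuN' -> (busy_jiffies, total_jiffies) from /proc/stat text."""
    counters = {}
    for line in text.splitlines():
        fields = line.split()
        # Keep only per-core lines such as 'cpu0 ...'; skip the aggregate 'cpu'.
        if not fields or not fields[0].startswith("cpu") or fields[0] == "cpu":
            continue
        values = [int(v) for v in fields[1:]]
        total = sum(values[:8])  # user .. steal
        idle = values[3] + (values[4] if len(values) > 4 else 0)  # idle + iowait
        counters[fields[0]] = (total - idle, total)
    return counters


def core_load(before, after):
    """Return {'cpuN': busy fraction} between two /proc/stat snapshots."""
    old, new = parse_proc_stat(before), parse_proc_stat(after)
    loads = {}
    for cpu, (busy_new, total_new) in new.items():
        busy_old, total_old = old.get(cpu, (0, 0))
        elapsed = total_new - total_old
        loads[cpu] = (busy_new - busy_old) / elapsed if elapsed > 0 else 0.0
    return loads


if __name__ == "__main__":
    with open("/proc/stat") as f:
        first = f.read()
    time.sleep(1.0)  # sampling interval, chosen arbitrarily
    with open("/proc/stat") as f:
        second = f.read()
    for cpu, load in sorted(core_load(first, second).items()):
        print(f"{cpu}: {load:.0%}")
```

Run on each compute node while charmrun is frozen, this would show which nodes
have all 8 cores busy and which ones are idle.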

I also tried to run charmdebug, but it fails. When I launch the command that
charmdebug configures directly from the command line (charmrun +pN
/usr/local/NAMD/namd2 apoa1.namd ++server +cpd +DebugDisplay localhost:10.0),
I get a strange error at the same point where the run froze:

Charm++> synchronizing isomalloc memory region...
CPD: Frozen processor N+1
[0] consolidated Isomalloc memory region: 0x2aaaab59b000 - 0x7a351707da18
(83404474 megs)
Charm++> cpu topology info is being gathered!
CPD: Frozen processor N
CPD: Signal received on processor N: 11
CPD: Frozen processor N
------------- Processor N Exiting: Caught Signal ------------
Signal: segmentation violation,

no matter what value between 1 and 32 I choose for N. I tried to specify the
.nodelist file with the ++nodelist parameter, as well as other command line
options, but with no effect.
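
For completeness, this is roughly how the nodelist and the command line are put
together. It is a sketch under assumptions: the host names node1..node4 and the
file name are placeholders rather than the real ones on my cluster, the helper
names are invented, and the nodelist layout ("group main" followed by one
"host" line per node) is the basic form I understand charmrun to accept.

```python
def make_nodelist(hosts, group="main"):
    """Return the text of a charmrun nodelist file for the given hosts."""
    lines = ["group " + group]
    lines += ["host " + host for host in hosts]
    return "\n".join(lines) + "\n"


def charmrun_args(nprocs, nodelist_path, ppn=None,
                  namd2="/usr/local/NAMD/namd2", config="apoa1.namd"):
    """Build (but do not execute) a charmrun argument list."""
    args = ["charmrun", "+p%d" % nprocs, "++nodelist", nodelist_path]
    if ppn is not None:
        args += ["++ppn", str(ppn)]  # threads per process, SMP builds only
    return args + [namd2, config]


if __name__ == "__main__":
    hosts = ["node1", "node2", "node3", "node4"]  # placeholder host names
    print(make_nodelist(hosts), end="")
    print(" ".join(charmrun_args(32, "apoa1.nodelist")))
```

The argument list is only printed here; passing it to `subprocess.run` would
launch the job.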

Could this be a compiler optimization issue? The MPI version compiles and runs
fine with the -O2 optimization flag, while the SMP version compiled only with
-O1. The documentation states that the TCP and SMP versions are not affected by
this bug and that the +netpoll parameter could be used to circumvent the bug. I
also tried this parameter, but with no success. An strace reveals that all the
involved processes (charmrun and namd2) are stuck in an endless loop of poll
calls that time out. The compiler version is 10.1.011, the same one used to
build the hardware manufacturer's MPI libraries.

I would be very grateful for any help or suggestion related to the
above-mentioned problems.

Kind regards,

Marius Micluta
Institute of Biochemistry of the Romanian Academy
