From: Alessandro Cembran (
Date: Tue Nov 14 2006 - 13:39:27 CST
I just wanted to report, for anybody who's interested, that I have successfully
compiled and tested charm++ and NAMD on the machine described below.
Performance is only 10-15% worse than on a dedicated 16-processor
Altix node with the same Madison 1.5GHz processors, and scaling is
good up to 128 processors. I haven't tested more processors yet.
All the details about the installation and the benchmarks are in the
attached file.
Please let me know if you have any comments, suggestions or questions.
Alessandro Cembran wrote:
> Hi,
> I am running NAMD 2.6 in parallel on this machine:
> It's an HP/Linux Itanium2 1.5GHz Madison cluster with QsNetII/Elan-4 Quadrics
> interconnect.
> Right now I'm running the Linux-ia64 (Itanium) pre-compiled executable
> with charmrun:
> charmrun namd2 +p16 ++nodelist job.nodelist job.conf > job.out
> I compared a 16-processor test job on this HP machine with the same
> job on a 16-processor node of an Altix machine equipped with the
> same Itanium2 1.5GHz Madison processors. (On the Altix I used the
> Linux-ia64-MPT (SGI Altix) pre-compiled binary with mpirun.)
> While the CPU time is similar on both machines (i.e., ~0.3 sec/step),
> the wall time is significantly different: on the Altix it is the same as
> the CPU time (~0.3 sec/step), while on the HP cluster I get
> ~0.42 sec/step. Overall, the job is ~30% slower on the HP.
> The AVG LDB on the Altix is ~32.5, while on the HP it is ~37.
> I was wondering if anybody has a clue about this performance
> difference. Is this the best I can get, because the Altix is a
> shared-memory system while the HP cluster has an intrinsic
> slow-down due to communication among nodes? Or is there any way I
> can improve performance, by tweaking some MPI variables or by
> recompiling the code with specific flags?
> Thanks in advance,
> Alessandro
--
Alessandro Cembran, PhD
Post Doctoral Associate
Mailing Address: Univ. of Minnesota, Dept. of Chemistry G2, 139 Smith Hall, 207 Pleasant St SE, Minneapolis, MN 55455-0431
Office: Univ. of Minnesota, Walter Library, 117 Pleasant St SE, Room 473
Phone: +1 612-624-4617
E-mail:
# #
# #
# * System specs: #
# HP/Linux Itanium2 (Madison, 1.5GHz) #
# Quadrics QsNetII/Elan4 Interconnects #
# OS: Linux 2.4.20-hp4_pnnl57smp, based on #
# Red Hat Linux Advanced Server #
# ( #
# #
# * Compilers: #
# Intel compilers (8.1 and 7.1) #
# #
# * Test System: #
# ~190K atoms, NPT, Tstep=2fs, Settle=on #
# StepsPerCycle=24, PairListsPerCycle=3 #
# FullElectFrequency=3 Cutoffs=9.0/11.0/13.0 #
# #
# * Reference Job (to compare the performances): #
# Test System run on a dedicated 16-proc SGI Altix node #
# with Madison 1.5GHz processors. #
# ( #
# Precompiled NAMD 2.6 binary Linux-ia64-MPT (SGI Altix) #
# Speed (Wall) ~ 0.278 s/step after ~ 50,000 steps #
# #
# * Installation dir: ~/progs #
# #
cd ~/progs
1) Get the NAMD Version 2.6 (2006-08-31) source:
download NAMD_2.6_Source.tar.gz into ~/progs and unpack it:
tar -zxvf NAMD_2.6_Source.tar.gz
2) Get tcl and fftw:
mkdir tcl
cd tcl
tar -zxvf tcl-linux-ia64-redhat7.tar.gz
cd ..
mkdir fftw
cd fftw
tar -zxvf fftw-linux-ia64-redhat7.tar.gz
cd ..
NOTE: I also tried these tcl and fftw precompiled libs:
They did not work because of problems with GLIBC_2.3.
3) Choice of the compiler:
First I compiled everything with the default Intel compiler, version 8.1.
Everything worked smoothly, but performance was poor:
- 8 procs, after only 1800 steps (optimal LDB not reached yet):
Speed ~ 0.86 s/step
- 16 procs, after only 4800 steps (optimal LDB not reached yet):
Speed ~ 0.43 s/step
Notice, however, that on the Altix the Speed after 4800 steps is already ~ 0.297 s/step.
Since poor performance has been reported for Itanium with Intel >= 8.1,
I recompiled everything with Intel 7.1, and the performance is
excellent. Therefore, I report here the procedure for Intel 7.1, which is almost
identical to the one for 8.1, highlighting the differences between the two.
4) Revert Intel 8.1 to Intel 7.1
Since Intel 8.1 and integer*8 are the defaults on this system, we can revert
to Intel 7.1 and integer*4 with
module swap pnnl_env pnnl_env/old
Now the default C++ and Fortran compilers are
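Since the "elan-linux-ia64" platform-specific build of charm++ looks for
icc, icpc and ifort, we create links with those names to the Intel 7.1 drivers
(ecc, ecpc, efc) in our ~/bin, which must be in the PATH ahead of any other icc, icpc or ifort: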
cd ~/bin
ln -s /opt/intel/compiler70/ia64/bin/ecc ./icc
ln -s /opt/intel/compiler70/ia64/bin/ecpc ./icpc
ln -s /opt/intel/compiler70/ia64/bin/efc ./ifort
cd ~/progs/
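The three links above can also be created by a small POSIX sh function, sketched here as a convenience. The name make_compiler_links is hypothetical (it is not part of charm++ or the Intel tools); the sketch only assumes that the ecc/ecpc/efc drivers live in the directory you pass, and it stops if one of them is missing instead of leaving a dangling link.

```shell
# make_compiler_links SRC_DIR DEST_DIR   (hypothetical helper)
# Creates icc -> ecc, icpc -> ecpc and ifort -> efc symlinks in DEST_DIR,
# pointing at the Intel 7.x compiler drivers found in SRC_DIR.
make_compiler_links() {
  src=$1 dest=$2
  mkdir -p "$dest" || return 1
  for pair in ecc:icc ecpc:icpc efc:ifort; do
    old=${pair%%:*}   # Intel 7.x driver name
    new=${pair#*:}    # name the elan-linux-ia64 charm++ build looks for
    if [ ! -x "$src/$old" ]; then
      echo "missing $src/$old" >&2
      return 1
    fi
    ln -sf "$src/$old" "$dest/$new" || return 1
  done
}
```

On this machine the call would be make_compiler_links /opt/intel/compiler70/ia64/bin ~/bin; afterwards, command -v icc (and likewise icpc and ifort) should print the ~/bin paths.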
5) Compile and test charm++
cd NAMD_2.6_Source
tar -xvf charm-5.9.tar
cd charm-5.9
./build charm++ elan-linux-ia64 icc ifort -O -DCMK_OPTIMIZE=1 > build.log 2>&1
cd elan-linux-ia64-ifort-icc/tests/charm++/megatest/
make pgm
Request the processors from the queue system:
bsub -P ACCOUNT# -n 2 -W 00:10 -Is csh
...wait for the processors...
prun -n 2 ./pgm > pgm.out
Verify that the test ran without problems, then exit the interactive session.
cd ~/progs/NAMD_2.6_Source
6) Compile NAMD
In "Make.charm", change the CHARMBASE line to the following:
CHARMBASE = /home/cembran/progs/NAMD_2.6_Source/charm-5.9
In "arch/Linux-ia64-icc.arch", change the CHARMARCH line to the following:
CHARMARCH = elan-linux-ia64-ifort-icc
In "arch/Linux-ia64.tcl", change the TCLDIR line to the following:
In "arch/Linux-ia64.fftw", change the FFTDIR line to the following:
Then, from ~/progs/NAMD_2.6_Source, execute
./config tcl fftw Linux-ia64-icc
cd Linux-ia64-icc
make >make.log 2>&1
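The edits to Make.charm and the arch files in step 6 can be scripted instead of done by hand. Below is a sketch in POSIX sh, assuming each variable sits on its own line in the form VAR = value, as in the lines quoted above; the helper name set_make_var is hypothetical. It rewrites through a temporary file rather than using sed -i, since -i is not portable across sed implementations.

```shell
# set_make_var FILE VAR VALUE   (hypothetical helper)
# Rewrites every line of the form "VAR = anything" in FILE to "VAR = VALUE".
# '|' is the sed delimiter because VALUE is usually a path containing '/';
# VALUE must therefore not contain '|' or '&'.
set_make_var() {
  file=$1 var=$2 value=$3
  sed "s|^\([[:space:]]*$var[[:space:]]*=\).*|\1 $value|" "$file" > "$file.tmp" &&
    mv "$file.tmp" "$file"
}
```

For example, from ~/progs/NAMD_2.6_Source and before running ./config: set_make_var Make.charm CHARMBASE /home/cembran/progs/NAMD_2.6_Source/charm-5.9 and set_make_var arch/Linux-ia64-icc.arch CHARMARCH elan-linux-ia64-ifort-icc. TCLDIR and FFTDIR can be set the same way, pointing at wherever tcl and fftw were unpacked in step 2.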
7) Notes for running NAMD
In the submission script (#!/bin/csh), use:
setenv LIBELAN_ALLOC_SIZE 419430400
If I do not set it for my Test System, NAMD crashes right away, as indicated by:
In the tests I've run so far, a Margin of 1.0 works.
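The value 419430400 is exactly 400 MiB (400 x 1024 x 1024), which suggests that LIBELAN_ALLOC_SIZE is read as a byte count; that reading is an assumption, not something stated in these notes. If a different system needs a different size, here is a sketch of the conversion in sh/bash syntax (the csh script above uses setenv instead); elan_alloc_bytes is a hypothetical helper name.

```shell
# elan_alloc_bytes MIB   (hypothetical helper)
# Converts a size in MiB to bytes; 400 MiB gives the 419430400 used above.
elan_alloc_bytes() {
  echo $(( $1 * 1024 * 1024 ))
}

# sh/bash equivalent of the csh line "setenv LIBELAN_ALLOC_SIZE 419430400"
LIBELAN_ALLOC_SIZE=$(elan_alloc_bytes 400)
export LIBELAN_ALLOC_SIZE
```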
8) Benchmarks (all Wall)
Scale is relative to the 16-proc Altix reference run:
Scale (%) = 100 * (s/step_Altix x 16) / (s/step_HP x procs)
Speedup = procs * Scale / 100
(Note: a job with 40 procs on an Altix with Madison 1.6 GHz processors has a Speed of ~0.1215 s/step, i.e. Scale=91.5, Speedup=36.6.)
System                          Speed (s/step)   Speed (days/ns)   Scale (%)   Speedup
- Reference (Altix 16 procs)    0.278            1.61              100         16
- HP/Linux   8 procs            0.660            3.82              84          6.7
- HP/Linux  16 procs            0.317            1.83              88          14
- HP/Linux  64 procs            0.0805           0.466             86          55
- HP/Linux 128 procs            0.0420           0.243             83          106
- HP/Linux 256 procs            (not tested yet)
- HP/Linux 512 procs            (not tested yet)
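The days/ns, Scale and Speedup columns follow from the wall-clock s/step values. As a cross-check, here is a sketch that recomputes them with awk from POSIX sh, assuming the 2 fs timestep of the Test System (1 ns = 500,000 steps) and the 16-proc Altix run at 0.278 s/step as the default reference; bench_row is a hypothetical helper name.

```shell
# bench_row PROCS SEC_PER_STEP [REF_SEC_PER_STEP] [REF_PROCS]   (hypothetical helper)
# Prints "days/ns Scale(%) Speedup" using the Scale and Speedup definitions above.
# Assumes Tstep = 2 fs, so 1 ns = 500,000 steps; a day is 86,400 s.
bench_row() {
  awk -v p="$1" -v t="$2" -v tref="${3:-0.278}" -v pref="${4:-16}" 'BEGIN {
    days_per_ns = t * 500000 / 86400        # wall days needed for 1 ns
    scale = 100 * (tref * pref) / (t * p)   # per-processor efficiency vs. the reference
    speedup = p * scale / 100
    printf "%.3f %.1f %.1f\n", days_per_ns, scale, speedup
  }'
}
```

For example, bench_row 128 0.0420 prints 0.243 82.7 105.9, which matches the 128-proc row after rounding, and bench_row 40 0.1215 reproduces the Scale of 91.5 and Speedup of 36.6 quoted for the 40-proc Altix job.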
This archive was generated by hypermail 2.1.6 : Wed Feb 29 2012 - 15:44:10 CST