Re: Linux HP Itanium Quadrics

From: Alessandro Cembran (cembran_at_chem.umn.edu)
Date: Tue Nov 14 2006 - 13:39:27 CST

Hi,

I just wanted to report, for anybody who's interested, that I have
successfully compiled and tested charm++ and NAMD on the machine
described below.
Performance is only 10-15% worse than on a dedicated 16-processor
Altix node with the same Madison 1.5GHz processors, and the scaling is
good up to 128 processors. I haven't tested more processors yet.

All the details about the installation and benchmarks are in the
attached file.

Please let me know if you have any comments, suggestions, or questions.

Alessandro

Alessandro Cembran wrote:
> Hi,
>
> I am running NAMD 2.6 in parallel on this machine:
> http://mscf.emsl.pnl.gov/hardware/config_mpp2.shtml.
> It's an HP/Linux Itanium2 1.5GHz Madison cluster with a QsNetII/Elan4
> Quadrics interconnect.
>
> Right now I'm running the Linux-ia64 (Itanium) pre-compiled executable
> with charmrun:
> charmrun namd2 +p16 ++nodelist job.nodelist job.conf > job.out
>
> I compared a 16-processor job on this HP machine with the same job on
> a 16-processor node of an Altix machine with the same Itanium 1.5GHz
> Madison processors. (On the Altix I used the Linux-ia64-MPT (SGI
> Altix) pre-compiled binary with mpirun.)
>
> While the CPU time is similar on both machines (i.e., ~0.3 sec/step),
> the Wall time is significantly different: on the Altix it is the same
> as the CPU time (~0.3 sec/step), while on the HP cluster I get around
> 0.42 sec/step. Overall, on the HP the job is ~30% slower.
> The AVG LDB time on the Altix is ~32.5, while on the HP it is ~37.
>
> I was wondering if anybody has some insight into this performance
> difference. Is this the best I can get, because the Altix is a
> shared-memory system while the HP cluster has an intrinsic slowdown
> due to communication among nodes? Or is there a way to improve
> performance, by tweaking some MPI variables or by recompiling the
> code with specific flags?
>
> Thanks in advance,
>
> Alessandro
>

-- 
Alessandro Cembran, PhD
Post Doctoral Associate
Mailing Address:
Univ. of Minnesota, Dept. of Chemistry
G2, 139 Smith Hall 207 Pleasant St SE
Minneapolis, MN 55455-0431
Office:
Univ. of Minnesota, Walter Library
117 Pleasant St SE, Room 473
Phone: +1 612-624-4617
E-mail: cembran_at_chem.umn.edu

############################################################
#                                                          #
#                 INSTALLATION OF NAMD 2.6                 #
#                                                          #
# * System specs:                                          #
#   HP/Linux Itanium2 (Madison, 1.5GHz)                    #
#   Quadrics QsNetII/Elan4 interconnect                    #
#   OS: Linux 2.4.20-hp4_pnnl57smp, based on               #
#   Red Hat Linux Advanced Server                          #
#   (http://mscf.emsl.pnl.gov/hardware/config_mpp2.shtml)  #
#                                                          #
# * Compilers:                                             #
#   Intel compilers (8.1 and 7.1)                          #
#                                                          #
# * Test System:                                           #
#   ~190K atoms, NPT, Tstep=2fs, Settle=on                 #
#   StepsPerCycle=24, PairListsPerCycle=3                  #
#   FullElectFrequency=3, Cutoffs=9.0/11.0/13.0            #
#                                                          #
# * Reference Job (to compare performance):                #
#   Test System run on a dedicated 16-proc SGI Altix node  #
#   with Madison 1.5GHz processors.                        #
#   (http://www.msi.umn.edu/altix/intro/)                  #
#   Precompiled NAMD 2.6 binary Linux-ia64-MPT (SGI Altix) #
#   Speed (Wall) ~0.278 s/step after ~50,000 steps         #
#                                                          #
# * Installation dir: ~/progs                              #
#                                                          #
############################################################

cd ~/progs

1) Get NAMD Version 2.6 (2006-08-31) Source
-------------------------------------------
Go to
http://www.ks.uiuc.edu/Development/Download/download.cgi?PackageName=NAMD
and download NAMD_2.6_Source.tar.gz into ~/progs, then unpack it:
tar -zxvf NAMD_2.6_Source.tar.gz

2) Get tcl and fftw:
--------------------
mkdir tcl
cd tcl
wget http://www.ks.uiuc.edu/Research/namd/libraries/tcl-linux-ia64-redhat7.tar.gz
tar -zxvf tcl-linux-ia64-redhat7.tar.gz
cd ..
mkdir fftw
cd fftw
wget http://www.ks.uiuc.edu/Research/namd/libraries/fftw-linux-ia64-redhat7.tar.gz
tar -zxvf fftw-linux-ia64-redhat7.tar.gz
cd ..
NOTE: I also tried these precompiled tcl and fftw libs:
      http://www.ks.uiuc.edu/Research/namd/libraries/fftw-linux-ia64.tar.gz
      http://www.ks.uiuc.edu/Research/namd/libraries/tcl-linux-ia64.tar.gz
      They did not work because of problems with GLIBC_2.3.
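
      A quick way to check which glibc symbol versions a prebuilt
      library requires, before trying to link against it (a sketch; the
      exact library name and path inside the tarball are from memory
      and may differ):

      objdump -T tcl/linux-ia64/lib/libtcl8.3.so | grep GLIBC_2.3

      Any hits mean the library expects a newer glibc than the one
      installed on this system.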

3) Choice of the compiler:
--------------------------
First I compiled everything with the default Intel compiler, version 8.1.
Everything worked smoothly, but performance was poor:
- 8 procs, after only 1800 steps (optimal LDB not reached yet):
  Speed ~ 0.86 s/step
- 16 procs, after only 4800 steps (optimal LDB not reached yet):
  Speed ~ 0.43 s/step
Note, however, that on the Altix the Speed is already ~0.297 s/step after
4800 steps.
Since poor performance is reported for Itanium with Intel >= 8.1
(http://www.ks.uiuc.edu/Research/namd/wiki/index.cgi?NamdOnIA64),
I recompiled everything with Intel 7.1, and the performance is excellent.
I therefore report the procedure for Intel 7.1, which is almost identical
to that for 8.1, and highlight the differences between the two.

4) Revert Intel 8.1 to Intel 7.1
--------------------------------
Since Intel 8.1 and integer*8 are the defaults on this system, we revert
to Intel 7.1 and integer*4 with
module swap pnnl_env pnnl_env/old
Now the default C, C++, and Fortran compilers are
/opt/intel/compiler70/ia64/bin/ecc
/opt/intel/compiler70/ia64/bin/ecpc
/opt/intel/compiler70/ia64/bin/efc
Since the "elan-linux-ia64" platform-specific build of charm++ looks for
icc, ifort, and icpc, we create links to them in ~/bin:
cd ~/bin
ln -s /opt/intel/compiler70/ia64/bin/ecc ./icc
ln -s /opt/intel/compiler70/ia64/bin/ecpc ./icpc
ln -s /opt/intel/compiler70/ia64/bin/efc ./ifort
cd ~/progs/
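
To confirm that the links are picked up (assuming ~/bin comes before the
Intel 8.1 directories in your PATH):

which icc icpc ifort   # all three should resolve into ~/bin
icc -V                 # the banner should report the 7.1 compiler, not 8.1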

5) Compile and test charm++
---------------------------
cd NAMD_2.6_Source
tar -xvf charm-5.9.tar
cd charm-5.9
./build charm++ elan-linux-ia64 icc ifort -O -DCMK_OPTIMIZE=1 > build.log 2>&1
cd elan-linux-ia64-ifort-icc/tests/charm++/megatest/
make pgm
Request processors from the queue system:
bsub -P ACCOUNT# -n 2 -W 00:10 -Is csh
...wait for the processors...
prun -n 2 ./pgm > pgm.out
Verify that the job ran with no problems and then exit the queue.
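
If you prefer not to hold an interactive session, the same test can also
be pushed through the batch system in one shot (a sketch; ACCOUNT# is a
placeholder as above):

bsub -P ACCOUNT# -n 2 -W 00:10 -o pgm.out prun -n 2 ./pgm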
cd ~/progs/NAMD_2.6_Source

6) Compile NAMD
---------------
In "Make.charm", change the CHARMBASE line into the following:
CHARMBASE = /home/cembran/progs/NAMD_2.6_Source/charm-5.9
In "arch/Linux-ia64-icc.arch", change the CHARMARCH line into the following:
CHARMARCH = elan-linux-ia64-ifort-icc
In "arch/Linux-ia64.tcl", change the TCLDIR line into the following:
TCLDIR=/home/cembran/progs/tcl/linux-ia64-redhat7
In "arch/Linux-ia64.fftw", change the FFTDIR line into the following:
FFTDIR=/home/cembran/progs/fftw/linux-ia64-redhat7
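
For reference, the same four edits can be scripted (a sketch, assuming
GNU sed 4+ for the -i flag; adapt the paths to your own tree):

cd ~/progs/NAMD_2.6_Source
sed -i 's|^CHARMBASE.*|CHARMBASE = /home/cembran/progs/NAMD_2.6_Source/charm-5.9|' Make.charm
sed -i 's|^CHARMARCH.*|CHARMARCH = elan-linux-ia64-ifort-icc|' arch/Linux-ia64-icc.arch
sed -i 's|^TCLDIR.*|TCLDIR=/home/cembran/progs/tcl/linux-ia64-redhat7|' arch/Linux-ia64.tcl
sed -i 's|^FFTDIR.*|FFTDIR=/home/cembran/progs/fftw/linux-ia64-redhat7|' arch/Linux-ia64.fftw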
Then, from ~/progs/NAMD_2.6_Source, execute
./config tcl fftw Linux-ia64-icc
cd Linux-ia64-icc
make >make.log 2>&1

7) Notes for running NAMD
-------------------------
In the submission script (#!/bin/csh), use:
setenv LIBELAN_ALLOC_SIZE 419430400
Without it, my Test System crashes right away, as noted at:
http://www.ks.uiuc.edu/Research/namd/wiki/index.cgi?NamdOnElan
In the tests I have run so far, a margin of 1.0 works.
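
For illustration, a minimal csh submission script along these lines (a
sketch; the binary path, processor count, and file names are placeholders):

#!/bin/csh
# Enlarge the Elan allocator pool, as required above
setenv LIBELAN_ALLOC_SIZE 419430400
# Launch 16 NAMD processes over the Quadrics interconnect via RMS
prun -n 16 ~/progs/NAMD_2.6_Source/Linux-ia64-icc/namd2 job.conf > job.out

Submitted, for example, with: bsub -P ACCOUNT# -n 16 -W 12:00 < run.csh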

8) Benchmarks (all Wall)
------------------------
Scale is relative to the 16-proc Altix test:
Scale = 100 * (s/step_Altix * 16) / (s/step_HP * procs)
Speedup = procs * Scale / 100
(Note: a 40-proc job on an Altix with Madison 1.6GHz processors runs at ~0.1215 s/step, giving Scale=91.5 and Speedup=36.6.)
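
As a worked example, for the 16-proc HP entry in the table below:
Scale   = 100 * (0.278 * 16) / (0.317 * 16) = 87.7 (~88)
Speedup = 16 * 87.7 / 100 = 14.0 (~14)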

  System                        Speed (s/step)  Speed (days/ns)  Scale (%)  Speedup

- Reference (Altix 16 procs)    0.278           1.61             100        16

- HP/Linux   8 procs            0.660           3.82              84        6.7
- HP/Linux  16 procs            0.317           1.83              88        14
- HP/Linux  64 procs            0.0805          0.466             86        55
- HP/Linux 128 procs            0.0420          0.243             83        106
- HP/Linux 256 procs            (not yet tested)
- HP/Linux 512 procs            (not yet tested)
