Re: Will MPICH2 work for NAMD? and the best way to build/compile NAMD to exploit my hardware resource?

From: Bjoern Olausson (namdlist_at_googlemail.com)
Date: Fri Nov 13 2009 - 03:39:32 CST

On Friday 13 November 2009 07:47:49 you wrote:
> Hi Bjoern and NAMD users,
>
> I installed Intel C Compiler and got it built and compiled. I hope the
> procedures are right. I did the following: ./build charm++
> multicore-linux64 icc -j8 -O2
> ./config Linux-x86_64-icc --charm-arch multicore-linux64-icc
> path/.../NAMD_CVS_Source/Linux-x86_64-icc/make
>
> when I ran it, I did
> path/namd2 +setcpuaffinity +pN namd2.conf
> (N is the number of cores; with/without +setcpuaffinity seem identical in
> speed according to our 42000+ atoms NVT ensemble)
>
Why not use the apoa1 benchmark, so we can compare our experiences and results
and rule out configuration differences?
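
Roughly like this (just a sketch; I am assuming the apoa1 archive is still
available from the NAMD utilities page and that your icc binary sits where you
built it):

  wget http://www.ks.uiuc.edu/Research/namd/utilities/apoa1.tar.gz
  tar xzf apoa1.tar.gz
  /path/NAMD_CVS_Source/Linux-x86_64-icc/namd2 +p8 apoa1/apoa1.namd > apoa1_p8.log
  grep "Benchmark time" apoa1_p8.log

The "Benchmark time" lines (s/step and days/ns) are what I compare between
machines.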

> Just as a reminder, I have two quad-core Nehalem CPUs. Because each core is
> hyper-threaded, I can see 16 CPUs in my system monitor and can also fully
> use them for a NAMD run. The following two setups (when using 8 cores;
> N=8) run at the same speed:
> 1) the NAMD_CVS_Linux-x86_64 executables downloaded from the NAMD website,
> run as /path/NAMD_CVS_Linux-x86_64/charmrun
> /path/NAMD_CVS_Linux-x86_64/namd2 ++local +idlepoll +pN ./heat1.conf > log
> 2) the locally compiled version described in the first paragraph,
> run as (per Bjoern's instructions)
> /path/NAMD_CVS_Source/Linux-x86_64-icc/namd2 +pN ./heat1.conf > log
>
> However, when I use 16 cores (N=16),
> setup 2) outperforms setup 1) by 8%; I cannot explain why and would
> appreciate any comment on this. Another observation is that, for setup 2),
> when N goes from 8 to 16, the speed increases by only 14%.
>
I have no exact idea why 2) outperforms 1). I am curious too (someone please
comment on that).
As for the smaller performance gain beyond 8 cores: the HT cores are not real
cores, the CPU just runs one extra thread per core "more or less in parallel",
so the reduced performance gain is normal and to be expected.

I tried the same and compared an Intel i7 with HT to an AMD Istanbul
(hexa-core) and found the same results, see here:
http://olausson.de/component/content/article/6-Blog/59-intel-core-i7-920-vs-amd-istanbul-2427

And here you can see my tests for the different flavors of NAMD:
http://olausson.de/component/content/article/6-Blog/57-benchmarking-different-flavours-of-namd27b1

> I have two quad-core Nehalem CPUs and 3 Tesla GPU cards in one machine.
> So far, 3 cores/threads + 3 GPUs gave the best performance --> about 60%
> faster than setup 2) with N=8. However, by doing this, the remaining 13
> cores/threads just sit idle.
>
> How can we best utilize a 16-core/thread + 3-GPU machine to run a SINGLE
> NAMD job? Any comment will help. Thank you.
>
I don't have any experience with CUDA, but there was a long thread about your
issue here:
http://www.ks.uiuc.edu/Research/namd/mailing_list/namd-l/10779.html

As far as I know, only the non-bonded interactions are currently accelerated
on the GPU, so there is still work left for the CPUs, and I would guess that
the optimal nCPU : nGPU ratio shifts depending on your system size.
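
If you want to experiment, something along these lines should work (again just
a sketch; I am assuming the precompiled CUDA binaries, that they live under
/path/NAMD_CVS_Linux-x86_64-CUDA, and that your build understands the +devices
option - I have not tried this myself):

  # 3 worker processes, one per GPU
  /path/NAMD_CVS_Linux-x86_64-CUDA/charmrun ++local +p3 \
    /path/NAMD_CVS_Linux-x86_64-CUDA/namd2 +idlepoll +devices 0,1,2 heat1.conf > log_3
  # 6 worker processes sharing the 3 GPUs (two per card), so more CPU cores
  # handle the bonded and integration work
  /path/NAMD_CVS_Linux-x86_64-CUDA/charmrun ++local +p6 \
    /path/NAMD_CVS_Linux-x86_64-CUDA/namd2 +idlepoll +devices 0,1,2 heat1.conf > log_6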

Greetings
Bjoern

-- 
Bjoern Olausson
Martin-Luther-Universität Halle-Wittenberg 
Fachbereich Biochemie/Biotechnologie
Kurt-Mothes-Str. 3
06120 Halle/Saale
Phone: +49-345-55-24942
