RE: Will MPICH2 work for NAMD? and the best way to build/compile NAMD to exploit my hardware resource?

From: Lee Wei Yang (lwyang_at_ljbi.org)
Date: Fri Nov 13 2009 - 16:12:50 CST

Hi Bjoern,

Thanks for sharing the test results. I used apoa1 to benchmark our speed.
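
For reference, the timings below came from commands of this form (a sketch
only; the paths are placeholders and I assume the locally compiled multicore
build quoted further down):

  ./namd2 +p8  apoa1/apoa1.namd > apoa1_p8.log
  ./namd2 +p12 apoa1/apoa1.namd > apoa1_p12.log
  ./namd2 +p16 apoa1/apoa1.namd > apoa1_p16.log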

Info: Benchmark time: 8 CPUs 0.166955 s/step 1.93235 days/ns 730.277 MB memory
TIMING: 500 CPU: 670.224, 1.37534/step Wall: 83.8886, 0.171952/step, 0 hours remaining, 730.277344 MB of memory in use.

Info: Benchmark time: 16 CPUs 0.136937 s/step 1.58492 days/ns 1187.84 MB memory
TIMING: 500 CPU: 1110.69, 2.16082/step Wall: 70.2034, 0.136174/step, 0 hours remaining, 1187.835938 MB of memory in use.

Info: Benchmark time: 12 CPUs 0.188997 s/step 2.18746 days/ns 888.484 MB memory
TIMING: 500 CPU: 1101.01, 2.2842/step Wall: 92.0199, 0.190696/step, 0 hours remaining, 888.484375 MB of memory in use.

Notice that with 12 CPUs the apoa1 run is actually slower than with 8 CPUs. I see the same for our system (42,000+ atoms) as well: 12 CPUs are slower than 8 CPUs. I am not sure whether this is due to Hyper-Threading or to the way the patches/compute objects are distributed over the CPUs.
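
One way to test the Hyper-Threading hypothesis (a sketch only; I have not
verified the core numbering on our box) is to pin an 8-way run to the
physical cores with the standard Linux taskset tool, so that no two NAMD
threads share a hyper-threaded core:

  # Assumes logical CPUs 0-7 map to the 8 physical cores; check the
  # "physical id"/"core id" fields in /proc/cpuinfo before relying on this.
  taskset -c 0-7 ./namd2 +p8 apoa1/apoa1.namd > apoa1_p8_pinned.log

If the pinned run matches the 8-CPU time while +p12 stays slow, HT sharing
is the likely culprit.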

Also, thanks for the "Benchmarking different flavours of NAMD2.7b1".

Lee
________________________________________
From: Bjoern Olausson [namdlist_at_googlemail.com]
Sent: Friday, November 13, 2009 1:39 AM
To: Lee Wei Yang; namd-l_at_ks.uiuc.edu
Subject: Re: namd-l: Will MPICH2 work for NAMD? and the best way to build/compile NAMD to exploit my hardware resource?

On Friday 13 November 2009 07:47:49 you wrote:
> Hi Bjoern and NAMD users,
>
> I installed the Intel C Compiler and got everything built and compiled. I
> hope the procedure is right. I did the following:
> ./build charm++ multicore-linux64 icc -j8 -O2
> ./config Linux-x86_64-icc --charm-arch multicore-linux64-icc
> cd path/.../NAMD_CVS_Source/Linux-x86_64-icc && make
>
> When I ran it, I did:
> path/namd2 +setcpuaffinity +pN namd2.conf
> (N is the number of cores; with and without +setcpuaffinity the speed seems
> identical for our 42,000+ atom NVT ensemble)
>
Why not use the apoa1 benchmark? That way we can compare our experiences and
results and exclude configuration differences.
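
In case you still need the input files, the apoa1 set is available from the
NAMD utilities page (this is the download location as I know it; please
verify it is still current):

  wget http://www.ks.uiuc.edu/Research/namd/utilities/apoa1.tar.gz
  tar xzf apoa1.tar.gz   # unpacks the apoa1 directory with apoa1.namd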

> Just as a reminder, I have two quad-core Nehalem CPUs. Because each core is
> hyper-threaded, I can see 16 CPUs in my system monitor and can fully use
> them for a NAMD run. The following two setups (when using 8 cores; N=8)
> get the same speed:
> 1) the NAMD_CVS_Linux-x86_64 executables downloaded from the NAMD website,
> run as /path/NAMD_CVS_Linux-x86_64/charmrun
> /path/NAMD_CVS_Linux-x86_64/namd2 ++local +idlepoll +pN ./heat1.conf > log
> 2) the locally compiled version described in the first paragraph, run as
> (per Bjoern's instructions)
> /path/NAMD_CVS_Source/Linux-x86_64-icc/namd2 +pN ./heat1.conf > log
>
> However, when I use 16 cores (N=16), setup 2) outperforms setup 1) by 8%.
> I cannot explain why and would appreciate any comments on this. Another
> observation is that, for setup 2), going from N=8 to N=16 increases the
> speed by only 14%.
>
I have no exact idea why 2) outperforms 1). I am curious too (someone please
comment on that).
But as for the shrinking performance gain beyond 8 cores: the HT cores are
not real cores; the CPU just processes one more thread per core, "more or
less in parallel", so the decrease in performance gain is normal and was
anticipated.

I tried the same and compared an Intel i7 with HT to an AMD Istanbul
(hexa-core) and found the same results, see here:
http://olausson.de/component/content/article/6-Blog/59-intel-core-i7-920-vs-amd-istanbul-2427

And here you can see my tests for the different flavors of NAMD:
http://olausson.de/component/content/article/6-Blog/57-benchmarking-different-flavours-of-namd27b1

> I have two quad-core Nehalem CPUs and 3 Tesla GPU cards in one machine.
> So far, 3 cores/threads + 3 GPUs gave the best performance: about 60%
> faster than setup 2) with N=8. However, by doing this, the remaining 13
> cores/threads just sit idle.
>
> How can we best utilize a 16 cores/threads + 3 GPUs machine to run a
> SINGLE NAMD job? Any comment will help. Thank you.
>
I don't have any experience with CUDA, but there was a long thread about your
issue here:
http://www.ks.uiuc.edu/Research/namd/mailing_list/namd-l/10779.html

As far as I know, only the non-bonded interactions are (currently)
accelerated on the GPU, so some work is left for the CPUs, and I would guess
that the optimal nCPU > nGPU ratio shifts with your system size.
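
If you want to probe that, here is a sketch of a ratio scan using the
released CUDA binaries (+idlepoll and +devices are the options documented
for the CUDA builds; heat1.conf stands in for your own config file):

  # Scan the number of worker threads while keeping all 3 GPUs in use;
  # GPUs are shared between threads once N exceeds the device count.
  for N in 3 6 9 12 16; do
    ./charmrun ++local +p$N ./namd2 +idlepoll +devices 0,1,2 heat1.conf > cuda_p$N.log
  done

The "Benchmark time" lines in the logs should show where the sweet spot is
for your system size.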

Greetings
Bjoern

--
Bjoern Olausson
Martin-Luther-Universität Halle-Wittenberg
Fachbereich Biochemie/Biotechnologie
Kurt-Mothes-Str. 3
06120 Halle/Saale
Phone: +49-345-55-24942
