From: Gengbin Zheng (gzheng_at_ks.uiuc.edu)
Date: Tue Mar 08 2005 - 16:41:06 CST
Actually, the figure I sent is newer than the one on the NAMD web site.
That figure was slightly out of date and was measured with 2 processors
per node; the new result was also taken after some tuning of Charm++
performance on that machine.
There may be several factors contributing to the poor scaling:
1. Gigabit Ethernet usually does not scale as well as Myrinet.
2. Your simulation is already pretty fast, around 1.6 seconds per step
on one processor. How often is PME being evaluated? In any case, a
1.6-second-per-step problem is naturally much harder to scale than a
4-second-per-step problem, simply because there is less work to
parallelize. (See the config sketch after this list for controlling
the PME frequency.)
3. Some of the recent optimization/tuning in Charm++ for Mac seemed to
help a little. (For example, the system malloc on Mac is slow;
switching to our own tuned malloc helped improve performance.)
4. How many processors per node did you use when running the benchmark?
When running 2 processes on each node, there seems to be significant OS
daemon interference, which can hurt performance. Compare against a run
with one processor per node to see (a nodelist example is sketched
after this list).
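
On the PME question: in the NAMD config file the frequency of the full
electrostatics evaluation is controlled by fullElectFrequency. A minimal
sketch (the values below are only illustrative, not tuned for your
20-30K atom system, and the grid sizes depend on your cell dimensions):

    # illustrative values only; adjust for your own system
    timestep            1.0
    cutoff              10.0
    switching           on
    switchdist          9.0
    pairlistdist        11.5
    stepspercycle       20
    # short-range nonbonded every step
    nonbondedFreq       1
    # evaluate PME only every 4 steps (multiple timestepping)
    fullElectFrequency  4
    PME                 yes
    PMEGridSizeX        64
    PMEGridSizeY        64
    PMEGridSizeZ        64

Evaluating PME every few steps instead of every step should reduce the
communication-heavy FFT work per step, which matters more on gigabit
ethernet than on a faster interconnect like Myrinet.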
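For the one-process-per-node comparison: with the net-darwin build,
charmrun placement is controlled by a nodelist file. A minimal sketch
(the host names here are made up, substitute your own):

    group main
      host g5node1
      host g5node2
      host g5node3
      host g5node4

    # 4 processes, one per host listed above (placeholder names)
    ./charmrun ./namd2 +p4 ++nodelist ./nodelist your_sim.conf

With one entry per host and +p equal to the number of hosts, you get
one process per node; running +p8 against the same nodelist (charmrun
wraps around the list) gives two per node, so the two cases can be
compared directly on the same benchmark.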
Gengbin
Michael Grabe wrote:
> Thanks Gengbin,
>
> I have seen that benchmark graph many
> times and never noticed the G5 cluster on
> it.
>
> The scaling for 1-16 processors is almost
> perfect for Turing. I wonder why I am
> getting such poor speedup myself.
> See attached figure.
>
> I am using the charm binary that is packaged
> with the Darwin NAMD download, and I have
> a Gigaswitch between my machines.
> My test system is of moderate size (~20-30K atoms),
> I am using a vdW cutoff of 10 Angstroms, and
> PME is on. Any suggestions for what to try
> would be appreciated.
>
> -michael
>