Re: Always 24-way SMP?

From: Norman Geist (norman.geist_at_uni-greifswald.de)
Date: Tue Feb 26 2013 - 00:27:27 CST

Hi Andrew,

 

Nice to hear that so far. But I'm still confused about three things:

 

1. Charm++ reporting a 24-way SMP node.

2. The speedup being about 12.

3. You saying it's a 16-core node.

 

Could you post the output of "cat /proc/cpuinfo" so we can make sure we fully
understand what's going on?
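
For example (assuming an x86 Linux node, which I'd guess you have), something
like this would already show the physical vs. logical core counts:

  # logical CPUs the kernel sees
  grep -c ^processor /proc/cpuinfo

  # number of physical sockets
  grep 'physical id' /proc/cpuinfo | sort -u | wc -l

  # cores per socket and hardware threads per socket
  grep 'cpu cores' /proc/cpuinfo | sort -u
  grep 'siblings' /proc/cpuinfo | sort -u

If "siblings" is twice "cpu cores", hyper-threading is enabled.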

 

Norman Geist.

 

From: owner-namd-l_at_ks.uiuc.edu [mailto:owner-namd-l_at_ks.uiuc.edu] On Behalf
Of Andrew Pearson
Sent: Monday, February 25, 2013, 18:50
To: Norman Geist
Cc: Namd Mailing List
Subject: Re: namd-l: Always 24-way SMP?

 

Hello again Norman

Yes, this was exactly the problem. I disabled hyperthreading on a compute
node and performed my scaling test again, and this time the results were
perfect. The speedup is now linear, and I get 12.3x for a 16-core run on a
single 16-core node. Thank you for your advice and for pointing out this
problem -- this would have affected many of our users, and not just NAMD
users.
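
For anyone who cannot easily change the BIOS setting, it should also be
possible (I have not tried this myself) to take the hyper-thread siblings
offline from Linux via sysfs, roughly like this; "cpu13" is just a placeholder
for one of the logical siblings on your particular node:

  # show which logical CPUs share a physical core with cpu0
  cat /sys/devices/system/cpu/cpu0/topology/thread_siblings_list

  # take a sibling offline (as root); repeat for each HT sibling
  echo 0 > /sys/devices/system/cpu/cpu13/online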

Andrew

 

On Mon, Feb 25, 2013 at 10:06 AM, Norman Geist
<norman.geist_at_uni-greifswald.de> wrote:

Andrew,

 

What kind of CPU are you using in this node? What you describe reminds me of
hyper-threading. Could it be that your machine has only 12 physical cores, and
the rest are the hyper-threading "logical" cores? If so, it's no wonder that
NAMD can't get any benefit out of the virtual cores (which are really just a
second instruction stream per physical core). They are meant to fill gaps in
the CPU's schedule when multitasking, since tasks also produce wait times, for
example during disk I/O. Because NAMD is highly optimized code and doesn't
leave many such gaps, a maximum speedup of 12 is reasonable.

So I suspect you have two six-core CPUs in your node. Please let us know that
first.

 

Furthermore, I have never observed problems with the precompiled NAMD builds.
Most of what I have read about them concerned InfiniBand and OFED issues, and
those problems were about getting NAMD to start successfully, not about poor
parallel scaling.

 

Norman Geist.

 

From: Andrew Pearson [mailto:andrew.j.pearson_at_gmail.com]
Sent: Monday, February 25, 2013, 13:28
To: Norman Geist
Cc: Namd Mailing List
Subject: Re: namd-l: Always 24-way SMP?

 

Hi Norman

 

Thanks for the response. I didn't phrase my question well: I know I'm
experiencing scaling problems, and I'm trying to determine whether the
precompiled NAMD binaries are known to cause them. I ask because many people
seem to say that you should compile NAMD yourself to save headaches.

 

Your explanation about Charm++ displaying information about the number of
cores makes sense. I'll bet that's what's happening.

 

My scaling problem is that for a given system (27 patches, 50000 atoms) I get
perfect speedup until nprocs = 12, and then the speedup curve goes almost
flat. This occurs for runs performed on a single 16-core node.

 

Andrew

On Monday, February 25, 2013, Norman Geist wrote:

Hi Andrew,

 

It's a bad idea to ask someone else whether you have scaling problems; you
should know whether you do or not. The information in the output just comes
from the Charm++ startup and is simply a description of the underlying
hardware. It doesn't mean NAMD is running in SMP mode; it just tells you it's
a multiprocessor/multicore node. Watch the output carefully and you will see,
IMHO, that it uses the right number of CPUs (see, for example, the Benchmark
lines). So what kind of scaling problems do you have? Are you not getting the
expected speedup?
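
For example, something like this should pull those lines out of your log
("run.log" is just a placeholder for wherever you redirected stdout):

  grep "Benchmark time" run.log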

 

Norman Geist.

 

From: owner-namd-l_at_ks.uiuc.edu [mailto:owner-namd-l_at_ks.uiuc.edu] On Behalf
Of Andrew Pearson
Sent: Friday, February 22, 2013, 19:30
To: namd-l_at_ks.uiuc.edu
Subject: namd-l: Always 24-way SMP?

 

I'm investigating scaling problems with NAMD. I'm running precompiled
linux-64-tcp binaries on a Linux cluster with 12-core nodes using "charmrun
+p $NPROCS ++mpiexec".
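
For reference, a variant of that command with explicit CPU pinning would look
roughly like this (+setcpuaffinity/+pemap are the Charm++ affinity options,
"run.namd" stands in for my config file, and the 0-11 range assumes the
physical cores are numbered before any hyper-thread siblings, which varies
from system to system):

  charmrun +p $NPROCS ++mpiexec namd2 +setcpuaffinity +pemap 0-11 run.namd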

I know scaling problems have been covered before, but I can't find the answer
to my specific question. No matter how many cores I use or how many nodes they
are spread over, at the top of stdout Charm++ always reports "Running on #
unique compute nodes (24-way SMP)". It gets # correct, but it's always
24-way SMP. Is it supposed to be this way? If so, why?

Everyone seems to say that you should recompile NAMD against your own MPI
library, but I don't seem to have problems running NAMD jobs to completion
with charmrun + OpenMPI built with the Intel compilers (except for the
scaling). Could using the precompiled binaries cause scaling problems?

Thank you.

 
