[cluster-l] Single- vs. Dual- vs. Quad-core CPUs
Jay A. Kreibich
jay at kreibi.ch
Thu Mar 29 19:01:17 CDT 2007
On Thu, Mar 29, 2007 at 02:49:06PM -0500, Nils Oberg scratched on the wall:
>
> At 15:07 3/28/2007, Jay A. Kreibich wrote:
> > Additionally, wall-clock speed is not a great way to do performance
> > tests. In the end, it is usually what matters, but it isn't going to
> > answer very many questions of this nature.
>
> What would be a good way to do performance tests? I looked at things
> like valgrind and other performance testers, but the ones I saw were
> intrusive and slowed performance down.
It depends on what you're testing for. While wall-clock time doesn't
give you much insight into what is going on, in the end it is usually
what you care about. A high performance system is usually designed
to answer a question, and the only thing most people care about is
that the question is answered as quickly as possible in terms of
real-life minutes and seconds. If you can benchmark your actual
workloads with actual data, that's what really counts.
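A minimal sketch of a wall-clock benchmark, assuming the real job can be wrapped in a Python callable. `run_workload` is a hypothetical stand-in for your actual load and data. The harness runs it several times and reports the median, which is less sensitive to one-off noise than a single run.

```python
import statistics
import time


def run_workload() -> int:
    # Hypothetical stand-in for the real job; replace with your actual
    # load run against actual data.
    return sum(i * i for i in range(200_000))


def benchmark(func, repeats: int = 5) -> float:
    """Return the median wall-clock time, in seconds, of `repeats` runs."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()  # monotonic, high-resolution wall clock
        func()
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


if __name__ == "__main__":
    print(f"median wall-clock time: {benchmark(run_workload):.4f} s")
```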
It is only when you get to the question of tuning-- either the
algorithm or the hardware configuration-- that you need to ask more
detailed questions. If you're happy with the expected performance at
the prices you're looking at, it might not be worth any additional
testing. If, on the other hand, you're running thousands and
thousands of simulations and getting a 10% run-time improvement
translates to cutting out four or five weeks' worth of work, it might
be worth investing a few days in tuning (NOTE: it isn't worth much
more than that, however). In order to improve runtimes, you need to
learn a lot more details about where your bottlenecks are and where
your runtime is being spent. Some of these are hard questions,
however. How much time the process spends asleep waiting for
network traffic is fairly easy to measure. How many CPU cycles are
spent stalled waiting on memory because of poor cache performance is
much trickier to determine.
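The "time asleep" number is easy partly because it falls out of comparing two clocks. A rough Python sketch, assuming a single-threaded process: `time.process_time()` counts CPU time the process has consumed, while `time.perf_counter()` counts elapsed wall-clock time, so the gap approximates time spent blocked or descheduled. `simulated_job` is hypothetical, with `time.sleep` standing in for a network wait. Cache stalls do not show up this way, because a core stalled on memory is still counted as busy; that question needs hardware performance counters instead.

```python
import time


def simulated_job() -> None:
    # Hypothetical workload: some computation plus a blocking wait.
    total = 0
    for i in range(500_000):
        total += i % 7
    time.sleep(0.2)  # stand-in for waiting on network traffic


def wait_breakdown(func) -> tuple[float, float, float]:
    """Return (wall, cpu, waiting) seconds for one call of func."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()  # user + system CPU time of this process
    func()
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    # For a single-threaded process, wall time not spent on the CPU
    # was spent blocked (sleeping, waiting on I/O) or descheduled.
    return wall, cpu, max(wall - cpu, 0.0)


if __name__ == "__main__":
    wall, cpu, waiting = wait_breakdown(simulated_job)
    print(f"wall {wall:.3f} s, cpu {cpu:.3f} s, waiting ~{waiting:.3f} s")
```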
Linux is not my OS of choice, so I can't really offer specific
suggestions beyond saying that cluster tuning is a bit of a Heisenberg
issue. If you slow down the process by putting all kinds of
instrumentation on it, you might find some issues with cache
performance. On the other hand, the fact that you have the process
under inspection might change its network performance and hide issues
that are happening there. The issues are very similar to those in
multi-threaded programming, only worse.
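One cheap way to see this observer effect is to time the same code with and without a profiler attached. A minimal Python sketch, using the standard library's `cProfile` as the instrumentation; `call_heavy` and `tiny` are hypothetical, chosen to make many small function calls because a deterministic profiler adds cost on every call. The profiled run is typically noticeably slower, and that distortion is exactly what can shift where the bottlenecks appear to be.

```python
import cProfile
import time


def tiny(x: int) -> int:
    return x + 1


def call_heavy(n: int = 300_000) -> int:
    # Hypothetical workload made of many small function calls,
    # the case where per-call profiling overhead is most visible.
    total = 0
    for i in range(n):
        total = tiny(total) ^ i
    return total


def timed(func) -> float:
    start = time.perf_counter()
    func()
    return time.perf_counter() - start


def compare_overhead() -> tuple[float, float]:
    """Return (plain, profiled) wall-clock seconds for call_heavy."""
    plain = timed(call_heavy)
    profiler = cProfile.Profile()
    profiled = timed(lambda: profiler.runcall(call_heavy))
    return plain, profiled


if __name__ == "__main__":
    plain, profiled = compare_overhead()
    print(f"plain {plain:.3f} s, profiled {profiled:.3f} s, "
          f"ratio {profiled / plain:.1f}x")
```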
-j
--
Jay A. Kreibich < J A Y @ K R E I B I.C H >
"'People who live in bamboo houses should not throw pandas.' Jesus said that."
- "The Ninja", www.AskANinja.com, "Special Delivery 10: Pop!Tech 2006"