[cluster-l] Single- vs. Dual- vs. Quad-core CPUs

Jay A. Kreibich jay at kreibi.ch
Thu Mar 29 19:01:17 CDT 2007


On Thu, Mar 29, 2007 at 02:49:06PM -0500, Nils Oberg scratched on the wall:
> 
> At 15:07 3/28/2007, Jay A. Kreibich wrote:
> >  Additionally, wall-clock speed is not a great way to do performance
> >  tests.  In the end, it is usually what matters, but it isn't going to
> >  answer very many questions of this nature.
> 
> What would be a good way to do performance tests?  I looked at things 
> like valgrind and other performance testers, but the ones I saw were 
> intrusive and slowed performance down.

  It depends on what you're testing for.  While wall-clock time doesn't
  tell you much about what is going on, in the end it is usually
  what you care about.  A high performance system is usually designed
  to answer a question, and the only thing most people care about is
  that the question is answered as quickly as possible in terms of
  real-life minutes and seconds.  If you can benchmark your actual
  loads with actual data, that's what really counts.

  It is only when you get to the question of tuning-- either the
  algorithm or the hardware configuration-- that you need to ask more
  detailed questions.  If you're happy with the expected performance at
  the prices you're looking at, it might not be worth any additional
  testing.  If, on the other hand, you're running thousands and
  thousands of simulations and getting a 10% run-time improvement
  translates to cutting out four or five weeks' worth of work, it might
  be worth investing a few days in tuning (NOTE: it isn't worth much
  more than that, however).  In order to improve runtimes, you need to
  learn a lot more detail about where your bottlenecks are and where
  your runtime is being spent.  Some of these are hard questions,
  however.  How much time the process spends asleep waiting for
  network traffic is fairly easy to determine.  How many cycles are
  spent waiting for memory due to cache performance is much trickier
  to measure.
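
  As a rough illustration of the "easy" half of that, here is a
  minimal sketch in Python that compares wall-clock time against CPU
  time for a single-threaded job.  The names measure() and
  sample_job() are made up for the example, and the sleep is only a
  stand-in for blocking on network traffic.  The difference between
  the two clocks approximates time spent off the CPU; it says nothing
  about cache behavior.

```python
import time


def measure(work):
    """Run work() and return (wall, cpu, waiting) in seconds.

    waiting = wall - cpu is only an approximation of time spent blocked,
    and only meaningful for a single-threaded process.
    """
    wall_start = time.perf_counter()   # wall-clock timer
    cpu_start = time.process_time()    # user + system CPU time of this process
    work()
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return wall, cpu, max(wall - cpu, 0.0)


def sample_job():
    # Hypothetical stand-in for a real job: a little computation,
    # then a sleep that plays the role of waiting on the network.
    sum(i * i for i in range(200_000))
    time.sleep(0.2)


if __name__ == "__main__":
    wall, cpu, waiting = measure(sample_job)
    print(f"wall {wall:.3f}s  cpu {cpu:.3f}s  waiting {waiting:.3f}s")
```

  If the waiting figure dominates, the job is probably blocked on
  something other than the CPU.  Which something is a separate
  question.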

  Linux is not my OS of choice, so I can't really offer specific
  suggestions beyond saying that cluster tuning is a bit of a Heisenberg
  issue.  If you slow down the process by putting all kinds of
  instrumentation on it, you might find some issues with cache
  performance.  On the other hand, the fact that you have the process
  under inspection might change its network performance and hide issues
  that are happening there.  The issues are very similar to
  multi-threaded programming, only worse.

   -j

-- 
Jay A. Kreibich < J A Y  @  K R E I B I.C H >

"'People who live in bamboo houses should not throw pandas.' Jesus said that."
   - "The Ninja", www.AskANinja.com, "Special Delivery 10: Pop!Tech 2006"

