Re: benchmarking on Cray XT4

From: Philip Peartree
Date: Tue Mar 16 2010 - 07:59:49 CDT

Hi Hannes

I found a similar situation on the XT4. My understanding is that the
SeaStar interconnect is shared across the cores of a processor, so
2048 tasks on 2048 processors run faster than 2048 tasks on 512
processors. In my experience, asking for 2048 PEs with 4 tasks per
node puts only 512 processors in use, which is slower. However, most
job accounting schemes bill users per processor, so it can still be
beneficial to run on as few processors as possible.
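
To make the arithmetic concrete, here is a rough sketch in Python of how
the task placement changes the number of processors in use. It assumes
one quad-core processor per node with one interconnect link each; the
names CORES_PER_NODE, nodes_needed and placement_summary are made up for
illustration and are not part of any Cray or NAMD tool.

```python
import math

# Assumption: one quad-core processor per node, one interconnect link per node.
CORES_PER_NODE = 4


def nodes_needed(total_tasks, tasks_per_node):
    """Return how many nodes a job occupies for a given placement."""
    if not 1 <= tasks_per_node <= CORES_PER_NODE:
        raise ValueError("tasks_per_node must be between 1 and CORES_PER_NODE")
    return math.ceil(total_tasks / tasks_per_node)


def placement_summary(total_tasks):
    """List (tasks_per_node, nodes_in_use, tasks_sharing_one_link) tuples."""
    summary = []
    for tasks_per_node in (1, 2, 4):
        nodes = nodes_needed(total_tasks, tasks_per_node)
        # Every task placed on a node shares that node's single link.
        summary.append((tasks_per_node, nodes, tasks_per_node))
    return summary


if __name__ == "__main__":
    for tasks_per_node, nodes, sharing in placement_summary(2048):
        print(f"{tasks_per_node} task(s)/node: {nodes} nodes, "
              f"{sharing} task(s) per interconnect link")
```

With 2048 tasks, 4 tasks per node gives 512 nodes with four tasks
contending for each link, while 1 task per node gives 2048 nodes with a
link per task. If the charge scales with the number of nodes held, the
packed job is the cheaper one even though it runs slower.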

Could I ask which system you are working on? Is it HECToR? I can
supply some data on this if you like.

Philip Peartree
University of Manchester

On 16 Mar 2010, at 10:50, Hannes Loeffler wrote:

> Hi,
> I am currently running some benchmarks on a Cray XT4 with quad-core
> processors. Users can choose how many tasks to run per processor.
> What I find is that a single task/processor outperforms a two
> task/processor run which itself is faster than a four task/processor
> run. I see this behaviour for processor counts from 8 to 2048. Now,
> I do understand that there may be performance hits when certain
> resources are shared, but I would still have expected a different
> outcome. Can anyone comment on my findings? Is that the
> performance that I have to expect from NAMD on this architecture?
> Cheers,
> Hannes.

This archive was generated by hypermail 2.1.6 : Wed Feb 29 2012 - 15:53:54 CST