Re: benchmarking on Cray XT4

From: Hannes Loeffler (Hannes.Loeffler_at_stfc.ac.uk)
Date: Tue Mar 16 2010 - 09:32:38 CDT

You've guessed right, Philip, it's Hector. I'm quite disappointed with
the results because they obviously make running jobs there quite
expensive. I would be very interested in your data too.

Axel, it is namd 2.6. The program has been compiled by the Hector
people if I am not mistaken.

Here are my benchmarking results.

# system: protein/membrane solvated with 465399 atoms total
# no. steps: 10000
# machine: hector
# program/force field: namd2.6/CHARMM
#
# no. cores vs. CPUTime
#cores    npepn=1    npepn=2    npepn=4
     8   15659.57   15923.37   16544.06
    16    8061.80    8205.97    8570.84
    32    4122.70    4181.28    4405.94
    64    2007.51    2043.67    2189.68
   128    1096.88    1142.68    1202.05
   256     617.09     674.21     789.80
   512     370.91     380.18     432.75
  1024     247.69     258.52     270.41
  2048     220.12     227.54     292.43
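
To put the numbers above in perspective, here is a rough Python sketch
that derives parallel efficiency relative to the 8-core run and a
"charged" core-time figure. It assumes that whole quad-core nodes are
charged regardless of how many cores a job uses per node; that is an
illustrative assumption, not Hector's documented accounting policy. The
function names are made up for this example, and the time unit is simply
whatever the table reports.

```python
# Timings copied from the table above: cores -> (npepn=1, npepn=2, npepn=4).
# The time unit is not stated in the post, so results stay in table units.
TIMINGS = {
    8: (15659.57, 15923.37, 16544.06),
    16: (8061.80, 8205.97, 8570.84),
    32: (4122.70, 4181.28, 4405.94),
    64: (2007.51, 2043.67, 2189.68),
    128: (1096.88, 1142.68, 1202.05),
    256: (617.09, 674.21, 789.80),
    512: (370.91, 380.18, 432.75),
    1024: (247.69, 258.52, 270.41),
    2048: (220.12, 227.54, 292.43),
}
NPEPN = (1, 2, 4)       # tasks per node for each column
CORES_PER_NODE = 4      # quad-core XT4 nodes


def efficiency(cores, col, base=8):
    """Parallel efficiency relative to the `base`-core run in the same column."""
    speedup = TIMINGS[base][col] / TIMINGS[cores][col]
    return speedup / (cores / base)


def charged_core_time(cores, col):
    """Core-time if every core of each occupied node is charged (assumption)."""
    nodes = cores // NPEPN[col]
    return nodes * CORES_PER_NODE * TIMINGS[cores][col]


if __name__ == "__main__":
    for cores in sorted(TIMINGS):
        eff = "  ".join(f"{efficiency(cores, c):5.2f}" for c in range(3))
        cost = "  ".join(f"{charged_core_time(cores, c):12.0f}" for c in range(3))
        print(f"{cores:5d}  {eff}  {cost}")
```

Under that charging assumption the npepn=4 runs come out cheapest even
though they are the slowest, since they occupy a quarter of the nodes.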

Thanks to all who answered,
Hannes.

On Tue, 16 Mar 2010 12:59:49 +0000
Philip Peartree <philpac_at_gmail.com> wrote:

> Hi Hannes
>
> I found a similar situation on the XT4. My understanding is that the
> seastar interconnect is shared across the cores of a processor,
> therefore 2048 tasks on 2048 processors is faster than 2048 tasks on
> 512 processors. From my experience, asking for 2048 pe with 4 tasks
> per node will give only 512 procs in use, which is slower, but in most
> job accounting methodologies users are billed per proc, so it is
> beneficial to work on as few procs as possible.
>
> Could I ask which system you are working on? Is it Hector? I can
> supply some data on this if you like.
>
> Philip Peartree
> University of Manchester
>
> On 16 Mar 2010, at 10:50, Hannes Loeffler <Hannes.Loeffler_at_stfc.ac.uk>
> wrote:
>
> > Hi,
> >
> > I am currently running some benchmarks on a Cray XT4 with quad-core
> > processors. Users can choose how many tasks to run per processor.
> > What I find is that a single task/processor outperforms a two
> > task/processor run which itself is faster than a four task/processor
> > run. I see this behaviour for processor counts from 8 to 2048.
> > Now, I do understand that there may be performance hits when certain
> > resources are shared but I would still have expected a different
> > outcome. Can anyone comment on my findings? Is that the
> > performance that I have to expect from namd on this architecture?
> >
> > Cheers,
> > Hannes.
> >
