Re: Performance Problems on Linux Cluster / Bad Scaling

From: Philip Peartree (P.Peartree_at_postgrad.manchester.ac.uk)
Date: Tue Sep 18 2007 - 10:34:22 CDT

It may be that your system size is reaching the limits of scalability.
My system of ~44000 atoms stops scaling well at about 56 procs on our
Itanium2/Quadrics cluster...

The best measure of performance is the parallel efficiency, that is:

Efficiency (%) = Time on 1 proc / (n x Time on n procs) x 100

This should give you an idea of how well it scales; aiming for more
than 60% is the rule of thumb I've seen around. Sorry I don't have much
more to add on diagnostics. What kind of time per step are you getting?
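
As a rough sketch of the efficiency measure above, the calculation could
be scripted like this. The function name and the timings are made up for
illustration only; they are not measurements from either cluster, so
substitute the time per step that NAMD reports for each processor count.

```python
def parallel_efficiency(t1, tn, n):
    """Parallel efficiency in percent: t1 / (n * tn) * 100.

    t1 -- time per step on 1 processor
    tn -- time per step on n processors
    n  -- number of processors
    """
    if n < 1 or t1 <= 0 or tn <= 0:
        raise ValueError("n must be >= 1 and timings must be positive")
    return t1 / (n * tn) * 100.0


# Hypothetical seconds-per-step values (assumed, NOT real measurements).
timings = {1: 1.20, 2: 0.62, 6: 0.25, 12: 0.30}

for n in sorted(timings):
    eff = parallel_efficiency(timings[1], timings[n], n)
    verdict = "ok" if eff >= 60.0 else "poor"  # 60% rule of thumb
    print(f"{n:3d} procs: {eff:6.1f}% efficiency ({verdict})")
```

With numbers shaped like these, the efficiency falls well below 60% at
12 procs, which is the kind of drop-off described in the original post.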

Philip Peartree

Quoting Marcus Rölz <m.roelz_at_gmx.de>:

>
> Hello folks,
>
> I am observing severe performance problems on our Linux cluster.
> If I double the NAMD processes from 1 to 2, performance doubles and
> processor load is about 90%, but if I double from 6 to 12
> processes, performance drops below the 6-processor performance
> and CPU usage is about 20%.
>
>
> The setup is the following:
>
> -32 machines, 2 dual-core processors each
> -Linux version 2.6.16.21-0.8-smp
> -Gigabit ethernet
> -simulated system with about 12,000 atoms
>
> I measured the network performance with netperf and got about 900
> Mbit/sec, which should be fine. The cluster nodes I use are not heavily
> loaded.
>
> I am running out of ideas for what to debug next. There must be a way to
> systematically debug these performance issues, but I haven't figured
> it out yet.
>
> Any help & good ideas would be greatly appreciated.
>
>
> Marcus (University of Greifswald)
>
>
>
> My config file: (I already experimented with stepspercycle, twoAwayX,
> outputtiming)
>
> structure ./allwater_ws.psf
> coordinates ./allwater_ws.pdb
>
> set temperature 310
> set outputname allwater_wsout
>
> firsttimestep 0
>
> paraTypeCharmm on
> parameters ./par_all27_prot_na.prm
> temperature $temperature
>
>
> # Force-Field Parameters
> exclude scaled1-4
> 1-4scaling 1.0
> cutoff 12.
> switching on
> switchdist 10.
> pairlistdist 14.5 ##13.5
>
>
> # Integrator Parameters
> timestep 2.0 ;# 2fs/step
> rigidBonds all ;# needed for 2fs steps
> nonbondedFreq 1
> fullElectFrequency 2
> stepspercycle 30
> twoAwayX yes
> outputTiming 20
>
> # Constant Temperature Control
> langevin on ;# do langevin dynamics
> langevinDamping 5 ;# damping coefficient (gamma) of 5/ps
> langevinTemp $temperature
> langevinHydrogen off ;# don't couple langevin bath to hydrogens
>
> # Output
> outputName $outputname
>
> restartfreq 5000 ;# 500steps = every 1ps
> dcdfreq 250
> outputEnergies 100
> outputPressure 100
>
>
> # Spherical boundary conditions
> sphericalBC on
> sphericalBCcenter 21.3367443085 29.2859230042 26.6932468414
> sphericalBCr1 31.4426914717
> sphericalBCk1 10
> sphericalBCexp1 2
>
> # Minimization
> minimize 900 ;#100
> reinitvels $temperature
>
> run 30000000 ;# 5ps = 2500
>

This archive was generated by hypermail 2.1.6 : Wed Feb 29 2012 - 15:45:16 CST