Re: Performance Problems on Linux Cluster / Bad Scaling

From: Dow_Hurst (dhurst_at_mindspring.com)
Date: Sun Sep 23 2007 - 08:39:40 CDT

Marcus,
Do you have a non-blocking switch in your cluster that is dedicated to your NAMD communications? Do you have a separate network for cluster management and NFS mounts? I've seen NAMD choke on switches that aren't non-blocking on all ports. You need dedicated full-duplex, full-bandwidth, low-latency switching to make efficient communication possible on Gig-E for NAMD. SMC makes a good 24-port non-blocking switch that we have in our cluster. We can scale to 24 cores on a 96K-atom system before we see efficiency start dropping. I'm not familiar with netperf, so I wonder if that number you've measured is a single port-to-port bandwidth.
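
To put numbers on "efficiency start dropping", here is a minimal Python sketch of the usual speedup and efficiency arithmetic. The function names and the seconds-per-step timings are made up for illustration and were not measured on either cluster; only the atom and core counts come from this thread. Atoms per core is just a rough way to compare the two setups, not a rule.

```python
def parallel_efficiency(t_serial, t_parallel, n_procs):
    """Speedup relative to the 1-process run, divided by the process count."""
    speedup = t_serial / t_parallel
    return speedup / n_procs


def atoms_per_core(n_atoms, n_cores):
    """Average number of atoms each core has to work on."""
    return n_atoms / n_cores


if __name__ == "__main__":
    # HYPOTHETICAL seconds-per-step values keyed by process count.
    # Replace them with the timing lines NAMD prints for your own runs.
    timings = {1: 0.60, 2: 0.31, 6: 0.12, 12: 0.14}

    for n_procs in sorted(timings):
        eff = parallel_efficiency(timings[1], timings[n_procs], n_procs)
        print(f"{n_procs:2d} procs: efficiency {eff:.2f}")

    # Counts quoted in this thread.
    print("96K atoms on 24 cores:", atoms_per_core(96_000, 24), "atoms/core")
    print("12K atoms on 12 procs:", atoms_per_core(12_000, 12), "atoms/core")
```

An efficiency near 1.0 means the extra processes are doing useful work. A value that collapses as you add nodes means they are mostly waiting, which is when I would look at the switch and network first.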
Best wishes,
Dow

-----Original Message-----
>From: "Marcus Rölz" <m.roelz_at_gmx.de>
>Sent: Sep 18, 2007 9:17 AM
>To: namd-l_at_ks.uiuc.edu
>Subject: namd-l: Performance Problems on Linux Cluster / Bad Scaling
>
>
>Hello folks,
>
>I am observing severe performance problems on our Linux cluster.
>If I double the NAMD processes from 1 to 2, performance doubles and processor load is about 90%, but if I double from 6 to 12 processes, performance drops below the 6-process level and CPU usage is only about 20%.
>
>
>The setup is the following:
>
>-32 machines, 2 dual-core processors each
>-Linux version 2.6.16.21-0.8-smp
>-Gigabit ethernet
>-simulated system with about 12,000 atoms
>
>I measured the network performance with netperf and got about 900 Mbit/s, which should be fine. The cluster nodes I use are not heavily loaded.
>
>I am running out of ideas for what to debug next. There must be a way to systematically debug these performance issues, but I haven't figured it out yet.
>
>Any help and good ideas would be greatly appreciated.
>
>
>Marcus (University of Greifswald)
>
>
>
>My config file (I have already experimented with stepspercycle, twoAwayX, and outputTiming):
>
>structure ./allwater_ws.psf
>coordinates ./allwater_ws.pdb
>
>set temperature 310
>set outputname allwater_wsout
>
>firsttimestep 0
>
>paraTypeCharmm on
>parameters ./par_all27_prot_na.prm
>temperature $temperature
>
>
># Force-Field Parameters
>exclude scaled1-4
>1-4scaling 1.0
>cutoff 12.
>switching on
>switchdist 10.
>pairlistdist 14.5 ##13.5
>
>
># Integrator Parameters
>timestep 2.0 ;# 2fs/step
>rigidBonds all ;# needed for 2fs steps
>nonbondedFreq 1
>fullElectFrequency 2
>stepspercycle 30
>twoAwayX yes
>outputTiming 20
>
># Constant Temperature Control
>langevin on ;# do langevin dynamics
>langevinDamping 5 ;# damping coefficient (gamma) of 5/ps
>langevinTemp $temperature
>langevinHydrogen off ;# don't couple langevin bath to hydrogens
>
># Output
>outputName $outputname
>
>restartfreq 5000 ;# 5000 steps = every 10 ps
>dcdfreq 250
>outputEnergies 100
>outputPressure 100
>
>
># Spherical boundary conditions
>sphericalBC on
>sphericalBCcenter 21.3367443085 29.2859230042 26.6932468414
>sphericalBCr1 31.4426914717
>sphericalBCk1 10
>sphericalBCexp1 2
>
># Minimization
>minimize 900 ;#100
>reinitvels $temperature
>
>run 30000000 ;# 30,000,000 steps = 60 ns at 2 fs/step
>
>
>
>
>

No sig.
