Re: Background load problem

From: Anirban (anirbang_at_cdac.in)
Date: Fri Feb 27 2009 - 03:38:16 CST

Hi Peter,

Yes, on a lower number of processors I am not getting that error. But now I
am getting the following lines in the log file between successive
steps:
------------------------------------------------------------------------
WRITING EXTENDED SYSTEM TO RESTART FILE AT STEP 1017000000
WRITING COORDINATES TO DCD FILE AT STEP 1017000000
WRITING COORDINATES TO RESTART FILE AT STEP 1017000000
FINISHED WRITING RESTART COORDINATES
WRITING VELOCITIES TO RESTART FILE AT STEP 1017000000
FINISHED WRITING RESTART VELOCITIES
LDB: LOAD: AVG 0.0184382 MAX 0.0304735 MSGS: TOTAL 1 MAXC 1 MAXP 1
None
LDB: LOAD: AVG 0.0184382 MAX 0.0304735 MSGS: TOTAL 1 MAXC 1 MAXP 1
Refine
LDB: LOAD: AVG 0.0184247 MAX 0.0304558 MSGS: TOTAL 1 MAXC 1 MAXP 1
None
LDB: LOAD: AVG 0.0184247 MAX 0.0304558 MSGS: TOTAL 1 MAXC 1 MAXP 1
Refine
LDB: LOAD: AVG 0.0184401 MAX 0.0304849 MSGS: TOTAL 1 MAXC 1 MAXP 1
None
LDB: LOAD: AVG 0.0184401 MAX 0.0304849 MSGS: TOTAL 1 MAXC 1 MAXP 1
Refine
LDB: LOAD: AVG 0.0183048 MAX 0.0301793 MSGS: TOTAL 1 MAXC 1 MAXP 1
None
LDB: LOAD: AVG 0.0183048 MAX 0.0301793 MSGS: TOTAL 1 MAXC 1 MAXP 1
Refine
LDB: LOAD: AVG 0.0182905 MAX 0.0301466 MSGS: TOTAL 1 MAXC 1 MAXP 1
None
LDB: LOAD: AVG 0.0182905 MAX 0.0301466 MSGS: TOTAL 1 MAXC 1 MAXP 1
Refine
LDB: LOAD: AVG 0.0182922 MAX 0.0301778 MSGS: TOTAL 1 MAXC 1 MAXP 1
None
LDB: LOAD: AVG 0.0182922 MAX 0.0301778 MSGS: TOTAL 1 MAXC 1 MAXP 1
Refine
LDB: LOAD: AVG 0.0183165 MAX 0.0301847 MSGS: TOTAL 1 MAXC 1 MAXP 1
None
LDB: LOAD: AVG 0.0183165 MAX 0.0301847 MSGS: TOTAL 1 MAXC 1 MAXP 1
Refine
LDB: LOAD: AVG 0.0182986 MAX 0.0301688 MSGS: TOTAL 1 MAXC 1 MAXP 1
None
LDB: LOAD: AVG 0.0182986 MAX 0.0301688 MSGS: TOTAL 1 MAXC 1 MAXP 1
Refine
LDB: LOAD: AVG 0.0183072 MAX 0.0301442 MSGS: TOTAL 1 MAXC 1 MAXP 1
None
LDB: LOAD: AVG 0.0183072 MAX 0.0301442 MSGS: TOTAL 1 MAXC 1 MAXP 1
Refine
LDB: LOAD: AVG 0.0183293 MAX 0.0301492 MSGS: TOTAL 1 MAXC 1 MAXP 1
None
LDB: LOAD: AVG 0.0183293 MAX 0.0301492 MSGS: TOTAL 1 MAXC 1 MAXP 1
Refine
LDB: LOAD: AVG 0.0183219 MAX 0.0301478 MSGS: TOTAL 1 MAXC 1 MAXP 1
None
--------------------------------------------------------------------
Is this normal? Any suggestion is welcome.

Regards,
Anirban

On Thu, 2009-02-26 at 11:45 -0600, Peter Freddolino wrote:
> Hi Anirban,
>
> it's unlikely that you'll get anything meaningful out of the tiny system
> you're describing.
>
> Please note that by the metric I described previously (1000
> atoms/processor), it is not surprising that your 5000-atom system
> doesn't scale to 16 processors. To scale efficiently below 1000
> atoms/processor you need to follow the performance tuning steps on the
> namd wiki. You should also be sure that you're using a cvs version of
> namd, as CG systems have different scaling properties that 2.6 may not
> handle as well.
>
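> To put numbers on it: 5000 atoms over 16 processors is only
> 5000/16 = ~313 atoms per processor, well under that threshold, so the
> per-step communication overhead dominates the actual computation.
>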
> You might want to try turning on ldbUnloadZero if your background load
> warnings persist with reasonable numbers of processors.
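> For reference, enabling it is a single line in the NAMD config file
> (a minimal sketch; check the User's Guide for your version's exact
> default and semantics):
>
> ldbUnloadZero yes ;# ask the load balancer to keep work off processor 0
>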
> Peter
>
> Anirban wrote:
> > Hi Peter,
> >
> > Actually that is a shape-based CG system; that's why it is so small.
> > I also tried to run an RBCG model of 5622 CG beads, and that is also not
> > scaling beyond 16 processors. I am getting the same "High
> > background load" warning for this system as well. What should I do?
> >
> > Regards,
> >
> >
> > On Thu, 2009-02-26 at 07:50 -0600, Peter Freddolino wrote:
> >> Hi Anirban,
> >> namd is generally able to scale efficiently in parallel down to 100-1000
> >> atoms per processor, depending on your exact system. Trying to run a
> >> 45-particle system on more than one processor is unlikely to give
> >> significant returns, and certainly running on more than one node (where
> >> network latency comes into play) is right out. Have you tried
> >> benchmarking with different numbers of processors?
> >>
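> >> (For a quick check, assuming a charmrun-based build; the binary and
> >> config file names below are placeholders:
> >>
> >> charmrun namd2 +p1 your_run.namd > bench_p1.log
> >> charmrun namd2 +p2 your_run.namd > bench_p2.log
> >> charmrun namd2 +p4 your_run.namd > bench_p4.log
> >>
> >> Then compare the "Benchmark time:" lines namd prints in each log.)
> >>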
> >> How is your system so small? Do you not have cg water?
> >>
> >> Best,
> >> Peter
> >>
> >> Anirban Ghosh wrote:
> >>> Hi ALL,
> >>>
> >>> I am running a CGMD simulation of a system comprising 45 beads using
> >>> NAMD. I am using 16 processors (4 nodes) to run the job. Although no other
> >>> jobs are running on these nodes and the CPU usage is only 10-11%, I am
> >>> still getting the following warning messages about background load.
> >>> Because of this the log files are becoming too large and the run times
> >>> are increasing drastically.
> >>> -----------------------------------------------------------------------------
> >>> WRITING EXTENDED SYSTEM TO RESTART FILE AT STEP 1000100000
> >>> OPENING COORDINATE DCD FILE
> >>> WRITING COORDINATES TO DCD FILE AT STEP 1000100000
> >>> WRITING COORDINATES TO RESTART FILE AT STEP 1000100000
> >>> FINISHED WRITING RESTART COORDINATES
> >>> WRITING VELOCITIES TO RESTART FILE AT STEP 1000100000
> >>> FINISHED WRITING RESTART VELOCITIES
> >>> LDB: LOAD: AVG 0.00873701 MAX 0.0428593 MSGS: TOTAL 15 MAXC 1 MAXP 15
> >>> None
> >>> Warning: 1 processors are overloaded due to high background load.
> >>> LDB: LOAD: AVG 0.00873701 MAX 0.0428593 MSGS: TOTAL 15 MAXC 1 MAXP 15
> >>> Refine
> >>> LDB: LOAD: AVG 0.00786597 MAX 0.0348988 MSGS: TOTAL 15 MAXC 1 MAXP 15
> >>> None
> >>> Warning: 2 processors are overloaded due to high background load.
> >>> LDB: LOAD: AVG 0.00786597 MAX 0.031147 MSGS: TOTAL 15 MAXC 1 MAXP 15
> >>> Refine
> >>> LDB: LOAD: AVG 0.00806749 MAX 0.034482 MSGS: TOTAL 15 MAXC 1 MAXP 15
> >>> None
> >>> Warning: 2 processors are overloaded due to high background load.
> >>> LDB: LOAD: AVG 0.00806749 MAX 0.0327935 MSGS: TOTAL 15 MAXC 1 MAXP 15
> >>> Refine
> >>> LDB: LOAD: AVG 0.00822078 MAX 0.0371771 MSGS: TOTAL 15 MAXC 1 MAXP 15
> >>> None
> >>> Warning: 1 processors are overloaded due to high background load.
> >>> LDB: LOAD: AVG 0.00822078 MAX 0.0339828 MSGS: TOTAL 15 MAXC 1 MAXP 15
> >>> Refine
> >>> LDB: LOAD: AVG 0.00878559 MAX 0.0362875 MSGS: TOTAL 15 MAXC 1 MAXP 15
> >>> None
> >>> -----------------------------------------------------------------------------
> >>>
> >>> How can I solve this problem? Any suggestion is appreciated.
> >>>
> >>>
> >>> Regards,
> >>>
> >>>
