Re: Background load problem

From: Peter Freddolino (petefred_at_ks.uiuc.edu)
Date: Fri Feb 27 2009 - 15:10:29 CST

That's just normal information on the performance of the load balancer,
and does not indicate a problem. However, the very low loads on all of
your processors indicate that you're running on too many processors for
your system...
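As a rough aid for reading these `LDB: LOAD:` lines, here is a minimal Python sketch that extracts the AVG and MAX fields and reports MAX/AVG, a common load-imbalance measure (1.0 means perfectly balanced). It assumes only the line layout shown in the logs quoted in this thread and makes no assumption about the units of the load values; `parse_ldb_line` and `LdbLoad` are hypothetical names, not part of NAMD.

```python
import re
from typing import NamedTuple, Optional

# Assumed layout, taken from the logs in this thread:
# "LDB: LOAD: AVG <avg> MAX <max> MSGS: ..."
LDB_RE = re.compile(r"LDB: LOAD: AVG (\S+) MAX (\S+)")


class LdbLoad(NamedTuple):
    avg_load: float  # average per-processor load, as printed
    max_load: float  # maximum per-processor load, as printed

    @property
    def imbalance(self) -> float:
        """MAX/AVG ratio; assumes avg_load > 0."""
        return self.max_load / self.avg_load


def parse_ldb_line(line: str) -> Optional[LdbLoad]:
    """Return the AVG and MAX loads from one log line, or None if it is not an LDB line."""
    m = LDB_RE.search(line)
    if m is None:
        return None
    return LdbLoad(float(m.group(1)), float(m.group(2)))


if __name__ == "__main__":
    line = "LDB: LOAD: AVG 0.0184382 MAX 0.0304735 MSGS: TOTAL 1 MAXC 1 MAXP 1"
    load = parse_ldb_line(line)
    if load is not None:
        print(f"avg={load.avg_load} max={load.max_load} imbalance={load.imbalance:.2f}")
```

The ratio only describes how evenly work is spread; the point made above is about the absolute values, which are small on every line.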

Best,
Peter

Anirban wrote:
> Hi Peter,
>
> Yes, on a lower number of processors I no longer get that warning. But
> now I am getting the following lines in the log file between successive
> steps:
> ------------------------------------------------------------------------
> WRITING EXTENDED SYSTEM TO RESTART FILE AT STEP 1017000000
> WRITING COORDINATES TO DCD FILE AT STEP 1017000000
> WRITING COORDINATES TO RESTART FILE AT STEP 1017000000
> FINISHED WRITING RESTART COORDINATES
> WRITING VELOCITIES TO RESTART FILE AT STEP 1017000000
> FINISHED WRITING RESTART VELOCITIES
> LDB: LOAD: AVG 0.0184382 MAX 0.0304735 MSGS: TOTAL 1 MAXC 1 MAXP 1
> None
> LDB: LOAD: AVG 0.0184382 MAX 0.0304735 MSGS: TOTAL 1 MAXC 1 MAXP 1
> Refine
> LDB: LOAD: AVG 0.0184247 MAX 0.0304558 MSGS: TOTAL 1 MAXC 1 MAXP 1
> None
> LDB: LOAD: AVG 0.0184247 MAX 0.0304558 MSGS: TOTAL 1 MAXC 1 MAXP 1
> Refine
> LDB: LOAD: AVG 0.0184401 MAX 0.0304849 MSGS: TOTAL 1 MAXC 1 MAXP 1
> None
> LDB: LOAD: AVG 0.0184401 MAX 0.0304849 MSGS: TOTAL 1 MAXC 1 MAXP 1
> Refine
> LDB: LOAD: AVG 0.0183048 MAX 0.0301793 MSGS: TOTAL 1 MAXC 1 MAXP 1
> None
> LDB: LOAD: AVG 0.0183048 MAX 0.0301793 MSGS: TOTAL 1 MAXC 1 MAXP 1
> Refine
> LDB: LOAD: AVG 0.0182905 MAX 0.0301466 MSGS: TOTAL 1 MAXC 1 MAXP 1
> None
> LDB: LOAD: AVG 0.0182905 MAX 0.0301466 MSGS: TOTAL 1 MAXC 1 MAXP 1
> Refine
> LDB: LOAD: AVG 0.0182922 MAX 0.0301778 MSGS: TOTAL 1 MAXC 1 MAXP 1
> None
> LDB: LOAD: AVG 0.0182922 MAX 0.0301778 MSGS: TOTAL 1 MAXC 1 MAXP 1
> Refine
> LDB: LOAD: AVG 0.0183165 MAX 0.0301847 MSGS: TOTAL 1 MAXC 1 MAXP 1
> None
> LDB: LOAD: AVG 0.0183165 MAX 0.0301847 MSGS: TOTAL 1 MAXC 1 MAXP 1
> Refine
> LDB: LOAD: AVG 0.0182986 MAX 0.0301688 MSGS: TOTAL 1 MAXC 1 MAXP 1
> None
> LDB: LOAD: AVG 0.0182986 MAX 0.0301688 MSGS: TOTAL 1 MAXC 1 MAXP 1
> Refine
> LDB: LOAD: AVG 0.0183072 MAX 0.0301442 MSGS: TOTAL 1 MAXC 1 MAXP 1
> None
> LDB: LOAD: AVG 0.0183072 MAX 0.0301442 MSGS: TOTAL 1 MAXC 1 MAXP 1
> Refine
> LDB: LOAD: AVG 0.0183293 MAX 0.0301492 MSGS: TOTAL 1 MAXC 1 MAXP 1
> None
> LDB: LOAD: AVG 0.0183293 MAX 0.0301492 MSGS: TOTAL 1 MAXC 1 MAXP 1
> Refine
> LDB: LOAD: AVG 0.0183219 MAX 0.0301478 MSGS: TOTAL 1 MAXC 1 MAXP 1
> None
> --------------------------------------------------------------------
> Is this normal? Any suggestions are welcome.
>
> Regards,
>
>
> On Thu, 2009-02-26 at 11:45 -0600, Peter Freddolino wrote:
>> Hi Anirban,
>>
>> it's unlikely that you'll get anything meaningful out of the tiny system
>> you're describing.
>>
>> Please note that by the metric I described previously (1000
>> atoms/processor), it is not surprising that your 5000-atom system
>> doesn't scale to 16 processors. To scale below 1000 atoms/processor,
>> you need to follow the performance-tuning steps on the NAMD wiki.
>> Make sure that you're using a CVS version of NAMD, as CG
>> systems have different scaling properties that 2.6 may not handle as well.
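The atoms-per-processor rule of thumb can be turned into a quick estimate. This Python sketch assumes only the figures given in this thread (roughly 100-1000 atoms per processor as the efficient limit); `atoms_per_processor` and `max_useful_processors` are hypothetical helpers, and the result is a starting point for benchmarking rather than a NAMD guarantee.

```python
def atoms_per_processor(n_atoms: int, n_procs: int) -> float:
    """Average number of atoms (or CG beads) each processor is responsible for."""
    return n_atoms / n_procs


def max_useful_processors(n_atoms: int, min_atoms_per_proc: int = 1000) -> int:
    """Largest processor count that keeps at least `min_atoms_per_proc` atoms per processor.

    Always returns at least 1, since even a tiny system needs one processor.
    """
    return max(1, n_atoms // min_atoms_per_proc)


if __name__ == "__main__":
    # System sizes mentioned in this thread: 45 beads and 5622 beads.
    for n_atoms in (45, 5622):
        print(
            f"{n_atoms} beads: {atoms_per_processor(n_atoms, 16):.1f} per proc on 16 procs; "
            f"suggested max procs: {max_useful_processors(n_atoms)} at 1000/proc, "
            f"{max_useful_processors(n_atoms, 100)} at 100/proc"
        )
```

For the 5622-bead RBCG model on 16 processors this gives about 351 beads per processor, already well below the 1000-per-processor figure.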
>>
>> You might want to try turning on ldbUnloadZero if your background load
>> warnings persist with reasonable numbers of processors.
>> Peter
>>
>> Anirban wrote:
>>> Hi Peter,
>>>
>>> Actually, that is a shape-based system; that's why it is so small.
>>> I also tried running an RBCG model of 5622 CG beads, and it also does not
>>> scale beyond 16 processors. I get the same "high background load"
>>> warning for this system as well. What should I do?
>>>
>>> Regards,
>>>
>>>
>>> On Thu, 2009-02-26 at 07:50 -0600, Peter Freddolino wrote:
>>>> Hi Anirban,
>>>> namd is generally able to scale efficiently in parallel up to 100-1000
>>>> atoms per processor, depending on your exact system. Trying to run a 45
>>>> particle system on more than one processor is unlikely to give
>>>> significant returns, and certainly running on more than one node (where
>>>> network latency comes into play) is right out. Have you tried
>>>> benchmarking with different numbers of processors?
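One way to act on the benchmarking suggestion: time the same run at several processor counts and compute speedup and parallel efficiency relative to the smallest count. This Python sketch uses the standard definitions (speedup = T_ref / T_p, efficiency = speedup * p_ref / p); the function name `scaling_table` and its input format, a mapping from processor count to wall-clock time for an identical run, are assumptions for illustration.

```python
from typing import Dict, List, Tuple


def scaling_table(times: Dict[int, float]) -> List[Tuple[int, float, float]]:
    """Return (procs, speedup, efficiency) for each processor count, sorted by procs.

    `times` maps processor count -> wall-clock seconds for the same benchmark run.
    Speedup and efficiency are measured relative to the smallest processor count.
    """
    if not times:
        return []
    p_ref = min(times)
    t_ref = times[p_ref]
    rows = []
    for p in sorted(times):
        speedup = t_ref / times[p]
        efficiency = speedup * p_ref / p
        rows.append((p, speedup, efficiency))
    return rows


if __name__ == "__main__":
    # Hypothetical timings for illustration only; substitute your own measurements.
    example = {1: 10.0, 2: 5.5, 4: 3.5, 8: 3.0}
    for p, s, e in scaling_table(example):
        print(f"{p:3d} procs  speedup {s:5.2f}  efficiency {e:6.1%}")
```

The processor count at which efficiency starts to fall off sharply is a reasonable place to stop adding processors for a given system.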
>>>>
>>>> How is your system so small? Do you not have CG water?
>>>>
>>>> Best,
>>>> Peter
>>>>
>>>> Anirban Ghosh wrote:
>>>>> Hi ALL,
>>>>>
>>>>> I am running a CGMD simulation of a system comprising 45 beads using
>>>>> NAMD. I am using 16 processors (4 nodes) to run the job. Although no other
>>>>> jobs are running on these nodes and CPU usage is only 10-11%, I am still
>>>>> getting the following warning messages about background load.
>>>>> Because of this, the log files are becoming very large and the run times
>>>>> are increasing dramatically.
>>>>> -----------------------------------------------------------------------------
>>>>> WRITING EXTENDED SYSTEM TO RESTART FILE AT STEP 1000100000
>>>>> OPENING COORDINATE DCD FILE
>>>>> WRITING COORDINATES TO DCD FILE AT STEP 1000100000
>>>>> WRITING COORDINATES TO RESTART FILE AT STEP 1000100000
>>>>> FINISHED WRITING RESTART COORDINATES
>>>>> WRITING VELOCITIES TO RESTART FILE AT STEP 1000100000
>>>>> FINISHED WRITING RESTART VELOCITIES
>>>>> LDB: LOAD: AVG 0.00873701 MAX 0.0428593 MSGS: TOTAL 15 MAXC 1 MAXP 15
>>>>> None
>>>>> Warning: 1 processors are overloaded due to high background load.
>>>>> LDB: LOAD: AVG 0.00873701 MAX 0.0428593 MSGS: TOTAL 15 MAXC 1 MAXP 15
>>>>> Refine
>>>>> LDB: LOAD: AVG 0.00786597 MAX 0.0348988 MSGS: TOTAL 15 MAXC 1 MAXP 15
>>>>> None
>>>>> Warning: 2 processors are overloaded due to high background load.
>>>>> LDB: LOAD: AVG 0.00786597 MAX 0.031147 MSGS: TOTAL 15 MAXC 1 MAXP 15
>>>>> Refine
>>>>> LDB: LOAD: AVG 0.00806749 MAX 0.034482 MSGS: TOTAL 15 MAXC 1 MAXP 15
>>>>> None
>>>>> Warning: 2 processors are overloaded due to high background load.
>>>>> LDB: LOAD: AVG 0.00806749 MAX 0.0327935 MSGS: TOTAL 15 MAXC 1 MAXP 15
>>>>> Refine
>>>>> LDB: LOAD: AVG 0.00822078 MAX 0.0371771 MSGS: TOTAL 15 MAXC 1 MAXP 15
>>>>> None
>>>>> Warning: 1 processors are overloaded due to high background load.
>>>>> LDB: LOAD: AVG 0.00822078 MAX 0.0339828 MSGS: TOTAL 15 MAXC 1 MAXP 15
>>>>> Refine
>>>>> LDB: LOAD: AVG 0.00878559 MAX 0.0362875 MSGS: TOTAL 15 MAXC 1 MAXP 15
>>>>> None
>>>>> -----------------------------------------------------------------------------
>>>>>
>>>>> How can I solve this problem? Any suggestions are appreciated.
>>>>>
>>>>>
>>>>> Regards,
>>>>>
>>>>>

This archive was generated by hypermail 2.1.6 : Wed Feb 29 2012 - 15:52:25 CST