Re: Background load problem

From: Peter Freddolino (petefred_at_ks.uiuc.edu)
Date: Thu Feb 26 2009 - 11:45:00 CST

Hi Anirban,

It's unlikely that you'll get anything meaningful out of the tiny system
you're describing.

Please note that by the metric I described previously (1000
atoms/processor), it is not surprising that your 5000-atom system
doesn't scale to 16 processors. To get better than 1000 atoms/processor
you need to follow the performance-tuning steps on the NAMD wiki.
You should also make sure you're using a CVS version of NAMD, as CG
systems have different scaling properties that the 2.6 release may not
handle as well.
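The atoms-per-processor rule of thumb translates into a quick back-of-the-envelope check. This is only a sketch: the 1000-atoms/processor threshold is the heuristic from this thread, not a NAMD parameter, and the function name is hypothetical.

```python
def max_useful_procs(n_atoms, atoms_per_proc=1000):
    """Estimate the largest processor count likely to scale,
    using the ~1000 atoms/processor heuristic from the thread."""
    return max(1, n_atoms // atoms_per_proc)

print(max_useful_procs(5000))                       # 5000-atom system: ~5 processors
print(max_useful_procs(5000, atoms_per_proc=500))   # optimistic 500 atoms/proc: ~10
print(max_useful_procs(45))                         # the 45-bead system: 1 processor
```

By this estimate, running either system on 16 processors leaves most of them idle, which is consistent with the background-load warnings below.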

You might want to try turning on ldbUnloadZero if your background load
warnings persist with reasonable numbers of processors.
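For reference, ldbUnloadZero goes in the NAMD configuration file; a minimal sketch of the relevant line, assuming the standard boolean option syntax:

```
# NAMD configuration fragment (sketch): ask the load balancer to keep
# work off processor 0, which also handles output and other bookkeeping
# and can otherwise trip "high background load" warnings.
ldbUnloadZero   yes
```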
Peter

Anirban wrote:
> Hi Peter,
>
> Actually that is a shape-based system; that's why the size is so small.
> I also tried to run an RBCG model of 5622 CG beads, and that is also not
> scaling beyond 16 processors. I am getting the same "High
> background load" warning for this system as well. What should I do?
>
> Regards,
>
>
> On Thu, 2009-02-26 at 07:50 -0600, Peter Freddolino wrote:
>> Hi Anirban,
>> NAMD is generally able to scale efficiently in parallel down to 100-1000
>> atoms per processor, depending on your exact system. Trying to run a
>> 45-particle system on more than one processor is unlikely to give
>> significant returns, and certainly running on more than one node (where
>> network latency comes into play) is right out. Have you tried
>> benchmarking with different numbers of processors?
>>
>> Why is your system so small? Do you not have CG water?
>>
>> Best,
>> Peter
>>
>> Anirban Ghosh wrote:
>>> Hi ALL,
>>>
>>> I am running a CGMD simulation of a system comprising 45 beads using
>>> NAMD. I am using 16 processors (4 nodes) to run the job. Although no other
>>> jobs are running on these nodes and the CPU usage is only 10-11%, I am
>>> still getting the following warning messages about background load.
>>> Because of this the log files are becoming too large and the run times
>>> are increasing dramatically.
>>> -----------------------------------------------------------------------------
>>> WRITING EXTENDED SYSTEM TO RESTART FILE AT STEP 1000100000
>>> OPENING COORDINATE DCD FILE
>>> WRITING COORDINATES TO DCD FILE AT STEP 1000100000
>>> WRITING COORDINATES TO RESTART FILE AT STEP 1000100000
>>> FINISHED WRITING RESTART COORDINATES
>>> WRITING VELOCITIES TO RESTART FILE AT STEP 1000100000
>>> FINISHED WRITING RESTART VELOCITIES
>>> LDB: LOAD: AVG 0.00873701 MAX 0.0428593 MSGS: TOTAL 15 MAXC 1 MAXP 15
>>> None
>>> Warning: 1 processors are overloaded due to high background load.
>>> LDB: LOAD: AVG 0.00873701 MAX 0.0428593 MSGS: TOTAL 15 MAXC 1 MAXP 15
>>> Refine
>>> LDB: LOAD: AVG 0.00786597 MAX 0.0348988 MSGS: TOTAL 15 MAXC 1 MAXP 15
>>> None
>>> Warning: 2 processors are overloaded due to high background load.
>>> LDB: LOAD: AVG 0.00786597 MAX 0.031147 MSGS: TOTAL 15 MAXC 1 MAXP 15
>>> Refine
>>> LDB: LOAD: AVG 0.00806749 MAX 0.034482 MSGS: TOTAL 15 MAXC 1 MAXP 15
>>> None
>>> Warning: 2 processors are overloaded due to high background load.
>>> LDB: LOAD: AVG 0.00806749 MAX 0.0327935 MSGS: TOTAL 15 MAXC 1 MAXP 15
>>> Refine
>>> LDB: LOAD: AVG 0.00822078 MAX 0.0371771 MSGS: TOTAL 15 MAXC 1 MAXP 15
>>> None
>>> Warning: 1 processors are overloaded due to high background load.
>>> LDB: LOAD: AVG 0.00822078 MAX 0.0339828 MSGS: TOTAL 15 MAXC 1 MAXP 15
>>> Refine
>>> LDB: LOAD: AVG 0.00878559 MAX 0.0362875 MSGS: TOTAL 15 MAXC 1 MAXP 15
>>> None
>>> -----------------------------------------------------------------------------
>>>
>>> How can I solve this problem? Any suggestion is appreciated.
>>>
>>>
>>> Regards,
>>>
>>>

This archive was generated by hypermail 2.1.6 : Wed Feb 29 2012 - 15:50:33 CST