scaling issue

From: Robert Bjornson (rbjornson_at_gmail.com)
Date: Fri Oct 21 2005 - 15:56:13 CDT

(Sorry about the previous incomplete message.)

Hi,

I've got a question about load balancing.

I'm working with NAMD 2.6b1 on a Rocks Linux cluster with GigE. I've been
benchmarking an input set that contains ~58000 atoms. It's not
scaling very well:

Procs   sec/timestep
    1   2.5
    8   0.75
   16   0.67
(no improvement after 16)

I also ran another input set, ap0a1, which I found on the NAMD site and for
which there are published benchmarks showing good scaling. It scaled much
better:

Procs   sec/timestep
    1   3.0
    2   1.57
    8   0.512
   16   0.335
   32   0.243
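
The two timing tables above can be compared more directly as speedup and parallel efficiency relative to the 1-processor run. A minimal Python sketch (the function and variable names are made up for illustration; the numbers are copied from the tables):

```python
def scaling_table(timings):
    """Return (procs, speedup, efficiency) rows relative to the 1-processor time."""
    base = timings[1]
    rows = []
    for procs in sorted(timings):
        speedup = base / timings[procs]          # how many times faster than 1 proc
        rows.append((procs, speedup, speedup / procs))  # efficiency = speedup / procs
    return rows

# sec/timestep keyed by processor count, copied from the two tables
mine = {1: 2.5, 8: 0.75, 16: 0.67}
ap0a1 = {1: 3.0, 2: 1.57, 8: 0.512, 16: 0.335, 32: 0.243}

for name, timings in (("mine", mine), ("ap0a1", ap0a1)):
    print(name)
    for procs, speedup, eff in scaling_table(timings):
        print(f"  {procs:>3} procs  speedup {speedup:5.2f}  efficiency {eff:4.0%}")
```

At 16 processors this gives roughly 23% efficiency for the 58000-atom input versus about 56% for ap0a1.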

While looking at the output, I noticed that the load balancing output
for the two cases was quite different:

Mine:
LDB: LOAD: AVG 21.9132 MAX 54.5234 MSGS: TOTAL 693 MAXC 50 MAXP 11 None
LDB: LOAD: AVG 21.9132 MAX 40.0368 MSGS: TOTAL 782 MAXC 56 MAXP 11 Refine

ap0a1:
LDB: LOAD: AVG 7.90276 MAX 10.1384 MSGS: TOTAL 705 MAXC 13 MAXP 7 None
LDB: LOAD: AVG 7.90276 MAX 8.0608 MSGS: TOTAL 707 MAXC 13 MAXP 7 Refine
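
One way to put a number on the difference is the MAX/AVG ratio from each LDB line. The sketch below assumes that AVG and MAX are the average and maximum per-processor load, so a ratio of 1.0 would mean perfect balance; the regular expression and the function name are mine, not part of NAMD:

```python
import re

# Matches the AVG and MAX fields of an "LDB: LOAD:" line as shown above.
LDB_RE = re.compile(r"LDB: LOAD: AVG (\S+) MAX (\S+)")

def imbalance(line):
    """Return MAX/AVG for one LDB LOAD line (assumed: 1.0 = perfectly balanced)."""
    m = LDB_RE.search(line)
    if m is None:
        raise ValueError("not an LDB LOAD line")
    avg, mx = float(m.group(1)), float(m.group(2))
    return mx / avg

lines = {
    "mine, None":    "LDB: LOAD: AVG 21.9132 MAX 54.5234 MSGS: TOTAL 693 MAXC 50 MAXP 11 None",
    "mine, Refine":  "LDB: LOAD: AVG 21.9132 MAX 40.0368 MSGS: TOTAL 782 MAXC 56 MAXP 11 Refine",
    "ap0a1, None":   "LDB: LOAD: AVG 7.90276 MAX 10.1384 MSGS: TOTAL 705 MAXC 13 MAXP 7 None",
    "ap0a1, Refine": "LDB: LOAD: AVG 7.90276 MAX 8.0608 MSGS: TOTAL 707 MAXC 13 MAXP 7 Refine",
}

for label, line in lines.items():
    print(f"{label:14s} MAX/AVG = {imbalance(line):.2f}")
```

After Refine, the ratio is about 1.83 for the 58000-atom input and about 1.02 for ap0a1.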

Seems to me that the LDB output on my input set is telling me that
it's failing to do a good job of load balancing, and that might be the
cause of my poor performance. I'm wondering if anyone can tell me if
they think that's a reasonable conjecture, and if so, whether there
might be any tricks for improving the load balance.

I'll attach the two config files, in case they might be of help.

Thanks very much for any assistance,

Rob Bjornson

