scaling issues in distributed environment

From: Viral D. Tejani (vdt204_at_lehigh.edu)
Date: Tue Sep 04 2007 - 15:34:20 CDT

Hi,

We are running NAMD constant-velocity pulling simulations of an integrin-collagen complex in a water box, fixing the collagen and pulling the integrin away from it. Our problem is the computational efficiency of these simulations.

The size of our simulation is:
Info: ****************************
Info: STRUCTURE SUMMARY:
Info: 39236 ATOMS
Info: 27435 BONDS
Info: 18624 ANGLES
Info: 10038 DIHEDRALS
Info: 613 IMPROPERS
Info: 0 EXCLUSIONS
Info: 6 FIXED ATOMS
Info: 117690 DEGREES OF FREEDOM
Info: 13714 HYDROGEN GROUPS
Info: 0 HYDROGEN GROUPS WITH ALL ATOMS FIXED
Info: TOTAL MASS = 239916 amu
Info: TOTAL CHARGE = -0.999997 e
Info: *****************************
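As a rough sense of scale, the atom count above can be divided by the number of nodes mentioned later in this post. This is only a back-of-the-envelope sketch: it assumes one NAMD process per node, which the post does not state.

```python
# Figures taken from the structure summary and the node count in this post.
n_atoms = 39236
n_nodes = 40  # assumption: one NAMD process per node

# Average number of atoms each process would own if work were split evenly.
atoms_per_process = n_atoms / n_nodes
print(f"{atoms_per_process:.0f} atoms per process")
```

With so few atoms per process, each timestep involves little computation relative to the messages exchanged, so the interconnect may matter more than the CPU count.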

We are trying to run Charm++ and NAMD over a distributed network, using Condor to schedule all of our jobs ( http://www.cs.wisc.edu/condor/description.html ). However, we are not seeing the speedups we hoped for. The load balancer (LDB) output keeps showing an average load of 6-7, even when we run on 40 nodes:

LDB: LOAD: AVG 6.33084 MAX 7.1477 MSGS: TOTAL 273 MAXC 16 MAXP 6 None
LDB: LOAD: AVG 6.33084 MAX 6.45697 MSGS: TOTAL 273 MAXC 16 MAXP 6 Refine
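The two LDB lines above can be compared by pulling out the AVG and MAX fields and taking their ratio. The sketch below is illustrative only: the regular expression is written against these two lines, not against any documented NAMD log format, and `parse_ldb` is a hypothetical helper name.

```python
import re

# The two load-balancer lines quoted in this post.
ldb_lines = [
    "LDB: LOAD: AVG 6.33084 MAX 7.1477 MSGS: TOTAL 273 MAXC 16 MAXP 6 None",
    "LDB: LOAD: AVG 6.33084 MAX 6.45697 MSGS: TOTAL 273 MAXC 16 MAXP 6 Refine",
]

# Assumed layout: AVG value, MAX value, message counters, then a strategy label.
pattern = re.compile(r"LDB: LOAD: AVG (\S+) MAX (\S+) .* (\S+)$")

def parse_ldb(line):
    """Return (strategy, avg, max, max/avg) for one LDB line (hypothetical helper)."""
    match = pattern.match(line)
    if match is None:
        raise ValueError(f"unrecognized LDB line: {line!r}")
    avg = float(match.group(1))
    peak = float(match.group(2))
    return match.group(3), avg, peak, peak / avg

results = [parse_ldb(line) for line in ldb_lines]
for strategy, avg, peak, ratio in results:
    print(f"{strategy:>6}: avg={avg:.3f} max={peak:.3f} max/avg={ratio:.3f}")
```

A MAX/AVG ratio close to 1 would suggest that the load is spread fairly evenly, in which case imbalance alone may not explain the poor scaling.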

We are running on a cluster with 100MB/s Ethernet connections, and /home is shared over NFS. All output and restart files are written to the shared NFS volume.

Current performance shows:
WallClock: 118958.853257 CPUTime: 60204.160577 Memory: 25232 kB
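The gap between the two timings above can be made explicit with a short calculation. This is a sketch using only the numbers in that line; reading the gap as waiting time is an inference, not something the log states.

```python
# Values copied from the WallClock/CPUTime line in the log.
wall_clock_s = 118958.853257
cpu_time_s = 60204.160577

# Fraction of elapsed time that the reporting process spent on the CPU.
cpu_fraction = cpu_time_s / wall_clock_s

# Total elapsed time in hours, for readability.
wall_clock_hours = wall_clock_s / 3600

print(f"CPU/wall ratio: {cpu_fraction:.2f}")
print(f"Elapsed time:   {wall_clock_hours:.1f} hours")
```

A ratio near 0.5 would mean that roughly half of the elapsed time was not spent computing on that process, which may point to time spent waiting on the network or on NFS writes.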

A zip file at http://www.lehigh.edu/~vdt204/Integrin_Collagen_NAMD.zip contains the following files:
- simulation log file (viral_pull_2.5e-4.log) (best viewed in WordPad)
- condor submit file (pull.submit) (best viewed in WordPad)
- condor log file (condor.log) (best viewed in WordPad)
- .conf file (pull_2.5e-4.txt) (can be viewed in NotePad)
- .pdb, .psf, and .ref files

Is this the expected behavior for a system of this size with these parameters, or is there something we can do to speed it up?

Thanks in advance for any suggestions.

- Viral Tejani

This archive was generated by hypermail 2.1.6 : Wed Feb 29 2012 - 15:45:11 CST