Issue with Load Balancing

From: Al-Ali, Hassan (HAlAli_at_med.miami.edu)
Date: Fri Nov 05 2010 - 10:15:52 CDT

Hello everyone,

I'm running NAMD on Fedora 12 64-bit, installed in VirtualBox on a Windows 7 64-bit host (system: i7 CPU Q 820, 8 GB RAM). NAMD starts and runs fine for short runs. Load balancing kicks in every now and then, and I get messages like these:

------------------------------------------------------------------------------------------------------------------------------------------------------------

LDB: ============= START OF LOAD BALANCING ============== 10907.4

LDB: ============== END OF LOAD BALANCING =============== 10907.4

ENERGY: 124300 11028.7625 8412.6670 1448.0716 139.6135 -195827.3277 21124.1832 0.0000 0.0000 24463.0564 -129210.9736 203.3982 -153674.0300 -128705.3086 203.7087

WRITING COORDINATES TO DCD FILE AT STEP 124300

LDB: ============= START OF LOAD BALANCING ============== 10917.1

LDB: TIME 10917.1 LOAD: AVG 8.59323 MAX 9.29657 PROXIES: TOTAL 172 MAXPE 40 MAXPATCH 2 None 1.928

LDB: TIME 10917.1 LOAD: AVG 8.59323 MAX 9.00641 PROXIES: TOTAL 235 MAXPE 42 MAXPATCH 3 RefineTorusLB 1.928

LDB: ============== END OF LOAD BALANCING =============== 10917.1

------------------------------------------------------------------------------------------------------------------------------------------------------------

Often, load balancing completes and the run continues. Sometimes, however, it hangs like this:

------------------------------------------------------------------------------------------------------------------------------------------------------------

WRITING COORDINATES TO DCD FILE AT STEP 128200

LDB: ============= START OF LOAD BALANCING ============== 11259.6

LDB: ============== END OF LOAD BALANCING =============== 11259.6

ENERGY: 128300 10786.9284 8530.2603 1444.4748 139.8239 -195399.8867 20960.1339 0.0000 0.0000 24350.0058 -129188.2595 202.4582 -153538.2653 -128697.3885 202.6723

WRITING COORDINATES TO DCD FILE AT STEP 128300

LDB: ============= START OF LOAD BALANCING ============== 11269.8

LDB: TIME 11269.8 LOAD: AVG 8.12661 MAX 9.05393 PROXIES: TOTAL 172 MAXPE 40 MAXPATCH 2 None 1.96374

------------------------------------------------------------------------------------------------------------------------------------------------------------

The output then stops there and the job makes no further progress. Is there anything I can do other than killing and restarting the job every time this happens?
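
In case it helps with diagnosis, the hung state can be recognized from the log alone: the last "START OF LOAD BALANCING" line has no matching "END OF LOAD BALANCING" line after it. Below is a minimal Python sketch of that check, assuming the LDB marker format shown in the excerpts above. The function name open_load_balancing_phase is hypothetical and not part of NAMD, and the check only reports the condition; it does not fix or restart anything.

```python
import re

# Marker formats are assumed from the log excerpts in this post.
START = re.compile(r"^LDB: =+ START OF LOAD BALANCING =+ (\d+(?:\.\d+)?)")
END = re.compile(r"^LDB: =+ END OF LOAD BALANCING =+ (\d+(?:\.\d+)?)")


def open_load_balancing_phase(log_lines):
    """Return the timestamp of a START marker that has no later END marker.

    Returns None if every load-balancing phase in the log was closed.
    """
    open_time = None
    for line in log_lines:
        line = line.strip()
        match = START.match(line)
        if match:
            # A new phase began; remember when it started.
            open_time = float(match.group(1))
        elif END.match(line):
            # The phase finished normally.
            open_time = None
    return open_time
```

Run against the second excerpt above, this would report the phase that started at 11269.8 as still open. A log that ends in this state is not necessarily hung (the balancer may simply still be working), so in practice the result would need to be combined with how long the log file has gone without growing.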

Thanks in advance.

____________________
Hassan Al-Ali, Ph.D.
Postdoctoral Associate
University of Miami Miller School of Medicine
