NAMD 2.7/2.8b2 stuck - [0] processControlPoints() haveControlPointChangeCallback=0 frameworkShouldAdvancePhase=0

From: Bjoern Olausson (namdlist_at_googlemail.com)
Date: Thu May 12 2011 - 17:20:09 CDT

Hi,

With one of my simulations I ran into the following problem.
Running simulation "B" on fewer than 156 cores works fine (each try
incremented by 12 cores).
But with 156 cores the simulation hangs after minimization. Another,
bigger simulation "A" runs fine with 156 cores but stalls with 252.

I am currently using NAMD_2.8b2_Linux-x86_64-ibverbs-net-linux-x86_64-ibverbs-icc,
but the same happens with NAMD 2.7.

Simulation A is a monolayer (Vacuum | Monolayer with attached Protein
| Water | Monolayer with attached Protein | Vacuum)
Simulation B is the same but I removed the two proteins and some water
between the two monolayers.

A has 163214 Atoms
B has 79687 Atoms

I can't find a reason why it happens at a particular core count.
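
To put numbers on the core counts mentioned above, here is a rough sketch that computes the average atoms per core for each case. The atom counts are the ones given above; the helper name `atoms_per_core` is made up for illustration, and 144 cores for B is only inferred from the 12-core increments. It assumes an even split of atoms over cores, which ignores how NAMD actually decomposes the system into patches, so treat it as a rough indicator only.

```python
# Atom counts as stated in the post.
ATOMS = {"A": 163214, "B": 79687}


def atoms_per_core(n_atoms, n_cores):
    """Average atoms per core under an even split (hypothetical helper)."""
    return n_atoms / n_cores


# (simulation, cores): 156 and 252 are from the post; 144 is inferred
# from "fewer than 156" combined with steps of 12 cores.
cases = [("B", 144), ("B", 156), ("A", 156), ("A", 252)]

for name, cores in cases:
    ratio = atoms_per_core(ATOMS[name], cores)
    print(f"{name} on {cores} cores: {ratio:.1f} atoms/core")
```

If the two failing cases land at a similar atoms-per-core ratio, that might hint at a decomposition limit rather than a fixed core count, but this is only a guess.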

LINE MINIMIZER BRACKET: DX 2.26297e-05 6.07123e-05 DU -0.112343
0.803579 DUDX -9856.98 -88.7072 26529.9
LINE MINIMIZER REDUCING GRADIENT FROM 488884 TO 488.884
PRESSURE: 998 -3096.26 0.240235 -2.11389 0.240235 -3036.98 30.6163
-2.11389 30.6163 -2719.13
GPRESSURE: 998 -3053.97 0.0322738 -2.31931 1.70752 -2997.23 32.1548
1.12647 30.6867 -2682.59
ENERGY: 998 5798.1099 9606.5134 11613.1689
14.3917 -220491.3201 259.2408 0.0000
0.0000 0.0000 -193199.8954 0.0000
-193199.8954 -193199.8954 0.0000 -2950.7895
-2911.2626

PRESSURE: 999 -3101.92 0.427017 -1.88108 0.427017 -3029.82 30.4947
-1.88108 30.4947 -2731.63
GPRESSURE: 999 -3056.02 0.387877 -3.93892 3.00918 -2994.69 32.1866
0.17135 30.0678 -2692.69
ENERGY: 999 5831.4354 9616.9842 11604.8301
13.8257 -220677.3820 308.1108 0.0000
0.0000 0.0000 -193302.1958 0.0000
-193302.1958 -193302.1958 0.0000 -2954.4553
-2914.4624

PRESSURE: 1000 -3101.92 0.427017 -1.88108 0.427017 -3029.82 30.4947
-1.88108 30.4947 -2731.63
GPRESSURE: 1000 -3056.02 0.387877 -3.93892 3.00918 -2994.69 32.1866
0.171348 30.0678 -2692.69
TIMING: 1000 CPU: 24.3443, 0.0242553/step Wall: 24.388,
0.0242993/step, 0 hours remaining, 238.144531 MB of memory in use.
ETITLE: TS BOND ANGLE DIHED
IMPRP ELECT VDW BOUNDARY MISC
       KINETIC TOTAL TEMP POTENTIAL
  TOTAL3 TEMPAVG PRESSURE GPRESSURE
ENERGY: 1000 5831.4354 9616.9842 11604.8301
13.8257 -220677.3820 308.1108 0.0000
0.0000 0.0000 -193302.1958 0.0000
-193302.1958 -193302.1958 0.0000 -2954.4553
-2914.4624

WRITING EXTENDED SYSTEM TO RESTART FILE AT STEP 1000
WRITING COORDINATES TO DCD FILE AT STEP 1000
WRITING COORDINATES TO RESTART FILE AT STEP 1000
FINISHED WRITING RESTART COORDINATES
The last position output (seq=1000) takes 0.026 seconds, 238.145 MB of
memory in use
WRITING VELOCITIES TO RESTART FILE AT STEP 1000
FINISHED WRITING RESTART VELOCITIES
The last velocity output (seq=1000) takes 0.019 seconds, 238.145 MB of
memory in use
REINITIALIZING VELOCITIES AT STEP 1000 TO 303 KELVIN.
TCL: Running for 9000 steps
PRESSURE: 1000 -1607.18 5.85548 -10.9122 5.85548 -1546.56 26.3568
-10.9122 26.3568 -886.287
GPRESSURE: 1000 -1469.55 7.5989 -10.7156 10.9579 -1410.74 22.6426
-10.5674 20.7688 -1127
ETITLE: TS BOND ANGLE DIHED
IMPRP ELECT VDW BOUNDARY MISC
       KINETIC TOTAL TEMP POTENTIAL
  TOTAL3 TEMPAVG PRESSURE GPRESSURE
ENERGY: 1000 607.1667 6226.7038 11604.6460
13.8497 -203337.4899 27.6364 0.0000
0.0000 52831.6131 -132025.8742 303.3486
-184857.4873 -132057.5192 303.3486 -1346.6784
-1335.7638

After this the run hangs, and it takes some hours until this message is printed:
[0] processControlPoints() haveControlPointChangeCallback=0 frameworkShouldAdvancePhase=0
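
To check quickly how far a run got before stalling, a small sketch like the following could scan a log for the last step reported on an ENERGY: line. It relies only on the line layout visible in the output above (the step number is the second field); the function name `last_energy_step` and the sample text are illustrative, not part of NAMD.

```python
def last_energy_step(log_text):
    """Return the step of the last ENERGY: line, or None if there is none."""
    last = None
    for line in log_text.splitlines():
        fields = line.split()
        # ENERGY: lines carry the timestep as their second field.
        if len(fields) >= 2 and fields[0] == "ENERGY:" and fields[1].isdigit():
            last = int(fields[1])
    return last


# Shortened sample based on the log lines shown above.
sample = """\
ENERGY: 999 5831.4354 9616.9842 11604.8301
TCL: Running for 9000 steps
ENERGY: 1000 607.1667 6226.7038 11604.6460
"""

print(last_energy_step(sample))
```

Comparing this value between a working and a hanging core count shows whether the stall always occurs at the same step.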

Any clue where I could start looking?
If you need more information, don't hesitate to ask.

Cheers,
Bjoern
