Re: NAMD 2.7/2.8b2 stuck - [0] processControlPoints() haveControlPointChangeCallback=0 frameworkShouldAdvancePhase=0

From: Bjoern Olausson (namdlist_at_googlemail.com)
Date: Fri May 13 2011 - 13:10:46 CDT

Hi,

While tuning the "twoAway" options, the simulation that stalled on
156 cores now stalls on 264 cores instead.
With twoAwayX, twoAwayY, and twoAwayZ all set to NO, it stalls on 156 cores.
With twoAwayX set to YES and twoAwayY and twoAwayZ set to NO, it stalls
on 264 cores.
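
To put the reported stall points side by side, here is a minimal sketch that
tabulates atoms per core for each case. It uses only the atom counts and core
counts quoted in this thread; the names STALLS and atoms_per_core are made up
for illustration and are not part of NAMD.

```python
# Stall points as reported in this thread:
# (label, number of atoms, core count at which the run hangs).
STALLS = [
    ("B, twoAwayX/Y/Z = no", 79687, 156),
    ("B, twoAwayX = yes", 79687, 264),
    ("A, as first reported", 163214, 252),
]


def atoms_per_core(atoms, cores):
    """Average number of atoms handled per core (illustrative helper)."""
    return atoms / cores


for label, atoms, cores in STALLS:
    ratio = atoms_per_core(atoms, cores)
    print(f"{label:22s} {cores:4d} cores  {ratio:7.1f} atoms/core")
```

The three ratios come out quite different, so by this measure alone no single
atoms-per-core value seems to line up with the stalls.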

(This was tested with NAMD 2.7, but I guess 2.8 will behave the same way.)
The corresponding logs are available at the following link:
http://daten-transport.de/?id=7qK3HdCVnM7W (namd-logs.tar.bz2 584,5 Kilobytes)

Cheers and many thanks,
Bjoern

On Fri, May 13, 2011 at 15:20, Jim Phillips <jim_at_ks.uiuc.edu> wrote:
> Hi,
>
> Please send me the complete log file for the largest working and smallest
> hanging runs (I guess that's 144 and 156 cores).
>
> -Jim
>
>
> On Fri, 13 May 2011, Bjoern Olausson wrote:
>
>> Hi,
>>
>> With one of my simulations I ran into the following problem.
>> Running simulation "B" on fewer than 156 cores works fine (each try
>> incremented by 12 cores).
>> But with 156 cores the simulation hangs after minimization. Another,
>> bigger simulation "A" runs fine with 156 cores but stalls with 252.
>>
>> I am using NAMD_2.8b2_Linux-x86_64-ibverbs-net-linux-x86_64-ibverbs-icc
>> currently, but the same happens with NAMD 2.7:
>>
>> Simulation A is a monolayer (Vacuum | Monolayer with attached Protein
>> | Water | Monolayer with attached Protein | Vacuum)
>> Simulation B is the same but I removed the two proteins and some water
>> between the two monolayers.
>>
>> A has 163214 Atoms
>> B has   79687 Atoms
>>
>> I can't find a reason why it happens at a particular core count.
>>
>> LINE MINIMIZER BRACKET: DX 2.26297e-05 6.07123e-05 DU -0.112343
>> 0.803579 DUDX -9856.98 -88.7072 26529.9
>> LINE MINIMIZER REDUCING GRADIENT FROM 488884 TO 488.884
>> PRESSURE: 998 -3096.26 0.240235 -2.11389 0.240235 -3036.98 30.6163
>> -2.11389 30.6163 -2719.13
>> GPRESSURE: 998 -3053.97 0.0322738 -2.31931 1.70752 -2997.23 32.1548
>> 1.12647 30.6867 -2682.59
>> ENERGY:     998      5798.1099      9606.5134     11613.1689
>> 14.3917        -220491.3201       259.2408         0.0000
>> 0.0000         0.0000        -193199.8954         0.0000
>> -193199.8954   -193199.8954         0.0000          -2950.7895
>> -2911.2626
>>
>> PRESSURE: 999 -3101.92 0.427017 -1.88108 0.427017 -3029.82 30.4947
>> -1.88108 30.4947 -2731.63
>> GPRESSURE: 999 -3056.02 0.387877 -3.93892 3.00918 -2994.69 32.1866
>> 0.17135 30.0678 -2692.69
>> ENERGY:     999      5831.4354      9616.9842     11604.8301
>> 13.8257        -220677.3820       308.1108         0.0000
>> 0.0000         0.0000        -193302.1958         0.0000
>> -193302.1958   -193302.1958         0.0000          -2954.4553
>> -2914.4624
>>
>> PRESSURE: 1000 -3101.92 0.427017 -1.88108 0.427017 -3029.82 30.4947
>> -1.88108 30.4947 -2731.63
>> GPRESSURE: 1000 -3056.02 0.387877 -3.93892 3.00918 -2994.69 32.1866
>> 0.171348 30.0678 -2692.69
>> TIMING: 1000  CPU: 24.3443, 0.0242553/step  Wall: 24.388,
>> 0.0242993/step, 0 hours remaining, 238.144531 MB of memory in use.
>> ETITLE:      TS           BOND          ANGLE          DIHED
>> IMPRP               ELECT            VDW       BOUNDARY           MISC
>>      KINETIC               TOTAL           TEMP      POTENTIAL
>>  TOTAL3        TEMPAVG            PRESSURE      GPRESSURE
>> ENERGY:    1000      5831.4354      9616.9842     11604.8301
>> 13.8257        -220677.3820       308.1108         0.0000
>> 0.0000         0.0000        -193302.1958         0.0000
>> -193302.1958   -193302.1958         0.0000          -2954.4553
>> -2914.4624
>>
>> WRITING EXTENDED SYSTEM TO RESTART FILE AT STEP 1000
>> WRITING COORDINATES TO DCD FILE AT STEP 1000
>> WRITING COORDINATES TO RESTART FILE AT STEP 1000
>> FINISHED WRITING RESTART COORDINATES
>> The last position output (seq=1000) takes 0.026 seconds, 238.145 MB of
>> memory in use
>> WRITING VELOCITIES TO RESTART FILE AT STEP 1000
>> FINISHED WRITING RESTART VELOCITIES
>> The last velocity output (seq=1000) takes 0.019 seconds, 238.145 MB of
>> memory in use
>> REINITIALIZING VELOCITIES AT STEP 1000 TO 303 KELVIN.
>> TCL: Running for 9000 steps
>> PRESSURE: 1000 -1607.18 5.85548 -10.9122 5.85548 -1546.56 26.3568
>> -10.9122 26.3568 -886.287
>> GPRESSURE: 1000 -1469.55 7.5989 -10.7156 10.9579 -1410.74 22.6426
>> -10.5674 20.7688 -1127
>> ETITLE:      TS           BOND          ANGLE          DIHED
>> IMPRP               ELECT            VDW       BOUNDARY           MISC
>>      KINETIC               TOTAL           TEMP      POTENTIAL
>>  TOTAL3        TEMPAVG            PRESSURE      GPRESSURE
>> ENERGY:    1000       607.1667      6226.7038     11604.6460
>> 13.8497        -203337.4899        27.6364         0.0000
>> 0.0000     52831.6131        -132025.8742       303.3486
>> -184857.4873   -132057.5192       303.3486          -1346.6784
>> -1335.7638
>>
>>
>>
>> It takes some hours until this message is printed:
>> [0] processControlPoints() haveControlPointChangeCallback=0
>> frameworkShouldAdvancePhase=0
>>
>> Any clue where I could search?
>> If you need more information, don't hesitate to ask.
>>
>> Cheers,
>> Bjoern
>>
>
