From: Bjoern Olausson (namdlist_at_googlemail.com)
Date: Fri May 13 2011 - 13:10:46 CDT
Hi,
While tuning the "twoAway" options, the simulation that stalled on
156 cores now stalls on 264 cores instead.
With twoAwayX, twoAwayY, and twoAwayZ all set to NO, it stalls on 156 cores.
With twoAwayX set to YES and twoAwayY and twoAwayZ set to NO, it stalls
on 264 cores.
(This was tested with NAMD 2.7, but I guess 2.8 will behave the same way.)
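
To put the stalling core counts side by side, here is a minimal Python sketch that computes atoms per core, using the atom counts from the original message quoted below (A: 163214, B: 79687). The dictionary layout and the function name atoms_per_core are only illustrative, and the numbers do not by themselves explain the hang; they just make the comparison explicit.

```python
# Atom counts and stalling core counts as reported in this thread.
# "B" stalled at 156 cores with all twoAway options off and at 264 cores
# with twoAwayX on; "A" stalled at 252 cores.
systems = {
    "A": {"atoms": 163214, "stall_cores": [252]},
    "B": {"atoms": 79687, "stall_cores": [156, 264]},
}


def atoms_per_core(atoms, cores):
    """Average number of atoms per core (plain division, nothing NAMD-specific)."""
    return atoms / cores


for name, info in systems.items():
    for cores in info["stall_cores"]:
        ratio = atoms_per_core(info["atoms"], cores)
        print(f"Simulation {name}: {cores} cores -> {ratio:.1f} atoms/core")
```

The same function can be applied to the last working core count of each run (for B, 144 cores) to see how close the working and hanging cases are.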
Please find the corresponding logs at the following link:
http://daten-transport.de/?id=7qK3HdCVnM7W (namd-logs.tar.bz2 584,5 Kilobytes)
Cheers and many thanks,
Bjoern
On Fri, May 13, 2011 at 15:20, Jim Phillips <jim_at_ks.uiuc.edu> wrote:
> Hi,
>
> Please send me the complete log file for the largest working and smallest
> hanging runs (I guess that's 144 and 156 cores).
>
> -Jim
>
>
> On Fri, 13 May 2011, Bjoern Olausson wrote:
>
>> Hi,
>>
>> With one of my simulations I ran into the following problem.
>> Running simulation "B" on fewer than 156 cores works fine (each try
>> incremented by 12 cores).
>> But with 156 cores the simulation hangs after minimization. Another,
>> bigger simulation "A" runs fine with 156 cores but stalls with 252.
>>
>> I am using NAMD_2.8b2_Linux-x86_64-ibverbs-net-linux-x86_64-ibverbs-icc
>> currently, but the same happens with NAMD 2.7:
>>
>> Simulation A is a monolayer (Vacuum | Monolayer with attached Protein
>> | Water | Monolayer with attached Protein | Vacuum)
>> Simulation B is the same but I removed the two proteins and some water
>> between the two monolayers.
>>
>> A has 163214 Atoms
>> B has 79687 Atoms
>>
>> I can't find a reason why it happens at a certain core count.
>>
>> LINE MINIMIZER BRACKET: DX 2.26297e-05 6.07123e-05 DU -0.112343
>> 0.803579 DUDX -9856.98 -88.7072 26529.9
>> LINE MINIMIZER REDUCING GRADIENT FROM 488884 TO 488.884
>> PRESSURE: 998 -3096.26 0.240235 -2.11389 0.240235 -3036.98 30.6163
>> -2.11389 30.6163 -2719.13
>> GPRESSURE: 998 -3053.97 0.0322738 -2.31931 1.70752 -2997.23 32.1548
>> 1.12647 30.6867 -2682.59
>> ENERGY: 998 5798.1099 9606.5134 11613.1689
>> 14.3917 -220491.3201 259.2408 0.0000
>> 0.0000 0.0000 -193199.8954 0.0000
>> -193199.8954 -193199.8954 0.0000 -2950.7895
>> -2911.2626
>>
>> PRESSURE: 999 -3101.92 0.427017 -1.88108 0.427017 -3029.82 30.4947
>> -1.88108 30.4947 -2731.63
>> GPRESSURE: 999 -3056.02 0.387877 -3.93892 3.00918 -2994.69 32.1866
>> 0.17135 30.0678 -2692.69
>> ENERGY: 999 5831.4354 9616.9842 11604.8301
>> 13.8257 -220677.3820 308.1108 0.0000
>> 0.0000 0.0000 -193302.1958 0.0000
>> -193302.1958 -193302.1958 0.0000 -2954.4553
>> -2914.4624
>>
>> PRESSURE: 1000 -3101.92 0.427017 -1.88108 0.427017 -3029.82 30.4947
>> -1.88108 30.4947 -2731.63
>> GPRESSURE: 1000 -3056.02 0.387877 -3.93892 3.00918 -2994.69 32.1866
>> 0.171348 30.0678 -2692.69
>> TIMING: 1000 CPU: 24.3443, 0.0242553/step Wall: 24.388,
>> 0.0242993/step, 0 hours remaining, 238.144531 MB of memory in use.
>> ETITLE: TS BOND ANGLE DIHED
>> IMPRP ELECT VDW BOUNDARY MISC
>> KINETIC TOTAL TEMP POTENTIAL
>> TOTAL3 TEMPAVG PRESSURE GPRESSURE
>> ENERGY: 1000 5831.4354 9616.9842 11604.8301
>> 13.8257 -220677.3820 308.1108 0.0000
>> 0.0000 0.0000 -193302.1958 0.0000
>> -193302.1958 -193302.1958 0.0000 -2954.4553
>> -2914.4624
>>
>> WRITING EXTENDED SYSTEM TO RESTART FILE AT STEP 1000
>> WRITING COORDINATES TO DCD FILE AT STEP 1000
>> WRITING COORDINATES TO RESTART FILE AT STEP 1000
>> FINISHED WRITING RESTART COORDINATES
>> The last position output (seq=1000) takes 0.026 seconds, 238.145 MB of
>> memory in use
>> WRITING VELOCITIES TO RESTART FILE AT STEP 1000
>> FINISHED WRITING RESTART VELOCITIES
>> The last velocity output (seq=1000) takes 0.019 seconds, 238.145 MB of
>> memory in use
>> REINITIALIZING VELOCITIES AT STEP 1000 TO 303 KELVIN.
>> TCL: Running for 9000 steps
>> PRESSURE: 1000 -1607.18 5.85548 -10.9122 5.85548 -1546.56 26.3568
>> -10.9122 26.3568 -886.287
>> GPRESSURE: 1000 -1469.55 7.5989 -10.7156 10.9579 -1410.74 22.6426
>> -10.5674 20.7688 -1127
>> ETITLE: TS BOND ANGLE DIHED
>> IMPRP ELECT VDW BOUNDARY MISC
>> KINETIC TOTAL TEMP POTENTIAL
>> TOTAL3 TEMPAVG PRESSURE GPRESSURE
>> ENERGY: 1000 607.1667 6226.7038 11604.6460
>> 13.8497 -203337.4899 27.6364 0.0000
>> 0.0000 52831.6131 -132025.8742 303.3486
>> -184857.4873 -132057.5192 303.3486 -1346.6784
>> -1335.7638
>>
>>
>>
>> It takes some hours until this message is printed:
>> [0] processControlPoints() haveControlPointChangeCallback=0
>> frameworkShouldAdvancePhase=0
>>
>> Any clue where I should look?
>> If you need more information, don't hesitate to ask.
>>
>> Cheers,
>> Bjoern
>>
>
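
The symptom in the quoted log (dynamics is started with "TCL: Running for 9000 steps", but no ENERGY line beyond step 1000 ever appears) can be checked across many log files with a small script. The following is only a sketch in Python: the function names and the abbreviated sample text are made up for illustration, and it assumes each ENERGY record starts a line with "ENERGY:" followed by the step number, as in the output above.

```python
import re

# An ENERGY record begins with "ENERGY:" followed by the timestep.
ENERGY_RE = re.compile(r"^ENERGY:\s+(\d+)")


def last_energy_step(log_text):
    """Return the timestep of the last ENERGY: line, or None if there is none."""
    last = None
    for line in log_text.splitlines():
        match = ENERGY_RE.match(line)
        if match:
            last = int(match.group(1))
    return last


def started_dynamics(log_text):
    """True if the log contains the 'TCL: Running for' message."""
    return "TCL: Running for" in log_text


# Abbreviated sample modeled on the log above (energy columns truncated).
sample = """ENERGY: 999 5831.4354
ENERGY: 1000 5831.4354
TCL: Running for 9000 steps
ENERGY: 1000 607.1667
"""

print(last_energy_step(sample), started_dynamics(sample))
```

A log in which started_dynamics is true but last_energy_step still returns the final minimization step would match the hang described here. A run that is simply slow looks the same for a while, so the check is only a first filter.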