Re: 2.9b1 crashes

From: Chris Harrison (charris5_at_gmail.com)
Date: Tue Mar 27 2012 - 20:33:25 CDT

Dear Francesco,

Can you send me files to try and replicate this run/crash?

Best,
Chris

On Tue, Mar 27, 2012 at 12:25 PM, Francesco Pietra <chiendarret_at_gmail.com> wrote:

> Chris:
> binary 2.9b1, shared mem. Copy of the first and last parts of one of the
> interrupted simulations:
>
> Info: Based on Charm++/Converse 60400 for multicore-linux64-iccstatic
> Info: Built Fri Mar 23 02:24:33 CDT 2012 by jim on lisboa.ks.uiuc.edu
> Info: 1 NAMD CVS-2012-03-23 Linux-x86_64-multicore-CUDA 6
> Info: Running on 6 processors, 1 nodes, 1 physical nodes.
> Info: CPU topology information available.
> Info: Charm++/Converse parallel runtime startup completed at 0.0161591 s
> Pe 1 physical rank 1 will use CUDA device of pe 2
> Pe 5 physical rank 5 will use CUDA device of pe 4
> Pe 3 physical rank 3 will use CUDA device of pe 4
> Pe 2 physical rank 2 binding to CUDA device 0 on gig64: 'GeForce GTX
> 580' Mem: 1535MB Rev: 2.0
> Pe 4 physical rank 4 binding to CUDA device 1 on gig64: 'GeForce GTX
> 580' Mem: 1535MB Rev: 2.0
> Did not find +devices i,j,k,... argument, using all
> Pe 0 physical rank 0 will use CUDA device of pe 2
> Info: 8.08203 MB of memory in use based on /proc/self/stat
> .........................................
>
> 346; IT CHANGED BY 0.021601997104863102
> TCL: RAMD: 307580 >>> THE DISTANCE TRAVELLED BY THE LIGAND IS:
> 0.024398553207142418 (> 0.002)
> TCL: RAMD: 307580 >>> CONTINUE WITH 10 STEPS OF RAMD SIMULATION
> TCL: RAMD: 307580 >>> KEEP PREVIOUS ACCELERATION DIRECTION:
> -0.014608637449769847 0.6976478739797756 -0.7162918620530045; ||r|| =
> 1.0
> TCL: RAMD FORCE: 307590 > LIGAND COM is: 28.823670817865874
> 50.24453136853554 21.420611595557947
> TCL: RAMD FORCE: 307590 > PROTEIN COM IS 44.10530837455286
> 41.03634602179381 45.33359806029746
> TCL: RAMD FORCE: 307590 > EXTERNAL FORCE VECTOR (F):
> -0.1725104793964081 8.2383842849103 -8.458547413157712; ||F|| =
> 11.808800101280214
> TCL: RAMD FORCE: 307590 > EXTERNAL FORCE DIRECTION (r):
> -0.014608637449769847 0.6976478739797756 -0.7162918620530045; ||r|| =
> 1.0
> TCL: RAMD FORCE: 307590 > TOTAL FORCE ON THE LIGAND COM IS:
> -14.446372659790779 14.467027540234124 1.6080987533820146
> (278.76769196328866)
> TCL: RAMD: 307590 ***** EVALUATE 10 RAMD STEPS AT TIMESTEP 307590 *****
> TCL: RAMD: 307590 >>> DISTANCE LIGAND COM - PPROTEIN COM IS:
> 29.83538244193814; IT CHANGED BY 0.022416677617794534
> TCL: RAMD: 307590 >>> THE DISTANCE TRAVELLED BY THE LIGAND IS:
> 0.023730819999103987 (> 0.002)
> TCL: RAMD: 307590 >>> CONTINUE WITH 10 STEPS OF RAMD SIMULATION
> TCL: RAMD: 307590 >>> KEEP PREVIOUS ACCELERATION DIRECTION:
> -0.014608637449769847 0.6976478739797756 -0.7162918620530045; ||r|| =
> 1.0
> TCL: RAMD FORCE: 307600 > LIGAND COM is: 28.788177891078085
> 50.254477369073776 21.402313808257652
> TCL: RAMD FORCE: 307600 > PROTEIN COM IS 44.106214050700984
> 41.0367525808034 45.334669716623246
> TCL: RAMD FORCE: 307600 > EXTERNAL FORCE VECTOR (F):
> -0.1725104793964081 8.2383842849103 -8.458547413157712; ||F|| =
> 11.808800101280214
> TCL: RAMD FORCE: 307600 > EXTERNAL FORCE DIRECTION (r):
> -0.014608637449769847 0.6976478739797756 -0.7162918620530045; ||r|| =
> 1.0
> TCL: RAMD FORCE: 307600 > TOTAL FORCE ON THE LIGAND COM IS:
> -20.690664226093556 0.5551696291805648 -15.11794172186723
> (290.5796750467721)
> TCL: RAMD: 307600 ***** EVALUATE 10 RAMD STEPS AT TIMESTEP 307600 *****
> TCL: RAMD: 307600 >>> DISTANCE LIGAND COM - PPROTEIN COM IS:
> 29.872501424964234; IT CHANGED BY 0.037118983026093844
> TCL: RAMD: 307600 >>> THE DISTANCE TRAVELLED BY THE LIGAND IS:
> 0.042320616568623064 (> 0.002)
> TCL: RAMD: 307600 >>> CONTINUE WITH 10 STEPS OF RAMD SIMULATION
> TCL: RAMD: 307600 >>> KEEP PREVIOUS ACCELERATION DIRECTION:
> -0.014608637449769847 0.6976478739797756 -0.7162918620530045; ||r|| =
> 1.0
> ENERGY: 307600 31023.2158 3000.3836 3812.8794
> 0.0000 -223730.4895 31379.1808 0.0000
> 0.0000 46519.6766 -107995.1533 300.5182
> -154514.8299 -106988.7545 300.3515 110.4391
> 156.7135 506633.1739 -50.1573 -54.5505
>
> WRITING EXTENDED SYSTEM TO RESTART FILE AT STEP 307600
> WRITING COORDINATES TO RESTART FILE AT STEP 307600
> FINISHED WRITING RESTART COORDINATES
> WRITING VELOCITIES TO RESTART FILE AT STEP 307600
> FINISHED WRITING RESTART VELOCITIES
>
> Should you find anomalies in the log, I could provide a successful log
> for the same files.
>
> regards
> francesco
>
> On Tue, Mar 27, 2012 at 5:15 PM, Chris Harrison <charris5_at_gmail.com>
> wrote:
> > Francesco,
> >
> > We're going to need more info. Since RAMD is predominately a Tcl-script
> > based implementation at present, your particular input, any altered scripts,
> > and underlying parallel infrastructure (charm++ version, mpi vs non-mpi,
> > network, etc) could be involved as a potential problem.
> >
> > More detail on the output, or a copy of the last 20 lines of the log file
> > might begin to help.
> >
> > Best,
> > Chris
> >
> >
> > On Tue, Mar 27, 2012 at 9:56 AM, Francesco Pietra <chiendarret_at_gmail.com>
> > wrote:
> >>
> >> Hi:
> >> I deleted my previous mail, where I posted that the binary Linux 64
> >> CUDA nightly build 2012-03-23 had solved problems of crashing in
> >> min/RAMD, which I encountered with the standard 2.9b1. Actually, in my
> >> hands, there are still problems, at least on running RAMD for longer
> >> than 300,000 steps. Repeatedly, the simulation halted, going to the
> >> linux prompt, without any error or warning message. Something that I
> >> had never seen before.
> >>
> >> Back to 2.8, I have carried out two RAMD simulations - with the same
> >> files - for longer than 600,000 steps. Each one terminated normally.
> >>
> >> System: Debian amd64, two GTX 580 GPUs. Force field: parm7.
> >>
> >> francesco pietra
> >>
> >
> >
> >
> > --
> > Chris Harrison, Ph.D.
> > NIH Center for Macromolecular Modeling and Bioinformatics
> > Theoretical and Computational Biophysics Group
> > Beckman Institute for Advanced Science and Technology
> > University of Illinois, 405 N. Mathews Ave., Urbana, IL 61801
> >
> > http://www.ks.uiuc.edu/Research/namd Voice: 773-570-6078
> > http://www.ks.uiuc.edu/~char Fax: 217-244-6078
> >
> >
> >
>
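
A quick consistency check on the quoted log may help when looking for anomalies. The sketch below is only an illustration, not part of the RAMD scripts: it takes the numbers printed at timestep 307600 and checks that the logged ligand-protein COM distance matches the two COM positions, and that the external force vector equals its logged magnitude times the logged direction. The helper names (`norm`, `com_distance`) and the tolerances are assumptions made for this example.

```python
import math

# Values copied from the quoted log at timestep 307600.
ligand_com = (28.788177891078085, 50.254477369073776, 21.402313808257652)
protein_com = (44.106214050700984, 41.0367525808034, 45.334669716623246)
logged_distance = 29.872501424964234

force_vector = (-0.1725104793964081, 8.2383842849103, -8.458547413157712)
force_direction = (-0.014608637449769847, 0.6976478739797756, -0.7162918620530045)
logged_force_norm = 11.808800101280214


def norm(v):
    """Euclidean length of a 3-vector."""
    return math.sqrt(sum(c * c for c in v))


def com_distance(a, b):
    """Distance between two centre-of-mass positions."""
    return norm(tuple(x - y for x, y in zip(a, b)))


# Distance recomputed from the two COM positions.
distance = com_distance(ligand_com, protein_com)

# Force vector rebuilt as magnitude times unit direction.
rebuilt_force = tuple(logged_force_norm * c for c in force_direction)
max_force_error = max(abs(f - g) for f, g in zip(force_vector, rebuilt_force))

print("distance difference:", abs(distance - logged_distance))
print("largest force component difference:", max_force_error)
```

If both differences are tiny, the last RAMD block written before the halt is at least internally consistent, which would point away from corrupted coordinates or forces in the Tcl layer.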

-- 
Chris Harrison, Ph.D.
NIH Center for Macromolecular Modeling and Bioinformatics
Theoretical and Computational Biophysics Group
Beckman Institute for Advanced Science and Technology
University of Illinois, 405 N. Mathews Ave., Urbana, IL 61801
http://www.ks.uiuc.edu/Research/namd       Voice: 773-570-6078
http://www.ks.uiuc.edu/~char                          Fax: 217-244-6078
