From: Brian Bennion (brian_at_youkai.llnl.gov)
Date: Wed Nov 03 2004 - 13:05:56 CST
Hello Charles,
A little background...
NAMD requires charm++ to compile correctly, so the natural order is that
charm++ is compiled first and then NAMD is compiled against it.  The fact
that NAMD runs at all on your system suggests that charm++ has been
compiled at some point.
I am not familiar with the Sun SPARC setup, but charmrun may be used here
to propagate the job through the nodes.
Can anyone comment here?
Steps that I can recommend....
Try minimizing just the protein alone in vacuo for 200+ steps.
Are you sure that the total system is 58,236 atoms?  That seems small for
such a complex box.
Can you send the whole log file from startup to crash?
There might simply not be enough memory, but I would expect that to
show up earlier in the run.
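Two of the checks above are easy to script. The following is only a rough
sketch in Python, not anything taken from NAMD itself: the function names are
made up for illustration, and it assumes the coordinates are in an ordinary
PDB file with one ATOM or HETATM record per atom. The byte count is the one
reported in the malloc() message quoted below.

```python
def count_pdb_atoms(lines):
    """Count ATOM/HETATM records in an iterable of PDB-format lines."""
    return sum(1 for line in lines if line.startswith(("ATOM", "HETATM")))


def bytes_to_gib(n_bytes):
    """Convert a byte count to GiB (1 GiB = 2**30 bytes)."""
    return n_bytes / 2**30


# Size of the failed request, copied from the malloc() error in the log.
failed_request = 2118274080
print(f"failed malloc request: {bytes_to_gib(failed_request):.2f} GiB")

# To check the atom count, pass an open file to count_pdb_atoms, e.g.
#   with open("system.pdb") as handle:   # "system.pdb" is a placeholder name
#       print(count_pdb_atoms(handle))
# and compare the result with the expected 58,236 atoms.
```

The single failed request is just under 2 GiB, which for a system of 58,236
atoms is tens of kilobytes per atom in one allocation.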
Thanks
Brian
On Wed, 3 Nov 2004, Charles Danko wrote:
> Hi,
>
> Thanks to Brian and Dr. Valencia for their help.
>
> The machines are a cluster of Sun SPARC 64 bit processors running
> Solaris 7.  I am using bsub for multithreading.  My administrator may
> let me run charm++ if you think that it may solve the problem, but
> there may be some good reason that it wasn't used before (NAMD was
> compiled by a colleague of mine, and I am not sure of specific issues
> he faced when putting it together).
>
> The system is a protein, lipid, and water system, in total, 58,236
> atoms constructed from a protein homology model.  The system was
> assembled using VMD, the membrane 1.0 plug-in, and solvate 1.2 (to
> solvate the top and bottom where the protein was sticking out of the
> pre-equilibrated lipid-water system constructed by membrane).  I
> deleted all atoms within 1A of the protein and am now trying to
> minimize the system.
>
> Based on Dr. Valencia and Dr. Bennion's suggestions I changed the
> script file.  I adapted the one intended to heat the system after the
> minimization.  I have included the new script file as an attachment.
> The run still crashes after 199 steps, but this time it returns a
> malloc error.  Short by 2GB?
> The last part of the output is pasted below.  Many of the forces are
> positive again.
>
> I have tried to fix the protein and minimize the water/lipids; the
> output is pasted below.  The system lasted for 299 steps this time,
> but received the same malloc error.
>
> I have NOT deleted the atoms which fall outside of my periodic
> boundary.  If you recommend I will do this and try to run the new
> script again.  I am acting under the assumption that these atoms will
> be ignored.
> Is this correct?
>
> Because the problem seems to be a memory allocation error, I am
> thinking that the next step will be to try to convince my
> administrator to compile charm++.
> Any thoughts or suggestions?
> Do I need to recompile all of NAMD, or can I just compile charm++ without it?
>
> Thanks again for all of the help,
> Charles
>
> Output files:
>
> New script, no atoms fixed.
>
> BRACKET: 6.57916e-07 652.946 -2.45009e+09 -8.45313e+07 9.29531e+08
> ENERGY:     198    522579.9239    151303.8494     10858.5910      1446.8211
>    -80557.5403    481695.6698         0.0000         0.0000         0.0000
>   1087327.3149         0.0000   1087327.3149   1087327.3149         0.0000
>    188642.4062    235104.9289    576000.0000    188642.4062    235104.9289
>
> BRACKET: 1.6835e-07 70.1294 -8.45313e+07 1.17645e+07 9.29531e+08
> ENERGY:     199    522585.2964    151303.6975     10858.5915      1446.8152
>    -80557.4059    481690.3089         0.0000         0.0000         0.0000
>   1087327.3036         0.0000   1087327.3036   1087327.3036         0.0000
>    188639.2161    235101.2267    576000.0000    188639.2161    235101.2267
>
> LDB:  LOAD: AVG 231.478 MAX 291.895  MSGS: TOTAL 184 MAXC 20 MAXP 5  None
> LDB:  LOAD: AVG 231.478 MAX 255.756  MSGS: TOTAL 184 MAXC 20 MAXP 5  Alg7
> LDB:  LOAD: AVG 231.478 MAX 236.106  MSGS: TOTAL 184 MAXC 20 MAXP 5  Alg7
> Could not malloc() 2118274080 bytes--are we out of memory?Fatal error, aborting.
> Rtasks fail:
> Rtask(s) 1 : exited with signal <6>
> Rtask(s) 3 2 4 5 8 6 7 10 9 : exited with signal <15>
> Rtask(s) 1  : coredump
> >
>
> New Script, Fixed Protein
>
> BRACKET: 1.64649e-05 26875.6 -8.15248e+09 -2.11699e+09 7.56124e+09
> ENERGY:     298    246811.3244    127002.1868      7801.5138       776.0334
>   -110553.6799    343151.2350         0.0000         0.0000         0.0000
>    614988.6135         0.0000    614988.6135    614988.6135         0.0000
>    156657.9925    177933.1586    576000.0000    156657.9925    177933.1586
>
> BRACKET: 8.23246e-06 12090.3 -2.11699e+09 -9.70546e+08 7.56124e+09
> ENERGY:     299    245766.2529    126976.5313      7802.2252       775.5543
>   -110592.3262    343704.2276         0.0000         0.0000         0.0000
>    614432.4651         0.0000    614432.4651    614432.4651         0.0000
>    156870.7170    178776.2517    576000.0000    156870.7170    178776.2517
>
> LDB:  LOAD: AVG 212.831 MAX 217.851  MSGS: TOTAL 184 MAXC 20 MAXP 5  None
> LDB:  LOAD: AVG 212.831 MAX 216.577  MSGS: TOTAL 184 MAXC 20 MAXP 5  Refine
> Could not malloc()--are we out of memory?Fatal error, aborting.
> Rtasks fail:
> Rtask(s) 1 : exited with signal <6>
> Rtask(s) 3 2 4 5 6 8 7 9 10 : exited with signal <15>
> Rtask(s) 1  : coredump
> >
>
>
>
>
> On Tue, 02 Nov 2004 13:22:50 -0600 (CST), J. Valencia
> <jonathan_at_ibt.unam.mx> wrote:
> >   Also, for par_all27_prot_lipid.prm the suggested cutoff scheme is:
> > switchdist      10.0
> > cutoff          12.0
> > pairlistdist    14.0
> > This is stated almost at the end of the file.
> >
> > Good luck!
> >
> > J. Valencia.
> >
>
*****************************************************************
**Brian Bennion, Ph.D.                                         **
**Computational and Systems Biology Division                   **
**Biology and Biotechnology Research Program                   **
**Lawrence Livermore National Laboratory                       **
**P.O. Box 808, L-448    bennion1_at_llnl.gov                     **
**7000 East Avenue       phone: (925) 422-5722                 **
**Livermore, CA  94550   fax:   (925) 424-6605                 **
*****************************************************************