Re: NAMD 2.6 - PME Error on Large Protein-DNA System!

From: Peter Freddolino (petefred_at_ks.uiuc.edu)
Date: Thu Feb 28 2008 - 14:27:20 CST

Hi Sean,
Thanks for the extra information. I'm not seeing any of the usual red
flags you'd expect; it sounds like your system is well equilibrated.

Sean Law wrote:
>
> I'm not sure if this answers your questions but from the results
> obtained from my previous tests, the boxsize/cell size appears to have
> equilibrated after the first 20 ps and the run does restart and is
> capable of going for a while (approximately 95 ps) before hitting the
> same error. Finally, is there any correlation between using the
> "+netpoll" option and having a successful run?
In principle there shouldn't be. The netpoll option affects how
interprocessor communication is managed, and thus should only be
expected to affect speed, not simulation stability. Do you see a
consistent connection between using netpoll and having your simulation
go longer before crashing? If so, that makes me suspect some sort
of underlying instability in your hardware/software setup (honestly,
that's what I'm starting to suspect anyway, given that there doesn't
appear to be any particular reason for the error in this case). Have you
ever tried a sustained NAMD run on whatever cluster you're using, or do
you know anyone who has?
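
One way to put numbers on the cell-size check discussed in this thread
is to read the extended system trajectory (.xst) file and report how far
each cell vector length drifts from a reference frame. The following is
only a sketch: it assumes the usual XST column order (step, then a_x a_y
a_z b_x b_y b_z c_x c_y c_z), and the function names and the command-line
usage are illustrative, not part of NAMD.

```python
import math
import sys


def read_cell_lengths(lines):
    """Return a list of (step, |a|, |b|, |c|) parsed from XST-style lines.

    Assumed column order: step a_x a_y a_z b_x b_y b_z c_x c_y c_z ...
    Comment lines (starting with '#') and short lines are skipped.
    """
    rows = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) < 10:
            continue
        step = int(fields[0])
        v = [float(x) for x in fields[1:10]]
        lengths = [math.sqrt(sum(x * x for x in v[i:i + 3])) for i in (0, 3, 6)]
        rows.append((step, lengths[0], lengths[1], lengths[2]))
    return rows


def max_relative_change(rows, skip=0):
    """Largest fractional change of |a|, |b|, |c| relative to frame `skip`.

    `skip` lets you ignore the initial equilibration frames.
    """
    kept = rows[skip:]
    ref = kept[0][1:]
    return tuple(
        max(abs(r[i + 1] - ref[i]) / ref[i] for r in kept) for i in range(3)
    )


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python cell_drift.py output.xst
    with open(sys.argv[1]) as fh:
        rows = read_cell_lengths(fh)
    da, db, dc = max_relative_change(rows)
    print("frames: %d" % len(rows))
    print("max fractional change  a: %.4f  b: %.4f  c: %.4f" % (da, db, dc))
```

Passing a nonzero `skip` measures drift relative to a frame after the
initial equilibration (for example, after the first 20 ps) rather than
relative to the starting cell.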

Best,
Peter
 

> Thank you for your time.
>
> Sean
> Michigan State University
>
>
> > Date: Wed, 27 Feb 2008 23:59:41 -0600
> > From: petefred_at_ks.uiuc.edu
> > To: magicmen_at_hotmail.com
> > CC: namd-l_at_ks.uiuc.edu; slaw_at_msu.edu; feig_at_msu.edu
> > Subject: Re: namd-l: NAMD 2.6 - PME Error on Large Protein-DNA System!
> >
> > Hi Sean,
> >
> > >
> > >
> > > I did some digging and tried the following (while supplying the SAME
> > > random seed to keep things consistent):
> > Please note that if you're doing Langevin dynamics in parallel, this
> > still doesn't guarantee identical results because of non-determinism in
> > communication ordering and the way the NAMD RNG works. There have been a
> > few threads in the past 6 months on namd-l on this subject, if you're
> > interested.
> > >
> > >
> > > I would really appreciate if anybody could help explain this behaviour
> > > and welcome any comments, questions, or feedback that would help solve
> > > this problem!
> > >
> > Given the troubleshooting you've already tried (and thank you for noting
> > what you've done!), the next likely culprit is changes in your periodic
> > cell size that are too large. I'd recommend checking two things:
> > - See how much your periodic cell size changes over the course of the run.
> > - If you restart your simulation from the restart files saved before the
> >   crash, does it run fine or crash again?
> >
> > This should (hopefully) help to find whatever is going on here.
> >
> > Best,
> > Peter
> >
