Re: Running multiple-replicas metadynamics

From: Giacomo Fiorin (giacomo.fiorin_at_gmail.com)
Date: Wed Nov 08 2017 - 15:49:54 CST

Given the messages, I would try to let each replica finish writing its
output files. A "run 0" on each replica should be sufficient. To minimize
the risk of further errors, set the number of steps so that each replica
finishes its run before the walltime allotted by the queue system.
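
For reference, the tail end of each replica's NAMD script could look like the
sketch below (the file names and step count are placeholders, not taken from
your inputs):

  colvars        on
  colvarsConfig  multi_rep3.colvars.in   ;# placeholder config file name
  outputName     multi_rep3              ;# placeholder output prefix

  run 950000    ;# a step count known to finish within the queue walltime
  run 0         ;# zero-step run: lets Colvars finalize and write its output files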

If you use Joshua's suggestion of launching a bundled job, this should be
taken care of automatically.

Giacomo

On Wed, Nov 8, 2017 at 4:37 PM, Prapasiri Pongprayoon <fsciprpo_at_ku.ac.th>
wrote:

> Hi Giacomo,
>
> Thanks so much for your reply.
> I saw the msg below:
>
> colvars: Metadynamics bias "meta_3": reading the state of replica "/scratch/g15/pp8244/metadynamics/3/well-tempered/new/rep3/multi_rep3.restart.colvars.state" from file "".
> colvars: Reading from file "" failed or incomplete: will try again in 1000 steps.
> colvars: WARNING: in metadynamics bias "meta_3" failed to read completely the output of replica "/scratch/g15/pp8244/metadynamics/3/well-tempered/new/rep3/multi_rep3.restart.colvars.state" after more than 704000 steps. Ensure that it is still running.
> colvars: WARNING: in metadynamics bias "meta_3" failed to read completely the output of replica "/scratch/g15/pp8244/metadynamics/3/well-tempered/new/rep3/.colvars.meta_3.hills" after more than 525772144270990136 steps. Ensure that it is still running.
> colvars: Metadynamics bias "meta_3": reading the state of replica "5" from file "/scratch/g15/pp8244/metadynamics/3/well-tempered/new/rep5/.colvars.meta_3.5.state".
> colvars: Error: failed to read all of the grid points from file. Possible explanations: grid parameters in the configuration (lowerBoundary, upperBoundary, width) are different from those in the file, or the file is corrupt/incomplete.
> colvars: No such file or directory
> colvars: If this error message is unclear, try recompiling with -DCOLVARS_DEBUG.
> FATAL ERROR: Error in the collective variables module: No such file or directory
> [0] Stack Traceback:
> [0:0] _Z8NAMD_errPKc+0xe4 [0x20239d44]
> [0:1] _ZN16colvarproxy_namd5errorERKSs+0x1aa [0x207c429a]
> [0:2] _ZN18colvar_grid_scalar12read_restartERSi+0x299 [0x20783629]
>
> Based on the output, do you have any idea why the program reads the state and
> hills files from both "multi_rep3.restart.colvars.state" and
> ".colvars.meta_3.5.state/.colvars.meta_3.5.hills.traj/.colvars.meta_3.hills"?
> The first one is the output name set by me, but the latter are generated
> automatically by the program. Also, is there any reason why the program has to
> generate another set of files?
> I have checked the file "multi_rep3.restart.colvars.state", but nothing is
> written in there. The numbers of steps shown in the .colvars.meta_3.* files
> are also odd.
>
> I used NAMD 2.12.
>
> Any help would be much appreciated.
>
> Thanks for the link. I will have a look.
>
> Regards,
> Prapasiri
>
> On Nov 8, 2560 BE, at 6:58 PM, Giacomo Fiorin <giacomo.fiorin_at_gmail.com>
> wrote:
>
> Hi Prapasiri, did the jobs exit with an error before the set walltime or
> is that the last error message you see? You should also see the message:
> Metadynamics bias "XX": failed to read the file "YY": will try again after
> "ZZ" steps.
> If you don't see any occurrences of that error, please let me know the
> NAMD version that you are using.
>
For as long as the communication remains file-based, you shouldn't need to
run the replicas at exactly the same time. The main guideline is not to
keep them out of sync for too long, otherwise replicas that have been idle
see too much biasing energy appear at once.
>
Regarding the boundary issue, check whether PBC wrapping may be a problem.
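
For context, below is a minimal sketch of what a multiple-walker metadynamics
setup looks like in the Colvars configuration; all names, values, and paths in
it are hypothetical, not taken from the files above. The grid parameters
(lowerBoundary, upperBoundary, width) must be identical in every replica and in
any restart file, otherwise the grid read error quoted above appears, and
replicaUpdateFrequency controls how often the file-based exchange between
replicas is attempted:

  colvar {
      name d1
      # grid parameters: must match across all replicas and restart files
      lowerBoundary   2.0
      upperBoundary  20.0
      width           0.1
      distance {
          group1 { atomNumbers 1 }
          group2 { atomNumbers 100 }
      }
  }

  metadynamics {
      name               meta_3
      colvars            d1
      hillWeight         0.1
      newHillFrequency   1000
      wellTempered       on
      biasTemperature    3000
      # multiple-walker setup: communication goes through files
      multipleReplicas   on
      replicaID          rep3
      replicasRegistry   /path/to/replicas.registry.txt
      replicaUpdateFrequency 1000
  }

The replicasRegistry file is shared by all walkers, so it should sit on a
filesystem that every replica can read and write.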
