Re: Running multiple-replicas metadynamics

From: Giacomo Fiorin (giacomo.fiorin_at_gmail.com)
Date: Wed Nov 08 2017 - 16:50:17 CST

Each replica finishes writing output when the number of steps requested is
completed. NAMD should print "End of program".
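
One way to check this from outside the job is to look for that final line in each replica's log. The following is only a minimal Python sketch: it assumes each replica's NAMD standard output is redirected to its own log file, and the function names and log file names are hypothetical.

```python
from pathlib import Path


def replica_finished(log_path, marker="End of program"):
    """Return True if the NAMD log at log_path contains the final marker line."""
    path = Path(log_path)
    if not path.is_file():
        # No log yet: the replica has not started (e.g. still in the queue).
        return False
    with path.open(errors="replace") as log:
        return any(marker in line for line in log)


def unfinished_replicas(log_paths):
    """Return the log paths (as strings) whose runs have not printed the marker."""
    return [str(p) for p in log_paths if not replica_finished(p)]
```

For example, `unfinished_replicas(["rep1/rep1.log", "rep2/rep2.log"])` would list the replicas whose output should not yet be treated as complete.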

"run 0" means that you run for 0 steps, meaning you read the necessary
input restart files and you write the output files corresponding to the
same configuration.

Giacomo

On Wed, Nov 8, 2017 at 5:05 PM, Prapasiri Pongprayoon <fsciprpo_at_ku.ac.th>
wrote:

> Hi Giacomo,
>
>> Given the messages, I would try to let each replica finish writing its
>> output files.
>
> How do I know when each replica has finished writing its output? Do you
> mean that I need to increase "replicasUpdateFrequency"?
>
>> A "run 0" on each replica should be sufficient.
>
> Could you please explain in a bit more detail what "run 0" is?
>
> Also, could you explain why this error happens? Is it due to a
> communication failure between replicas?
>
> I have tried +replicas, as Joshua suggested, but the error still
> persists.
>
> Thanks,
> Prapasiri
>
> On Nov 9, 2560 BE, at 4:49 AM, Giacomo Fiorin <giacomo.fiorin_at_gmail.com>
> wrote:
>
> Given the messages, I would try to let each replica finish writing its
> output files. A "run 0" on each replica should be sufficient. To minimize
> the risk of further errors, set the number of steps so that each replica
> finishes its run before the walltime allotted by the queue system.
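
The step count that fits in a given walltime can be estimated with simple arithmetic. The sketch below is only an illustration: the seconds-per-step value has to come from your own benchmark, and the safety fraction and the rounding to a multiple of 1000 steps are assumptions chosen for the example, not NAMD or Colvars requirements.

```python
import math


def max_steps(walltime_hours, seconds_per_step, safety_fraction=0.9, multiple_of=1000):
    """Largest step count, rounded down to a multiple, that fits in the walltime.

    safety_fraction leaves headroom for startup and for writing output files.
    """
    if seconds_per_step <= 0:
        raise ValueError("seconds_per_step must be positive")
    budget_seconds = walltime_hours * 3600.0 * safety_fraction
    steps = math.floor(budget_seconds / seconds_per_step)
    # Round down so the run ends on a multiple of the chosen frequency.
    return (steps // multiple_of) * multiple_of
```

The returned value would then be used as the argument of "run" in each replica's NAMD configuration file.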
>
> If you use Joshua's suggestion of launching a bundled job, this should be
> taken care of automatically.
>
> Giacomo
>
> On Wed, Nov 8, 2017 at 4:37 PM, Prapasiri Pongprayoon <fsciprpo_at_ku.ac.th>
> wrote:
>
>> Hi Giacomo,
>>
>> Thanks so much for your reply.
>> I saw the msg below:
>>
>> colvars: Metadynamics bias "meta_3": reading the state of replica "/scratch/g15/pp8244/metadynamics/3/well-tempered/new/rep3/multi_rep3.restart.colvars.state" from file "".
>> colvars: Reading from file "" failed or incomplete: will try again in 1000 steps.
>> colvars: WARNING: in metadynamics bias "meta_3" failed to read completely the output of replica "/scratch/g15/pp8244/metadynamics/3/well-tempered/new/rep3/multi_rep3.restart.colvars.state" after more than 704000 steps. Ensure that it is still running.
>> colvars: WARNING: in metadynamics bias "meta_3" failed to read completely the output of replica "/scratch/g15/pp8244/metadynamics/3/well-tempered/new/rep3/.colvars.meta_3.hills" after more than 525772144270990136 steps. Ensure that it is still running.
>> colvars: Metadynamics bias "meta_3": reading the state of replica "5" from file "/scratch/g15/pp8244/metadynamics/3/well-tempered/new/rep5/.colvars.meta_3.5.state".
>> colvars: Error: failed to read all of the grid points from file. Possible explanations: grid parameters in the configuration (lowerBoundary, upperBoundary, width) are different from those in the file, or the file is corrupt/incomplete.
>> colvars: No such file or directory
>> colvars: If this error message is unclear, try recompiling with -DCOLVARS_DEBUG.
>> FATAL ERROR: Error in the collective variables module: No such file or directory
>> [0] Stack Traceback:
>> [0:0] _Z8NAMD_errPKc+0xe4 [0x20239d44]
>> [0:1] _ZN16colvarproxy_namd5errorERKSs+0x1aa [0x207c429a]
>> [0:2] _ZN18colvar_grid_scalar12read_restartERSi+0x299 [0x20783629]
>>
>> Based on the output, do you have any idea why the program reads state and
>> hills files from both "multi_rep3.restart.colvars.state" and
>> ".colvars.meta_3.5.state/.colvars.meta_3.5.hills.traj/.colvars.meta_3.hills"?
>> The first one is the output name that I set, but the latter are generated
>> automatically by the program. Also, is there any reason why the program
>> has to generate another set of files?
>> I have checked the file "multi_rep3.restart.colvars.state", but nothing
>> is written in it. The numbers of steps reported for the
>> .colvars.meta_3.* files also look odd.
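
A quick way to spot this kind of problem across all replicas is to test each expected state or hills file for existence and size before the other replicas try to read it. This is a minimal sketch, not part of Colvars: the function name is hypothetical, and the list of paths has to be supplied by hand (for instance, the paths that appear in the log messages above).

```python
from pathlib import Path


def classify_files(paths):
    """Sort paths into missing, empty, and non-empty ('ok') files."""
    report = {"missing": [], "empty": [], "ok": []}
    for p in map(Path, paths):
        if not p.is_file():
            report["missing"].append(str(p))
        elif p.stat().st_size == 0:
            # File exists but nothing has been written to it yet.
            report["empty"].append(str(p))
        else:
            report["ok"].append(str(p))
    return report
```

A non-empty file can still be incomplete if its replica was interrupted while writing, so this check only rules out the most obvious cases.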
>>
>> I used NAMD 2.12.
>>
>> Any help would be much appreciated.
>>
>> Thanks for the link. I will have a look.
>>
>> Regards,
>> Prapasiri
>>
>> On Nov 8, 2560 BE, at 6:58 PM, Giacomo Fiorin <giacomo.fiorin_at_gmail.com>
>> wrote:
>>
>> Hi Prapasiri, did the jobs exit with an error before the set walltime or
>> is that the last error message you see? You should also see the message:
>> Metadynamics bias "XX": failed to read the file "YY": will try again
>> after "ZZ" steps.
>> If you don't see any occurrences of that error, please let me know the
>> NAMD version that you are using.
>>
>> As long as the communication remains file-based, you shouldn't need
>> to run the replicas at exactly the same time. The main guideline is not to
>> keep them out of sync for too long; otherwise, replicas that were idle see
>> too much biasing energy appear.
>>
>> Regarding the boundary issue, check whether PBC wrapping may be a problem.
>> http://colvars.github.io/colvars-refman-namd/colvars-refman-namd.html#sec:colvar_atom_groups_wrapping
>>
>>
>> On Mon, Nov 6, 2017 at 5:35 PM, Prapasiri Pongprayoon <fsciprpo_at_ku.ac.th>
>> wrote:
>>
>>> Hi Josh and Giacomo,
>>>
>>> Thanks for your kind help.
>>> I have set up the run, but there is an error. I have 5 replicas, and they
>>> didn’t all run at the same time because of the queuing system: 3 were
>>> running while 2 were waiting in the queue. Those 3 ran for a few hours and
>>> then died with the error below (the other 2 jobs are still in the queue).
>>>
>>> colvars: Error: failed to read all of the grid points from file. Possible explanations: grid parameters in the configuration (lowerBoundary, upperBoundary, width) are different from those in the file, or the file is corrupt/incomplete.
>>>
>>> I searched online, but still haven’t found a solution.
>>>
>>> My system is a drug-membrane protein system (I want to observe drug
>>> permeation). Apart from the initial position of the drug and the replicaID,
>>> everything is the same. All boundaries are the same in all replicas.
>>>
>>> I have a few questions to ask:
>>> 1. Did the jobs die because they couldn’t communicate with each other? All
>>> inputs work fine if I run normal well-tempered metadynamics.
>>> 2. Does this mean that all replicas have to run at nearly the same time?
>>> If so, is there any way to solve the problem if I can’t get all
>>> replicas to run at the same time?
>>>
>>> I also still have a problem with metadynamics itself and need your help
>>> to understand it better. This work was run before I moved to multi-walker
>>> metadynamics.
>>> Since I want to observe drug transport, I set up 2 colvars
>>> (orientation angle and z-distance along the pore axis) for metadynamics. The
>>> lower and upper boundaries were obtained physically from the pdb file (with
>>> upperWallConstant/lowerWallConstant = 5 kcal/mol). While running normal
>>> metadynamics (I have 2 metadynamics runs for 2 drugs), I observed that in
>>> one of my systems the drug moved beyond the upper boundary. I suspected
>>> that this was due to too low a force constant, so I increased
>>> upperWallConstant/lowerWallConstant to 10 and 20 kcal/mol, but the problem
>>> persists. Do you have any idea why this happens?
>>>
>>> Any advice you give me would be appreciated.
>>>
>>> Regards,
>>> Prapasiri
>>>
>>>
>>>
>>> On Nov 5, 2560 BE, at 11:17 PM, Giacomo Fiorin <giacomo.fiorin_at_gmail.com>
>>> wrote:
>>>
>>> Hi Prapasiri, your overall understanding (1-6) is correct. The
>>> file-based multiple-replicas infrastructure is designed to work as an
>>> extension of the single-replica workflow, by adding the options replicaID
>>> and optionally replicasRegistry.
>>>
>>> Regarding your two questions:
>>> 1. The replicas will communicate through files, the paths of which are
>>> maintained in the registry file. For this reason, you should run the
>>> simulations on a fast file system, preferably not NFS (ask your
>>> sysadmins what kind of shared filesystem is used between nodes).
>>> 2. The replicas *should* explore their own space, but this is a
>>> condition that you have to check for (e.g. by checking that each explores
>>> a different energy basin).
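
A rough first check for point 2 is to compare the range of values that each replica has visited. The sketch below is an illustration under stated assumptions: it treats each replica's colvars trajectory as whitespace-separated columns with "#" comment lines, the step number in column 0, and scalar colvars in the following columns. The function names are hypothetical.

```python
def colvar_range(traj_path, column=1):
    """Return (min, max) of one whitespace-separated column, skipping '#' lines."""
    values = []
    with open(traj_path) as traj:
        for line in traj:
            fields = line.split()
            if not fields or fields[0].startswith("#"):
                continue  # blank line or header/comment line
            values.append(float(fields[column]))
    if not values:
        raise ValueError(f"no data rows in {traj_path}")
    return min(values), max(values)


def ranges_overlap(a, b):
    """True if the closed intervals a=(lo, hi) and b=(lo, hi) share any point."""
    return a[0] <= b[1] and b[0] <= a[1]
```

Overlapping ranges are not a problem in themselves. The comparison only helps to spot replicas that all stay in the same region, and for a proper check the sampled values should be compared against the basins of the free-energy surface.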
>>>
>>> If you want to run a quick test, you can try the example input files at:
>>> https://github.com/Colvars/colvars/tree/master/namd/tests/library/011_multiple_walker_mtd
>>>
>>> Giacomo
>>>
>>>
>>> On Fri, Nov 3, 2017 at 7:30 PM, Prapasiri Pongprayoon <fsciprpo_at_ku.ac.th>
>>> wrote:
>>>
>>>> Hi All,
>>>>
>>>> I’m new to NAMD and need your valuable help. I’m now doing
>>>> metadynamics of drug translocation through a membrane protein. The
>>>> simulations go well, but I suspect that it will require a large amount of
>>>> CPU time for my drug to explore all of configuration space. So I decided
>>>> to move to multiple-replicas metadynamics. Based on the manual, it seems
>>>> that I need to:
>>>>
>>>> 1. turn on "multipleReplicas"
>>>> 2. add replicaID, replicasRegistry, replicasUpdateFrequency, and
>>>> dumpPartialFreeEnergyFile
>>>>
>>>> After going through the manual, I still don’t understand the process
>>>> clearly, so I would very much appreciate it if you could explain how to
>>>> set up the run.
>>>>
>>>> From my understanding (please correct me if it’s wrong):
>>>> 1. To run multiple-replicas metadynamics, I need a number of pdbs with
>>>> different drug positions. Each system is called a "replica" and is
>>>> identified by its replicaID. I have 5 systems, each in its own folder.
>>>> 2. I still need a single file (given in replicasRegistry)
>>>> containing the paths of the "colvar.state" and "hill.traj" files that will
>>>> be generated when all five start running.
>>>> 3. If I have 5 systems = 5 replicas (everything is the same except the
>>>> position of the drug in each system), I need to run them separately, each
>>>> with its own .conf and colvars files. The only difference among them is
>>>> the "replicaID". Is this correct?
>>>> 4. When all are running, they will talk to each other via the
>>>> "colvar.state" and "hill.traj" files defined in "replicasRegistry". That
>>>> file just has lines giving the locations of the state and hills files.
>>>> 5. The 5 runs will generate their own outputs and .pmf files, but the pmf
>>>> obtained from each replica is generated by combining data from all 5
>>>> replicas. So 5 pmfs are generated from the 5 replicas, but they are the
>>>> same. Is this correct? As for the partial.pmf, does this file reflect the
>>>> contribution of the individual run to the overall pmf file?
>>>> 6. To restart the runs, I just add "colvarsInput input.colvars.state"
>>>> in the .conf of all five.
>>>>
>>>> This is what I understand from the manual and the NAMD list.
>>>>
>>>> If this is correct, I still have some questions:
>>>> 1. If I have 5 replicas, I have to run the 5 replicas independently. How
>>>> do they communicate if they don’t start running at the same time?
>>>> 2. Based on the recipe above, does it mean that each replica explores
>>>> its own configuration space, and that the data obtained from each replica
>>>> will then be combined and used to get the overall pmf?
>>>>
>>>> Is there any tutorial for multiple-replicas metadynamics that I can go
>>>> through?
>>>>
>>>> Thanks in advance for your help and patience.
>>>>
>>>> Regards,
>>>> Prapasiri
>>>>
>>>>
>>>
>>>
>>> --
>>> Giacomo Fiorin
>>> Associate Professor of Research, Temple University, Philadelphia, PA
>>> Contractor, National Institutes of Health, Bethesda, MD
>>> http://goo.gl/Q3TBQU
>>> https://github.com/giacomofiorin
>>>

-- 
Giacomo Fiorin
Associate Professor of Research, Temple University, Philadelphia, PA
Contractor, National Institutes of Health, Bethesda, MD
http://goo.gl/Q3TBQU
https://github.com/giacomofiorin
