Re: Running multiple-replicas metadynamics

From: Prapasiri Pongprayoon (fsciprpo_at_ku.ac.th)
Date: Wed Nov 08 2017 - 15:37:51 CST

Hi Giacomo,

Thanks so much for your reply.
I saw the message below:

colvars: Metadynamics bias "meta_3": reading the state of replica "/scratch/g15/pp8244/metadynam
ics/3/well-tempered/new/rep3/multi_rep3.restart.colvars.state" from file "".
colvars: Reading from file "" failed or incomplete: will try again in 1000 steps.
colvars: WARNING: in metadynamics bias "meta_3" failed to read completely the output of replica
"/scratch/g15/pp8244/metadynamics/3/well-tempered/new/rep3/multi_rep3.restart.colvars.state" after
 more than 704000 steps. Ensure that it is still running.
colvars: WARNING: in metadynamics bias "meta_3" failed to read completely the output of replica
"/scratch/g15/pp8244/metadynamics/3/well-tempered/new/rep3/.colvars.meta_3.hills" after more than
525772144270990136 steps. Ensure that it is still running.
colvars: Metadynamics bias "meta_3": reading the state of replica "5" from file "/scratch/g15/pp
8244/metadynamics/3/well-tempered/new/rep5/.colvars.meta_3.5.state".
colvars: Error: failed to read all of the grid points from file. Possible explanations: grid para
meters in the configuration (lowerBoundary, upperBoundary, width) are different from those in the
file, or the file is corrupt/incomplete.
colvars: No such file or directory
colvars: If this error message is unclear, try recompiling with -DCOLVARS_DEBUG.
FATAL ERROR: Error in the collective variables module: No such file or directory
[0] Stack Traceback:
  [0:0] _Z8NAMD_errPKc+0xe4 [0x20239d44]
  [0:1] _ZN16colvarproxy_namd5errorERKSs+0x1aa [0x207c429a]
  [0:2] _ZN18colvar_grid_scalar12read_restartERSi+0x299 [0x20783629]

Based on the output, do you have any idea why the program reads state and hills files from both "multi_rep3.restart.colvars.state" and ".colvars.meta_3.5.state"/".colvars.meta_3.5.hills.traj"/".colvars.meta_3.hills"? The first is the output name set by me, but the latter are generated automatically by the program. Also, is there any reason why the program has to generate another set of files?
I have checked the file "multi_rep3.restart.colvars.state", but nothing is written in it. The step numbers shown in the .colvars.meta_3.* files also look odd.

I am using NAMD 2.12.

Any help would be much appreciated.

Thanks for the link. I will have a look.

Regards,
Prapasiri

> On Nov 8, 2560 BE, at 6:58 PM, Giacomo Fiorin <giacomo.fiorin_at_gmail.com> wrote:
>
> Hi Prapasiri, did the jobs exit with an error before the set walltime or is that the last error message you see? You should also see the message:
> Metadynamics bias "XX": failed to read the file "YY": will try again after "ZZ" steps.
> If you don't see any occurrences of that error, please let me know the NAMD version that you are using.
>
> For as long as the communication remains file-based, you shouldn't need to run the replicas at exactly the same time. The main guideline is not to keep them out of sync for too long; otherwise, replicas that were idle see too much biasing energy appear at once.
>
> Regarding the boundary issue, check whether PBC wrapping may be a problem.
> http://colvars.github.io/colvars-refman-namd/colvars-refman-namd.html#sec:colvar_atom_groups_wrapping
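>
> In NAMD, wrapping is controlled in the main configuration file; a minimal sketch of the options to check (only a starting point for your own tests, not a definitive fix):
>
>     # PBC wrapping of the coordinates written to output/restart files.
>     # An atom group whose center of mass straddles a box boundary can be
>     # broken if its molecule is wrapped between runs.
>     wrapAll   off   ;# keep solute molecules whole
>     wrapWater on    ;# wrapping water alone is usually safe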
>
>
> On Mon, Nov 6, 2017 at 5:35 PM, Prapasiri Pongprayoon <fsciprpo_at_ku.ac.th> wrote:
> Hi Josh and Giacomo,
>
> Thanks for your kind help.
> I have set up the run, but there is an error. I have 5 replicas; they did not start at the same time because of the queuing system: 3 were running while 2 were waiting in the queue. Those 3 ran for a few hours and then died with the error below (the other 2 jobs are still in the queue).
>
> colvars: Error: failed to read all of the grid points from file. Possible explanations: grid parameters in the configuration (lowerBoundary, upperBoundary, width) are different from those in the file, or the file is corrupt/incomplete.
>
> I searched online but still have not found a solution.
>
> My system is a drug-membrane protein system (I want to observe drug permeation). Except for the initial position of the drug and the replicaID, everything is the same. All boundaries are the same in all replicas.
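>
> For reference, these grid parameters are set identically in every replica's colvars file (values below are placeholders):
>
>     # inside each colvar block, identical across all 5 replicas:
>     width          0.1
>     lowerBoundary -30.0
>     upperBoundary  30.0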
>
> I have a few questions to ask:
> 1. Did the jobs die because they could not communicate with each other? All inputs work fine if I run normal well-tempered metadynamics.
> 2. Does this mean that all replicas have to run at nearly the same time? If so, is there any way to solve the problem if I cannot get all replicas to run at the same time?
>
> I still have a problem with metadynamics and need your help to understand it better. This work was run before I moved to multiple-walker metadynamics.
> Since I want to observe the drug transport, I set up 2 colvars (an orientation angle and the z-distance along the pore axis) for metadynamics. The lower and upper boundaries were obtained from the physical dimensions in the pdb file (with upperWallConstant/lowerWallConstant = 5 kcal/mol). While running normal metadynamics (I have 2 metadynamics runs for 2 drugs), I observed that in one of my systems the drug translocated beyond the upper boundary. I suspected that this was due to too low a force constant, so I increased the wall constants to 10 and then 20 kcal/mol, but the problem still persists. Do you have any idea why this happens?
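>
> For completeness, this is roughly how I define the z-distance colvar and its walls (atom numbers and values below are placeholders):
>
>     colvar {
>       name drug_z                # position of the drug along the pore axis
>       width 0.1
>       lowerBoundary -30.0
>       upperBoundary  30.0
>       lowerWall     -30.0
>       lowerWallConstant 10.0     # kcal/mol, raised from 5
>       upperWall      30.0
>       upperWallConstant 10.0     # kcal/mol, raised from 5
>       distanceZ {
>         main { atomNumbers 1 2 3 }   # drug atoms (placeholders)
>         ref  { atomNumbers 4 5 6 }   # protein atoms defining the pore axis (placeholders)
>         axis (0.0, 0.0, 1.0)
>       }
>     }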
>
> Any advice you give me would be appreciated.
>
> Regards,
> Prapasiri
>
>
>
>> On Nov 5, 2560 BE, at 11:17 PM, Giacomo Fiorin <giacomo.fiorin_at_gmail.com> wrote:
>>
>> Hi Prapasiri, your overall understanding (1-6) is correct. The file-based multiple-replicas infrastructure is designed to work as an extension of the single-replica workflow, by adding the options replicaID and optionally replicasRegistry.
>>
>> Regarding your two questions:
>> 1. The replicas will communicate through files, the paths of which are maintained in the registry file. For this reason, you should run the simulations on a fast file system, preferably not NFS (contact your sysadmins to find out what kind of shared filesystem is available between nodes).
>> 2. The replicas should explore their own space, but this is a condition that you have to check for (e.g. by verifying that each explores a different energy basin).
>>
>> If you want to run a quick test, you can try the example input files at:
>> https://github.com/Colvars/colvars/tree/master/namd/tests/library/011_multiple_walker_mtd
>>
>> Giacomo
>>
>>
>> On Fri, Nov 3, 2017 at 7:30 PM, Prapasiri Pongprayoon <fsciprpo_at_ku.ac.th> wrote:
>> Hi All,
>>
>> I'm new to NAMD and need your valuable help. I'm now doing metadynamics of drug translocation through a membrane protein. The simulations go well, but I suspect it will require a lot of CPU time for the drug to explore all of configuration space, so I have decided to move to multiple-replica metadynamics. Based on the manual, it seems that I need to:
>>
>> 1. Turn on "multipleReplicas"
>> 2. Add replicaID, replicasRegistry, replicasUpdateFrequency, and dumpPartialFreeEnergyFile (see the sketch below)
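>>
>> From the manual, I guess the metadynamics block would then look something like this (names, paths, and values are placeholders, and the option spellings are taken straight from the manual, so please correct me if any are wrong):
>>
>>     metadynamics {
>>       name meta_1
>>       colvars drug_z drug_angle        # the biased colvars (placeholder names)
>>       hillWeight 0.1                   # kcal/mol (placeholder)
>>       newHillFrequency 1000            # steps (placeholder)
>>       multipleReplicas on
>>       replicaID rep1                   # unique for each of the 5 replicas
>>       replicasRegistry /shared/path/replicas.registry.txt  # one shared file for all replicas
>>       replicasUpdateFrequency 1000     # steps between file-based exchanges (placeholder)
>>       dumpPartialFreeEnergyFile on     # also write this replica's own contribution
>>     }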
>>
>> After going through the manual, I still don't understand the process clearly, so it would be much appreciated if you could explain how to set up the run.
>>
>> From my understanding (please correct me if it's wrong):
>> 1. To run multiple-replica metadynamics, I need a number of pdbs with different drug positions. Each system is called a "replica", and each is identified by a replicaID. I have 5 systems, each in its own folder.
>> 2. I still need a single file (pointed to by replicasRegistry) containing the paths of the "colvar.state" and "hill.traj" files that will be generated once all five are running.
>> 3. Since I have 5 systems = 5 replicas (everything the same except the position of the drug in each system), I need to run them separately, each with its own .conf and colvars files. The only difference among them is the "replicaID". Is this correct?
>> 4. When all are running, they will talk to each other via the "colvar.state" and "hill.traj" files listed in "replicasRegistry". That file just contains lines giving the locations of the state and hills files.
>> 5. The 5 runs will generate their own outputs and .pmf files, but the pmf obtained from each replica is produced by combining data from all 5 replicas. So 5 pmfs are generated from 5 replicas, but they are the same. Is this correct? As for the partial.pmf, does this file reflect the influence of the individual run on the overall pmf?
>> 6. To restart the runs, I just add "colvarsInput input.colvars.state" to the .conf of all five (see the sketch below).
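>>
>> For step 6, I assume the NAMD configuration of each replica would then contain lines like these (file names are placeholders):
>>
>>     colvars       on
>>     colvarsConfig rep1.colvars.in            ;# same colvars config as before, with the same replicaID
>>     colvarsInput  rep1.restart.colvars.state ;# state file written by the previous run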
>>
>> These are what I understand from the manual and NAMD list.
>>
>> If these are correct, I still have some questions:
>> 1. Since I have 5 replicas, I have to run them independently. How do they communicate if they don't start running at the same time?
>> 2. Based on the recipe above, does this mean that each replica explores its own configuration space, and the data obtained from each replica are then combined to produce the overall pmf?
>>
>> Is there any tutorial for multiple-replica metadynamics that I can go through?
>>
>> Thanks for your help and patience in advance.
>>
>> Regards,
>> Prapasiri
>>
>>
>>
>>
>> --
>> Giacomo Fiorin
>> Associate Professor of Research, Temple University, Philadelphia, PA
>> Contractor, National Institutes of Health, Bethesda, MD
>> http://goo.gl/Q3TBQU
>> https://github.com/giacomofiorin
>
>
>
>
> --
> Giacomo Fiorin
> Associate Professor of Research, Temple University, Philadelphia, PA
> Contractor, National Institutes of Health, Bethesda, MD
> http://goo.gl/Q3TBQU
> https://github.com/giacomofiorin

This archive was generated by hypermail 2.1.6 : Sun Dec 31 2017 - 23:21:46 CST