Re: How to integrate multiple walkers 2D metadynamics results?

From: Sebastian S (thecromicusproductions_at_gmail.com)
Date: Fri Nov 15 2019 - 12:34:57 CST

I tried on the same node and I'm getting the same errors. The funny thing
is that I can run 4 replicas without problems, but when I try 10 they start
failing:

module load namd
mpirun -np 2 namd2 testres.rep1.namd > s1.0.log &
mpirun -np 2 namd2 testres.rep2.namd > s2.0.log &
mpirun -np 2 namd2 testres.rep3.namd > s3.0.log &
mpirun -np 2 namd2 testres.rep4.namd > s4.0.log &
mpirun -np 2 namd2 testres.rep5.namd > s5.0.log &
mpirun -np 2 namd2 testres.rep6.namd > s6.0.log &
mpirun -np 2 namd2 testres.rep7.namd > s7.0.log &
mpirun -np 2 namd2 testres.rep8.namd > s8.0.log &
mpirun -np 2 namd2 testres.rep9.namd > s9.0.log &
mpirun -np 2 namd2 testres.rep10.namd > s10.0.log &
wait

On Fri, Nov 15, 2019 at 12:54 PM Victor Kwan <vkwan8_at_uwo.ca> wrote:

> Try running the 12 replicas on the same node, to see whether the problem
> relates to MPI?
>
> Victor
>
> On Fri, Nov 15, 2019 at 12:26 PM Canal de Sebassen <
> thecromicusproductions_at_gmail.com> wrote:
>
>> I have another question about these simulations. I started running some
>> yesterday and:
>>
>> 1) Initially, some walkers do not start at all. I get messages like
>> colvars: Metadynamics bias "metadynamics1": failed to read the file
>> "metadynamics1.rep1.files.txt": will try again after 10000 steps.
>> and in the same step the walker reads the other replicas and ends with
>> colvars: Metadynamics bias "metadynamics1": reading the state of
>> replica "rep1" from file "".
>> colvars: Error: in reading state configuration for "metadynamics" bias
>> "metadynamics1" at position -1 in stream.
>>
>> 2) Others run for a while, but then give me the message
>> colvars: Error: in reading state configuration for "metadynamics" bias
>> "metadynamics1" at position -1 in stream.
>> FATAL ERROR: Error in the collective variables module: exiting.
>>
>> 3) In the end, only 3 of the walkers work; the other 9 I submitted are left
>> for dead. I'm running these simulations on my local cluster, with the
>> following script:
>>
>> #!/bin/bash
>> #$ -pe mpi-24 288 # Specify parallel environment and legal core size
>> #$ -q long        # Specify queue
>> #$ -N Trial1      # Specify job name
>>
>> TASK=0
>> cat $PE_HOSTFILE | while read -r line; do
>>     host=`echo $line | cut -f1 -d" " | cut -f1 -d"."`
>>     echo $host >> hostfile
>> done
>>
>> hostfile="./hostfile"
>> while IFS= read -r host; do
>>     let "TASK+=1"
>>     /usr/kerberos/bin/rsh -F $host -n "uname -a; echo $TASK; cd XXXXXXXXX; pwd; module load namd; mpirun -np 24 namd2 testres.rep$TASK.namd > s$TASK.0.log; exit" &
>> done < $hostfile
>> wait
>> rm ./hostfile
>>
>> Am I doing something wrong? Currently the timing settings in my Colvars
>> config are
>>
>> colvarsTrajFrequency 10000
>>
>> metadynamics {
>>     colvars d1 d2
>>
>>     useGrids on
>>     hillWeight 0.05
>>     newHillFrequency 10000
>>     dumpFreeEnergyFile on
>>     dumpPartialFreeEnergyFile on
>>     saveFreeEnergyFile on
>>     writeHillsTrajectory on
>>
>>     multipleReplicas yes
>>     replicaID rep9
>>     replicasRegistry replicas.registry.txt
>>     replicaUpdateFrequency 10000
>> }
>>
>> and my NAMD output frequencies are
>> numSteps 25000000
>> outputEnergies 10000
>> outputPressure 10000
>> outputTiming 10000
>> xstFreq 10000
>> dcdFreq 10000
>> restartFreq 10000
>>
>> Thanks,
>>
>> Sebastian
>>
>> On Sat, Nov 9, 2019 at 8:03 PM Canal de Sebassen <
>> thecromicusproductions_at_gmail.com> wrote:
>>
>>> Thanks for your reply, Giacomo. I'll take your suggestions into
>>> consideration when setting up the system.
>>>
>>> Regards,
>>>
>>> Sebastian
>>>
>>> On Thu, Nov 7, 2019 at 6:37 PM Giacomo Fiorin <giacomo.fiorin_at_gmail.com>
>>> wrote:
>>>
>>>> Hi Canal, first of all try upgrading to the latest NAMD nightly build.
>>>> Thanks to Jim's help, I added extra checks that make the input/output
>>>> functionality more robust (the same checks are used when writing the NAMD
>>>> restart files):
>>>> https://github.com/Colvars/colvars/pull/276
>>>> There is also an important bugfix in the output of the PMF (the restart
>>>> files are fine):
>>>> https://github.com/Colvars/colvars/pull/259
>>>>
>>>> About the exchange rate: on modern hardware, optimal performance is
>>>> around a few milliseconds per step, so 1000 steps is rather short for a
>>>> full cycle in which all replicas read each other's files. It is best to
>>>> increase it by a factor of 10 or more. I would have made its default value
>>>> the same as the restart frequency, but there is no telling how long that
>>>> would be for each user's input.
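>>>>
>>>> For a rough sense of scale, here is a back-of-the-envelope sketch (the
>>>> ~2 ms/step speed and the 10-replica count are assumed examples, not
>>>> benchmarks of this system):
>>>>
>>>> # Rough estimate of the shared-file I/O cadence for multiple walkers.
>>>> # The speed and replica count below are assumptions for illustration.
>>>> ms_per_step = 2.0                # assumed simulation speed, ms per MD step
>>>> replica_update_frequency = 1000  # steps between shared-file updates
>>>> n_replicas = 10
>>>>
>>>> interval_s = replica_update_frequency * ms_per_step / 1000.0
>>>> print(f"Shared files are read/written every ~{interval_s:.0f} s of wall time")
>>>> # Each replica writes its own state and reads the other N-1 in that window:
>>>> ops_per_s = n_replicas * n_replicas / interval_s
>>>> print(f"~{ops_per_s:.0f} shared-filesystem operations per second in total")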
>>>>
>>>> Regarding the PMFs, nothing special is needed. Each replica will write
>>>> PMFs with the same contents (the PMF extracted from the shared bias), so
>>>> they will be identical up to the fluctuations arising from synchronization.
>>>> You are probably confused by the partial output files, which are triggered
>>>> by dumpPartialFreeEnergyFile (a flag that is off by default).
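>>>>
>>>> In other words, no abf_integrate step is needed for metadynamics: you can
>>>> take the .pmf file written by any one replica and plot it directly. A
>>>> minimal sketch (the file name is a placeholder for whatever
>>>> <outputName>.pmf your run produces, and the assumed column order is d1,
>>>> d2, free energy):
>>>>
>>>> # Load and plot one replica's 2D PMF written by dumpFreeEnergyFile.
>>>> # File name and column order are assumptions; adjust to your own output.
>>>> import numpy as np
>>>> import matplotlib.pyplot as plt
>>>>
>>>> d1, d2, pmf = np.loadtxt("testres.rep1.pmf", comments="#", unpack=True)
>>>>
>>>> plt.tricontourf(d1, d2, pmf, levels=30)
>>>> plt.xlabel("d1 (distanceZ)")
>>>> plt.ylabel("d2 (coordNum)")
>>>> plt.colorbar(label="PMF (kcal/mol)")
>>>> plt.savefig("pmf_2d.png", dpi=150)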
>>>>
>>>> Lastly, Gaussians 0.01 kcal/mol high added every 100 steps is quite a
>>>> bit of bias, and will be further multiplied by the number of replicas.
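>>>>
>>>> To put a number on it (a quick sketch; the 2 fs timestep and the
>>>> 10-walker count are assumptions taken from the rest of the thread):
>>>>
>>>> # Combined bias-deposition rate implied by the posted settings.
>>>> hill_weight = 0.01        # kcal/mol per hill, per replica
>>>> new_hill_frequency = 100  # steps between hills, per replica
>>>> n_replicas = 10           # assumed walker count
>>>> timestep_fs = 2.0         # assumed MD timestep
>>>>
>>>> hills_per_ns = 1.0e6 / (timestep_fs * new_hill_frequency)  # per replica
>>>> rate = hills_per_ns * n_replicas * hill_weight
>>>> print(f"Total bias added: ~{rate:.0f} kcal/mol per ns of simulation")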
>>>>
>>>> Giacomo
>>>>
>>>> On Thu, Nov 7, 2019 at 6:06 PM Canal de Sebassen <
>>>> thecromicusproductions_at_gmail.com> wrote:
>>>>
>>>>> Hello,
>>>>>
>>>>> Say I run a metadynamics simulation with 10 walkers. I then get 10
>>>>> different PMF files. If my simulation is in 2D, how do I get a single
>>>>> energy landscape? Do I use abf_integrate?
>>>>>
>>>>> Also, what are some good practices when running this kind of
>>>>> simulation? I haven't found many examples. This is one of my current
>>>>> Colvars files. I plan to collect about 1-5 microseconds of data. Is a
>>>>> replicaUpdateFrequency of 1000 too large? I tried with a smaller one, but
>>>>> I get problems because some files of one replica cannot be found by
>>>>> another (maybe due to lagging?).
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Sebastian
>>>>>
>>>>> colvarsTrajFrequency 100
>>>>>
>>>>> colvar {
>>>>>     name d1
>>>>>
>>>>>     outputAppliedForce on
>>>>>     width 0.5
>>>>>
>>>>>     lowerBoundary 0.0
>>>>>     upperBoundary 30.0
>>>>>
>>>>>     upperWallConstant 100.0
>>>>>
>>>>>     distanceZ {
>>>>>         forceNoPBC yes
>>>>>         main {
>>>>>             atomsFile labels.pdb
>>>>>             atomsCol B
>>>>>             atomsColValue 1.0
>>>>>         }
>>>>>         ref {
>>>>>             atomsFile labels.pdb
>>>>>             atomsCol B
>>>>>             atomsColValue 2.0
>>>>>         }
>>>>>     }
>>>>> }
>>>>>
>>>>> colvar {
>>>>>     name d2
>>>>>
>>>>>     outputAppliedForce on
>>>>>     width 1
>>>>>
>>>>>     lowerBoundary 0.0
>>>>>     upperBoundary 10.0
>>>>>
>>>>>     upperWallConstant 100.0
>>>>>
>>>>>     coordNum {
>>>>>         cutoff 4.0
>>>>>
>>>>>         group1 {
>>>>>             atomsFile labels.pdb
>>>>>             atomsCol O
>>>>>             atomsColValue 1.0
>>>>>         }
>>>>>         group2 {
>>>>>             atomsFile labels.pdb
>>>>>             atomsCol B
>>>>>             atomsColValue 2.0
>>>>>         }
>>>>>     }
>>>>> }
>>>>>
>>>>> metadynamics {
>>>>>     colvars d1 d2
>>>>>
>>>>>     useGrids on
>>>>>     hillWeight 0.01
>>>>>     newHillFrequency 100
>>>>>     dumpFreeEnergyFile on
>>>>>     dumpPartialFreeEnergyFile on
>>>>>     saveFreeEnergyFile on
>>>>>     writeHillsTrajectory on
>>>>>
>>>>>     multipleReplicas yes
>>>>>     replicaID rep1
>>>>>     replicasRegistry replicas.registry.txt
>>>>>     replicaUpdateFrequency 1000
>>>>> }
>>>>>
>>>>
>>>>
>>>> --
>>>> Giacomo Fiorin
>>>> Associate Professor of Research, Temple University, Philadelphia, PA
>>>> Research collaborator, National Institutes of Health, Bethesda, MD
>>>> http://goo.gl/Q3TBQU
>>>> https://github.com/giacomofiorin
>>>>
>>>

This archive was generated by hypermail 2.1.6 : Tue Dec 31 2019 - 23:21:01 CST