From: Sebastian S (thecromicusproductions_at_gmail.com)
Date: Fri Nov 15 2019 - 18:29:23 CST
Dear Dr. Fiorin,
Thanks for your explanation. Adding "sleep" actually solved the problem. That
was only a test run; I have now increased the output intervals significantly.
I have a question about the colvars parameters. In your experience, would the
following choice be appropriate for a simulation where I study the binding of
a protein to a membrane? I'm hoping to run a couple of microseconds in total
(probably about 150 ns per walker, with 12 walkers):
colvarsTrajFrequency 100000
hillWeight 0.05
newHillFrequency 10000
replicaUpdateFrequency 100000
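
For reference, here is roughly how I would assemble those values into the
metadynamics block (a sketch only, reusing the same keywords and the
placeholder replica ID from my earlier input quoted below):

metadynamics {
    colvars                 d1 d2
    useGrids                on
    hillWeight              0.05
    newHillFrequency        10000
    multipleReplicas        yes
    replicaID               rep1
    replicasRegistry        replicas.registry.txt
    replicaUpdateFrequency  100000
}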
Thanks,
Sebastian
On Fri, Nov 15, 2019 at 2:12 PM Giacomo Fiorin <giacomo.fiorin_at_gmail.com>
wrote:
> Hi Sebastian, with 2.13 keep in mind that the PMF it writes will count the
> contribution of the local replica twice. To get the correct one, you can
> just download the precompiled nightly build and run it for zero steps on
> one processor.
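>
> For example (only a sketch; the file names here are placeholders, not
> something in your current setup):
>
>   # rerun_pmf.namd: a copy of the original input where the "run" command
>   # is replaced by "run 0" and the latest .colvars.state file is loaded
>   # (e.g. via "colvarsInput"), executed with the nightly multicore namd2:
>   ./namd2 +p1 rerun_pmf.namd > rerun_pmf.log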
>
> As for launching all replicas simultaneously, this makes the file I/O much
> more fragile. Adding a "sleep 5s" command between launching two consecutive
> replicas could help.
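>
> For example, instead of ten back-to-back mpirun lines (a sketch that
> reuses the file naming from your script below):
>
>   for i in $(seq 1 10); do
>     mpirun -np 2 namd2 testres.rep$i.namd > s$i.0.log &
>     sleep 5s
>   done
>   wait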
>
> If you can't use a more recent version of NAMD, consider increasing
> replicaUpdateFrequency even further, but only that parameter. I definitely
> did not recommend changing all the output parameters to the same value...
> Writing everything every 10000 steps only stresses the file system without
> need.
>
> Lastly, since you are running on 2 processors per replica, why don't you
> just download the "multicore" nightly build and use that? The "multicore"
> version is not MPI-capable, but it can definitely make efficient use of all
> the processors on each node. You just need to ask the sysadmins for the
> best way to launch each replica (i.e. each copy of NAMD) on a different
> node, and you won't need the added complication of figuring out the correct
> MPI options.
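>
> As a rough illustration only (how to place one replica per node depends on
> your queueing system, so ask the sysadmins): the "multicore" binary is
> launched directly, without mpirun, and told how many cores to use with +p,
> e.g.
>
>   /path/to/NAMD_multicore/namd2 +p24 testres.rep1.namd > s1.0.log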
>
> When it comes to MPI, NAMD can be built to use it quite efficiently, but it
> then becomes tightly integrated with your cluster setup, the details of
> which we can't help you with. Pretty much everyone on this list, however,
> is familiar with the "multicore" build, which runs over multiple processors
> of a single node and is independent of the MPI implementation and the
> inter-node network.
>
> Giacomo
>
>
> On Fri, Nov 15, 2019 at 1:41 PM Sebastian S <
> thecromicusproductions_at_gmail.com> wrote:
>
>> By the way, I'm using version 2.13, as the administrators of my network
>> haven't installed the new one yet
>>
>> On Fri, Nov 15, 2019 at 1:34 PM Sebastian S <
>> thecromicusproductions_at_gmail.com> wrote:
>>
>>> I tried on the same node and I'm getting the same errors. The funny
>>> thing is that I can run 4 replicas without problems, but when I try 10
>>> they start failing:
>>>
>>> module load namd
>>> mpirun -np 2 namd2 testres.rep1.namd > s1.0.log &
>>> mpirun -np 2 namd2 testres.rep2.namd > s2.0.log &
>>> mpirun -np 2 namd2 testres.rep3.namd > s3.0.log &
>>> mpirun -np 2 namd2 testres.rep4.namd > s4.0.log &
>>> mpirun -np 2 namd2 testres.rep5.namd > s5.0.log &
>>> mpirun -np 2 namd2 testres.rep6.namd > s6.0.log &
>>> mpirun -np 2 namd2 testres.rep7.namd > s7.0.log &
>>> mpirun -np 2 namd2 testres.rep8.namd > s8.0.log &
>>> mpirun -np 2 namd2 testres.rep9.namd > s9.0.log &
>>> mpirun -np 2 namd2 testres.rep10.namd > s10.0.log &
>>> wait
>>>
>>>
>>>
>>> On Fri, Nov 15, 2019 at 12:54 PM Victor Kwan <vkwan8_at_uwo.ca> wrote:
>>>
>>>> Could you try running the 12 replicas on the same node, to see if the
>>>> problem is related to MPI?
>>>>
>>>> Victor
>>>>
>>>> On Fri, Nov 15, 2019 at 12:26 PM Canal de Sebassen <
>>>> thecromicusproductions_at_gmail.com> wrote:
>>>>
>>>>> I have another question about these simulations. I started running
>>>>> some yesterday and:
>>>>>
>>>>> 1) Initially, some walkers do not start at all. I get messages like
>>>>> colvars: Metadynamics bias "metadynamics1": failed to read the file
>>>>> "metadynamics1.rep1.files.txt": will try again after 10000 steps.
>>>>> and in the same step the walker reads the other replicas and ends with
>>>>> colvars: Metadynamics bias "metadynamics1": reading the state of
>>>>> replica "rep1" from file "".
>>>>> colvars: Error: in reading state configuration for "metadynamics" bias
>>>>> "metadynamics1" at position -1 in stream.
>>>>>
>>>>> 2) Others run for a while but then stop with the message
>>>>> colvars: Error: in reading state configuration for "metadynamics" bias
>>>>> "metadynamics1" at position -1 in stream.
>>>>> FATAL ERROR: Error in the collective variables module: exiting.
>>>>>
>>>>> 3) In the end, only 3 of the walkers work; the other 9 I submitted are
>>>>> left for dead. I'm running these simulations on my local cluster, with
>>>>> the following script:
>>>>>
>>>>> #!/bin/bash
>>>>> #$ -pe mpi-24 288 # Specify parallel environment and legal core size
>>>>> #$ -q long # Specify queue
>>>>> #$ -N Trial1 # Specify job name
>>>>>
>>>>> TASK=0
>>>>> cat $PE_HOSTFILE | while read -r line; do
>>>>>     host=`echo $line | cut -f1 -d" " | cut -f1 -d"."`
>>>>>     echo $host >> hostfile
>>>>> done
>>>>> hostfile="./hostfile"
>>>>> while IFS= read -r host; do
>>>>>     let "TASK+=1"
>>>>>     /usr/kerberos/bin/rsh -F $host -n "uname -a; echo $TASK; cd XXXXXXXXX; pwd; module load namd; mpirun -np 24 namd2 testres.rep$TASK.namd > s$TASK.0.log ; exit" &
>>>>> done < $hostfile
>>>>> wait
>>>>> rm ./hostfile
>>>>>
>>>>> Am I doing something wrong? Currently my colvars timing parameters are:
>>>>> colvarsTrajFrequency 10000
>>>>> metadynamics {
>>>>> colvars d1 d2
>>>>>
>>>>> useGrids on
>>>>> hillWeight 0.05
>>>>> newHillFrequency 10000
>>>>> dumpFreeEnergyFile on
>>>>> dumpPartialFreeEnergyFile on
>>>>> saveFreeEnergyFile on
>>>>> writeHillsTrajectory on
>>>>>
>>>>> multipleReplicas yes
>>>>> replicaID rep9
>>>>> replicasRegistry replicas.registry.txt
>>>>> replicaUpdateFrequency 10000
>>>>> }
>>>>>
>>>>> and my NAMD output settings are:
>>>>> numSteps 25000000
>>>>> outputEnergies 10000
>>>>> outputPressure 10000
>>>>> outputTiming 10000
>>>>> xstFreq 10000
>>>>> dcdFreq 10000
>>>>> restartFreq 10000
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Sebastian
>>>>>
>>>>> On Sat, Nov 9, 2019 at 8:03 PM Canal de Sebassen <
>>>>> thecromicusproductions_at_gmail.com> wrote:
>>>>>
>>>>>> Thanks for your reply, Giacomo. I'll take your suggestions into
>>>>>> consideration when setting up the system.
>>>>>>
>>>>>> Regards,
>>>>>>
>>>>>> Sebastian
>>>>>>
>>>>>> On Thu, Nov 7, 2019 at 6:37 PM Giacomo Fiorin <
>>>>>> giacomo.fiorin_at_gmail.com> wrote:
>>>>>>
>>>>>>> Hi Canal, first of all try upgrading to the latest NAMD nightly
>>>>>>> build. Thanks to Jim's help, I added extra checks that make the
>>>>>>> input/output functionality more robust (the same checks are used when
>>>>>>> writing the NAMD restart files):
>>>>>>> https://github.com/Colvars/colvars/pull/276
>>>>>>> There is also an important bugfix in the output of the PMF (the
>>>>>>> restart files are fine):
>>>>>>> https://github.com/Colvars/colvars/pull/259
>>>>>>>
>>>>>>> About the exchange rate: on modern hardware, optimal performance is
>>>>>>> around a few milliseconds per step, so 1000 steps is rather short for
>>>>>>> a full cycle in which all replicas read each other's files. It is
>>>>>>> best to increase it by a factor of 10 or more. I would have made its
>>>>>>> default value the same as the restart frequency, but there is no
>>>>>>> telling how long that would be for each user's input.
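>>>>>>>
>>>>>>> As a back-of-the-envelope check (assuming ~2 ms/step; use the number
>>>>>>> from the TIMING lines of your own log instead):
>>>>>>>
>>>>>>>   for steps in 1000 10000; do
>>>>>>>     echo "$steps steps -> ~$(( steps * 2 / 1000 )) s between exchanges"
>>>>>>>   done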
>>>>>>>
>>>>>>> Regarding the PMFs, nothing special is needed. Each replica will
>>>>>>> write PMFs with the same contents (the PMF extracted from the shared bias),
>>>>>>> so they will be equal minus the fluctuations arising from synchronization.
>>>>>>> You are probably confused by the partial output files, which are triggered
>>>>>>> by dumpPartialFreeEnergyFile (a flag that is off by default).
>>>>>>>
>>>>>>> Lastly, Gaussians 0.01 kcal/mol high added every 100 steps is quite
>>>>>>> a bit of bias, and will be further multiplied by the number of replicas.
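>>>>>>>
>>>>>>> To put a number on it (assuming a 2 fs timestep, which I am guessing
>>>>>>> rather than reading from your input):
>>>>>>>
>>>>>>>   # hill height 0.01 kcal/mol, one hill every 100 steps, 2 fs/step
>>>>>>>   awk 'BEGIN { h=0.01; every=100; dt_ns=2e-6;
>>>>>>>                print h/(every*dt_ns), "kcal/mol per ns per replica" }'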
>>>>>>>
>>>>>>> Giacomo
>>>>>>>
>>>>>>> On Thu, Nov 7, 2019 at 6:06 PM Canal de Sebassen <
>>>>>>> thecromicusproductions_at_gmail.com> wrote:
>>>>>>>
>>>>>>>> Hello,
>>>>>>>>
>>>>>>>> Say I run a metadynamics simulation with 10 walkers; I then get 10
>>>>>>>> different PMF files. If my simulation is in 2D, how do I get a
>>>>>>>> single energy landscape? Do I use abf_integrate?
>>>>>>>>
>>>>>>>> Also, what are some good practices when running these kinds of
>>>>>>>> simulations? I haven't found many examples. This is one of my
>>>>>>>> current colvars files. I plan to collect about 1-5 microseconds of
>>>>>>>> data. Is a replicaUpdateFrequency of 1000 too large? I tried a
>>>>>>>> smaller value, but I get problems because some files written by one
>>>>>>>> replica cannot be found by another (maybe due to lag?).
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>>
>>>>>>>> Sebastian
>>>>>>>>
>>>>>>>> colvarsTrajFrequency 100
>>>>>>>>
>>>>>>>> colvar {
>>>>>>>>
>>>>>>>> name d1
>>>>>>>>
>>>>>>>> outputAppliedForce on
>>>>>>>> width 0.5
>>>>>>>>
>>>>>>>> lowerBoundary 0.0
>>>>>>>> upperBoundary 30.0
>>>>>>>>
>>>>>>>> upperWallConstant 100.0
>>>>>>>>
>>>>>>>> distanceZ {
>>>>>>>> forceNoPBC yes
>>>>>>>> main {
>>>>>>>> atomsFile labels.pdb
>>>>>>>> atomsCol B
>>>>>>>> atomsColValue 1.0
>>>>>>>> }
>>>>>>>> ref {
>>>>>>>> atomsFile labels.pdb
>>>>>>>> atomsCol B
>>>>>>>> atomsColValue 2.0
>>>>>>>> }
>>>>>>>> }
>>>>>>>> }
>>>>>>>>
>>>>>>>> colvar {
>>>>>>>>
>>>>>>>> name d2
>>>>>>>>
>>>>>>>> outputAppliedForce on
>>>>>>>> width 1
>>>>>>>>
>>>>>>>> lowerBoundary 0.0
>>>>>>>> upperBoundary 10.0
>>>>>>>>
>>>>>>>> upperWallConstant 100.0
>>>>>>>>
>>>>>>>> coordNum {
>>>>>>>> cutoff 4.0
>>>>>>>>
>>>>>>>>
>>>>>>>> group1 {
>>>>>>>> atomsFile labels.pdb
>>>>>>>> atomsCol O
>>>>>>>> atomsColValue 1.0
>>>>>>>> }
>>>>>>>> group2 {
>>>>>>>> atomsFile labels.pdb
>>>>>>>> atomsCol B
>>>>>>>> atomsColValue 2.0
>>>>>>>> }
>>>>>>>> }
>>>>>>>> }
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> metadynamics {
>>>>>>>> colvars d1 d2
>>>>>>>>
>>>>>>>> useGrids on
>>>>>>>> hillWeight 0.01
>>>>>>>> newHillFrequency 100
>>>>>>>> dumpFreeEnergyFile on
>>>>>>>> dumpPartialFreeEnergyFile on
>>>>>>>> saveFreeEnergyFile on
>>>>>>>> writeHillsTrajectory on
>>>>>>>>
>>>>>>>> multipleReplicas yes
>>>>>>>> replicaID rep1
>>>>>>>> replicasRegistry replicas.registry.txt
>>>>>>>> replicaUpdateFrequency 1000
>>>>>>>> }
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Giacomo Fiorin
>>>>>>> Associate Professor of Research, Temple University, Philadelphia, PA
>>>>>>> Research collaborator, National Institutes of Health, Bethesda, MD
>>>>>>> http://goo.gl/Q3TBQU
>>>>>>> https://github.com/giacomofiorin
>>>>>>>
>>>>>>
>
> --
> Giacomo Fiorin
> Associate Professor of Research, Temple University, Philadelphia, PA
> Research collaborator, National Institutes of Health, Bethesda, MD
> http://goo.gl/Q3TBQU
> https://github.com/giacomofiorin
>
This archive was generated by hypermail 2.1.6 : Thu Dec 31 2020 - 23:17:12 CST