Re: Uneven sampling when using ABF

From: Aron Broom (broomsday_at_gmail.com)
Date: Tue Jul 31 2012 - 07:36:22 CDT

Next message: Ajasja Ljubetič: "peculiar scaling on the GPU"
Previous message: Norman Geist: "AW: AW: Low CPU usage with NAMD running on linux cluster"
In reply to: JC Gumbart: "Re: Uneven sampling when using ABF"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ] [ attachment ]

Jian,

I can now see your image. It looks like you have significant sampling
beyond the value of your boundaries, is that true? If that is the case, it
may just be that you've crossed those boundaries and have become stuck on
the other side, in which case you should just increase the value of those
walls.

Also, what you really should do in terms of setting a new fullSamples
value, is to calculate the autocorrelation function for each of your
reaction coordinates during the simulation. Then check and see when
(x-axis) that autcorrelation function drops below ~0.36. This value can be
useful, because if, for instance, it ends up being 5000 timesteps, then
even setting your fullSamples to 10k will mean that your biasing force is
based on a mere 2 unique samples. I'm not sure how many unique samples
you'd want for a reliable estimate of the force, but I'd think 100 would be
a reasonable minimum.

~Aron

On Mon, Jul 30, 2012 at 10:36 PM, JC Gumbart <gumbart_at_ks.uiuc.edu> wrote:

> Try increasing fullSamples to, say, 10k and decrease your width to 0.1.
> I've seen in some cases too large a width produce the behavior you observe.
>
> Ultimately, Aron is right that slowly converging orthogonal degrees of
> freedom could be a problem, as they are the bane of almost any free energy
> method. More fundamentally, the question is whether your chosen reaction
> coordinate is truly representative of the process that you want to
> understand. And unfortunately, for the transition between two
> conformations, which is quite complicated, the answer is quite possible
> not. Here's an example of a good paper (although I'm sure there are many)
> where they demonstrate the improper choice of RC leads to incorrect
> results: http://www.ncbi.nlm.nih.gov/pubmed/19462398
>
> You might also want to look into the string method for finding pathways
> between states in a multi-dimensional RC space.
>
> On Jul 30, 2012, at 8:54 PM, DAI, JIAN wrote:
>
> Hi, James and Aron:
> As can be seen from the definition for the CV below, the window is 2
> angstrom by 2 angstrom and the binsize is 0.2.
> I'm thinking of using smaller windows, but then I suspect that even with
> smaller windows, the sampling behavior will probably remain, i.e., with
> boundaries sampled better than the interior of the window.
> Does this have to do with the initial configurations? In this case, we
> used targeted MD to direct the protein to change conformation from closed
> state to the open state and in this trajectory we pick initial
> configurations for each window.
> Thanks.
> Jian
>
>
> Here goes my definition for the CV:
>
> colvarsTrajFrequency 1000
> colvarsRestartFrequency 1000
>
> colvar {
> name distance1
>
> width 0.2
>
> lowerboundary 7.5
> upperboundary 9.5
>
> lowerwallconstant 10.0
> upperwallconstant 10.0
>
> distance {
> group1 {
> atomNumbers {1870 1889 1903}
> }
> group2 {
> atomNumbers {2790 2801}
> }
> }
> }
>
> colvar {
> name distance2
>
> width 0.2
>
> lowerboundary 7.0
> upperboundary 9.0
>
> lowerwallconstant 10.0
> upperwallconstant 10.0
>
> distance {
> group1 {
> atomNumbers {152 169 186}
> }
> group2 {
> atomNumbers {3240 3250}
> }
> }
> }
>
> abf {
> name distance12abf
> colvars distance1 distance2
> fullSamples 500
> }
>
> As well as the config file:
>
> amber on
> parmfile ../../../01Prep/system.parm
> ambercoor ../../../01Prep/system.crd
> bincoordinates ../../../02Equil/md.1.coor
> extendedSystem ../../../02Equil/md.1.xsc
> set outputname md.1
> firsttimestep 0
> set temperature 310
> temperature $temperature
> exclude scaled1-4
> 1-4scaling 0.83333
> cutoff 12.
> switching off
> timestep 1.0 ;# 2fs/step (only if needed to finish quickly)
> rigidBonds water ;# needed for 2fs steps
> nonbondedFreq 2
> fullElectFrequency 4
> stepspercycle 20
> langevin on ;# do langevin dynamics
> langevinDamping 1 ;# damping coefficient (gamma) of 5/ps
> langevinTemp $temperature
> langevinHydrogen off ;# don't couple langevin bath to hydrogens
> wrapAll on
> PME yes
> PMEGridSpacing 1.0
> useGroupPressure yes ;# needed for rigidBonds
> useFlexibleCell no
> useConstantArea no
> langevinPiston on
> langevinPistonTarget 1.01325 ;# in bar -> 1 atm
> langevinPistonPeriod 100.
> langevinPistonDecay 50.
> langevinPistonTemp $temperature
> outputName $outputname
> restartfreq 1000 ;# 1000steps = every 1ps
> dcdfreq 5000
> xstFreq 1000
> outputEnergies 1000
> outputPressure 1000
> colvars on
> colvarsConfig ../colvars.tcl
> run 2000000
>
>
> ________________________________________
> From: Aron Broom [broomsday_at_gmail.com]
> Sent: Monday, July 30, 2012 9:33 PM
> To: JC Gumbart
> Cc: DAI, JIAN; namd-l_at_ks.uiuc.edu
> Subject: Re: namd-l: Uneven sampling when using ABF
>
> Yeah, I knew that's what fullSamples meant, but wouldn't the initial
> estimate from filling all those samples still give you a good estimate of
> the PMF?
>
> I had thought the accuracy only depended on how many samples you had for
> each window, presuming that you at least explore all the windows (i.e.
> reach fullSamples for each window). I had been thinking the only purpose
> for applying the biasing force was to make sure you explored past the
> initial local minimum in order to collect force samples over the whole
> reaction coordinate. Does the biasing force do something other than allow
> exploration?
>
> Still, if there are some unbiased slow degrees of freedom that are causing
> the force estimate to converge very slowly, I think you can only get a good
> result from ABF by having fullSamples high enough to capture those unbiased
> degrees of freedom.
>
> ~Aron
>
> On Mon, Jul 30, 2012 at 9:15 PM, JC Gumbart <gumbart_at_ks.uiuc.edu<
> mailto:gumbart_at_ks.uiuc.edu <gumbart_at_ks.uiuc.edu>>> wrote:
> I think you misunderstand what fullSamples means. The fullSamples
> parameter is how many steps per bin are required before the estimated
> biasing force is applied. While I agree that 500 is too small, you don't
> want it to be some absurdly large number either, because then you are not
> doing ABF, you're just doing pure Boltzmann sampling. If the initial
> estimate of the force is very bad, then it's possible that ABF can never
> correct for it, at least on a reasonable timescale. Try setting this
> parameter to, say, 10000.
>
> Also, what is the bin size you're using? Are there any kind of restraints
> being applied to the system?
>
> On Jul 30, 2012, at 6:10 PM, Aron Broom wrote:
>
> for whatever reason, the link to your image kept timing out for me.
>
> I have only very limited experience with ABF, but I think you are simply
> describing non-convergence, and the simplest solution is to sample for
> longer. That being said, if the force applied in certain areas is
> particularly far from the real average force over the whole ensemble, you
> could become trapped in a non-equilibrium condition for a VERY LONG time.
>
> In my experience the solution to this is to increase fullSamples. You say
> it's currently set at 500, so that is 500 timesteps, or 500 fs. And you
> say the dimensions are 2 angstroms, which means that the system should
> fully explore all the relevant microstates within that 2 angstrom window,
> within 500 fs. In my experience, if you are looking at something like
> protein-ligand binding, you have a decorrelation time on the order of
> thousands of fs in explicit solvent, and I would expect if you are looking
> at protein folding it would be even higher. So my conclusion would be that
> your current "fullSamples" is only capturing one statistically relevant
> unique sample per window, and therefore, is highly unreliable. I would
> expect you'd need to increase that by at least several orders of magnitude
> to see good results.
>
> Personally I stopped using ABF, because once you run into a problem like
> this, it seems like you more or less need to start over with a different
> fullSamples value. MetaDynamics may have similar problems, but if you use
> the default hill height and a deposit hills every 2000 fs or so, you can
> often get a rough idea, that can be refined later. For not wasting
> simulations, Umbrella sampling is possibly the best, and you can do it in
> 2D.
>
> Keep in mind that the protein folding problem has a tremendous number of
> degrees of freedom, so there are many that ABF is not biasing against,
> meaning that it should take a lot of simulation time to get good data.
> From what you say it sounds like you currently get 162 (9x9x2) ns of
> simulation time. Maybe something to try just a sanity check would be to
> redo the ABF, but set it up such that at least half that time (81ns) is the
> time to fill all the "fullSamples" (in this case fullSamples would be
> 1,000,000 or 1 ns), and see how much different your PMF is from what you
> currently have with 500 as fullSamples.
>
> ~Aron
>
> On Mon, Jul 30, 2012 at 1:38 PM, DAI, JIAN <jdai2_at_fsu.edu<
> mailto:jdai2_at_fsu.edu <jdai2_at_fsu.edu>>> wrote:
> Dear fellows:
> We are trying to construct a free energy landscape of a protein using two
> dimensional ABF calculation. The landscape is divided into 9 equal sized
> windows, each with a dimension of 2 Angstrom by 2 Angstrom, where each
> dimension describes the center of mass between two groups of atoms.
> Please see the figure by the link.
>
> http://s663.photobucket.com/albums/uu358/djpittdj/?action=view&current=count.png
> The figure shows the number of counts with respect to those two order
> parameters, and they are not evenly distributed in each window, as I would
> expect. Instead, in four windows on the left bottom corner, the counts are
> heavily clustered in their respective boundaries, indicated by those
> red/yellow regions. Does anybody knows why and can possibly offer us a
> solution?
> The fullSamples parameters is set to 500, and the landscape was obtained
> using abf_integrate after each window is run for 2 ns with a timestep of 1
> fs.
> Thanks a lot.
> Jian
>
>
>
>
>
> --
> Aron Broom M.Sc
> PhD Student
> Department of Chemistry
> University of Waterloo
>
>
>
>
>
> --
> Aron Broom M.Sc
> PhD Student
> Department of Chemistry
> University of Waterloo
>
>
>

-- 
Aron Broom M.Sc
PhD Student
Department of Chemistry
University of Waterloo

Next message: Ajasja Ljubetič: "peculiar scaling on the GPU"
Previous message: Norman Geist: "AW: AW: Low CPU usage with NAMD running on linux cluster"
In reply to: JC Gumbart: "Re: Uneven sampling when using ABF"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ] [ attachment ]

This archive was generated by hypermail 2.1.6 : Mon Dec 31 2012 - 23:21:52 CST