Re: Uneven sampling when using ABF

From: Aron Broom (broomsday_at_gmail.com)
Date: Mon Jul 30 2012 - 18:10:03 CDT

For whatever reason, the link to your image kept timing out for me.

I have only very limited experience with ABF, but I think you are simply
describing non-convergence, and the simplest solution is to sample for
longer. That being said, if the force applied in certain areas is
particularly far from the real average force over the whole ensemble, you
could become trapped in a non-equilibrium condition for a VERY LONG time.

In my experience the solution to this is to increase fullSamples. You say
it's currently set at 500, so that is 500 timesteps, or 500 fs. And you
say the dimensions are 2 angstroms, which means that the system should
fully explore all the relevant microstates within that 2 angstrom window,
within 500 fs. In my experience, if you are looking at something like
protein-ligand binding, you have a decorrelation time on the order of
thousands of fs in explicit solvent, and I would expect if you are looking
at protein folding it would be even higher. So my conclusion would be that
your current "fullSamples" is only capturing one statistically relevant
unique sample per window, and therefore, is highly unreliable. I would
expect you'd need to increase that by at least several orders of magnitude
to see good results.
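
As a rough illustration of the arithmetic above, here is a minimal Python sketch. The function names and the decorrelation times (1000 and 5000 fs) are assumptions chosen only to match "on the order of thousands of fs"; they are not measured values and not part of NAMD.

```python
import math


def independent_samples(full_samples: int, timestep_fs: float,
                        decorrelation_fs: float) -> float:
    """Rough number of decorrelated samples gathered while fullSamples fills."""
    return full_samples * timestep_fs / decorrelation_fs


def full_samples_needed(n_independent: int, timestep_fs: float,
                        decorrelation_fs: float) -> int:
    """Smallest fullSamples that spans about n_independent decorrelation times."""
    return math.ceil(n_independent * decorrelation_fs / timestep_fs)


timestep_fs = 1.0  # timestep quoted in the original post
for tau_fs in (1_000.0, 5_000.0):  # assumed, illustrative decorrelation times
    n_eff = independent_samples(500, timestep_fs, tau_fs)
    needed = full_samples_needed(100, timestep_fs, tau_fs)
    print(f"tau = {tau_fs:.0f} fs: fullSamples=500 gives ~{n_eff:.2f} "
          f"independent samples; ~100 would need fullSamples >= {needed}")
```

With any decorrelation time in that range, 500 steps at 1 fs cover at most about one decorrelation time, which is the point made above.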

Personally I stopped using ABF, because once you run into a problem like
this, it seems like you more or less need to start over with a different
fullSamples value. Metadynamics may have similar problems, but if you use
the default hill height and deposit hills every 2000 fs or so, you can
often get a rough idea that can be refined later. For not wasting
simulations, Umbrella sampling is possibly the best, and you can do it in
2D.

Keep in mind that the protein folding problem has a tremendous number of
degrees of freedom, so there are many that ABF is not biasing against,
meaning that it should take a lot of simulation time to get good data.
From what you say, it sounds like you currently get 162 (9x9x2) ns of
simulation time. Something to try, just as a sanity check, would be to
redo the ABF, but set it up such that at least half that time (81 ns) is the
time to fill all the "fullSamples" (in this case fullSamples would be
1,000,000, or 1 ns), and see how different your PMF is from what you
currently have with 500 as fullSamples.
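
The numbers in this sanity check can be reproduced with a short Python sketch. The helper names are hypothetical, and the window count follows the 9x9 figure used above; the original post describes 9 windows, so pass n_windows=9 if that is the actual layout.

```python
FS_PER_NS = 1_000_000  # femtoseconds per nanosecond


def fill_time_ns(full_samples: int, timestep_fs: float = 1.0) -> float:
    """Simulated time (ns) spanned by full_samples consecutive timesteps."""
    return full_samples * timestep_fs / FS_PER_NS


def total_time_ns(n_windows: int, ns_per_window: float) -> float:
    """Aggregate simulation time (ns) over all windows."""
    return n_windows * ns_per_window


total = total_time_ns(n_windows=9 * 9, ns_per_window=2.0)  # figure used above
print(f"total simulation time: {total} ns, half of it: {total / 2} ns")
print(f"fullSamples = 500       spans {fill_time_ns(500)} ns")
print(f"fullSamples = 1,000,000 spans {fill_time_ns(1_000_000)} ns")
print(f"ratio of the two settings: {1_000_000 // 500}x")
```

The jump from 500 to 1,000,000 is a factor of 2000, roughly three orders of magnitude, in line with the increase suggested earlier.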

~Aron

On Mon, Jul 30, 2012 at 1:38 PM, DAI, JIAN <jdai2_at_fsu.edu> wrote:

> Dear fellows:
> We are trying to construct a free energy landscape of a protein using two
> dimensional ABF calculation. The landscape is divided into 9 equal sized
> windows, each with a dimension of 2 Angstrom by 2 Angstrom, where each
> dimension describes the center-of-mass distance between two groups of atoms.
> Please see the figure by the link.
>
> http://s663.photobucket.com/albums/uu358/djpittdj/?action=view&current=count.png
> The figure shows the number of counts with respect to those two order
> parameters, and they are not evenly distributed within each window, as I
> had expected them to be. Instead, in the four windows in the bottom-left
> corner, the counts are heavily clustered at their respective boundaries,
> indicated by the red/yellow regions. Does anybody know why, and can anyone
> offer us a solution?
> The fullSamples parameter is set to 500, and the landscape was obtained
> using abf_integrate after each window was run for 2 ns with a timestep of 1
> fs.
> Thanks a lot.
> Jian

-- 
Aron Broom M.Sc
PhD Student
Department of Chemistry
University of Waterloo
