Re: Error Analysis in Metadynamics

From: Giacomo Fiorin (giacomo.fiorin_at_gmail.com)
Date: Wed Apr 28 2021 - 10:34:17 CDT

Hi Diship, sorry, I missed this later one.

There is no fundamentally different scenario when increasing dimensionality
(1D, 2D, 3D, ...) other than higher dimensionalities usually mean that more
bins need to be sampled on average. But the error analysis is done on each
grid point individually, regardless of how many dimensions the grid has.

With well-tempered metadynamics specifically, it gets a bit trickier. One
may still consider different trajectory segments statistically independent
as in standard metadynamics (provided that the CVs are "good" i.e. they
decorrelate themselves quickly and fully). But because of the WT's scale
factor, the statistical weight of later trajectory frames will be on
average much lower. The tutorial that Miro linked earlier presumes
(without saying it explicitly) that the trajectory being examined is not
very different from standard metadynamics, where all segments have equal
weight. In practice, you have to accept the potential risk that by using a
well-tempered scaling, the later stages of a simulation will always count
much less than the earlier ones and may make it appear as if the PMF had
converged better than it actually did. Even if interesting stuff is still
going on.

This complication is somewhat unique to metadynamics, which unlike other
methods (umbrella sampling, ABF, etc) doesn't converge asymptotically by
design, so you basically are choosing between averaging the incremental
PMFs of standard metadynamics yourselves, or use well-tempered to slow the
bias down. Or, as you saw, people even use both approaches concurrently.

What I really suggest is using a clear and well-defined procedure to
quantify the error bars that you mean to plot. But rather than go very
sophisticated with the methodology, I feel it much more important to
critically examine possible ways that your choices of CVs may be less than
ideal. All error analyses that only look at the bias or at the PMF are
implicitly assuming that the CVs are ideal. Showing that you actually
looked in detail at the atomic trajectory along the simulation will
increase the confidence that your purely-statistical error may actually be
used as "experimental uncertainty".

Giacomo

On Sun, Apr 25, 2021 at 1:39 PM Diship Srivastava <
dishipsrivastava_at_gmail.com> wrote:

> Thanks for the answer. I have one more question - since I am doing 2D well
> tempered metadynamics will the error analysis be done for both cv's
> separately or is there a method to combine the error analysis for both cv
> into one?
>
>
>
>
> On Sat, 24 Apr 2021, 05:22 Giacomo Fiorin, <giacomo.fiorin_at_gmail.com>
> wrote:
>
>> Hi René, I think it all depends on how expensive is the system to
>> simulate, how complex the PMF (e.g. dimensionality, number of minima) and
>> most importantly, the intended use of the PMF: qualitative insight, or
>> actual quantification/measurement.
>>
>> If the goal is to estimate the relative probabilities of different
>> states, the relevant scale is dictated by the thermal energy, i.e. ~0.6
>> kcal/mol in most biological applications. An estimated PMF error of 1
>> kcal/mol, after exponentiating, means that the corresponding probability
>> can very easily be *a factor of 5 larger or smaller*. On the other
>> hand, 0.1 kcal/mol means that the relative error on the corresponding
>> probability is just ~15%.
>>
>> Giacomo
>>
>>
>> On Fri, Apr 23, 2021 at 12:40 PM René Hafner TUK <
>> hamburge_at_physik.uni-kl.de> wrote:
>>
>>> Dear Giacomo,
>>>
>>> a question upon this:
>>>
>>> What would you call "long enough for today's review standards":
>>>
>>> Having convergence over serveral runs in the PMF within <1kcal/mol
>>> or rather on the order of (maybe a few) 0.1 kcal/mol?
>>>
>>> Kind regards
>>>
>>> René
>>> On 4/23/2021 5:57 PM, Giacomo Fiorin wrote:
>>>
>>> Hi Diship, it highly depends on what error analysis you plan on doing.
>>> The approach that I and others use with metadynamics has very few
>>> assumptions: after estimating that the systems has explored all relevant
>>> states at least once, start collecting the PMF at regular intervals and
>>> simulate for additional time such that all states are visited a few more
>>> times, i.e. a few more layers of Gaussian hills are added on top of the
>>> converged PMF. Then just compute the average and SD of the free-energy
>>> over the sample of multiple PMFs, which are only text files, so you can do
>>> the following:
>>>
>>> all_pmfs = np.zeros(shape=(n_files, n_points))
>>> for i_file in range(n_files):
>>> _, pmf = np.loadtxt(pmf_file_names[i_file], unpack=True)
>>> all_pmfs[i_file,:] = pmf[:]
>>> pmf_mean = all_pmfs.mean(axis=0)
>>> pmf_SD = all_pmfs.std(axis=0)
>>>
>>> The tutorial that Miro has linked uses block averages, where the
>>> additional difference is that the length of the blocks is also varied to
>>> converge the SD. You should see for yourself that the dependence of the SD
>>> on the block length is rather slow (in theory, it should follow a square
>>> root).
>>>
>>> Having said all that, here you're not simulating something trivial like
>>> dialanine isomerization: any meaningful error estimate will require that
>>> the CV has already visited all states repeatedly during the simulation. If
>>> you can see such trajectory, then you're in good shape and any PMF method
>>> including metadynamics will give you unbiased results. If you ran long
>>> enough for today's review standards, your statistical error bars will be
>>> comparable or smaller than the size of the marker used in the plot :-) You
>>> ought to be able to explain how you computed the statistical error bars,
>>> but these ought to be very small or downright negligible.
>>>
>>> On the other hand, if the CV is not a good reaction coordinate any error
>>> analysis will be uninformative: you need to analyze the atomic trajectory
>>> and show that at any given frame the structural properties (e.g.
>>> coordination with water or lipids) depend only on the value of the CV, but
>>> not on the simulation history. Any inconsistency that you may see there
>>> may not easily be solved with longer simulation times.
>>>
>>> In short, statistical sampling is less of an issue than it used to be
>>> (the most common PMF methods are now at least two decades old!), but *picking
>>> a good CV* has always been "the big deal", where your insight into the
>>> specific problem matters the most.
>>>
>>> Giacomo
>>>
>>> On Tue, Apr 20, 2021 at 3:37 AM Miro Astore <miro.astore_at_gmail.com>
>>> wrote:
>>>
>>>> Hi Diship, you should be able to use the code/techniques in this
>>>> tutorial. Good luck.
>>>>
>>>> https://urldefense.com/v3/__https://www.plumed.org/doc-v2.5/user-doc/html/trieste-4.html__;!!DZ3fjg!qu2_rhPJjAUJMkx_8Wa9l_4t6T1jpu4YBlzw8sa8T06WzeT_KcZjWLqFPyLR63eLzQ$
>>>> <https://urldefense.com/v3/__https://www.plumed.org/doc-v2.5/user-doc/html/trieste-4.html__;!!DZ3fjg!sp8OPR83ZO7SNR9X9YYdw-9CIV1DXCkkqKUva3mWJ5o6ou3SWvonbwVzjmecWx3n9w$>
>>>>
>>>> On Tue, Apr 20, 2021 at 4:41 PM Diship Srivastava <
>>>> dishipsrivastava_at_gmail.com> wrote:
>>>>
>>>>> Hi,
>>>>> I have done well tempered metadynamics for a system consisting of a
>>>>> molecule insertion into a bilayer using the colvars module. I am
>>>>> interested in doing an error analysis for obtained free energy
>>>>> preferably using colvars.
>>>>> Any help would be most appreciated.
>>>>>
>>>>> Thanks in advance
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Diship Srivastava
>>>>> JRF
>>>>> Department of Chemistry
>>>>> IIT(ISM) - Dhanbad
>>>>> India
>>>>>
>>>>
>>>>
>>>> --
>>>> Miro A. Astore (he/him)
>>>> PhD Candidate | Computational Biophysics
>>>> Office 434 A28 School of Physics
>>>> University of Sydney
>>>>
>>> --
>>> --
>>> Dipl.-Phys. René Hafner
>>> TU Kaiserslautern
>>> Germany
>>>
>>>

This archive was generated by hypermail 2.1.6 : Fri Dec 31 2021 - 23:17:11 CST