Re: How to interpret Thermodynamic Integration output

From: Brian Radak (brian.radak_at_gmail.com)
Date: Tue Feb 27 2018 - 13:00:48 CST

Next message: Francesco Pietra: "FEP minimum requirements"
Previous message: Sadegh Faramarzi Ganjabad: "Re: How to interpret Thermodynamic Integration output"
In reply to: Sadegh Faramarzi Ganjabad: "Re: How to interpret Thermodynamic Integration output"
Next in thread: Sadegh Faramarzi Ganjabad: "Re: How to interpret Thermodynamic Integration output"
Reply: Sadegh Faramarzi Ganjabad: "Re: How to interpret Thermodynamic Integration output"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ] [ attachment ]

Unfortunately most of what you conveyed is not at all correct - I strongly
recommend you look at the extensive literature on thermodynamic
integration. The most recent systematic work I can think of is the
comparison paper by Paliwal and Shirts in JCTC, but several other more
"seminal" papers exist.

You only need 6 numbers from each simulation - the columns beginning with
AVG. These are simply the cumulative average over the course of the
simulation and the thing that you literally want to integrate. You can also
average the block average components or just average the instantaneous
values. I doubt that any of the noise between those methods is anywhere
near the overall standard error unless your simulation is quite short.

If you are using a staged integration scheme (i.e. alchBondLambdaEnd and/or
alchVdwLambdaEnd != 1 and/or alchElecLambdaStart != 0 - NB This is NOT the
default) then you have to consider multiple integrals over multiple
intervals (because the integral formally vanishes even though the integrand
does not).

If you are using pure linear coupling (NOT the default), then the integrand
is simply AVGBOND1 + AVGELECT1 + AVGVDW1 - (AVGBOND2 + AVGELECT2 + AVGVDW2).

If your lambda values are evenly spaced such that you have a fixed
"deltaLambda" AND you have collected data at your endpoints (i.e. 0 and 1),
then you can perform trapezoidal integration by taking a weighted sum of
the integrand (the first and last value get a weight of 1/2 and the rest
get a weight of 1) and then multiplying through by deltaLambda.

I would be VERY careful with any more complicated integration scheme - the
efficacy is extremely dependent on the shape of the integrand. Spline
interpolation can be especially abysmal if you have very few points and may
not be any better than trapezoidal integration IMO. Furthermore, there is
no way to assess the systematic error of any integration method by doing
anything other than adding more lambda values, but this also adds more
statistical error.

HTH,
BKR

On Tue, Feb 27, 2018 at 1:26 PM, Sadegh Faramarzi Ganjabad <
safaramarziganjabad_at_mix.wvu.edu> wrote:

> Brian and Josh,
>
> Thanks for your replies. As a rough analysis, I plotted the 'non-averaged'
> columns vs. timesteps. The plots look noisy for some energy components.
> Also, I multiplied each entry of the columns by 10 (since I sampled every
> 10 timesteps), and summed them up. I believe that is the same as
> integration; correct me if I'm wrong. Then I summed the sums of
> BOND+ELECT+VDW columns for each group (I guess that is the total DeltaG).
> The DeltaG is -10.9 kcal/mol for group 1 and -15.4 kcal/mol for group 2. I
> assume in an ideal case they should be equal. If the procedure I explained
> is correct, I wonder what is the best way to get rid of noisy data and
> having two DeltaG values close together. I assume increasing the number of
> lambdas form 16 to 30+ or increasing the length of lambdas from 20ns to
> 40ns may help.
>
> Thanks and sorry if some of my questions look basic,
> Sadegh
>
> On Sun, Feb 25, 2018 at 6:23 PM, Vermaas, Joshua <Joshua.Vermaas_at_nrel.gov>
> wrote:
>
>> Hi Sadegh,
>>
>> To add to what Brian said, the way I tend to parse the output is in a
>> two-step process. In the first step, you are only looking to extract the
>> non-averaged columns of the output file, which may be a simple call to awk
>> or numpy.loadtxt, depending on your comfort level in bash or python. Taking
>> the non-averaged columns let you have some idea of how noisy your averaging
>> process needs to be, and lets you rigorously compute error estimates for
>> each point. From there, you have the value for dG/dlambda along your lambda
>> coordinate, which you can plot/integrate to get your final answer. I would,
>> however, NOT recommend a simple trapezoidal rule in general for the
>> integration. While it is normally ok for TI, if you ever decide to switch
>> to FEP, I find that a spline interpolation deals better with the spiky end
>> points I see on occasion, so learning how to integrate an interpolated
>> curve is a useful skill to have (see https://docs.scipy.org/doc/sci
>> py/reference/generated/scipy.interpolate.BSpline.html#
>> scipy.interpolate.BSpline).
>>
>> -Josh
>>
>> On 02/25/2018 11:52 AM, Brian Radak wrote:
>> Hi Sadegh,
>>
>> Yes, this is indeed a slightly undocumented update in the latest version
>> (though not much less documented than the old version).
>>
>> The derivatives are separated by alchemical groups (group 1 is scaled
>> with lambda and group 2 is scaled with 1 - lambda, these are specified in
>> the input PDB with 1 and -1).
>>
>> Note also that everything in the alchOutFile is an average. The normal
>> output is a local/block average over the last alchOutFreq steps and the AVG
>> columns are cumulative averages. There's really no good reason to have
>> alchOutFreq as small as you have it - it just hurts performance due to
>> frequent I/O.
>>
>> It is up to the user to decide what they want to do with the output. If
>> you are using simple trapezoidal integration, you would have to scale each
>> column according to the appropriate deltaLambda increment (which you have
>> to compute yourself).
>>
>> HTH,
>> BKR
>>
>> P.S. Note also that the instantaneous TI derivatives are now also sent to
>> stdout according to outputEnergies. You can safely suppress all output to
>> alchOutFile by setting alchOutFreq 0 and just use the stdout if you prefer
>> - I sincerely doubt you would be able to detect a difference in the
>> analysis of any reasonable calculation.
>>
>> On Fri, Feb 23, 2018 at 7:09 PM, Sadegh Faramarzi Ganjabad <
>> safaramarziganjabad_at_mix.wvu.edu<mailto:safaramarziganjabad_at_mix.wvu.edu>>
>> wrote:
>> Hello all,
>>
>> I have done a TI calculation, but can't find any tutorial or document
>> about how to interpret the output. Here is how my output file starts
>>
>> #TITITLE: TS BOND1 AVGBOND1 ELECT1
>> AVGELECT1 VDW1 AVGVDW1 BOND2 AVGBOND2
>> ELECT2 AVGELECT2 VDW2 AVGVDW2
>> #NEW TI WINDOW: LAMBDA 0
>> #PARTITION 1 SCALING: BOND 0 VDW 0 ELEC 0
>> #PARTITION 2 SCALING: BOND 1 VDW 1 ELEC 1
>> #CONSTANT TEMPERATURE: 310 K
>> TI: 0 19.7158 19.7158 -31.3316
>> -31.3316 -6.5855 -6.5855 12.4099 12.4099
>> -40.2905 -40.2905 21.8477 21.8477
>> TI: 10 19.4744 19.4964 -31.2402
>> -31.2486 -6.5801 -6.5806 10.7752 10.9238
>> -40.0014 -40.0277 18.3626 18.6795
>> TI: 20 24.9347 22.0861 -29.0597
>> -30.2062 -6.5748 -6.5778 10.0471 10.5063
>> -38.9418 -39.5106 12.0912 15.5422
>> TI: 30 32.2653 25.3697 -25.5063
>> -28.6901 -6.5840 -6.5798 9.2265 10.0935
>> -38.3273 -39.1289 10.4395 13.8962
>> TI: 40 42.3413 29.5091 -20.9719
>> -26.8077 -6.6005 -6.5848 9.9239 10.0521
>> -38.7601 -39.0389 13.5623 13.8147
>> TI: 50 58.5311 35.1997 -15.9560
>> -24.6799 -6.6079 -6.5894 11.9345 10.4212
>> -39.2660 -39.0835 12.5075 13.5584
>> TI: 60 80.3605 42.6031 -12.8759
>> -22.7448 -6.6090 -6.5926 10.8168 10.4861
>> -39.5107 -39.1535 6.7353 12.4399
>> TI: 70 102.0040 50.9694 -14.6704
>> -21.6076 -6.6234 -6.5969 10.6856 10.5142
>> -39.9275 -39.2625 5.4385 11.4538
>> TI: 80 119.1639 59.3885 -20.0657
>> -21.4172 -6.6514 -6.6037 12.3398 10.7396
>> -39.4442 -39.2849 8.3762 11.0738
>> TI: 90 130.3328 67.1846 -23.9356
>> -21.6940 -6.6710 -6.6111 9.7465 10.6304
>> -37.6627 -39.1067 11.6755 11.1399
>> #100 STEPS OF EQUILIBRATION AT LAMBDA 0 COMPLETED
>> #STARTING COLLECTION OF ENSEMBLE AVERAGE
>> TI: 100 127.3925 127.3925 -23.4962
>> -23.4962 -6.6485 -6.6485 9.9449 9.9449
>> -37.7934 -37.7934 9.3023 9.3023
>> TI: 110 123.5767 123.9236 -23.1205
>> -23.1546 -6.6329 -6.6343 10.3050 10.2723
>> -39.3857 -39.2410 6.6564 6.8969
>>
>> It seems like BOND1 and BOND2 etc. are for the initial and final values
>> of lambda, respectively. If so, is AVGBOND1+AVGELECT1+AVGVDW1 the total TI
>> energy for each timestep? then if I sum the total energies of all timesteps
>> I will get the total TI for the total transition from lambda=0 to lambda=1?
>>
>> Thanks in advance,
>> Sadegh
>>
>>
>>
>>
>

Next message: Francesco Pietra: "FEP minimum requirements"
Previous message: Sadegh Faramarzi Ganjabad: "Re: How to interpret Thermodynamic Integration output"
In reply to: Sadegh Faramarzi Ganjabad: "Re: How to interpret Thermodynamic Integration output"
Next in thread: Sadegh Faramarzi Ganjabad: "Re: How to interpret Thermodynamic Integration output"
Reply: Sadegh Faramarzi Ganjabad: "Re: How to interpret Thermodynamic Integration output"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ] [ attachment ]

This archive was generated by hypermail 2.1.6 : Mon Dec 31 2018 - 23:20:53 CST