Re: How to interpret Thermodynamic Integration output

From: Brian Radak (brian.radak_at_gmail.com)
Date: Thu Apr 19 2018 - 09:15:32 CDT

Next message: Stefano Guglielmo: "torsion scan"
Previous message: Suresh Ghimire: "variation of pressure"
In reply to: Sadegh Faramarzi Ganjabad: "Re: How to interpret Thermodynamic Integration output"
Next in thread: Sadegh Faramarzi Ganjabad: "Re: How to interpret Thermodynamic Integration output"
Reply: Sadegh Faramarzi Ganjabad: "Re: How to interpret Thermodynamic Integration output"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ] [ attachment ]

It's hard to say. One is inclined to believe that the force along the
coupling coordinate (lambda) is smooth and continuous, but I can't recall a
proof that says this must be true. It almost certainly depends on the path.

Running simulations at lambda = 0 is a fraught proposition. It has nothing
to do with TI or FEP per se, but rather the fact that you are asking MD,
which samples phase space according to the force, to sample without forces
(since the ligand is completely decoupled). This will almost assuredly
yield nonsense unless you provide additional restraints - I assume you are
doing this?

On Wed, Apr 18, 2018, 3:58 PM Sadegh Faramarzi Ganjabad <
safaramarziganjabad_at_mix.wvu.edu> wrote:

> Brian,
>
> Thanks for explaining. I ran the extra lambda values between lambda = 0
> and 0.125. Now the TI values drop more smoothly, but they are generally
> lower than the original TI. I assume that is because the original
> simulation wasn't long enough (?).
>
> Sadegh Faramarzi,
> Research Assistant
> West Virginia University, Department of Chemistry
> Email:safaramarziganjabad_at_mix.wvu.edu
>
> On Fri, Mar 23, 2018 at 11:30 AM, Brian Radak <brian.radak_at_gmail.com>
> wrote:
>
>>
>>
>> On Thu, Mar 22, 2018 at 3:13 PM, Sadegh Faramarzi Ganjabad <
>> safaramarziganjabad_at_mix.wvu.edu> wrote:
>>
>>> Brian,
>>>
>>> I read that paper, although they discuss the theory rather than the
>>> procedure to calculate TI energies from the output data. Anyways, I simply
>>> plotted 'AVGBOND1 + AVGELECT1 + AVGVDW1 - (AVGBOND2 + AVGELECT2 + AVGVDW2)'
>>> vs. lambda values. The plot looks smooth for each lambda values, except for
>>> some noises when moving from one lambda to the next one. Here are the
>>> averages and errors of the cumulative averages for all 16 lambda values
>>>
>>> 76.9±2.0 , 24.0±1.5 , 20.6± 1.7, 20.4± 1.8 , 22.3± 1.8 ,
>>> 21.3± 2.6 , 21.5± 0.9 , 19.3± 1.0 , 16.6± 0.9 , 8.2± 0.9 ,
>>> -6.0± 1.5 , -21.2± 2.3 , -15.5± 2.6 , -32.2± 1.7 , -21.1±
>>> 1.8 , -32.5± 1.6
>>>
>>> As seen from data, from lambda = 0.0625 to 0.125 there is a sharp
>>> decrease in TI values (from 76.9 to 24.0). I was wondering
>>>
>>> 1. What would be possible reasons for the discontinuity near lambda=
>>> [0-0.125]
>>>
>>> There is generally a large derivative when an atom is almost entirely
>> decoupled, especially for Lennard-Jones interactions. Using a non-zero
>> alchVdwShiftCoeff (the default is 5.0 A**2) should help mitigate this.
>> Otherwise there is no way around this.
>>
>> Really the problem here is sampling and has nothing to do with TI. The
>> derivative has some dependence on lambda, but for very small values lambda
>> has very little effect on the dynamics (at 0 it's just free diffusion + any
>> bonded terms you might still have). This is a general problem in alchemical
>> simulations and there is no general solution as of yet.
>>
>>
>>> 2. How can I resolve it? if I am thinking of going back and running some
>>> extra lambda values in between 0 and 0.125.
>>>
>>> This is recommended.
>>
>>
>>> 3. As seen from the data above, for lambda > 0.5 I get negative values
>>> for TI. So I can't calculate the logarithm for those values like this plot
>>> on the manual
>>>
>>> http://www.ks.uiuc.edu/Research/namd/2.10/ug/node64.html
>>>
>>> should I consider the absolute values of TI?
>>>
>>> I have no idea why they chose a log plot other than for easier
>> visualization. I would just use a logscale ("set logscale y" if you are
>> using gnuplot).
>>
>>
>>> 4. Regardless of the integration method, the area below the plot
>>> obtained from the data above will give me DeltaG of the whole process. Is
>>> that right?
>>>
>>> By the Fundamental Theorem of Calculus, the integral of the mean
>> derivative over an interval will give you the free energy change on that
>> interval (presumably 0 to 1).
>>
>>
>>> Thanks,
>>> Sadegh Faramarzi,
>>> Research Assistant
>>> West Virginia University, Department of Chemistry
>>> Email:safaramarziganjabad_at_mix.wvu.edu
>>>
>>> On Tue, Feb 27, 2018 at 2:00 PM, Brian Radak <brian.radak_at_gmail.com>
>>> wrote:
>>>
>>>> Unfortunately most of what you conveyed is not at all correct - I
>>>> strongly recommend you look at the extensive literature on thermodynamic
>>>> integration. The most recent systematic work I can think of is the
>>>> comparison paper by Paliwal and Shirts in JCTC, but several other more
>>>> "seminal" papers exist.
>>>>
>>>> You only need 6 numbers from each simulation - the columns beginning
>>>> with AVG. These are simply the cumulative average over the course of the
>>>> simulation and the thing that you literally want to integrate. You can also
>>>> average the block average components or just average the instantaneous
>>>> values. I doubt that any of the noise between those methods is anywhere
>>>> near the overall standard error unless your simulation is quite short.
>>>>
>>>> If you are using a staged integration scheme (i.e. alchBondLambdaEnd
>>>> and/or alchVdwLambdaEnd != 1 and/or alchElecLambdaStart != 0 - NB This is
>>>> NOT the default) then you have to consider multiple integrals over multiple
>>>> intervals (because the integral formally vanishes even though the integrand
>>>> does not).
>>>>
>>>> If you are using pure linear coupling (NOT the default), then the
>>>> integrand is simply AVGBOND1 + AVGELECT1 + AVGVDW1 - (AVGBOND2 + AVGELECT2
>>>> + AVGVDW2).
>>>>
>>>> If your lambda values are evenly spaced such that you have a fixed
>>>> "deltaLambda" AND you have collected data at your endpoints (i.e. 0 and 1),
>>>> then you can perform trapezoidal integration by taking a weighted sum of
>>>> the integrand (the first and last value get a weight of 1/2 and the rest
>>>> get a weight of 1) and then multiplying through by deltaLambda.
>>>>
>>>> I would be VERY careful with any more complicated integration scheme -
>>>> the efficacy is extremely dependent on the shape of the integrand. Spline
>>>> interpolation can be especially abysmal if you have very few points and may
>>>> not be any better than trapezoidal integration IMO. Furthermore, there is
>>>> no way to assess the systematic error of any integration method by doing
>>>> anything other than adding more lambda values, but this also adds more
>>>> statistical error.
>>>>
>>>> HTH,
>>>> BKR
>>>>
>>>>
>>>> On Tue, Feb 27, 2018 at 1:26 PM, Sadegh Faramarzi Ganjabad <
>>>> safaramarziganjabad_at_mix.wvu.edu> wrote:
>>>>
>>>>> Brian and Josh,
>>>>>
>>>>> Thanks for your replies. As a rough analysis, I plotted the
>>>>> 'non-averaged' columns vs. timesteps. The plots look noisy for some energy
>>>>> components. Also, I multiplied each entry of the columns by 10 (since I
>>>>> sampled every 10 timesteps), and summed them up. I believe that is the same
>>>>> as integration; correct me if I'm wrong. Then I summed the sums of
>>>>> BOND+ELECT+VDW columns for each group (I guess that is the total DeltaG).
>>>>> The DeltaG is -10.9 kcal/mol for group 1 and -15.4 kcal/mol for group 2. I
>>>>> assume in an ideal case they should be equal. If the procedure I explained
>>>>> is correct, I wonder what is the best way to get rid of noisy data and
>>>>> having two DeltaG values close together. I assume increasing the number of
>>>>> lambdas form 16 to 30+ or increasing the length of lambdas from 20ns to
>>>>> 40ns may help.
>>>>>
>>>>> Thanks and sorry if some of my questions look basic,
>>>>> Sadegh
>>>>>
>>>>> On Sun, Feb 25, 2018 at 6:23 PM, Vermaas, Joshua <
>>>>> Joshua.Vermaas_at_nrel.gov> wrote:
>>>>>
>>>>>> Hi Sadegh,
>>>>>>
>>>>>> To add to what Brian said, the way I tend to parse the output is in a
>>>>>> two-step process. In the first step, you are only looking to extract the
>>>>>> non-averaged columns of the output file, which may be a simple call to awk
>>>>>> or numpy.loadtxt, depending on your comfort level in bash or python. Taking
>>>>>> the non-averaged columns let you have some idea of how noisy your averaging
>>>>>> process needs to be, and lets you rigorously compute error estimates for
>>>>>> each point. From there, you have the value for dG/dlambda along your lambda
>>>>>> coordinate, which you can plot/integrate to get your final answer. I would,
>>>>>> however, NOT recommend a simple trapezoidal rule in general for the
>>>>>> integration. While it is normally ok for TI, if you ever decide to switch
>>>>>> to FEP, I find that a spline interpolation deals better with the spiky end
>>>>>> points I see on occasion, so learning how to integrate an interpolated
>>>>>> curve is a useful skill to have (see
>>>>>> https://docs.scipy.org/doc/scipy/reference/generated/scipy.interpolate.BSpline.html#scipy.interpolate.BSpline
>>>>>> ).
>>>>>>
>>>>>> -Josh
>>>>>>
>>>>>> On 02/25/2018 11:52 AM, Brian Radak wrote:
>>>>>> Hi Sadegh,
>>>>>>
>>>>>> Yes, this is indeed a slightly undocumented update in the latest
>>>>>> version (though not much less documented than the old version).
>>>>>>
>>>>>> The derivatives are separated by alchemical groups (group 1 is scaled
>>>>>> with lambda and group 2 is scaled with 1 - lambda, these are specified in
>>>>>> the input PDB with 1 and -1).
>>>>>>
>>>>>> Note also that everything in the alchOutFile is an average. The
>>>>>> normal output is a local/block average over the last alchOutFreq steps and
>>>>>> the AVG columns are cumulative averages. There's really no good reason to
>>>>>> have alchOutFreq as small as you have it - it just hurts performance due to
>>>>>> frequent I/O.
>>>>>>
>>>>>> It is up to the user to decide what they want to do with the output.
>>>>>> If you are using simple trapezoidal integration, you would have to scale
>>>>>> each column according to the appropriate deltaLambda increment (which you
>>>>>> have to compute yourself).
>>>>>>
>>>>>> HTH,
>>>>>> BKR
>>>>>>
>>>>>> P.S. Note also that the instantaneous TI derivatives are now also
>>>>>> sent to stdout according to outputEnergies. You can safely suppress all
>>>>>> output to alchOutFile by setting alchOutFreq 0 and just use the stdout if
>>>>>> you prefer - I sincerely doubt you would be able to detect a difference in
>>>>>> the analysis of any reasonable calculation.
>>>>>>
>>>>>> On Fri, Feb 23, 2018 at 7:09 PM, Sadegh Faramarzi Ganjabad <
>>>>>> safaramarziganjabad_at_mix.wvu.edu<mailto:
>>>>>> safaramarziganjabad_at_mix.wvu.edu>> wrote:
>>>>>> Hello all,
>>>>>>
>>>>>> I have done a TI calculation, but can't find any tutorial or document
>>>>>> about how to interpret the output. Here is how my output file starts
>>>>>>
>>>>>> #TITITLE: TS BOND1 AVGBOND1 ELECT1
>>>>>> AVGELECT1 VDW1 AVGVDW1 BOND2 AVGBOND2
>>>>>> ELECT2 AVGELECT2 VDW2 AVGVDW2
>>>>>> #NEW TI WINDOW: LAMBDA 0
>>>>>> #PARTITION 1 SCALING: BOND 0 VDW 0 ELEC 0
>>>>>> #PARTITION 2 SCALING: BOND 1 VDW 1 ELEC 1
>>>>>> #CONSTANT TEMPERATURE: 310 K
>>>>>> TI: 0 19.7158 19.7158 -31.3316
>>>>>> -31.3316 -6.5855 -6.5855 12.4099 12.4099
>>>>>> -40.2905 -40.2905 21.8477 21.8477
>>>>>> TI: 10 19.4744 19.4964 -31.2402
>>>>>> -31.2486 -6.5801 -6.5806 10.7752 10.9238
>>>>>> -40.0014 -40.0277 18.3626 18.6795
>>>>>> TI: 20 24.9347 22.0861 -29.0597
>>>>>> -30.2062 -6.5748 -6.5778 10.0471 10.5063
>>>>>> -38.9418 -39.5106 12.0912 15.5422
>>>>>> TI: 30 32.2653 25.3697 -25.5063
>>>>>> -28.6901 -6.5840 -6.5798 9.2265 10.0935
>>>>>> -38.3273 -39.1289 10.4395 13.8962
>>>>>> TI: 40 42.3413 29.5091 -20.9719
>>>>>> -26.8077 -6.6005 -6.5848 9.9239 10.0521
>>>>>> -38.7601 -39.0389 13.5623 13.8147
>>>>>> TI: 50 58.5311 35.1997 -15.9560
>>>>>> -24.6799 -6.6079 -6.5894 11.9345 10.4212
>>>>>> -39.2660 -39.0835 12.5075 13.5584
>>>>>> TI: 60 80.3605 42.6031 -12.8759
>>>>>> -22.7448 -6.6090 -6.5926 10.8168 10.4861
>>>>>> -39.5107 -39.1535 6.7353 12.4399
>>>>>> TI: 70 102.0040 50.9694 -14.6704
>>>>>> -21.6076 -6.6234 -6.5969 10.6856 10.5142
>>>>>> -39.9275 -39.2625 5.4385 11.4538
>>>>>> TI: 80 119.1639 59.3885 -20.0657
>>>>>> -21.4172 -6.6514 -6.6037 12.3398 10.7396
>>>>>> -39.4442 -39.2849 8.3762 11.0738
>>>>>> TI: 90 130.3328 67.1846 -23.9356
>>>>>> -21.6940 -6.6710 -6.6111 9.7465 10.6304
>>>>>> -37.6627 -39.1067 11.6755 11.1399
>>>>>> #100 STEPS OF EQUILIBRATION AT LAMBDA 0 COMPLETED
>>>>>> #STARTING COLLECTION OF ENSEMBLE AVERAGE
>>>>>> TI: 100 127.3925 127.3925 -23.4962
>>>>>> -23.4962 -6.6485 -6.6485 9.9449 9.9449
>>>>>> -37.7934 -37.7934 9.3023 9.3023
>>>>>> TI: 110 123.5767 123.9236 -23.1205
>>>>>> -23.1546 -6.6329 -6.6343 10.3050 10.2723
>>>>>> -39.3857 -39.2410 6.6564 6.8969
>>>>>>
>>>>>> It seems like BOND1 and BOND2 etc. are for the initial and final
>>>>>> values of lambda, respectively. If so, is AVGBOND1+AVGELECT1+AVGVDW1 the
>>>>>> total TI energy for each timestep? then if I sum the total energies of all
>>>>>> timesteps I will get the total TI for the total transition from lambda=0 to
>>>>>> lambda=1?
>>>>>>
>>>>>> Thanks in advance,
>>>>>> Sadegh
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>

Next message: Stefano Guglielmo: "torsion scan"
Previous message: Suresh Ghimire: "variation of pressure"
In reply to: Sadegh Faramarzi Ganjabad: "Re: How to interpret Thermodynamic Integration output"
Next in thread: Sadegh Faramarzi Ganjabad: "Re: How to interpret Thermodynamic Integration output"
Reply: Sadegh Faramarzi Ganjabad: "Re: How to interpret Thermodynamic Integration output"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ] [ attachment ]

This archive was generated by hypermail 2.1.6 : Mon Dec 31 2018 - 23:21:01 CST