Re: How to interpret Thermodynamic Integration output

From: Sadegh Faramarzi Ganjabad (safaramarziganjabad_at_mix.wvu.edu)
Date: Wed Apr 18 2018 - 14:57:56 CDT

Brian,

Thanks for explaining. I ran the extra lambda values between lambda = 0 and
0.125. Now the TI values drop more smoothly, but they are generally lower
than the original TI. I assume that is because the original simulation
wasn't long enough (?).

Sadegh Faramarzi,
Research Assistant
West Virginia University, Department of Chemistry
Email:safaramarziganjabad_at_mix.wvu.edu

On Fri, Mar 23, 2018 at 11:30 AM, Brian Radak <brian.radak_at_gmail.com> wrote:

>
>
> On Thu, Mar 22, 2018 at 3:13 PM, Sadegh Faramarzi Ganjabad <
> safaramarziganjabad_at_mix.wvu.edu> wrote:
>
>> Brian,
>>
>> I read that paper, although they discuss the theory rather than the
>> procedure to calculate TI energies from the output data. Anyways, I simply
>> plotted 'AVGBOND1 + AVGELECT1 + AVGVDW1 - (AVGBOND2 + AVGELECT2 + AVGVDW2)'
>> vs. lambda values. The plot looks smooth for each lambda values, except for
>> some noises when moving from one lambda to the next one. Here are the
>> averages and errors of the cumulative averages for all 16 lambda values
>>
>> 76.9±2.0 , 24.0±1.5 , 20.6± 1.7, 20.4± 1.8 , 22.3± 1.8 ,
>> 21.3± 2.6 , 21.5± 0.9 , 19.3± 1.0 , 16.6± 0.9 , 8.2± 0.9 ,
>> -6.0± 1.5 , -21.2± 2.3 , -15.5± 2.6 , -32.2± 1.7 , -21.1±
>> 1.8 , -32.5± 1.6
>>
>> As seen from data, from lambda = 0.0625 to 0.125 there is a sharp
>> decrease in TI values (from 76.9 to 24.0). I was wondering
>>
>> 1. What would be possible reasons for the discontinuity near lambda=
>> [0-0.125]
>>
>> There is generally a large derivative when an atom is almost entirely
> decoupled, especially for Lennard-Jones interactions. Using a non-zero
> alchVdwShiftCoeff (the default is 5.0 A**2) should help mitigate this.
> Otherwise there is no way around this.
>
> Really the problem here is sampling and has nothing to do with TI. The
> derivative has some dependence on lambda, but for very small values lambda
> has very little effect on the dynamics (at 0 it's just free diffusion + any
> bonded terms you might still have). This is a general problem in alchemical
> simulations and there is no general solution as of yet.
>
>
>> 2. How can I resolve it? if I am thinking of going back and running some
>> extra lambda values in between 0 and 0.125.
>>
>> This is recommended.
>
>
>> 3. As seen from the data above, for lambda > 0.5 I get negative values
>> for TI. So I can't calculate the logarithm for those values like this plot
>> on the manual
>>
>> http://www.ks.uiuc.edu/Research/namd/2.10/ug/node64.html
>>
>> should I consider the absolute values of TI?
>>
>> I have no idea why they chose a log plot other than for easier
> visualization. I would just use a logscale ("set logscale y" if you are
> using gnuplot).
>
>
>> 4. Regardless of the integration method, the area below the plot obtained
>> from the data above will give me DeltaG of the whole process. Is that right?
>>
>> By the Fundamental Theorem of Calculus, the integral of the mean
> derivative over an interval will give you the free energy change on that
> interval (presumably 0 to 1).
>
>
>> Thanks,
>> Sadegh Faramarzi,
>> Research Assistant
>> West Virginia University, Department of Chemistry
>> Email:safaramarziganjabad_at_mix.wvu.edu
>>
>> On Tue, Feb 27, 2018 at 2:00 PM, Brian Radak <brian.radak_at_gmail.com>
>> wrote:
>>
>>> Unfortunately most of what you conveyed is not at all correct - I
>>> strongly recommend you look at the extensive literature on thermodynamic
>>> integration. The most recent systematic work I can think of is the
>>> comparison paper by Paliwal and Shirts in JCTC, but several other more
>>> "seminal" papers exist.
>>>
>>> You only need 6 numbers from each simulation - the columns beginning
>>> with AVG. These are simply the cumulative average over the course of the
>>> simulation and the thing that you literally want to integrate. You can also
>>> average the block average components or just average the instantaneous
>>> values. I doubt that any of the noise between those methods is anywhere
>>> near the overall standard error unless your simulation is quite short.
>>>
>>> If you are using a staged integration scheme (i.e. alchBondLambdaEnd
>>> and/or alchVdwLambdaEnd != 1 and/or alchElecLambdaStart != 0 - NB This is
>>> NOT the default) then you have to consider multiple integrals over multiple
>>> intervals (because the integral formally vanishes even though the integrand
>>> does not).
>>>
>>> If you are using pure linear coupling (NOT the default), then the
>>> integrand is simply AVGBOND1 + AVGELECT1 + AVGVDW1 - (AVGBOND2 + AVGELECT2
>>> + AVGVDW2).
>>>
>>> If your lambda values are evenly spaced such that you have a fixed
>>> "deltaLambda" AND you have collected data at your endpoints (i.e. 0 and 1),
>>> then you can perform trapezoidal integration by taking a weighted sum of
>>> the integrand (the first and last value get a weight of 1/2 and the rest
>>> get a weight of 1) and then multiplying through by deltaLambda.
>>>
>>> I would be VERY careful with any more complicated integration scheme -
>>> the efficacy is extremely dependent on the shape of the integrand. Spline
>>> interpolation can be especially abysmal if you have very few points and may
>>> not be any better than trapezoidal integration IMO. Furthermore, there is
>>> no way to assess the systematic error of any integration method by doing
>>> anything other than adding more lambda values, but this also adds more
>>> statistical error.
>>>
>>> HTH,
>>> BKR
>>>
>>>
>>> On Tue, Feb 27, 2018 at 1:26 PM, Sadegh Faramarzi Ganjabad <
>>> safaramarziganjabad_at_mix.wvu.edu> wrote:
>>>
>>>> Brian and Josh,
>>>>
>>>> Thanks for your replies. As a rough analysis, I plotted the
>>>> 'non-averaged' columns vs. timesteps. The plots look noisy for some energy
>>>> components. Also, I multiplied each entry of the columns by 10 (since I
>>>> sampled every 10 timesteps), and summed them up. I believe that is the same
>>>> as integration; correct me if I'm wrong. Then I summed the sums of
>>>> BOND+ELECT+VDW columns for each group (I guess that is the total DeltaG).
>>>> The DeltaG is -10.9 kcal/mol for group 1 and -15.4 kcal/mol for group 2. I
>>>> assume in an ideal case they should be equal. If the procedure I explained
>>>> is correct, I wonder what is the best way to get rid of noisy data and
>>>> having two DeltaG values close together. I assume increasing the number of
>>>> lambdas form 16 to 30+ or increasing the length of lambdas from 20ns to
>>>> 40ns may help.
>>>>
>>>> Thanks and sorry if some of my questions look basic,
>>>> Sadegh
>>>>
>>>> On Sun, Feb 25, 2018 at 6:23 PM, Vermaas, Joshua <
>>>> Joshua.Vermaas_at_nrel.gov> wrote:
>>>>
>>>>> Hi Sadegh,
>>>>>
>>>>> To add to what Brian said, the way I tend to parse the output is in a
>>>>> two-step process. In the first step, you are only looking to extract the
>>>>> non-averaged columns of the output file, which may be a simple call to awk
>>>>> or numpy.loadtxt, depending on your comfort level in bash or python. Taking
>>>>> the non-averaged columns let you have some idea of how noisy your averaging
>>>>> process needs to be, and lets you rigorously compute error estimates for
>>>>> each point. From there, you have the value for dG/dlambda along your lambda
>>>>> coordinate, which you can plot/integrate to get your final answer. I would,
>>>>> however, NOT recommend a simple trapezoidal rule in general for the
>>>>> integration. While it is normally ok for TI, if you ever decide to switch
>>>>> to FEP, I find that a spline interpolation deals better with the spiky end
>>>>> points I see on occasion, so learning how to integrate an interpolated
>>>>> curve is a useful skill to have (see https://docs.scipy.org/doc/sci
>>>>> py/reference/generated/scipy.interpolate.BSpline.html#scipy.
>>>>> interpolate.BSpline).
>>>>>
>>>>> -Josh
>>>>>
>>>>> On 02/25/2018 11:52 AM, Brian Radak wrote:
>>>>> Hi Sadegh,
>>>>>
>>>>> Yes, this is indeed a slightly undocumented update in the latest
>>>>> version (though not much less documented than the old version).
>>>>>
>>>>> The derivatives are separated by alchemical groups (group 1 is scaled
>>>>> with lambda and group 2 is scaled with 1 - lambda, these are specified in
>>>>> the input PDB with 1 and -1).
>>>>>
>>>>> Note also that everything in the alchOutFile is an average. The normal
>>>>> output is a local/block average over the last alchOutFreq steps and the AVG
>>>>> columns are cumulative averages. There's really no good reason to have
>>>>> alchOutFreq as small as you have it - it just hurts performance due to
>>>>> frequent I/O.
>>>>>
>>>>> It is up to the user to decide what they want to do with the output.
>>>>> If you are using simple trapezoidal integration, you would have to scale
>>>>> each column according to the appropriate deltaLambda increment (which you
>>>>> have to compute yourself).
>>>>>
>>>>> HTH,
>>>>> BKR
>>>>>
>>>>> P.S. Note also that the instantaneous TI derivatives are now also sent
>>>>> to stdout according to outputEnergies. You can safely suppress all output
>>>>> to alchOutFile by setting alchOutFreq 0 and just use the stdout if you
>>>>> prefer - I sincerely doubt you would be able to detect a difference in the
>>>>> analysis of any reasonable calculation.
>>>>>
>>>>> On Fri, Feb 23, 2018 at 7:09 PM, Sadegh Faramarzi Ganjabad <
>>>>> safaramarziganjabad_at_mix.wvu.edu<mailto:safaramarziganjabad_at_mix.wvu.edu>>
>>>>> wrote:
>>>>> Hello all,
>>>>>
>>>>> I have done a TI calculation, but can't find any tutorial or document
>>>>> about how to interpret the output. Here is how my output file starts
>>>>>
>>>>> #TITITLE: TS BOND1 AVGBOND1 ELECT1
>>>>> AVGELECT1 VDW1 AVGVDW1 BOND2 AVGBOND2
>>>>> ELECT2 AVGELECT2 VDW2 AVGVDW2
>>>>> #NEW TI WINDOW: LAMBDA 0
>>>>> #PARTITION 1 SCALING: BOND 0 VDW 0 ELEC 0
>>>>> #PARTITION 2 SCALING: BOND 1 VDW 1 ELEC 1
>>>>> #CONSTANT TEMPERATURE: 310 K
>>>>> TI: 0 19.7158 19.7158 -31.3316
>>>>> -31.3316 -6.5855 -6.5855 12.4099 12.4099
>>>>> -40.2905 -40.2905 21.8477 21.8477
>>>>> TI: 10 19.4744 19.4964 -31.2402
>>>>> -31.2486 -6.5801 -6.5806 10.7752 10.9238
>>>>> -40.0014 -40.0277 18.3626 18.6795
>>>>> TI: 20 24.9347 22.0861 -29.0597
>>>>> -30.2062 -6.5748 -6.5778 10.0471 10.5063
>>>>> -38.9418 -39.5106 12.0912 15.5422
>>>>> TI: 30 32.2653 25.3697 -25.5063
>>>>> -28.6901 -6.5840 -6.5798 9.2265 10.0935
>>>>> -38.3273 -39.1289 10.4395 13.8962
>>>>> TI: 40 42.3413 29.5091 -20.9719
>>>>> -26.8077 -6.6005 -6.5848 9.9239 10.0521
>>>>> -38.7601 -39.0389 13.5623 13.8147
>>>>> TI: 50 58.5311 35.1997 -15.9560
>>>>> -24.6799 -6.6079 -6.5894 11.9345 10.4212
>>>>> -39.2660 -39.0835 12.5075 13.5584
>>>>> TI: 60 80.3605 42.6031 -12.8759
>>>>> -22.7448 -6.6090 -6.5926 10.8168 10.4861
>>>>> -39.5107 -39.1535 6.7353 12.4399
>>>>> TI: 70 102.0040 50.9694 -14.6704
>>>>> -21.6076 -6.6234 -6.5969 10.6856 10.5142
>>>>> -39.9275 -39.2625 5.4385 11.4538
>>>>> TI: 80 119.1639 59.3885 -20.0657
>>>>> -21.4172 -6.6514 -6.6037 12.3398 10.7396
>>>>> -39.4442 -39.2849 8.3762 11.0738
>>>>> TI: 90 130.3328 67.1846 -23.9356
>>>>> -21.6940 -6.6710 -6.6111 9.7465 10.6304
>>>>> -37.6627 -39.1067 11.6755 11.1399
>>>>> #100 STEPS OF EQUILIBRATION AT LAMBDA 0 COMPLETED
>>>>> #STARTING COLLECTION OF ENSEMBLE AVERAGE
>>>>> TI: 100 127.3925 127.3925 -23.4962
>>>>> -23.4962 -6.6485 -6.6485 9.9449 9.9449
>>>>> -37.7934 -37.7934 9.3023 9.3023
>>>>> TI: 110 123.5767 123.9236 -23.1205
>>>>> -23.1546 -6.6329 -6.6343 10.3050 10.2723
>>>>> -39.3857 -39.2410 6.6564 6.8969
>>>>>
>>>>> It seems like BOND1 and BOND2 etc. are for the initial and final
>>>>> values of lambda, respectively. If so, is AVGBOND1+AVGELECT1+AVGVDW1 the
>>>>> total TI energy for each timestep? then if I sum the total energies of all
>>>>> timesteps I will get the total TI for the total transition from lambda=0 to
>>>>> lambda=1?
>>>>>
>>>>> Thanks in advance,
>>>>> Sadegh
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>

This archive was generated by hypermail 2.1.6 : Mon Dec 31 2018 - 23:21:01 CST