Re: How to interpret Thermodynamic Integration output

From: Brian Radak (brian.radak_at_gmail.com)
Date: Fri Mar 23 2018 - 10:30:26 CDT

On Thu, Mar 22, 2018 at 3:13 PM, Sadegh Faramarzi Ganjabad <
safaramarziganjabad_at_mix.wvu.edu> wrote:

> Brian,
>
> I read that paper, although they discuss the theory rather than the
> procedure to calculate TI energies from the output data. Anyways, I simply
> plotted 'AVGBOND1 + AVGELECT1 + AVGVDW1 - (AVGBOND2 + AVGELECT2 + AVGVDW2)'
> vs. lambda values. The plot looks smooth for each lambda values, except for
> some noises when moving from one lambda to the next one. Here are the
> averages and errors of the cumulative averages for all 16 lambda values
>
> 76.9±2.0 , 24.0±1.5 , 20.6± 1.7, 20.4± 1.8 , 22.3± 1.8 ,
> 21.3± 2.6 , 21.5± 0.9 , 19.3± 1.0 , 16.6± 0.9 , 8.2± 0.9 ,
> -6.0± 1.5 , -21.2± 2.3 , -15.5± 2.6 , -32.2± 1.7 , -21.1±
> 1.8 , -32.5± 1.6
>
> As seen from data, from lambda = 0.0625 to 0.125 there is a sharp decrease
> in TI values (from 76.9 to 24.0). I was wondering
>
> 1. What would be possible reasons for the discontinuity near lambda=
> [0-0.125]
>
> There is generally a large derivative when an atom is almost entirely
decoupled, especially for Lennard-Jones interactions. Using a non-zero
alchVdwShiftCoeff (the default is 5.0 A**2) should help mitigate this.
Otherwise there is no way around this.

Really the problem here is sampling and has nothing to do with TI. The
derivative has some dependence on lambda, but for very small values lambda
has very little effect on the dynamics (at 0 it's just free diffusion + any
bonded terms you might still have). This is a general problem in alchemical
simulations and there is no general solution as of yet.

> 2. How can I resolve it? if I am thinking of going back and running some
> extra lambda values in between 0 and 0.125.
>
> This is recommended.

> 3. As seen from the data above, for lambda > 0.5 I get negative values for
> TI. So I can't calculate the logarithm for those values like this plot on
> the manual
>
> http://www.ks.uiuc.edu/Research/namd/2.10/ug/node64.html
>
> should I consider the absolute values of TI?
>
> I have no idea why they chose a log plot other than for easier
visualization. I would just use a logscale ("set logscale y" if you are
using gnuplot).

> 4. Regardless of the integration method, the area below the plot obtained
> from the data above will give me DeltaG of the whole process. Is that right?
>
> By the Fundamental Theorem of Calculus, the integral of the mean
derivative over an interval will give you the free energy change on that
interval (presumably 0 to 1).

> Thanks,
> Sadegh Faramarzi,
> Research Assistant
> West Virginia University, Department of Chemistry
> Email:safaramarziganjabad_at_mix.wvu.edu
>
> On Tue, Feb 27, 2018 at 2:00 PM, Brian Radak <brian.radak_at_gmail.com>
> wrote:
>
>> Unfortunately most of what you conveyed is not at all correct - I
>> strongly recommend you look at the extensive literature on thermodynamic
>> integration. The most recent systematic work I can think of is the
>> comparison paper by Paliwal and Shirts in JCTC, but several other more
>> "seminal" papers exist.
>>
>> You only need 6 numbers from each simulation - the columns beginning with
>> AVG. These are simply the cumulative average over the course of the
>> simulation and the thing that you literally want to integrate. You can also
>> average the block average components or just average the instantaneous
>> values. I doubt that any of the noise between those methods is anywhere
>> near the overall standard error unless your simulation is quite short.
>>
>> If you are using a staged integration scheme (i.e. alchBondLambdaEnd
>> and/or alchVdwLambdaEnd != 1 and/or alchElecLambdaStart != 0 - NB This is
>> NOT the default) then you have to consider multiple integrals over multiple
>> intervals (because the integral formally vanishes even though the integrand
>> does not).
>>
>> If you are using pure linear coupling (NOT the default), then the
>> integrand is simply AVGBOND1 + AVGELECT1 + AVGVDW1 - (AVGBOND2 + AVGELECT2
>> + AVGVDW2).
>>
>> If your lambda values are evenly spaced such that you have a fixed
>> "deltaLambda" AND you have collected data at your endpoints (i.e. 0 and 1),
>> then you can perform trapezoidal integration by taking a weighted sum of
>> the integrand (the first and last value get a weight of 1/2 and the rest
>> get a weight of 1) and then multiplying through by deltaLambda.
>>
>> I would be VERY careful with any more complicated integration scheme -
>> the efficacy is extremely dependent on the shape of the integrand. Spline
>> interpolation can be especially abysmal if you have very few points and may
>> not be any better than trapezoidal integration IMO. Furthermore, there is
>> no way to assess the systematic error of any integration method by doing
>> anything other than adding more lambda values, but this also adds more
>> statistical error.
>>
>> HTH,
>> BKR
>>
>>
>> On Tue, Feb 27, 2018 at 1:26 PM, Sadegh Faramarzi Ganjabad <
>> safaramarziganjabad_at_mix.wvu.edu> wrote:
>>
>>> Brian and Josh,
>>>
>>> Thanks for your replies. As a rough analysis, I plotted the
>>> 'non-averaged' columns vs. timesteps. The plots look noisy for some energy
>>> components. Also, I multiplied each entry of the columns by 10 (since I
>>> sampled every 10 timesteps), and summed them up. I believe that is the same
>>> as integration; correct me if I'm wrong. Then I summed the sums of
>>> BOND+ELECT+VDW columns for each group (I guess that is the total DeltaG).
>>> The DeltaG is -10.9 kcal/mol for group 1 and -15.4 kcal/mol for group 2. I
>>> assume in an ideal case they should be equal. If the procedure I explained
>>> is correct, I wonder what is the best way to get rid of noisy data and
>>> having two DeltaG values close together. I assume increasing the number of
>>> lambdas form 16 to 30+ or increasing the length of lambdas from 20ns to
>>> 40ns may help.
>>>
>>> Thanks and sorry if some of my questions look basic,
>>> Sadegh
>>>
>>> On Sun, Feb 25, 2018 at 6:23 PM, Vermaas, Joshua <
>>> Joshua.Vermaas_at_nrel.gov> wrote:
>>>
>>>> Hi Sadegh,
>>>>
>>>> To add to what Brian said, the way I tend to parse the output is in a
>>>> two-step process. In the first step, you are only looking to extract the
>>>> non-averaged columns of the output file, which may be a simple call to awk
>>>> or numpy.loadtxt, depending on your comfort level in bash or python. Taking
>>>> the non-averaged columns let you have some idea of how noisy your averaging
>>>> process needs to be, and lets you rigorously compute error estimates for
>>>> each point. From there, you have the value for dG/dlambda along your lambda
>>>> coordinate, which you can plot/integrate to get your final answer. I would,
>>>> however, NOT recommend a simple trapezoidal rule in general for the
>>>> integration. While it is normally ok for TI, if you ever decide to switch
>>>> to FEP, I find that a spline interpolation deals better with the spiky end
>>>> points I see on occasion, so learning how to integrate an interpolated
>>>> curve is a useful skill to have (see https://docs.scipy.org/doc/sci
>>>> py/reference/generated/scipy.interpolate.BSpline.html#scipy.
>>>> interpolate.BSpline).
>>>>
>>>> -Josh
>>>>
>>>> On 02/25/2018 11:52 AM, Brian Radak wrote:
>>>> Hi Sadegh,
>>>>
>>>> Yes, this is indeed a slightly undocumented update in the latest
>>>> version (though not much less documented than the old version).
>>>>
>>>> The derivatives are separated by alchemical groups (group 1 is scaled
>>>> with lambda and group 2 is scaled with 1 - lambda, these are specified in
>>>> the input PDB with 1 and -1).
>>>>
>>>> Note also that everything in the alchOutFile is an average. The normal
>>>> output is a local/block average over the last alchOutFreq steps and the AVG
>>>> columns are cumulative averages. There's really no good reason to have
>>>> alchOutFreq as small as you have it - it just hurts performance due to
>>>> frequent I/O.
>>>>
>>>> It is up to the user to decide what they want to do with the output. If
>>>> you are using simple trapezoidal integration, you would have to scale each
>>>> column according to the appropriate deltaLambda increment (which you have
>>>> to compute yourself).
>>>>
>>>> HTH,
>>>> BKR
>>>>
>>>> P.S. Note also that the instantaneous TI derivatives are now also sent
>>>> to stdout according to outputEnergies. You can safely suppress all output
>>>> to alchOutFile by setting alchOutFreq 0 and just use the stdout if you
>>>> prefer - I sincerely doubt you would be able to detect a difference in the
>>>> analysis of any reasonable calculation.
>>>>
>>>> On Fri, Feb 23, 2018 at 7:09 PM, Sadegh Faramarzi Ganjabad <
>>>> safaramarziganjabad_at_mix.wvu.edu<mailto:safaramarziganjabad_at_mix.wvu.edu>>
>>>> wrote:
>>>> Hello all,
>>>>
>>>> I have done a TI calculation, but can't find any tutorial or document
>>>> about how to interpret the output. Here is how my output file starts
>>>>
>>>> #TITITLE: TS BOND1 AVGBOND1 ELECT1
>>>> AVGELECT1 VDW1 AVGVDW1 BOND2 AVGBOND2
>>>> ELECT2 AVGELECT2 VDW2 AVGVDW2
>>>> #NEW TI WINDOW: LAMBDA 0
>>>> #PARTITION 1 SCALING: BOND 0 VDW 0 ELEC 0
>>>> #PARTITION 2 SCALING: BOND 1 VDW 1 ELEC 1
>>>> #CONSTANT TEMPERATURE: 310 K
>>>> TI: 0 19.7158 19.7158 -31.3316
>>>> -31.3316 -6.5855 -6.5855 12.4099 12.4099
>>>> -40.2905 -40.2905 21.8477 21.8477
>>>> TI: 10 19.4744 19.4964 -31.2402
>>>> -31.2486 -6.5801 -6.5806 10.7752 10.9238
>>>> -40.0014 -40.0277 18.3626 18.6795
>>>> TI: 20 24.9347 22.0861 -29.0597
>>>> -30.2062 -6.5748 -6.5778 10.0471 10.5063
>>>> -38.9418 -39.5106 12.0912 15.5422
>>>> TI: 30 32.2653 25.3697 -25.5063
>>>> -28.6901 -6.5840 -6.5798 9.2265 10.0935
>>>> -38.3273 -39.1289 10.4395 13.8962
>>>> TI: 40 42.3413 29.5091 -20.9719
>>>> -26.8077 -6.6005 -6.5848 9.9239 10.0521
>>>> -38.7601 -39.0389 13.5623 13.8147
>>>> TI: 50 58.5311 35.1997 -15.9560
>>>> -24.6799 -6.6079 -6.5894 11.9345 10.4212
>>>> -39.2660 -39.0835 12.5075 13.5584
>>>> TI: 60 80.3605 42.6031 -12.8759
>>>> -22.7448 -6.6090 -6.5926 10.8168 10.4861
>>>> -39.5107 -39.1535 6.7353 12.4399
>>>> TI: 70 102.0040 50.9694 -14.6704
>>>> -21.6076 -6.6234 -6.5969 10.6856 10.5142
>>>> -39.9275 -39.2625 5.4385 11.4538
>>>> TI: 80 119.1639 59.3885 -20.0657
>>>> -21.4172 -6.6514 -6.6037 12.3398 10.7396
>>>> -39.4442 -39.2849 8.3762 11.0738
>>>> TI: 90 130.3328 67.1846 -23.9356
>>>> -21.6940 -6.6710 -6.6111 9.7465 10.6304
>>>> -37.6627 -39.1067 11.6755 11.1399
>>>> #100 STEPS OF EQUILIBRATION AT LAMBDA 0 COMPLETED
>>>> #STARTING COLLECTION OF ENSEMBLE AVERAGE
>>>> TI: 100 127.3925 127.3925 -23.4962
>>>> -23.4962 -6.6485 -6.6485 9.9449 9.9449
>>>> -37.7934 -37.7934 9.3023 9.3023
>>>> TI: 110 123.5767 123.9236 -23.1205
>>>> -23.1546 -6.6329 -6.6343 10.3050 10.2723
>>>> -39.3857 -39.2410 6.6564 6.8969
>>>>
>>>> It seems like BOND1 and BOND2 etc. are for the initial and final values
>>>> of lambda, respectively. If so, is AVGBOND1+AVGELECT1+AVGVDW1 the total TI
>>>> energy for each timestep? then if I sum the total energies of all timesteps
>>>> I will get the total TI for the total transition from lambda=0 to lambda=1?
>>>>
>>>> Thanks in advance,
>>>> Sadegh
>>>>
>>>>
>>>>
>>>>
>>>
>>
>

This archive was generated by hypermail 2.1.6 : Mon Dec 31 2018 - 23:20:56 CST