Re: namdEnergy gives significantly different values from those in log file?

From: Peter Freddolino (petefred_at_ks.uiuc.edu)
Date: Sat Nov 29 2008 - 23:12:01 CST

Hi Steve,
just to be clear, you're not "sampling" using potential energy alone,
but you would be doing your reweighting and analysis based on potential
energy alone. Because the observables you care about are functions only
of the spatial coordinates, you should really be concerning yourself
with the configurational partition function, potential energies, and so
on (you'll also see this throughout the literature on reweighting and
replica exchange). If you're really sampling in the canonical ensemble
(and do enough sampling) then you'd find that the kinetic energy
distribution is the same for any particular coordinates, so the kinetic
energy doesn't contribute to (for example) conformational free energy
differences. This is why Jerome noted that you could think of your
frames as if they were generated by MC: for the particular analysis that
you're doing, MD is just a machine for generating conformations sampled
from the canonical ensemble.

More sampling is frequently the answer; you'll also certainly want to
rerun in NVT if you haven't already. It still might also be a good idea
to switch to using a more sophisticated method for doing the reweighting
and analysis (like WHAM, or the derivatives thereof cited earlier in the
thread) that is more specifically designed for replica exchange; if
nothing else, you should get a better feel for your statistical
uncertainty that way.

Best,
Peter

Steven Samuel Plotkin wrote:
>
> Dear Peter, Jerome,
>
> Sampling using potential energy alone is an interesting suggestion. If
> I were doing MC simulation I would not question it of course, however
> the MD system has KE, and the Boltzmann weights in principle are not
> exp(-PE_i/kT) but are exp(-E_i/kT). So I suppose the physical
> motivation behind this must be that one can trace over the momenta or
> velocities independent of the coordinates. I think this is almost
> always OK so long as the system is in equilibrium (and potentials are
> not velocity-dependent). I decided to check whether the KEs and PEs
> were correlated, (which I suppose they could be if the system was not
> fully in equilibrium). If they were correlated then one would have to
> use E_tot I suppose:
> http://www.physics.ubc.ca/~steve/NAMD/Repx/PEvsKE.jpg
> But I would call that uncorrelated.
>
> So I think you can generally use either PE or E_total for sampling
> purposes. PE will have larger negative numbers which might make things
> more difficult in practice. I don't think one is fundamentally better
> than the other though, would you agree?
>
> To be sure, I plotted my preliminary Free energy surface for both
> E_tot and PE. They seem quite close, relatively speaking:
> http://www.physics.ubc.ca/~steve/NAMD/Repx/FvsRMSDprelim2.jpg
> I think if my sampling were better they would be even closer. The
> ruggedness in my free E surface seems to indicate sampling is too
> sparse. I thought 8000 data points would have been enough for this 1-d
> surface, but it is apparently not.
>
> Thanks again for your feedback,
>
> Steve
>
>
>
> Peter Freddolino wrote:
>> Oops... I completely read past the fact that Steve was including the
>> kinetic energies. I should have been more clear and said "total
>> potential energy"; my point was that one should not just take the
>> potential energy of the protein (or some other component of the
>> system). Sorry for any confusion that generated; I shouldn't have
>> missed that, but got caught up in the namdenergy problem of
>> calculating the potential energies of interactions.
>>
>> And yes, it is now clear that Steve is properly accumulating
>> microstates for analysis.
>>
>> Thanks,
>> Peter
>>
>> Jerome Henin wrote:
>>> Hi everyone,
>>> Peter: to answer your question about a frame being a "state", here
>>> Steve is using the word in the sense of microstate.
>>>
>>> Steve: since you are interested in statistics in conformation space,
>>> if I were you I'd try reweighting based on the potential energy only
>>> (using the target temperature as you did: I don't think the
>>> instantaneous "temperature" is meaningful). You can always think of
>>> your MD trajectories as if they were MC.
>>>
>>> Jerome
>>>
>>> On Wed, Nov 26, 2008 at 1:18 AM, Steven Samuel Plotkin
>>> <steve_at_physics.ubc.ca> wrote:
>>>
>>>> Hi Peter,
>>>>
>>>> Yes I am careful to get good equilibrium sampling by writing each
>>>> "state" to
>>>> the log and dcd files every 20ps. The reweighting of states should be
>>>> independent of whatever reduced coordinate I choose, I think.
>>>> However there
>>>> are 2 ways to do the reweighting, and I'm not sure which is the
>>>> appropriate
>>>> one. One is to use T_real for the current temperature of the state,
>>>> the
>>>> other is to use T_targ for the current temperature of the state. I
>>>> noticed
>>>> that the state that dominated my statistics had an anomalously high
>>>> T_real.
>>>> Here is a segment of the Temperatures T_real (blue) and T_targ
>>>> (green) vs
>>>> time:
>>>>
>>>> http://www.physics.ubc.ca/~steve/NAMD/Repx/TtargAndTreal900-950.jpg
>>>>
>>>> The blue peak in the graph corresponds to the state that dominated my
>>>> statistics. I think the reweighting is best done with T_targ rather
>>>> than
>>>> T_real, which gives a more robust answer. Does that make sense? The
>>>> T_real's
>>>> fluctuate about the target temperatures. ( I have to apologize for my
>>>> confusing notation previously, where I used T_target for the
>>>> temperature I
>>>> want to obtain my free energy surface at. )
>>>>
>>>> Using T_targ instead of T_real alleviated but did not eliminate the
>>>> problem.
>>>> I think I simply need to collect more statistics. I thought 8000
>>>> states
>>>> would be enough for a decent 1-dimensional free energy surface but
>>>> apparently not in this case (the free E vs RMSD is quite rugged).
>>>>
>>>> The NVT simulations seem to be running fine.
>>>>
>>>> Thanks for your latest advice-- I do realize this thread has lately
>>>> had more
>>>> to do with data analysis than with the workings of NAMD.
>>>>
>>>> Cheers,
>>>>
>>>> Steve
>>>>
>>>>
>>>> Peter Freddolino wrote:
>>>>
>>>> Hi Steve,
>>>>
>>>> I'm a little unclear on what you're doing... are you labeling as a
>>>> "state"
>>>> each single frame from your trajectory? This is unlikely to give good
>>>> statistics as you need to average over solvent conformations and
>>>> such (not
>>>> to mention local fluctuations in the protein). Or are you defining
>>>> some set
>>>> of reduced coordinates to calculate your free energy surface? In
>>>> the latter
>>>> case I'd recommend looking over some papers in the literature with
>>>> complete
>>>> solutions for applying WHAM-style analysis to replica exchange
>>>> simulations:
>>>> J. Phys. Chem. B, 2005, 109 (14), pp 67226731
>>>> JCTC, 2007, 3, pp 26-41
>>>>
>>>> All you should need to do to run at constant volume is turn off
>>>> langevinPiston and comment out the settings following it.
>>>>
>>>> Best,
>>>> Peter
>>>>
>>>> Steven Samuel Plotkin wrote:
>>>>
>>>> Hi Peter,
>>>>
>>>> I think you are right, and this was why I took the total energies
>>>> initially,
>>>> which includes both KE and PE of the whole peptide plus water. This
>>>> is what
>>>> governs the original sampling probabilities, so I have discarded
>>>> the idea of
>>>> cherry-picking parts of the system. To find the re-weighted
>>>> probabilities
>>>> for each state I have been simply reweighting each state by the
>>>> factor:
>>>> exp( E_i ( beta_i - beta_targ) )
>>>> where E_i is the total energy of the state, beta_i is 1 over kB
>>>> times the
>>>> temperature at that sample time, and beta_targ is 1 over kB times
>>>> the target
>>>> temperature. This should be fine I think, at least for const volume
>>>> simulations.
>>>>
>>>> After doing this re-weighting however, the new exponents in the
>>>> Boltzmann
>>>> factors are (when sorted from highest)
>>>> 1673.3
>>>> 1657
>>>> 1638
>>>> 1623.2
>>>> 1616.6
>>>> 1606.1
>>>> 1596.6
>>>> .
>>>> .
>>>> .
>>>> so one particular state ends up having all the Boltzmann weight in the
>>>> partition function (i.e. exp(16.3) more weight than the next
>>>> weightiest
>>>> state). I didn't think this was reasonable, it says entropy isnt
>>>> important
>>>> in the partition function. What are the solutions for this problem?
>>>> Perhaps
>>>> more sampled configurations, but I have 8000 now which I thought
>>>> should be
>>>> enough to at least get a rough idea what the potential looks like.
>>>> Maybe I
>>>> need a narrower temperature range- the beta_i - beta_targ factors
>>>> are making
>>>> the exponents too large. However previous studies of REMD seem to
>>>> have gone
>>>> to 500K without difficulty... I must be missing something.
>>>>
>>>> I did not add a pressure correction to the replica exchange scripts. I
>>>> hadn't thought this through carefully enough and didn't know the REMD
>>>> scripts had not accounted for it. According to the reference you
>>>> mentioned,
>>>> there is an extra term in the transition matrix, and an extra term
>>>> in the
>>>> probabilities ~ Vi^N exp(-beta P V_i ). I will have to check how
>>>> large this
>>>> factor is. I don't think the volume fluctuations should be very
>>>> large but I
>>>> could be wrong. Then I would have to rerun my simulations at constant
>>>> volume. I assume from reading some previous posts on the forum that
>>>> removing
>>>> all statements pertaining to pressure and variable volume such as
>>>>
>>>> useGroupPressure yes ;# needed for rigidBonds
>>>> useFlexibleCell no
>>>> useConstantArea no
>>>> langevinPiston on
>>>> langevinPistonTarget 1.01325 ;# in bar -> 1 atm
>>>> langevinPistonPeriod 100.
>>>> langevinPistonDecay 50.
>>>>
>>>> from the conf file should do the trick.
>>>>
>>>>
>>>> Thanks for any help,
>>>> Steve
>>>>
>>>>
>>>>
>>>> Peter Freddolino wrote:
>>>>
>>>> By the way, regarding the text below... I do not think that you
>>>> should be
>>>> taking the energies of a subset of your system to use in the
>>>> analysis. You
>>>> really ought to be using the total energy of your system for doing
>>>> the type
>>>> of analysis that you're doing; otherwise you're basically
>>>> cherry-picking
>>>> part of the total energy of your system without regard for the
>>>> system as a
>>>> whole. There are several variants of WHAM for analyzing replica
>>>> exchange
>>>> data in the literature that should be applicable to your problem.
>>>> Also, the
>>>> namd replica exchange scripts don't contain a pressure correction
>>>> (Chem.
>>>> Phys. Lett. 335:435439.), so you should be running each simulation at
>>>> constant volume unless you've added it...
>>>> Best,
>>>> Peter
>>>>
>>>> Steven Samuel Plotkin wrote:
>>>>
>>>> Dear NAMD users,
>>>>
>>>> I have run namdEnergy on a dcd file, printing out All energies.
>>>> First I
>>>> noticed the kinetic energy was not printed, then I compared the
>>>> "TOTAL"
>>>> potential energy from namdEnergy with the TOTAL energy minus the
>>>> KINETIC
>>>> energy in my log file. The two are quite different -- about 370
>>>> kcal/mol on
>>>> average. I'm confused by this, I'm not sure what is being calculated
>>>> differently? I've provided links to my dcd, log, namdEnergy
>>>> settings, and
>>>> conf file are below. I've also posted a long version of my question
>>>> below to
>>>> help put it in context.
>>>>
>>>> (I'm using namd 2.6 and vmd 1.8.6).
>>>>
>>>> Thanks for any help,
>>>>
>>>> Steve P.
>>>>
>>>>
>>>> FILES:
>>>> log and dcd file:
>>>> http://www.physics.ubc.ca/~steve/NAMD/Repx/fold_dse2.job0.0.log
>>>> http://www.physics.ubc.ca/~steve/NAMD/Repx/fold_dse2.job0.0.dcd
>>>>
>>>> Conf files:
>>>> http://www.physics.ubc.ca/~steve/NAMD/Repx/fold_dse2.conf
>>>> http://www.physics.ubc.ca/~steve/NAMD/Repx/dse2_base.namd
>>>>
>>>> Screenshot of my namdEnergy gui, with settings:
>>>> http://www.physics.ubc.ca/~steve/NAMD/Repx/Screenshot-NAMDEnergy.png
>>>>
>>>> xsc file (if needed):
>>>> http://www.physics.ubc.ca/~steve/NAMD/Repx/fold_dse2.job0.0.xsc
>>>>
>>>>
>>>>
>>>> /Long version of my question:/
>>>>
>>>> I have been able to modify the Alanin replica exchange tcl code to
>>>> run on a
>>>> small peptide which has been solvated and is in ionic solution. I
>>>> am using
>>>> namd 2.6 on an x86_64 amd cluster. I ran 16 replicas on 8 CPUs
>>>> across 2
>>>> nodes of a cluster, with temperatures ranging from 280K to 440K. My
>>>> conf
>>>> file has some new information not in the alanin example because of my
>>>> solvated box: I have cellBasisVectors, wrapWater, wrapAll, PME, and
>>>> Pressure
>>>> control options including langevinPiston. I was careful to choose the
>>>> langevinPistonTemp to be $NEWTEMP in replica_exchange.tcl.
>>>>
>>>> I want to construct a free energy profile for the small peptide.
>>>> However
>>>> this must be obtained at one (target) temperature, so I reweighted all
>>>> states by what their Boltzmann weights should have been at the target
>>>> temperature. This doesn't work for me: because the system includes the
>>>> peptide plus the whole water box, after re-weighting, one
>>>> particular state
>>>> ends up having all the Boltzmann weight in the partition function.
>>>> I didn't
>>>> think this was feasible.
>>>>
>>>> So then what I must use is the total energy (KE +PE) of the protein
>>>> itself,
>>>> which includes self interactions /plus/ interactions with water and
>>>> ions.
>>>> Then I can reweight these energies at the target temperature and
>>>> get a new
>>>> partition function and free energy profile. I thought namdEnergy
>>>> would be
>>>> the tool for this.
>>>>
>>>> As a sanity check, I ran namdEnergy on one of my dcd files,
>>>> printing out
>>>> 'All' energies. First I noticed the kinetic energy was not printed,
>>>> then I
>>>> compared the "TOTAL" potential energy with the TOTAL energy minus the
>>>> KINETIC energy in my log file. The two are quite different -- about
>>>> 370
>>>> kcal/mol on average. However the average total potential energy is
>>>> about
>>>> 4460 kcal/mol, so the mean error is about 8%. Am I doing something
>>>> wrong
>>>> here? Or is this to be expected?
>>>>
>>>> At the end of the day what I believe I want is simply the
>>>> namdEnergy numbers
>>>> for selection 1= PROTEIN , selection 2= "everything else", added to
>>>> the
>>>> PROTEIN-PROTEIN namdEnergies. However the errors present in my
>>>> sanity check
>>>> are giving me pause that the namdEnergies are trustable.
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>
>>

This archive was generated by hypermail 2.1.6 : Wed Feb 29 2012 - 15:48:43 CST