Re: namdEnergy gives significantly different values from those in log file?

From: Steven Samuel Plotkin (steve_at_physics.ubc.ca)
Date: Sat Nov 29 2008 - 21:39:07 CST

Dear Peter, Jerome,

Sampling using potential energy alone is an interesting suggestion. If I
were doing MC simulation I would not question it of course, however the
MD system has KE, and the Boltzmann weights in principle are not
exp(-PE_i/kT) but are exp(-E_i/kT). So I suppose the physical motivation
behind this must be that one can trace over the momenta or velocities
independent of the coordinates. I think this is almost always OK so long
as the system is in equilibrium (and potentials are not
velocity-dependent). I decided to check whether the KEs and PEs were
correlated, (which I suppose they could be if the system was not fully
in equilibrium). If they were correlated then one would have to use
E_tot I suppose:
http://www.physics.ubc.ca/~steve/NAMD/Repx/PEvsKE.jpg
But I would call that uncorrelated.

So I think you can generally use either PE or E_total for sampling
purposes. PE will have larger negative numbers which might make things
more difficult in practice. I don't think one is fundamentally better
than the other though, would you agree?

To be sure, I plotted my preliminary Free energy surface for both E_tot
and PE. They seem quite close, relatively speaking:
http://www.physics.ubc.ca/~steve/NAMD/Repx/FvsRMSDprelim2.jpg
I think if my sampling were better they would be even closer. The
ruggedness in my free E surface seems to indicate sampling is too
sparse. I thought 8000 data points would have been enough for this 1-d
surface, but it is apparently not.

Thanks again for your feedback,

Steve

Peter Freddolino wrote:
> Oops... I completely read past the fact that Steve was including the
> kinetic energies. I should have been more clear and said "total
> potential energy"; my point was that one should not just take the
> potential energy of the protein (or some other component of the
> system). Sorry for any confusion that generated; I shouldn't have
> missed that, but got caught up in the namdenergy problem of
> calculating the potential energies of interactions.
>
> And yes, it is now clear that Steve is properly accumulating
> microstates for analysis.
>
> Thanks,
> Peter
>
> Jerome Henin wrote:
>> Hi everyone,
>> Peter: to answer your question about a frame being a "state", here
>> Steve is using the word in the sense of microstate.
>>
>> Steve: since you are interested in statistics in conformation space,
>> if I were you I'd try reweighting based on the potential energy only
>> (using the target temperature as you did: I don't think the
>> instantaneous "temperature" is meaningful). You can always think of
>> your MD trajectories as if they were MC.
>>
>> Jerome
>>
>> On Wed, Nov 26, 2008 at 1:18 AM, Steven Samuel Plotkin
>> <steve_at_physics.ubc.ca> wrote:
>>
>>> Hi Peter,
>>>
>>> Yes I am careful to get good equilibrium sampling by writing each
>>> "state" to
>>> the log and dcd files every 20ps. The reweighting of states should be
>>> independent of whatever reduced coordinate I choose, I think.
>>> However there
>>> are 2 ways to do the reweighting, and I'm not sure which is the
>>> appropriate
>>> one. One is to use T_real for the current temperature of the state, the
>>> other is to use T_targ for the current temperature of the state. I
>>> noticed
>>> that the state that dominated my statistics had an anomalously high
>>> T_real.
>>> Here is a segment of the Temperatures T_real (blue) and T_targ
>>> (green) vs
>>> time:
>>>
>>> http://www.physics.ubc.ca/~steve/NAMD/Repx/TtargAndTreal900-950.jpg
>>>
>>> The blue peak in the graph corresponds to the state that dominated my
>>> statistics. I think the reweighting is best done with T_targ rather
>>> than
>>> T_real, which gives a more robust answer. Does that make sense? The
>>> T_real's
>>> fluctuate about the target temperatures. ( I have to apologize for my
>>> confusing notation previously, where I used T_target for the
>>> temperature I
>>> want to obtain my free energy surface at. )
>>>
>>> Using T_targ instead of T_real alleviated but did not eliminate the
>>> problem.
>>> I think I simply need to collect more statistics. I thought 8000 states
>>> would be enough for a decent 1-dimensional free energy surface but
>>> apparently not in this case (the free E vs RMSD is quite rugged).
>>>
>>> The NVT simulations seem to be running fine.
>>>
>>> Thanks for your latest advice-- I do realize this thread has lately
>>> had more
>>> to do with data analysis than with the workings of NAMD.
>>>
>>> Cheers,
>>>
>>> Steve
>>>
>>>
>>> Peter Freddolino wrote:
>>>
>>> Hi Steve,
>>>
>>> I'm a little unclear on what you're doing... are you labeling as a
>>> "state"
>>> each single frame from your trajectory? This is unlikely to give good
>>> statistics as you need to average over solvent conformations and
>>> such (not
>>> to mention local fluctuations in the protein). Or are you defining
>>> some set
>>> of reduced coordinates to calculate your free energy surface? In the
>>> latter
>>> case I'd recommend looking over some papers in the literature with
>>> complete
>>> solutions for applying WHAM-style analysis to replica exchange
>>> simulations:
>>> J. Phys. Chem. B, 2005, 109 (14), pp 67226731
>>> JCTC, 2007, 3, pp 26-41
>>>
>>> All you should need to do to run at constant volume is turn off
>>> langevinPiston and comment out the settings following it.
>>>
>>> Best,
>>> Peter
>>>
>>> Steven Samuel Plotkin wrote:
>>>
>>> Hi Peter,
>>>
>>> I think you are right, and this was why I took the total energies
>>> initially,
>>> which includes both KE and PE of the whole peptide plus water. This
>>> is what
>>> governs the original sampling probabilities, so I have discarded the
>>> idea of
>>> cherry-picking parts of the system. To find the re-weighted
>>> probabilities
>>> for each state I have been simply reweighting each state by the
>>> factor:
>>> exp( E_i ( beta_i - beta_targ) )
>>> where E_i is the total energy of the state, beta_i is 1 over kB
>>> times the
>>> temperature at that sample time, and beta_targ is 1 over kB times
>>> the target
>>> temperature. This should be fine I think, at least for const volume
>>> simulations.
>>>
>>> After doing this re-weighting however, the new exponents in the
>>> Boltzmann
>>> factors are (when sorted from highest)
>>> 1673.3
>>> 1657
>>> 1638
>>> 1623.2
>>> 1616.6
>>> 1606.1
>>> 1596.6
>>> .
>>> .
>>> .
>>> so one particular state ends up having all the Boltzmann weight in the
>>> partition function (i.e. exp(16.3) more weight than the next weightiest
>>> state). I didn't think this was reasonable, it says entropy isnt
>>> important
>>> in the partition function. What are the solutions for this problem?
>>> Perhaps
>>> more sampled configurations, but I have 8000 now which I thought
>>> should be
>>> enough to at least get a rough idea what the potential looks like.
>>> Maybe I
>>> need a narrower temperature range- the beta_i - beta_targ factors
>>> are making
>>> the exponents too large. However previous studies of REMD seem to
>>> have gone
>>> to 500K without difficulty... I must be missing something.
>>>
>>> I did not add a pressure correction to the replica exchange scripts. I
>>> hadn't thought this through carefully enough and didn't know the REMD
>>> scripts had not accounted for it. According to the reference you
>>> mentioned,
>>> there is an extra term in the transition matrix, and an extra term
>>> in the
>>> probabilities ~ Vi^N exp(-beta P V_i ). I will have to check how
>>> large this
>>> factor is. I don't think the volume fluctuations should be very
>>> large but I
>>> could be wrong. Then I would have to rerun my simulations at constant
>>> volume. I assume from reading some previous posts on the forum that
>>> removing
>>> all statements pertaining to pressure and variable volume such as
>>>
>>> useGroupPressure yes ;# needed for rigidBonds
>>> useFlexibleCell no
>>> useConstantArea no
>>> langevinPiston on
>>> langevinPistonTarget 1.01325 ;# in bar -> 1 atm
>>> langevinPistonPeriod 100.
>>> langevinPistonDecay 50.
>>>
>>> from the conf file should do the trick.
>>>
>>>
>>> Thanks for any help,
>>> Steve
>>>
>>>
>>>
>>> Peter Freddolino wrote:
>>>
>>> By the way, regarding the text below... I do not think that you
>>> should be
>>> taking the energies of a subset of your system to use in the
>>> analysis. You
>>> really ought to be using the total energy of your system for doing
>>> the type
>>> of analysis that you're doing; otherwise you're basically
>>> cherry-picking
>>> part of the total energy of your system without regard for the
>>> system as a
>>> whole. There are several variants of WHAM for analyzing replica
>>> exchange
>>> data in the literature that should be applicable to your problem.
>>> Also, the
>>> namd replica exchange scripts don't contain a pressure correction
>>> (Chem.
>>> Phys. Lett. 335:435439.), so you should be running each simulation at
>>> constant volume unless you've added it...
>>> Best,
>>> Peter
>>>
>>> Steven Samuel Plotkin wrote:
>>>
>>> Dear NAMD users,
>>>
>>> I have run namdEnergy on a dcd file, printing out All energies. First I
>>> noticed the kinetic energy was not printed, then I compared the "TOTAL"
>>> potential energy from namdEnergy with the TOTAL energy minus the
>>> KINETIC
>>> energy in my log file. The two are quite different -- about 370
>>> kcal/mol on
>>> average. I'm confused by this, I'm not sure what is being calculated
>>> differently? I've provided links to my dcd, log, namdEnergy
>>> settings, and
>>> conf file are below. I've also posted a long version of my question
>>> below to
>>> help put it in context.
>>>
>>> (I'm using namd 2.6 and vmd 1.8.6).
>>>
>>> Thanks for any help,
>>>
>>> Steve P.
>>>
>>>
>>> FILES:
>>> log and dcd file:
>>> http://www.physics.ubc.ca/~steve/NAMD/Repx/fold_dse2.job0.0.log
>>> http://www.physics.ubc.ca/~steve/NAMD/Repx/fold_dse2.job0.0.dcd
>>>
>>> Conf files:
>>> http://www.physics.ubc.ca/~steve/NAMD/Repx/fold_dse2.conf
>>> http://www.physics.ubc.ca/~steve/NAMD/Repx/dse2_base.namd
>>>
>>> Screenshot of my namdEnergy gui, with settings:
>>> http://www.physics.ubc.ca/~steve/NAMD/Repx/Screenshot-NAMDEnergy.png
>>>
>>> xsc file (if needed):
>>> http://www.physics.ubc.ca/~steve/NAMD/Repx/fold_dse2.job0.0.xsc
>>>
>>>
>>>
>>> /Long version of my question:/
>>>
>>> I have been able to modify the Alanin replica exchange tcl code to
>>> run on a
>>> small peptide which has been solvated and is in ionic solution. I am
>>> using
>>> namd 2.6 on an x86_64 amd cluster. I ran 16 replicas on 8 CPUs across 2
>>> nodes of a cluster, with temperatures ranging from 280K to 440K. My
>>> conf
>>> file has some new information not in the alanin example because of my
>>> solvated box: I have cellBasisVectors, wrapWater, wrapAll, PME, and
>>> Pressure
>>> control options including langevinPiston. I was careful to choose the
>>> langevinPistonTemp to be $NEWTEMP in replica_exchange.tcl.
>>>
>>> I want to construct a free energy profile for the small peptide.
>>> However
>>> this must be obtained at one (target) temperature, so I reweighted all
>>> states by what their Boltzmann weights should have been at the target
>>> temperature. This doesn't work for me: because the system includes the
>>> peptide plus the whole water box, after re-weighting, one particular
>>> state
>>> ends up having all the Boltzmann weight in the partition function. I
>>> didn't
>>> think this was feasible.
>>>
>>> So then what I must use is the total energy (KE +PE) of the protein
>>> itself,
>>> which includes self interactions /plus/ interactions with water and
>>> ions.
>>> Then I can reweight these energies at the target temperature and get
>>> a new
>>> partition function and free energy profile. I thought namdEnergy
>>> would be
>>> the tool for this.
>>>
>>> As a sanity check, I ran namdEnergy on one of my dcd files, printing
>>> out
>>> 'All' energies. First I noticed the kinetic energy was not printed,
>>> then I
>>> compared the "TOTAL" potential energy with the TOTAL energy minus the
>>> KINETIC energy in my log file. The two are quite different -- about 370
>>> kcal/mol on average. However the average total potential energy is
>>> about
>>> 4460 kcal/mol, so the mean error is about 8%. Am I doing something
>>> wrong
>>> here? Or is this to be expected?
>>>
>>> At the end of the day what I believe I want is simply the namdEnergy
>>> numbers
>>> for selection 1= PROTEIN , selection 2= "everything else", added to the
>>> PROTEIN-PROTEIN namdEnergies. However the errors present in my
>>> sanity check
>>> are giving me pause that the namdEnergies are trustable.
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>
>

This archive was generated by hypermail 2.1.6 : Wed Feb 29 2012 - 15:48:43 CST