Re: Bug advisory and workaround: alchemical FEP with IDWS can give wrong comparison energies

From: Jérôme Hénin (jerome.henin_at_ibpc.fr)
Date: Fri Nov 05 2021 - 12:10:35 CDT

Hi Haochuan,

I think your patch is extremely useful, performance-wise. This raises the question: when doing MTS, does it make sense to compute (and output) energies at steps that are not multiples of fullElectFrequency? Should computeEnergies be mandated to be a multiple of fullElectFrequency, similar to the test I just proposed for alchoutFreq ( [ https://urldefense.com/v3/__https://gitlab.com/tcbgUIUC/namd/-/merge_requests/99__;!!DZ3fjg!svAWPsI-po1zufW5U2-okMFQQ0MohRSuWLldZQNzCMg9GL2HgmLRP_etGXips2Edxw$ | https://urldefense.com/v3/__https://gitlab.com/tcbgUIUC/namd/-/merge_requests/99__;!!DZ3fjg!svAWPsI-po1zufW5U2-okMFQQ0MohRSuWLldZQNzCMg9GL2HgmLRP_etGXips2Edxw$ ] )?

About fepout files vs. standard output, I think the main benefit of a separate file is to have a large number of samples in a compact file. Right now the fepout format is too verbose. I am thinking of what would be a desirable new format for the fepout, which would be smaller and easier to parse. This could be introduced as an option, keeping the default to the legacy file format for compatibility. If you have ideas about desirable properties for such a format, I'd be interested in hearing them.

Best,
Jérôme

----- On 5 Nov 21, at 17:53, yjcoshc <yjcoshc_at_gmail.com> wrote:

> Hi Jérôme,

> In the development branch, I submitted a patch adding an option "
> computeEnergies " to control the period of computing energy terms, which
> defaults to be the " outputEnergies ", so if someone uses outputenergies 20
> fullElectFrequency 4

> and that will trigger an error to stop the simulation at start. You may see [
> https://charm.cs.illinois.edu/gerrit/c/namd/+/5514 |
> https://charm.cs.illinois.edu/gerrit/c/namd/+/5514 ] for more details.

> As for alchOutFreq , I personally agree to enforce it to be some multiple of "
> outputenergies " for the GPU version, but I am not sure whether the .fepout
> file will be deprecated since some people are in favor of the log from stdout.
> In the case that alchOutFreq is not some multiples of computeEnergies , NAMD3
> will show a warning like this:

> Warning: The period of outputting energies relating to alchemical
> transformations (alchOutFreq = 10) is not a multiple of the period of computing
> energies (computeEnergies = 20). If alchOutFreq is smaller than outputEnergies
> and computeEnergies is not defined, a better solution is to set comp
> uteEnergies explicitly and keep it the same as alchOutFreq. The simulation will
> use the greatest common divisor of computeEnergies and alchOutFreq as the
> period of energy evaluation.

> I don't know whether the NAMD should just stop here or not, since the greatest
> common divisor of " alchOutFreq " and " computeEnergies " is 10, not a multiple
> of " fullElectFrequency ". Hopefully you may propose a better solution to this
> mess.

> Thanks,

> Haochuan Chen

> On 11/5/21 11:12, Jérôme Hénin wrote:

>> Hi all,

>> Please be aware of a bug in current NAMD versions that can affect alchemical
>> free energies computed using IDWS-style alchemical perturbations. Below I give
>> workarounds to get correct free energy values from existing data affected by
>> the issue, and to make any new simulation exempt from the issue.

>> Affected versions of NAMD:
>> 2.14, 3.x (non-CUDA version), and development versions until now.

>> Condition triggering the issue:
>> The issue occurs if multiple time-step is used and alchOutFreq is greater than,
>> but not a multiple of, fullElectFreq.
>> Unfortunately, the current default value of alchOutFreq (5) is likely to fulfill
>> this condition.

>> Symptoms:
>> In this situation, some of the energy differences are computed erroneously -
>> specifically, dE values computed at time steps that are not multiples of the
>> outer time step include long-range dE values for the wrong target lambda.
>> More generally, even without IDWS, alchOutFreq should probably always be a
>> multiple of fullElectFreq to ensure that comparison energies are as accurate as
>> possible, and do not include stale long-range terms.

>> Fix and workaround:
>> For future simulations: define alchOutFreq as either 1 or a multiple of
>> fullElectFreq.
>> For existing simulations: re-analyze, discarding any data point from the fepout
>> file where the time step number is not a multiple of fullElectFreq.

>> Diagnosis:
>> In doubt, the issue can be diagnosed for a specific dataset by histogramming
>> forward and backward values of the electrostatic dE for an individual IDWS
>> window. The bug will result in a characteristic bimodal distribution.

>> I will submit a code patch very soon to fix this in a final way. In the
>> meantime, please circulate the information to users who might be affected.
>> Thanks to Ezry St. Iago-McRae (Brannigan lab, Rutgers Camden) for reporting the
>> issue.

>> Jérôme

This archive was generated by hypermail 2.1.6 : Fri Dec 31 2021 - 23:17:12 CST