Re: BUG: ReplicaUniformPatchGrid namd-2.14

From: Giacomo Fiorin (giacomo.fiorin_at_gmail.com)
Date: Tue May 31 2022 - 10:35:48 CDT

Hi Norman, can you send the Urbana folks a minimal input deck (small
system, few replicas, few steps) that reproduces the problem, possibly
with a pre-compiled build?

This would not guarantee that the issue gets fixed soon, but it would be
important for others to be able to confirm it. I've recently noticed (and
reported to them) that "reinitatoms" invalidates the last value of
"soluteScalingFactor" provided in the script. Contrary to what the
documentation says, "reinitatoms" does a lot more than just change
positions, velocities and unit cell. I'm wondering if your issue is
related?

If it's unrelated, and the in-memory checkpointing is buggy, may I suggest
writing to /dev/shm/ (if you're using Linux) to achieve the same goal of
bypassing the disk?
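
To illustrate the /dev/shm suggestion, here is a minimal Python sketch of a job-setup helper. It assumes /dev/shm is a RAM-backed tmpfs mount, which is typical on Linux but not guaranteed, and it falls back to the regular temporary directory otherwise. The function name, the "namd_swap_" prefix and the "replica0.swap" basename are hypothetical and chosen only for illustration.

```python
import os
import tempfile


def pick_scratch_dir(preferred="/dev/shm", prefix="namd_swap_"):
    """Create and return a private scratch directory.

    Uses `preferred` (assumed to be RAM-backed) when it exists and is
    writable, otherwise falls back to the system temporary directory.
    """
    if os.path.isdir(preferred) and os.access(preferred, os.W_OK):
        base = preferred
    else:
        base = tempfile.gettempdir()
    # mkdtemp creates a uniquely named directory readable only by this user
    return tempfile.mkdtemp(prefix=prefix, dir=base)


scratch = pick_scratch_dir()
# Hypothetical basename that the NAMD script could pass to "output"
swap_basename = os.path.join(scratch, "replica0.swap")
print(swap_basename)
```

Files written there count against the node's memory, so the directory should be removed at the end of the job.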

Giacomo

On Tue, May 31, 2022 at 8:40 AM Geist, Norman <
norman.geist_at_uni-greifswald.de> wrote:

> Hey again,
>
> after trying various things (the whole day) I was unable to work out what
> the problem with the checkpoints is. While there are no energy spikes
> during the simulation, the DCD and restart files clearly contain very close
> contacts (water-water and water-protein). Upon restarting, energies are
> basically inf. I have now switched from CheckPointStore/Load to
> ReplicaAtomSendRecv to exchange the states between different Hamiltonians,
> which does the same thing, and this solves the issue.
>
> I still want to stress that something is broken with the checkpoints,
> though. Most likely it is a race condition in collecting the patches when
> storing the checkpoints. It may be that either patches get mixed between
> replicas or, more likely, patches are outdated when being merged.
>
> Best,
> Norman
>
> On Tuesday, 31-05-2022 at 08:29, Geist, Norman wrote:
>
> Hey there, I've reported this before for a beta of NAMD-2.13 and the
> problem is still present in 2.14:
>
> It seems something is generally going wrong with
> "ReplicaUniformPatchGrid". I’ve observed this with my own variant of
> H-REMD where I used CheckpointStore and CheckpointLoad to swap the
> coordinates between the Hamiltonians. This runs ok for the first “job” but
> the restart files that are written in-between (as in original REMD) using
> "output" are broken as they contain overlapping water molecules. Not only
> that, even the coordinates in the DCD files contain very close contacts of
> below 1 Å for water-water and protein-water contacts.
>
> I’ve worked around it using just “output” and “reinitatoms” for coordinate
> swapping, but the in-memory solution with global checkpoints would of
> course be the cleaner solution, as excessive use of the output command
> often overwhelms parallel filesystems such as Lustre or BeeGFS.
>
> For clarity, it was only a dihedral scaling H-REMD, so a scaling of VDW
> interactions is not the reason for the overlapping waters. They rather
> seem to come from a problem with collecting the patches, which are
> probably mixed between replicas.
>
>
> Any thoughts?
>
>
>
> Norman Geist
>
>
