Re: Decreasing performance of cluster running FEP

From: Brian Radak (brian.radak_at_gmail.com)
Date: Sat Jul 14 2018 - 09:24:47 CDT

Follow up - what are the values of alchElecLambdaStart and
alchVdwLambdaEnd? The former in particular may change the cost of PME,
especially if you have alchDecouple on.
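
To make the question concrete, here is a rough Python sketch of how I understand those two keywords to map the global lambda onto separate electrostatic and vdW coupling for the appearing group. The function names and the 0.5 / 1.0 values are only illustrative assumptions, and the linear ramps are my reading of the NAMD user guide, so please check against your own config and NAMD version.

```python
def elec_scale(lam, elec_start):
    """Electrostatic coupling of the appearing group at global lambda `lam`.

    Assumed behaviour: zero up to elec_start, then a linear ramp to 1 at lam = 1.
    """
    if lam <= elec_start:
        return 0.0
    return (lam - elec_start) / (1.0 - elec_start)


def vdw_scale(lam, vdw_end):
    """vdW coupling of the appearing group at global lambda `lam`.

    Assumed behaviour: linear ramp from 0, reaching 1 at vdw_end and staying there.
    """
    if lam >= vdw_end:
        return 1.0
    return lam / vdw_end


def schedule(lambdas, elec_start=0.5, vdw_end=1.0):
    """Return (lambda, elec coupling, vdW coupling) for each window."""
    return [(lam, elec_scale(lam, elec_start), vdw_scale(lam, vdw_end))
            for lam in lambdas]


if __name__ == "__main__":
    # Illustrative windows only; substitute the lambda values from your run.
    for lam, elec, vdw in schedule([0.0, 0.25, 0.5, 0.75, 1.0]):
        print(f"lambda={lam:.2f}  elec={elec:.2f}  vdw={vdw:.2f}")
```

If the slowdown always starts in the first window where the electrostatic coupling becomes nonzero, that would point at the alchemical PME contribution rather than colvars.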

On Thu, Jul 12, 2018, 9:24 AM Brian Radak <brian.radak_at_gmail.com> wrote:

> Determining if colvars or FEP is the culprit here is a necessary first
> step. We need a minimal example that reproduces the issue.
>
> Does the slowdown only occur on the cluster? When running on multiple
> nodes? Does the problem occur sooner if you run fewer steps per lambda or
> does it occur after a set walltime?
>
> On Thu, Jul 12, 2018, 3:00 AM Francesco Pietra <chiendarret_at_gmail.com>
> wrote:
>
>> I was also perplexed at the performance degrading as the lambda changes,
>> which occurred sooner or later, not always at the same lambda value, and to
>> the same extent when either one (ligand alone) or four nodes
>> (ligand+protein) are involved.
>>
>> As I said, the rmsd is good; in particular the structure and pose of the
>> ligand (a polycyclic diterpenoid with a mobile side chain, a rather exotic
>> structure) are well conserved during FEP. The ligand was parameterized
>> for charmm36 with dihedral fitting at the HF/6-31G* level, and MD
>> equilibration was pretty long (>100ns) with absolutely flat rmsd/frame.
>>
>> The only thing I can do (and am in fact doing now) is decrease the number
>> of steps per lambda in order to keep the calculation within 70 hours (which
>> still requires special permission at the cluster). Hopefully it will not
>> bring the calculation out of pseudo-convergence, which occurred, as
>> expected, when I tried decreasing the number of windows while
>> increasing the number of steps per window.
>>
>> Unfortunately there is little specific recent literature on namd/FEP for
>> complicated organic ligands. This is why I asked you about topogromacs, to
>> compare with gromacs running charmm36. However, even the literature on FEP
>> with gromacs is limited to rather simple organic ligands and, what
>> surprised me very much, agrees with experiment even though the ligands
>> had been parameterized with the gaff ff at semiempirical level. Probably I'll
>> see all these matters with a different eye when my experience is ripe.
>>
>> francesco
>>
>> On Thu, Jul 12, 2018 at 1:23 AM Vermaas, Joshua <Joshua.Vermaas_at_nrel.gov>
>> wrote:
>>
>> > Colvars are indeed driven by a single CPU. Most of the colvars perform
>> > well if the number of atoms involved isn't too big, and bond lengths and
>> > angles are typical examples of that. But if you are asking for colvars that
>> > involve many atoms in a complicated relationship, performance isn't all
>> > that good. To me, the weird thing is that the performance degrades only as
>> > the lambda changes. Are you getting any absurd bonds as the trajectory
>> > progresses?
>> >
>> > -Josh
>> >
>> >
>> >
>> > On 2018-07-11 15:32:12-06:00 Francesco Pietra wrote:
>> >
>> > Thanks for your answer.
>> >
>> > 36-core Intel® Xeon® Broadwell per node, 115 GB memory per node, so the
>> > problem must lie elsewhere.
>> >
>> > 50555 atoms, including waters, for which 4 nodes proved to be the best
>> > choice for MD, where the performance was excellent.
>> >
>> > With the ligand alone in water the best choice proved to be one node.
>> >
>> > In retrospect, are colvars driven by a single CPU? Is that the problem? I
>> > could not set fewer colvars than I described in order
>> > to maintain the ligand in place.
>> > francesco
>> >
>> > On Wed, Jul 11, 2018 at 7:21 PM Vermaas, Joshua <Joshua.Vermaas_at_nrel.gov>
>> > wrote:
>> >
>> >> What is the hardware on your cluster? FEP is not accelerated with GPUs.
>> >> Neither are colvars, which I think is where the problem may actually be.
>> >> How many atoms are in your colvar definitions?
>> >>
>> >> -Josh
>> >>
>> >>
>> >>
>> >> On 2018-07-10 23:36:51-06:00 owner-namd-l_at_ks.uiuc.edu wrote:
>> >>
>> >> Hello:
>> >> I am observing a marked decrease in the performance of a NextScale
>> >> cluster running a FEP for protein-ligand, previously equilibrated for over
>> >> 100ns. No such problems when running MD equilibration on the same system
>
>
