Re: Overflow in LJcorrection?

From: Jason Swails (jason.swails_at_gmail.com)
Date: Thu Dec 03 2015 - 16:04:28 CST

The tail correction is calculated as part of setup (it appears) in lines
9127 to 9133 in Molecule.C:

      if (simParams->switchingActive) {
        tail_corr_ener = tail_corr_virial =
(16*numAtoms*numAtoms*PI*(-105*LJAvgB*rcut5*rswitch5 + LJAvgA*(3*rcut4 +
9*rcut3*rswitch + 11*rcut2*rswitch2 + 9*rcut*rswitch3 + 3*rswitch4)))/
(315*rcut5*rswitch5*((rcut + rswitch)*(rcut + rswitch)*(rcut + rswitch)));
      }
      else {
        tail_corr_virial = -4 * numAtoms * numAtoms * PI * (3 * LJAvgB *
rcut3 * rcut3 - 2 * LJAvgA) / (9 * rcut3 * rcut3 * rcut3);
        tail_corr_ener = 2 * numAtoms * numAtoms * PI * (LJAvgA - 3 *
LJAvgB * rcut3 * rcut3) / (9 * rcut3 * rcut3 * rcut3);
      }

The problem here is that 16 and -4 are both integers, as is numAtoms. So
when the switching function is off, it starts computing
16*numAtoms*numAtoms*PI -- since multiplication is L->R associative,
16*numAtoms*numAtoms all gets stored in an integer, which overflows for
large particle counts. Or when switching is off, -4*numAtoms*numAtoms,
which also eventually overflows (it takes well less than 100K atoms for
this to happen).

If you moved PI to the *beginning* of the expression, or changed 16, 4, and
2 to doubles via 16.0., -4.0l, and 2.0l, then this problem should probably
disappear. The way I fixed this in OpenMM was to explicitly cast numAtoms
to double in this expression. But that's probably not necessary since if
the leading factor is double the ints should get promoted automatically.

Brian -- can you try that?

All the best,
Jason

On Thu, Dec 3, 2015 at 4:55 PM, Peter Freddolino <petefred_at_umich.edu> wrote:

> Hi Brian,
> Out of curiosity, is this dependent on having FEP or related settings on,
> or no?
> Thanks,
> Peter
>
> > On Dec 3, 2015, at 4:45 PM, Brian Radak <bradak_at_anl.gov> wrote:
> >
> > I've been debugging/testing alchemical corrections to the LJcorrection
> when I stumbled upon this weird behavior:
> >
> > For largish systems (I just threw and asp in a big box, 163737 atoms),
> the average A and B coefficients comes back negative. This seems to go away
> (sometimes?) if I zero out the hydrogen LJ epsilon parameter in the
> parameter file. Some strategic print statements seem to suggest that the
> "sumOfAs" and "sumOfBs" counters go negative when the next increment is
> above ~1e12?
> >
> > Here is output for vdw type 60 (OT from TIP3P) and the counters for
> sumOfAs and sumOfBs when it first goes negative.
> > The increments should be (54469-1)*54469*A/B
> >
> > pair : 60/60 : A = 581924 : B = 595.015 | counts : 54469/54469
> > 5.0349e+13-->-7.22533e+14
> > 5.86584e+10-->-7.3161e+11
> >
> > The second column should be: 2.229940435e+14 and 1.8239593e+12
> >
> > Does this look like overflow? I can't imagine a double precision value
> having trouble here, although the atom counts are 32 bit signed integers
> (or whatever the compiler understands "int" to be).
>
>
>

-- 
Jason M. Swails
BioMaPS,
Rutgers University
Postdoctoral Researcher

This archive was generated by hypermail 2.1.6 : Thu Dec 31 2015 - 23:22:17 CST