Re: Overflow in LJcorrection?

From: Brian Radak (bradak_at_anl.gov)
Date: Thu Dec 03 2015 - 16:10:59 CST

The error actually happens well before that, since LJAvgA and LJAvgB are
first accumulated as sumOfAs, sumOfBs, and count.

"count" and the N^2 terms are now "unsigned long int"'s. These seems to
fix everything.

This might be too kludgy for some, but why not just hardcode to use the
A and B coefficients of a single type (e.g. TIP3P) if the number of
atoms of that kind is so much larger than all others? I had heard that
GROMACS did something like this and just considered it a "solvent
dispersion correction." That might be more stable long term than an arms
race in terms of memory footprint and hardly distinguishable in practice...

Brian

On 12/03/2015 04:04 PM, Jason Swails wrote:
> The tail correction is calculated as part of setup (it appears) in
> lines 9127 to 9133 in Molecule.C:
>
> if (simParams->switchingActive) {
> tail_corr_ener = tail_corr_virial =
> (16*numAtoms*numAtoms*PI*(-105*LJAvgB*rcut5*rswitch5 + LJAvgA*(3*rcut4
> + 9*rcut3*rswitch + 11*rcut2*rswitch2 + 9*rcut*rswitch3 +
> 3*rswitch4)))/ (315*rcut5*rswitch5*((rcut + rswitch)*(rcut +
> rswitch)*(rcut + rswitch)));
> }
> else {
> tail_corr_virial = -4 * numAtoms * numAtoms * PI * (3 * LJAvgB
> * rcut3 * rcut3 - 2 * LJAvgA) / (9 * rcut3 * rcut3 * rcut3);
> tail_corr_ener = 2 * numAtoms * numAtoms * PI * (LJAvgA - 3 *
> LJAvgB * rcut3 * rcut3) / (9 * rcut3 * rcut3 * rcut3);
> }
>
> The problem here is that 16 and -4 are both integers, as is numAtoms.
> So when the switching function is off, it starts computing
> 16*numAtoms*numAtoms*PI -- since multiplication is L->R associative,
> 16*numAtoms*numAtoms all gets stored in an integer, which overflows
> for large particle counts. Or when switching is off,
> -4*numAtoms*numAtoms, which also eventually overflows (it takes well
> less than 100K atoms for this to happen).
>
> If you moved PI to the *beginning* of the expression, or changed 16,
> 4, and 2 to doubles via 16.0., -4.0l, and 2.0l, then this problem
> should probably disappear. The way I fixed this in OpenMM was to
> explicitly cast numAtoms to double in this expression. But that's
> probably not necessary since if the leading factor is double the ints
> should get promoted automatically.
>
> Brian -- can you try that?
>
> All the best,
> Jason
>
> On Thu, Dec 3, 2015 at 4:55 PM, Peter Freddolino <petefred_at_umich.edu
> <mailto:petefred_at_umich.edu>> wrote:
>
> Hi Brian,
> Out of curiosity, is this dependent on having FEP or related
> settings on, or no?
> Thanks,
> Peter
>
> > On Dec 3, 2015, at 4:45 PM, Brian Radak <bradak_at_anl.gov
> <mailto:bradak_at_anl.gov>> wrote:
> >
> > I've been debugging/testing alchemical corrections to the
> LJcorrection when I stumbled upon this weird behavior:
> >
> > For largish systems (I just threw and asp in a big box, 163737
> atoms), the average A and B coefficients comes back negative. This
> seems to go away (sometimes?) if I zero out the hydrogen LJ
> epsilon parameter in the parameter file. Some strategic print
> statements seem to suggest that the "sumOfAs" and "sumOfBs"
> counters go negative when the next increment is above ~1e12?
> >
> > Here is output for vdw type 60 (OT from TIP3P) and the counters
> for sumOfAs and sumOfBs when it first goes negative.
> > The increments should be (54469-1)*54469*A/B
> >
> > pair : 60/60 : A = 581924 : B = 595.015 | counts : 54469/54469
> > 5.0349e+13-->-7.22533e+14
> > 5.86584e+10-->-7.3161e+11
> >
> > The second column should be: 2.229940435e+14 and 1.8239593e+12
> >
> > Does this look like overflow? I can't imagine a double precision
> value having trouble here, although the atom counts are 32 bit
> signed integers (or whatever the compiler understands "int" to be).
>
>
>
>
>
> --
> Jason M. Swails
> BioMaPS,
> Rutgers University
> Postdoctoral Researcher

-- 
Brian Radak
Theta Early Science Program Postdoctoral Appointee
Leadership Computing Facility
Argonne National Laboratory
9700 South Cass Avenue
Building 240, 1.D.16
Lemont, IL 60439-4871
Tel: 630/252-8643
email: bradak_at_anl.gov

This archive was generated by hypermail 2.1.6 : Tue Dec 27 2016 - 23:21:36 CST