Re: Overflow in LJcorrection?

From: Brian Radak (brian.radak.accts_at_gmail.com)
Date: Thu Dec 03 2015 - 16:26:19 CST

Yes? But in a different way. Now LJAvgA and LJAvgB are extremely large
and the correction becomes effectively infinite. I guess that means the
overflow is in the double now?

On 12/03/2015 04:18 PM, Jason Swails wrote:
>
>
> On Thu, Dec 3, 2015 at 5:15 PM, Brian Radak <bradak_at_anl.gov
> <mailto:bradak_at_anl.gov>> wrote:
>
> Scratch that fix. Things blow up again if I include non-zero TIP3P
> hydrogens, even with "unsigned long long int"
>
>
> ​In the same place?
>
> ​
>
>
>
> On 12/03/2015 04:10 PM, Brian Radak wrote:
>> The error actually happens well before that, since LJAvgA and
>> LJAvgB are first accumulated as sumOfAs, sumOfBs, and count.
>>
>> "count" and the N^2 terms are now "unsigned long int"'s. These
>> seems to fix everything.
>>
>> This might be too kludgy for some, but why not just hardcode to
>> use the A and B coefficients of a single type (e.g. TIP3P) if the
>> number of atoms of that kind is so much larger than all others? I
>> had heard that GROMACS did something like this and just
>> considered it a "solvent dispersion correction." That might be
>> more stable long term than an arms race in terms of memory
>> footprint and hardly distinguishable in practice...
>>
>> Brian
>>
>> On 12/03/2015 04:04 PM, Jason Swails wrote:
>>> The tail correction is calculated as part of setup (it appears)
>>> in lines 9127 to 9133 in Molecule.C:
>>>
>>> if (simParams->switchingActive) {
>>> tail_corr_ener = tail_corr_virial =
>>> (16*numAtoms*numAtoms*PI*(-105*LJAvgB*rcut5*rswitch5 +
>>> LJAvgA*(3*rcut4 + 9*rcut3*rswitch + 11*rcut2*rswitch2 +
>>> 9*rcut*rswitch3 + 3*rswitch4)))/ (315*rcut5*rswitch5*((rcut +
>>> rswitch)*(rcut + rswitch)*(rcut + rswitch)));
>>> }
>>> else {
>>> tail_corr_virial = -4 * numAtoms * numAtoms * PI * (3 *
>>> LJAvgB * rcut3 * rcut3 - 2 * LJAvgA) / (9 * rcut3 * rcut3 * rcut3);
>>> tail_corr_ener = 2 * numAtoms * numAtoms * PI * (LJAvgA
>>> - 3 * LJAvgB * rcut3 * rcut3) / (9 * rcut3 * rcut3 * rcut3);
>>> }
>>>
>>> The problem here is that 16 and -4 are both integers, as is
>>> numAtoms. So when the switching function is off, it starts
>>> computing 16*numAtoms*numAtoms*PI -- since multiplication is
>>> L->R associative, 16*numAtoms*numAtoms all gets stored in an
>>> integer, which overflows for large particle counts. Or when
>>> switching is off, -4*numAtoms*numAtoms, which also eventually
>>> overflows (it takes well less than 100K atoms for this to happen).
>>>
>>> If you moved PI to the *beginning* of the expression, or changed
>>> 16, 4, and 2 to doubles via 16.0., -4.0l, and 2.0l, then this
>>> problem should probably disappear. The way I fixed this in
>>> OpenMM was to explicitly cast numAtoms to double in this
>>> expression. But that's probably not necessary since if the
>>> leading factor is double the ints should get promoted automatically.
>>>
>>> Brian -- can you try that?
>>>
>>> All the best,
>>> Jason
>>>
>>> On Thu, Dec 3, 2015 at 4:55 PM, Peter Freddolino
>>> <petefred_at_umich.edu <mailto:petefred_at_umich.edu>> wrote:
>>>
>>> Hi Brian,
>>> Out of curiosity, is this dependent on having FEP or related
>>> settings on, or no?
>>> Thanks,
>>> Peter
>>>
>>> > On Dec 3, 2015, at 4:45 PM, Brian Radak <bradak_at_anl.gov
>>> <mailto:bradak_at_anl.gov>> wrote:
>>> >
>>> > I've been debugging/testing alchemical corrections to the
>>> LJcorrection when I stumbled upon this weird behavior:
>>> >
>>> > For largish systems (I just threw and asp in a big box,
>>> 163737 atoms), the average A and B coefficients comes back
>>> negative. This seems to go away (sometimes?) if I zero out
>>> the hydrogen LJ epsilon parameter in the parameter file.
>>> Some strategic print statements seem to suggest that the
>>> "sumOfAs" and "sumOfBs" counters go negative when the next
>>> increment is above ~1e12?
>>> >
>>> > Here is output for vdw type 60 (OT from TIP3P) and the
>>> counters for sumOfAs and sumOfBs when it first goes negative.
>>> > The increments should be (54469-1)*54469*A/B
>>> >
>>> > pair : 60/60 : A = 581924 : B = 595.015 | counts : 54469/54469
>>> > 5.0349e+13-->-7.22533e+14
>>> > 5.86584e+10-->-7.3161e+11
>>> >
>>> > The second column should be: 2.229940435e+14 and 1.8239593e+12
>>> >
>>> > Does this look like overflow? I can't imagine a double
>>> precision value having trouble here, although the atom
>>> counts are 32 bit signed integers (or whatever the compiler
>>> understands "int" to be).
>>>
>>>
>>>
>>>
>>>
>>> --
>>> Jason M. Swails
>>> BioMaPS,
>>> Rutgers University
>>> Postdoctoral Researcher
>>
>> --
>> Brian Radak
>> Theta Early Science Program Postdoctoral Appointee
>> Leadership Computing Facility
>> Argonne National Laboratory
>>
>> 9700 South Cass Avenue
>> Building 240, 1.D.16
>> Lemont, IL 60439-4871
>> Tel: 630/252-8643 <tel:630%2F252-8643>
>> email: bradak_at_anl.gov <mailto:bradak_at_anl.gov>
>
> --
> Brian Radak
> Theta Early Science Program Postdoctoral Appointee
> Leadership Computing Facility
> Argonne National Laboratory
>
> 9700 South Cass Avenue
> Building 240, 1.D.16
> Lemont, IL 60439-4871
> Tel: 630/252-8643 <tel:630%2F252-8643>
> email: bradak_at_anl.gov <mailto:bradak_at_anl.gov>
>
>
>
>
> --
> Jason M. Swails
> BioMaPS,
> Rutgers University
> Postdoctoral Researcher

-- 
Brian Radak
Postdoctoral Scholar
Gordon Center for Integrative Science, W323A
Department of Biochemistry & Molecular Biology
University of Chicago
929 E. 57th St.
Chicago, IL 60637-1454
Tel: 773/834-2812
email: radak_at_uchicago.edu

This archive was generated by hypermail 2.1.6 : Thu Dec 31 2015 - 23:22:17 CST