Re: Overflow in LJcorrection?

From: Jason Swails (jason.swails_at_gmail.com)
Date: Thu Dec 03 2015 - 16:18:40 CST

Next message: Brian Radak: "Re: Overflow in LJcorrection?"
Previous message: Brian Radak: "Re: Overflow in LJcorrection?"
In reply to: Brian Radak: "Re: Overflow in LJcorrection?"
Next in thread: Brian Radak: "Re: Overflow in LJcorrection?"
Reply: Brian Radak: "Re: Overflow in LJcorrection?"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ] [ attachment ]

On Thu, Dec 3, 2015 at 5:15 PM, Brian Radak <bradak_at_anl.gov> wrote:

> Scratch that fix. Things blow up again if I include non-zero TIP3P
> hydrogens, even with "unsigned long long int"
>

In the same place?

>
>
> On 12/03/2015 04:10 PM, Brian Radak wrote:
>
> The error actually happens well before that, since LJAvgA and LJAvgB are
> first accumulated as sumOfAs, sumOfBs, and count.
>
> "count" and the N^2 terms are now "unsigned long int"'s. These seems to
> fix everything.
>
> This might be too kludgy for some, but why not just hardcode to use the A
> and B coefficients of a single type (e.g. TIP3P) if the number of atoms of
> that kind is so much larger than all others? I had heard that GROMACS did
> something like this and just considered it a "solvent dispersion
> correction." That might be more stable long term than an arms race in terms
> of memory footprint and hardly distinguishable in practice...
>
> Brian
>
> On 12/03/2015 04:04 PM, Jason Swails wrote:
>
> The tail correction is calculated as part of setup (it appears) in lines
> 9127 to 9133 in Molecule.C:
>
> if (simParams->switchingActive) {
> tail_corr_ener = tail_corr_virial =
> (16*numAtoms*numAtoms*PI*(-105*LJAvgB*rcut5*rswitch5 + LJAvgA*(3*rcut4 +
> 9*rcut3*rswitch + 11*rcut2*rswitch2 + 9*rcut*rswitch3 + 3*rswitch4)))/
> (315*rcut5*rswitch5*((rcut + rswitch)*(rcut + rswitch)*(rcut + rswitch)));
> }
> else {
> tail_corr_virial = -4 * numAtoms * numAtoms * PI * (3 * LJAvgB *
> rcut3 * rcut3 - 2 * LJAvgA) / (9 * rcut3 * rcut3 * rcut3);
> tail_corr_ener = 2 * numAtoms * numAtoms * PI * (LJAvgA - 3 *
> LJAvgB * rcut3 * rcut3) / (9 * rcut3 * rcut3 * rcut3);
> }
>
> The problem here is that 16 and -4 are both integers, as is numAtoms. So
> when the switching function is off, it starts computing
> 16*numAtoms*numAtoms*PI -- since multiplication is L->R associative,
> 16*numAtoms*numAtoms all gets stored in an integer, which overflows for
> large particle counts. Or when switching is off, -4*numAtoms*numAtoms,
> which also eventually overflows (it takes well less than 100K atoms for
> this to happen).
>
> If you moved PI to the *beginning* of the expression, or changed 16, 4,
> and 2 to doubles via 16.0., -4.0l, and 2.0l, then this problem should
> probably disappear. The way I fixed this in OpenMM was to explicitly cast
> numAtoms to double in this expression. But that's probably not necessary
> since if the leading factor is double the ints should get promoted
> automatically.
>
> Brian -- can you try that?
>
> All the best,
> Jason
>
> On Thu, Dec 3, 2015 at 4:55 PM, Peter Freddolino < <petefred_at_umich.edu>
> petefred_at_umich.edu> wrote:
>
>> Hi Brian,
>> Out of curiosity, is this dependent on having FEP or related settings on,
>> or no?
>> Thanks,
>> Peter
>>
>> > On Dec 3, 2015, at 4:45 PM, Brian Radak < <bradak_at_anl.gov>
>> bradak_at_anl.gov> wrote:
>> >
>> > I've been debugging/testing alchemical corrections to the LJcorrection
>> when I stumbled upon this weird behavior:
>> >
>> > For largish systems (I just threw and asp in a big box, 163737 atoms),
>> the average A and B coefficients comes back negative. This seems to go away
>> (sometimes?) if I zero out the hydrogen LJ epsilon parameter in the
>> parameter file. Some strategic print statements seem to suggest that the
>> "sumOfAs" and "sumOfBs" counters go negative when the next increment is
>> above ~1e12?
>> >
>> > Here is output for vdw type 60 (OT from TIP3P) and the counters for
>> sumOfAs and sumOfBs when it first goes negative.
>> > The increments should be (54469-1)*54469*A/B
>> >
>> > pair : 60/60 : A = 581924 : B = 595.015 | counts : 54469/54469
>> > 5.0349e+13-->-7.22533e+14
>> > 5.86584e+10-->-7.3161e+11
>> >
>> > The second column should be: 2.229940435e+14 and 1.8239593e+12
>> >
>> > Does this look like overflow? I can't imagine a double precision value
>> having trouble here, although the atom counts are 32 bit signed integers
>> (or whatever the compiler understands "int" to be).
>>
>>
>>
>
>
> --
> Jason M. Swails
> BioMaPS,
> Rutgers University
> Postdoctoral Researcher
>
>
> --
> Brian Radak
> Theta Early Science Program Postdoctoral Appointee
> Leadership Computing Facility
> Argonne National Laboratory
>
> 9700 South Cass Avenue
> Building 240, 1.D.16
> Lemont, IL 60439-4871
> Tel: 630/252-8643
> email: bradak_at_anl.gov
>
>
> --
> Brian Radak
> Theta Early Science Program Postdoctoral Appointee
> Leadership Computing Facility
> Argonne National Laboratory
>
> 9700 South Cass Avenue
> Building 240, 1.D.16
> Lemont, IL 60439-4871
> Tel: 630/252-8643
> email: bradak_at_anl.gov
>

-- 
Jason M. Swails
BioMaPS,
Rutgers University
Postdoctoral Researcher

Next message: Brian Radak: "Re: Overflow in LJcorrection?"
Previous message: Brian Radak: "Re: Overflow in LJcorrection?"
In reply to: Brian Radak: "Re: Overflow in LJcorrection?"
Next in thread: Brian Radak: "Re: Overflow in LJcorrection?"
Reply: Brian Radak: "Re: Overflow in LJcorrection?"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ] [ attachment ]

This archive was generated by hypermail 2.1.6 : Thu Dec 31 2015 - 23:22:17 CST