Re: Line minimizerfailure because of IMPR?

From: Francesco Pietra (chiendarret_at_gmail.com)
Date: Thu Apr 05 2012 - 03:57:03 CDT

Yes, CUDA 2.8 of course, in order to compare.

The 2.9b2 CUDA is the nightly build of 2012-03-30.

Just to be sure that no hardware modification has intervened, I am now
running MD with the above 2.9b2 on another metalloprotein
(parametrization of the metal cluster carried out by myself, albeit
under the parm7 Amber ff) that I have been studying over the last few
days. MD is proceeding normally. 600,000 steps are planned, as in
previous cases.

Are you suggesting that a CUDA bug can show up with a particular
ensemble? In this case I am parameterizing with CHARMM 22/27. For
minimization, I am using a 0.1 fs timestep and, of course, rigid bonds
for water only.

francesco

On Thu, Apr 5, 2012 at 10:28 AM, Norman Geist
<norman.geist_at_uni-greifswald.de> wrote:
> Was this a CUDA build?
>
>
>> -----Original Message-----
>> From: owner-namd-l_at_ks.uiuc.edu [mailto:owner-namd-l_at_ks.uiuc.edu] On
>> Behalf Of Francesco Pietra
>> Sent: Thursday, 5 April 2012 10:02
>> To: Norman Geist; NAMD
>> Subject: Fwd: namd-l: Line minimizerfailure because of IMPR?
>>
>> Hello:
>>
>> I tried the same files with stable NAMD 2.8 and got partly the same
>> failure: this time the minimization hung for at least 20 minutes
>> without returning to the Linux prompt.
>>
>> Start/end of log file:
>>
>> ETITLE:      TS           BOND          ANGLE          DIHED
>> IMPRP               ELECT            VDW       BOUNDARY           MISC
>>        KINETIC               TOTAL           TEMP      POTENTIAL
>>   TOTAL3        TEMPAVG            PRESSURE      GPRESSURE
>> VOLUME       PRESSAVG      GPRESSAVG
>>
>> ENERGY:       0    131516.3834     15951.4182      1089.1031
>> 80.7094        -208823.8451   4021786.0851         0.0000
>> 0.0000         0.0000        3961599.8541         0.0000
>> 3961599.8541   3961599.8541         0.0000        1373635.0459
>> 1388330.3068    672033.8185   1373635.0459   1388330.3068
>> .................................
>> ENERGY:    2131      9062.5391      6886.1278      1019.7076
>> 53.3039        -304342.7936     31732.7361         0.0000
>> 0.0000         0.0000        -255588.3792         0.0000
>> -255588.3792   -255588.3792         0.0000          -5055.1933
>> -4902.4727    672033.8185     -5055.1933     -4902.4727
>>
>> LINE MINIMIZER BRACKET: DX 0.000500355 0.00100071 DU -2.41055 117.048
>> DUDX -44805.5 35123.5 200226
>>
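To follow individual energy terms such as IMPRP and VDW over the course of a minimization, the ETITLE/ENERGY records can be turned into per-step dictionaries. The following is a minimal sketch, not a NAMD tool: the function name `parse_energy_records` is hypothetical, and it assumes the column order given by the ETITLE record holds for every ENERGY record. It reads tokens across line breaks, so excerpts wrapped and quoted by a mail client (as above) are handled too.

```python
def parse_energy_records(log_text):
    """Return (titles, rows) parsed from ETITLE:/ENERGY: records.

    rows is a list of dicts mapping each ETITLE column name to a float.
    """
    # Drop e-mail quote markers ('>') at the start of each line, then tokenise
    # on whitespace so that wrapped records are read as one sequence.
    cleaned = "\n".join(line.lstrip("> ") for line in log_text.splitlines())
    tokens = cleaned.split()
    titles, rows = [], []
    i = 0
    while i < len(tokens):
        if tokens[i] == "ETITLE:" and not titles:
            i += 1
            # Column names are upper-case alphanumeric words (TS, BOND, TOTAL3, ...).
            while i < len(tokens) and tokens[i].isalnum() and tokens[i].isupper():
                titles.append(tokens[i])
                i += 1
        elif tokens[i] == "ENERGY:" and titles:
            chunk = tokens[i + 1 : i + 1 + len(titles)]
            try:
                values = [float(t) for t in chunk]
            except ValueError:
                values = []  # truncated or garbled record: skip it
            if len(values) == len(titles):
                rows.append(dict(zip(titles, values)))
                i += 1 + len(titles)
            else:
                i += 1
        else:
            i += 1
    return titles, rows


# The two ENERGY records from the NAMD 2.8 excerpt quoted above.
SAMPLE = """
>> ETITLE:      TS           BOND          ANGLE          DIHED
>> IMPRP               ELECT            VDW       BOUNDARY           MISC
>>        KINETIC               TOTAL           TEMP      POTENTIAL
>>   TOTAL3        TEMPAVG            PRESSURE      GPRESSURE
>> VOLUME       PRESSAVG      GPRESSAVG
>>
>> ENERGY:       0    131516.3834     15951.4182      1089.1031
>> 80.7094        -208823.8451   4021786.0851         0.0000
>> 0.0000         0.0000        3961599.8541         0.0000
>> 3961599.8541   3961599.8541         0.0000        1373635.0459
>> 1388330.3068    672033.8185   1373635.0459   1388330.3068
>> .................................
>> ENERGY:    2131      9062.5391      6886.1278      1019.7076
>> 53.3039        -304342.7936     31732.7361         0.0000
>> 0.0000         0.0000        -255588.3792         0.0000
>> -255588.3792   -255588.3792         0.0000          -5055.1933
>> -4902.4727    672033.8185     -5055.1933     -4902.4727
"""

titles, rows = parse_energy_records(SAMPLE)
for row in rows:
    print(int(row["TS"]), row["IMPRP"], row["VDW"])
```

Run over a complete log, the same rows show at which step a term stops decreasing or starts to grow.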
>> ---------------------------
>> Confusingly enough, the gradient trend was better this time:
>>
>> MINIMIZER STARTING CONJUGATE GRADIENT ALGORITHM
>> LINE MINIMIZER REDUCING GRADIENT FROM 9.95266e+08 TO 995266
>> MINIMIZER RESTARTING CONJUGATE GRADIENT ALGORITHM
>> ....................
>> LINE MINIMIZER REDUCING GRADIENT FROM 128305 TO 128.305
>> LINE MINIMIZER REDUCING GRADIENT FROM 131691 TO 131.691
>> LINE MINIMIZER REDUCING GRADIENT FROM 123700 TO 123.7
>>
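The gradient trend can be pulled out of a log with a few lines of Python. This is a minimal sketch under the assumption that each message sits on a single line, as in the excerpt above; `gradient_targets` is a hypothetical helper name. In every line quoted above the target value is the starting gradient divided by 1000, which the loop below makes visible.

```python
import re

# Matches e.g. "LINE MINIMIZER REDUCING GRADIENT FROM 128305 TO 128.305".
GRADIENT_RE = re.compile(
    r"LINE MINIMIZER REDUCING GRADIENT FROM\s+(\S+)\s+TO\s+(\S+)"
)


def gradient_targets(log_text):
    """Return a list of (start, target) gradient pairs in log order."""
    return [(float(a), float(b)) for a, b in GRADIENT_RE.findall(log_text)]


# Lines copied from the excerpt quoted above.
GRADIENT_SAMPLE = """
>> LINE MINIMIZER REDUCING GRADIENT FROM 9.95266e+08 TO 995266
>> LINE MINIMIZER REDUCING GRADIENT FROM 128305 TO 128.305
>> LINE MINIMIZER REDUCING GRADIENT FROM 131691 TO 131.691
>> LINE MINIMIZER REDUCING GRADIENT FROM 123700 TO 123.7
"""

pairs = gradient_targets(GRADIENT_SAMPLE)
for start, target in pairs:
    print(f"{start:.6g} -> {target:.6g} (ratio {start / target:.6g})")
```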
>> ----------------------
>> Could you please suggest how to check where the high VdW and IMPR
>> come from? I mean in terms of interatomic clashes; however, having no
>> indication of which atoms are flying out, I do not know where to
>> look. In other words, I believe that these problems come from
>> still-incorrect parameter files.
>>
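One way to look for interatomic clashes without knowing in advance which atoms are involved is to scan the coordinates for atom pairs that sit implausibly close together. The following is a minimal sketch, assuming the coordinates are available as a PDB file with standard fixed columns. The function names and the 0.8 Å cutoff are hypothetical choices: the cutoff is shorter than ordinary covalent bonds, so bonded neighbours are not reported, and periodic images are ignored.

```python
import math
from collections import defaultdict
from itertools import product


def read_pdb_atoms(pdb_text):
    """Return [(label, (x, y, z)), ...] from ATOM/HETATM records."""
    atoms = []
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM", "HETATM")):
            # Fixed PDB columns: name 13-16, resName 18-20, resSeq 23-26,
            # x 31-38, y 39-46, z 47-54 (1-based).
            label = f"{line[17:20].strip()}{line[22:26].strip()}:{line[12:16].strip()}"
            xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
            atoms.append((label, xyz))
    return atoms


def close_contacts(atoms, cutoff=0.8):
    """Return sorted (distance, label_i, label_j) for pairs closer than cutoff (angstrom)."""
    # Bin atoms into cubic cells of edge `cutoff`; any close pair must lie in
    # the same or in adjacent cells, which avoids an all-pairs search.
    cells = defaultdict(list)
    for idx, (_, xyz) in enumerate(atoms):
        cells[tuple(math.floor(c / cutoff) for c in xyz)].append(idx)

    hits = []
    for cell, members in cells.items():
        for shift in product((-1, 0, 1), repeat=3):
            neighbour = tuple(c + s for c, s in zip(cell, shift))
            for j in cells.get(neighbour, ()):
                for i in members:
                    if i < j:  # count each pair once
                        d = math.dist(atoms[i][1], atoms[j][1])
                        if d < cutoff:
                            hits.append((d, atoms[i][0], atoms[j][0]))
    return sorted(hits)


def pdb_line(serial, name, res, resseq, x, y, z):
    """Demo helper only: format one ATOM record with standard column positions."""
    return (f"ATOM  {serial:5d} {name:<4} {res:>3} A{resseq:4d}    "
            f"{x:8.3f}{y:8.3f}{z:8.3f}{1.0:6.2f}{0.0:6.2f}")


# Made-up coordinates for illustration; the first two atoms overlap.
demo = "\n".join([
    pdb_line(1, "CA", "ALA", 1, 0.0, 0.0, 0.0),
    pdb_line(2, "O", "HOH", 2, 0.5, 0.0, 0.0),
    pdb_line(3, "O", "HOH", 3, 5.0, 5.0, 5.0),
])
for d, a, b in close_contacts(read_pdb_atoms(demo)):
    print(f"{a} - {b}: {d:.2f} A")
```

The shortest contacts come first in the returned list, so the top few entries are the natural places to start looking. Finding looser clashes between non-bonded atoms would need a larger cutoff plus the bond list from the topology, which this sketch does not read.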
>> thanks a lot
>>
>> francesco
>>
>>
>>
>>
>> ---------- Forwarded message ----------
>> From: Francesco Pietra <chiendarret_at_gmail.com>
>> Date: Thu, Apr 5, 2012 at 8:59 AM
>> Subject: Re: namd-l: Line minimizerfailure because of IMPR?
>> To: Norman Geist <norman.geist_at_uni-greifswald.de>
>>
>>
>> Hello Norman:
>> I was thinking of IMPR because the value is increasing, as you may
>> notice from what I reported. As it is a new (complex) parameterization
>> ..
>> Also, you may have noticed that VdW decreases in the first couple of
>> minimization steps and then no further, until the simulation crashes.
>>
>> I changed the IMPR parameters a little and got the same error on
>> minimization, this time at step 209.
>>
>> In any event, although 2.9b2 with the same hardware and min.conf
>> proved quite OK in my hands with another metalloprotein, your
>> suggestion to try a stable version of NAMD has to be followed.
>> I'll come back soon.
>>
>> thanks
>>
>> francesco
>>
>> On Thu, Apr 5, 2012 at 8:08 AM, Norman Geist
>> <norman.geist_at_uni-greifswald.de> wrote:
>> > Hi,
>> >
>> > it's possible that there's a bug in the new implementation of
>> > minimization on the GPU. But I have seen this error before on a
>> > broken GPU. Just to be sure, does this error occur on different GPUs
>> > and only with 2.9b2? I don't think it's a wrong setting for IMPR,
>> > because that would IMHO cause an unstable simulation, and the
>> > simulation would just crash with a message like "Simulation has
>> > become unstable". This message instead shows a missed answer from a
>> > GPU and could indicate a hardware issue, or it was just happenstance.
>> > Is the error reproducible?
>> >
>> > Best wishes
>> > Norman Geist.
>> >
>> >> -----Original Message-----
>> >> From: owner-namd-l_at_ks.uiuc.edu [mailto:owner-namd-l_at_ks.uiuc.edu] On
>> >> Behalf Of Francesco Pietra
>> >> Sent: Wednesday, 4 April 2012 17:02
>> >> To: NAMD
>> >> Subject: namd-l: Line minimizerfailure because of IMPR?
>> >>
>> >> Hi:
>> >> With the CUDA build of NAMD 2.9b2 and the CHARMM 27 FF, I was trying to
>> >> minimize a new system comprising a new transition metal cluster.
>> >> Minimization failed, as indicated by the start/end of the .log file
>> >> and the gradient trend:
>> >>
>> >> .log file:
>> >>
>> >> ETITLE:      TS           BOND          ANGLE          DIHED
>> >> IMPRP               ELECT            VDW       BOUNDARY           MISC
>> >>        KINETIC               TOTAL           TEMP      POTENTIAL
>> >>   TOTAL3        TEMPAVG            PRESSURE      GPRESSURE
>> >> VOLUME       PRESSAVG      GPRESSAVG
>> >>
>> >> ENERGY:       0    131516.3834     15951.4182      1089.1031
>> >> 80.7094        -208823.8452   4021786.0851         0.0000
>> >> 0.0000         0.0000        3961599.8541         0.0000
>> >> 3961599.8541   3961599.8541         0.0000        1373635.0459
>> >> 1388330.3068    672033.8185   1373635.0459   1388330.3068
>> >> ..................................................................
>> >>
>> >> LINE MINIMIZER BRACKET: DX 1.62748e-48 1.80832e-43 DU -2.26438e-05
>> >> 8.74043e-06 DUDX 521930 521930 521930
>> >> ENERGY:     608    121296.7924     15556.9336      1102.0259
>> >> 130.0208        -214468.6419     18560.8579         0.0000
>> >> 0.0000         0.0000         -57822.0113         0.0000
>> >> -57822.0113    -57822.0113         0.0000         -14967.9037
>> >> -1012.7549    672033.8185    -14967.9037     -1012.7549
>> >>
>> >> LINE MINIMIZER BRACKET: DX 1.62748e-49 1.80832e-43 DU -1.45298e-05
>> >> 8.74043e-06 DUDX 521930 521930 521930
>> >> LINE MINIMIZER REDUCING GRADIENT FROM 4.44643e+08 TO 444643
>> >> FATAL ERROR: cuda_check_remote_progress polled 1000000 times over
>> >> 101.723663 s on step 609
>> >>
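In the two BRACKET records quoted above, the DX values have shrunk to around 1e-48 and 1e-43 while all three DUDX values stay at 521930. A small script can flag such records in a long log. This is a minimal sketch: the names `parse_brackets` and `collapsed`, and the `dx_floor` threshold of 1e-30, are hypothetical choices for illustration, not NAMD settings.

```python
import re

# One floating-point number, with optional sign and exponent.
NUM = r"([-+]?\d[\d.]*(?:[eE][-+]?\d+)?)"
BRACKET_RE = re.compile(
    rf"LINE MINIMIZER BRACKET: DX {NUM} {NUM} DU {NUM} {NUM} DUDX {NUM} {NUM} {NUM}"
)


def parse_brackets(log_text):
    """Return one 7-tuple (dx1, dx2, du1, du2, dudx1, dudx2, dudx3) per record."""
    # Remove e-mail quote markers and collapse whitespace so that records
    # wrapped over two lines still match the pattern.
    flat = " ".join(line.lstrip("> ") for line in log_text.splitlines())
    flat = " ".join(flat.split())
    return [tuple(float(v) for v in m) for m in BRACKET_RE.findall(flat)]


def collapsed(brackets, dx_floor=1e-30):
    """Return the records whose two DX values are both below dx_floor in magnitude."""
    return [b for b in brackets if max(abs(b[0]), abs(b[1])) < dx_floor]


# The first two records are the ones quoted above (2.9b2 run); the third is
# from the NAMD 2.8 run reported elsewhere in this thread.
BRACKET_SAMPLE = """
>> >> LINE MINIMIZER BRACKET: DX 1.62748e-48 1.80832e-43 DU -2.26438e-05
>> >> 8.74043e-06 DUDX 521930 521930 521930
>> >> LINE MINIMIZER BRACKET: DX 1.62748e-49 1.80832e-43 DU -1.45298e-05
>> >> 8.74043e-06 DUDX 521930 521930 521930
>> LINE MINIMIZER BRACKET: DX 0.000500355 0.00100071 DU -2.41055 117.048
>> DUDX -44805.5 35123.5 200226
"""

brackets = parse_brackets(BRACKET_SAMPLE)
flagged = collapsed(brackets)
print(f"{len(flagged)} of {len(brackets)} brackets have DX below the floor")
```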
>> >>
>> >>
>> >>
>> >> Gradient:
>> >>
>> >> MINIMIZER STARTING CONJUGATE GRADIENT ALGORITHM
>> >> LINE MINIMIZER REDUCING GRADIENT FROM 9.95266e+08 TO 995266
>> >> ............................
>> >> LINE MINIMIZER REDUCING GRADIENT FROM 4.44643e+08 TO 444643
>> >>
>> >>
>> >> The structure, at the end of the crashed minimization, does not show
>> >> any major distortion. From the above files, my impression is of a
>> >> wrong setting of IMPR. I would be very grateful if you could confirm
>> >> my feeling or suggest otherwise.
>> >>
>> >> francesco pietra
>> >
>> >
>
>
