From: Francesco Pietra (chiendarret_at_gmail.com)
Date: Thu Apr 05 2012 - 03:57:03 CDT
Yes, the CUDA build of 2.8, of course, in order to compare.
The 2.9b2 CUDA build is the nightly build of 2012-03-30.
Just to be sure that no hardware change has intervened, I am now
running MD with the above 2.9b2 on another metalloprotein that I have
been studying in the last few days (I parameterized the metal cluster
myself, albeit under the parm7 Amber ff). MD is proceeding regularly;
600,000 steps are planned, as in previous cases.
Are you suggesting that a CUDA bug can show up with a particular
ensemble? In this case I am parameterizing with CHARMM 22/27. For
minimization, I am using a 0.1 fs timestep and, of course, rigid bonds
for water only.
francesco
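
To compare the 2.8 and 2.9b2 runs term by term, the ENERGY records of the two logs can be lined up against the ETITLE header. What follows is only a minimal Python sketch, assuming each record sits on a single line as in an unwrapped NAMD log (the excerpts quoted below were wrapped by the mail client). The function names `parse_namd_energies` and `term_trend` are illustrative and are not part of NAMD.

```python
def parse_namd_energies(lines):
    """Map each ENERGY record onto the column names of the latest ETITLE line."""
    titles = None
    records = []
    for line in lines:
        if line.startswith("ETITLE:"):
            titles = line.split()[1:]
        elif line.startswith("ENERGY:") and titles is not None:
            values = [float(v) for v in line.split()[1:]]
            if len(values) == len(titles):  # skip truncated records
                records.append(dict(zip(titles, values)))
    return records


def term_trend(records, term):
    """Return (step, value) pairs for one energy term, e.g. "IMPRP" or "VDW"."""
    return [(int(r["TS"]), r[term]) for r in records]


# Two records copied from the 2.9b2 log quoted in this thread, unwrapped.
SAMPLE_LOG = """\
ETITLE: TS BOND ANGLE DIHED IMPRP ELECT VDW BOUNDARY MISC KINETIC TOTAL TEMP POTENTIAL TOTAL3 TEMPAVG PRESSURE GPRESSURE VOLUME PRESSAVG GPRESSAVG
ENERGY: 0 131516.3834 15951.4182 1089.1031 80.7094 -208823.8452 4021786.0851 0.0000 0.0000 0.0000 3961599.8541 0.0000 3961599.8541 3961599.8541 0.0000 1373635.0459 1388330.3068 672033.8185 1373635.0459 1388330.3068
ENERGY: 608 121296.7924 15556.9336 1102.0259 130.0208 -214468.6419 18560.8579 0.0000 0.0000 0.0000 -57822.0113 0.0000 -57822.0113 -57822.0113 0.0000 -14967.9037 -1012.7549 672033.8185 -14967.9037 -1012.7549
"""

records = parse_namd_energies(SAMPLE_LOG.splitlines())
for term in ("IMPRP", "VDW"):
    print(term, term_trend(records, term))
```

Running the same two functions on the full logs of both builds and printing the IMPRP and VDW trends side by side would show at which step, if any, the two runs start to diverge.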
On Thu, Apr 5, 2012 at 10:28 AM, Norman Geist
<norman.geist_at_uni-greifswald.de> wrote:
> Was this a CUDA build?
>
>
>> -----Original Message-----
>> From: owner-namd-l_at_ks.uiuc.edu [mailto:owner-namd-l_at_ks.uiuc.edu] On
>> Behalf Of Francesco Pietra
>> Sent: Thursday, 5 April 2012 10:02
>> To: Norman Geist; NAMD
>> Subject: Fwd: namd-l: Line minimizerfailure because of IMPR?
>>
>> Hello:
>>
>> I tried the same files with the stable NAMD 2.8, with partly the same
>> result: this time the minimization halted, for at least 20 minutes,
>> without returning the Linux prompt.
>>
>> Start/end of log file:
>>
>> ETITLE: TS BOND ANGLE DIHED
>> IMPRP ELECT VDW BOUNDARY MISC
>> KINETIC TOTAL TEMP POTENTIAL
>> TOTAL3 TEMPAVG PRESSURE GPRESSURE
>> VOLUME PRESSAVG GPRESSAVG
>>
>> ENERGY: 0 131516.3834 15951.4182 1089.1031
>> 80.7094 -208823.8451 4021786.0851 0.0000
>> 0.0000 0.0000 3961599.8541 0.0000
>> 3961599.8541 3961599.8541 0.0000 1373635.0459
>> 1388330.3068 672033.8185 1373635.0459 1388330.3068
>> .................................
>> ENERGY: 2131 9062.5391 6886.1278 1019.7076
>> 53.3039 -304342.7936 31732.7361 0.0000
>> 0.0000 0.0000 -255588.3792 0.0000
>> -255588.3792 -255588.3792 0.0000 -5055.1933
>> -4902.4727 672033.8185 -5055.1933 -4902.4727
>>
>> LINE MINIMIZER BRACKET: DX 0.000500355 0.00100071 DU -2.41055 117.048
>> DUDX -44805.5 35123.5 200226
>>
>> ---------------------------
>> Confusingly enough, the gradient trend was now better:
>>
>> MINIMIZER STARTING CONJUGATE GRADIENT ALGORITHM
>> LINE MINIMIZER REDUCING GRADIENT FROM 9.95266e+08 TO 995266
>> MINIMIZER RESTARTING CONJUGATE GRADIENT ALGORITHM
>> ....................
>> LINE MINIMIZER REDUCING GRADIENT FROM 128305 TO 128.305
>> LINE MINIMIZER REDUCING GRADIENT FROM 131691 TO 131.691
>> LINE MINIMIZER REDUCING GRADIENT FROM 123700 TO 123.7
>>
>> ----------------------
>> Could you please suggest how to find out where the high VdW and IMPR
>> energies come from? (I mean in terms of interatomic clashes; however,
>> having no indication of which atoms are flying out, I do not know
>> where to look.) I believe that these problems come from parameter
>> files that are still incorrect.
>>
>> thanks a lot
>>
>> francesco
>>
>>
>>
>>
>> ---------- Forwarded message ----------
>> From: Francesco Pietra <chiendarret_at_gmail.com>
>> Date: Thu, Apr 5, 2012 at 8:59 AM
>> Subject: Re: namd-l: Line minimizerfailure because of IMPR?
>> To: Norman Geist <norman.geist_at_uni-greifswald.de>
>>
>>
>> Hello Norman:
>> I was thinking of IMPR because its value is increasing, as you may
>> notice from what I reported. As it is a new (complex) parameterization
>> ..
>> Also, you may have noticed that VdW decreases in the first couple of
>> minimization steps, then no more, until the simulation crashes.
>>
>> I changed the IMPR parameters a little, getting the same error on
>> minimization, this time at step 209.
>>
>> In any event, although 2.9b2 in my hands, with the same hardware and
>> min.conf, proved quite OK with another metalloprotein, your
>> suggestion to try a stable version of NAMD has to be followed.
>> I'll come back soon.
>>
>> thanks
>>
>> francesco
>>
>> On Thu, Apr 5, 2012 at 8:08 AM, Norman Geist
>> <norman.geist_at_uni-greifswald.de> wrote:
>> > Hi,
>> >
>> > it's possible that there's a bug in the new implementation of the
>> > minimization on the GPU. But I have seen this error before on a broken
>> > GPU. Just to be sure, does this error occur on different GPUs and only
>> > with 2.9b2? I don't think it's a wrong setting for IMPR, because that
>> > would IMHO cause an unstable simulation, and the simulation would just
>> > crash with a message like "Simulation has become unstable". This
>> > message, however, shows a missed answer from a GPU and could indicate a
>> > hardware issue, or it was just happenstance. Is the error reproducible?
>> >
>> > Best wishes
>> > Norman Geist.
>> >
>> >> -----Original Message-----
>> >> From: owner-namd-l_at_ks.uiuc.edu [mailto:owner-namd-l_at_ks.uiuc.edu] On
>> >> Behalf Of Francesco Pietra
>> >> Sent: Wednesday, 4 April 2012 17:02
>> >> To: NAMD
>> >> Subject: namd-l: Line minimizerfailure because of IMPR?
>> >>
>> >> Hi:
>> >> with the CUDA build of NAMD 2.9b2 and the CHARMM 27 FF, I was trying to
>> >> minimize a new system comprising a new transition metal cluster.
>> >> Minimization failed, as indicated by the start/end of the .log file and
>> >> the gradient trend:
>> >>
>> >> .log file:
>> >>
>> >> ETITLE: TS BOND ANGLE DIHED
>> >> IMPRP ELECT VDW BOUNDARY MISC
>> >> KINETIC TOTAL TEMP POTENTIAL
>> >> TOTAL3 TEMPAVG PRESSURE GPRESSURE
>> >> VOLUME PRESSAVG GPRESSAVG
>> >>
>> >> ENERGY: 0 131516.3834 15951.4182 1089.1031
>> >> 80.7094 -208823.8452 4021786.0851 0.0000
>> >> 0.0000 0.0000 3961599.8541 0.0000
>> >> 3961599.8541 3961599.8541 0.0000 1373635.0459
>> >> 1388330.3068 672033.8185 1373635.0459 1388330.3068
>> >> ..................................................................
>> >>
>> >> LINE MINIMIZER BRACKET: DX 1.62748e-48 1.80832e-43 DU -2.26438e-05
>> >> 8.74043e-06 DUDX 521930 521930 521930
>> >> ENERGY: 608 121296.7924 15556.9336 1102.0259
>> >> 130.0208 -214468.6419 18560.8579 0.0000
>> >> 0.0000 0.0000 -57822.0113 0.0000
>> >> -57822.0113 -57822.0113 0.0000 -14967.9037
>> >> -1012.7549 672033.8185 -14967.9037 -1012.7549
>> >>
>> >> LINE MINIMIZER BRACKET: DX 1.62748e-49 1.80832e-43 DU -1.45298e-05
>> >> 8.74043e-06 DUDX 521930 521930 521930
>> >> LINE MINIMIZER REDUCING GRADIENT FROM 4.44643e+08 TO 444643
>> >> FATAL ERROR: cuda_check_remote_progress polled 1000000 times over
>> >> 101.723663 s on step 609
>> >>
>> >>
>> >>
>> >>
>> >> Gradient:
>> >>
>> >> MINIMIZER STARTING CONJUGATE GRADIENT ALGORITHM
>> >> LINE MINIMIZER REDUCING GRADIENT FROM 9.95266e+08 TO 995266
>> >> ............................
>> >> LINE MINIMIZER REDUCING GRADIENT FROM 4.44643e+08 TO 444643
>> >>
>> >>
>> >> The structure, at the end of the crashed minimization, does not show
>> >> any major distortion. From the above files my impression is of a wrong
>> >> setting of IMPR. I would be very grateful for confirmation of my
>> >> feeling, or for a suggestion otherwise.
>> >>
>> >> francesco pietra
>> >
>> >
>
>
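
On the question raised in the thread of how to locate interatomic clashes behind a high VdW energy: one rough approach is to list atom pairs that sit closer together than any normal covalent bond. The Python sketch below is an illustration only. It assumes the coordinates are available as a standard fixed-column PDB file, the 0.8 Å default cutoff is an arbitrary choice, and the function names are hypothetical. A short list of hits points to where to look in the structure; it does not by itself show that a parameter file is wrong.

```python
import math


def read_pdb_atoms(lines):
    """Read ATOM/HETATM records using the fixed PDB column layout."""
    atoms = []
    for line in lines:
        if line.startswith(("ATOM", "HETATM")):
            atoms.append({
                "name": line[12:16].strip(),
                "resname": line[17:20].strip(),
                "resid": int(line[22:26]),
                "xyz": (float(line[30:38]),
                        float(line[38:46]),
                        float(line[46:54])),
            })
    return atoms


def close_contacts(atoms, cutoff=0.8):
    """Return (distance, atom_a, atom_b) for pairs closer than cutoff (in angstrom).

    Plain O(N^2) loop: adequate for a protein or a selected region,
    slow for a large solvated box.
    """
    hits = []
    for i in range(len(atoms)):
        for j in range(i + 1, len(atoms)):
            d = math.dist(atoms[i]["xyz"], atoms[j]["xyz"])
            if d < cutoff:
                hits.append((d, atoms[i], atoms[j]))
    # Shortest distances first; sort on the distance only, since dicts do not compare.
    return sorted(hits, key=lambda hit: hit[0])
```

A possible use, with a hypothetical file name, is `close_contacts(read_pdb_atoms(open("system.pdb")))`, run on the starting structure and printing the atom name, residue name and residue number of each pair found.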