Re: cuda_check_local_progress polled 1000000 times

From: Francesco Pietra (chiendarret_at_gmail.com)
Date: Tue Jun 05 2012 - 02:28:30 CDT

Hi Norman:

Neither applies. I used the final 2.9, and cuda-memtest did not reveal
any anomalies with the GPUs. I'll switch back to namd-cuda 2.8. If I
don't report back here, it means I had no problems with namd-cuda 2.8.

This was a known issue with the 2.9 beta versions, although, in my
experience, it was limited to minimization. This is the first time
that I have run into these problems with MD. This time I am using
Amber parm7, while in the past I used CHARMM27, if that is relevant
at all.

cheers
francesco

On Tue, Jun 5, 2012 at 7:29 AM, Norman Geist
<norman.geist_at_uni-greifswald.de> wrote:
> Hi Francesco,
>
> there are two possibilities in my mind why this error occurs.
>
> 1. You are using a beta and the issue is fixed in the final 2.9 <-- more likely
> 2. Your GPU is broken. <-- less likely
>
> Regards
>
> Norman Geist.
>
>> -----Original Message-----
>> From: owner-namd-l_at_ks.uiuc.edu [mailto:owner-namd-l_at_ks.uiuc.edu] On
>> Behalf Of Francesco Pietra
>> Sent: Monday, 4 June 2012 19:21
>> To: NAMD
>> Subject: Re: namd-l: cuda_check_local_progress polled 1000000 times
>>
>> Hello:
>> Now, with regular Amber parm7 files, the system (a protein in a water
>> box at 0.M NaCl concentration, plus a few calcium++ ions) was
>> minimized with namd 2.9b3 multicore, then heated gradually to 285 K
>> with namd-cuda 2.9 (20,000 steps). Equilibration at that temperature
>> and 1 atm crashed with the same error, "namd-l: cuda_check_local_progress
>> polled 1000000 times", at step 18,400 out of a planned 500,000. The
>> settings in the conf file were the same as for successful MD with
>> Amber parm7 using namd-cuda 2.8 in the past.
>>
>> francesco pietra
>>
>> On Sat, Jun 2, 2012 at 9:06 PM, Francesco Pietra
>> <chiendarret_at_gmail.com> wrote:
>> > Hello:
>> > With namd-cuda 2.9 on a shared-memory machine with two GTX-580 (Debian
>> > amd64), minimization (ts 0.1 fs, wrap all) of a new system, a protein
>> > in a water box, crashed at step 2,296 out of a planned 10,000. After
>> > switching to 2.9b3 multicore, the minimization worked well, ending at
>> > grad 1.5. I had not noticed whether this issue, known at the time of
>> > the beta tests, had been fixed.
>> >
>> > Thanks
>> > francesco pietra
>> >
>> >
>
