Re: AW: AW: AW: Atoms moving too fast only with CUDA version.

From: amin_at_imtech.res.in
Date: Thu Jun 28 2012 - 01:13:07 CDT

Just after the minimization step, I have a heating step from 0 to 300 K in 3000
steps in which I kept the CA atoms restrained. This restraint is absent in all
the other steps. Could this be the reason? I also found that although the
production run started well with the CUDA version, after about 1.5 million steps
it stopped writing any output, even though all the processes were still running.
I waited for around 4 hours and then killed the run. I restarted the run, and
this time I got a segmentation fault after about half a million steps. I have
restarted again, and right now it is running at around half a million steps. I
hope it turns out to be a temporary issue.
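
A silent stall like the one described above can be caught without waiting for
hours by checking how long ago the log file was last written. The following is
a minimal sketch and not part of NAMD; the function names, the log file name
in the usage note, and the idle threshold are assumptions chosen for
illustration.

```python
import os
import time


def seconds_since_last_write(path, now=None):
    """Return how many seconds have passed since `path` was last modified."""
    if now is None:
        now = time.time()
    return now - os.path.getmtime(path)


def is_stalled(path, max_idle_seconds, now=None):
    """Return True if `path` has not been written to for longer than the limit."""
    return seconds_since_last_write(path, now) > max_idle_seconds
```

For example, `is_stalled("production.log", 15 * 60)` would report a run whose
(hypothetical) log file has been idle for more than 15 minutes. A sensible
threshold depends on how often the run is configured to write output.
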
Amin.

> I only use the GPU versions of NAMD, for all systems and all stages of a
> simulation, and I have never observed anything like that. But I could imagine
> that you used a feature that is currently broken in the GPU version. Did you
> use something special that you turned off after the equilibration run, such as
> restraints?
>
> Norman Geist.
>
>> -----Original Message-----
>> From: owner-namd-l_at_ks.uiuc.edu [mailto:owner-namd-l_at_ks.uiuc.edu] On
>> behalf of amin_at_imtech.res.in
>> Sent: Wednesday, 27 June 2012 16:41
>> To: Norman Geist
>> Cc: broomsday_at_gmail.com; namd-l_at_ks.uiuc.edu
>> Subject: Re: AW: AW: namd-l: Atoms moving too fast only with CUDA
>> version.
>>
>> I completed the equilibration run on the CPU and then tried the production
>> run using NAMD 2.9-CUDA, and now it works without any error. Also, my GPU
>> memory tests showed no errors. So I believe the robustness of the integrator
>> is the key. Thanks for the replies.
>>
>> Amin.
>>
>>
>> > Also, as it is the initial step of your simulation, you could try to
>> > remove the restraints, constant pressure, and fixed atoms, if you have
>> > any, and see if it works. I remember someone with the same problem, and
>> > that was due to incorrectly defined restraints.
>> >
>> > Norman Geist.
>> >
>> >
>> >> -----Original Message-----
>> >> From: Norman Geist [mailto:norman.geist_at_uni-greifswald.de]
>> >> Sent: Wednesday, 27 June 2012 10:23
>> >> To: 'amin_at_imtech.res.in'
>> >> Cc: Namd Mailing List (namd-l_at_ks.uiuc.edu)
>> >> Subject: AW: AW: namd-l: Atoms moving too fast only with CUDA version.
>> >> > -----Original Message-----
>> >> > From: owner-namd-l_at_ks.uiuc.edu [mailto:owner-namd-l_at_ks.uiuc.edu] On
>> >> > behalf of amin_at_imtech.res.in
>> >> > Sent: Wednesday, 27 June 2012 09:10
>> >> > To: Norman Geist
>> >> > Cc: namd-l_at_ks.uiuc.edu
>> >> > Subject: Re: AW: namd-l: Atoms moving too fast only with CUDA version.
>> >> >
>> >> > I have only one GPU. I get the error after all the minimization
>> >> > steps are completed, just at the first heating step.
>> >> Yes, same for me. Minimization doesn't compute velocities, only forces and
>> >> energies that get optimized. There is no real atom movement. It just moves
>> >> the atoms randomly by a small amount, computes the energies, and checks
>> >> whether the total energy is lower than before. If it is lower, it keeps the
>> >> new positions; if not, it goes back. Then it starts over. So a computation
>> >> error during minimization only makes the minimizer think it has made a bad
>> >> move, but does not break the simulation. A force that is too high during
>> >> molecular dynamics causes unusual behavior and velocities that are too
>> >> high, which break the simulation. You should try that memtest thing. But if
>> >> it is a GPU or PCIe bus (on the GPU) error, I think the memory test won't
>> >> show it. The best thing would be to try another GPU. It's a pity that you
>> >> only have one.
>> >> Also, do other molecular systems break the same way on the GPU? Maybe try
>> >> some of the test systems from the NAMD site.
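
The accept/reject scheme described in the reply above can be sketched in a few
lines of Python. This illustrates only the idea as stated in the message (move
a little, keep the move if the energy drops, otherwise go back); it is not
NAMD's minimizer, which the NAMD documentation describes as a conjugate
gradient and line search method. The function names, the toy harmonic energy,
and all parameter values are assumptions made for this illustration.

```python
import random


def trial_move_minimize(energy, positions, n_trials=2000, max_shift=0.1, seed=0):
    """Lower `energy` by random single-coordinate moves, keeping only improvements."""
    rng = random.Random(seed)
    current = list(positions)
    best_energy = energy(current)
    for _ in range(n_trials):
        index = rng.randrange(len(current))
        old_value = current[index]
        # Move one coordinate by a small random amount.
        current[index] = old_value + rng.uniform(-max_shift, max_shift)
        trial_energy = energy(current)
        if trial_energy < best_energy:
            best_energy = trial_energy      # keep the new position
        else:
            current[index] = old_value      # go back to the old position
    return current, best_energy


def harmonic_energy(coordinates):
    """Toy energy with its minimum at the origin (illustration only)."""
    return sum(x * x for x in coordinates)
```

Because a move is kept only when the energy drops, a single wrong energy
evaluation in such a scheme can at worst cause one bad accept or reject. There
are no velocities that could grow without bound, which is the contrast with
molecular dynamics drawn in the message.
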
>> >> > Thanks.
>> >> > Amin.
>> >> >
>> >> >
>> >> > > Hi,
>> >> > >
>> >> > >
>> >> > >
>> >> > > I had the same problem when I had a broken GPU. If you have multiple
>> >> > > GPUs, try them separately to see if it only crashes when a particular
>> >> > > GPU participates.
>> >> > >
>> >> > > It would also be important to know whether you get the error directly
>> >> > > at the start or later.
>> >> > >
>> >> > >
>> >> > >
>> >> > > Good luck
>> >> > >
>> >> > >
>> >> > >
>> >> > > Norman Geist.
>> >> > >
>> >> > >
>> >> > >
>> >> > > From: owner-namd-l_at_ks.uiuc.edu [mailto:owner-namd-l_at_ks.uiuc.edu]
>> >> > > On behalf of Aron Broom
>> >> > > Sent: Wednesday, 27 June 2012 07:52
>> >> > > To: amin_at_imtech.res.in
>> >> > > Cc: namd-l_at_ks.uiuc.edu
>> >> > > Subject: Re: namd-l: Atoms moving too fast only with CUDA version.
>> >> > >
>> >> > >
>> >> > >
>> >> > > I'm not sure you necessarily did anything wrong. I would suggest
>> >> > > that your system, even after 50,000 steps, still has some kind of
>> >> > > problem, but the CPU integrator is robust enough to muscle through it,
>> >> > > whereas the CUDA one is not.
>> >> > >
>> >> > > You should consider slowly heating your system from, say, 100 K or
>> >> > > something of the sort, as I would imagine you have jumped straight to
>> >> > > 300 K, which generally works but requires a decent starting point.
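
A staged heating ramp of the kind suggested above can be planned with a small
helper like the one below. This is a sketch under stated assumptions: the
function name is hypothetical, the 50 K increment is an arbitrary choice, the
100 K starting point comes from the suggestion above, and the 3000 steps come
from the heating step described at the top of this thread.

```python
def heating_schedule(t_start, t_end, total_steps, t_increment):
    """Return (first_step, temperature) pairs for a staged temperature ramp."""
    if t_increment <= 0:
        raise ValueError("t_increment must be positive")
    if t_end <= t_start:
        raise ValueError("t_end must be above t_start")
    n_intervals = round((t_end - t_start) / t_increment)
    if abs(t_start + n_intervals * t_increment - t_end) > 1e-9:
        raise ValueError("t_increment must divide the temperature range evenly")
    n_stages = n_intervals + 1              # the start temperature is a stage too
    steps_per_stage = total_steps // n_stages
    if steps_per_stage < 1:
        raise ValueError("not enough steps for this many stages")
    return [
        (stage * steps_per_stage, t_start + stage * t_increment)
        for stage in range(n_stages)
    ]
```

Calling `heating_schedule(100, 300, 3000, 50)` gives five stages of 600 steps
each, at 100, 150, 200, 250, and 300 K. Each pair could then drive one stage
of the heating run in the simulation configuration.
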
>> >> > >
>> >> > > Keep in mind that even though the minimizer in NAMD is smarter than
>> >> > > just steepest descent, it will still be easily trapped in local
>> >> > > minima, so doing more minimization without some kind of dynamics is
>> >> > > unlikely to get you closer to the global minimum and away from
>> >> > > whatever problems you have.
>> >> > >
>> >> > > Did you also have a look at the structure, and at which atoms are
>> >> > > causing the problem?
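
To find which atoms are causing the problem, the per-atom error lines in the
log can be collected and ranked by speed. The sketch below assumes that the
log contains lines of the form "ERROR: Atom <id> velocity is <vx> <vy> <vz>
..."; that format is an assumption and should be checked against the actual
log before relying on the pattern. The function name is hypothetical.

```python
import math
import re

# Assumed shape of the per-atom error line; verify against a real log.
VELOCITY_ERROR = re.compile(r"ERROR: Atom (\d+) velocity is (\S+) (\S+) (\S+)")


def fast_atoms(log_lines):
    """Return (atom_id, speed) pairs from matching lines, fastest first."""
    found = []
    for line in log_lines:
        match = VELOCITY_ERROR.search(line)
        if match is None:
            continue
        try:
            vx, vy, vz = (float(match.group(i)) for i in (2, 3, 4))
        except ValueError:
            continue  # skip lines whose velocity fields are not numbers
        speed = math.sqrt(vx * vx + vy * vy + vz * vz)
        found.append((int(match.group(1)), speed))
    return sorted(found, key=lambda item: item[1], reverse=True)
```

The atom IDs returned this way can be looked up in the structure to see
whether the fast atoms cluster in one region, for example near restrained or
overlapping atoms.
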
>> >> > >
>> >> > > ~Aron
>> >> > >
>> >> > > On Wed, Jun 27, 2012 at 1:37 AM, <amin_at_imtech.res.in> wrote:
>> >> > >
>> >> > > Dear all,
>> >> > > I am trying to run an equilibration using NAMD 2.9-CUDA on Linux.
>> >> > > However, I keep getting the "Atoms moving too fast" error. I increased
>> >> > > the minimization up to 50000 steps, but it doesn't work. But when I
>> >> > > tried to run the exact same config file using the non-CUDA version, it
>> >> > > ran without any error even at 10000 minimization steps. And the error
>> >> > > is reproducible. Can someone please tell me what may have gone wrong?
>> >> > >
>> >> > > Regards.
>> >> > >
>> >> > > Amin.
>> >> > >
>> >> > >
>> >> > > ______________________________________________________________________
>> >> > > सूक्ष्मजीव प्रौद्योगिकी संस्थान (वैज्ञानिक औद्योगिक अनुसंधान परिषद)
>> >> > > Institute of Microbial Technology (A CONSTITUENT ESTABLISHMENT OF CSIR)
>> >> > > सैक्टर 39 ए, चण्डीगढ़ / Sector 39-A, Chandigarh
>> >> > > पिन कोड/PIN CODE :160036
>> >> > > दूरभाष/EPABX :0172 6665 201-202
>> >> > >
>> >> > >
>> >> > >
>> >> > >
>> >> > > --
>> >> > > Aron Broom M.Sc
>> >> > > PhD Student
>> >> > > Department of Chemistry
>> >> > > University of Waterloo
>> >> > >
>> >> > >
>> >
>> >
>> >
>
>


This archive was generated by hypermail 2.1.6 : Tue Dec 31 2013 - 23:22:11 CST