Re: NAMD-2.12 CUDA2 and PMECUDA problems

From: David Hardy (dhardy_at_ks.uiuc.edu)
Date: Wed Aug 30 2017 - 11:05:47 CDT

Dear Norman,

Please send me your NAMD config file and also the log file and backtrace produced by the initialization error.

Setting "useCUDA2 off" should fall back to the older short-range nonbonded CUDA kernels. You might also try setting "PMEOffload off" to see whether that eliminates the simulation crashes. An earlier note from Jim said that the old PMEOffload works only with a single GPU per node, something about how the new kernels ignore +devices and grab every available GPU.
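In your config file that would look something like this (a sketch; only the two keywords come from this thread, and the placement and comments are illustrative):

```
# NAMD config fragment (illustrative placement)
useCUDA2    off   ;# fall back to the older short-range nonbonded CUDA kernels
PMEOffload  off   ;# keep PME off the GPU while testing for the crashes
```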

Thanks,
Dave

--
David J. Hardy, Ph.D.
Theoretical and Computational Biophysics
Beckman Institute, University of Illinois
dhardy_at_ks.uiuc.edu
http://www.ks.uiuc.edu/~dhardy/
> On Aug 30, 2017, at 3:47 AM, Norman Geist <norman.geist_at_uni-greifswald.de> wrote:
> 
> I rule out a hardware problem, since this initialization problem in
> particular happens on all 6 of our GPU nodes and on another cluster we
> have access to, which contains 10 GPU nodes. I should also mention that we
> use a 4 fs timestep with the hydrogen mass repartitioning method, applied
> through the parmed utility by modifying the Amber parm7 file. That might
> explain the stability issues, but not the initialization error, and it
> still works fine with 2.10 and 2.11.
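> For reference, the repartitioning step with parmed goes roughly like this
> (a sketch; the file names are placeholders, and 3.024 Da is parmed's
> default target hydrogen mass):
>
> ```
> $ parmed system.parm7
> > HMassRepartition 3.024
> > outparm system_hmr.parm7
> ```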
> 
> Thanks so far
> 
> ;)
> 
>> -----Original Message-----
>> From: owner-namd-l_at_ks.uiuc.edu [mailto:owner-namd-l_at_ks.uiuc.edu] On
>> Behalf Of Nicholas M Glykos
>> Sent: Wednesday, August 30, 2017 10:05
>> To: Norman Geist <norman.geist_at_uni-greifswald.de>
>> Cc: namd-l_at_ks.uiuc.edu; glykos_at_mbg.duth.gr
>> Subject: Re: AW: namd-l: NAMD-2.12 CUDA2 and PMECUDA problems
>> 
>> 
>> 
>>> Yes, it is the nightly build. It's weird that I already get such a
>>> backtrace during CUDA initialization and nobody else seems to have
>>> encountered it. I also get similar errors for GBIS with 2.11, where
>>> CUDA acceleration was changed for implicit solvent.
>>>
>>> If I disable useCUDA2, some of the systems run for a while, but most of
>>> them crash later, e.g. with a segfault or through instability. Sometimes
>>> lots of margin warnings also occur in between. There must still be a bug
>>> somewhere in the new CUDA kernels.
>> 
>> Yes, it is weird. Being a pessimist, I usually connect weirdness with
>> hardware issues, but you could be right that this is indeed a software
>> problem. For the record, I have used the new CUDA kernels on machines
>> with a Xeon E5-2660v3 plus 2 x K40 without stability problems. Ditto for
>> workstations with an i7-6800 + GT1070. Good luck with it; I'm out of my
>> depth here.
>> 
>> 
>> 
>> --
>> 
>> 
>>            Nicholas M. Glykos, Department of Molecular Biology
>>     and Genetics, Democritus University of Thrace, University Campus,
>>  Dragana, 68100 Alexandroupolis, Greece, Tel/Fax (office) +302551030620,
>>    Ext.77620, Tel (lab) +302551030615, http://utopia.duth.gr/glykos/
> 
> 

This archive was generated by hypermail 2.1.6 : Sun Dec 31 2017 - 23:21:36 CST