From: David Hardy (dhardy_at_ks.uiuc.edu)
Date: Wed Aug 30 2017 - 11:05:47 CDT
Please send me your NAMD config file and also the log file and backtrace produced by the initialization error.
Setting "useCUDA2 off" should select the older short-range nonbonded CUDA kernels. Maybe also try setting "PMEOffload off" to see if that eliminates the simulation crashes. An earlier note I saw from Jim said that the old PMEOffload only works with a single GPU per node.
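For reference, a minimal sketch of how those two settings might look in a NAMD config file. The parameter names are the ones mentioned in this thread; the rest of the config is assumed to stay unchanged, and whether both switches are accepted depends on the build in use.

```tcl
# Hypothetical excerpt from a NAMD 2.12 config file (sketch only).
# Fall back to the older short-range nonbonded CUDA kernels.
useCUDA2      off
# Keep the PME calculation on the CPU to test whether the crashes persist.
PMEOffload    off
```

Changing one setting at a time and comparing the logs should help show which code path triggers the crash.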
-- 
David J. Hardy, Ph.D.
Theoretical and Computational Biophysics
Beckman Institute, University of Illinois
dhardy_at_ks.uiuc.edu
http://www.ks.uiuc.edu/~dhardy/

> On Aug 30, 2017, at 3:47 AM, Norman Geist <norman.geist_at_uni-greifswald.de> wrote:
>
> I exclude a hardware problem, especially since this initialization problem
> happens on all 6 of our GPU nodes and on another cluster we have access to,
> containing 10 GPU nodes. I should also mention that we use a 4 fs timestep
> with the hydrogen mass repartitioning method, applied through the parmed
> utility by modifying the Amber parm7 file. That does not explain the
> initialization error; it might explain the stability issues, but it still
> works fine with 2.10 and 2.11.
>
> Thanks so far
>
> ;)
>
>> -----Original Message-----
>> From: owner-namd-l_at_ks.uiuc.edu [mailto:owner-namd-l_at_ks.uiuc.edu] On
>> Behalf Of Nicholas M Glykos
>> Sent: Wednesday, 30 August 2017 10:05
>> To: Norman Geist <norman.geist_at_uni-greifswald.de>
>> Cc: namd-l_at_ks.uiuc.edu; glykos_at_mbg.duth.gr
>> Subject: Re: AW: namd-l: NAMD-2.12 CUDA2 and PMECUDA problems
>>
>>> Yes, it is the nightly build. It's weird that I get such a backtrace
>>> during the CUDA initialization already and nobody seems to have
>>> encountered the same. I also get similar errors for GBIS with 2.11,
>>> where CUDA acceleration has been changed for implicit solvent.
>>>
>>> If I disable useCUDA2 some of the systems run for a while, but most of
>>> them crash later with e.g. a segfault or by instability. Sometimes
>>> lots of margin warnings also occur in between. There must still be a
>>> bug somewhere in the new CUDA kernels.
>>
>> Yes, it is weird. Being a pessimist, I usually connect weirdness with
>> hardware issues but you could be right that this is indeed a software
>> problem. For the record I have used the new cuda kernels on machines with
>> Xeon E5-2660v3 plus 2 x K40 without stability problems. Ditto for
>> workstations with i7-6800 + GTX1070. Good luck with it, I'm out of my depth
>> here.
>>
>> --
>>
>> Nicholas M. Glykos, Department of Molecular Biology
>> and Genetics, Democritus University of Thrace, University Campus,
>> Dragana, 68100 Alexandroupolis, Greece, Tel/Fax (office) +302551030620,
>> Ext.77620, Tel (lab) +302551030615, http://utopia.duth.gr/glykos/