Re: Unexplained segmentation faults in NAMD 2.9 using CUDA and GBIS

From: Aron Broom (broomsday_at_gmail.com)
Date: Thu Nov 29 2012 - 17:01:23 CST

Is there a chance it could be due to memory size? I've run a number of
CUDA GBIS simulations with NAMD 2.9 on C2070s without any problems. But my
system is a single small protein domain (~2k atoms). If I recall correctly
from the AMBER website (not sure how well this carries over to NAMD), implicit
solvent simulations take a fair amount of memory. I think the C2070s have 6 GB
(and the C2050 3 GB)?
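
A quick way to check whether the card is actually running short of memory is
to watch nvidia-smi while the simulation runs, or to query the CUDA runtime
directly. Below is a minimal standalone sketch (not part of NAMD; the file
name is just an example) that reports free and total memory for each device:

    /* gpu_mem_check.cu (example file name only)
       build with: nvcc gpu_mem_check.cu -o gpu_mem_check */
    #include <stdio.h>
    #include <cuda_runtime.h>

    int main(void)
    {
        int count = 0;
        if (cudaGetDeviceCount(&count) != cudaSuccess || count == 0) {
            fprintf(stderr, "No CUDA devices found\n");
            return 1;
        }
        for (int dev = 0; dev < count; ++dev) {
            cudaDeviceProp prop;
            size_t freeMem = 0, totalMem = 0;
            cudaGetDeviceProperties(&prop, dev);
            cudaSetDevice(dev);
            /* cudaMemGetInfo reports free/total memory on the current device */
            cudaMemGetInfo(&freeMem, &totalMem);
            printf("Device %d (%s): %.0f MB free of %.0f MB total\n",
                   dev, prop.name,
                   freeMem / 1048576.0, totalMem / 1048576.0);
        }
        return 0;
    }

If the free figure falls toward zero just before the segfault, memory would be
a strong suspect; if plenty stays free, the cause is probably elsewhere.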

~Aron

On Wed, Nov 28, 2012 at 6:40 AM, Tristan Croll <tristan.croll_at_qut.edu.au> wrote:

> Hi all,
>
> As per the subject line, I've been getting segmentation faults at
> seemingly random intervals when running implicit solvent simulations in the
> CUDA version of NAMD 2.9. Unlike most crash situations, this one doesn't
> throw up any error message other than "Segmentation Fault". Possibly
> related, I've also had a number of cases of simulations crashing during
> energy minimisation due to polling of the CUDA cards timing out.
>
> Relevant specs: I've seen the problem on two different machines, running
> different flavours of Linux. One is an 8-core Xeon (Nehalem) workstation
> with a single Tesla C2050, the other is a blade on our cluster (16-core
> Sandy Bridge Xeon with two C2070s).
>
> The simulation itself is of a rather large glycoprotein (glycans using the
> new force field parameters from the MacKerell lab). There are some fairly
> clear misfoldings in two domains (crystallisation artefacts or threading
> errors), which makes me suspect that the problem may be an energy term going
> out of range and being mishandled. On the other hand, continuing from the
> restart files after a crash (without reinitialising velocities) usually
> *doesn't* replicate the crash.
>
> What I can say clearly is that it seems to be the combination of GBIS and
> CUDA that is the problem: explicit solvent works fine (but is a poor choice
> for the TMD simulations I want to run), as does GBIS in the non-CUDA version
> of NAMD (but that is agonisingly slow for the system I'm simulating). I'd
> move to multi-node simulations, but the recent upgrade of our cluster seems
> to have broken its compatibility with the ibverbs NAMD build (the guys in
> charge of the cluster are working on that).
>
> Sorry to give you such a vague list of symptoms, but hopefully something
> in there will help.
>
> Cheers,
>
> Tristan
>

-- 
Aron Broom M.Sc
PhD Student
Department of Chemistry
University of Waterloo
