Re: ibverbs crash

From: Bjoern Olausson (bjoern.olausson_at_biochemtech.uni-halle.de)
Date: Wed Nov 04 2009 - 04:06:48 CST

On Tuesday 03 November 2009 21:17:17 David A. Horita wrote:
> Hi,
> I compiled the 2009/10/30 CVS with ibverbs, following the recent note from
> Bjoern Olausson, compile seems fine for charm 6.1.3 using intel 11.0.84
> and the apoaI benchmarks run fine. However, somewhere after 10,000 steps
> in an MD run, I'll get:
>
> ========
>
So which one did you actually compile: charm 6.1.3 or CVS?
6.1.{2,3} will compile, but there is a memory leak (I am not sure about 6.1.3,
but I would guess so), so it will crash after some time.

CVS will compile and run (it still crashes for me, but I am currently looking
into that with Gengbin Zheng [a charm++ developer]).

> I had been writing restarts every 10,000 steps so I'm not exactly sure
> where it crashes. In an SMD run, it made it to 10,600 steps (writing
> every 200). Any ideas? Is this our hardware, a memory leak, or a bad
> compile of charm or namd? If it's hardware, any suggestions as to where
> to point our sysadmin?
>

Please verify that your simulation and hardware run fine.
Compile a (for you) stable charm++ and namd version (e.g. with mvapich or
whatever MPI implementation you prefer) and run your simulation to see whether
the simulation and hardware are stable before jumping on charm++ and
ibverbs.

If everything runs stable, use the latest NAMD CVS and CHARM CVS.

If you have a stable test system and are willing to help with testing, I can
add you to CC for the discussion between me and Gengbin.

By the way, on what hardware are you running charm++?

Cheers
Bjoern

-- 
Bjoern Olausson
Martin-Luther-Universität Halle-Wittenberg 
Fachbereich Biochemie/Biotechnologie
Kurt-Mothes-Str. 3
06120 Halle/Saale
Phone: +49-345-55-24942
