Re: NAMD job dies on 2-quad core server

From: Gengbin Zheng (gzheng_at_illinois.edu)
Date: Thu Apr 09 2009 - 10:22:04 CDT

Vivek,

  When NAMD is busy communicating (sending messages) or load balancing,
it may appear idle, or only one processor may show as busy for a short
period of time. Also check whether your job is running close to the
memory capacity of the machine (you can see that in "top"). The operating
system may then be busy swapping your NAMD job to and from disk, which
also causes idle time.
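On Linux, the memory and swap figures that top shows can also be read directly. The sketch below assumes a Linux system with /proc mounted. The 5-second sampling window is an arbitrary choice. It prints free RAM and used swap, then checks whether pages are being swapped right now:

```shell
#!/bin/sh
# Sketch (Linux only): is the machine short of RAM, and is it swapping now?

# /proc/meminfo reports sizes in kB. MemFree excludes page cache the kernel
# can reclaim, so it is a pessimistic measure of what is still available.
awk '
  /^MemTotal:/  { total  = $2 }
  /^MemFree:/   { free   = $2 }
  /^SwapTotal:/ { stotal = $2 }
  /^SwapFree:/  { sfree  = $2 }
  END {
    printf "RAM free:  %d of %d MB\n", free / 1024, total / 1024
    printf "Swap used: %d MB\n", (stotal - sfree) / 1024
  }
' /proc/meminfo

# /proc/vmstat keeps running totals of pages swapped in (pswpin) and out
# (pswpout). If their sum grows between two samples, the system is swapping.
swap_pages() {
  awk '$1 == "pswpin" || $1 == "pswpout" { n += $2 } END { print n + 0 }' /proc/vmstat
}
before=$(swap_pages)
sleep 5
after=$(swap_pages)
echo "Pages swapped in or out during the last 5 s: $((after - before))"
```

A nonzero count while NAMD runs would point to the swapping described above.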

Gengbin

vivek.viv.sharma_at_gmail.com wrote:
> Hello Axel and all,
>
> It was a mistake on my part to say in my previous post that the 'job
> dies'. That statement was based on watching the 'top' command on the
> console. The job is running fine on all 8 processors. It is not dying,
> but every now and then 'top' shows that none of the processors is being
> used by NAMD, as if the NAMD processes go into the background and then
> come back running on all 8 processors. When I check with 'ps -e', all
> the NAMD processes are there. Can anyone please shed some light on why
> NAMD processes appear, go into the background, and reappear in 'top'?
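Watching top interactively makes short idle phases hard to pin down. Logging the CPU usage of the NAMD processes over a short interval makes them easier to line up with the NAMD log. The sketch below assumes that the procps tools pgrep and top are installed. It also assumes the processes are named namd2, as they are with the ++local command quoted below:

```shell
#!/bin/sh
# Sketch: record how busy each namd2 process is, once per second, for 10 s.
# Assumes procps pgrep/top and processes named "namd2".

# Comma-separated PID list, the format top -p expects (empty if none running).
namd_pids() { pgrep -d, namd2; }

pids=$(namd_pids)
if [ -z "$pids" ]; then
  echo "no namd2 process is running" >&2
else
  # -b: plain-text batch output, -d 1: 1 s between samples, -n 10: 10 samples.
  # Keep only the per-process lines; idle phases show up as %CPU near 0.
  top -b -d 1 -n 10 -p "$pids" | grep namd2
fi
```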
>
> Now, am I right in thinking that running jobs this way will take
> more time than it should? I assume here that when 'top' does not show
> NAMD running, the simulation is not running (I could be wrong, though;
> something else might be going on during this time).
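NAMD's own timing output is a better guide than top to whether the idle phases actually cost time. The sketch below assumes the log contains periodic lines of the form `TIMING: <step>  CPU: <total>, <per-step>/step  Wall: <total>, <per-step>/step, ...`. That is the usual shape in NAMD 2.x logs, but check your own log before relying on these field positions. The sketch prints the step number and the wall-clock seconds per step, so a slowdown shows up as a rising second column:

```shell
#!/bin/sh
# Sketch: print "<step> <wall seconds per step>" for each TIMING line of a
# NAMD log. Assumes the field after "Wall:" is the running total and the
# next one is the per-step value with a "/step," suffix.
wall_per_step() {
  awk '$1 == "TIMING:" {
    for (i = 1; i < NF - 1; i++)
      if ($i == "Wall:") {
        v = $(i + 2)
        sub(/\/step,?$/, "", v)   # "0.0629/step," -> "0.0629"
        print $2, v
      }
  }' "$1"
}

wall_per_step config.log
```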
>
> Axel, thanks for your points; I should have observed more before
> posting. I also really admire how simple NAMD is: a single command runs
> the simulation, with hardly any installation work.
>
> thanks and regards,
>
> Vivek
>
> On Apr 6, 2009 7:14pm, Axel Kohlmeyer <akohlmey_at_cmm.chem.upenn.edu> wrote:
> > On Mon, 2009-04-06 at 04:45 +0000, vivek.viv.sharma_at_gmail.com wrote:
> > > Hello everyone,
> > >
> > > We have recently bought a machine with the following configuration:
> > >
> > > 2 quad-core processors, each with a 2.33 GHz clock rate.
> > > 8 GB RAM
> > > 500 GB total hard disk
> > >
> > > I have simply used the "NAMD_2.6_Linux-i686" binaries and started
> > > the simulation (membrane protein with membrane, water, ions). The
> > > simulation starts fine with the command
> > >
> > > ./charmrun ++local +p 8 ./namd2 config.txt > config.log &
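The sketch below shows this command with two small changes that can help when a background run stops without a message. It is an illustration, not the command used in this thread. First, `> config.log` on its own captures only standard output; adding `2>&1` sends standard error to the log as well. Second, `nohup` keeps the job from being ended by the hangup signal that may be sent when the terminal that started it is closed:

```shell
#!/bin/sh
# Same run as above with two changes (a sketch):
#  - nohup makes the job ignore SIGHUP, which it may otherwise receive when
#    the terminal that started it is closed;
#  - 2>&1 also writes standard error to config.log, so messages printed
#    there are not lost.
nohup ./charmrun ++local +p 8 ./namd2 config.txt > config.log 2>&1 &
echo "started in the background as PID $!"
```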
> > >
> > > But after 4390 steps the job dies without giving any error message.
> > > Would you please suggest what is happening? Do I need to install
> >
> > how should anybody know??? does your input run fine elsewhere? have
> > you looked at the trajectory? have you looked at the machine logs?
> > does your os have restrictive limits for interactive use or stack
> > memory? can you run the same job with fewer processors? how is the
> > CPU temperature? is the crash reproducible? ...
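A couple of these questions can be answered from the shell. The sketch below assumes Linux. Reading the kernel log with dmesg may require root on some systems:

```shell
#!/bin/sh
# Sketch: two of the checks listed above.

# 1. Limits inherited by anything started from this shell, e.g. stack size,
#    CPU time and virtual memory. "unlimited" means no cap.
ulimit -a

# 2. Recent kernel messages mentioning memory exhaustion or namd2. The
#    out-of-memory killer, for instance, logs which process it killed.
dmesg 2>/dev/null | grep -iE 'out of memory|killed process|namd2' | tail -n 20
```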
> >
> > this list can go on for much longer. so please keep in mind that
> > the kind of suggestion you can receive from a mailing list is
> > directly proportional to the kind and quality of information
> > you provide. in your case, you just say "it doesn't work", and
> > only for one specific configuration. that is _very_ little.
> >
> > > it (NAMD) from scratch?
> >
> > why? first you have to find out what happens.
> > blind activism never helps!
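One way to start finding out is to rerun the same input on fewer processors and check whether the failure recurs, and at which step. The sketch below assumes charmrun, namd2 and config.txt sit in the current directory, as in the original command. Each run writes its own log. However, the NAMD output files named in config.txt would be overwritten by each run, so a shortened copy of the input may be preferable:

```shell
#!/bin/sh
# Sketch: same input on 1, 2, 4 and 8 processors, one log file per run.
# Runs one after another in the foreground; NAMD output files named in
# config.txt are overwritten by each run.
for p in 1 2 4 8; do
  ./charmrun ++local +p "$p" ./namd2 config.txt > "run_p${p}.log" 2>&1
  echo "+p $p finished with exit status $?"
done
```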
> >
> > >
> > > NAMD in the log file shows clearly:
> > >
> > > >> Info: Running on 8 processors.
> > >
> > > I observed from 'top' that the simulation indeed runs on all 8
> > > processors, using all of them more or less efficiently.
> >
> > there has to be more output, and i am pretty certain that there is
> > some output that indicates what is going wrong.
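A quick way to look for that output is to search the log for error-like lines and read its last lines. The search patterns in the sketch below are a guess at typical wording, not a complete list. NAMD prefixes many messages with "Info:", "Warning:" or "FATAL ERROR:":

```shell
#!/bin/sh
# Sketch: show error-like lines (with line numbers) and the tail of a log.
# The patterns are assumptions about typical wording, not an exhaustive list.
scan_log() {
  grep -nE 'ERROR|FATAL|Warning|Segmentation|Killed' "$1"
  echo "--- last 20 lines of $1 ---"
  tail -n 20 "$1"
}

scan_log config.log
```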
> >
> > > All your suggestions will be very helpful.
> >
> > well, you got a ton of them already. the most important
> > one is to include more relevant information. there are
> > many, many cases on this mailing list where people ask
> > for help with problems, and you can easily derive from
> > the dialog what information is needed, what _you_
> > can do beforehand to verify that you are seeing a real
> > problem, and what information is needed to narrow it down.
> >
> > cheers,
> > axel.
> >
> > > thanks and regards,
> > >
> > > Vivek
> > > IMM, India.
> >
> > --
> > =======================================================================
> > Axel Kohlmeyer akohlmey_at_cmm.chem.upenn.edu http://www.cmm.upenn.edu
> > Center for Molecular Modeling -- University of Pennsylvania
> > Department of Chemistry, 231 S.34th Street, Philadelphia, PA 19104-6323
> > tel: 1-215-898-1582, fax: 1-215-573-6233, office-tel: 1-215-898-5425
> > =======================================================================
> > If you make something idiot-proof, the universe creates a better idiot.

This archive was generated by hypermail 2.1.6 : Wed Feb 29 2012 - 15:50:44 CST