Re: Re: NAMD job dies on 2-quad core server

From: Axel Kohlmeyer (akohlmey_at_cmm.chem.upenn.edu)
Date: Thu Apr 09 2009 - 09:03:24 CDT

On Thu, 2009-04-09 at 06:44 +0000, vivek.viv.sharma_at_gmail.com wrote:
> Hello Axel and all,

vivek,

> Well, it was my mistake to mention in my previous post that the 'job
> dies'. That statement was based on watching the 'top' command on the
> console. The job is running fine on all 8 processors. It's not dying,
> but every now and then 'top' shows that none of the processors is
> being used by NAMD. It is as if the NAMD processes go into the
> background and then come back running again with all 8 processors in
> use, as observed in 'top'. When I check with 'ps -e', all the NAMD
> processes are there. Can anyone please shed some light on why NAMD
> jobs appear, go into the background, and re-appear in 'top'?

of course this is not supposed to happen, but then again, please
consider a more systematic approach if you want any help. without
any details or a sufficient description of how to reproduce the behavior,
the only answer as to why this happens is:

_you_ must be doing something wrong.

please first verify that you can run a serial job (best one of the
tutorial examples) and then try a (small) parallel job.
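
As an illustration of that serial-first, then small-parallel sequence, here is a minimal Python sketch. It only assembles the command lines and records the exit code of each run. The helper names (namd_command, run_and_report) and the log file names are hypothetical. It assumes that namd2 and charmrun sit in the current directory, as in the command quoted further down, and that this build allows namd2 to be started directly for a single process.

```python
import subprocess


def namd_command(nprocs, config, namd_dir="."):
    """Return the argument list for a serial (nprocs == 1) or local parallel run."""
    if nprocs < 1:
        raise ValueError("nprocs must be at least 1")
    namd2 = f"{namd_dir}/namd2"
    if nprocs == 1:
        # Assumption: this build lets namd2 be started directly for one process.
        return [namd2, config]
    # Same form as the charmrun command quoted in this thread, with a different +p count.
    return [f"{namd_dir}/charmrun", "++local", "+p", str(nprocs), namd2, config]


def run_and_report(nprocs, config, log_path):
    """Run one job, send its output to log_path, and return the exit code."""
    with open(log_path, "w") as log:
        result = subprocess.run(
            namd_command(nprocs, config),
            stdout=log,
            stderr=subprocess.STDOUT,
        )
    return result.returncode


# Possible use, smallest case first (file names are placeholders):
#   for n in (1, 2, 4, 8):
#       print(n, run_and_report(n, "config.txt", f"run_p{n}.log"))
```

Going up in processor count one step at a time shows at which point, if any, the behavior changes, and leaves one log file per run to report.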

> Now, am I right in thinking that running jobs 'this way' will take
> more time than it should? I assume here that when the 'top' command
> does not show NAMD running, the simulation is not running (I could be
> wrong, though; something else might be happening during this time).

none of us here can "look into" your machine. if you are not
certain about how the machine is being run, please contact a _local_
expert to have a look. there is only so much diagnosis that can be
done remotely.
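
One check that can be done locally, independent of what 'top' displays, is whether the step counter in the log file keeps advancing. The following Python sketch is one way to do that. It assumes the log contains periodic lines beginning with "ENERGY:" followed by the step number, and that the log is written to config.log as in the command quoted below. The function names are hypothetical.

```python
import re
import time

# Assumption: the log has periodic lines of the form "ENERGY: <step> <values...>".
ENERGY_LINE = re.compile(r"^ENERGY:\s+(\d+)\s")


def last_energy_step(log_text):
    """Return the largest step number found on an ENERGY: line, or None."""
    steps = []
    for line in log_text.splitlines():
        match = ENERGY_LINE.match(line)
        if match:
            steps.append(int(match.group(1)))
    return max(steps) if steps else None


def steps_between_reads(log_path, wait_seconds=60):
    """Read the log twice, wait_seconds apart, and return (before, after)."""
    with open(log_path) as handle:
        before = last_energy_step(handle.read())
    time.sleep(wait_seconds)
    with open(log_path) as handle:
        after = last_energy_step(handle.read())
    return before, after


# Possible use while the job is running:
#   print(steps_between_reads("config.log", wait_seconds=120))
```

If the second number is larger than the first across an interval in which 'top' showed no NAMD processes, the simulation was making progress during that time. If it is unchanged, the run was stalled for at least part of the interval, and that is worth reporting together with the interval length.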

cheers,
   axel.

>
> Axel, thanks for your points, I should have observed more before
> posting. Secondly, I really admire the beauty of this simple NAMD
> command which can be used to run the simulation without much
> installation work to be done.
>
> thanks and regards,
>
> Vivek
>
> On Apr 6, 2009 7:14pm, Axel Kohlmeyer <akohlmey_at_cmm.chem.upenn.edu>
> wrote:
> > On Mon, 2009-04-06 at 04:45 +0000, vivek.viv.sharma_at_gmail.com wrote:
> > > Hello everyone,
> > >
> > > We have recently bought a machine with the following configuration:
> > >
> > > 2 quad core processors each with 2.33GHz clock rate.
> > > 8 GB RAM
> > > 500GB total hard disk
> > >
> > > I have simply used the "NAMD_2.6_Linux-i686" binaries. And, started
> > > the simulation (membrane protein with membrane, water, ions.). The
> > > simulation starts fine with the command
> > >
> > > ./charmrun ++local +p 8 ./namd2 config.txt > config.log &
> > >
> > > But after 4390 steps the job dies, without giving any error message.
> > > Would you please suggest what is happening? Do I need to install
> >
> > how should anybody know??? does your input run fine elsewhere? have
> > you looked at the trajectory? have you looked at the machine logs?
> > does your os have restrictive limits for interactive use or stack
> > memory? can you run the same job with fewer processors? how is the
> > CPU temperature? is the crash reproducible? ...
> >
> > this list can go on for much longer. so please keep in mind that
> > the kind of suggestion you can receive from a mailing list is
> > directly proportional to the kind and quality of information
> > you provide. in your case, you just say "it doesn't work". and
> > only for one specific configuration. that is _very_ little.
> >
> > > it(NAMD) from scratch?
> >
> > why? first you have to find out what happens.
> > blind activism never helps!
> >
> > >
> > > NAMD in log file shows clearly:
> > >
> > > >> Info: Running on 8 processors.
> > >
> > > I observed from 'top', indeed simulation runs on all 8 processors,
> > > using more or less efficiently all the processors.
> >
> > there has to be more output, and i am pretty certain that there is
> > some output that indicates what is going wrong.
> >
> > > All your suggestions will be very helpful.
> >
> > well, you got a ton of them already. the most important
> > one is to include more relevant information. there are
> > many, many cases on this mailing list where people ask
> > for help with problems, and you can easily derive from
> > the dialog what information is needed and what _you_
> > can do beforehand to verify that you are seeing a real
> > problem and what information is needed to narrow it down.
> >
> > cheers,
> > axel.
> >
> > > thanks and regards,
> > >
> > > Vivek
> > > IMM, India.
> >
> > --
> > =======================================================================
> > Axel Kohlmeyer akohlmey_at_cmm.chem.upenn.edu http://www.cmm.upenn.edu
> > Center for Molecular Modeling -- University of Pennsylvania
> > Department of Chemistry, 231 S.34th Street, Philadelphia, PA 19104-6323
> > tel: 1-215-898-1582, fax: 1-215-573-6233, office-tel: 1-215-898-5425
> > =======================================================================
> > If you make something idiot-proof, the universe creates a better idiot.

-- 
=======================================================================
Axel Kohlmeyer   akohlmey_at_cmm.chem.upenn.edu   http://www.cmm.upenn.edu
   Center for Molecular Modeling   --   University of Pennsylvania
Department of Chemistry, 231 S.34th Street, Philadelphia, PA 19104-6323
tel: 1-215-898-1582,  fax: 1-215-573-6233,  office-tel: 1-215-898-5425
=======================================================================
If you make something idiot-proof, the universe creates a better idiot.
