Re: Re: NAMD job dies on 2-quad core server

From: vivek.viv.sharma_at_gmail.com
Date: Thu Apr 09 2009 - 22:32:12 CDT

Hello Gengbin,

Thanks for your reply.

NAMD now seems to be working fine. All I did was switch to the
"NAMD_2.7b1_Linux-x86-TCP" binary. When upgrading from NAMD 2.6 to
NAMD 2.7b1, I first tried the other binary, "NAMD_2.7b1_Linux-x86", but it
showed the same behaviour as NAMD 2.6 did. I do not know what exactly the
differences are between "NAMD_2.7b1_Linux-x86" and
"NAMD_2.7b1_Linux-x86-TCP". Would you please say a few words about the
difference between the two?

thanks again,

Vivek

On Apr 9, 2009 8:52pm, Gengbin Zheng <gzheng_at_illinois.edu> wrote:

> Vivek,

> When NAMD is busy doing communication (sending messages) or load
> balancing, it may appear idle, or only one processor may show as busy for
> a short period of time. Also check whether your job is running close to
> the memory capacity (you can see that from "top"). The operating system
> may be busy swapping your NAMD job to/from disk, which also causes idle
> time.

> Gengbin
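
The memory check suggested above can also be done outside of "top". What follows is only a minimal sketch, assuming a Linux machine that provides /proc/meminfo; the function name swap_report is made up for illustration and is not part of NAMD or Charm++.

```shell
#!/bin/sh
# Hypothetical helper (not part of NAMD): summarise RAM and swap usage.
# Reads a file in /proc/meminfo format; defaults to /proc/meminfo itself.
swap_report() {
    meminfo="${1:-/proc/meminfo}"
    awk '
        $1 == "MemTotal:"  { mem_total  = $2 }
        $1 == "MemFree:"   { mem_free   = $2 }
        $1 == "SwapTotal:" { swap_total = $2 }
        $1 == "SwapFree:"  { swap_free  = $2 }
        END {
            # Values in /proc/meminfo are reported in kB.
            printf "mem_free_kB=%d of %d\n", mem_free, mem_total
            printf "swap_used_kB=%d\n", swap_total - swap_free
        }
    ' "$meminfo"
}
```

Calling swap_report a few times while the job runs gives a rough picture: a swap_used_kB value that keeps growing would be consistent with the swapping described above. Note that MemFree does not count memory used for caches, so a low value on its own is not alarming.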

> vivek.viv.sharma_at_gmail.com wrote:

> Hello Axel and all,

> Well, it was my mistake to say in my previous post that the 'job dies'.
> In fact, that statement was based on watching the 'top' command on the
> console. The job is running fine on all 8 processors. It's not dying, but
> every now and then 'top' shows that none of the processors is being used
> by NAMD. It is as if the NAMD processes go into the background and then
> come back running again with all 8 processors in use, as observed in
> 'top'. When I check with 'ps -e', all the NAMD processes are there. Can
> anyone please shed some light on why the NAMD processes appear, go into
> the background, and re-appear in 'top'?

> Now, am I right in thinking that jobs running this way will take more
> time than they should? I am assuming here that when 'top' does not show
> NAMD running, the simulation is not running. (I could be wrong, though;
> something else might be going on during this time.)

> Axel, thanks for your points; I should have observed more before posting.
> Secondly, I really admire the beauty of this simple NAMD command, which
> can be used to run a simulation without much installation work.

> thanks and regards,

> Vivek

> On Apr 6, 2009 7:14pm, Axel Kohlmeyer <akohlmey_at_cmm.chem.upenn.edu> wrote:

> > On Mon, 2009-04-06 at 04:45 +0000, vivek.viv.sharma_at_gmail.com wrote:
> >
> > > Hello everyone,
> > >
> > > We have recently bought a machine with the following configuration:
> > >
> > > 2 quad-core processors, each with a 2.33GHz clock rate
> > > 8 GB RAM
> > > 500 GB total hard disk
> > >
> > > I simply used the "NAMD_2.6_Linux-i686" binaries and started
> > > the simulation (a membrane protein with membrane, water, and ions).
> > > The simulation starts fine with the command
> > >
> > > ./charmrun ++local +p 8 ./namd2 config.txt > config.log &
> > >
> > > But after 4390 steps the job dies, without giving any error message.
> > > Would you please suggest what is happening? Do I need to install

> >
> > how should anybody know??? does your input run fine elsewhere? have
> > you looked at the trajectory? have you looked at the machine logs?
> > does your os have restrictive limits for interactive use or stack
> > memory? can you run the same job with fewer processors? how is the
> > CPU temperature? is the crash reproducible? ...
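
Some of the checks in the list above can be done directly from the shell. The following is only a sketch: show_limits and namd_cmd are hypothetical helper names, and the charmrun line simply reuses the command from the original post with a different +p value.

```shell
#!/bin/sh
# Hypothetical helpers, not part of NAMD or Charm++.

# Print the current soft limits for stack size and CPU time, two limits
# that can silently end a long-running job.
show_limits() {
    echo "stack size (kB): $(ulimit -s)"
    echo "cpu time (s):    $(ulimit -t)"
}

# Print the launch command from the original post for a given core count,
# so the same input can be retried on fewer processors.
namd_cmd() {
    ncores="$1"
    echo "./charmrun ++local +p $ncores ./namd2 config.txt"
}
```

For example, namd_cmd 4 prints the command line for a 4-processor run of the same config.txt. If that run gets past the step where the 8-processor job stopped, the processor count becomes a lead worth following.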

> >
> > this list can go on for much longer. so please keep in mind that
> > the kind of suggestion you can receive from a mailing list is
> > directly proportional to the kind and quality of information
> > you provide. in your case, you just say "it doesn't work". and
> > only for one specific configuration. that is _very_ little.
> >
> > > it (NAMD) from scratch?
> >
> > why? first you have to find out what happens.
> > blind activism never helps!

> >
> > > The NAMD log file clearly shows:
> > >
> > > >> Info: Running on 8 processors.
> > >
> > > I observed from 'top' that the simulation indeed runs on all 8
> > > processors, using all of them more or less efficiently.
> >
> > there has to be more output, and i am pretty certain that there is
> > some output that indicates what is going wrong.
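
One possible reason for missing output is that the command in the original post redirects only standard output ("> config.log"), so anything written to standard error never reaches the log file. A hedged sketch of a wrapper that keeps both streams and records the exit status; the name run_logged is hypothetical and not part of NAMD:

```shell
#!/bin/sh
# Hypothetical wrapper, not part of NAMD: run a command with stdout and
# stderr both sent to a log file, then append the exit status to that log.
run_logged() {
    log="$1"
    shift
    "$@" > "$log" 2>&1
    status=$?
    echo "exit status: $status" >> "$log"
    return "$status"
}
```

It could be used as "run_logged config.log ./charmrun ++local +p 8 ./namd2 config.txt &". In most shells an exit status above 128 means the process was ended by a signal (for example, 137 corresponds to signal 9, SIGKILL), which would point to the operating system rather than to NAMD itself.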

> >
> > > All your suggestions will be very helpful.
> >
> > well, you got a ton of them already. the most important
> > one is to include more relevant information. there are
> > many, many cases on this mailing list where people ask
> > for help with problems, and you can easily derive from
> > the dialog what information is needed and what _you_
> > can do beforehand to verify that you are seeing a real
> > problem and what information is needed to narrow it down.
> >
> > cheers,
> > axel.

> >
> > > thanks and regards,
> > >
> > > Vivek
> > > IMM, India.
> >
> > --
> > =======================================================================
> > Axel Kohlmeyer akohlmey_at_cmm.chem.upenn.edu http://www.cmm.upenn.edu
> > Center for Molecular Modeling -- University of Pennsylvania
> > Department of Chemistry, 231 S.34th Street, Philadelphia, PA 19104-6323
> > tel: 1-215-898-1582, fax: 1-215-573-6233, office-tel: 1-215-898-5425
> > =======================================================================
> > If you make something idiot-proof, the universe creates a better idiot.

This archive was generated by hypermail 2.1.6 : Wed Feb 29 2012 - 15:52:35 CST