RE: IMD on MPI versions of NAMD

From: Brian Bennion (brian_at_youkai.llnl.gov)
Date: Mon Dec 01 2003 - 16:58:16 CST

Okay, now I see where you are coming from. We use a modified version of
the Quadrics RMS queueing system. Once the job has its nodes allocated I can
access the machine list (it's an environment variable), so I can then test
the assumption that process 0 was probably started on the first CPU of the
first node.
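
Something like this is what I have in mind for picking out that host (the
RMS_NODES variable name below is just a stand-in for whatever variable our
queueing system actually exports, and the comma-separated format is an
assumption):

    # pick_imd_host.py -- guess which host is running NAMD process 0
    import os

    # placeholder name: substitute whatever variable your queueing system sets
    nodes = os.environ.get("RMS_NODES", "")
    first_node = nodes.split(",")[0].strip() if nodes else "<unknown>"
    print("assuming the IMD server (process 0) is on:", first_node)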

Thanks for your guidance

Brian

On Mon, 1 Dec 2003, Cesar Delgado wrote:

> Are you starting up NAMD in some sort of queue? We use PBS as our queue,
> and node 0 (zero) is determined by the queue, so I have to extract that
> information.
>
> When MPI processes are started, most MPI implementations (I use MPICH)
> use a machine file that names the machines to be used to run
> the jobs. Process 0 (zero) is, at least in MPICH, the first computer in
> the machine file. I would ask your sysadmin which version of MPI you
> are using and how this machine file is created.
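>
> A rough sketch of extracting that under PBS, assuming the standard
> $PBS_NODEFILE (one host per line, with the first line being node 0):
>
>     # pbs_head_node.py -- host where MPI process 0 should be running
>     import os
>
>     nodefile = os.environ["PBS_NODEFILE"]     # set by PBS for each job
>     with open(nodefile) as f:
>         head_node = f.readline().strip()      # first entry is node 0
>     print("point the IMD (VMD) connection at:", head_node)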
>
> -Cesar Delgado
> ---------------------------------------------
> Research Computing Facility @ UNL
> http://rcf.unl.edu
> cdelgad2_at_bigred.unl.edu, beettlle_at_hotmail.com
>
> > -----Original Message-----
> > From: Brian Bennion [mailto:brian_at_youkai.llnl.gov]
> > Sent: Monday, December 01, 2003 2:00 PM
> > To: Cesar Delgado
> > Subject: Re: namd-l: IMD on MPI versions of NAMD
> >
> >
> > Hmmm...my knowledge here is not very substantial. How would I find out
> > which node started the first MPI process?
> >
> > Could I assume that it is the first node assigned by the batch
> > handler?
> >
> > Brian
> >
> > On Mon, 1 Dec 2003, Cesar Delgado wrote:
> >
> > > The master process is the one you have to connect to in order to do
> > > IMD with the MPI version of NAMD. Basically, MPI's process 0 (zero).
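> > >
> > > (As a reminder, the NAMD configuration also has to have IMD switched on
> > > before anything will listen on process 0; the port number below is
> > > arbitrary:)
> > >
> > >     IMDon     yes     ;# enable interactive MD
> > >     IMDport   3000    ;# TCP port VMD should connect to
> > >     IMDfreq   1       ;# send coordinates every timestep
> > >     IMDwait   yes     ;# pause at startup until VMD connects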
> > >
> > >
> > > -Cesar Delgado
> > >
> > > >From: Brian Bennion <brian_at_youkai.llnl.gov>
> > > >To: Fangqiang Zhu <fzhu_at_ks.uiuc.edu>
> > > >CC: "David E. Konerding" <dekonerding_at_lbl.gov>, <namd-l_at_ks.uiuc.edu>
> > > >Subject: namd-l: IMD on MPI versions of NAMD
> > > >Date: Mon, 1 Dec 2003 10:19:28 -0800 (PST)
> > > >Hello
> > > >
> > > >Has anyone had experience with VMD-IMD-NAMD on a massively parallel
> > > >system?
> > > >In my case I have NAMD working in an MPI environment on a couple of
> > > >thousand processors. Each job is started with a native 'srun' command,
> > > >which is similar to the prun command noted in the docs.
> > > >
> > > >Which host would I tell the IMD interface to connect with?
> > > >
> > > >Thank you for your comments.
> > > >Brian
> > > >
> > > >
> > > >On Mon, 1 Dec 2003, Fangqiang Zhu wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > Please find comments from Justin Gullingsrud below. Hope it helps.
> > > > >
> > > > > Zhu
> > > > >
> > > > >
> > > > >
> > > > > From: "David E. Konerding" <dekonerding_at_lbl.gov>
> > > > > Date: November 26, 2003 12:09:10 PM PST
> > > > > To: namd-l_at_ks.uiuc.edu
> > > > > Subject: namd-l: updating coordinates using IMD
> > > > >
> > > > >
> > > > > Hi,
> > > > >
> > > > > I am interested in using IMD. I've gone through the code and managed
> > > > > to write a simple Python extension (yes, I know about the Python IMD
> > > > > wrappers, but I wanted a pure Python version). It works OK in most
> > > > > cases, although I find that the NAMD side is a bit flaky; sometimes it
> > > > > disconnects, telling me my VMD version number is too old. When it does
> > > > > work I can collect forces and energies.
> > > > >
> > > > >
> > > > > What is probably happening is that you are not sending the GO message
> > > > > soon enough after receiving the handshake message from NAMD. If NAMD
> > > > > doesn't receive a response within one second, it drops the connection
> > > > > with an error message about incompatible IMD versions. That message
> > > > > could admittedly be a bit more helpful.
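> > > > >
> > > > > (A minimal sketch of the client side of that exchange, assuming the
> > > > > usual 8-byte IMD header of two 32-bit ints and the type codes from
> > > > > imd.h -- double-check the values against your copy of the header:)
> > > > >
> > > > >     # imd_go.py -- answer NAMD's IMD handshake before the timeout
> > > > >     import socket
> > > > >     import struct
> > > > >
> > > > >     IMD_GO = 3           # assumed type codes; verify against imd.h
> > > > >     IMD_HANDSHAKE = 4
> > > > >     IMDVERSION = 2
> > > > >
> > > > >     s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
> > > > >     s.connect(("node0-hostname", 3000))   # placeholder host/IMDport
> > > > >     # assumes the full 8-byte header arrives in one recv
> > > > >     msg_type, version = struct.unpack("!ii", s.recv(8))
> > > > >     # the version field is sent in the server's native byte order,
> > > > >     # so accept it either plain or byte-swapped
> > > > >     swapped = struct.unpack("<i", struct.pack(">i", version))[0]
> > > > >     if msg_type == IMD_HANDSHAKE and IMDVERSION in (version, swapped):
> > > > >         s.send(struct.pack("!ii", IMD_GO, 0))   # reply right away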
> > > > >
> > > > >
> > > > > I have a few questions:
> > > > >
> > > > > 1) Is the IMD socket protocol truly asynchronous: the read and write
> > > > > messages are totally distinct from each other? (I'm more accustomed
> > > > > to call/response-style protocols.)
> > > > >
> > > > >
> > > > > Yes, neither VMD nor NAMD waits for a response from the other. The idea
> > > > > is that there is no point in resending coordinates or forces if something
> > > > > was lost in transmission. VMD is continually redrawing the screen based
> > > > > on the most recent coordinates, and, when steering atoms with something
> > > > > like a haptic device, the forces are continually changing. Making either
> > > > > side wait would decrease the responsiveness.
> > > > >
> > > > > 2) IMD doesn't seem to have any support for changing the coordinates
> > > > > directly and getting an energy evaluation. I.e., I want NAMD to remain
> > > > > paused, I update all atomic coords to new ones, get back the energy
> > > > > given those coords, and NAMD goes back to a paused state.
> > > > >
> > > > >
> > > > > Yes, there is currently support only for sending forces to a dynamics
> > > > > simulation.
> > > > >
> > > > > My thinking was to add a new message, IMD_COORDS, which the client
> > > > > could send to NAMD to update the coordinates. Then, another message,
> > > > > IMD_GO_ONCE, would cause NAMD to evaluate forces/energies, send one
> > > > > IMD_ENERGIES and one IMD_FCOORDS, and return to paused.
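> > > > >
> > > > > (Purely hypothetical sketch of what the client side of that proposal
> > > > > might look like -- IMD_COORDS and IMD_GO_ONCE do not exist in the
> > > > > current protocol, and the type codes below are made up:)
> > > > >
> > > > >     # proposed_client.py -- sketch of the proposed messages (not real IMD)
> > > > >     import struct
> > > > >
> > > > >     IMD_COORDS = 10       # made-up type codes for the proposed messages
> > > > >     IMD_GO_ONCE = 11
> > > > >
> > > > >     def send_coords_and_step(sock, coords):
> > > > >         """coords: flat [x1, y1, z1, x2, ...] floats, one triple per atom."""
> > > > >         payload = struct.pack("!%df" % len(coords), *coords)
> > > > >         sock.send(struct.pack("!ii", IMD_COORDS, len(coords) // 3) + payload)
> > > > >         sock.send(struct.pack("!ii", IMD_GO_ONCE, 0))
> > > > >         # NAMD would then send back one IMD_ENERGIES and one IMD_FCOORDS
> > > > >         # and return to the paused state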
> > > > >
> > > > >
> > > > > IMD_GO_ONCE shouldn't be any problem. For technical reasons I think it
> > > > > would be difficult to make NAMD accept a new set of coordinates while
> > > > > the simulation was running. Jim Phillips could comment more precisely,
> > > > > but I think one would need to suspend the simulation momentarily, apply
> > > > > the new coordinates, then resume, and I'm not sure if the current
> > > > > interface is capable of that.
> > > > >
> > > > > Cheers,
> > > > > Justin
> > > > >
> > > > >
> > > >
> > > >
> > >
> >
> >
>

-- 
*****************************************************************
**Brian Bennion, Ph.D.                                         **
**Computational and Systems Biology Division                   **
**Biology and Biotechnology Research Program                   **
**Lawrence Livermore National Laboratory                       **
**P.O. Box 808, L-448    bennion1_at_llnl.gov                     **
**7000 East Avenue       phone: (925) 422-5722                 **
**Livermore, CA  94550   fax:   (925) 424-6605                 **
*****************************************************************
