Re: lamboot -v nodelist

From: Axel Kohlmeyer (akohlmey_at_cmm.chem.upenn.edu)
Date: Wed Jun 03 2009 - 10:27:06 CDT

On Wed, 2009-06-03 at 16:49 +0200, Yogesh Aher wrote:
> Dear NAMD-users,

dear yogesh,

this looks like you don't have working passwordless ssh access
from the first node to the other nodes and back(!).

the detailed message from LAM already gave you a suggestion
what you should try. have you followed it? and what was the
outcome of the ssh command?
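
for reference, a minimal sketch of how one could run that check for
every host in the nodelist. this is not part of LAM: the names
check_node and check_nodelist and the SSH variable are made up for
illustration, and it assumes a nodelist with one host per line, where
lines starting with # and anything after the host name are ignored.
BatchMode=yes makes ssh fail instead of prompting for a password, so a
missing key shows up as a FAIL rather than a hang.

```shell
#!/bin/sh
# hypothetical helper, not shipped with LAM: verify passwordless ssh per node.
# SSH defaults to "ssh"; set it to the remote agent LAM reports (e.g. /bin/ssh).
SSH="${SSH:-ssh}"
NL='
'

check_node() {
    # -x: no X11 forwarding, -n: stdin from /dev/null,
    # BatchMode=yes: fail instead of asking for a password.
    # stderr is captured too, since LAM trips over any extra output.
    if out=$($SSH -x -n -o BatchMode=yes -o ConnectTimeout=5 \
                  "$1" 'echo $SHELL' 2>&1 </dev/null); then
        status=0
    else
        status=$?
    fi
    if [ "$status" -ne 0 ]; then
        echo "FAIL $1: ssh exited with status $status"
        return 1
    fi
    # the reply must be exactly one line holding an absolute shell path
    case "$out" in
        *"$NL"*) echo "FAIL $1: extra output around the shell path"; return 1 ;;
        /*)      echo "OK $1: $out"; return 0 ;;
        *)       echo "FAIL $1: unexpected output: $out"; return 1 ;;
    esac
}

check_nodelist() {
    # assumed format: host name first, optional fields after it, # for comments
    rc=0
    while read -r host rest; do
        case "$host" in
            ''|'#'*) continue ;;
        esac
        check_node "$host" || rc=1
    done < "$1"
    return $rc
}

# usage (after sourcing this file):
#   check_nodelist /home/sam/.nodelist
```

run it on the first node, then repeat it on the other node(s) to test
the way back. if a host fails, the usual fix is to create a key with
ssh-keygen and append the public key to ~/.ssh/authorized_keys on the
remote side (ssh-copy-id does that for you where it is installed).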

cheers,
   axel.

> I am stuck again getting NAMD to work with LAM. I installed ssh, openssh,
> libaio and other necessary libraries, but I still get the following error.
> If anybody has come across this error, could you please let me know how to
> resolve it?
>
>
> [sam_at_xyz Linux-i686-MPI]$ lamboot -v /home/sam/.nodelist
>
> LAM 7.1.4/MPI 2 C++/ROMIO - Indiana University
>
> n-1<962> ssi:boot:base:linear: booting n0 (100.120.10.04)
> n-1<962> ssi:boot:base:linear: booting n1 (100.120.10.41)
> -----------------------------------------------------------------------------
> LAM failed to execute a process on the remote node "100.120.10.41".
> LAM was not trying to invoke any LAM-specific commands yet -- we were
> simply trying to determine what shell was being used on the remote
> host.
>
> LAM tried to use the remote agent command "/bin/ssh"
> to invoke "echo $SHELL" on the remote node.
>
> *** PLEASE READ THIS ENTIRE MESSAGE, FOLLOW ITS SUGGESTIONS, AND
> *** CONSULT THE "BOOTING LAM" SECTION OF THE LAM/MPI FAQ
> *** (http://www.lam-mpi.org/faq/) BEFORE POSTING TO THE LAM/MPI USER'S
> *** MAILING LIST.
>
> This usually indicates an authentication problem with the remote
> agent, some other configuration type of error in your .cshrc or
> .profile file, or you were unable to executable a command on the
> remote node for some other reason. The following is a list of items
> that you should check on the remote node:
>
> - You have an account and can login to the remote machine
> - Incorrect permissions on your home directory (should
> probably be 0755)
> - Incorrect permissions on your $HOME/.rhosts file (if you are
> using rsh -- they should probably be 0644)
> - You have an entry in the remote $HOME/.rhosts file (if you
> are using rsh) for the machine and username that you are
> running from
> - Your .cshrc/.profile must not print anything out to the
> standard error
> - Your .cshrc/.profile should set a correct TERM type
> - Your .cshrc/.profile should set the SHELL environment
> variable to your default shell
>
> Try invoking the following command at the unix command line:
>
> /bin/ssh -x 100.120.10.41 -n 'echo $SHELL'
>
> You will need to configure your local setup such that you will *not*
> be prompted for a password to invoke this command on the remote node.
> No output should be printed from the remote node before the output of
> the command is displayed.
>
> When you can get this command to execute successfully by hand, LAM
> will probably be able to function properly.
> -----------------------------------------------------------------------------
> n-1<962> ssi:boot:base:linear: Failed to boot n1 (100.120.10.41)
> n-1<962> ssi:boot:base:linear: aborted!
> n-1<967> ssi:boot:base:linear: booting n0 (100.120.10.04)
> n-1<967> ssi:boot:base:linear: booting n1 (100.120.10.41)
> -----------------------------------------------------------------------------

-- 
=======================================================================
Axel Kohlmeyer   akohlmey_at_cmm.chem.upenn.edu   http://www.cmm.upenn.edu
   Center for Molecular Modeling   --   University of Pennsylvania
Department of Chemistry, 231 S.34th Street, Philadelphia, PA 19104-6323
tel: 1-215-898-1582,  fax: 1-215-573-6233,  office-tel: 1-215-898-5425
=======================================================================
If you make something idiot-proof, the universe creates a better idiot.
