Re: mpi problems on opteron

From: Kyle Gustafson (kgustaf_at_umd.edu)
Date: Mon Jul 25 2005 - 19:04:56 CDT

Leandro,

Thanks for the reply. I've not decided between ssh and rsh
yet. Do you run with MPI?

Kyle
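
Whichever remote shell is chosen, the login between nodes has to work without a password prompt, as the reply quoted below points out. A minimal sketch for checking that over ssh follows. The names check_host and SSH_CMD are made up for illustration and are not part of charm++ or NAMD; BatchMode and ConnectTimeout are standard OpenSSH client options. It only tests the login, so it may not reproduce whatever charmrun runs into.

```shell
#!/bin/sh
# Sketch: check that each host accepts a non-interactive ssh login.
# SSH_CMD and check_host are illustrative names (assumptions).
SSH_CMD="${SSH_CMD:-ssh}"

check_host() {
    # BatchMode=yes makes ssh fail instead of prompting for a password;
    # ConnectTimeout=5 gives up after five seconds on an unreachable host.
    "$SSH_CMD" -o BatchMode=yes -o ConnectTimeout=5 "$1" true
}

# Host names are passed as arguments, for example:
#   sh check_ssh.sh head node001 node002
for host in "$@"; do
    if check_host "$host"; then
        echo "$host: ok"
    else
        echo "$host: FAILED"
    fi
done
```

Running it once from the head node and once from a compute node should show whether the login works in both directions.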

---- Original message ----
>Date: Mon, 25 Jul 2005 21:00:04 -0300
>From: Leandro Martínez <leandromartinez98_at_gmail.com>
>Subject: Re: namd-l: mpi problems on opteron
>To: Kyle Gustafson <kgustaf_at_umd.edu>
>Cc: namd-l_at_ks.uiuc.edu
>
>Hi Kyle,
>We have a cluster similar to yours, but running Fedora. The problem is
>probably that you need to set up ssh to work without passwords
>between the nodes. We are actually using rsh on our nodes instead
>because it was easier to configure. You need to put a file named
>.rhosts in your home directory containing
>
>143.106.51.147 username
>127.0.0.1 username
>192.168.0.100 username
>192.168.0.101 username
>192.168.0.102 username
>.
>.
>
>and this file should have its permissions changed with
>
>chmod og-rwx .rhosts
>
>This file must be in your home directory on all nodes (in our case all
>nodes share the same /home, so it was simpler).
>
>You can search the web for better documentation on this. I'm not
>quite an expert on this subject; I only did what was necessary to get
>namd running.
>
>Leandro.
>
>
>
>--------------------------------------------------------------------
>Leandro Martinez
>Institute of Chemistry
>State University of Campinas
>http://www.ime.unicamp.br/~martinez/packmol
>--------------------------------------------------------------------
>
>
>
>On 7/25/05, Kyle Gustafson <kgustaf_at_umd.edu> wrote:
>> Hi all,
>>
>> I have an 18-Opteron cluster running SuSE 2.4.21-143-numa.
>> I'm trying to install NAMD, which requires me to install charm++.
>>
>> After ./build charm++ mpi-linux-amd64 -nobs -O -DCMK_OPTIMIZE
>> I ran megatest. !!All of the one-processor tests work fine!!,
>> but with +p2, I get the error below, where it looks like
>> charmrun is unable to use ssh. I can ssh back and forth from
>> any one node to any other, so I don't understand how this
>> problem could occur, because I don't know enough about ssh and
>> charm++. It seems like charm++ doesn't have access to the ssh
>> keys, but this seems crazy. My .nodelist file is shown below;
>> head is the master and each node00x is a slave. The nodelist file
>> is located in the HOME/charm directory, but I also tried
>> putting .nodelist in the megatest directory.
>>
>> group main
>> host head ++shell ssh
>> host node001 ++shell ssh
>> host node002 ++shell ssh
>> host node003 ++shell ssh
>> host node004 ++shell ssh
>> host node005 ++shell ssh
>> host node006 ++shell ssh
>> host node007 ++shell ssh
>> host node008 ++shell ssh
>>
>>
>> This is the error I get when I run charmrun.
>>
>> I greatly appreciate your attention.
>>
>>
>> head:/home/namd2/NAMD_2.5_Source/charm/tests/charm++/megatest # ./charmrun +p2 ./pgm
>>
>> Running on 2 processors: ./pgm
>> 26005: ssh_exchange_identification: Connection closed by remote host
>> p0_26000: p4_error: Child process exited while making connection to remote process on head: 0
>>
>> Kyle B. Gustafson
>> Department of Physics
>> University of Maryland
>> Box 45
>> 082 Regents Drive
>> College Park, MD 20742
>>

Kyle B. Gustafson
Department of Physics
University of Maryland
College Park, MD USA
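
The .rhosts setup described in the quoted reply could be sketched roughly as follows. This is only an illustration: RHOSTS_DIR and RHOSTS_USER are made-up variable names, the addresses are the ones listed in the reply, and by default the file is written to a scratch directory rather than the real home directory so that nothing is overwritten by accident.

```shell
#!/bin/sh
# Sketch: write an .rhosts file and restrict its permissions.
# RHOSTS_DIR and RHOSTS_USER are illustrative names (assumptions).
# Set RHOSTS_DIR="$HOME" to write the real file.
RHOSTS_DIR="${RHOSTS_DIR:-$(mktemp -d)}"
RHOSTS_USER="${RHOSTS_USER:-username}"
RHOSTS_FILE="$RHOSTS_DIR/.rhosts"

# One "host user" pair per line, using the addresses from the reply.
: > "$RHOSTS_FILE"
for host in 143.106.51.147 127.0.0.1 192.168.0.100 192.168.0.101 192.168.0.102; do
    printf '%s %s\n' "$host" "$RHOSTS_USER" >> "$RHOSTS_FILE"
done

# Remove all access for group and others, leaving only the owner.
chmod og-rwx "$RHOSTS_FILE"

# Show the resulting permissions and owner.
ls -l "$RHOSTS_FILE"
```

If /home is not shared between the nodes, the same file would have to be copied to the home directory on each node.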
