mpi problems on opteron

From: Kyle Gustafson (kgustaf_at_umd.edu)
Date: Mon Jul 25 2005 - 16:05:02 CDT

Hi all,

I have an 18-Opteron cluster running SuSE (kernel 2.4.21-143-numa).
I'm trying to install NAMD, which requires me to install charm++ first.

After ./build charm++ mpi-linux-amd64 -nobs -O -DCMK_OPTIMIZE
I ran megatest. All of the one-processor tests work fine,
but with +p2 I get the error below, which makes it look as if
charmrun is unable to use ssh. I can ssh back and forth between
any two nodes, so I don't understand how this problem can occur,
though I admit I don't know much about ssh or charm++. It seems
as if charm++ doesn't have access to the ssh keys, but that seems
crazy. My .nodelist file is shown below; head is the master and
each node00x is a slave. The file is located in the HOME/charm
directory, but I also tried putting .nodelist in the megatest
directory.

group main
host head ++shell ssh
host node001 ++shell ssh
host node002 ++shell ssh
host node003 ++shell ssh
host node004 ++shell ssh
host node005 ++shell ssh
host node006 ++shell ssh
host node007 ++shell ssh
host node008 ++shell ssh
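
For reference, the node-to-node ssh check described above could be scripted roughly as follows. This is only a sketch under stated assumptions: the function names nodelist_hosts and check_ssh are made up for illustration, the nodelist is assumed to use "host NAME ..." lines as shown above, and the installed ssh is assumed to be an OpenSSH version that accepts the BatchMode and ConnectTimeout options.

```shell
#!/bin/sh
# Hypothetical helpers (not part of charm++): check non-interactive ssh
# to every host named in a charmrun nodelist file.

# Print the host names in the nodelist, one per line.
# Only lines whose first word is "host" are used; the second word is the name.
nodelist_hosts() {
    awk '$1 == "host" { print $2 }' "$1"
}

# Try a non-interactive ssh to each host and report the result.
# BatchMode=yes makes ssh fail instead of prompting for a password,
# and -n keeps ssh from reading the loop's standard input.
check_ssh() {
    for h in $(nodelist_hosts "$1"); do
        if ssh -n -o BatchMode=yes -o ConnectTimeout=5 "$h" true; then
            echo "$h: ok"
        else
            echo "$h: FAILED"
        fi
    done
}
```

After sourcing the file, running check_ssh "$HOME/.nodelist" on the head node would print one line per host. Because head is itself in the nodelist, this also exercises ssh from head back to head.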

This is the error I get when I run charmrun.

I greatly appreciate your attention.

head:/home/namd2/NAMD_2.5_Source/charm/tests/charm++/megatest # ./charmrun +p2 ./pgm

Running on 2 processors: ./pgm
26005: ssh_exchange_identification: Connection closed by remote host
p0_26000: p4_error: Child process exited while making connection to remote process on head: 0

Kyle B. Gustafson
Department of Physics
University of Maryland
Box 45
082 Regents Drive
College Park, MD 20742
