Re: mpi problems on opteron

From: Leandro Martínez (leandromartinez98_at_gmail.com)
Date: Mon Jul 25 2005 - 19:11:13 CDT

No, we are running NAMD with charm++, and the performance is
really good, at least for my simulations. We have tried to run
Gromacs with MPI and could not obtain the same scalability.
If charm++ uses MPI underneath and my answer misses the point,
I'm sorry.
Leandro.

On 7/25/05, Kyle Gustafson <kgustaf_at_umd.edu> wrote:
> Leandro,
>
> Thanks for the reply. I've not decided between ssh and rsh
> yet. Do you run with MPI?
>
> Kyle
>
> ---- Original message ----
> >Date: Mon, 25 Jul 2005 21:00:04 -0300
> >From: Leandro Martínez <leandromartinez98_at_gmail.com>
> >Subject: Re: namd-l: mpi problems on opteron
> >To: Kyle Gustafson <kgustaf_at_umd.edu>
> >Cc: namd-l_at_ks.uiuc.edu
> >
> >Hi Kyle,
> >We have a cluster similar to yours, but running Fedora. Probably the
> >problem is that you need to set up ssh to work without passwords
> >between the nodes. We are actually using rsh on our nodes instead,
> >because it was easier to configure. You need to put in your
> >home directory a file named .rhosts containing
> >
> >143.106.51.147 username
> >127.0.0.1 username
> >192.168.0.100 username
> >192.168.0.101 username
> >192.168.0.102 username
> >.
> >.
> >
> >and this file should have its permissions changed with
> >
> >chmod og-rwx .rhosts
> >
> >This file must be in your home directory on all nodes (in our case all
> >nodes share the same /home, so it was simpler).
> >
> >You can search the web for better documentation on this; I'm not
> >quite an expert on the subject, I only did what was necessary to get
> >namd running.
> >
> >Leandro.
> >
> >
> >
> >--------------------------------------------------------------------
> >Leandro Martinez
> >Institute of Chemistry
> >State University of Campinas
> >http://www.ime.unicamp.br/~martinez/packmol
> >--------------------------------------------------------------------
> >
> >
> >
> >On 7/25/05, Kyle Gustafson <kgustaf_at_umd.edu> wrote:
> >> Hi all,
> >>
> >> I have an 18-node Opteron cluster running SuSE 2.4.21-143-numa.
> >> I'm trying to install NAMD, which requires me to install charm++.
> >>
> >> After ./build charm++ mpi-linux-amd64 -nobs -O -DCMK_OPTIMIZE
> >> I ran megatest. All of the one-processor tests work fine,
> >> but with +p2 I get the error below, where it looks like
> >> charmrun is unable to use ssh. I can ssh back and forth from
> >> any one node to any other, so I don't understand how this
> >> problem could occur; I don't know enough about ssh and
> >> charm++. It seems like charm++ doesn't have access to the ssh
> >> keys, but that seems crazy. My .nodelist file reads as follows,
> >> where head is the master and node00x are the slaves. The
> >> nodelist file is located in the HOME/charm directory, but I
> >> also tried putting .nodelist in the megatest directory.
> >>
> >> group main
> >> host head ++shell ssh
> >> host node001 ++shell ssh
> >> host node002 ++shell ssh
> >> host node003 ++shell ssh
> >> host node004 ++shell ssh
> >> host node005 ++shell ssh
> >> host node006 ++shell ssh
> >> host node007 ++shell ssh
> >> host node008 ++shell ssh
> >>
> >>
> >> This is the error I get when I run charmrun.
> >>
> >> I greatly appreciate your attention.
> >>
> >>
> >> head:/home/namd2/NAMD_2.5_Source/charm/tests/charm++/megatest
> >> # ./charmrun +p2 ./pgm
> >>
> >> Running on 2 processors: ./pgm
> >> 26005: ssh_exchange_identification: Connection closed by
> >> remote host
> >> p0_26000: p4_error: Child process exited while making
> >> connection to remote process on head: 0
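Since the thread centers on whether charmrun can ssh between nodes without prompting, a minimal sketch of a passphrase-less key setup may be useful. This is an assumption-laden sketch, not the thread's confirmed fix: it assumes OpenSSH and a /home shared across nodes (as on Leandro's cluster), and SSH_DIR is a hypothetical stand-in for ~/.ssh.

```shell
# Hypothetical sketch: passphrase-less ssh key setup, assuming OpenSSH
# and a home directory shared by all nodes. SSH_DIR stands in for ~/.ssh.
SSH_DIR="${SSH_DIR:-$HOME/.ssh}"
mkdir -p "$SSH_DIR"
chmod 700 "$SSH_DIR"
# Generate a key with an empty passphrase, only if none exists yet;
# charmrun needs non-interactive logins, so a passphrase prompt would block it.
[ -f "$SSH_DIR/id_rsa" ] || ssh-keygen -t rsa -N "" -f "$SSH_DIR/id_rsa" -q
# With a shared /home, authorizing the key once covers every node.
cat "$SSH_DIR/id_rsa.pub" >> "$SSH_DIR/authorized_keys"
chmod 600 "$SSH_DIR/authorized_keys"
```

Before retrying megatest, `ssh -o BatchMode=yes node001 true` should succeed with no prompt from every node. Note that "ssh_exchange_identification: Connection closed by remote host" often points at the remote sshd refusing the connection itself (e.g. TCP wrappers or connection limits) rather than a missing key, so checking the target node's sshd configuration is also worthwhile.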
> >>
> >> Kyle B. Gustafson
> >> Department of Physics
> >> University of Maryland
> >> Box 45
> >> 082 Regents Drive
> >> College Park, MD 20742
> >>
>
> Kyle B. Gustafson
> Department of Physics
> University of Maryland
> College Park, MD USA
>

This archive was generated by hypermail 2.1.6 : Wed Feb 29 2012 - 15:39:44 CST