Re: Waiting for 0-th client to connect.

From: Leandro Martínez (leandromartinez98_at_gmail.com)
Date: Tue Oct 24 2006 - 13:15:28 CDT

Actually, if I run namd2 from the nodes, even using other
nodes and the master (as a node), everything runs OK.
The only problem is that the nodes do not seem to respond
when the simulation is started from the master. Therefore,
the problem cannot be missing libraries on the nodes; it must
be something related to the connection between the master node
and the nodes, maybe some rsh configuration issue. But I cannot
find any hint as to what it may be.
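
For what it is worth, the verbose output quoted below shows the node
programs being told to connect back to alehpo.iqm.unicamp.br, port 42645,
and charmrun then timing out while waiting for that connection. A minimal
sketch of how the return path could be checked from a compute node is
given here. It is only a generic TCP test in Python, not part of NAMD or
charmrun, and the helper names resolve and can_connect are made up for
illustration.

```python
import socket


def resolve(host):
    """Return the sorted IPv4 addresses `host` resolves to, or [] on failure."""
    try:
        infos = socket.getaddrinfo(host, None, socket.AF_INET, socket.SOCK_STREAM)
    except socket.gaierror:
        # The name is unknown to this machine's resolver (/etc/hosts, DNS, ...).
        return []
    return sorted({info[4][0] for info in infos})


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to (host, port) succeeds within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution errors.
        return False
```

Run on the node (192.168.0.101 in the test below), resolve("alehpo.iqm.unicamp.br")
would show which address the node uses for the master, and can_connect
against a port on which something is listening on the master would show
whether the node can reach it at all. The port in the log (42645) comes
from that one run, so I would not assume it stays the same between runs;
a temporary test listener on the master is probably the safer target.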
Thanks,
Leandro.

On 10/24/06, Leandro Martínez <leandromartinez98_at_gmail.com> wrote:
>
>
> Hi all,
> I'm trying to run namd2 on a recently configured Linux cluster of 8 nodes.
>
> I can run the program on any node independently; they all share the same
> home directory. However, if I try to run it remotely, the simulation
> does not start. Running with ++verbose returns the following
> information, and the program gets stuck at that point.
>
> The output below is from a test. The nodelist contains a single node
> that is not the one where the simulation is started. If the node in the
> nodelist is the node where the simulation is started, the simulation
> runs fine; the problem is only in the remote run.
>
> Charmrun> charmrun started...
> Charmrun> using ./nodelist2 as nodesfile
> Charmrun> adding client 0: "192.168.0.101", IP:192.168.0.101
> Charmrun> adding client 1: "192.168.0.101", IP:192.168.0.101
> Charmrun> Charmrun = alehpo.iqm.unicamp.br, port = 42645
> Charmrun> Sending "0 alehpo.iqm.unicamp.br 42645 17029 0" to client 0.
> Charmrun> find the node program "/home/lmartinez/./NAMD_2.6b2_Linux-amd64/namd2" at "/home/lmartinez" for 0.
> Charmrun> Starting rsh 192.168.0.101 -l lmartinez /bin/sh -f
> Charmrun> rsh (192.168.0.101:0) started
> Charmrun> Sending "1 alehpo.iqm.unicamp.br 42645 17029 0" to client 1.
> Charmrun> find the node program "/home/lmartinez/./NAMD_2.6b2_Linux-amd64/namd2" at "/home/lmartinez" for 1.
> Charmrun> Starting rsh 192.168.0.101 -l lmartinez /bin/sh -f
> Charmrun> rsh (192.168.0.101:1) started
> Charmrun> node programs all started
> Charmrun> waiting for rsh (192.168.0.101:0), pid 17030
> Charmrun rsh(192.168.0.101.0)> remote responding...
> Charmrun rsh(192.168.0.101.1)> remote responding...
> Charmrun rsh(192.168.0.101.0)> starting node-program...
> Charmrun rsh(192.168.0.101.0)> rsh phase successful.
> Charmrun rsh(192.168.0.101.1)> starting node-program...
> Charmrun rsh(192.168.0.101.1)> rsh phase successful.
> Charmrun> waiting for rsh (192.168.0.101:1), pid 17031
> Charmrun> Waiting for 0-th client to connect.
> Timeout waiting for node-program to connect
>
