Timeout waiting for node-program to connect

From: Nicolas Floquet (nicolas.floquet_at_univ-montp1.fr)
Date: Wed Jul 21 2010 - 10:43:02 CDT

Dear NAMD users,

I am trying to distribute an MD calculation across several machines
connected by our local network.
I declare each machine in the nodelist file as follows (the "machines" file):

group main ++pathfix ./
host (IP address of machine 1)
host (IP address of machine 2)
host (IP address of machine 3)

My launch script looks like this:
#!/bin/csh
nohup [namd dir]/charmrun ++verbose +p12 ++nodelist [workingdir]/machines \
    [namd dir]/namd2 [workingdir]/equi1.conf > [workingdir]/output &

Here is the result in the output file:
Charmrun remote shell(192.168.212.68.0)> remote responding...
Charmrun remote shell(192.168.212.68.0)> starting node-program...
Charmrun remote shell(192.168.212.68.0)> rsh phase successful.
Charmrun remote shell(192.168.212.68.3)> remote responding...
Charmrun remote shell(192.168.212.68.3)> starting node-program...
Charmrun remote shell(192.168.212.68.3)> rsh phase successful.
Charmrun remote shell(192.168.212.68.6)> remote responding...
Charmrun remote shell(192.168.212.68.6)> starting node-program...
Charmrun remote shell(192.168.212.68.6)> rsh phase successful.
Charmrun remote shell(192.168.212.68.9)> remote responding...
Charmrun remote shell(192.168.212.68.9)> starting node-program...
Charmrun remote shell(192.168.212.68.9)> rsh phase successful.
Charmrun remote shell(192.168.212.100.5)> remote responding...
Charmrun remote shell(192.168.212.100.5)> starting node-program...
Charmrun remote shell(192.168.212.100.5)> rsh phase successful.
Charmrun remote shell(192.168.212.100.2)> remote responding...
Charmrun remote shell(192.168.212.100.2)> starting node-program...
Charmrun remote shell(192.168.212.100.11)> remote responding...
Charmrun remote shell(192.168.212.100.8)> remote responding...
Charmrun remote shell(192.168.212.100.11)> starting node-program...
Charmrun remote shell(192.168.212.100.8)> starting node-program...
Charmrun remote shell(192.168.212.100.8)> rsh phase successful.
Charmrun remote shell(192.168.212.97.7)> remote responding...
Charmrun remote shell(192.168.212.100.11)> rsh phase successful.
Charmrun remote shell(192.168.212.97.7)> starting node-program...
Charmrun remote shell(192.168.212.100.2)> rsh phase successful.
Charmrun remote shell(192.168.212.97.1)> remote responding...
Charmrun remote shell(192.168.212.97.7)> rsh phase successful.
Charmrun remote shell(192.168.212.97.1)> starting node-program...
Charmrun remote shell(192.168.212.97.1)> rsh phase successful.
Charmrun remote shell(192.168.212.97.10)> remote responding...
Charmrun remote shell(192.168.212.97.4)> remote responding...
Charmrun remote shell(192.168.212.97.4)> starting node-program...
Charmrun remote shell(192.168.212.97.4)> rsh phase successful.
Charmrun remote shell(192.168.212.97.10)> starting node-program...
Charmrun remote shell(192.168.212.97.10)> rsh phase successful.
Charmrun> adding client 0: "192.168.212.68", IP:192.168.212.68
Charmrun> adding client 1: "192.168.212.97", IP:192.168.212.97
Charmrun> adding client 2: "192.168.212.100", IP:192.168.212.100
Charmrun> adding client 3: "192.168.212.68", IP:192.168.212.68
Charmrun> adding client 4: "192.168.212.97", IP:192.168.212.97
Charmrun> adding client 5: "192.168.212.100", IP:192.168.212.100
Charmrun> adding client 6: "192.168.212.68", IP:192.168.212.68
Charmrun> adding client 7: "192.168.212.97", IP:192.168.212.97
Charmrun> adding client 8: "192.168.212.100", IP:192.168.212.100
Charmrun> adding client 9: "192.168.212.68", IP:192.168.212.68
Charmrun> adding client 10: "192.168.212.97", IP:192.168.212.97
Charmrun> adding client 11: "192.168.212.100", IP:192.168.212.100
Charmrun> Charmrun = 127.0.1.1, port = 33244
Charmrun> Sending "0 127.0.1.1 33244 23431 0" to client 0.
Charmrun> find the node program "/home/nicolas/PROGRAMMES/NAMD/namd2" at
"/home/nicolas/CALCULS/NAMD_TEST" for 0.
Charmrun> Starting ssh 192.168.212.68 -l nicolas /bin/sh -f
Charmrun> Sending "1 127.0.1.1 33244 23431 0" to client 1.
Charmrun> find the node program "/home/nicolas/PROGRAMMES/NAMD/namd2" at
"/home/nicolas/CALCULS/NAMD_TEST" for 1.
Charmrun> Starting ssh 192.168.212.97 -l nicolas /bin/sh -f
Charmrun> Sending "2 127.0.1.1 33244 23431 0" to client 2.
Charmrun> find the node program "/home/nicolas/PROGRAMMES/NAMD/namd2" at
"/home/nicolas/CALCULS/NAMD_TEST" for 2.
Charmrun> Starting ssh 192.168.212.100 -l nicolas /bin/sh -f
Charmrun> Sending "3 127.0.1.1 33244 23431 0" to client 3.
Charmrun> find the node program "/home/nicolas/PROGRAMMES/NAMD/namd2" at
"/home/nicolas/CALCULS/NAMD_TEST" for 3.
Charmrun> Starting ssh 192.168.212.68 -l nicolas /bin/sh -f
Charmrun> Sending "4 127.0.1.1 33244 23431 0" to client 4.
Charmrun> find the node program "/home/nicolas/PROGRAMMES/NAMD/namd2" at
"/home/nicolas/CALCULS/NAMD_TEST" for 4.
Charmrun> Starting ssh 192.168.212.97 -l nicolas /bin/sh -f
Charmrun> Sending "5 127.0.1.1 33244 23431 0" to client 5.
Charmrun> find the node program "/home/nicolas/PROGRAMMES/NAMD/namd2" at
"/home/nicolas/CALCULS/NAMD_TEST" for 5.
Charmrun> Starting ssh 192.168.212.100 -l nicolas /bin/sh -f
Charmrun> Sending "6 127.0.1.1 33244 23431 0" to client 6.
Charmrun> find the node program "/home/nicolas/PROGRAMMES/NAMD/namd2" at
"/home/nicolas/CALCULS/NAMD_TEST" for 6.
Charmrun> Starting ssh 192.168.212.68 -l nicolas /bin/sh -f
Charmrun> Sending "7 127.0.1.1 33244 23431 0" to client 7.
Charmrun> find the node program "/home/nicolas/PROGRAMMES/NAMD/namd2" at
"/home/nicolas/CALCULS/NAMD_TEST" for 7.
Charmrun> Starting ssh 192.168.212.97 -l nicolas /bin/sh -f
Charmrun> Sending "8 127.0.1.1 33244 23431 0" to client 8.
Charmrun> find the node program "/home/nicolas/PROGRAMMES/NAMD/namd2" at
"/home/nicolas/CALCULS/NAMD_TEST" for 8.
Charmrun> Starting ssh 192.168.212.100 -l nicolas /bin/sh -f
Charmrun> Sending "9 127.0.1.1 33244 23431 0" to client 9.
Charmrun> find the node program "/home/nicolas/PROGRAMMES/NAMD/namd2" at
"/home/nicolas/CALCULS/NAMD_TEST" for 9.
Charmrun> Starting ssh 192.168.212.68 -l nicolas /bin/sh -f
Charmrun> Sending "10 127.0.1.1 33244 23431 0" to client 10.
Charmrun> find the node program "/home/nicolas/PROGRAMMES/NAMD/namd2" at
"/home/nicolas/CALCULS/NAMD_TEST" for 10.
Charmrun> Starting ssh 192.168.212.97 -l nicolas /bin/sh -f
Charmrun> Sending "11 127.0.1.1 33244 23431 0" to client 11.
Charmrun> find the node program "/home/nicolas/PROGRAMMES/NAMD/namd2" at
"/home/nicolas/CALCULS/NAMD_TEST" for 11.
Charmrun> Starting ssh 192.168.212.100 -l nicolas /bin/sh -f
Charmrun> Waiting for 0-th client to connect.
Charmrun> Waiting for 1-th client to connect.
Charmrun> Waiting for 2-th client to connect.
Charmrun> Waiting for 3-th client to connect.
Charmrun> client 0 connected (IP=192.168.212.68 data_port=39548)
Charmrun> client 3 connected (IP=192.168.212.68 data_port=37785)
Charmrun> client 6 connected (IP=192.168.212.68 data_port=54621)
Charmrun> client 9 connected (IP=192.168.212.68 data_port=51703)
Charmrun> Waiting for 4-th client to connect.
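
To make the output above easier to read, here is a minimal Python sketch that groups the clients by IP and marks which ones have connected. The function name summarize and the SAMPLE excerpt are my own illustration and are not part of NAMD or charmrun. The regular expressions simply assume the ++verbose line format shown above.

```python
import re
from collections import defaultdict

# Patterns assume the charmrun ++verbose line format shown in the output above.
ADDED = re.compile(r'Charmrun> adding client (\d+): "[^"]*", IP:(\S+)')
CONNECTED = re.compile(r'Charmrun> client (\d+) connected \(IP=(\S+) data_port=(\d+)\)')


def summarize(log_text):
    """Return {ip: {"connected": [...], "pending": [...]}} from charmrun output."""
    client_ip = {}      # client number -> IP it was assigned to
    connected = set()   # client numbers that reported "connected"
    for line in log_text.splitlines():
        m = ADDED.search(line)
        if m:
            client_ip[int(m.group(1))] = m.group(2)
            continue
        m = CONNECTED.search(line)
        if m:
            connected.add(int(m.group(1)))
    summary = defaultdict(lambda: {"connected": [], "pending": []})
    for client, ip in sorted(client_ip.items()):
        key = "connected" if client in connected else "pending"
        summary[ip][key].append(client)
    return dict(summary)


# Short excerpt of the output above, used only to illustrate the function.
SAMPLE = """\
Charmrun> adding client 0: "192.168.212.68", IP:192.168.212.68
Charmrun> adding client 1: "192.168.212.97", IP:192.168.212.97
Charmrun> adding client 2: "192.168.212.100", IP:192.168.212.100
Charmrun> adding client 3: "192.168.212.68", IP:192.168.212.68
Charmrun> client 0 connected (IP=192.168.212.68 data_port=39548)
Charmrun> client 3 connected (IP=192.168.212.68 data_port=37785)
"""

for ip, state in summarize(SAMPLE).items():
    print(ip, "connected:", state["connected"], "pending:", state["pending"])
```

In the full output, the only clients that connect are 0, 3, 6 and 9, all on 192.168.212.68. Every client on 192.168.212.97 and 192.168.212.100 is still pending when charmrun stops at "Waiting for 4-th client to connect."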

The result is the same with ssh, after adding setenv CONV_RSH ssh to the
launch script.
Password-less connections between the machines work fine with both ssh and rsh.
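
For reference, the password-less login can be checked non-interactively with something like the Python sketch below. It covers ssh only, not rsh. The helper names ssh_check_command and can_login are hypothetical and of my own choosing. BatchMode and ConnectTimeout are standard OpenSSH client options. The user name and addresses in the commented example are the ones from the output above.

```python
import subprocess


def ssh_check_command(host, user):
    """Build an ssh command that fails instead of prompting for a password."""
    # BatchMode=yes makes ssh fail rather than prompt for a password;
    # ConnectTimeout=5 keeps an unreachable host from hanging the check.
    return ["ssh", "-o", "BatchMode=yes", "-o", "ConnectTimeout=5",
            "-l", user, host, "true"]


def can_login(host, user):
    """Return True if a password-less ssh login to host succeeds."""
    result = subprocess.run(
        ssh_check_command(host, user),
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


# Example (not run here, since it needs the real machines):
# for host in ["192.168.212.68", "192.168.212.97", "192.168.212.100"]:
#     print(host, can_login(host, "nicolas"))
```

This only confirms that the launching machine can reach each node. It says nothing about whether the node programs can connect back to charmrun.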

Does the working directory need to be shared between the machines?

Thank you for any help!

Nicolas

-- 
---------------------------
Dr. Nicolas FLOQUET
Chargé de Recherches CNRS
Institut des Biomolécules Max Mousseron (IBMM UMR5247)
Faculté de Pharmacie, 15 av. Charles Flahault
B.P. 14491
34093 MONTPELLIER CEDEX 5
---------------------------
Tél: 04 67 54 85 50
Fax: 04 67 54 86 54
---------------------------
