Re: AW: AW: Using nodelist file causes namd to hang

From: Douglas Houston (DouglasR.Houston_at_ed.ac.uk)
Date: Wed Apr 09 2014 - 06:16:35 CDT

We may be getting somewhere. The following command now runs:

/usr/people/douglas/programs/NAMD_2.9_Linux-x86/charmrun +p1
/usr/people/douglas/programs/NAMD_2.9_Linux-x86/namd2 ++verbose
mdrun.conf

With options +p1 to +p10 it works. At +p11 or +p12 (each node has 12
processors) I get:

Charmrun> charmrun started...
Charmrun> using /usr/people/douglas/.nodelist as nodesfile
Charmrun> adding client 0: "localhost", IP:127.0.0.1
Charmrun> adding client 1: "localhost", IP:127.0.0.1
Charmrun> adding client 2: "localhost", IP:127.0.0.1
Charmrun> adding client 3: "localhost", IP:127.0.0.1
Charmrun> adding client 4: "localhost", IP:127.0.0.1
Charmrun> adding client 5: "localhost", IP:127.0.0.1
Charmrun> adding client 6: "localhost", IP:127.0.0.1
Charmrun> adding client 7: "localhost", IP:127.0.0.1
Charmrun> adding client 8: "localhost", IP:127.0.0.1
Charmrun> adding client 9: "localhost", IP:127.0.0.1
Charmrun> adding client 10: "localhost", IP:127.0.0.1
Charmrun> Charmrun = 129.215.237.187, port = 58561
start_nodes_rsh
Charmrun> Sending "0 129.215.237.187 58561 2981 0" to client 0.
Charmrun> find the node program
"/usr/people/douglas/programs/NAMD_2.9_Linux-x86/namd2" at
"/usr/people/douglas/projects/UPS/targets/SCF/2AST/MD/parallelise_itioc" for
0.
Charmrun> Starting ssh localhost -l douglas /bin/sh -f
Charmrun> remote shell (localhost:0) started
Charmrun> Sending "1 129.215.237.187 58561 2981 0" to client 1.
Charmrun> find the node program
"/usr/people/douglas/programs/NAMD_2.9_Linux-x86/namd2" at
"/usr/people/douglas/projects/UPS/targets/SCF/2AST/MD/parallelise_itioc" for
1.
Charmrun> Starting ssh localhost -l douglas /bin/sh -f
Charmrun> remote shell (localhost:1) started
Charmrun> Sending "2 129.215.237.187 58561 2981 0" to client 2.
Charmrun> find the node program
"/usr/people/douglas/programs/NAMD_2.9_Linux-x86/namd2" at
"/usr/people/douglas/projects/UPS/targets/SCF/2AST/MD/parallelise_itioc" for
2.
Charmrun> Starting ssh localhost -l douglas /bin/sh -f
Charmrun> remote shell (localhost:2) started
Charmrun> Sending "3 129.215.237.187 58561 2981 0" to client 3.
Charmrun> find the node program
"/usr/people/douglas/programs/NAMD_2.9_Linux-x86/namd2" at
"/usr/people/douglas/projects/UPS/targets/SCF/2AST/MD/parallelise_itioc" for
3.
Charmrun> Starting ssh localhost -l douglas /bin/sh -f
Charmrun> remote shell (localhost:3) started
Charmrun> Sending "4 129.215.237.187 58561 2981 0" to client 4.
Charmrun> find the node program
"/usr/people/douglas/programs/NAMD_2.9_Linux-x86/namd2" at
"/usr/people/douglas/projects/UPS/targets/SCF/2AST/MD/parallelise_itioc" for
4.
Charmrun> Starting ssh localhost -l douglas /bin/sh -f
Charmrun> remote shell (localhost:4) started
Charmrun> Sending "5 129.215.237.187 58561 2981 0" to client 5.
Charmrun> find the node program
"/usr/people/douglas/programs/NAMD_2.9_Linux-x86/namd2" at
"/usr/people/douglas/projects/UPS/targets/SCF/2AST/MD/parallelise_itioc" for
5.
Charmrun> Starting ssh localhost -l douglas /bin/sh -f
Charmrun> remote shell (localhost:5) started
Charmrun> Sending "6 129.215.237.187 58561 2981 0" to client 6.
Charmrun> find the node program
"/usr/people/douglas/programs/NAMD_2.9_Linux-x86/namd2" at
"/usr/people/douglas/projects/UPS/targets/SCF/2AST/MD/parallelise_itioc" for
6.
Charmrun> Starting ssh localhost -l douglas /bin/sh -f
Charmrun> remote shell (localhost:6) started
Charmrun> Sending "7 129.215.237.187 58561 2981 0" to client 7.
Charmrun> find the node program
"/usr/people/douglas/programs/NAMD_2.9_Linux-x86/namd2" at
"/usr/people/douglas/projects/UPS/targets/SCF/2AST/MD/parallelise_itioc" for
7.
Charmrun> Starting ssh localhost -l douglas /bin/sh -f
Charmrun> remote shell (localhost:7) started
Charmrun> Sending "8 129.215.237.187 58561 2981 0" to client 8.
Charmrun> find the node program
"/usr/people/douglas/programs/NAMD_2.9_Linux-x86/namd2" at
"/usr/people/douglas/projects/UPS/targets/SCF/2AST/MD/parallelise_itioc" for
8.
Charmrun> Starting ssh localhost -l douglas /bin/sh -f
Charmrun> remote shell (localhost:8) started
Charmrun> Sending "9 129.215.237.187 58561 2981 0" to client 9.
Charmrun> find the node program
"/usr/people/douglas/programs/NAMD_2.9_Linux-x86/namd2" at
"/usr/people/douglas/projects/UPS/targets/SCF/2AST/MD/parallelise_itioc" for
9.
Charmrun> Starting ssh localhost -l douglas /bin/sh -f
Charmrun> remote shell (localhost:9) started
Charmrun> Sending "10 129.215.237.187 58561 2981 0" to client 10.
Charmrun> find the node program
"/usr/people/douglas/programs/NAMD_2.9_Linux-x86/namd2" at
"/usr/people/douglas/projects/UPS/targets/SCF/2AST/MD/parallelise_itioc" for
10.
Charmrun> Starting ssh localhost -l douglas /bin/sh -f
Charmrun> remote shell (localhost:10) started
Charmrun> node programs all started
ssh_exchange_identification: Connection closed by remote host
Charmrun remote shell(localhost.7)> remote responding...
Charmrun remote shell(localhost.7)> starting node-program...
Charmrun remote shell(localhost.7)> rsh phase successful.
Charmrun remote shell(localhost.5)> remote responding...
Charmrun remote shell(localhost.3)> remote responding...
Charmrun remote shell(localhost.5)> starting node-program...
Charmrun remote shell(localhost.5)> rsh phase successful.
Charmrun remote shell(localhost.10)> remote responding...
Charmrun remote shell(localhost.3)> starting node-program...
Charmrun remote shell(localhost.3)> rsh phase successful.
Charmrun remote shell(localhost.10)> starting node-program...
Charmrun remote shell(localhost.10)> rsh phase successful.
Charmrun remote shell(localhost.6)> remote responding...
Charmrun remote shell(localhost.0)> remote responding...
Charmrun remote shell(localhost.6)> starting node-program...
Charmrun remote shell(localhost.6)> rsh phase successful.
Charmrun remote shell(localhost.9)> remote responding...
Charmrun remote shell(localhost.0)> starting node-program...
Charmrun remote shell(localhost.0)> rsh phase successful.
Charmrun remote shell(localhost.8)> remote responding...
Charmrun remote shell(localhost.4)> remote responding...
Charmrun remote shell(localhost.9)> starting node-program...
Charmrun remote shell(localhost.9)> rsh phase successful.
Charmrun remote shell(localhost.4)> starting node-program...
Charmrun remote shell(localhost.4)> rsh phase successful.
Charmrun remote shell(localhost.8)> starting node-program...
Charmrun remote shell(localhost.8)> rsh phase successful.
Charmrun remote shell(localhost.1)> remote responding...
Charmrun remote shell(localhost.1)> starting node-program...
Charmrun remote shell(localhost.1)> rsh phase successful.
Charmrun> Error 255 returned from rsh (localhost:2)

Note that the number in "localhost:#" in the last line above varies
from run to run. Is there a limit on how many simultaneous
connections I can have?
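
One thing that may be worth checking here (an assumption, not something confirmed in this thread): OpenSSH's sshd limits concurrent unauthenticated connections through its MaxStartups option, whose built-in default is 10 in older releases and 10:30:100 in newer ones. Eleven near-simultaneous "ssh localhost" logins could then have one dropped at random, which would fit the "ssh_exchange_identification: Connection closed by remote host" line. The sketch below only reads the setting from a config file; the function name show_max_startups and the default path are illustrative.

```shell
#!/bin/sh
# Hypothetical helper: print the MaxStartups value from an sshd config file,
# or a note if the option is not set explicitly (commented lines are ignored).
show_max_startups() {
    config="${1:-/etc/ssh/sshd_config}"
    # First field must be exactly "MaxStartups" (case-insensitive),
    # so "#MaxStartups ..." and "# MaxStartups ..." do not match.
    value=$(awk 'tolower($1) == "maxstartups" { print $2; exit }' "$config")
    if [ -n "$value" ]; then
        echo "MaxStartups $value"
    else
        echo "MaxStartups not set (sshd built-in default applies)"
    fi
}

# Example (the file may need elevated permissions to read):
# show_max_startups /etc/ssh/sshd_config
```

If the limit does turn out to be the cause, raising MaxStartups in sshd_config and restarting sshd would be the usual remedy, which needs root access.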

Quoting Norman Geist <norman.geist_at_uni-greifswald.de> on Wed, 9 Apr
2014 12:53:50 +0200:

> This may be a hint. Your nodes must not only be able to log on to all nodes
> without a password, but should also be able to log on to themselves via their
> own IP address, localhost and 127.0.0.1.
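
The self-login test described above can be scripted so that it is easy to repeat on each node. A minimal sketch, assuming a POSIX shell and OpenSSH; the function name check_logins and the SSH_CMD override are made up for illustration:

```shell
#!/bin/sh
# Hypothetical helper: try a non-interactive login to each target and report
# whether it succeeded. SSH_CMD can be overridden, e.g. for a dry run.
check_logins() {
    failures=0
    for target in "$@"; do
        # BatchMode=yes makes ssh fail instead of prompting for a password
        # or for host key confirmation.
        if ${SSH_CMD:-ssh} -o BatchMode=yes -o ConnectTimeout=5 \
               "$target" true </dev/null >/dev/null 2>&1; then
            echo "ok   $target"
        else
            echo "FAIL $target"
            failures=$((failures + 1))
        fi
    done
    # Exit status is zero only if every target succeeded.
    [ "$failures" -eq 0 ]
}

# Example, run on each node (add the node's own IP address as well):
# check_logins localhost 127.0.0.1 "$(hostname)"
```

Any target reported as FAIL would need its key or known_hosts entry sorted out before charmrun can start processes there without a prompt.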
>
> You may want to delete the wrong entries in ~/.ssh/known_hosts on the nodes,
> and recreate them by connecting with ssh to the targets mentioned above.
>
> Norman Geist.
>
>
>> -----Original Message-----
>> From: Douglas Houston [mailto:DouglasR.Houston_at_ed.ac.uk]
>> Sent: Wednesday, 9 April 2014 12:42
>> To: Norman Geist
>> Subject: Re: AW: namd-l: Using nodelist file causes namd to hang
>>
>> The same command without the ++local causes the nodelist file to be
>> used; I have already posted the output from this.
>>
>> If I delete the nodelist file, the same command without the ++local
>> (which causes the file /usr/people/douglas/.nodelist to be used)
>> outputs:
>>
>>
>> Charmrun> charmrun started...
>> Charmrun> using /usr/people/douglas/.nodelist as nodesfile
>> Charmrun> adding client 0: "localhost", IP:127.0.0.1
>> Charmrun> Charmrun = 129.215.237.187, port = 35909
>> start_nodes_rsh
>> Charmrun> Sending "0 129.215.237.187 35909 27843 0" to client 0.
>> Charmrun> find the node program
>> "/usr/people/douglas/programs/NAMD_2.9_Linux-x86/namd2" at
>> "/usr/people/douglas/projects/UPS/targets/SCF/2AST/MD/parallelise_itioc" for
>> 0.
>> Charmrun> Starting ssh localhost -l douglas /bin/sh -f
>> Charmrun> remote shell (localhost:0) started
>> Charmrun> node programs all started
>> @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
>> @ WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED! @
>> @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
>> IT IS POSSIBLE THAT SOMEONE IS DOING SOMETHING NASTY!
>> Someone could be eavesdropping on you right now (man-in-the-middle
>> attack)!
>> It is also possible that the RSA host key has just been changed.
>> The fingerprint for the RSA key sent by the remote host is
>> 99:cb:e0:0a:77:8b:61:fd:19:01:57:93:ec:93:99:63.
>> Please contact your system administrator.
>> Add correct host key in /usr/people/douglas/.ssh/known_hosts to get
>> rid of this message.
>> Offending key in /usr/people/douglas/.ssh/known_hosts:47
>> RSA host key for localhost has changed and you have requested strict
>> checking.
>> Host key verification failed.
>> Charmrun> Error 255 returned from rsh (localhost:0)
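
The warning above names the stale entry directly ("Offending key in /usr/people/douglas/.ssh/known_hosts:47"), so one way to follow the advice about cleaning known_hosts is to drop just that line. A minimal sketch, assuming a POSIX shell; the function name drop_known_hosts_line is made up for illustration:

```shell
#!/bin/sh
# Hypothetical helper: delete one line (by number) from a known_hosts file,
# keeping a backup copy next to it with a .bak suffix.
drop_known_hosts_line() {
    file="$1"
    line="$2"
    cp -p "$file" "$file.bak" || return 1
    # sed copies every line except the numbered one back into the file.
    sed "${line}d" "$file.bak" > "$file"
}

# Example matching the warning above ("known_hosts:47"):
# drop_known_hosts_line "$HOME/.ssh/known_hosts" 47
```

Running "ssh-keygen -R localhost" is the more usual way to remove an entry by host name. After either, a single interactive "ssh localhost" records the current host key again.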
>>
>>
>> The file /usr/people/douglas/.nodelist contains:
>> group main
>> host localhost
>>
>>
>>
>>
>>
>> Quoting Norman Geist <norman.geist_at_uni-greifswald.de> on Wed, 9 Apr
>> 2014 12:28:51 +0200:
>>
>> > Please try the same command without ++local and see if it still
>> > works.
>> >
>> >> -----Original Message-----
>> >> From: owner-namd-l_at_ks.uiuc.edu [mailto:owner-namd-l_at_ks.uiuc.edu] On
>> >> Behalf Of Douglas Houston
>> >> Sent: Wednesday, 9 April 2014 11:49
>> >> To: ramya narasimhan
>> >> Cc: Namd Mailing List
>> >> Subject: Re: namd-l: Using nodelist file causes namd to hang
>> >>
>> >> The result is the same whichever order the nodes are present in the
>> >> list.
>> >>
>> >> What exactly is Charmrun waiting for at the "Waiting for 0-th client
>> >> to connect." stage? Presumably the 0th client is the first in
>> >> nodelist, and a process is supposed to start on that node, then
>> >> "connect" to Charmrun on the host machine?
>> >
>> > Charmrun is just spawning the namd processes and is now waiting for
>> > them to start to talk.
>> >
>> >>
>> >> Using the command top I see no evidence of anything new starting on
>> >> the node, despite all the "starting node-program" and "rsh phase
>> >> successful" messages that are output.
>> >>
>> >> Using "ps -u douglas" on the node shows a whole bunch of tcsh and sh
>> >> shells and sleep processes starting then dying but nothing else.
>> >>
>> >> What does the line "Sending "0 129.215.237.187 57453 26737 0" to
>> >> client 0" mean? How is this "sending" achieved? I see "port 57453"
>> >> is mentioned in the output ...
>> >
>> > This seems to be part of the parallel startup, where the spawned
>> > processes get the information about each other.
>> >
>> >>
>> >>
>> >>
>> >>
>> >> Quoting ramya narasimhan <ramya_jln_at_yahoo.co.in> on Wed, 9 Apr 2014
>> >> 11:51:52 +0800 (SGT):
>> >>
>> >> > Just change the order of the hostnames [IPs of the systems] in the
>> >> > nodefile, so that the 0-th client will be itioc5 instead of itioc1,
>> >> > to find out whether the problem is with the nodes.
>> >> >
>> >> >
>> >> > Dr. Ramya.L.
>> >> > On Tuesday, 8 April 2014 7:23 PM, Douglas Houston
>> >> > <DouglasR.Houston_at_ed.ac.uk> wrote:
>> >> >
>> >> > Yes, with ping all the nodes resolve to full hostnames and IP
>> >> > addresses. I tried putting IP addresses into nodelist instead of
>> >> > hostnames but it still times out at "Waiting for 0-th client to
>> >> > connect".
>> >> >
>> >> >
>> >> > Quoting Norman Geist <norman.geist_at_uni-greifswald.de> on Tue, 8 Apr
>> >> > 2014 14:30:15 +0200:
>> >> >
>> >> >> On all the nodes? Otherwise try a nodelist with IP addresses
>> >> >> instead of hostnames. If that works, you have a problem with
>> >> >> local DNS.
>> >> >>
>> >> >> Norman Geist.
>> >> >>
>> >> >>
>> >> >>> -----Original Message-----
>> >> >>> From: Douglas Houston [mailto:DouglasR.Houston_at_ed.ac.uk]
>> >> >>> Sent: Tuesday, 8 April 2014 14:14
>> >> >>> To: Norman Geist
>> >> >>> Cc: Namd Mailing List
>> >> >>> Subject: Re: AW: AW: namd-l: Using nodelist file causes namd to hang
>> >> >>>
>> >> >>> Thanks Norman. I had found that thread after my searches but it
>> >> >>> did not seem to apply to my problem.
>> >> >>>
>> >> >>> "You can check this while doing a ping to the hostname, while you
>> >> >>> are logged in at a compute node "ping hostname". If this returns a
>> >> >>> 127.x.x.x address, your local DNS configuration is not suitable for
>> >> >>> charmrun"
>> >> >>>
>> >> >>> My ping returns the full name and IP address of the node, not
>> >> >>> 127.x.x.x.
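
The ping test quoted above can also be scripted, which makes it easier to repeat on every node. A minimal sketch, assuming a Linux system where getent is available; the function names is_loopback and check_hostname_resolution are illustrative only:

```shell
#!/bin/sh
# Hypothetical helper: succeed if the given IPv4 address is in 127.0.0.0/8.
is_loopback() {
    case "$1" in
        127.*) return 0 ;;
        *)     return 1 ;;
    esac
}

# Hypothetical helper: resolve a host name with getent and complain if it
# does not resolve or if the first address returned is a loopback address.
check_hostname_resolution() {
    name="${1:-$(hostname)}"
    addr=$(getent hosts "$name" | awk '{ print $1; exit }')
    if [ -z "$addr" ]; then
        echo "$name: does not resolve"
        return 1
    elif is_loopback "$addr"; then
        echo "$name: resolves to loopback address $addr"
        return 1
    fi
    echo "$name: resolves to $addr"
}

# Example, run on each compute node:
# check_hostname_resolution
```

Called with no argument, check_hostname_resolution checks the node's own name as reported by hostname.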
>> >> >>>
>> >> >>>
>> >> >>>
>> >> >>> Quoting Norman Geist <norman.geist_at_uni-greifswald.de> on Tue, 8 Apr
>> >> >>> 2014 13:22:41 +0200:
>> >> >>>
>> >> >>> > Now I remember that I already posted a solution for this some
>> >> >>> > weeks ago; you could have found it by using google.de. Maybe this
>> >> >>> > helps you.
>> >> >>> >
>> >> >>> > http://www.ks.uiuc.edu/Research/namd/mailing_list/namd-l.2012-2013/2645.html
>> >> >>> >
>> >> >>> > Norman Geist.
>> >> >>> >
>> >> >>> >
>> >> >>> >> -----Original Message-----
>> >> >>> >> From: owner-namd-l_at_ks.uiuc.edu [mailto:owner-namd-l_at_ks.uiuc.edu]
>> >> >>> >> On Behalf Of Douglas Houston
>> >> >>> >> Sent: Tuesday, 8 April 2014 12:53
>> >> >>> >> To: Norman Geist
>> >> >>> >> Cc: Namd Mailing List
>> >> >>> >> Subject: Re: AW: namd-l: Using nodelist file causes namd to hang
>> >> >>> >>
>> >> >>> >> Thanks for the tip Norman, but if I change my command to the
>> >> >>> >> following it still hangs at the same point:
>> >> >>> >>
>> >> >>> >> /usr/people/douglas/programs/NAMD_2.9_Linux-x86/charmrun +p12
>> >> >>> >> ++remote-shell ssh
>> >> >>> >> /usr/people/douglas/programs/NAMD_2.9_Linux-x86/namd2 ++verbose
>> >> >>> >> mdrun.conf
>> >> >>> >>
>> >> >>> >>
>> >> >>> >>
>> >> >>> >> Quoting Norman Geist <norman.geist_at_uni-greifswald.de> on Tue, 8 Apr
>> >> >>> >> 2014 12:06:03 +0200:
>> >> >>> >>
>> >> >>> >> > Try the charmrun option "++remote-shell ssh".
>> >> >>> >> >
>> >> >>> >> > Norman Geist.
>> >> >>> >> >
>> >> >>> >> >> -----Original Message-----
>> >> >>> >> >> From: owner-namd-l_at_ks.uiuc.edu [mailto:owner-namd-l_at_ks.uiuc.edu]
>> >> >>> >> >> On Behalf Of Douglas Houston
>> >> >>> >> >> Sent: Tuesday, 8 April 2014 11:30
>> >> >>> >> >> To: namd-l_at_ks.uiuc.edu
>> >> >>> >> >> Subject: namd-l: Using nodelist file causes namd to hang
>> >> >>> >> >>
>> >> >>> >> >> I have two nodes connected via ethernet: itioc5 and itioc1
>> >> >>> >> >>
>> >> >>> >> >> I have the following in my nodelist file:
>> >> >>> >> >>
>> >> >>> >> >> group main
>> >> >>> >> >> host itioc1
>> >> >>> >> >> host itioc5
>> >> >>> >> >>
>> >> >>> >> >> I am using the following command:
>> >> >>> >> >>
>> >> >>> >> >> /usr/people/douglas/programs/NAMD_2.9_Linux-x86/charmrun +p12
>> >> >>> >> >> /usr/people/douglas/programs/NAMD_2.9_Linux-x86/namd2 ++verbose
>> >> >>> >> >> mdrun.conf
>> >> >>> >> >>
>> >> >>> >> >> I get the following output:
>> >> >>> >> >>
>> >> >>> >> >> Charmrun> charmrun started...
>> >> >>> >> >> Charmrun> using ./nodelist as nodesfile
>> >> >>> >> >> Charmrun> adding client 0: "itioc1", IP:129.215.137.21
>> >> >>> >> >> Charmrun> adding client 1: "itioc5", IP:129.215.237.186
>> >> >>> >> >> Charmrun> adding client 2: "itioc1", IP:129.215.137.21
>> >> >>> >> >> Charmrun> adding client 3: "itioc5", IP:129.215.237.186
>> >> >>> >> >> Charmrun> adding client 4: "itioc1", IP:129.215.137.21
>> >> >>> >> >> Charmrun> adding client 5: "itioc5", IP:129.215.237.186
>> >> >>> >> >> Charmrun> adding client 6: "itioc1", IP:129.215.137.21
>> >> >>> >> >> Charmrun> adding client 7: "itioc5", IP:129.215.237.186
>> >> >>> >> >> Charmrun> adding client 8: "itioc1", IP:129.215.137.21
>> >> >>> >> >> Charmrun> adding client 9: "itioc5", IP:129.215.237.186
>> >> >>> >> >> Charmrun> adding client 10: "itioc1", IP:129.215.137.21
>> >> >>> >> >> Charmrun> adding client 11: "itioc5", IP:129.215.237.186
>> >> >>> >> >> Charmrun> Charmrun = 129.215.237.187, port = 58330
>> >> >>> >> >> start_nodes_rsh
>> >> >>> >> >> Charmrun> Sending "0 129.215.237.187 58330 19205 0" to client 0.
>> >> >>> >> >> Charmrun> find the node program
>> >> >>> >> >> "/usr/people/douglas/programs/NAMD_2.9_Linux-x86/namd2" at
>> >> >>> >> >> "/usr/people/douglas/projects/UPS/targets/SCF/2AST/MD/parallelise_itioc" for
>> >> >>> >> >> 0.
>> >> >>> >> >> Charmrun> Starting ssh itioc1 -l douglas /bin/sh -f
>> >> >>> >> >> Charmrun> remote shell (itioc1:0) started
>> >> >>> >> >> Charmrun> Sending "1 129.215.237.187 58330 19205 0" to client 1.
>> >> >>> >> >> Charmrun> find the node program
>> >> >>> >> >> "/usr/people/douglas/programs/NAMD_2.9_Linux-x86/namd2" at
>> >> >>> >> >> "/usr/people/douglas/projects/UPS/targets/SCF/2AST/MD/parallelise_itioc" for
>> >> >>> >> >> 1.
>> >> >>> >> >> Charmrun> Starting ssh itioc5 -l douglas /bin/sh -f
>> >> >>> >> >> Charmrun> remote shell (itioc5:1) started
>> >> >>> >> >> Charmrun> Sending "2 129.215.237.187 58330 19205 0" to client 2.
>> >> >>> >> >> Charmrun> find the node program
>> >> >>> >> >> "/usr/people/douglas/programs/NAMD_2.9_Linux-x86/namd2" at
>> >> >>> >> >> "/usr/people/douglas/projects/UPS/targets/SCF/2AST/MD/parallelise_itioc" for
>> >> >>> >> >> 2.
>> >> >>> >> >> Charmrun> Starting ssh itioc1 -l douglas /bin/sh -f
>> >> >>> >> >> Charmrun> remote shell (itioc1:2) started
>> >> >>> >> >> Charmrun> Sending "3 129.215.237.187 58330 19205 0" to client 3.
>> >> >>> >> >> Charmrun> find the node program
>> >> >>> >> >> "/usr/people/douglas/programs/NAMD_2.9_Linux-x86/namd2" at
>> >> >>> >> >> "/usr/people/douglas/projects/UPS/targets/SCF/2AST/MD/parallelise_itioc" for
>> >> >>> >> >> 3.
>> >> >>> >> >> Charmrun> Starting ssh itioc5 -l douglas /bin/sh -f
>> >> >>> >> >> Charmrun> remote shell (itioc5:3) started
>> >> >>> >> >> Charmrun> Sending "4 129.215.237.187 58330 19205 0" to client 4.
>> >> >>> >> >> Charmrun> find the node program
>> >> >>> >> >> "/usr/people/douglas/programs/NAMD_2.9_Linux-x86/namd2" at
>> >> >>> >> >> "/usr/people/douglas/projects/UPS/targets/SCF/2AST/MD/parallelise_itioc" for
>> >> >>> >> >> 4.
>> >> >>> >> >> Charmrun> Starting ssh itioc1 -l douglas /bin/sh -f
>> >> >>> >> >> Charmrun> remote shell (itioc1:4) started
>> >> >>> >> >> Charmrun> Sending "5 129.215.237.187 58330 19205 0" to client 5.
>> >> >>> >> >> Charmrun> find the node program
>> >> >>> >> >> "/usr/people/douglas/programs/NAMD_2.9_Linux-x86/namd2" at
>> >> >>> >> >> "/usr/people/douglas/projects/UPS/targets/SCF/2AST/MD/parallelise_itioc" for
>> >> >>> >> >> 5.
>> >> >>> >> >> Charmrun> Starting ssh itioc5 -l douglas /bin/sh -f
>> >> >>> >> >> Charmrun> remote shell (itioc5:5) started
>> >> >>> >> >> Charmrun> Sending "6 129.215.237.187 58330 19205 0" to client 6.
>> >> >>> >> >> Charmrun> find the node program
>> >> >>> >> >> "/usr/people/douglas/programs/NAMD_2.9_Linux-x86/namd2" at
>> >> >>> >> >> "/usr/people/douglas/projects/UPS/targets/SCF/2AST/MD/parallelise_itioc" for
>> >> >>> >> >> 6.
>> >> >>> >> >> Charmrun> Starting ssh itioc1 -l douglas /bin/sh -f
>> >> >>> >> >> Charmrun> remote shell (itioc1:6) started
>> >> >>> >> >> Charmrun> Sending "7 129.215.237.187 58330 19205 0" to client 7.
>> >> >>> >> >> Charmrun> find the node program
>> >> >>> >> >> "/usr/people/douglas/programs/NAMD_2.9_Linux-x86/namd2" at
>> >> >>> >> >> "/usr/people/douglas/projects/UPS/targets/SCF/2AST/MD/parallelise_itioc" for
>> >> >>> >> >> 7.
>> >> >>> >> >> Charmrun> Starting ssh itioc5 -l douglas /bin/sh -f
>> >> >>> >> >> Charmrun> remote shell (itioc5:7) started
>> >> >>> >> >> Charmrun> Sending "8 129.215.237.187 58330 19205 0" to client 8.
>> >> >>> >> >> Charmrun> find the node program
>> >> >>> >> >> "/usr/people/douglas/programs/NAMD_2.9_Linux-x86/namd2" at
>> >> >>> >> >> "/usr/people/douglas/projects/UPS/targets/SCF/2AST/MD/parallelise_itioc" for
>> >> >>> >> >> 8.
>> >> >>> >> >> Charmrun> Starting ssh itioc1 -l douglas /bin/sh -f
>> >> >>> >> >> Charmrun> remote shell (itioc1:8) started
>> >> >>> >> >> Charmrun> Sending "9 129.215.237.187 58330 19205 0" to client 9.
>> >> >>> >> >> Charmrun> find the node program
>> >> >>> >> >> "/usr/people/douglas/programs/NAMD_2.9_Linux-x86/namd2" at
>> >> >>> >> >> "/usr/people/douglas/projects/UPS/targets/SCF/2AST/MD/parallelise_itioc" for
>> >> >>> >> >> 9.
>> >> >>> >> >> Charmrun> Starting ssh itioc5 -l douglas /bin/sh -f
>> >> >>> >> >> Charmrun> remote shell (itioc5:9) started
>> >> >>> >> >> Charmrun> Sending "10 129.215.237.187 58330 19205 0" to client 10.
>> >> >>> >> >> Charmrun> find the node program
>> >> >>> >> >> "/usr/people/douglas/programs/NAMD_2.9_Linux-x86/namd2" at
>> >> >>> >> >> "/usr/people/douglas/projects/UPS/targets/SCF/2AST/MD/parallelise_itioc" for
>> >> >>> >> >> 10.
>> >> >>> >> >> Charmrun> Starting ssh itioc1 -l douglas /bin/sh -f
>> >> >>> >> >> Charmrun> remote shell (itioc1:10) started
>> >> >>> >> >> Charmrun> Sending "11 129.215.237.187 58330 19205 0" to client 11.
>> >> >>> >> >> Charmrun> find the node program
>> >> >>> >> >> "/usr/people/douglas/programs/NAMD_2.9_Linux-x86/namd2" at
>> >> >>> >> >> "/usr/people/douglas/projects/UPS/targets/SCF/2AST/MD/parallelise_itioc" for
>> >> >>> >> >> 11.
>> >> >>> >> >> Charmrun> Starting ssh itioc5 -l douglas /bin/sh -f
>> >> >>> >> >> Charmrun> remote shell (itioc5:11) started
>> >> >>> >> >> Charmrun> node programs all started
>> >> >>> >> >> Charmrun remote shell(itioc5.3)> remote responding...
>> >> >>> >> >> Charmrun remote shell(itioc5.5)> remote responding...
>> >> >>> >> >> Charmrun remote shell(itioc5.3)> starting node-program...
>> >> >>> >> >> Charmrun remote shell(itioc5.5)> starting node-program...
>> >> >>> >> >> Charmrun remote shell(itioc5.3)> rsh phase successful.
>> >> >>> >> >> Charmrun remote shell(itioc5.5)> rsh phase successful.
>> >> >>> >> >> Charmrun remote shell(itioc5.9)> remote responding...
>> >> >>> >> >> Charmrun remote shell(itioc5.7)> remote responding...
>> >> >>> >> >> Charmrun remote shell(itioc5.11)> remote responding...
>> >> >>> >> >> Charmrun remote shell(itioc5.1)> remote responding...
>> >> >>> >> >> Charmrun remote shell(itioc5.9)> starting node-program...
>> >> >>> >> >> Charmrun remote shell(itioc5.7)> starting node-program...
>> >> >>> >> >> Charmrun remote shell(itioc5.9)> rsh phase successful.
>> >> >>> >> >> Charmrun remote shell(itioc5.7)> rsh phase successful.
>> >> >>> >> >> Charmrun remote shell(itioc5.11)> starting node-program...
>> >> >>> >> >> Charmrun remote shell(itioc5.1)> starting node-program...
>> >> >>> >> >> Charmrun remote shell(itioc5.11)> rsh phase successful.
>> >> >>> >> >> Charmrun remote shell(itioc5.1)> rsh phase successful.
>> >> >>> >> >> Charmrun remote shell(itioc1.10)> remote responding...
>> >> >>> >> >> Charmrun remote shell(itioc1.0)> remote responding...
>> >> >>> >> >> Charmrun remote shell(itioc1.4)> remote responding...
>> >> >>> >> >> Charmrun remote shell(itioc1.10)> starting node-program...
>> >> >>> >> >> Charmrun remote shell(itioc1.10)> rsh phase successful.
>> >> >>> >> >> Charmrun remote shell(itioc1.0)> starting node-program...
>> >> >>> >> >> Charmrun remote shell(itioc1.0)> rsh phase successful.
>> >> >>> >> >> Charmrun remote shell(itioc1.4)> starting node-program...
>> >> >>> >> >> Charmrun remote shell(itioc1.4)> rsh phase successful.
>> >> >>> >> >> Charmrun remote shell(itioc1.2)> remote responding...
>> >> >>> >> >> Charmrun remote shell(itioc1.6)> remote responding...
>> >> >>> >> >> Charmrun remote shell(itioc1.8)> remote responding...
>> >> >>> >> >> Charmrun remote shell(itioc1.2)> starting node-program...
>> >> >>> >> >> Charmrun remote shell(itioc1.2)> rsh phase successful.
>> >> >>> >> >> Charmrun remote shell(itioc1.6)> starting node-program...
>> >> >>> >> >> Charmrun remote shell(itioc1.6)> rsh phase successful.
>> >> >>> >> >> Charmrun remote shell(itioc1.8)> starting node-program...
>> >> >>> >> >> Charmrun remote shell(itioc1.8)> rsh phase successful.
>> >> >>> >> >> Charmrun> Waiting for 0-th client to connect.
>> >> >>> >> >> Charmrun> error 0 attaching to node:
>> >> >>> >> >> Timeout waiting for node-program to connect
>> >> >>> >> >>
>> >> >>> >> >>
>> >> >>> >> >> I'm not sure but I think the "Starting ssh itioc5 -l douglas
>> >> >>> >> >> /bin/sh -f" lines have something to do with it. If I run the
>> >> >>> >> >> command "ssh itioc5 -l douglas /bin/sh -f" it also hangs. If I
>> >> >>> >> >> run "ssh itioc5 -l douglas" then it logs me in just fine (without
>> >> >>> >> >> asking for a password). Similarly the command "ssh itioc5 -l
>> >> >>> >> >> douglas -f pwd" works fine, with the expected directory name
>> >> >>> >> >> returned.
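
The apparent hang of "ssh itioc5 -l douglas /bin/sh -f" run by hand is probably expected rather than a fault: a shell started with no script argument reads its commands from standard input, and without a terminal it prints no prompt, so it simply waits. It seems likely (an assumption, not confirmed in this thread) that charmrun writes its startup commands into that shell's input. The local sketch below shows the same pattern without ssh; the function name run_via_stdin is made up for illustration.

```shell
#!/bin/sh
# Feed a one-line script to a shell over standard input, the way a launcher
# could drive "/bin/sh -f" on the far end of an ssh connection.
# The -f option only disables filename expansion (globbing); it does not
# change where the shell reads its commands from.
run_via_stdin() {
    printf '%s\n' "$1" | /bin/sh -f
}

# Example:
# run_via_stdin 'echo started on $(hostname)'
```

To try the real thing, something like "echo hostname | ssh itioc5 -l douglas /bin/sh -f" should print the node name and return.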
>> >> >>> >> >>
>> >> >>> >> >> What exactly is happening at the "Waiting for 0-th client to
>> >> >>> >> >> connect." stage?
>> >> >>> >> >>
>> >> >>> >> >> Many thanks in advance for your thoughts.
>> >> >>> >> >>
>> >> >>> >> >> cheers,
>> >> >>> >> >>
>> >> >>> >> >> Doug
>> >> >>> >> >>
>> >> >>> >> >> _____________________________________________________
>> >> >>> >> >> Dr. Douglas R. Houston
>> >> >>> >> >> Lecturer
>> >> >>> >> >> Institute of Structural and Molecular Biology
>> >> >>> >> >> Room 3.23, Michael Swann Building
>> >> >>> >> >> King's Buildings
>> >> >>> >> >> University of Edinburgh
>> >> >>> >> >> Edinburgh, EH9 3JR, UK
>> >> >>> >> >> Tel. 0131 650 7358
>> >> >>> >> >> http://tinyurl.com/douglasrhouston
>> >> >>> >> >>
>> >> >>> >> >> --
>> >> >>> >> >> The University of Edinburgh is a charitable body, registered in
>> >> >>> >> >> Scotland, with registration number SC005336.

_____________________________________________________
Dr. Douglas R. Houston
Lecturer
Institute of Structural and Molecular Biology
Room 3.23, Michael Swann Building
King's Buildings
University of Edinburgh
Edinburgh, EH9 3JR, UK
Tel. 0131 650 7358
http://tinyurl.com/douglasrhouston

-- 
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.

This archive was generated by hypermail 2.1.6 : Thu Dec 31 2015 - 23:20:41 CST