Using nodelist file causes namd to hang

From: Douglas Houston (DouglasR.Houston_at_ed.ac.uk)
Date: Tue Apr 08 2014 - 04:29:34 CDT

I have two nodes connected via Ethernet: itioc5 and itioc1.

I have the following in my nodelist file:

group main
host itioc1
host itioc5

I am using the following command:

/usr/people/douglas/programs/NAMD_2.9_Linux-x86/charmrun +p12 /usr/people/douglas/programs/NAMD_2.9_Linux-x86/namd2 ++verbose mdrun.conf
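With `+p12` and two hosts in the nodelist, the "adding client" lines in the output show clients 0 to 11 placed alternately on itioc1 and itioc5. In other words, charmrun walks the host list in order and wraps around. The Python sketch below reproduces that mapping. It reads the nodelist format only loosely: it handles just `group` and `host` lines and assumes the group name `main`. It is not charmrun's own parser, and `parse_nodelist` and `assign_clients` are hypothetical helpers.

```python
# Hypothetical sketch: map +pN processes onto nodelist hosts in order,
# wrapping around, as the "adding client" lines in this run suggest.
# Only "group" and "host" lines are understood; other nodelist options are ignored.


def parse_nodelist(text: str, group: str = "main") -> list[str]:
    """Return the host names listed under `group` in a simple nodelist."""
    hosts: list[str] = []
    current = None
    for raw in text.splitlines():
        words = raw.split()
        if not words:
            continue
        if words[0] == "group" and len(words) > 1:
            current = words[1]
        elif words[0] == "host" and len(words) > 1 and current == group:
            hosts.append(words[1])
    return hosts


def assign_clients(hosts: list[str], nprocs: int) -> list[tuple[int, str]]:
    """Give client i the host hosts[i % len(hosts)] (round-robin)."""
    return [(i, hosts[i % len(hosts)]) for i in range(nprocs)]


NODELIST = """group main
host itioc1
host itioc5
"""

if __name__ == "__main__":
    for client, host in assign_clients(parse_nodelist(NODELIST), 12):
        print(f"client {client}: {host}")
```

Running it prints `client 0: itioc1`, `client 1: itioc5`, and so on up to `client 11: itioc5`. This matches the order in the log.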

I get the following output:

Charmrun> charmrun started...
Charmrun> using ./nodelist as nodesfile
Charmrun> adding client 0: "itioc1", IP:129.215.137.21
Charmrun> adding client 1: "itioc5", IP:129.215.237.186
Charmrun> adding client 2: "itioc1", IP:129.215.137.21
Charmrun> adding client 3: "itioc5", IP:129.215.237.186
Charmrun> adding client 4: "itioc1", IP:129.215.137.21
Charmrun> adding client 5: "itioc5", IP:129.215.237.186
Charmrun> adding client 6: "itioc1", IP:129.215.137.21
Charmrun> adding client 7: "itioc5", IP:129.215.237.186
Charmrun> adding client 8: "itioc1", IP:129.215.137.21
Charmrun> adding client 9: "itioc5", IP:129.215.237.186
Charmrun> adding client 10: "itioc1", IP:129.215.137.21
Charmrun> adding client 11: "itioc5", IP:129.215.237.186
Charmrun> Charmrun = 129.215.237.187, port = 58330
start_nodes_rsh
Charmrun> Sending "0 129.215.237.187 58330 19205 0" to client 0.
Charmrun> find the node program
"/usr/people/douglas/programs/NAMD_2.9_Linux-x86/namd2" at
"/usr/people/douglas/projects/UPS/targets/SCF/2AST/MD/parallelise_itioc" for
0.
Charmrun> Starting ssh itioc1 -l douglas /bin/sh -f
Charmrun> remote shell (itioc1:0) started
Charmrun> Sending "1 129.215.237.187 58330 19205 0" to client 1.
Charmrun> find the node program
"/usr/people/douglas/programs/NAMD_2.9_Linux-x86/namd2" at
"/usr/people/douglas/projects/UPS/targets/SCF/2AST/MD/parallelise_itioc" for
1.
Charmrun> Starting ssh itioc5 -l douglas /bin/sh -f
Charmrun> remote shell (itioc5:1) started
Charmrun> Sending "2 129.215.237.187 58330 19205 0" to client 2.
Charmrun> find the node program
"/usr/people/douglas/programs/NAMD_2.9_Linux-x86/namd2" at
"/usr/people/douglas/projects/UPS/targets/SCF/2AST/MD/parallelise_itioc" for
2.
Charmrun> Starting ssh itioc1 -l douglas /bin/sh -f
Charmrun> remote shell (itioc1:2) started
Charmrun> Sending "3 129.215.237.187 58330 19205 0" to client 3.
Charmrun> find the node program
"/usr/people/douglas/programs/NAMD_2.9_Linux-x86/namd2" at
"/usr/people/douglas/projects/UPS/targets/SCF/2AST/MD/parallelise_itioc" for
3.
Charmrun> Starting ssh itioc5 -l douglas /bin/sh -f
Charmrun> remote shell (itioc5:3) started
Charmrun> Sending "4 129.215.237.187 58330 19205 0" to client 4.
Charmrun> find the node program
"/usr/people/douglas/programs/NAMD_2.9_Linux-x86/namd2" at
"/usr/people/douglas/projects/UPS/targets/SCF/2AST/MD/parallelise_itioc" for
4.
Charmrun> Starting ssh itioc1 -l douglas /bin/sh -f
Charmrun> remote shell (itioc1:4) started
Charmrun> Sending "5 129.215.237.187 58330 19205 0" to client 5.
Charmrun> find the node program
"/usr/people/douglas/programs/NAMD_2.9_Linux-x86/namd2" at
"/usr/people/douglas/projects/UPS/targets/SCF/2AST/MD/parallelise_itioc" for
5.
Charmrun> Starting ssh itioc5 -l douglas /bin/sh -f
Charmrun> remote shell (itioc5:5) started
Charmrun> Sending "6 129.215.237.187 58330 19205 0" to client 6.
Charmrun> find the node program
"/usr/people/douglas/programs/NAMD_2.9_Linux-x86/namd2" at
"/usr/people/douglas/projects/UPS/targets/SCF/2AST/MD/parallelise_itioc" for
6.
Charmrun> Starting ssh itioc1 -l douglas /bin/sh -f
Charmrun> remote shell (itioc1:6) started
Charmrun> Sending "7 129.215.237.187 58330 19205 0" to client 7.
Charmrun> find the node program
"/usr/people/douglas/programs/NAMD_2.9_Linux-x86/namd2" at
"/usr/people/douglas/projects/UPS/targets/SCF/2AST/MD/parallelise_itioc" for
7.
Charmrun> Starting ssh itioc5 -l douglas /bin/sh -f
Charmrun> remote shell (itioc5:7) started
Charmrun> Sending "8 129.215.237.187 58330 19205 0" to client 8.
Charmrun> find the node program
"/usr/people/douglas/programs/NAMD_2.9_Linux-x86/namd2" at
"/usr/people/douglas/projects/UPS/targets/SCF/2AST/MD/parallelise_itioc" for
8.
Charmrun> Starting ssh itioc1 -l douglas /bin/sh -f
Charmrun> remote shell (itioc1:8) started
Charmrun> Sending "9 129.215.237.187 58330 19205 0" to client 9.
Charmrun> find the node program
"/usr/people/douglas/programs/NAMD_2.9_Linux-x86/namd2" at
"/usr/people/douglas/projects/UPS/targets/SCF/2AST/MD/parallelise_itioc" for
9.
Charmrun> Starting ssh itioc5 -l douglas /bin/sh -f
Charmrun> remote shell (itioc5:9) started
Charmrun> Sending "10 129.215.237.187 58330 19205 0" to client 10.
Charmrun> find the node program
"/usr/people/douglas/programs/NAMD_2.9_Linux-x86/namd2" at
"/usr/people/douglas/projects/UPS/targets/SCF/2AST/MD/parallelise_itioc" for
10.
Charmrun> Starting ssh itioc1 -l douglas /bin/sh -f
Charmrun> remote shell (itioc1:10) started
Charmrun> Sending "11 129.215.237.187 58330 19205 0" to client 11.
Charmrun> find the node program
"/usr/people/douglas/programs/NAMD_2.9_Linux-x86/namd2" at
"/usr/people/douglas/projects/UPS/targets/SCF/2AST/MD/parallelise_itioc" for
11.
Charmrun> Starting ssh itioc5 -l douglas /bin/sh -f
Charmrun> remote shell (itioc5:11) started
Charmrun> node programs all started
Charmrun remote shell(itioc5.3)> remote responding...
Charmrun remote shell(itioc5.5)> remote responding...
Charmrun remote shell(itioc5.3)> starting node-program...
Charmrun remote shell(itioc5.5)> starting node-program...
Charmrun remote shell(itioc5.3)> rsh phase successful.
Charmrun remote shell(itioc5.5)> rsh phase successful.
Charmrun remote shell(itioc5.9)> remote responding...
Charmrun remote shell(itioc5.7)> remote responding...
Charmrun remote shell(itioc5.11)> remote responding...
Charmrun remote shell(itioc5.1)> remote responding...
Charmrun remote shell(itioc5.9)> starting node-program...
Charmrun remote shell(itioc5.7)> starting node-program...
Charmrun remote shell(itioc5.9)> rsh phase successful.
Charmrun remote shell(itioc5.7)> rsh phase successful.
Charmrun remote shell(itioc5.11)> starting node-program...
Charmrun remote shell(itioc5.1)> starting node-program...
Charmrun remote shell(itioc5.11)> rsh phase successful.
Charmrun remote shell(itioc5.1)> rsh phase successful.
Charmrun remote shell(itioc1.10)> remote responding...
Charmrun remote shell(itioc1.0)> remote responding...
Charmrun remote shell(itioc1.4)> remote responding...
Charmrun remote shell(itioc1.10)> starting node-program...
Charmrun remote shell(itioc1.10)> rsh phase successful.
Charmrun remote shell(itioc1.0)> starting node-program...
Charmrun remote shell(itioc1.0)> rsh phase successful.
Charmrun remote shell(itioc1.4)> starting node-program...
Charmrun remote shell(itioc1.4)> rsh phase successful.
Charmrun remote shell(itioc1.2)> remote responding...
Charmrun remote shell(itioc1.6)> remote responding...
Charmrun remote shell(itioc1.8)> remote responding...
Charmrun remote shell(itioc1.2)> starting node-program...
Charmrun remote shell(itioc1.2)> rsh phase successful.
Charmrun remote shell(itioc1.6)> starting node-program...
Charmrun remote shell(itioc1.6)> rsh phase successful.
Charmrun remote shell(itioc1.8)> starting node-program...
Charmrun remote shell(itioc1.8)> rsh phase successful.
Charmrun> Waiting for 0-th client to connect.
Charmrun> error 0 attaching to node:
Timeout waiting for node-program to connect
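In the output above, all 12 remote shells (six on each host) report `rsh phase successful` before charmrun gives up waiting for client 0. That suggests the ssh launch itself completed and the failure comes later, when the node programs should connect back. A small helper can confirm that no client is missing from the rsh phase. It is hypothetical and written only for the log format shown here:

```python
import re

# Matches lines such as:
#   Charmrun remote shell(itioc5.3)> rsh phase successful.
# where "itioc5" is the host and 3 is the client number.
RSH_OK = re.compile(
    r"remote shell\((?P<host>[^()\s]+?)\.(?P<client>\d+)\)> rsh phase successful"
)


def rsh_successes(log: str) -> dict[int, str]:
    """Map client number -> host for every client that reported 'rsh phase successful'."""
    found: dict[int, str] = {}
    for line in log.splitlines():
        match = RSH_OK.search(line)
        if match:
            found[int(match.group("client"))] = match.group("host")
    return found


def missing_clients(log: str, nprocs: int) -> list[int]:
    """Client numbers in 0..nprocs-1 that never reported a successful rsh phase."""
    ok = rsh_successes(log)
    return [i for i in range(nprocs) if i not in ok]


if __name__ == "__main__":
    excerpt = (
        "Charmrun remote shell(itioc5.3)> rsh phase successful.\n"
        "Charmrun remote shell(itioc1.0)> rsh phase successful.\n"
        "Charmrun> Waiting for 0-th client to connect.\n"
    )
    print(missing_clients(excerpt, 4))  # prints [1, 2]
```

If you pass the full output above to `missing_clients(..., 12)`, it should return an empty list.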

I'm not sure, but I think the "Starting ssh itioc5 -l douglas /bin/sh -f"
lines have something to do with it. If I run the command
"ssh itioc5 -l douglas /bin/sh -f", it also hangs. If I run
"ssh itioc5 -l douglas", it logs me in just fine (without asking for a
password). Similarly, the command "ssh itioc5 -l douglas -f pwd" works
fine and returns the expected directory name.
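The apparent hang of "ssh itioc5 -l douglas /bin/sh -f" may just be the remote shell waiting for input. The `-f` option disables pathname expansion; it does not name a file. With no script argument, `sh` reads its commands from standard input until EOF, and an interactive ssh session keeps stdin open, so the shell waits. This post does not confirm that charmrun feeds its startup commands to that shell over stdin; treat that as an assumption. The local sketch below pipes commands into `/bin/sh -f` to show the behaviour (`run_sh_f` is a hypothetical helper):

```python
import subprocess


def run_sh_f(script: str, timeout: float = 5.0) -> str:
    """Feed `script` to /bin/sh -f on stdin and return what it prints.

    `-f` only turns off pathname expansion (globbing). Because no script file
    is given, sh reads commands from stdin and exits when stdin reaches EOF.
    """
    result = subprocess.run(
        ["/bin/sh", "-f"],
        input=script,
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,
    )
    return result.stdout


if __name__ == "__main__":
    # Commands arrive on stdin; the shell exits once stdin is closed.
    print(run_sh_f("echo started\npwd\n"), end="")
    # With -f, '*' is not expanded, so this prints a literal '*'.
    print(run_sh_f("echo *\n"), end="")
```

Run interactively with nothing on stdin, the same `/bin/sh -f` would sit waiting, much like the ssh command above.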

What exactly is happening at the "Waiting for 0-th client to connect." stage?
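At this point charmrun has already sent each client a line containing its own address and port (`Sending "0 129.215.237.187 58330 ..."`). It then waits for node program 0 to open a connection back to that address. A timeout would be consistent with the node programs being unable to reach 129.215.237.187 on port 58330, for example because of a firewall or routing. That address is neither itioc1's (129.215.137.21) nor itioc5's (129.215.237.186). This reading of the "Sending" line covers only the address and port; the other fields are left open. The sketch below is a generic TCP reachability check, not part of NAMD or Charm++, and `can_reach` is a hypothetical helper:

```python
import socket


def can_reach(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts and timeouts.
        return False


# Example (run on itioc1 and on itioc5 while charmrun is still waiting; the
# port changes per run, so use the one printed by the current run):
#   can_reach("129.215.237.187", 58330)
```

If this returns False from either node while charmrun is waiting, the connect-back path is a likely suspect. If it returns True, the problem more likely lies elsewhere.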

Many thanks in advance for your thoughts.

cheers,

Doug

_____________________________________________________
Dr. Douglas R. Houston
Lecturer
Institute of Structural and Molecular Biology
Room 3.23, Michael Swann Building
King's Buildings
University of Edinburgh
Edinburgh, EH9 3JR, UK
Tel. 0131 650 7358
http://tinyurl.com/douglasrhouston
