(solved) charmrun: ssh_exchange_identification: Connection closed by remote host

From: Leandro Martínez (leandromartinez98_at_gmail.com)
Date: Fri Sep 02 2011 - 14:44:24 CDT

Hello all, I'm just sharing the solution to this problem so that it
remains on record.

I have a new cluster on which the communication between the master
and the nodes was ok, in the sense that I could log in from every node to
the others without passwords, as it should be. Nevertheless, when running
namd with many threads, I systematically got the

ssh_exchange_identification: Connection closed by remote host

error (when running charmrun with the ++verbose option), and the simulation
crashed.

After some research and trying many possible solutions, I found what
actually worked in this case. The sshd daemon limits the number of
concurrent ssh connections that have not yet authenticated. The default
number is 10. Therefore, if one runs namd and asks for more than 10 threads
on each node, charmrun can open more than 10 ssh connections to a node at
once, and with the default setting the excess connections are refused.

This option is set in the

/etc/ssh/sshd_config

file. One needs to change

MaxStartups 10

to something greater (greater than the number of threads per node), in my
case 24. That option may be commented out, in which case the default value
of 10 applies.
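
For illustration, the edited line might look like the fragment below. The
value 24 is the one from my case; the comment lines are only annotation and
are not required.

```
# /etc/ssh/sshd_config (excerpt)
# Allow up to 24 concurrent unauthenticated connections.
MaxStartups 24
```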

By changing that parameter and restarting the sshd service, the problem goes
away.
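
A minimal sketch of how one might check the setting on a node before and
after the change. The function name max_startups is hypothetical (it is not
part of ssh or charmrun), and it assumes the keyword and its value are
separated by whitespace. The restart command and the service name differ
between distributions, so treat those lines as an assumption too.

```shell
#!/bin/sh
# Sketch only: max_startups is a hypothetical helper, not an ssh tool.

# Print the MaxStartups value set in an sshd_config-style file, or "unset"
# when the option is absent or commented out (sshd then uses its default).
max_startups() {
    # $1: path to the config file, e.g. /etc/ssh/sshd_config
    # sshd uses the first value it finds for a keyword, so stop at the first match.
    value=$(awk 'tolower($1) == "maxstartups" { print $2; exit }' "$1")
    echo "${value:-unset}"
}

# Usage on a compute node (the last two commands need root):
#   max_startups /etc/ssh/sshd_config
#   sshd -T | grep -i maxstartups    # effective value as reported by sshd
#   service sshd restart             # command and service name vary
```

If the function prints "unset", the line is missing or commented out and the
default is in effect.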

Leandro.
