Re: CHARMRUN ERROR

From: Zeki Zeybek (zeki.zeybek_at_bilgiedu.net)
Date: Thu May 18 2017 - 13:38:26 CDT

Thank you so much, Scott, for your detailed answer. I will adjust it as you suggested.

Meanwhile, can you tell me how I can educate myself about this? Whenever I come across a problem, I just google it, read the forums, filter out the irrelevant issues, and blindly apply the code people have shared. However, I don't really have a broad understanding of what is going on. How can I improve my knowledge of systems like clusters, ssh, and supercomputers? Should I just pick up a relevant book? I am not a technophobic person, but to be honest, handling things in this environment is giving me a bit of a hard time.


________________________________
From: Scott Brozell <srb_at_osc.edu>
Sent: Thursday, May 18, 2017 9:28:52 PM
To: namd-l_at_ks.uiuc.edu; Zeki Zeybek
Subject: Re: namd-l: CHARMRUN ERROR

Hi,

Presumably your cluster is on a trusted network, nevertheless:

1. I would not use an automatic workaround. Instead apply the
scientific method - keep a record of these instances and report
them to your cluster support staff. These are unusual events
in my experience. Even in the most likely case that there is
nothing suspicious going on, your cluster should have a policy
and notification mechanism for the underlying issue (which is
possibly merely cluster maintenance).

2. If you do use this automatic workaround, then make the pattern
more specific to your cluster's hostname, i.e., replace the asterisk
with yourhost.org
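
As a rough illustration of both points, here is a minimal bash sketch.
It is only a sketch under stated assumptions: the function names
failed_hosts and make_ssh_stanza, the file names job.log and
hostkey-incidents.txt, and the pattern yourhost.org are placeholders,
not anything charmrun or ssh provides. The first function pulls the
affected node names out of charmrun output so they can be recorded and
reported; the second prints a config stanza limited to one host pattern
instead of "Host *".

```shell
#!/usr/bin/env bash
# Sketch only: function names, file names and host pattern are hypothetical.

# Point 1: list the nodes that charmrun could not reach.
# Reads charmrun output on stdin and prints each failing host once.
# Matches lines such as:
#   Charmrun> Error 255 returned from remote shell (sardalya78:0)
failed_hosts() {
    sed -n 's/^.*Charmrun> Error [0-9][0-9]* returned from remote shell (\([^:]*\):[0-9]*).*$/\1/p' \
        | sort -u
}

# Point 2: print an ssh config stanza restricted to one host pattern.
# The pattern is passed through unchanged, so ssh_config wildcards
# (* and ?) can still be used inside it.
make_ssh_stanza() {
    local pattern="$1"
    printf 'Host %s\n' "$pattern"
    printf '    StrictHostKeyChecking no\n'
}

# Example use (job.log and hostkey-incidents.txt are placeholders):
#   { date; failed_hosts < job.log; } >> ~/hostkey-incidents.txt
# Print the stanza for review before adding it to ~/.ssh/config by hand.
make_ssh_stanza 'yourhost.org'
```

Printing the stanza rather than appending it automatically keeps the
change to ~/.ssh/config a deliberate step, and the incident file gives
you something concrete to send to your cluster support staff.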

scott

On Thu, May 18, 2017 at 07:27:27AM +0000, Zeki Zeybek wrote:
> I figured out a cruder way of handling the problem. Create a new file named
> "config" (the file name must be exactly config) in the .ssh directory of your
> main account directory, not in scratch, i.e. clustername/home/accountName/.ssh.
>
> Add the following lines to the config file:
>
> Host *
> StrictHostKeyChecking no
>
>
> ________________________________
> From: Zeki Zeybek
> Sent: 12 May 2017 10:05:13
> To: Boonstra, S.; namd-l_at_ks.uiuc.edu
> Subject: Re: namd-l: CHARMRUN ERROR
>
> Thank you for your help, and also for explaining the cause of the problem. Interestingly, the problem seems to have resolved itself: I tried to start the simulation again after an hour or so, and it worked like a charm. Once again, thank you for the insight into the issue.
>
>
> ________________________________
> From: Boonstra, S. <s.boonstra_at_rug.nl>
> Sent: Thursday, May 11, 2017 11:03:38 PM
> To: namd-l_at_ks.uiuc.edu; Zeki Zeybek
> Subject: Re: namd-l: CHARMRUN ERROR
>
> Hi Zeki,
>
> I dealt with the same problem on our cluster just yesterday.
>
> Possibly, the RSA fingerprint of the node(s) has changed.
> See also http://www.ks.uiuc.edu/Research/namd/mailing_list/namd-l.2013-2014/2465.html
> and
> https://askubuntu.com/questions/45679/ssh-connection-problem-with-host-key-verification-failed-error
>
> You can renew the fingerprints (they end up in .ssh/known_hosts) of all the nodes (or nodes in $server_list)
> with a (bash) script like
>
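> server_list=$(sinfo -N --format="%N" | sort -u | grep 'tcn1[67]') # slurm specific
> for h in $server_list; do
> printf '%s ' "$h" # verbose; the name is passed as data, not as the format string
> ip=$(dig +search +short $h)
> ssh-keygen -R $h # drop the stale known_hosts entries for the name and the address
> ssh-keygen -R $ip
> ssh-keyscan -H $ip >> ~/.ssh/known_hosts # record the keys currently offered
> ssh-keyscan -H $h >> ~/.ssh/known_hosts
> done
> echo # verbose; bash has no "print" builtin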
>
>
> On Thu, May 11, 2017 at 9:39 AM, Zeki Zeybek <zeki.zeybek_at_bilgiedu.net> wrote:
>
> Hi!
>
>
> Everything had been running smoothly until today. I did not change anything in the script or in the config file. The error output is below.
>
> (sardalya is the name of the partition whose nodes I am trying to use.)
>
>
> ssh_askpass: exec(/usr/libexec/openssh/ssh-askpass): No such file or directory
> Host key verification failed.
> Charmrun> Error 255 returned from remote shell (sardalya78:0)
> Charmrun> Reconnection attempt 1 of 3
> ssh_askpass: exec(/usr/libexec/openssh/ssh-askpass): No such file or directory
> Host key verification failed.
> ssh_askpass: exec(/usr/libexec/openssh/ssh-askpass): No such file or directory
> Host key verification failed.
> ssh_askpass: exec(/usr/libexec/openssh/ssh-askpass): No such file or directory
> Host key verification failed.
> Charmrun> Error 255 returned from remote shell (sardalya79:1)
> Charmrun> Reconnection attempt 1 of 3
> Charmrun> Error 255 returned from remote shell (sardalya80:2)
> Charmrun> Reconnection attempt 1 of 3
> Charmrun> Error 255 returned from remote shell (sardalya81:3)
> Charmrun> Reconnection attempt 1 of 3
> ssh_askpass: exec(/usr/libexec/openssh/ssh-askpass): No such file or directory
> Host key verification failed.
> Charmrun> Error 255 returned from remote shell (sardalya78:0)
> Charmrun> Reconnection attempt 2 of 3
> ssh_askpass: exec(/usr/libexec/openssh/ssh-askpass): No such file or directory
> Host key verification failed.
> ssh_askpass: exec(/usr/libexec/openssh/ssh-askpass): No such file or directory
> Host key verification failed.
> Charmrun> Error 255 returned from remote shell (sardalya79:1)
> Charmrun> Reconnection attempt 2 of 3
> Charmrun> Error 255 returned from remote shell (sardalya80:2)
> Charmrun> Reconnection attempt 2 of 3
> ssh_askpass: exec(/usr/libexec/openssh/ssh-askpass): No such file or directory
> ssh_askpass: exec(/usr/libexec/openssh/ssh-askpass): No such file or directory
> Host key verification failed.
> Host key verification failed.
> Charmrun> Error 255 returned from remote shell (sardalya81:3)
> Charmrun> Reconnection attempt 2 of 3
> Charmrun> Error 255 returned from remote shell (sardalya78:0)
> Charmrun> Reconnection attempt 3 of 3
> ssh_askpass: exec(/usr/libexec/openssh/ssh-askpass): No such file or directory
> Host key verification failed
> Charmrun> Error 255 returned from remote shell (sardalya81:3)
> Charmrun> Reconnection attempt 3 of 3
> Charmrun> Error 255 returned from remote shell (sardalya78:0)
> Charmrun> Too many reconnection attempts; bailing out
