[SOLVED] Charmrun> error x attaching to node

From: Norman Geist (norman.geist_at_uni-greifswald.de)
Date: Tue Apr 30 2013 - 05:09:05 CDT

Hello NAMD users,


As a hint for everyone who runs into the following problem while running NAMD
in parallel across multiple nodes:


Charmrun> error 0 attaching to node


The error number may differ. Since I could not find a solution anywhere so
far, and the problem can drive you nuts, I decided to describe what is most
likely wrong with your network configuration. Very likely the local name
resolution configured in "/etc/hosts" on the compute nodes contains an entry
that resolves the compute node's own hostname to a loopback interface, that
is, a line that maps a 127.x.x.x address to the hostname.
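
To illustrate what such an entry looks like in practice, here is a minimal Python sketch that scans hosts-file text for lines mapping a given hostname to a loopback address. The function name loopback_entries, the sample hostnames (node01, node01.cluster) and the sample addresses are made up for illustration; they are not taken from any real cluster.

```python
import ipaddress


def loopback_entries(hosts_text, hostname):
    """Return (ip, names) pairs that map `hostname` to a loopback address."""
    hits = []
    for raw in hosts_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        fields = line.split()
        ip, names = fields[0], fields[1:]
        try:
            addr = ipaddress.ip_address(ip)
        except ValueError:
            continue  # skip anything that is not a plain IP address
        if addr.is_loopback and hostname in names:
            hits.append((ip, names))
    return hits


# Hypothetical hosts-file content, for illustration only.
SAMPLE = """\
127.0.0.1   localhost
127.0.1.1   node01 node01.cluster   # problematic: hostname on loopback
10.0.0.5    node01
"""

for ip, names in loopback_entries(SAMPLE, "node01"):
    print(f"{ip} is a loopback address but is mapped to {names}")
```

To check a real node, you would pass the contents of "/etc/hosts" and the node's own hostname instead of the sample text.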


You can check this by pinging the hostname while logged in on a compute node:
"ping hostname". If this returns a 127.x.x.x address, your local name
resolution is not suitable for charmrun. For charmrun it is important that
the hostname resolves to an outgoing IP address, ideally one on the network
you want to use for the computation traffic. Otherwise the node will not be
able to connect to the other nodes, because it is stuck in the internal
loopback network. This also matters when using IBverbs, as charmrun needs to
resolve the IPoIB IP address to the real InfiniBand HCA.
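
The same check can be scripted if you need to run it on many nodes. The following is only a sketch: it asks the system resolver for the node's own hostname and reports whether the answer is a loopback address. The helper names is_loopback and check_hostname are made up for this example, and the lookup is assumed to follow the same resolver configuration that ping uses.

```python
import ipaddress
import socket


def is_loopback(address):
    """True if the given IP address string is a loopback address."""
    return ipaddress.ip_address(address).is_loopback


def check_hostname(hostname=None):
    """Resolve a hostname (default: this node's own) and flag loopback results."""
    hostname = hostname or socket.gethostname()
    address = socket.gethostbyname(hostname)  # IPv4 lookup via the system resolver
    return hostname, address, is_loopback(address)


if __name__ == "__main__":
    try:
        name, addr, bad = check_hostname()
    except OSError as exc:
        print(f"could not resolve own hostname: {exc}")
    else:
        verdict = "loopback, check /etc/hosts" if bad else "looks fine"
        print(f"{name} resolves to {addr}: {verdict}")
```

Run it on each compute node; any node reporting a loopback address is a candidate for the problem described above.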

I hope this saves you spending a lot of time googling around without finding
a solution.


Good luck


Norman Geist


PS: Another possible cause is that NAMD is not installed on a shared drive and
has a different path on the compute nodes. Running charmrun with ++verbose
should point this out.
