Re: clustermatic 4 and 5 Namd/charm

From: Gengbin Zheng (gzheng_at_ks.uiuc.edu)
Date: Fri Dec 31 2004 - 12:10:17 CST

For clustermatic, a common problem is that a shared library used when
compiling on the master node is not present on the slave compute node,
because the slave node does not have a full installation of libraries and
uses only a selective mapping of libraries from the master node.
To verify whether anything is missing, you can run the same test again
with the additional parameter "++debug", which hopefully will tell us if
any library is missing on the slave node.
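
As a complementary check, you can list the shared libraries the test
binary was linked against on the master node and then compare that list
with what the slave node actually has. The following is only a rough
sketch, not part of charm or clustermatic: it assumes the standard "ldd"
tool is available on the master node, and the function names
parse_ldd_output and shared_libraries are made up for illustration.

```python
import subprocess


def parse_ldd_output(text):
    """Map each library named in ldd output to its resolved path.

    A library that ldd reports as "not found" maps to None.
    Virtual entries without a file path (e.g. linux-vdso) are skipped.
    """
    libs = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if "=>" in line:
            name, _, rest = line.partition("=>")
            name, rest = name.strip(), rest.strip()
            if rest.startswith("not found"):
                libs[name] = None
            elif rest and not rest.startswith("("):
                # Drop the trailing load address, e.g. " (0x00002aaa)".
                libs[name] = rest.split(" (")[0].strip()
        elif line.startswith("/"):
            # The dynamic loader is listed by absolute path only.
            path = line.split(" (")[0].strip()
            libs[path] = path
    return libs


def shared_libraries(binary):
    """Run ldd on a binary (hypothetical helper; assumes ldd is installed)."""
    result = subprocess.run(
        ["ldd", binary], capture_output=True, text=True, check=True
    )
    return parse_ldd_output(result.stdout)
```

For example, shared_libraries("./pgm") run in the queens directory on the
master node gives the list of libraries to look for on the slave node; any
entry that maps to None is already unresolved on the master itself.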

Gengbin

Rene Salmon wrote:

>Hi,
>
>Thanks for the reply. So this morning I downloaded the latest charm from
>the website and compiled it with
>
>>./build charm++ net-linux-amd64 clustermatic
>
>Everything compiled fine with no errors and I got a "build successfully"
>at the end.
>
>I am very new to charm and clustermatic so I am not sure I am doing this
>right when I try to test it.
>
>>Make sure charm/tests/charm++/megatest works before you proceed to NAMD.
>
>All the test programs compile without problems. I am just not sure how to
>run them to test charm. So I tried a small one first, in
>examples/charm++/queens.
>
>I am running this on a small test cluster with one master node and one
>slave node. Here is the bpstat output on the cluster:
>
>>bpstat
>Node(s) Status Mode User
>1 down ---------- root
>0 up ---x--x--x root
>
>So you see there is only one slave node up. Here is what happens when I
>try to run the test:
>
>queens# ./charmrun ++skipmaster ++verbose ++startpe 0 +p2 ./pgm 12 6
>Charmrun> charmrun started...
>Charmrun> node -1 status: up
>Charmrun> node 0 status: up
>Charmrun> adding client 0: "0", IP:10.0.0.3
>Charmrun> node 1 status: down
>Charmrun> node 2 status: down
>Charmrun> There are 1 slave nodes available.
>Charmrun> adding client 1: "0", IP:10.0.0.3
>Charmrun> Charmrun = 10.0.0.2, port = 33089
>Charmrun> start node program on slave node: 0.
>Charmrun> start node program on slave node: 0.
>Charmrun> node programs all started
>Charmrun> Waiting for 0-th client to connect.
>Charmrun> error 0 attaching to node:
>Timeout waiting for node-program to connect
>
>
>I also tried with "++singlemaster" and got this:
>
>queens# ./charmrun ++singlemaster ++verbose ++startpe 0 +p2 ./pgm 12 6
>Charmrun> charmrun started...
>Charmrun> node -1 status: up
>Charmrun> adding client 0: "-1", IP:10.0.0.2
>Charmrun> node 0 status: up
>Charmrun> adding client 1: "0", IP:10.0.0.3
>Charmrun> There are 1 slave nodes available.
>Charmrun> Charmrun = 10.0.0.2, port = 33092
>Charmrun> start node program on slave node: -1.
>Charmrun> start node program on slave node: 0.
>Charmrun> node programs all started
>Charmrun> Waiting for 0-th client to connect.
>Charmrun> client 0 connected (IP=10.0.0.2 data_port=32817)
>Charmrun> Waiting for 1-th client to connect.
>Charmrun> error 1 attaching to node:
>Timeout waiting for node-program to connect
>
>
>
>Any clues as to what I am doing wrong?
>
>Thank you in advance for any help.
>
>Rene
>
>
