Re: clustermatic 4 and 5 Namd/charm

From: Rene Salmon (rsalmon_at_tulane.edu)
Date: Fri Dec 31 2004 - 10:08:21 CST

Hi,

Thanks for the reply. This morning I downloaded the latest charm from
the website and compiled it with:

> ./build charm++ net-linux-amd64 clustermatic

Everything compiled fine with no errors and I got a "build successfully"
at the end.

I am very new to charm and clustermatic so I am not sure I am doing this
right when I try to test it.

> Make sure charm/tests/charm++/megatest works before you proceed to NAMD.

All the test programs compile without problems. I am just not sure how to
run them to test charm, so I tried a small one first, in
examples/charm++/queens.

I am running this on a small test cluster: one master node and one slave
node. Here is the bpstat on the cluster:

>bpstat
Node(s)  Status  Mode        User
1        down    ----------  root
0        up      ---x--x--x  root

So, as you can see, only one slave node is up. Here is what happens when I
try to run the test:

queens# ./charmrun ++skipmaster ++verbose ++startpe 0 +p2 ./pgm 12 6
Charmrun> charmrun started...
Charmrun> node -1 status: up
Charmrun> node 0 status: up
Charmrun> adding client 0: "0", IP:10.0.0.3
Charmrun> node 1 status: down
Charmrun> node 2 status: down
Charmrun> There are 1 slave nodes available.
Charmrun> adding client 1: "0", IP:10.0.0.3
Charmrun> Charmrun = 10.0.0.2, port = 33089
Charmrun> start node program on slave node: 0.
Charmrun> start node program on slave node: 0.
Charmrun> node programs all started
Charmrun> Waiting for 0-th client to connect.
Charmrun> error 0 attaching to node:
Timeout waiting for node-program to connect

I also tried with "++singlemaster" and got this:

queens# ./charmrun ++singlemaster ++verbose ++startpe 0 +p2 ./pgm 12 6
Charmrun> charmrun started...
Charmrun> node -1 status: up
Charmrun> adding client 0: "-1", IP:10.0.0.2
Charmrun> node 0 status: up
Charmrun> adding client 1: "0", IP:10.0.0.3
Charmrun> There are 1 slave nodes available.
Charmrun> Charmrun = 10.0.0.2, port = 33092
Charmrun> start node program on slave node: -1.
Charmrun> start node program on slave node: 0.
Charmrun> node programs all started
Charmrun> Waiting for 0-th client to connect.
Charmrun> client 0 connected (IP=10.0.0.2 data_port=32817)
Charmrun> Waiting for 1-th client to connect.
Charmrun> error 1 attaching to node:
Timeout waiting for node-program to connect

Any clues as to what I am doing wrong?

Thank you in advance for any help.

Rene
