Re: Error when using charmrun

From: Gengbin Zheng (gzheng_at_ks.uiuc.edu)
Date: Sat Jan 08 2005 - 21:00:03 CST

Something pretty bad happened here, probably not in charmrun itself. The
message seems to have been corrupted and delivered to the wrong handler
function. We haven't tested on Linux Alpha for a long time, so I can't say
for sure why this happens.
Could you try running this again with the "+netpoll" option, which disables
interrupt-driven (asynchronous) I/O?

Gengbin

Eric Peterson wrote:

>I'm trying to get a parallel run going on a Linux Alpha cluster here at
>Caltech... Things seem to start okay but then I run into a cryptic error
>about something called "CWeb performance data"?? Has anyone run into this
>error before?
>
>Here's the output to standard error:
>
>-----
>
>Charmrun> charmrun started...
>Charmrun> using /tmp/74750.alicante.aero.caltech.edu.nodelist as nodesfile
>Charmrun> rsh (n02:0d) started
>Charmrun> rsh (n01:1d) started
>Charmrun> node programs all started
>Charmrun> node programs all connected
>------------- Processor 1 Exiting: Called CmiAbort ------------
>Reason: CWeb performance data sent to wrong processor...
>
>req_handle_abort called
>Fatal error on PE 1> CWeb performance data sent to wrong processor...
>
>-----
>
>And the output from standard out:
>
>-----
>
>Charmrun rsh(n01.1)> remote responding...
>Charmrun rsh(n01.1)> starting node-program...
>Charmrun rsh(n01.1)> rsh phase successful.
>Charmrun rsh(n02.0)> remote responding...
>Charmrun rsh(n02.0)> starting node-program...
>Charmrun rsh(n02.0)> rsh phase successful.
>Charmrun> adding client 0: "n02", IP:10.1.0.102
>Charmrun> adding client 1: "n01", IP:10.1.0.101
>Charmrun> Charmrun = 10.1.0.102, port = 4544
>Charmrun> Sending "0 10.1.0.102 4544 14893 0" to client 0.
>Charmrun> Starting rsh n02 -l peterson /bin/sh -f
>Charmrun> Sending "1 10.1.0.102 4544 14893 0" to client 1.
>Charmrun> Starting rsh n01 -l peterson /bin/sh -f
>Charmrun> waiting for rsh (n02:0), pid 14894
>Charmrun> waiting for rsh (n01:1), pid 14895
>Charmrun> Waiting for 0-th client to connect.
>Charmrun> client 0 connected (IP=10.1.0.102 data_port=1242)
>Charmrun> Waiting for 1-th client to connect.
>Charmrun> client 1 connected (IP=10.1.0.101 data_port=1030)
>Charmrun> All clients connected.
>Charmrun> IP tables sent.
>
>
>-- Eric
