Re: Error when using charmrun

From: Eric Peterson (eric_at_caltech.edu)
Date: Sat Jan 08 2005 - 21:16:47 CST

I tried running charmrun with both the "+netpoll" and "++netpoll" options, and
it doesn't recognize either one. Maybe the Linux Alpha build of Charm++ is
outdated and doesn't include that option?

Eric

On 1/8/05 7:00 PM, "Gengbin Zheng" <gzheng_at_ks.uiuc.edu> wrote:

>
> Something pretty bad happened here, probably not in charmrun. The
> message seems to have been corrupted and delivered to the wrong function.
> We haven't tested on Linux Alpha for a long time, so I can't tell for sure
> why this happens.
> Could you try running this again, but with the "+netpoll" option, which
> disables asynchronous I/O interrupts?
>
> Gengbin
>
> Eric Peterson wrote:
>
>> I'm trying to get a parallel run going on a Linux Alpha cluster here at
>> Caltech. Things seem to start okay, but then I run into a cryptic error
>> about something called "CWeb performance data". Has anyone run into this
>> error before?
>>
>> Here's the output to standard error:
>>
>> -----
>>
>> Charmrun> charmrun started...
>> Charmrun> using /tmp/74750.alicante.aero.caltech.edu.nodelist as nodesfile
>> Charmrun> rsh (n02:0d) started
>> Charmrun> rsh (n01:1d) started
>> Charmrun> node programs all started
>> Charmrun> node programs all connected
>> ------------- Processor 1 Exiting: Called CmiAbort ------------
>> Reason: CWeb performance data sent to wrong processor...
>>
>> req_handle_abort called
>> Fatal error on PE 1> CWeb performance data sent to wrong processor...
>>
>> -----
>>
>> And the output from standard out:
>>
>> -----
>>
>> Charmrun rsh(n01.1)> remote responding...
>> Charmrun rsh(n01.1)> starting node-program...
>> Charmrun rsh(n01.1)> rsh phase successful.
>> Charmrun rsh(n02.0)> remote responding...
>> Charmrun rsh(n02.0)> starting node-program...
>> Charmrun rsh(n02.0)> rsh phase successful.
>> Charmrun> adding client 0: "n02", IP:10.1.0.102
>> Charmrun> adding client 1: "n01", IP:10.1.0.101
>> Charmrun> Charmrun = 10.1.0.102, port = 4544
>> Charmrun> Sending "0 10.1.0.102 4544 14893 0" to client 0.
>> Charmrun> Starting rsh n02 -l peterson /bin/sh -f
>> Charmrun> Sending "1 10.1.0.102 4544 14893 0" to client 1.
>> Charmrun> Starting rsh n01 -l peterson /bin/sh -f
>> Charmrun> waiting for rsh (n02:0), pid 14894
>> Charmrun> waiting for rsh (n01:1), pid 14895
>> Charmrun> Waiting for 0-th client to connect.
>> Charmrun> client 0 connected (IP=10.1.0.102 data_port=1242)
>> Charmrun> Waiting for 1-th client to connect.
>> Charmrun> client 1 connected (IP=10.1.0.101 data_port=1030)
>> Charmrun> All clients connected.
>> Charmrun> IP tables sent.
>>
>>
>> -- Eric
>
