Re: Error when using charmrun

From: Gengbin Zheng (gzheng_at_ks.uiuc.edu)
Date: Sat Jan 08 2005 - 21:20:08 CST

Which version of charm/NAMD is this? It must be very old.
You can try building from source with charm >= 5.8 and NAMD 2.5.

Gengbin

Eric Peterson wrote:

>I tried running charmrun with the "+netpoll" and "++netpoll" options, and it
>doesn't recognize either one. Maybe the Linux Alpha build of charm++ is
>outdated and doesn't include that option?
>
>Eric
>
>
>On 1/8/05 7:00 PM, "Gengbin Zheng" <gzheng_at_ks.uiuc.edu> wrote:
>
>
>
>>Something pretty bad happened here, probably not in charmrun. The
>>message seems to have been corrupted and delivered to the wrong function.
>>We haven't tested on Linux Alpha for a long time, so I can't tell for sure
>>why this happens.
>>Could you try running this again with the "+netpoll" option, which disables
>>asynchronous I/O interrupts?
>>
>>Gengbin
>>
>>Eric Peterson wrote:
>>
>>
>>
>>>I'm trying to get a parallel run going on a Linux Alpha cluster here at
>>>Caltech. Things seem to start okay, but then I run into a cryptic error
>>>about something called "CWeb performance data". Has anyone run into this
>>>error before?
>>>
>>>Here's the output to standard error:
>>>
>>>-----
>>>
>>>Charmrun> charmrun started...
>>>Charmrun> using /tmp/74750.alicante.aero.caltech.edu.nodelist as nodesfile
>>>Charmrun> rsh (n02:0d) started
>>>Charmrun> rsh (n01:1d) started
>>>Charmrun> node programs all started
>>>Charmrun> node programs all connected
>>>------------- Processor 1 Exiting: Called CmiAbort ------------
>>>Reason: CWeb performance data sent to wrong processor...
>>>
>>>req_handle_abort called
>>>Fatal error on PE 1> CWeb performance data sent to wrong processor...
>>>
>>>-----
>>>
>>>And the output from standard out:
>>>
>>>-----
>>>
>>>Charmrun rsh(n01.1)> remote responding...
>>>Charmrun rsh(n01.1)> starting node-program...
>>>Charmrun rsh(n01.1)> rsh phase successful.
>>>Charmrun rsh(n02.0)> remote responding...
>>>Charmrun rsh(n02.0)> starting node-program...
>>>Charmrun rsh(n02.0)> rsh phase successful.
>>>Charmrun> adding client 0: "n02", IP:10.1.0.102
>>>Charmrun> adding client 1: "n01", IP:10.1.0.101
>>>Charmrun> Charmrun = 10.1.0.102, port = 4544
>>>Charmrun> Sending "0 10.1.0.102 4544 14893 0" to client 0.
>>>Charmrun> Starting rsh n02 -l peterson /bin/sh -f
>>>Charmrun> Sending "1 10.1.0.102 4544 14893 0" to client 1.
>>>Charmrun> Starting rsh n01 -l peterson /bin/sh -f
>>>Charmrun> waiting for rsh (n02:0), pid 14894
>>>Charmrun> waiting for rsh (n01:1), pid 14895
>>>Charmrun> Waiting for 0-th client to connect.
>>>Charmrun> client 0 connected (IP=10.1.0.102 data_port=1242)
>>>Charmrun> Waiting for 1-th client to connect.
>>>Charmrun> client 1 connected (IP=10.1.0.101 data_port=1030)
>>>Charmrun> All clients connected.
>>>Charmrun> IP tables sent.
>>>
>>>
>>>-- Eric
