Re: Error when using charmrun

From: Eric Peterson (eric_at_caltech.edu)
Date: Sat Jan 08 2005 - 22:24:11 CST

The thing is, the version of NAMD I downloaded is the NAMD 2.5 build for
Linux Alpha... Here's the info from namd2 --h:

Info: NAMD 2.5 for Linux-Alpha
Info:
Info: Please visit http://www.ks.uiuc.edu/Research/namd/
Info: and send feedback or bug reports to namd_at_ks.uiuc.edu
Info:
Info: Please cite Kale et al., J. Comp. Phys. 151:283-312 (1999)
Info: in all publications reporting results obtained with NAMD.
Info:
Info: Based on Charm++/Converse 050612 for net-linux-axp-cxx
Info: Built Fri Sep 26 17:27:35 CDT 2003 by jim on proteus.ks.uiuc.edu

Isn't this version the latest build?

Eric

On 1/8/05 7:20 PM, "Gengbin Zheng" <gzheng_at_ks.uiuc.edu> wrote:

>
> Which version of Charm++/NAMD is this? It must be very old.
> You can try Charm++ 5.8 or later with the NAMD 2.5 source.
>
> Gengbin
>
>
> Eric Peterson wrote:
>
>> I tried running charmrun with the "+netpoll" and "++netpoll" options, and it
>> doesn't recognize either one. Maybe the Linux Alpha build of Charm++ is
>> outdated and doesn't include that option?
>>
>> Eric
>>
>>
>> On 1/8/05 7:00 PM, "Gengbin Zheng" <gzheng_at_ks.uiuc.edu> wrote:
>>
>>
>>
>>> Something pretty bad happened here, probably not in charmrun. The
>>> message seems to have been corrupted and delivered to the wrong function.
>>> We haven't tested on Linux Alpha for a long time, so I can't tell for
>>> sure why this happens.
>>> Could you try running this again with the "+netpoll" option, which
>>> disables asynchronous I/O interrupts?
>>>
>>> Gengbin
>>>
>>> Eric Peterson wrote:
>>>
>>>
>>>
>>>> I'm trying to get a parallel run going on a Linux Alpha cluster here at
>>>> Caltech. Things seem to start okay, but then I run into a cryptic error
>>>> about something called "CWeb performance data". Has anyone run into this
>>>> error before?
>>>>
>>>> Here's the output to standard error:
>>>>
>>>> -----
>>>>
>>>> Charmrun> charmrun started...
>>>> Charmrun> using /tmp/74750.alicante.aero.caltech.edu.nodelist as nodesfile
>>>> Charmrun> rsh (n02:0d) started
>>>> Charmrun> rsh (n01:1d) started
>>>> Charmrun> node programs all started
>>>> Charmrun> node programs all connected
>>>> ------------- Processor 1 Exiting: Called CmiAbort ------------
>>>> Reason: CWeb performance data sent to wrong processor...
>>>>
>>>> req_handle_abort called
>>>> Fatal error on PE 1> CWeb performance data sent to wrong processor...
>>>>
>>>> -----
>>>>
>>>> And the output from standard out:
>>>>
>>>> -----
>>>>
>>>> Charmrun rsh(n01.1)> remote responding...
>>>> Charmrun rsh(n01.1)> starting node-program...
>>>> Charmrun rsh(n01.1)> rsh phase successful.
>>>> Charmrun rsh(n02.0)> remote responding...
>>>> Charmrun rsh(n02.0)> starting node-program...
>>>> Charmrun rsh(n02.0)> rsh phase successful.
>>>> Charmrun> adding client 0: "n02", IP:10.1.0.102
>>>> Charmrun> adding client 1: "n01", IP:10.1.0.101
>>>> Charmrun> Charmrun = 10.1.0.102, port = 4544
>>>> Charmrun> Sending "0 10.1.0.102 4544 14893 0" to client 0.
>>>> Charmrun> Starting rsh n02 -l peterson /bin/sh -f
>>>> Charmrun> Sending "1 10.1.0.102 4544 14893 0" to client 1.
>>>> Charmrun> Starting rsh n01 -l peterson /bin/sh -f
>>>> Charmrun> waiting for rsh (n02:0), pid 14894
>>>> Charmrun> waiting for rsh (n01:1), pid 14895
>>>> Charmrun> Waiting for 0-th client to connect.
>>>> Charmrun> client 0 connected (IP=10.1.0.102 data_port=1242)
>>>> Charmrun> Waiting for 1-th client to connect.
>>>> Charmrun> client 1 connected (IP=10.1.0.101 data_port=1030)
>>>> Charmrun> All clients connected.
>>>> Charmrun> IP tables sent.
>>>>
>>>>
>>>> -- Eric
>>>>
>
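
When comparing several runs (for example, with and without "+netpoll"), it can
help to pull the abort reason out of the charmrun standard-error output
mechanically. The following is only a minimal sketch: it assumes the
"Fatal error on PE N> reason" line format seen in the log quoted above, and the
names FATAL_RE and find_aborts are hypothetical, not part of Charm++ or NAMD.

```python
import re

# Assumed format, taken from the log quoted in this thread:
#   Fatal error on PE 1> CWeb performance data sent to wrong processor...
FATAL_RE = re.compile(r"Fatal error on PE (\d+)> (.*)")


def find_aborts(log_text):
    """Return a list of (pe, reason) tuples found in charmrun stderr text."""
    aborts = []
    for line in log_text.splitlines():
        # search() rather than match(), so leading mail-quote markers
        # or other prefixes on the line do not prevent a hit.
        match = FATAL_RE.search(line)
        if match:
            aborts.append((int(match.group(1)), match.group(2).strip()))
    return aborts
```

Calling find_aborts on the contents of a saved stderr file returns one
(processor number, reason) pair per fatal-error line, or an empty list if the
run did not abort.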
