Re: Error in when using charmrun

From: Gengbin Zheng (gzheng_at_ks.uiuc.edu)
Date: Sat Jan 08 2005 - 22:50:24 CST

The binary was indeed tested on the machine we had at the time. However, it
is more than a year old, and I don't know whether the system libraries have
been upgraded since then in a way that causes compatibility problems. You
may have to compile from source.
I think we no longer have that machine; otherwise I could run some tests.
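
To check how old a given binary is, you can read the build date from the
"Info:" banner that namd2 prints. Below is a minimal Python sketch, not
part of NAMD or Charm++. The names parse_banner and age_in_days are
hypothetical helpers made up for illustration. The sample text is the banner
quoted further down in this thread. The sketch assumes English month
abbreviations and ignores the time zone field.

```python
import re
from datetime import datetime

# Sample banner lines copied from the namd2 output quoted in this thread.
BANNER = """\
Info: NAMD 2.5 for Linux-Alpha
Info: Based on Charm++/Converse 050612 for net-linux-axp-cxx
Info: Built Fri Sep 26 17:27:35 CDT 2003 by jim on proteus.ks.uiuc.edu
"""


def parse_banner(text):
    """Extract version, platform, and build date from 'Info:' lines.

    Hypothetical helper: it only understands the three line shapes
    shown in BANNER and silently skips everything else.
    """
    info = {}
    for raw in text.splitlines():
        # Tolerate email quote markers ('>') in front of the line.
        line = raw.lstrip("> ").strip()

        m = re.match(r"Info: NAMD (\S+) for (\S+)", line)
        if m:
            info["namd_version"], info["platform"] = m.groups()
            continue

        m = re.match(r"Info: Based on Charm\+\+/Converse (\S+) for (\S+)", line)
        if m:
            info["charm_version"], info["charm_target"] = m.groups()
            continue

        m = re.match(
            r"Info: Built \w{3} (\w{3}) +(\d+) ([\d:]+) \w+ (\d{4})", line
        )
        if m:
            month, day, clock, year = m.groups()
            # The time zone abbreviation is dropped on purpose; strptime
            # does not parse names such as 'CDT' reliably.
            info["built"] = datetime.strptime(
                f"{month} {day} {year} {clock}", "%b %d %Y %H:%M:%S"
            )
    return info


def age_in_days(built, now):
    """Whole days between the build time and 'now' (both naive datetimes)."""
    return (now - built).days


if __name__ == "__main__":
    info = parse_banner(BANNER)
    print(info["namd_version"], info["charm_target"], info["built"])
    print("age in days:", age_in_days(info["built"], datetime.now()))
```

If the reported age is well over a year, a mismatch with upgraded system
libraries becomes more plausible, and building from source is the safer
route.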

Gengbin

Eric Peterson wrote:

>The thing is, the version of NAMD I downloaded is the NAMD 2.5 build for
>Linux Alpha... Here's the info from namd2 --h:
>
>Info: NAMD 2.5 for Linux-Alpha
>Info:
>Info: Please visit http://www.ks.uiuc.edu/Research/namd/
>Info: and send feedback or bug reports to namd_at_ks.uiuc.edu
>Info:
>Info: Please cite Kale et al., J. Comp. Phys. 151:283-312 (1999)
>Info: in all publications reporting results obtained with NAMD.
>Info:
>Info: Based on Charm++/Converse 050612 for net-linux-axp-cxx
>Info: Built Fri Sep 26 17:27:35 CDT 2003 by jim on proteus.ks.uiuc.edu
>
>Isn't this version the latest build?
>
>Eric
>
>On 1/8/05 7:20 PM, "Gengbin Zheng" <gzheng_at_ks.uiuc.edu> wrote:
>
>
>
>>Which version of charm/namd is this? It must be very old.
>>You can try charm >= 5.8, NAMD 2.5 source.
>>
>>Gengbin
>>
>>
>>Eric Peterson wrote:
>>
>>
>>
>>>I tried running charmrun with the "+netpoll" and "++netpoll" options and it
>>>doesn't recognize either one.... Maybe the Linux Alpha build of charm++ is
>>>outdated and doesn't include that option??
>>>
>>>Eric
>>>
>>>
>>>On 1/8/05 7:00 PM, "Gengbin Zheng" <gzheng_at_ks.uiuc.edu> wrote:
>>>
>>>>Something pretty bad happened here, probably not in charmrun. The
>>>>message seems to have been corrupted and delivered to the wrong function.
>>>>We haven't tested on Linux Alpha for a long time, so I can't tell for sure
>>>>why this happens.
>>>>Could you try running this again with the "+netpoll" option, which
>>>>disables asynchronous I/O interrupts?
>>>>
>>>>Gengbin
>>>>
>>>>Eric Peterson wrote:
>>>>
>>>>>I'm trying to get a parallel run going on a Linux Alpha cluster here at
>>>>>Caltech... Things seem to start okay but then I run into a cryptic error
>>>>>about something called "CWeb performance data"?? Has anyone run into this
>>>>>error before?
>>>>>
>>>>>Here's the output to standard error:
>>>>>
>>>>>-----
>>>>>
>>>>>Charmrun> charmrun started...
>>>>>Charmrun> using /tmp/74750.alicante.aero.caltech.edu.nodelist as nodesfile
>>>>>Charmrun> rsh (n02:0d) started
>>>>>Charmrun> rsh (n01:1d) started
>>>>>Charmrun> node programs all started
>>>>>Charmrun> node programs all connected
>>>>>------------- Processor 1 Exiting: Called CmiAbort ------------
>>>>>Reason: CWeb performance data sent to wrong processor...
>>>>>
>>>>>req_handle_abort called
>>>>>Fatal error on PE 1> CWeb performance data sent to wrong processor...
>>>>>
>>>>>-----
>>>>>
>>>>>And the output from standard out:
>>>>>
>>>>>-----
>>>>>
>>>>>Charmrun rsh(n01.1)> remote responding...
>>>>>Charmrun rsh(n01.1)> starting node-program...
>>>>>Charmrun rsh(n01.1)> rsh phase successful.
>>>>>Charmrun rsh(n02.0)> remote responding...
>>>>>Charmrun rsh(n02.0)> starting node-program...
>>>>>Charmrun rsh(n02.0)> rsh phase successful.
>>>>>Charmrun> adding client 0: "n02", IP:10.1.0.102
>>>>>Charmrun> adding client 1: "n01", IP:10.1.0.101
>>>>>Charmrun> Charmrun = 10.1.0.102, port = 4544
>>>>>Charmrun> Sending "0 10.1.0.102 4544 14893 0" to client 0.
>>>>>Charmrun> Starting rsh n02 -l peterson /bin/sh -f
>>>>>Charmrun> Sending "1 10.1.0.102 4544 14893 0" to client 1.
>>>>>Charmrun> Starting rsh n01 -l peterson /bin/sh -f
>>>>>Charmrun> waiting for rsh (n02:0), pid 14894
>>>>>Charmrun> waiting for rsh (n01:1), pid 14895
>>>>>Charmrun> Waiting for 0-th client to connect.
>>>>>Charmrun> client 0 connected (IP=10.1.0.102 data_port=1242)
>>>>>Charmrun> Waiting for 1-th client to connect.
>>>>>Charmrun> client 1 connected (IP=10.1.0.101 data_port=1030)
>>>>>Charmrun> All clients connected.
>>>>>Charmrun> IP tables sent.
>>>>>
>>>>>
>>>>>-- Eric

This archive was generated by hypermail 2.1.6 : Wed Feb 29 2012 - 15:40:28 CST