From: Gengbin Zheng (gzheng_at_ks.uiuc.edu)
Date: Thu Nov 18 2004 - 13:20:46 CST
FYI, I have fixed the charmrun bug (it could not handle a DOS-format nodelist
file under UNIX) in the Charm++ development CVS. You need to check out the
*latest* charm from CVS (not charm-5.8) to get this fix.
Gengbin
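
For anyone who cannot update to the CVS version yet, here is a minimal sketch
of checking a nodelist for DOS line endings and stripping them with standard
tools (it does the same job as dos2unix). The function names has_cr and
strip_cr, and the file name ./nodelist in the usage note, are only
illustrative and are not part of charmrun.

```shell
# has_cr FILE: succeed if FILE contains at least one carriage return (shown as ^M)
has_cr() {
    grep -q "$(printf '\r')" "$1"
}

# strip_cr FILE: remove every carriage return from FILE in place
strip_cr() {
    tmp=$(mktemp) || return 1
    # Write the cleaned copy to a temp file first, then copy it back,
    # so the original file keeps its owner and permissions.
    tr -d '\r' < "$1" > "$tmp" && cat "$tmp" > "$1"
    status=$?
    rm -f "$tmp"
    return "$status"
}
```

Usage: has_cr ./nodelist && strip_cr ./nodelist
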
Gengbin Zheng wrote:
>
> Hi Matteo,
>
>   I have logged in to your system and checked it. There seem to be a
> few problems:
>
> 1.  Home directories are not cross-mounted, so you may have to make
> sure the binaries are the same on all machines and that all system
> libraries are installed identically.
>
> 2.  At least 192.168.0.67 has no Intel libraries installed. If you run:
>     ldd ./namd2
>    under NAMD_2.5_Linux-i686-TCP
>    you will see that libimf.so is not found.
>    This prevents namd2 from launching on that node.
>    You should be able to link the Intel libs statically to get around this.
>
> 3.  charmrun does not like the DOS format for the nodelist file; that is,
> "^M" is not allowed in the nodelist file, which happens to be your case.
>     You can run the command   dos2unix <file>   to convert the file into
> unix format.
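
As a rough sketch for item 2, the ldd check can be repeated on every node to
spot missing shared libraries before launching. The helper name missing_libs
is made up for illustration, and the binary path in the usage note is a
placeholder to replace with your own.

```shell
# missing_libs: read `ldd` output on stdin and print each library reported as "not found"
missing_libs() {
    awk '/=> not found/ { print $1 }'
}
```

Usage, run from the launch node: ssh 192.168.0.67 ldd /path/to/namd2 | missing_libs
An empty result means ldd resolved every shared library on that node.
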
>
> Anyway, I ran the namd2 APOA1 benchmark (at apoa1) using 
> NAMD_2.5_Linux-i686-TCP on 192.168.0.66 and 192.168.0.64  (with intel 
> libraries installed) with 4 processors and it runs fine for me.
>
> Gengbin
>
> Brian Bennion wrote:
>
>>Hello Matteo
>>
>>I am out of ideas here. It might be something really simple that I am
>>missing.
>>
>>Jim, Gengbin, Sameer any ideas?
>>
>>Brian
>>
>>On Thu, 18 Nov 2004, max wrote:
>>
>>  
>>
>>>hello brian,
>>>
>>>yes, i see pgm running on the other three machines.
>>>enclosed you will find the output of namd;
>>>the bash output is:
>>>
>>>/Linux-i686-TCP-icc/namd2 /home/matteo/NAMD_2
>>>5_Source/Linux-i686-TCP-icc/PrP.namd > namd.out
>>>Charmrun> charmrun started...
>>>Charmrun> using ./nodelist as nodesfile
>>>Charmrun> rsh (ctcfgr6:0d) started
>>>Charmrun> rsh (ctcfgr10:1d) started
>>>Charmrun> rsh (ctcfgr11:2d) started
>>>Charmrun> rsh (ctcfgr9:3d) started
>>>Charmrun> node programs all started
>>>Charmrun> node programs all connected
>>>Charmrun: error on request socket--
>>>Socket closed before recv.
>>>[matteo_at_ctcfgr6 megatest]$
>>>
>>>i cannot use rsh, because red hat 9.0 disables it by default, so instead
>>>of rsh i use ssh; i can ssh to each node without a password;
>>>as suggested in notes.txt i inserted "setenv CONV_RSH ssh" in my .bashrc
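
A note on the line above: setenv is csh/tcsh syntax, so bash will normally not
set the variable from a setenv line in .bashrc. A minimal sketch of the bash
equivalent, assuming the launching shell is bash and that charmrun picks up
CONV_RSH from its environment as the notes describe:

```shell
# bash/sh form of "setenv CONV_RSH ssh"; goes in ~/.bashrc on the launch host
export CONV_RSH=ssh
```

Running echo $CONV_RSH in a new shell should then print ssh.
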
>>>
>>>
>>>
>>>i tried running strace charmrun ... etc. It shows that no data is sent
>>>or received before namd stops
>>>
>>>matteo
>>>
>>>-------Original Message-------
>>>
>>>From: Brian Bennion
>>>Date: 11/18/04 08:31:54
>>>To: max
>>>Subject: Re: Rif: Re: Rif: namd-l: Re: linux cluster trouble
>>>
>>>Hi Matteo
>>>
>>>Okay, things seem okay here.
>>>Try this:
>>>../charmrun +p4 ++verbose /pathtonamd2/ namd.configfile
>>>Replace /pathtonamd2/ and namd.configfile with the real names and paths,
>>>and tell me what happens.
>>>
>>>Can you rsh into each node without a password?
>>>
>>>Can you see pgm running on the other three machines during your pgm
>>>tests?
>>>
>>>
>>>On Thu, 18 Nov 2004, max wrote:
>>>
>>>    
>>>
>>>>hi brian,
>>>>
>>>>i used exactly the commands given in notes.txt, which you can find on the
>>>>namd site:
>>>>
>>>>./charmrun ++local +p1 ./pgm
>>>>first, and then
>>>>
>>>>./charmrun  ++p 4 ++verbose ./pgm
>>>>with the nodelist file in the ./ directory
>>>>
>>>>these two tests show a strange result:
>>>>the 1-processor test finished in about 0.23 s.
>>>>the 4-processor test finished in about 3.2 min.
>>>>is this useful?
>>>>
>>>>My cluster is connected with an ethernet switch, a 10/100/1000 hp procurve
>>>>2724 j4897a; each pc is equipped with 3Com giga
>>>>
>>>>yes, the linux-tcp should be the net-linux-tcp version; anyway, i compiled
>>>>net-linux-tcp-icc
>>>>
>>>>thanks for the help, i am going crazy with this problem
>>>>
>>>>
>>>>matteo
>>>>
>>>>
>>>>-------Original Message-------
>>>>
>>>>From: Brian Bennion
>>>>Date: 11/17/04 19:56:22
>>>>To: max
>>>>Subject: Re: Rif: namd-l: Re: linux cluster trouble
>>>>
>>>>
>>>>Hi Matteo,
>>>>
>>>>Thanks for the info.  Could you also share the exact commands used to
>>>>launch the megatest and your namd jobs?
>>>>Is your cluster connected by a special switch or vanilla ethernet 100MB
>>>>etc...
>>>>
>>>>Finally you state below that you downloaded the linux-tcp, is this the
>>>>net-linux-tcp version?
>>>>
>>>>Regards
>>>>Brian
>>>>
>>>>On Wed, 17 Nov 2004, max wrote:
>>>>
>>>>      
>>>>
>>>>>hello,
>>>>>i downloaded namd 2.5 from the namd home page, both the linux-tcp version
>>>>>and the source code.
>>>>>my cluster runs linux red hat 9.0.
>>>>>i launched a simulation on two processors of a single pc, and the
>>>>>simulation is ok;
>>>>>conversely, when i launched the same simulation on three, four or more
>>>>>processors namd stopped.
>>>>>i tried the charmrun pgm test on four machines and it is ok
>>>>>
>>>>>I look forward to any further suggestions......
>>>>>Bye
>>>>>
>>>>>matteo
>>>>>
>>>>>-------Original Message-------
>>>>>
>>>>>From: Brian Bennion
>>>>>Date: 11/16/04 19:00:37
>>>>>To: max
>>>>>Cc: namd-l_at_ks.uiuc.edu
>>>>>Subject: namd-l: Re: linux cluster trouble
>>>>>
>>>>>Hello Matteo
>>>>>
>>>>>Can you give more details about your setup?
>>>>>Are you running on more than one machine?
>>>>>What operating system do you have?
>>>>>Which version of charm++ and NAMD are you using?
>>>>>
>>>>>The only help I can suggest based on the info below is that you are trying
>>>>>to run on more cpus than you have available....
>>>>>
>>>>>
>>>>>Regards
>>>>>Brian
>>>>>
>>>>>On Tue, 16 Nov 2004, max wrote:
>>>>>
>>>>>        
>>>>>
>>>>>> hi,
>>>>>>i tried to compile namd on my pc but the result is the same:
>>>>>>Charmrun: error on request socket--
>>>>>>Socket closed before recv.
>>>>>>
>>>>>>
>>>>>>any suggestions?
>>>>>>
>>>>>>matteo pappalardo
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>          
>>>>>>
>>>>>*****************************************************************
>>>>>**Brian Bennion, Ph.D.                                         **
>>>>>**Computational and Systems Biology Division                   **
>>>>>**Biology and Biotechnology Research Program                   **
>>>>>**Lawrence Livermore National Laboratory                       **
>>>>>**P.O. Box 808, L-448    bennion1_at_llnl.gov                     **
>>>>>**7000 East Avenue       phone: (925) 422-5722                 **
>>>>>**Livermore, CA  94550   fax:   (925) 424-6605                 **
>>>>>*****************************************************************
>>>>>
>>>>>        
>>>>>
>>