Re: NAMD2.6b1 on an OSX cluster

From: Michael Grabe (mgrabe_at_itsa.ucsf.edu)
Date: Fri Sep 09 2005 - 15:39:15 CDT

Hello all,

I did what Jim suggested: I copied libibmc++.A.dylib to my NAMD directory,
which all of my nodes can see, and I defined the environment variable
DYLD_LIBRARY_PATH so that when I log into any of my nodes I get:

      echo $DYLD_LIBRARY_PATH
      /downloads/NAMD_2.6b1/dyld_lib

This is the directory with libibmc++.A.dylib in it. Despite this, I
still can't get NAMD 2.6b1 to run in parallel. I still get the
following errors:

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Charmrun> rsh (profplum:0) started
Charmrun> rsh (colmustard:1) started
Charmrun> rsh (mrswhite:2) started
Charmrun> rsh (mrgreen:3) started
Charmrun> node programs all started
Charmrun> error 1 attaching to node:
Timeout waiting for node-program to connect
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

Is there something different that I can try?

I can use charmrun with NAMD 2.6b1 on my head node alone, and that works
fine (I just edit my .nodelist file so that my slave nodes are removed).
Also, I use the ++pathfix option in my nodelist; could this be a problem?
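
One thing the interactive login test does not cover: charmrun starts the
node programs through non-interactive remote shells (the ++verbose output
quoted below shows lines such as "Starting ssh profplum -l mgrabe /bin/sh -f"),
and a variable that is set in a login session is not necessarily set there.
A minimal Python sketch of that check follows. It assumes passwordless ssh to
each node and a nodelist made of "host NAME" lines; the function names are
hypothetical and are not part of NAMD or charmrun.

```python
import subprocess


def hosts_from_nodelist(text):
    """Return the host names found on 'host NAME [options]' lines."""
    hosts = []
    for line in text.splitlines():
        fields = line.split()
        # Skip 'group ...' lines, blank lines, and anything else.
        if len(fields) >= 2 and fields[0] == "host":
            hosts.append(fields[1])
    return hosts


def remote_check_command(host, variable="DYLD_LIBRARY_PATH"):
    """Build a non-interactive ssh command that prints one variable."""
    # BatchMode avoids hanging on a password prompt; printenv prints
    # nothing if the variable is unset in the remote shell.
    return ["ssh", "-o", "BatchMode=yes", "-o", "ConnectTimeout=10",
            host, "printenv", variable]


def check_hosts(hosts, variable="DYLD_LIBRARY_PATH"):
    """Map each host to the value its non-interactive shell reports."""
    results = {}
    for host in hosts:
        proc = subprocess.run(remote_check_command(host, variable),
                              capture_output=True, text=True)
        results[host] = proc.stdout.strip()
    return results
```

Calling check_hosts(hosts_from_nodelist(open(".nodelist").read())) returns a
dictionary keyed by node name. An empty value for a node would suggest that
the variable is not reaching non-interactive shells on that node, which seems
worth ruling out before looking at ++pathfix.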

Thanks for everyone's help.

-michael

On Aug 25, 2005, at 4:36 PM, Jim Phillips wrote:

>
> Yes, all of your nodes must have access to libibmc++.A.dylib. The
> easy way to accomplish this is to copy libibmc++.A.dylib to some
> directory and then setenv DYLD_LIBRARY_PATH /that/directory in your
> .cshrc file.
>
> -Jim
>
> On Tue, 23 Aug 2005, Michael Grabe wrote:
>
>> hi namd list,
>>
>> I just installed the NAMD 2.6b1 precompiled binary
>> on my Mac OS X server, and I then downloaded the
>> IBM compilers.
>>
>> I can get NAMD 2.6b1 to run in standalone mode,
>> and I can get my old NAMD 2.5 install to run in
>> parallel. But I get the error below when I try to run NAMD 2.6b1
>> on many computers in parallel. Do all of my nodes have
>> to have access to the /opt/ibmcmp/lib/libibmc++.A.dylib library?
>> Right now only my head node does.
>>
>> Thank you in advance, and here is my error at the command prompt:
>>
>> Timeout waiting for node-program to connect
>>
>> And from the ++verbose I get:
>>
>> Charmrun rsh(mrgreen.7)> rsh phase successful.
>> Charmrun rsh(mrswhite.6)> rsh phase successful.
>> Charmrun rsh(mrgreen.3)> rsh phase successful.
>> Charmrun> adding client 0: "profplum", IP:XXX.XXX.XX.20
>> Charmrun> adding client 1: "colmustard", IP:XXX.XXX.XX.21
>> Charmrun> adding client 2: "mrswhite", IP:XXX.XXX.XX.22
>> Charmrun> adding client 3: "mrgreen", IP:XXX.XXX.XX.23
>> Charmrun> adding client 4: "profplum", IP:XXX.XXX.XX.20
>> Charmrun> adding client 5: "colmustard", IP:XXX.XXX.XX.21
>> Charmrun> adding client 6: "mrswhite", IP:XXX.XXX.XX.22
>> Charmrun> adding client 7: "mrgreen", IP:XXX.XXX.XX.23
>> Charmrun> Charmrun = XXX.XXX.XX.20, port = 59796
>> Charmrun> Sending "0 XXX.XXX.XX.20 59796 10408 0" to client 0.
>> Charmrun> find the node program "/downloads/NAMD_2.6b1/namd2" at
>> "/Volumes/Big_Plum1/Users/mgrabe/Projects/mac_osx_test" for 0.
>> Charmrun> Starting ssh profplum -l mgrabe /bin/sh -f
>> Charmrun> Sending "1 XXX.XXX.XX.20 59796 10408 0" to client 1.
>> Charmrun> find the node program "/downloads/NAMD_2.6b1/namd2" at
>> "/Users/mgrabe/Projects/mac_osx_test" for 1.
>> Charmrun> Starting ssh colmustard -l mgrabe /bin/sh -f
>> Charmrun> Sending "2 XXX.XXX.XX.20 59796 10408 0" to client 2.
>> Charmrun> find the node program "/downloads/NAMD_2.6b1/namd2" at
>> "/Users/mgrabe/Projects/mac_osx_test" for 2.
>> Charmrun> Starting ssh mrswhite -l mgrabe /bin/sh -f
>> Charmrun> Sending "3 XXX.XXX.XX.20 59796 10408 0" to client 3.
>> Charmrun> find the node program "/downloads/NAMD_2.6b1/namd2" at
>> "/Users/mgrabe/Projects/mac_osx_test" for 3.
>> Charmrun> Starting ssh mrgreen -l mgrabe /bin/sh -f
>> Charmrun> Sending "4 XXX.XXX.XX.20 59796 10408 0" to client 4.
>> Charmrun> find the node program "/downloads/NAMD_2.6b1/namd2" at
>> "/Volumes/Big_Plum1/Users/mgrabe/Projects/mac_osx_test" for 4.
>> Charmrun> Starting ssh profplum -l mgrabe /bin/sh -f
>> Charmrun> Sending "5 XXX.XXX.XX.20 59796 10408 0" to client 5.
>> Charmrun> find the node program "/downloads/NAMD_2.6b1/namd2" at
>> "/Users/mgrabe/Projects/mac_osx_test" for 5.
>> Charmrun> Starting ssh colmustard -l mgrabe /bin/sh -f
>> Charmrun> Sending "6 XXX.XXX.XX.20 59796 10408 0" to client 6.
>> Charmrun> find the node program "/downloads/NAMD_2.6b1/namd2" at
>> "/Users/mgrabe/Projects/mac_osx_test" for 6.
>> Charmrun> Starting ssh mrswhite -l mgrabe /bin/sh -f
>> Charmrun> Sending "7 XXX.XXX.XX.20 59796 10408 0" to client 7.
>> Charmrun> find the node program "/downloads/NAMD_2.6b1/namd2" at
>> "/Users/mgrabe/Projects/mac_osx_test" for 7.
>> Charmrun> Starting ssh mrgreen -l mgrabe /bin/sh -f
>> Charmrun> waiting for rsh (profplum:0), pid 10409
>> Charmrun> waiting for rsh (colmustard:1), pid 10410
>> Charmrun> waiting for rsh (mrswhite:2), pid 10411
>> Charmrun> waiting for rsh (mrgreen:3), pid 10412
>> Charmrun> waiting for rsh (profplum:4), pid 10413
>> Charmrun> waiting for rsh (colmustard:5), pid 10414
>> Charmrun> waiting for rsh (mrswhite:6), pid 10415
>> Charmrun> waiting for rsh (mrgreen:7), pid 10416
>> Charmrun> Waiting for 0-th client to connect.
>> Charmrun> client 0 connected (IP=XXX.XXX.XX.20 data_port=49971)
>> Charmrun> Waiting for 1-th client to connect.
>> Charmrun> client 4 connected (IP=XXX.XXX.XX.20 data_port=49972)
>> Charmrun> Waiting for 2-th client to connect.
>>
>> -michael
>> ----------------------------------------------------------------------
>> Michael Grabe, Ph.D.
>> HHMI/UCSF
>> Genetics Development & Behavioral Science Building
>> 1550 4th Street, GD 482
>> San Francisco, CA 94143-0725
>> tel: ++ 415.476.0421
>> http://itsa.ucsf.edu/~mgrabe
>>
>>
>
>

This archive was generated by hypermail 2.1.6 : Wed Feb 29 2012 - 15:39:54 CST